# dbt Developer Hub

> End user documentation, guides and technical reference for dbt

## API Reference

### About the Discovery API schema

With the Discovery API, you can query the metadata in dbt to learn more about your dbt deployments and the data they generate. You can analyze the data to make improvements. If you are new to the API, refer to [About the Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) for an introduction. You might also find the [use cases and examples](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md) helpful.

The Discovery API *schema* provides all the pieces necessary to query and interact with the Discovery API. The most common queries use the `environment` endpoint:

- [Environment schema](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-environment.md) — Query and compare a model's definition (intended) and its applied (actual) state.
- [Applied schema](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-environment-applied.md) — Query the actual state of objects and metadata in the warehouse after a `dbt run` or `dbt build`.
- [Definition schema](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-environment-definition.md) — Query the intended state defined in your dbt project's code and configuration.
- [Model Historical Runs schema](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-environment-applied-modelHistoricalRuns.md) — Query information about a model's run history.

---

### About the Discovery API

*Available on [Starter](https://www.getdbt.com/pricing), [Enterprise](https://www.getdbt.com/pricing), and [Enterprise +](https://www.getdbt.com/pricing) plans.*

Every time dbt runs a project, it generates and stores information about the project. The metadata includes details about your project's models, sources, and other nodes, along with their execution results. With the dbt Discovery API, you can query this comprehensive information to gain a better understanding of your DAG and the data it produces.

By leveraging the metadata in dbt, you can create systems for data monitoring and alerting, lineage exploration, and automated reporting. This can help you improve data discovery, data quality, and pipeline operations within your organization.
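Queries are sent to the API as GraphQL over HTTP. As a minimal sketch of what that looks like in practice (the multi-tenant metadata endpoint shown here is an assumption that may differ for single-tenant deployments, and the token is a placeholder):

```python
import json
from urllib import request

# Assumed endpoint for illustration -- single-tenant deployments use their own host.
DISCOVERY_URL = "https://metadata.cloud.getdbt.com/graphql"

def build_discovery_request(token, query, variables=None):
    """Package a GraphQL query as an authenticated POST to the Discovery API."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode()
    return request.Request(
        DISCOVERY_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # service token or PAT
            "Content-Type": "application/json",
        },
        method="POST",
    )

# A tiny query against the environment endpoint; send it with urllib.request.urlopen(req)
# and decode the JSON response.
query = """
query ($environmentId: BigInt!) {
  environment(id: $environmentId) {
    applied { models(first: 1) { totalCount } }
  }
}
"""
req = build_discovery_request("YOUR_SERVICE_TOKEN", query, {"environmentId": 834})
```

The same request shape works for any of the queries shown on the schema pages below; only the `query` string and `variables` change.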
You can access the Discovery API through [ad hoc queries](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-querying.md), custom applications, a wide range of [partner ecosystem integrations](https://www.getdbt.com/product/integrations/) (like BI/analytics, catalog and governance, and quality and observability), and dbt features like [model timing](https://docs.getdbt.com/docs/deploy/run-visibility.md#model-timing) and [data health tiles](https://docs.getdbt.com/docs/explore/data-tile.md).

![A rich ecosystem for integration](/img/docs/dbt-cloud/discovery-api/discovery-api-figure.png?v=2)

You can query the dbt metadata:

* At the [environment](https://docs.getdbt.com/docs/environments-in-dbt.md) level, for both the latest state (use the `environment` endpoint) and historical run results (use `modelHistoricalRuns`) of a dbt project in production.
* At the job level, for results of a specific dbt job run for a given resource type, like `models` or `tests`.

#### Prerequisites

* You must have a dbt [multi-tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md#multi-tenant) or [single tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md#single-tenant) account.
* You must be on a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/).
* Your projects must be on a dbt [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) or dbt version 1.0 or later. Refer to [Upgrade dbt version in Cloud](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) to upgrade.

#### What you can use the Discovery API for

The following sections describe the API's use cases, the analysis you can do, and the results you can achieve by integrating with it.
To use the API directly or integrate your tool with it, refer to [Use cases and examples](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md) for detailed information.

**Performance** — Use the API to look at historical information, like model build time, to determine the health of your dbt projects. Finding inefficiencies in orchestration configurations can help decrease infrastructure costs and improve timeliness. To learn more, refer to [Performance](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md#performance). For example, you can use the [model timing](https://docs.getdbt.com/docs/deploy/run-visibility.md#model-timing) tab to help identify and optimize bottlenecks in model builds:

![Model timing visualization in dbt](/img/docs/dbt-cloud/discovery-api/model-timing.png?v=2)

**Quality** — Use the API to determine whether the data is accurate and up to date by monitoring test failures, source freshness, and run status. Accurate and reliable information is valuable for analytics, decision-making, and monitoring, and helps prevent your organization from acting on bad data. To learn more, refer to [Quality](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md#quality). When used with [webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md), the API can also help with detecting, investigating, and alerting on issues.

**Discovery** — Use the API to find and understand dbt assets in integrated tools using information like model and metric definitions and column information. For more details, refer to [Discovery](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md#discovery).
Data producers must manage and organize data for stakeholders, while data consumers need to quickly and confidently analyze data at scale to make informed decisions that improve business outcomes and reduce organizational overhead. The API is useful for discovery data experiences in catalogs, analytics, apps, and machine learning (ML) tools. It can help you understand the origin and meaning of datasets for your analysis.

![Data lineage produced by dbt](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2)

**Governance** — Use the API to review who developed the models and who uses them to help establish standard practices for better governance. For more details, refer to [Governance](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md#governance).

**Development** — Use the API to review dataset changes and usage by examining exposures, lineage, and dependencies. From this investigation, you can learn how to define and build more effective dbt projects. For more details, refer to [Development](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md#development).

![Use exposures to embed data health tiles in your dashboards to distill trust signals for data consumers.](/img/docs/collaborate/dbt-explorer/data-tile-pass.jpg?v=2)

#### Types of project state

You can query these two types of [project state](https://docs.getdbt.com/docs/dbt-cloud-apis/project-state.md) at the environment level:

* **Definition** — The logical state of a dbt project's [resources](https://docs.getdbt.com/docs/build/projects.md), which updates when the project is changed.
* **Applied** — The output of successful dbt DAG execution that creates or describes the state of the database (for example, `dbt run`, `dbt test`, source freshness, and so on).

These states allow you to examine the difference between a model's definition and its applied state, so you can answer questions like "Did the model run?" or "Did the run fail?" An applied model exists as a table or view in the data platform as of its most recent successful run.

#### Related docs

* [Use cases and examples for the Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md)
* [Query the Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-querying.md)
* [Schema](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-job.md)

---

### Account-scoped personal access tokens

> **Warning:** User API tokens have been deprecated and will no longer work. [Migrate](#migrate-deprecated-user-api-keys-to-personal-access-tokens) to personal access tokens to resume services.

Each dbt user with a [Developer license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) can create a personal access token (PAT) to access the dbt API and dbt CLI. This token can execute queries against the dbt API on the user's behalf. To access dbt APIs and resources on behalf of the *account*, we recommend using service tokens instead. Learn more about [which token type you should use](https://docs.getdbt.com/docs/dbt-cloud-apis/authentication.md#which-token-type-should-you-use) to understand the differences.
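Once created, a PAT authenticates API calls on your behalf. As an illustrative sketch (the account ID, job ID, and host below are placeholders, and single-tenant deployments use their own base URL), here is how you might build a request that kicks off a job run through the Administrative API v2:

```python
import json
from urllib import request

# Placeholder values for illustration -- substitute your own IDs and host.
ACCOUNT_ID = 1234
JOB_ID = 5678
BASE_URL = "https://cloud.getdbt.com"

def build_trigger_run_request(pat, cause="Triggered via API"):
    """Build (but don't send) a POST that kicks off a job run on the user's behalf."""
    url = f"{BASE_URL}/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/"
    return request.Request(
        url,
        data=json.dumps({"cause": cause}).encode(),
        headers={
            "Authorization": f"Bearer {pat}",  # the PAT inherits your permissions
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_trigger_run_request("YOUR_PAT")
# Send with urllib.request.urlopen(req) and inspect the JSON response for run details.
```

Because the token carries your permissions, a request like this succeeds only on accounts and projects you can already access.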
PATs inherit the permissions of the user who created them. For example, if a developer-licensed user with the Project Admin role on specific projects creates a PAT, the token gets the Project Admin role with access to the same projects as the user. These tokens are also account-specific, so if a user has access to more than one dbt account with the same email address, they need to create a unique PAT for each account.

#### Create a personal access token

Creating an account-scoped PAT requires only a few steps:

1. Navigate to your **Account Settings**, expand **API tokens**, and click **Personal tokens**.
2. Click **Create personal access token**.
3. Give the token a descriptive name and click **Save**.
4. Copy the token before closing the window. *It will not be available afterward, and you will have to create a new token if you lose it.*

To maintain security best practices, rotate your PATs regularly. To do so, create a new token and delete the old one once the new token is in place.

#### Delete a personal access token

To permanently delete a PAT:

1. Navigate to your **Account Settings**, expand **API tokens**, and click **Personal tokens**.
2. Find the token you want to delete and click the "X" to the right of the token description fields.
3. Click **Confirm delete**. The token will no longer be valid.

#### Migrate deprecated user API keys to personal access tokens

Migrating to PATs is critical if you are using user API keys today. The current API key is located under **Personal Settings → API Key**. There are a few things to understand if you are using a user API key today:

* PATs are more secure.
* To promote least privilege and high security assurance for your dbt accounts, we highly recommend moving to the new account-scoped PATs.
* You must create and use unique tokens in each of your dbt accounts that share the same email address.
  * For example, suppose Paul's email address belongs to two dbt accounts: Spice Harvesting Account and Guild Navigator Account. Before this release, the same API key was used to access both accounts. After this release, Paul has to go into each account and create a unique PAT for every account he wants to access the API for. These PATs are account-specific, not user-specific.
* Cross-account API endpoints, namely `/v2/accounts` and `/v3/accounts`, change in behavior when using PATs. Since all PATs are now account-specific, getting all accounts associated with a username cannot work. `/v3/accounts` only returns account metadata relevant to the PAT being used.
  * User account metadata only contains information about the specific account under which the request is made.
  * Any other accounts that belong to that user need to be requested through the PAT that belongs to that account.

> **Undocumented APIs:** If you're using any undocumented and unsupported API endpoints, note that these can be deprecated without notice. If you are using undocumented endpoints and have use cases that are not satisfied by the current API, please reach out to support.

##### Using personal access tokens

Are you using a user API key today to access dbt APIs in any of your workflows? If not, you don't have any action to take. If you are, follow these instructions:

1. Make a list of all the places where you're calling the dbt API with a user API key.
2. Create a new PAT under **Account Settings → API Tokens → Personal tokens**. For instructions, see [Create a personal access token](#create-a-personal-access-token).
3. Replace the API key in your calls with the PAT you created. You can use a PAT wherever you previously used an API key. To do so, include the PAT in the `Authorization` header of your API requests, for example `Authorization: Bearer <your-token>`, replacing `<your-token>` with the new PAT you created.

   > **Note:** The option to rotate API keys applies to existing API keys, not to replacing them with PATs. You do not need to replace your API key with a PAT in the dbt UI.

4. Ensure that you're using a PAT only where it's needed. For flows that require a service account, use a service token.

---

### APIs overview

*Available on [Starter](https://www.getdbt.com/pricing), [Enterprise](https://www.getdbt.com/pricing), and [Enterprise +](https://www.getdbt.com/pricing) plans.*

Accounts on the Starter, Enterprise, and Enterprise+ plans can query the dbt APIs. dbt provides the following APIs:

* The [dbt Administrative API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) can be used to administer a dbt account. It can be called manually or with [the dbt Terraform provider](https://registry.terraform.io/providers/dbt-labs/dbtcloud/latest).
* The [dbt Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) can be used to fetch metadata related to the state and health of your dbt project.
* The [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) provide multiple options for querying the metrics defined in your Semantic Layer.

If you want to learn more about webhooks, refer to [Webhooks for your jobs](https://docs.getdbt.com/docs/deploy/webhooks.md).

#### How to access the APIs

dbt supports two types of API tokens: [personal access tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) and [service account tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md). Requests to the dbt APIs can be authorized using these tokens.

---

### Applied object schema

The applied object allows you to query information about a particular model based on `environmentId`. The [Example queries](#example-queries) illustrate a few fields you can query with this object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.
##### Example queries

You can use your production environment's `id`:

```graphql
query Example {
  # Get the latest state of the production environment
  environment(id: 834) {
    # The state of an executed node as it exists as an object in the database
    applied {
      # Pagination to ensure a manageable response for large projects
      models(first: 100) {
        edges {
          node {
            # Basic properties
            uniqueId
            name
            description
            rawCode
            compiledCode
            # Table/view identifier (can also filter by these)
            database
            schema
            alias
            # Metadata from when the model was built
            executionInfo { executeCompletedAt, executionTime }
            # Latest test results
            tests { name, executionInfo { lastRunStatus, lastRunError } }
            # Catalog info
            catalog { columns { name, description, type }, stats { label, value } }
            # Source freshness
            ancestors(types: [Source]) {
              name
              ... on SourceAppliedStateNestedNode {
                freshness { maxLoadedAt, freshnessStatus }
              }
            }
            # Immediate dependencies in lineage
            children { name, resourceType }
          }
        }
        # Number of models in the project
        totalCount
      }
    }
  }
}
```

##### Fields

When querying the `applied` field of `environment`, you can use the fields listed in the schema explorer.
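A large project can hold more models than fit in a single page of results, so the `models` connection is paginated. A sketch of draining every page, assuming the standard Relay-style `pageInfo { hasNextPage endCursor }` fields on the connection, with `fetch_page` standing in for an authenticated Discovery API call:

```python
# `fetch_page` is a hypothetical callable that runs a query like
# models(first: 100, after: $cursor) and returns the decoded connection payload.
def iterate_models(fetch_page):
    """Yield every model node, following endCursor until hasNextPage is false."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        for edge in page["edges"]:
            yield edge["node"]
        if not page["pageInfo"]["hasNextPage"]:
            return
        cursor = page["pageInfo"]["endCursor"]
```

Each iteration passes the previous page's `endCursor` as the `after` argument; the loop stops as soon as the API reports there are no further pages.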
---

### Authentication tokens

- [Personal access tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) — Learn about user tokens and how to use them to execute queries against the dbt API.
- [Service account tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) — Learn how to use service account tokens to securely authenticate with dbt APIs for system-level integrations.

#### Types of API access tokens

**Personal access tokens:** The preferred and secure way of accessing dbt APIs on behalf of a user. PATs are scoped to an account and can be managed with greater granularity and control.

**Service tokens:** Similar to service accounts, these are the preferred method for enabling access on behalf of the dbt account.

##### Which token type should you use

You should use service tokens broadly for any production workflow where you need a service account. You should use PATs only for development workflows *or* dbt client workflows that require user context. The following examples show when to use a personal access token (PAT) or a service token:

* **Connecting a partner integration to dbt** — Some examples include the [Semantic Layer Google Sheets integration](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md), Hightouch, Datafold, a custom app you’ve created, etc.
  These integrations should use a service token instead of a PAT because service tokens give you visibility, and you can scope them to only what the integration needs, ensuring least privilege. We highly recommend switching to a service token if you’re using a personal access token for these integrations today.
* **Production Terraform** — Use a service token, since this is a production workflow acting as a service account rather than a user account.
* **dbt CLI** — Use a PAT, since the dbt CLI works within the context of a user (the user is making the requests and has to operate within the context of their user account).
* **Testing a custom script, or staging Terraform or Postman** — We recommend using a PAT, as this is a development workflow scoped to the user making the changes. When you push this script or Terraform into production, use a service token instead.
* **API endpoints requiring user context** — Use PATs to authenticate to any API endpoint that requires user context (for example, endpoints to create and update user credentials).

---

### dbt Administrative API

*Available on [Starter](https://www.getdbt.com/pricing), [Enterprise](https://www.getdbt.com/pricing), and [Enterprise +](https://www.getdbt.com/pricing) plans.*

The dbt Administrative API is enabled by default for [Starter, Enterprise, and Enterprise+ plans](https://www.getdbt.com/pricing/).
It can be used to:

* Download artifacts after a job has completed
* Kick off a job run from an orchestration tool
* Manage your dbt account
* and more

dbt currently supports two versions of the Administrative API: v2 and v3. In general, v3 is the recommended version, but not all v2 routes have been upgraded to v3 yet; we're currently working on this. If you can't find something in the v3 docs, check the shorter list of v2 endpoints because you might find it there.

Many endpoints of the Administrative API can also be called through the [dbt Terraform provider](https://registry.terraform.io/providers/dbt-labs/dbtcloud/latest). The documentation on the Terraform registry contains [a guide on how to get started with the provider](https://registry.terraform.io/providers/dbt-labs/dbtcloud/latest/docs/guides/1_getting_started) as well as [a page listing all the Terraform resources available](https://registry.terraform.io/providers/dbt-labs/dbtcloud/latest/docs/guides/99_list_resources) to configure.

- [API v2](https://docs.getdbt.com/dbt-cloud/api-v2) — Our legacy API version, with limited endpoints and features. Contains information not available in v3.
- [API v3](https://docs.getdbt.com/dbt-cloud/api-v3) — Our latest API version, with new endpoints and features.
- [dbt Terraform provider](https://registry.terraform.io/providers/dbt-labs/dbtcloud/latest) — The Terraform provider maintained by dbt Labs, which can be used to manage a dbt account.

---

### Definition object schema

The definition object allows you to query the logical state of a given project node as of its most recently generated manifest. The [Example queries](#example-queries) illustrate a few fields you can query with this object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.
##### Example queries

You can use your production environment's `id`:

```graphql
query Example {
  # Get the latest state of the production environment
  environment(id: 834) {
    # The logical state of a given project node from its most recently generated manifest
    definition {
      # Filter on model access (or other properties)
      models(first: 100, filter: { access: public }) {
        edges {
          node {
            # Compare to see if/how the model has changed since the last build
            rawCode
            jobDefinitionId
            # When the code was last compiled or run
            runGeneratedAt
            # Model governance
            contractEnforced
            group
            version
          }
        }
      }
    }
  }
}
```

##### Fields

When querying the `definition` field of `environment`, you can use the fields listed in the schema explorer.

---

### Environment object schema

You can use the environment object to query and compare the definition (intended) and applied (actual) states of nodes (models, seeds, snapshots, and more) in your dbt project. For example, you can specify an `environmentId` to learn more about a particular model (or other node type) in that environment. The [Example queries](#example-queries) illustrate a few fields you can query with this `environment` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `environment`, you can use the arguments listed in the schema explorer.
##### Example queries

You can use your production environment's `id`:

```graphql
query Example {
  # Get the latest state of the production environment
  environment(id: 834) {
    # The state of an executed node as it exists as an object in the database
    applied {
      # Pagination to ensure a manageable response for large projects
      models(first: 100) {
        edges {
          node {
            # Basic properties
            uniqueId
            name
            description
            rawCode
            compiledCode
            # Table/view identifier (can also filter by these)
            database
            schema
            alias
            # Metadata from when the model was built
            executionInfo { executeCompletedAt, executionTime }
            # Latest test results
            tests { name, executionInfo { lastRunStatus, lastRunError } }
            # Catalog info
            catalog { columns { name, description, type }, stats { label, value } }
            # Source freshness
            ancestors(types: [Source]) {
              name
              ... on SourceAppliedStateNestedNode {
                freshness { maxLoadedAt, freshnessStatus }
              }
            }
            # Immediate dependencies in lineage
            children { name, resourceType }
          }
        }
        # Number of models in the project
        totalCount
      }
    }
    # The logical state of a given project node from its most recently generated manifest
    definition {
      # Filter on model access (or other properties)
      models(first: 100, filter: { access: public }) {
        edges {
          node {
            # Compare to see if/how the model has changed since the last build
            rawCode
            jobDefinitionId
            # When the code was last compiled or run
            runGeneratedAt
            # Model governance
            contractEnforced
            group
            version
          }
        }
      }
    }
  }
}
```

With the deprecation of the data type `Int` for `id`, below is an example of replacing it with `BigInt`:

```graphql
query ($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first) {
        edges {
          node {
            uniqueId
            executionInfo {
              lastRunId
            }
          }
        }
      }
    }
  }
}
```

With the deprecation of `modelByEnvironment`, below is an example of replacing it with `environment`:

```graphql
query ($environmentId: BigInt!, $uniqueId: String) {
  environment(id: $environmentId) {
    applied {
      modelHistoricalRuns(uniqueId: $uniqueId) {
        uniqueId
        executionTime
        executeCompletedAt
      }
    }
  }
}
```

##### Fields

When querying an `environment`, you can use the fields listed in the schema explorer.

For details on querying the `applied` field of `environment`, visit [Applied](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-environment-applied.md).

For details on querying the `definition` field of `environment`, visit [Definition](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-environment-definition.md).

---

### Exposure object schema

The exposure object allows you to query information about a particular exposure. To learn more, refer to [Add Exposures to your DAG](https://docs.getdbt.com/docs/build/exposures.md).

##### Arguments

When querying for an `exposure`, you can use the arguments listed in the schema explorer.

Below are some illustrative example queries outlining the schema of the exposure object.

##### Example query

The example below queries information about an exposure, including the owner's name and email, the URL, and information about parent sources and parent models.
```graphql
{
  job(id: 123) {
    exposure(name: "my_awesome_exposure") {
      runId
      projectId
      name
      uniqueId
      resourceType
      ownerName
      url
      ownerEmail
      parentsSources {
        uniqueId
        sourceName
        name
        state
        maxLoadedAt
        criteria {
          warnAfter {
            period
            count
          }
          errorAfter {
            period
            count
          }
        }
        maxLoadedAtTimeAgoInS
      }
      parentsModels {
        uniqueId
      }
    }
  }
}
```

##### Fields

When querying for an `exposure`, the following fields are available.

---

### Exposure tile object schema

[Exposure health tiles](https://docs.getdbt.com/docs/explore/data-tile.md) distill data health signals for data consumers and can be embedded in downstream tools. You can query information on these tiles from the Discovery API.

The [Example query](#example-query) illustrates a few fields you can query with the `exposureTile` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `exposureTile`, you can use the following arguments.
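The example queries in this section filter nodes by unique ID, which takes the form `RESOURCE_TYPE.PACKAGE_NAME.RESOURCE_NAME`. As a rough illustration (these helper names are ours, not part of any dbt library), that format can be assembled and taken apart like this:

```python
# Sketch only: hypothetical helpers for the RESOURCE_TYPE.PACKAGE_NAME.RESOURCE_NAME
# format used in Discovery API filters such as {uniqueId: "model.marketing.customers"}.
def make_unique_id(resource_type: str, package_name: str, resource_name: str) -> str:
    return f"{resource_type}.{package_name}.{resource_name}"

def split_unique_id(unique_id: str) -> tuple[str, str, str]:
    # The first two dots delimit resource type and package; the rest is the name.
    resource_type, package_name, resource_name = unique_id.split(".", 2)
    return resource_type, package_name, resource_name

print(make_unique_id("model", "marketing", "customers"))
# model.marketing.customers
```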
##### Example query

You can specify the `environmentId` and filter by a model's `uniqueId` to understand the data quality and metadata information for the exposure tile associated with the `customers` model in the `marketing` package:

```graphql
query {
  environment(id: 834) {
    applied {
      exposureTile(
        filter: {uniqueId: "model.marketing.customers"}  # Use this format for unique ID: RESOURCE_TYPE.PACKAGE_NAME.RESOURCE_NAME
      ) {
        accountId  # The account ID of this node
        environmentId
        projectId
        exposureType
        filePath
        quality
      }
    }
  }
}
```

##### Fields

When querying for `exposureTile`, you can use the following fields.

---

### Exposures object schema

[Exposures](https://docs.getdbt.com/docs/build/exposures.md) are dbt resources that represent downstream uses of your project, such as dashboards, applications, or data science pipelines. You can query exposures through the Discovery API to understand which assets depend on your models.

The [Example query](#example-query) illustrates a few fields you can query with the `exposures` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `exposures`, you can use the following arguments.

##### Example query

You can specify the `environmentId`, `first: 100`, and a filter on model `uniqueIds` to return all the downstream exposures (dashboards, applications, and so on) that depend on the `customers` model in the `marketing` package, limited to the first 100 results:

```graphql
query {
  environment(id: 834) {
    applied {
      exposures(
        filter: {
          uniqueIds: ["model.marketing.customers"]  # Use this format for unique ID: RESOURCE_TYPE.PACKAGE_NAME.RESOURCE_NAME
        },
        first: 100
      ) {
        edges {
          node {
            accountId
            exposureType
            fqn
            projectId
            url
          }
        }
      }
    }
  }
}
```

##### Fields

When querying for `exposures`, you can use the following fields.

##### Key fields from nodes

---

### Exposures object schema

The exposures object allows you to query information about all exposures in a given job. To learn more, refer to [Add Exposures to your DAG](https://docs.getdbt.com/docs/build/exposures.md).

##### Arguments

When querying for `exposures`, the following arguments are available.

Below we show some illustrative example queries and outline the schema of the exposures object.

##### Example query

The example below queries information about all exposures in a given job, including the owner's name and email, the URL, and information about parent sources and parent models for each exposure.
```graphql
{
  job(id: 123) {
    exposures(jobId: 123) {
      runId
      projectId
      name
      uniqueId
      resourceType
      ownerName
      url
      ownerEmail
      parentsSources {
        uniqueId
        sourceName
        name
        state
        maxLoadedAt
        criteria {
          warnAfter {
            period
            count
          }
          errorAfter {
            period
            count
          }
        }
        maxLoadedAtTimeAgoInS
      }
      parentsModels {
        uniqueId
      }
    }
  }
}
```

##### Fields

When querying for `exposures`, the following fields are available.

---

### GraphQL

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

[GraphQL](https://graphql.org/) (GQL) is an open-source query language for APIs. It offers a more efficient and flexible approach than traditional RESTful APIs. With GraphQL, users can request specific data using a single query, reducing the need for many server round trips. This improves performance and minimizes network overhead.

GraphQL has several advantages: it is self-documenting, has a strong typing system, supports versioning and evolution, enables rapid development, and has a robust ecosystem. These features make GraphQL a powerful choice for APIs that prioritize flexibility, performance, and developer productivity.

#### dbt Semantic Layer GraphQL API

The Semantic Layer GraphQL API allows you to explore and query metrics and dimensions.
Due to its self-documenting nature, you can explore the calls conveniently through a schema explorer. The schema explorer URLs vary depending on your [deployment region](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md). Use the following table to find the right link for your region:

| Deployment type | Schema explorer URL |
| --- | --- |
| North America multi-tenant | |
| EMEA multi-tenant | |
| APAC multi-tenant | |
| Single tenant | `https://semantic-layer.YOUR_ACCESS_URL/api/graphql` Replace `YOUR_ACCESS_URL` with your specific account prefix followed by the appropriate Access URL for your region and plan. |
| Multi-cell | `https://YOUR_ACCOUNT_PREFIX.semantic-layer.REGION.dbt.com/api/graphql` Replace `YOUR_ACCOUNT_PREFIX` with your specific account identifier and `REGION` with your location, which could be `us1.dbt.com`. |

**Example**

* If your Single tenant access URL is `ABC123.getdbt.com`, your schema explorer URL will be `https://semantic-layer.ABC123.getdbt.com/api/graphql`.

dbt Partners can use the Semantic Layer GraphQL API to build an integration with the Semantic Layer.

Note that the Semantic Layer GraphQL API doesn't support `ref` to call dbt objects. Instead, use the fully qualified table name. If you're using dbt macros at query time to calculate your metrics, move those calculations into your Semantic Layer metric definitions as code.

#### Requirements to use the GraphQL API

* A dbt project on dbt v1.6 or higher
* Metrics are defined and configured
* A dbt [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) with "Semantic Layer Only" and "Metadata Only" permissions, or a [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md)

#### Using the GraphQL API

If you're a dbt user or partner with access to dbt and the [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), you can [set up](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) and test this API with data from your own instance by configuring the Semantic Layer and obtaining the right GQL connection parameters described in this document.

Refer to [Get started with the Semantic Layer](https://docs.getdbt.com/guides/sl-snowflake-qs.md) for more info.
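As a quick sanity check of the schema explorer URL patterns in the table above, here is a minimal sketch. The helper names are ours (not part of any dbt library), and the access URL is the example value from the docs:

```python
# Sketch: assemble Semantic Layer schema explorer URLs from the documented patterns.
def single_tenant_explorer_url(access_url: str) -> str:
    # Single tenant: prefix the access URL with `semantic-layer.`
    return f"https://semantic-layer.{access_url}/api/graphql"

def multi_cell_explorer_url(account_prefix: str, region: str) -> str:
    # Multi-cell: account prefix, then `semantic-layer`, then the region host
    return f"https://{account_prefix}.semantic-layer.{region}/api/graphql"

print(single_tenant_explorer_url("ABC123.getdbt.com"))
# https://semantic-layer.ABC123.getdbt.com/api/graphql
```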
Authentication uses either a dbt [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) or a [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) passed through a header as follows. To explore the schema, you can enter this information in the "header" section.

```shell
{"Authorization": "Bearer "}
```

Each GQL request also requires a dbt `environmentId`. The API uses both the service or personal token in the header and the `environmentId` for authentication.

##### Metadata calls

###### Fetch data platform dialect

In some cases, it may be useful for your application to know the dialect or data platform that's used internally for the Semantic Layer connection (such as if you are building `where` filters from a user interface rather than user-inputted SQL). The GraphQL API has an easy way to fetch this with the following query:

```graphql
{
  environmentInfo(environmentId: BigInt!) {
    dialect
  }
}
```

###### Fetch available metrics

```graphql
metricsPaginated(
  environmentId: BigInt!
  search: String = null
  groupBy: [GroupByInput!] = null
  pageNum: Int! = 1
  pageSize: Int = null
): MetricResultPage! {
  items: [Metric!]!
  pageNum: Int!
  pageSize: Int
  totalItems: Int!
  totalPages: Int!
}
```

###### Fetch available dimensions for metrics

```graphql
dimensionsPaginated(
  environmentId: BigInt!
  metrics: [MetricInput!]!
  search: String = null
  pageNum: Int! = 1
  pageSize: Int = null
): DimensionResultPage! {
  items: [Dimension!]!
  pageNum: Int!
  pageSize: Int
  totalItems: Int!
  totalPages: Int!
}
```

###### Fetch available granularities given metrics

Note: This call for `queryableGranularities` returns only queryable granularities for metric time, the primary time dimension across all selected metrics.

```graphql
queryableGranularities(
  environmentId: BigInt!
  metrics: [MetricInput!]!
): [TimeGranularity!]!
```

You can also get queryable granularities for all other dimensions using the `dimensionsPaginated` call:

```graphql
{
  dimensionsPaginated(environmentId: BigInt!, metrics: [{name: "order_total"}]) {
    items {
      name
      queryableGranularities # --> ["DAY", "WEEK", "MONTH", "QUARTER", "YEAR"]
    }
  }
}
```

You can also optionally access it from the metrics endpoint:

```graphql
{
  metricsPaginated(environmentId: BigInt!) {
    items {
      name
      dimensions {
        name
        queryableGranularities
      }
    }
  }
}
```

###### Fetch entities

```graphql
entitiesPaginated(
  environmentId: BigInt!
  metrics: [MetricInput!] = null
  search: String = null
  pageNum: Int! = 1
  pageSize: Int = null
): EntityResultPage! {
  items: [Entity!]!
  pageNum: Int!
  pageSize: Int
  totalItems: Int!
  totalPages: Int!
}
```

###### Fetch entities and dimensions to group metrics

```graphql
groupBysPaginated(
  environmentId: BigInt!
  metrics: [MetricInput!] = null
  search: String = null
  pageNum: Int! = 1
  pageSize: Int = null
): EntityDimensionResultPage! {
  items: [EntityDimension!]!
  pageNum: Int!
  pageSize: Int
  totalItems: Int!
  totalPages: Int!
}
```

###### Metric types

```graphql
Metric {
  name: String!
  description: String
  type: MetricType!
  typeParams: MetricTypeParams!
  filter: WhereFilter
  dimensions: [Dimension!]!
  queryableGranularities: [TimeGranularity!]!
}
```

```text
MetricType = [SIMPLE, RATIO, CUMULATIVE, DERIVED]
```

###### Metric type parameters

###### Dimension types

```graphql
Dimension {
  name: String!
  description: String
  type: DimensionType!
  typeParams: DimensionTypeParams
  isPartition: Boolean!
  expr: String
  queryableGranularities: [TimeGranularity!]!
}
```

```text
DimensionType = [CATEGORICAL, TIME]
```

###### List saved queries

List all saved queries for the specified environment:

```graphql
savedQueriesPaginated(
  environmentId: BigInt!
  search: String = null
  pageNum: Int! = 1
  pageSize: Int = null
): SavedQueryResultPage! {
  items: [SavedQuery!]!
  pageNum: Int!
  pageSize: Int
  totalItems: Int!
  totalPages: Int!
}
```

###### List a saved query

List a single saved query using the environment ID and query name:

```graphql
{
  savedQuery(environmentId: "123", savedQueryName: "query_name") {
    name
    description
    label
    queryParams {
      metrics {
        name
      }
      groupBy {
        name
        grain
        datePart
      }
      where {
        whereSqlTemplate
      }
    }
  }
}
```

##### Querying

When querying for data, *either* a `groupBy` *or* a `metrics` selection is required. The following sections provide examples of how to query metrics:

* [Create query](#create-query)
* [Fetch query result](#fetch-query-result)

###### Create query

```graphql
createQuery(
  environmentId: BigInt!
  metrics: [MetricInput!]!
  groupBy: [GroupByInput!] = null
  limit: Int = null
  where: [WhereInput!] = null
  order: [OrderByInput!] = null
): CreateQueryResult
```

```graphql
MetricInput {
  name: String!
  alias: String!
}

GroupByInput {
  name: String!
  grain: TimeGranularity = null
}

WhereInput {
  sql: String!
}

OrderByInput { # -- pass one and only one of metric or groupBy
  metric: MetricInput = null
  groupBy: GroupByInput = null
  descending: Boolean! = false
}
```

###### Fetch query result

```graphql
query(
  environmentId: BigInt!
  queryId: String!
): QueryResult!
```

The GraphQL API uses a polling process for querying because queries can be long-running. It works by first creating a query with a mutation, `createQuery`, which returns a query ID. This ID is then used to continuously check (poll) for the results and status of your query. The typical flow looks as follows:

1. Kick off a query

```graphql
mutation {
  createQuery(
    environmentId: 123456
    metrics: [{name: "order_total"}]
    groupBy: [{name: "metric_time"}]
  ) {
    queryId # => Returns 'QueryID_12345678'
  }
}
```

2. Poll for results

```graphql
{
  query(environmentId: 123456, queryId: "QueryID_12345678") {
    sql
    status
    error
    totalPages
    jsonResult
    arrowResult
  }
}
```

3. Keep polling step 2 at an appropriate interval until the status is `FAILED` or `SUCCESSFUL`.

##### Output format and pagination

###### Output format

By default, the output is in Arrow format. You can switch to JSON format using the following parameter; however, due to performance limitations, we recommend using the JSON parameter only for testing and validation. The JSON received is a base64 encoded string. To access it, decode it using a base64 decoder. The JSON is created from pandas, which means you can change it back to a dataframe using `pandas.read_json(json, orient="table")`. Or you can work with the data directly using `json["data"]` and find the table schema using `json["schema"]["fields"]`. Alternatively, you can pass `encoded: false` to the `jsonResult` field to get a raw JSON string directly.
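To make the decoding step concrete, here is a minimal stdlib-only sketch. The encoded payload below is a stand-in we construct ourselves rather than a live API response; a real `jsonResult` string would come back from the `query` call and decode the same way:

```python
import base64
import json

# Stand-in for an encoded `jsonResult` payload (no live API call here).
# The pandas orient="table" layout has top-level "schema" and "data" keys.
sample = {
    "schema": {"fields": [{"name": "metric_time__day", "type": "datetime"},
                          {"name": "order_total", "type": "number"}]},
    "data": [{"metric_time__day": "2023-08-07", "order_total": 3}],
}
encoded = base64.b64encode(json.dumps(sample).encode()).decode()

# Decoding works the same way on a real response:
decoded = json.loads(base64.b64decode(encoded))
rows = decoded["data"]                                       # the result rows
columns = [f["name"] for f in decoded["schema"]["fields"]]   # the table schema
print(columns)
# ['metric_time__day', 'order_total']
```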
```graphql
{
  query(environmentId: BigInt!, queryId: Int!, pageNum: Int! = 1) {
    sql
    status
    error
    totalPages
    arrowResult
    jsonResult(orient: PandasJsonOrient! = TABLE, encoded: Boolean! = true)
  }
}
```

The results default to the `TABLE` orientation, but you can change it to any [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html)-supported value.

###### Pagination

By default, we return 1024 rows per page. If your result set exceeds this, you need to increase the page number using the `pageNum` option.

##### Run a Python query

The `arrowResult` in the GraphQL query response is a byte dump, which isn't visually useful. You can convert this byte data into an Arrow table using any Arrow-supported language. Refer to the following Python example, which shows how to query and decode the Arrow result:

```python
import base64
import time

import pyarrow as pa
import requests

headers = {"Authorization": "Bearer "}
query_result_request = """
{
  query(environmentId: 70, queryId: "12345678") {
    sql
    status
    error
    arrowResult
  }
}
"""

while True:
    gql_response = requests.post(
        "https://semantic-layer.cloud.getdbt.com/api/graphql",
        json={"query": query_result_request},
        headers=headers,
    )
    if gql_response.json()["data"]["query"]["status"] in ["FAILED", "SUCCESSFUL"]:
        break
    # Set an appropriate interval between polling requests
    time.sleep(1)

"""
gql_response.json() =>
{
  "data": {
    "query": {
      "sql": "SELECT\n  ordered_at AS metric_time__day\n  , SUM(order_total) AS order_total\nFROM semantic_layer.orders orders_src_1\nGROUP BY\n  ordered_at",
      "status": "SUCCESSFUL",
      "error": null,
      "arrowResult": "arrow-byte-data"
    }
  }
}
"""

def to_arrow_table(byte_string: str) -> pa.Table:
    """Get a raw base64 string and convert to an Arrow Table."""
    with pa.ipc.open_stream(base64.b64decode(byte_string)) as reader:
        return pa.Table.from_batches(reader, reader.schema)

arrow_table = to_arrow_table(gql_response.json()["data"]["query"]["arrowResult"])

# Perform whatever functionality is available, like convert to a pandas table.
print(arrow_table.to_pandas())
"""
order_total  ordered_at
          3  2023-08-07
        112  2023-08-08
         12  2023-08-09
       5123  2023-08-10
"""
```

##### Additional create query examples

The following sections provide query examples for the GraphQL API, such as how to query metrics, dimensions, where filters, and more:

* [Query metric alias](#query-metric-alias): Query with a metric alias, which allows you to use simpler or more intuitive names for metrics instead of their full definitions.
* [Query with a time grain](#query-with-a-time-grain): Fetch multiple metrics with a change in time dimension granularity.
* [Query multiple metrics and multiple dimensions](#query-multiple-metrics-and-multiple-dimensions): Select common dimensions for multiple metrics.
* [Query a categorical dimension on its own](#query-a-categorical-dimension-on-its-own): Group by a categorical dimension.
* [Query with a where filter](#query-with-a-where-filter): Use the `where` parameter to filter on dimensions and entities.
* [Query with order](#query-with-order): Query with `orderBy`, which accepts a Dimension, Metric, or Entity and defaults to ascending order.
* [Query with limit](#query-with-limit): Query using a `limit` clause.
* [Query saved queries](#query-saved-queries): Use the `savedQuery` parameter to run frequently used queries.
* [Query with just compiling SQL](#query-with-just-compiling-sql): Compile SQL without running it using the `compileSql` mutation.
* [Query records](#query-records): View all the queries made in your project.
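The create-then-poll flow described earlier can be exercised without network access by stubbing the transport. This is a sketch only; the lambda below stands in for a real `query` call against the API:

```python
import itertools
import time
from typing import Callable

def poll_until_done(fetch_status: Callable[[], str], interval_s: float = 0.0) -> str:
    # Poll the query status until the API reports a terminal state,
    # mirroring the FAILED / SUCCESSFUL check from the polling flow above.
    while True:
        status = fetch_status()
        if status in ("FAILED", "SUCCESSFUL"):
            return status
        time.sleep(interval_s)  # appropriate interval between polling requests

# Stub transport: pretend the query is in flight twice, then done.
statuses = itertools.chain(["PENDING", "RUNNING"], itertools.repeat("SUCCESSFUL"))
print(poll_until_done(lambda: next(statuses)))
# SUCCESSFUL
```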
###### Query metric alias

```graphql
mutation {
  createQuery(
    environmentId: "123"
    metrics: [{name: "metric_name", alias: "metric_alias"}]
  ) {
    ...
  }
}
```

###### Query with a time grain

```graphql
mutation {
  createQuery(
    environmentId: "123"
    metrics: [{name: "order_total"}]
    groupBy: [{name: "metric_time", grain: MONTH}]
  ) {
    queryId
  }
}
```

Note that when using granularity in the query, the output column for a time dimension with a time grain applied always takes the form of the dimension name appended with a double underscore and the granularity level: `{time_dimension_name}__{DAY|WEEK|MONTH|QUARTER|YEAR}`. Even if no granularity is specified, a granularity is still appended, defaulting to the lowest available (usually daily for most data sources). Specify a granularity when using time dimensions so that there won't be any unexpected results in the output data.

###### Query multiple metrics and multiple dimensions

```graphql
mutation {
  createQuery(
    environmentId: "123"
    metrics: [{name: "food_order_amount"}, {name: "order_gross_profit"}]
    groupBy: [{name: "metric_time", grain: MONTH}, {name: "customer__customer_type"}]
  ) {
    queryId
  }
}
```

###### Query a categorical dimension on its own

```graphql
mutation {
  createQuery(
    environmentId: "123"
    groupBy: [{name: "customer__customer_type"}]
  ) {
    queryId
  }
}
```

###### Query with a where filter

The `where` filter takes a list argument (or a string for a single input). Depending on the object you are filtering, there are a couple of parameters:

* `Dimension()`: Used for any categorical or time dimension. For example, `Dimension('metric_time').grain('week')` or `Dimension('customer__country')`.
* `Entity()`: Used for entities like primary and foreign keys, such as `Entity('order_id')`.

Note: If you prefer a `where` clause with a more explicit path, you can optionally use `TimeDimension()` to separate categorical dimensions from time ones. The `TimeDimension` input takes the time dimension and, optionally, the granularity level: `TimeDimension('metric_time', 'month')`.

```graphql
mutation {
  createQuery(
    environmentId: "123"
    metrics: [{name: "order_total"}]
    groupBy: [{name: "customer__customer_type"}, {name: "metric_time", grain: MONTH}]
    where: [{sql: "{{ Dimension('customer__customer_type') }} = 'new'"}, {sql: "{{ Dimension('metric_time').grain('month') }} > '2022-10-01'"}]
  ) {
    queryId
  }
}
```

###### Multi-hop joins

In cases where you need to query across multiple related tables (multi-hop joins), use the `entity_path` argument to specify the path between related entities. The following examples show how you can define these joins:

* In this example, you query the `location_name` dimension but specify that it should be joined using the `order_id` field:

```sql
{{ Dimension('location__location_name', entity_path=['order_id']) }}
```

* In this example, the `salesforce_account_owner` dimension is joined to the `region` field, with the path going through `salesforce_account`:

```sql
{{ Dimension('salesforce_account_owner__region', ['salesforce_account']) }}
```

###### Query with order

```graphql
mutation {
  createQuery(
    environmentId: "123"
    metrics: [{name: "order_total"}]
    groupBy: [{name: "metric_time", grain: MONTH}]
    orderBy: [{metric: {name: "order_total"}}, {groupBy: {name: "metric_time", grain: MONTH}, descending: true}]
  ) {
    queryId
  }
}
```

###### Query with limit

```graphql
mutation {
  createQuery(
    environmentId: "123"
    metrics: [{name: "food_order_amount"}, {name: "order_gross_profit"}]
    groupBy: [{name: "metric_time", grain: MONTH}, {name: "customer__customer_type"}]
    limit: 10
  ) {
    queryId
  }
}
```

###### Query saved queries

This takes the same inputs as the `createQuery` mutation, but includes the field `savedQuery`. You can use this for frequently used queries.

```graphql
mutation {
  createQuery(
    environmentId: "123"
    savedQuery: "new_customer_orders"
  ) {
    queryId
  }
}
```

A note on querying saved queries: when querying [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md), you can use parameters such as `where`, `limit`, `order`, and `compile`. However, keep in mind that you can't access the `metric` or `group_by` parameters in this context because they are predetermined, fixed parameters for saved queries that can't be changed at query time. If you would like to query more metrics or dimensions, build the query using the standard format.

###### Query with just compiling SQL

This takes the same inputs as the `createQuery` mutation.
```graphql
mutation {
  compileSql(
    environmentId: "123"
    metrics: [{name: "food_order_amount"}, {name: "order_gross_profit"}]
    groupBy: [{name: "metric_time", grain: MONTH}, {name: "customer__customer_type"}]
  ) {
    sql
  }
}
```

###### Query records

Use this endpoint to view all the queries made in your project. This covers both Insights and Semantic Layer queries.

```graphql
{
  queryRecords(
    environmentId: 123
  ) {
    items {
      queryId
      status
      startTime
      endTime
      connectionDetails
      sqlDialect
      connectionSchema
      error
      queryDetails {
        ... on SemanticLayerQueryDetails {
          params {
            type
            metrics {
              name
            }
            groupBy {
              name
              grain
            }
            limit
            where {
              sql
            }
            orderBy {
              groupBy {
                name
                grain
              }
              metric {
                name
              }
              descending
            }
            savedQuery
          }
        }
        ... on RawSqlQueryDetails {
          queryStr
          compiledSql
          numCols
          queryDescription
          queryTitle
        }
      }
    }
    totalItems
    pageNum
    pageSize
  }
}
```

---

### JDBC API

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

The Semantic Layer Java Database Connectivity (JDBC) API enables users to query metrics and dimensions using the JDBC protocol, while also providing standard metadata functionality. A JDBC driver is a software component that enables a Java application to interact with a data platform.

Here's some more information about our JDBC API:

* The Semantic Layer JDBC API utilizes the open-source JDBC driver with the ArrowFlight SQL protocol.
* You can download the JDBC driver from [Maven](https://search.maven.org/remotecontent?filepath=org/apache/arrow/flight-sql-jdbc-driver/12.0.0/flight-sql-jdbc-driver-12.0.0.jar).
* The Semantic Layer supports ArrowFlight SQL driver version 12.0.0 and higher.
* You can embed the driver into your application stack as needed, and you can use dbt Labs' [example project](https://github.com/dbt-labs/example-semantic-layer-clients) for reference.
* If you're a partner or user building a homegrown application, you'll need to install an AWS root CA to the Java Trust Store ([documentation](https://www.amazontrust.com/repository/); specific to Java and JDBC calls).

dbt Labs partners can use the JDBC API to build integrations in their tools with the Semantic Layer.

#### Using the JDBC API

If you are a dbt user or partner with access to dbt and the [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), you can [set up](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) and test this API with data from your own instance by configuring the Semantic Layer and obtaining the right JDBC connection parameters described in this document.

You *may* be able to use our JDBC API with tools that do not have an official integration with the Semantic Layer. If the tool you use allows you to write SQL and either supports a generic JDBC driver option (such as DataGrip) or supports Dremio and uses ArrowFlight SQL driver version 12.0.0 or higher, you can access the Semantic Layer API.

Refer to [Get started with the Semantic Layer](https://docs.getdbt.com/guides/sl-snowflake-qs.md) for more info.

Note that the Semantic Layer JDBC API doesn't support `ref` to call dbt objects. Instead, use the fully qualified table name. If you're using dbt macros at query time to calculate your metrics, move those calculations into your Semantic Layer metric definitions as code.
#### Authentication

dbt authorizes requests to the Semantic Layer API. You need to provide an environment ID, host, and a [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) or [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md).

#### Connection parameters

The JDBC connection requires a few different connection parameters. This is an example of a URL connection string and its individual components:

```text
jdbc:arrow-flight-sql://semantic-layer.cloud.getdbt.com:443?&environmentId=202339&token=AUTHENTICATION_TOKEN
```

| JDBC parameter | Description | Example |
| -------------- | ----------- | ------- |
| `jdbc:arrow-flight-sql://` | The protocol for the JDBC driver. | `jdbc:arrow-flight-sql://` |
| `semantic-layer.cloud.getdbt.com` | The [access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your account's dbt region. You must always add the `semantic-layer` prefix before the access URL. | For a dbt deployment hosted in North America, use `semantic-layer.cloud.getdbt.com` |
| `environmentId` | The unique identifier for the dbt production environment. You can retrieve it from the dbt URL when you navigate to **Environments** under **Deploy**. | If your URL ends with `.../environments/222222`, your `environmentId` is `222222` |
| `AUTHENTICATION_TOKEN` | Either a dbt [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) with "Semantic Layer Only" and "Metadata Only" permissions, or a dbt [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md). Create a new service or personal token on the **Account Settings** page. | `token=AUTHENTICATION_TOKEN` |

**Note** — If you're testing locally in a tool like DataGrip, you may also have to add `&disableCertificateVerification=true` at the beginning or end of the JDBC URL.

#### Querying the API for metadata

The Semantic Layer JDBC API has built-in metadata calls which can provide a user with information about their metrics and dimensions. The following examples show the available metadata commands:

**Fetch defined metrics**

You can use this query to fetch all defined metrics in your dbt project:

```bash
select * from {{ semantic_layer.metrics() }}
```

**Fetch dimensions for a metric**

You can use this query to fetch all dimensions for a metric. Note that `metrics` is a required argument that lists one or more metrics:

```bash
select * from {{ semantic_layer.dimensions(metrics=['food_order_amount'])}}
```

**Fetch granularities for metrics**

You can use this query to fetch queryable granularities for a list of metrics. This call only shows the time granularities that make sense for the primary time dimension of the metrics (such as `metric_time`); if you want queryable granularities for other time dimensions, use the `dimensions()` call and find the `queryable_granularities` column. Note that `metrics` is a required argument that lists one or more metrics.
```bash
select * from {{ semantic_layer.queryable_granularities(metrics=['food_order_amount', 'order_gross_profit'])}}
```

**Fetch available metrics given dimensions**

You can use this query to fetch the available metrics for a given set of dimensions. This command is essentially the opposite of getting dimensions given a list of metrics. Note that `group_by` is a required argument that lists one or more dimensions:

```bash
select * from {{ semantic_layer.metrics_for_dimensions(group_by=['customer__customer_type']) }}
```

**Fetch granularities for all time dimensions**

You can use this example query to fetch available granularities for all time dimensions (the similar queryable-granularities call only returns granularities for the metrics' primary time dimension). The following call is a derivative of the `dimensions()` call that specifically selects the granularity field:

```bash
select NAME, QUERYABLE_GRANULARITIES from {{
    semantic_layer.dimensions(
        metrics=["order_total"]
    )
}}
```

**Paginate metadata calls**

When you don't want to return the full result set from a metadata call, you can paginate the results for both `semantic_layer.metrics()` and `semantic_layer.dimensions()` calls using the `page_size` and `page_number` parameters.

* `page_size`: An optional parameter that sets the number of records per page. If left as `None`, there is no page limit.
* `page_number`: An optional parameter that specifies the page number to retrieve. Defaults to `1` (the first page) if not specified.

Examples:

```sql
-- Retrieves the 5th page with a page size of 10 metrics
select * from {{ semantic_layer.metrics(page_size=10, page_number=5) }}

-- Retrieves the 1st page with a page size of 10 metrics
select * from {{ semantic_layer.metrics(page_size=10) }}

-- Retrieves all metrics without pagination
select * from {{ semantic_layer.metrics() }}
```

You can use the same pagination parameters for `semantic_layer.dimensions(...)`.
**List saved queries**

You can use this example query to list all available saved queries in your dbt project.

**Command**

```bash
select * from semantic_layer.saved_queries()
```

**Output**

```bash
| NAME | DESCRIPTION | LABEL | METRICS | GROUP_BY | WHERE_FILTER |
```

**Fetch metric aliases**

You can query metrics using aliases for simpler or more intuitive names, even if the alias isn't defined in the metric configuration. The query returns the alias as the metric name, for example:

```sql
select * from {{ semantic_layer.query(metrics=[Metric("metric_name", alias="metric_alias")]) }}
```

In this example, if you define an alias for `revenue` as `banana`, the query will return a column named `banana` even if `banana` isn't defined in the metric configuration. However, when using `where` Jinja clauses, you need to reference the *actual* metric name (`revenue` in this case) instead of the alias. For a more detailed example, see [Query metric alias](#query-metric-alias).

#### Querying the API for values

To query values, the following parameters are available. Your query must have *either* a `metrics` **or** a `group_by` parameter to be valid.

| Parameter | Description | Example |
| --------- | ----------- | ------- |
| `metrics` | The metric name as defined in your dbt metric configuration. | `metrics=['revenue']` |
| `group_by` | Dimension names or entities to group by. A reference to the dimension's entity is required (other than for the primary time dimension), prepended to the front of the dimension name with a double underscore. | `group_by=['user__country', 'metric_time']` |
| `grain` | A parameter specific to any time dimension; changes the grain of the data from the metric's default. | `group_by=[Dimension('metric_time').grain('week\|day\|month\|quarter\|year')]` |
| `where` | A where clause that allows you to filter on dimensions and entities using parameters. Takes a filter list OR a string. Inputs come with `Dimension` and `Entity` objects. Granularity is required if the `Dimension` is a time dimension. | `where="{{ Dimension('customer__country') }} = 'US'"` |
| `limit` | Limit the data returned. | `limit=10` |
| `order` | Order the data returned by a particular field. | `order_by=['order_gross_profit']`; use `-` for descending, or full object notation if the object is operated on: `order_by=[Metric('order_gross_profit').descending(True)]` |
| `compile` | If `True`, returns the generated SQL for the data platform but doesn't execute it. | `compile=True` |
| `saved_query` | A saved query you can use for frequently used queries. | `select * from {{ semantic_layer.query(saved_query="new_customer_orders") }}` |

##### Note on time dimensions and `metric_time`

You will notice that in the list of dimensions for all metrics, there is a dimension called `metric_time`. `metric_time` is a reserved keyword for any metric's default aggregation time dimension. For any time-series metric, the `metric_time` keyword should always be available for use in queries. This is a common dimension across *all* metrics in a semantic graph.

You can look at a single metric or hundreds of metrics, and if you group by `metric_time`, it will always give you the correct time series. Additionally, when performing granularity calculations that are global (not specific to a particular time dimension), we recommend you always operate on `metric_time` to get the correct answer.

Note that `metric_time` is available in addition to any other time dimensions that are available for the metric(s).
If you're looking at one metric (or multiple metrics from the same data source), the values in the series for the primary time dimension and `metric_time` are equivalent.

#### Examples

The following sections provide examples of how to query metrics using the JDBC API:

* [Fetch metadata for metrics](#fetch-metadata-for-metrics) — Filter/add any SQL outside of the templating syntax.
* [Query common dimensions](#query-common-dimensions) — Select common dimensions for multiple metrics.
* [Query grouped by time](#query-grouped-by-time) — Fetch revenue and new customers grouped by time.
* [Query with a time grain](#query-with-a-time-grain) — Fetch multiple metrics with a change in time dimension granularities.
* [Group by categorical dimension](#group-by-categorical-dimension) — Group by a categorical dimension.
* [Query only a dimension](#query-only-a-dimension) — Get the full list of dimension values for the chosen dimension.
* [Query by all dimensions](#query-by-all-dimensions) — Query by all valid dimensions.
* [Query with where filters](#query-with-where-filters) — Use the `where` parameter to filter on dimensions and entities using parameters.
* [Query with a limit](#query-with-a-limit) — Query using a `limit` or `order_by` clause.
* [Query with order by examples](#query-with-order-by-examples) — Query with `order_by`; accepts a basic string that's a Dimension, Metric, or Entity. Defaults to ascending order. Add a `-` sign in front of the object for descending order.
* [Query with compile keyword](#query-with-compile-keyword) — Use the `compile` keyword to preview the final SQL before execution.
* [Query a saved query](#query-a-saved-query) — Query using a saved query with optional parameters like `limit` or `where`.
* [Query metric alias](#query-metric-alias) — Query metrics using aliases, which let you use simpler or more intuitive names instead of their full definitions.
* [Multi-hop joins](#multi-hop-joins) — Query across multiple related tables (multi-hop joins) using the `entity_path` argument to specify the path between related entities.

##### Fetch metadata for metrics

You can filter/add any SQL outside of the templating syntax. For example, you can use the following query to fetch the name and dimensions for a metric:

```bash
select name, dimensions from {{ semantic_layer.metrics() }} WHERE name='food_order_amount'
```

##### Query common dimensions

You can select common dimensions for multiple metrics. Use the following query to fetch the name and dimensions for multiple metrics:

```bash
select * from {{ semantic_layer.dimensions(metrics=['food_order_amount', 'order_gross_profit']) }}
```

##### Query grouped by time

The following example query uses the [shorthand method](#faqs) to fetch revenue and new customers grouped by time:

```bash
select * from {{ semantic_layer.query(metrics=['food_order_amount','order_gross_profit'], group_by=['metric_time']) }}
```

##### Query with a time grain

Use the following example query to fetch multiple metrics with a change in time dimension granularities:

```bash
select * from {{ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time').grain('month')]) }}
```

##### Group by categorical dimension

Use the following query to group by a categorical dimension:

```bash
select * from {{ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time').grain('month'), 'customer__customer_type']) }}
```

##### Query only a dimension

In this case, you'll get the full list of dimension values for the chosen dimension:

```bash
select * from {{ semantic_layer.query(group_by=['customer__customer_type']) }}
```

##### Query by all dimensions

You can use the `semantic_layer.query_with_all_group_bys` endpoint to query by all valid dimensions:

```sql
select * from {{ semantic_layer.query_with_all_group_bys(metrics=['revenue','orders','food_orders'], compile=True) }}
```

This returns all dimensions that are valid for the set of metrics in the request.

##### Query with where filters

Where filters in the API accept a filter list or a string. We recommend using the filter list for production applications, as this format benefits fully from predicate pushdown where possible.

Where filters have a few objects that you can use:

* `Dimension()` — Used for any categorical or time dimensions: `Dimension('metric_time').grain('week')` or `Dimension('customer__country')`.
* `TimeDimension()` — A more explicit definition for time dimensions; optionally takes a granularity: `TimeDimension('metric_time', 'month')`.
* `Entity()` — Used for entities like primary and foreign keys: `Entity('order_id')`.
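To make the object syntax concrete, here is a toy rendering of the filter objects above. This is not the actual client library, just an illustration of the Jinja strings these objects stand for inside a `where` filter:

```python
# Toy models of the filter-object syntax (illustration only, not dbt's library).
class Dimension:
    def __init__(self, name):
        self.name, self._grain = name, None

    def grain(self, grain):
        self._grain = grain
        return self  # chainable, as in Dimension('metric_time').grain('week')

    def __str__(self):
        inner = f"Dimension('{self.name}')"
        if self._grain:
            inner += f".grain('{self._grain}')"
        return "{{ " + inner + " }}"

class Entity:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return "{{ Entity('" + self.name + "') }}"

# Compose a filter list like the ones used in the examples that follow
filters = [
    f"{Dimension('metric_time').grain('month')} >= '2017-03-09'",
    f"{Entity('order_id')} = 10",
]
```

Each element of `filters` is one predicate; the filter-list format keeps predicates separate so the platform can push each one down individually.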
You can use the following example to query using a `where` filter with the string format:

```bash
select * from {{ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'], where="{{ Dimension('metric_time').grain('month') }} >= '2017-03-09' AND {{ Dimension('customer__customer_type') }} in ('new') AND {{ Entity('order_id') }} = 10") }}
```

(Recommended for better performance) Use the following example to query using a `where` filter with the filter list format:

```bash
select * from {{ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'], where=["{{ Dimension('metric_time').grain('month') }} >= '2017-03-09'", "{{ Dimension('customer__customer_type') }} in ('new')", "{{ Entity('order_id') }} = 10"]) }}
```

##### Query with a limit

Use the following example to query using a `limit` or `order_by` clause:

```bash
select * from {{ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time')], limit=10) }}
```

##### Query with order by examples

`order_by` can take a basic string that's a Dimension, Metric, or Entity, and defaults to ascending order:

```bash
select * from {{ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time')], limit=10, order_by=['order_gross_profit']) }}
```

For descending order, add a `-` sign in front of the object. However, you can only use this shorthand notation if you aren't operating on the object or using the full object notation:
```bash
select * from {{ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time')], limit=10, order_by=['-order_gross_profit']) }}
```

If you're ordering by an object that's been operated on (for example, you changed the granularity of the time dimension), or you're using the full object notation, descending order must look like this:

```bash
select * from {{ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time').grain('week')], limit=10, order_by=[Metric('order_gross_profit').descending(True), Dimension('metric_time').grain('week').descending(True)]) }}
```

Similarly, this will yield ascending order:

```bash
select * from {{ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time').grain('week')], limit=10, order_by=[Metric('order_gross_profit'), Dimension('metric_time').grain('week')]) }}
```

##### Query with compile keyword

Use the following example to query using the `compile` keyword:

```sql
select * from {{ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'], compile=True) }}
```

Use the following example to compile SQL with a [saved query](https://docs.getdbt.com/docs/build/saved-queries.md). You can use this for frequently used queries:

```sql
select * from {{ semantic_layer.query(saved_query="new_customer_orders", limit=5, compile=True) }}
```

**A note on querying saved queries:** When querying [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md), you can use parameters such as `where`, `limit`, `order`, `compile`, and so on. However, keep in mind that you can't access `metric` or `group_by` parameters in this context.
This is because they are predetermined, fixed parameters for saved queries that you can't change at query time. If you'd like to query more metrics or dimensions, you can build the query using the standard format.

##### Query a saved query

Use the following example to query a [saved query](https://docs.getdbt.com/docs/build/saved-queries.md):

```sql
select * from {{ semantic_layer.query(saved_query="new_customer_orders", limit=5) }}
```

The JDBC API will use the saved query (`new_customer_orders`) as defined and apply a limit of 5 records.

##### Query metric alias

You can query metrics using aliases, which allow you to use simpler or more intuitive names for metrics instead of their full definitions:

```sql
select * from {{ semantic_layer.query(metrics=[Metric("revenue", alias="metric_alias")]) }}
```

For example, let's say your metric configuration includes an alias like `total_revenue_global` for the `order_total` metric. You can query the metric using the alias instead of the original name:

```sql
select * from {{ semantic_layer.query(metrics=[Metric("order_total", alias="total_revenue_global")], group_by=['metric_time']) }}
```

The result will be:

```text
| METRIC_TIME | TOTAL_REVENUE_GLOBAL |
|:-----------:|:--------------------:|
| 2023-12-01  | 1500.75              |
| 2023-12-02  | 1725.50              |
| 2023-12-03  | 1850.00              |
```

**Tip:** You need to use the actual metric name in `where` Jinja clauses. For example, if you used `banana` as an alias for `revenue`, you need to use the actual metric name, `revenue`, in the `where` clause, not `banana`:
```sql
semantic_layer.query(metrics=[Metric("revenue", alias="banana")], where="{{ Metric('revenue') }} > 0")
```

##### Multi-hop joins

In cases where you need to query across multiple related tables (multi-hop joins), use the `entity_path` argument to specify the path between related entities. The following are examples of how you can define these joins:

* In this example, you're querying the `location_name` dimension but specifying that it should be joined using the `order_id` field:

```sql
{{ Dimension('location__location_name', entity_path=['order_id']) }}
```

* In this example, the `salesforce_account_owner` dimension is joined to the `region` field, with the path going through `salesforce_account`:

```sql
{{ Dimension('salesforce_account_owner__region', entity_path=['salesforce_account']) }}
```

#### FAQs

**I'm receiving a `Failed ALPN` error when trying to connect to the dbt Semantic Layer.**

If you're receiving a `Failed ALPN` error when trying to connect the dbt Semantic Layer with various [data integration tools](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) (such as Tableau, DBeaver, DataGrip, ADBC, or JDBC), it typically happens when connecting from a computer behind a corporate VPN or proxy (like Zscaler or Check Point). The root cause is usually the proxy interfering with the TLS handshake, as the Semantic Layer uses gRPC/HTTP2 for connectivity.

To resolve this:

* If your proxy supports gRPC/HTTP2 but isn't configured to allow ALPN, adjust its settings to allow ALPN, or create an exception for the dbt domain.
* If your proxy doesn't support gRPC/HTTP2, add an SSL interception exception for the dbt domain in your proxy settings.

This should help establish the connection without the `Failed ALPN` error.

**Why do some dimensions use different syntax, like `metric_time` versus `Dimension('metric_time')`?**
When you select a dimension on its own, such as `metric_time`, you can use the shorthand method, which doesn't need the `Dimension` syntax. However, when you perform operations on the dimension, such as adding granularity, the object syntax `Dimension('metric_time')` is required.

**What does the double underscore `__` syntax in dimensions mean?**

The double underscore `__` syntax indicates a mapping from an entity to a dimension, as well as where the dimension is located. For example, `user__country` means someone is looking at the `country` dimension from the `user` table.

**What is the default output when adding granularity?**

The default output follows the format `{time_dimension_name}__{granularity_level}`. For example, if the `time_dimension_name` is `ds` and the granularity level is yearly, the output is `ds__year`.

#### Related docs

* [Semantic Layer integration best practices](https://docs.getdbt.com/guides/sl-partner-integration-guide.md)

---

### Job object schema

The job object allows you to query information about a particular model based on `jobId` and, optionally, a `runId`. If you don't provide a `runId`, the API returns information on the latest run of the job.

The [example query](#example-query) illustrates a few fields you can query in this `job` object. Refer to [Fields](#fields) to see the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `job`, you can use the following arguments.
##### Example Query

You can use your production job's `id`:

```graphql
query JobQueryExample {
  # Provide runId for looking at a specific run; otherwise it defaults to the latest run
  job(id: 940) {
    # Get all models from this job's latest run
    models(schema: "analytics") {
      uniqueId
      executionTime
    }
    # Or query a single node
    source(uniqueId: "source.jaffle_shop.snowplow.event") {
      uniqueId
      sourceName
      name
      state
      maxLoadedAt
      criteria {
        warnAfter {
          period
          count
        }
        errorAfter {
          period
          count
        }
      }
      maxLoadedAtTimeAgoInS
    }
  }
}
```

##### Fields

When querying a `job`, you can use the following fields.

---

### Lineage object schema

The lineage object allows you to query lineage across your resources.

The [Example query](#example-query) illustrates a few fields you can query with the `lineage` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `lineage`, you can use the following arguments:
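Queries against these schema objects are sent as ordinary GraphQL POST requests. A minimal sketch of assembling such a request, assuming the multi-tenant North America metadata endpoint (other regions use a different host) and placeholder token and environment values:

```python
# Sketch only: build the pieces of a Discovery API GraphQL call.
# The URL below assumes a multi-tenant North America account; check the
# access URL for your region. Send with any HTTP client, for example:
#   requests.post(req["url"], headers=req["headers"], json=req["body"])
def discovery_request(query, token, variables=None):
    return {
        "url": "https://metadata.cloud.getdbt.com/graphql",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": {"query": query, "variables": variables or {}},
    }

req = discovery_request(
    "query { environment(id: 834) { applied { models(first: 1) { edges { node { name } } } } } }",
    token="SERVICE_TOKEN",  # placeholder token
)
```

The same request shape works for any of the example queries in these sections; only the `query` string changes.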
##### Example query

You can specify the `environmentId` and filter by "Model" as the resource type to see lineage information for all models in this environment, including their dependencies, materialization type, and metadata:

```graphql
query {
  environment(id: 834) {
    applied {
      lineage(
        filter: { types: ["Model"] } # Return results for the Model type
      ) {
        name
        resourceType
        filePath
        projectId
        materializationType
        parentIds
        tags
        uniqueId
      }
    }
  }
}
```

##### Fields

When querying for `lineage`, you can use the following fields:

---

### Model Historical Runs object schema

The model historical runs object allows you to query information about a model's run history.

The [Example query](#example-query) illustrates a few fields you can query with the `modelHistoricalRuns` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `modelHistoricalRuns`, you can use the following arguments:

##### Example query

You can specify the `environmentId` and the model's `uniqueId` to return the model and its execution history. The following returns the `customers` model in the `marketing` package, including performance metrics and test results for the last 20 times it was run, regardless of which job ran it:
```graphql
query {
  environment(id: 834) {
    applied {
      modelHistoricalRuns(
        uniqueId: "model.marketing.customers" # Use this format for unique ID: RESOURCE_TYPE.PACKAGE_NAME.RESOURCE_NAME
        lastRunCount: 20
      ) {
        runId # Get historical results for a particular model
        runGeneratedAt
        executionTime # View build time across runs
        status
        tests {
          name
          status
          executeCompletedAt
        } # View test results across runs
      }
    }
  }
}
```

##### Fields

When querying for `modelHistoricalRuns`, you can use the following fields:

---

### Model object schema

The model object allows you to query information about a particular model in a given job.

##### Arguments

When querying for a `model`, the following arguments are available.

Below we show some illustrative example queries and outline the schema (all possible fields you can query) of the model object.

##### Example query for finding parent models and sources

The example query below uses the `parentsModels` and `parentsSources` fields to fetch information about a model's parent models and parent sources. The `jobId` and `uniqueId` fields are placeholders that you'll need to replace with your own values.
```graphql
{
  job(id: 123) {
    model(uniqueId: "model.jaffle_shop.dim_user") {
      parentsModels {
        runId
        uniqueId
        executionTime
      }
      parentsSources {
        runId
        uniqueId
        state
      }
    }
  }
}
```

##### Example query for model timing

The example query below could be useful if you want to understand execution timing on a given model (start, end, completion):

```graphql
{
  job(id: 123) {
    model(uniqueId: "model.jaffle_shop.dim_user") {
      runId
      projectId
      name
      uniqueId
      resourceType
      executeStartedAt
      executeCompletedAt
      executionTime
    }
  }
}
```

##### Example query for column-level information

You can use the following example query to understand more about the columns of a given model. This query only works if the job has generated documentation; that is, if the job ran `dbt docs generate`:

```graphql
{
  job(id: 123) {
    model(uniqueId: "model.jaffle_shop.dim_user") {
      columns {
        name
        index
        type
        comment
        description
        tags
        meta
      }
    }
  }
}
```

##### Fields

When querying for a `model`, the following fields are available.

---

### Models object schema

[Models](https://docs.getdbt.com/docs/build/models.md) are the foundational dbt resource that transform raw data into curated datasets using SQL (or Python). Each model represents a single SELECT statement, typically materialized as a table or view in your warehouse. You can query information about models through the Discovery API.
The [Example query](#example-query) illustrates a few fields you can query with the `models` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `models`, you can use the following arguments:

##### Example query

You can specify the `environmentId` to return model information for all models in the given environment, including their metadata, configuration, tests, and ownership details, limited to the first 100 results:

```graphql
query {
  environment(id: 834) {
    applied {
      models(first: 100) {
        edges {
          node {
            name
            description
            access
            accountId
            catalog {
              owner
            }
            config
            environmentId
            tests {
              name
              description
            }
          }
        }
      }
    }
  }
}
```

##### Fields

When querying for `models`, you can use the following fields:

##### Key fields from nodes

---

### Models object schema

The models object allows you to query information about all models in a given job.

##### Arguments

When querying for `models`, the following arguments are available.

Below we show some illustrative example queries and outline the schema of the models object.

##### Example queries

The database, schema, and identifier arguments are all optional.
This means that with this endpoint you can:

* Find a specific model by providing `<database>.<schema>.<identifier>`
* Find all of the models in a database and/or schema by providing `<database>` and/or `<schema>`

###### Find models by their database, schema, and identifier

The example query below finds a model by its unique database, schema, and identifier:

```graphql
{
  job(id: 123) {
    models(database: "analytics", schema: "analytics", identifier: "dim_customers") {
      uniqueId
    }
  }
}
```

###### Find models by their schema

The example query below finds all models in this schema and their respective execution times:

```graphql
{
  job(id: 123) {
    models(schema: "analytics") {
      uniqueId
      executionTime
    }
  }
}
```

##### Fields

The models object can access the *same fields* as the [Model node](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-job-model.md). The difference is that the models object can output a list, so instead of querying fields for one specific model, you can query those fields for all models within a job ID, database, and so on.

When querying for `models`, the following fields are available:

---

### Owners object schema

[Owners](https://docs.getdbt.com/docs/build/groups.md) help you identify the user or domain responsible for a dbt asset. For most assets, owners are defined in your project code using groups.
Exposures are an exception: for downstream exposures that represent BI assets, owners are automatically pulled from the downstream tool based on who owns that asset. You can query ownership information through the Discovery API.

The [Example query](#example-query) illustrates a few fields you can query with the `owners` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `owners`, you can use the following arguments:

##### Example query

You can specify the `environmentId` and `exposure` as the `OwnerResourceType` to return people who own exposures (downstream BI assets) in this environment, including their contact information:

```graphql
query {
  environment(id: 834) {
    applied {
      owners(resource: exposure) {
        email
        name
      }
    }
  }
}
```

##### Fields

When querying for `owners`, you can use the following fields:

---

### Packages object schema

[dbt packages](https://docs.getdbt.com/docs/build/packages.md) are libraries with models, macros, and other resources that tackle a specific problem area, which dbt projects can install and use. You can query project packages through the Discovery API.

The [Example query](#example-query) illustrates a few fields you can query with the `packages` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.
##### Arguments

When querying for `packages`, you can use the following arguments:

##### Example query

You can specify the `environmentId` and `"model"` as the resource to see all dbt packages in this environment that contain model resources:

```graphql
query {
  environment(id: 834) {
    applied {
      packages(resource: "model")
    }
  }
}
```

---

### Project state in dbt

dbt provides a stateful way of deploying dbt projects. Artifacts are accessible programmatically via the [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-querying.md) in the metadata platform.

With the implementation of the `environment` endpoint in the Discovery API, we've introduced the idea of multiple states. The Discovery API provides a single API endpoint that returns the latest state of models, sources, and other nodes in the DAG. A single [deployment environment](https://docs.getdbt.com/docs/environments-in-dbt.md) should represent the production state of a given dbt project.

There are two states that can be queried in dbt:

* **Applied state** refers to what exists in the data warehouse after a successful `dbt run`. For example, a model build succeeds and now exists as a table in the warehouse.
* **Definition state** refers to what exists in the project given the code defined in it (for example, manifest state), which hasn't necessarily been executed in the data platform (it may just be the result of `dbt compile`).

#### Definition (logical) vs. applied state of dbt nodes

In a dbt project, the state of a node *definition* represents the configuration, transformations, and dependencies defined in its SQL and YAML files. It captures how the node should be processed in relation to other nodes and tables in the data warehouse, and it may be produced by `dbt build`, `run`, `parse`, or `compile`. It changes whenever the project code changes.

A node's *applied state* refers to the node's actual state after it has been successfully executed in the DAG. For example, models are executed, so their state is applied to the data warehouse via `dbt run` or `dbt build`. It changes whenever a node is executed. This state represents the result of the transformations and the actual data stored in the database, which for models can be a table or a view based on the defined logic. The applied state includes execution info: metadata about the most recent execution (successful or attempted), such as when it began, its status, and how long it took.

Here's how you'd query and compare the definition vs. applied state of a model using the Discovery API:

```graphql
query Compare($environmentId: Int!, $first: Int!) {
  environment(id: $environmentId) {
    definition {
      models(first: $first) {
        edges {
          node {
            name
            rawCode
          }
        }
      }
    }
    applied {
      models(first: $first) {
        edges {
          node {
            name
            rawCode
            executionInfo {
              executeCompletedAt
            }
          }
        }
      }
    }
  }
}
```

Most Discovery API use cases will favor the *applied state* since it pertains to what has actually been run and can be analyzed.

#### Affected states by node type

The following table shows the states of dbt nodes and how they are affected by the Discovery API.
| Node | Executed in DAG | Created by execution | Exists in database | Lineage | States |
| ---- | --------------- | -------------------- | ------------------ | ------- | ------ |
| [Analysis](https://docs.getdbt.com/docs/build/analyses.md) | No | No | No | Upstream | Definition |
| [Data test](https://docs.getdbt.com/docs/build/data-tests.md) | Yes | Yes | No | Upstream | Applied & definition |
| [Exposure](https://docs.getdbt.com/docs/build/exposures.md) | No | No | No | Upstream | Definition |
| [Group](https://docs.getdbt.com/docs/build/groups.md) | No | No | No | Downstream | Definition |
| [Macro](https://docs.getdbt.com/docs/build/jinja-macros.md) | Yes | No | No | N/A | Definition |
| [Metric](https://docs.getdbt.com/docs/build/metrics-overview.md) | No | No | No | Upstream & downstream | Definition |
| [Model](https://docs.getdbt.com/docs/build/models.md) | Yes | Yes | Yes | Upstream & downstream | Applied & definition |
| [Saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) (not in API) | N/A | N/A | N/A | N/A | N/A |
| [Seed](https://docs.getdbt.com/docs/build/seeds.md) | Yes | Yes | Yes | Downstream | Applied & definition |
| [Semantic model](https://docs.getdbt.com/docs/build/semantic-models.md) | No | No | No | Upstream & downstream | Definition |
| [Snapshot](https://docs.getdbt.com/docs/build/snapshots.md) | Yes | Yes | Yes | Upstream & downstream | Applied & definition |
| [Source](https://docs.getdbt.com/docs/build/sources.md) | Yes | No | Yes | Downstream | Applied & definition |
| [Unit tests](https://docs.getdbt.com/docs/build/unit-tests.md) | Yes | Yes | No | Downstream | Definition |

#### Caveats about state/metadata updates

Over time, Cloud Artifacts will provide information to maintain state for features/services in dbt and enable you to access state in dbt and its downstream ecosystem. Cloud Artifacts is currently focused on the latest production state, but this focus will evolve.

Here are some limitations of the state representation in the Discovery API:

* Users must access the default production environment to know the latest state of a project.
* The API gets the definition from the latest manifest generated in a given deployment environment, but that often won't reflect the latest project code state.
* Compiled code results may be outdated depending on dbt run step order and failures.
* Catalog info can be outdated, or incomplete (in the applied state), based on if/when `docs generate` was last run.
* Source freshness checks can be out of date (in the applied state) depending on when the command was last run, and it's not included in `build`.
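The definition vs. applied distinction can also be checked programmatically on the client side. The sketch below is illustrative only (the helper names are our own, not part of any dbt library); the payload shape follows the `Compare` query shown earlier (`definition`/`applied` → `models` → `edges` → `node` with `name` and `rawCode`). It flags models whose applied code has drifted from their definition, or which have never been applied:

```python
def _models_by_name(state: dict) -> dict:
    """Flatten the GraphQL edges/node connection shape into {name: rawCode}."""
    return {
        edge["node"]["name"]: edge["node"].get("rawCode")
        for edge in state["models"]["edges"]
    }


def compare_states(environment: dict) -> dict:
    """Map each defined model name to 'in_sync', 'drifted', or 'unapplied'.

    `environment` is the `environment` object from the Compare query response.
    """
    defined = _models_by_name(environment["definition"])
    applied = _models_by_name(environment["applied"])
    result = {}
    for name, raw_code in defined.items():
        if name not in applied:
            result[name] = "unapplied"   # defined but never executed
        elif applied[name] != raw_code:
            result[name] = "drifted"     # code changed since the last run
        else:
            result[name] = "in_sync"
    return result
```

In practice, most use cases would act on `drifted` and `unapplied` models, consistent with the note above that the applied state is what can actually be analyzed.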
---

### Python SDK

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

The [`dbt-sl-sdk` Python software development kit](https://github.com/dbt-labs/semantic-layer-sdk-python) (SDK) is a Python library that provides easy access to the dbt Semantic Layer. It allows developers to interact with the dbt Semantic Layer APIs and query metrics and dimensions in downstream tools.

#### Installation

To install the Python SDK, you'll need to specify optional dependencies depending on whether you want to use it synchronously, backed by [requests](https://github.com/psf/requests/), or asynchronously, backed by [asyncio](https://docs.python.org/3/library/asyncio.html) and [aiohttp](https://github.com/aio-libs/aiohttp/).

The Python SDK supports the actively maintained versions of Python, such as 3.9, 3.10, 3.11, and 3.12. When Python discontinues support for a version, the Python SDK also discontinues support for that version. If you're using an unsupported version, you may experience compatibility issues and won't receive updates or security patches from the SDK.

* Sync installation
* Async installation

Sync installation means your program waits for each task to finish before moving on to the next one. It's simpler, easier to understand, and suitable for smaller tasks or when your program doesn't need to handle many tasks at the same time.
```bash
pip install "dbt-sl-sdk[sync]"
```

If you're using async frameworks like [FastAPI](https://fastapi.tiangolo.com/) or [Strawberry](https://github.com/strawberry-graphql/strawberry), installing the sync version of the SDK will block your event loop and can significantly slow down your program. In this case, we strongly recommend using the async installation.

Async installation means your program can start a task and then move on to other tasks while waiting for the first one to finish. This can handle many tasks at once without waiting, making it faster and more efficient for larger tasks or when you need to manage multiple tasks at the same time. For more details, refer to [asyncio](https://docs.python.org/3/library/asyncio.html).

```bash
pip install "dbt-sl-sdk[async]"
```

Since the [Python ADBC driver](https://github.com/apache/arrow-adbc/tree/main/python/adbc_driver_manager) doesn't yet support asyncio natively, `dbt-sl-sdk` uses a [`ThreadPoolExecutor`](https://github.com/dbt-labs/semantic-layer-sdk-python/blob/5e52e1ca840d20a143b226ae33d194a4a9bc008f/dbtsl/api/adbc/client/asyncio.py#L62) to run `query` and `list dimension-values` (all operations that are done with ADBC). This is why you might see multiple Python threads spawning.
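One way to reason about the sync vs. async choice at runtime: a sync client call made from inside a running event loop will block that loop. The helper below is our own illustrative sketch (not part of `dbt-sl-sdk`) showing how an application could detect whether it is running inside an event loop and should therefore prefer the async client:

```python
import asyncio


def should_use_async_client() -> bool:
    """Return True when called from inside a running asyncio event loop.

    In that context, a blocking sync client call would stall every other
    task on the loop, so the async client is the safer choice.
    Illustrative helper only; not part of dbt-sl-sdk.
    """
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        # No loop is running: plain synchronous code, sync client is fine.
        return False
```

Called from ordinary top-level code this returns `False`; called from inside `asyncio.run(...)` (as in a FastAPI handler) it returns `True`.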
#### Usage

To run operations against the Semantic Layer APIs, instantiate (create an instance of) a `SemanticLayerClient` with your specific [API connection parameters](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md):

```python
from dbtsl import SemanticLayerClient

client = SemanticLayerClient(
    environment_id=123,
    auth_token="<your-token>",
    host="semantic-layer.cloud.getdbt.com",
)

# query the first metric by `metric_time`
def main():
    with client.session():
        metrics = client.metrics()
        table = client.query(
            metrics=[metrics[0].name],
            group_by=["metric_time"],
        )
        print(table)

main()
```

**Note**: All method calls that reach out to the APIs need to be within a `client.session()` context manager. This allows the client to establish a connection to the APIs only once and reuse the same connection between API calls. We recommend creating an application-wide session and reusing it throughout the application for optimal performance. Creating a session per request is discouraged and inefficient.

##### asyncio usage

If you're using asyncio, import `AsyncSemanticLayerClient` from `dbtsl.asyncio`. The `SemanticLayerClient` and `AsyncSemanticLayerClient` APIs are identical, but the async version has async methods that you need to `await`.
```python
import asyncio

from dbtsl.asyncio import AsyncSemanticLayerClient

client = AsyncSemanticLayerClient(
    environment_id=123,
    auth_token="<your-token>",
    host="semantic-layer.cloud.getdbt.com",
)

async def main():
    async with client.session():
        metrics = await client.metrics()
        table = await client.query(
            metrics=[metrics[0].name],
            group_by=["metric_time"],
        )
        print(table)

asyncio.run(main())
```

##### Lazy loading for large fields

By default, the Python SDK eagerly loads nested lists of objects such as `dimensions`, `entities`, and `measures` for each `Metric`, even if you don't need them. This is generally convenient, but in large projects, it can lead to slower responses due to the amount of data returned.

To improve performance, you can opt into lazy loading by passing `lazy=True` when creating the client. With lazy loading enabled, the SDK skips fetching large nested fields until you explicitly request them on a per-model basis.
For example, the following code fetches all available metrics from the metadata API and displays only the dimensions of certain metrics:

`list_metrics_lazy_sync.py`

```python
"""Fetch all available metrics from the metadata API and display only the dimensions of certain metrics."""
from argparse import ArgumentParser

from dbtsl import SemanticLayerClient


def get_arg_parser() -> ArgumentParser:
    p = ArgumentParser()
    p.add_argument("--env-id", required=True, help="The dbt environment ID", type=int)
    p.add_argument("--token", required=True, help="The API auth token")
    p.add_argument("--host", required=True, help="The API host")
    return p


def main() -> None:
    arg_parser = get_arg_parser()
    args = arg_parser.parse_args()

    client = SemanticLayerClient(
        environment_id=args.env_id,
        auth_token=args.token,
        host=args.host,
        lazy=True,
    )

    with client.session():
        metrics = client.metrics()
        for i, m in enumerate(metrics):
            print(f"📈 {m.name}")
            print(f"    type={m.type}")
            print(f"    description={m.description}")
            # with lazy=True, nested fields start out empty
            assert len(m.dimensions) == 0

            # skip if index is odd
            if i & 1:
                print("    dimensions=skipped")
                continue

            # load dimensions only if index is even
            m.load_dimensions()
            print("    dimensions=[")
            for dim in m.dimensions:
                print(f"        {dim.name},")
            print("    ]")


if __name__ == "__main__":
    main()
```

Refer to the [lazy loading example](https://github.com/dbt-labs/semantic-layer-sdk-python/blob/main/examples/list_metrics_lazy_sync.py) for more details.

#### Integrate with dataframe libraries

The Python SDK returns all query data as [pyarrow](https://arrow.apache.org/docs/python/index.html) tables. The Python SDK library doesn't come bundled with [Polars](https://pola.rs/) or [Pandas](https://pandas.pydata.org/). If you use these libraries, add them as dependencies in your project. To use the data with libraries like Polars or Pandas, manually convert the data into the desired format.
For example:

###### If you're using pandas

```python
# ... initialize client

arrow_table = client.query(...)
pandas_df = arrow_table.to_pandas()
```

###### If you're using polars

```python
import polars as pl

# ... initialize client

arrow_table = client.query(...)
polars_df = pl.from_arrow(arrow_table)
```

#### Usage examples

For additional usage examples, check out the [usage examples](https://github.com/dbt-labs/semantic-layer-sdk-python/tree/main/examples), some of which include:

* [Fetching dimension values sync](https://github.com/dbt-labs/semantic-layer-sdk-python/blob/main/examples/fetch_dimension_values_sync.py)
* Fetching metrics [async](https://github.com/dbt-labs/semantic-layer-sdk-python/blob/main/examples/fetch_metric_async.py) and [sync](https://github.com/dbt-labs/semantic-layer-sdk-python/blob/main/examples/fetch_metric_sync.py)
* [List saved queries async](https://github.com/dbt-labs/semantic-layer-sdk-python/blob/main/examples/list_saved_queries_async.py)

#### Disable telemetry

By default, the Python SDK sends some [platform-related information](https://github.com/dbt-labs/semantic-layer-sdk-python/blob/main/dbtsl/env.py) to dbt Labs. To opt out, set the `PLATFORM.anonymous` attribute to `True`:

```python
from dbtsl.env import PLATFORM

PLATFORM.anonymous = True

# ... initialize client
```

#### Contribute

To contribute to this project, check out our [contribution guidelines](https://github.com/dbt-labs/semantic-layer-sdk-python/blob/main/CONTRIBUTING.md) and open a GitHub [issue](https://github.com/dbt-labs/semantic-layer-sdk-python/issues) or [pull request](https://github.com/dbt-labs/semantic-layer-sdk-python/pulls).
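In a long-running application that reuses one client session, you may also want to retry transient failures around individual query calls. The sketch below is our own usage pattern, not an SDK feature; which exceptions are actually retryable depends on your transport (requests vs. aiohttp), so `ConnectionError` stands in here as an assumed example:

```python
import time


def query_with_retry(run_query, attempts=3, base_delay=0.1):
    """Retry a zero-argument query callable with exponential backoff.

    Illustrative sketch only; `run_query` would typically wrap a call
    like `client.query(...)` made inside an open client session.
    """
    for attempt in range(attempts):
        try:
            return run_query()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

# usage sketch (hypothetical metric name):
# query_with_retry(lambda: client.query(metrics=["revenue"], group_by=["metric_time"]))
```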
---

### Query the Discovery API

The Discovery API supports ad-hoc queries and integrations. If you are new to the API, refer to [About the Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) for an introduction.

Use the Discovery API to evaluate data pipeline health and project state across runs or at a moment in time. dbt Labs provides a default [GraphQL explorer](https://metadata.cloud.getdbt.com/graphql) for this API, enabling you to run queries and browse the schema. However, you can also use any GraphQL client of your choice to query the API. Since GraphQL describes the data in the API, the schema displayed in the GraphQL explorer accurately represents the graph and fields available to query.

#### Prerequisites

* You must have a dbt [multi-tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md#multi-tenant) or [single tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md#single-tenant) account.
* You must be on a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/).
* Your projects must be on a dbt [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) or dbt version 1.0 or later. Refer to [Upgrade dbt version in Cloud](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) to upgrade.

#### Authorization

Currently, authorization of requests takes place [using a service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md).
dbt admin users can generate a Metadata Only service token that is authorized to execute queries against the Discovery API.

Once you've created a token, you can use it in the `Authorization` header of requests to the dbt Discovery API. Be sure to include the `Token` prefix in the `Authorization` header, or the request will fail with a `401 Unauthorized` error. Note that `Bearer` can be used instead of `Token` in the `Authorization` header; both syntaxes are equivalent.

#### Access the Discovery API

1. Create a [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) to authorize requests. dbt admin users can generate a *Metadata Only* service token for this purpose.
2. Find the API URL to use from the [Discovery API endpoints](#discovery-api-endpoints) table.
3. For specific query points, refer to the [schema documentation](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-job.md).

#### Run queries using HTTP requests

You can run queries by sending a `POST` request to the Discovery API, making sure to replace:

* `YOUR_API_URL` with the appropriate [Discovery API endpoint](#discovery-api-endpoints) for your region and plan.
* `YOUR_TOKEN` in the `Authorization` header with your actual API token. Be sure to include the `Bearer` (or `Token`) prefix.
* `QUERY_BODY` with a GraphQL query, for example `{ "query": "<query text>", "variables": "<variables>" }`
* `VARIABLES` with a dictionary of your GraphQL query variables, such as a job ID or a filter.
* `ENDPOINT` with the endpoint you're querying, such as `environment`.
```shell
curl 'YOUR_API_URL' \
  -H 'authorization: Bearer YOUR_TOKEN' \
  -H 'content-type: application/json' \
  -X POST \
  --data QUERY_BODY
```

Python example:

```python
import requests

response = requests.post(
    'YOUR_API_URL',
    headers={
        "authorization": "Bearer " + YOUR_TOKEN,
        "content-type": "application/json",
    },
    json={"query": QUERY_BODY, "variables": VARIABLES},
)
metadata = response.json()['data'][ENDPOINT]
```

Every query will require an environment ID or job ID. You can get the ID from a dbt URL or by using the Admin API.

There are several illustrative example queries on this page. For more examples, refer to [Use cases and examples for the Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md).

#### Discovery API endpoints

The following are the endpoints for accessing the Discovery API. Use the one that's appropriate for your region and plan.

| Deployment type | Discovery API URL |
| --- | --- |
| North America multi-tenant | |
| EMEA multi-tenant | |
| APAC multi-tenant | |
| Multi-cell | `https://YOUR_ACCOUNT_PREFIX.metadata.REGION.dbt.com/graphql` (replace `YOUR_ACCOUNT_PREFIX` with your specific account identifier and `REGION` with your location, which could be `us1.dbt.com`) |
| Single-tenant | `https://metadata.YOUR_ACCESS_URL/graphql` (replace `YOUR_ACCESS_URL` with the appropriate [Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan) |

#### Reasonable use

Discovery (GraphQL) API usage is subject to request rate and response size limits to maintain the performance and stability of the metadata platform and prevent abuse.

Job-level endpoints are subject to query complexity limits. Nested nodes (like `parents`), code (like `rawCode`), and catalog columns are considered the most complex. Overly complex queries should be broken up into separate queries with only the necessary fields included. dbt Labs recommends using the `environment` endpoint instead for most use cases to get the latest descriptive and result metadata for a dbt project.

#### Retention limits

You can use the Discovery API to query data from the previous two months. For example, if today were April 1st, you could query data back to February 1st.

#### Run queries with the GraphQL explorer

You can run ad-hoc queries directly in the [GraphQL API explorer](https://metadata.cloud.getdbt.com/graphql) and use the document explorer on the left-hand side to see all possible nodes and fields. Refer to the [Apollo explorer documentation](https://www.apollographql.com/docs/graphos/explorer/explorer) for setup and authorization information for GraphQL.

1. Access the [GraphQL API explorer](https://metadata.cloud.getdbt.com/graphql) and select the fields you want to query.
2. Select **Variables** at the bottom of the explorer and replace any `null` fields with your unique values.
3. [Authenticate](https://www.apollographql.com/docs/graphos/explorer/connecting-authenticating#authentication) using Bearer auth with `YOUR_TOKEN`. Select **Headers** at the bottom of the explorer and select **+New header**.
4. Select **Authorization** in the **header key** dropdown list and enter your Bearer auth token in the **value** field. Remember to include the `Bearer` prefix. Your header should be in this format: `{"Authorization": "Bearer YOUR_TOKEN"}`.
![Enter the header key and Bearer auth token values](/img/docs/dbt-cloud/discovery-api/graphql_header.jpg "Enter the header key and Bearer auth token values")

5. Run your query by clicking the blue query button in the top right of the **Operation** editor (to the right of the query). You should see a successful query response on the right side of the explorer.

![Run queries using the Apollo Server GraphQL explorer](/img/docs/dbt-cloud/discovery-api/graphql.jpg "Run queries using the Apollo Server GraphQL explorer")

##### Fragments

Use the [`... on`](https://www.apollographql.com/docs/react/data/fragments/) notation to query across lineage and retrieve results from specific node types.

```graphql
query ($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first, filter: { uniqueIds: "MODEL.PROJECT.MODEL_NAME" }) {
        edges {
          node {
            name
            ancestors(types: [Model, Source, Seed, Snapshot]) {
              ... on ModelAppliedStateNestedNode {
                name
                resourceType
                materializedType
                executionInfo {
                  executeCompletedAt
                }
              }
              ... on SourceAppliedStateNestedNode {
                sourceName
                name
                resourceType
                freshness {
                  maxLoadedAt
                }
              }
              ... on SnapshotAppliedStateNestedNode {
                name
                resourceType
                executionInfo {
                  executeCompletedAt
                }
              }
              ... on SeedAppliedStateNestedNode {
                name
                resourceType
                executionInfo {
                  executeCompletedAt
                }
              }
            }
          }
        }
      }
    }
  }
}
```

##### Pagination

Querying large datasets can impact performance on multiple functions in the API pipeline. Pagination eases the burden by returning smaller data sets one page at a time. This is useful for returning a particular portion of the dataset or the entire dataset piece-by-piece to enhance performance. dbt uses cursor-based pagination, which makes it easy to return pages of constantly changing data.
Use the `PageInfo` object to return information about the page. The available fields are:

* `startCursor` string type: corresponds to the first `node` in the `edge`.
* `endCursor` string type: corresponds to the last `node` in the `edge`.
* `hasNextPage` boolean type: whether or not there are more `nodes` after the returned results.

There are connection variables available when making the query:

* `first` integer type: returns the first n `nodes` for each page, up to 500.
* `after` string type: sets the cursor to retrieve `nodes` after. It's best practice to set the `after` variable with the object ID defined in the `endCursor` of the previous page.

Below is an example that returns the `first` 500 models `after` the specified object ID in the variables. The `PageInfo` object returns the object ID where the cursor starts, where it ends, and whether there is a next page.

![Example of pagination](/img/Paginate.png "Example of pagination")

Below is a code example of the `PageInfo` object:

```graphql
pageInfo {
  startCursor
  endCursor
  hasNextPage
}
totalCount # Total number of records across all pages
```

##### Filters

Filtering helps to narrow down the results of an API query. If you want to query and return only models and tests that are failing, or find models that are taking too long to run, you can fetch execution details such as [`executionTime`](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-job-models.md#fields), [`runElapsedTime`](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-job-models.md#fields), or [`status`](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-job-models.md#fields). This helps data teams monitor the performance of their models, identify bottlenecks, and optimize the overall data pipeline.
Below is an example that filters for results of models that have succeeded on their `lastRunStatus`:

![Example of filtering](/img/Filtering.png "Example of filtering")

Below is an example that filters for models that have an error on their last run and tests that have failed:

```graphql
query ModelsAndTests($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first, filter: { lastRunStatus: error }) {
        edges {
          node {
            name
            executionInfo {
              lastRunId
            }
          }
        }
      }
      tests(first: $first, filter: { status: "fail" }) {
        edges {
          node {
            name
            executionInfo {
              lastRunId
            }
          }
        }
      }
    }
  }
}
```

#### Related content

* [Use cases and examples for the Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md)
* [Schema](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-job.md)

---

### Resources object schema

The resources object allows you to paginate across all resources in your environment.

The [Example query](#example-query) illustrates a few fields you can query with the `resources` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `resources`, you can use the following arguments:
##### Example query

You can specify the `environmentId`, filter by `Model` as the type, and limit to the first 100 results to see comprehensive information about the first 100 model resources in this environment, including their metadata, tags, and file locations:

```graphql
query {
  environment(id: 834) {
    applied {
      resources(
        filter: { types: [Model] },
        first: 100
      ) {
        edges {
          node {
            accountId
            description
            environmentId
            filePath
            meta
            name
            projectId
            resourceType
            uniqueId
            tags
          }
        }
      }
    }
  }
}
```

##### Fields

When querying for `resources`, you can use the following fields:

##### Key fields from nodes

---

### Seed object schema

The seed object allows you to query information about a particular seed in a given job.

##### Arguments

When querying for a `seed`, the following arguments are available.

Below we show some illustrative example queries and outline the schema of the seed object.

##### Example query

The example query below pulls relevant information about a given seed. For instance, you can view the load time.

```graphql
{
  job(id: 123) {
    seed(uniqueId: "seed.jaffle_shop.raw_customers") {
      database
      schema
      uniqueId
      name
      status
      error
    }
  }
}
```

##### Fields

When querying for a `seed`, the following fields are available:
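Because the seed query above returns `status` and `error`, a caller can gate downstream logic on seed load health. A minimal sketch (the helper name is ours, not a dbt API; the payload shape follows the example query wrapped in the standard GraphQL `data` envelope):

```python
def seed_load_ok(response: dict) -> bool:
    """Return True when the queried seed loaded without errors.

    `response` is the parsed JSON body of the job/seed query, i.e.
    {"data": {"job": {"seed": {...}}}}. Illustrative helper only.
    """
    seed = response["data"]["job"]["seed"]
    # A healthy seed load has a success status and no error message.
    return seed["status"] == "success" and not seed["error"]
```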
---

### Seeds object schema

[Seeds](https://docs.getdbt.com/docs/build/seeds.md) are CSV files in your dbt project that dbt can load into your data warehouse. You can query seeds through the Discovery API.

The [Example query](#example-query) illustrates a few fields you can query with the `seeds` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `seeds`, you can use arguments to filter the results and paginate across them.

##### Example query

You can specify the `environmentId`, filter by the database, and limit to the first 100 results to show information about the first 100 seed files in the `analytics` database, including their metadata and file locations:

```graphql
query ($environmentId: BigInt!) {
  environment(id: $environmentId) {
    applied {
      seeds(first: 100, filter: { database: "analytics" }) {
        edges {
          node {
            description
            name
            filePath
            projectId
            fqn
            tags
            uniqueId
            resourceType
          }
        }
      }
    }
  }
}
```

##### Fields

When querying for `seeds`, refer to the schema for the full list of available fields, including key fields from nodes.
---

### Seeds object schema

The seeds object allows you to query information about all seeds in a given job.

##### Arguments

When querying for `seeds`, you can use arguments such as the job ID.

Below are illustrative example queries outlining the schema of the seeds object.

##### Example query

The example query below pulls relevant information about all seeds in a given job. For instance, you can view load times:

```graphql
{
  job(id: 123) {
    seeds {
      uniqueId
      name
      executionTime
      status
    }
  }
}
```

##### Fields

When querying for `seeds`, refer to the schema for the full list of available fields.

---

### Semantic Layer APIs

[Starter](https://www.getdbt.com/pricing)[Enterprise](https://www.getdbt.com/pricing)[Enterprise +](https://www.getdbt.com/pricing)

The rapid growth of different tools in the modern data stack has helped data professionals address the diverse needs of different teams. The downside of this growth is the fragmentation of business logic across teams, tools, and workloads.

The [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) allows you to define metrics in code (with [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md)) and dynamically generate and query datasets in downstream tools based on dbt-governed assets, such as metrics and models. Integrating with the Semantic Layer helps organizations that use your product make more efficient and trustworthy decisions with their data. It also helps you avoid duplicative coding, optimize your development workflow, ensure data governance, and guarantee consistency for data consumers.

You can use the Semantic Layer with a variety of tools and data applications. Some common use cases are:

* Business intelligence (BI), reporting, and analytics
* Data quality and monitoring
* Governance and privacy
* Data discovery and cataloging
* Machine learning and data science

###### [GraphQL API](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md)

Use GraphQL to query metrics and dimensions in downstream tools.

###### [JDBC API](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md)

Use a JDBC driver to query metrics and dimensions in downstream tools, while also providing standard metadata functionality.

###### [Python SDK](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python.md)

Use the Python SDK to interact with the dbt Semantic Layer using Python.
---

### Service account tokens

[Starter](https://www.getdbt.com/pricing)[Enterprise](https://www.getdbt.com/pricing)[Enterprise +](https://www.getdbt.com/pricing)

Service account tokens enable you to securely authenticate with the dbt API by assigning each token a narrow set of permissions that more precisely manages access to the API. While similar to [personal access tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md), service account tokens belong to an account rather than a user. You can use service account tokens for system-level integrations that do not run on behalf of any one user.

Assign any permission sets available in dbt to your service account token. The available sets vary by plan:

* Enterprise and Enterprise+ plans can apply any permission sets available to service tokens.
* Developer and Starter plans can apply the Semantic Layer permission set to service tokens.
* Legacy Team plans can apply the Account Admin, Member, Job Admin, Read-Only, Metadata, and Semantic Layer permission sets to service tokens.

You can assign as many permission sets as needed to one token. For more on permission sets, see "[Enterprise Permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md)."
#### Generate service account tokens

You can generate service tokens if you have a Developer [license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) and account admin [permissions](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#permission-sets). To create a service token in dbt, follow these steps:

1. From dbt, click your account name in the left side menu and select **Account settings**.
2. On the left sidebar, click **Service Tokens**.
3. Click the **+ New Token** button to generate a new token.
4. Once the token is generated, you won't be able to view it again, so make sure to save it somewhere safe.

#### Permissions for service account tokens

You can assign service account tokens to any permission set available in dbt. When you assign a permission set to a token, you can also choose whether to grant those permissions to all projects in the account or to specific projects.

##### Team plans using service account tokens

The following permissions can be assigned to a service account token on a Team plan. Refer to [Enterprise permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) for more information about these roles.

* Account Admin — Account Admin service tokens have full `read + write` access to an account, so use them with caution. A Team plan refers to this permission set as an "Owner role."
* Billing Admin
* Job Admin
* Metadata Only
* Member
* Read-only
* Semantic Layer Only

##### Enterprise plans using service account tokens

Refer to [Enterprise permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) for more information about these roles.

* Account Admin — Account Admin service tokens have full `read + write` access to an account, so use them with caution.
* Account Viewer
* Admin
* Analyst
* Billing Admin
* Database Admin
* Developer
* Git Admin
* Job Admin
* Job Runner
* Job Viewer
* Manage marketplace apps
* Metadata Only
* Semantic Layer Only
* Security Admin
* Stakeholder
* Team Admin

#### Service token update

On July 18, 2023, dbt Labs changed how tokens are generated and validated to improve performance. These improvements only apply to tokens created after July 18, 2023. Old tokens remain valid, but if you use them in high-frequency API invocations, we recommend rotating them for reduced latency.

To rotate your token:

1. Navigate to **Account settings** and click **Service tokens** on the left side pane.
2. Verify that the **Created** date for the token is *on or before* July 18, 2023.
3. Click **+ New Token** on the top right side of the screen. Ensure the new token has the same permissions as the old one.
4. Copy the new token and replace the old one in your systems. Store it in a safe place, as it will not be available again once the creation screen is closed.
5. Delete the old token in dbt by clicking the **trash can icon**. *Only take this action after the new token is in place, to avoid service disruptions.*
#### FAQs

**I'm receiving a 403 error 'Forbidden: Access denied' when using service tokens**

All [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) traffic is subject to IP restrictions. When using a service token, the following 403 response error indicates that the IP is not on the allowlist. To resolve this, add your third-party integration CIDRs (network addresses) to your allowlist.

The following is an example of the 403 response error (account and user IDs redacted):

```json
{
  "status": {
    "code": 403,
    "is_success": false,
    "user_message": "Forbidden: Access denied",
    "developer_message": null
  },
  "data": {
    "account_id": null,
    "user_id": null,
    "is_service_token": true,
    "account_access_denied": true
  }
}
```

---

### Snapshots object schema

[Snapshots](https://docs.getdbt.com/docs/build/snapshots.md) represent point-in-time copies of your data, allowing you to track historical changes. You can query your snapshots from the Discovery API.

The [Example query](#example-query) illustrates a few fields you can query with the `snapshots` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `snapshots`, you can use arguments to filter the results and paginate across them.
##### Example query

You can specify the `environmentId`, filter by the database, and limit to the first 100 results to see the first 100 snapshots in the `analytics` database, including their execution performance and status information:

```graphql
query {
  environment(id: 834) {
    applied {
      snapshots(filter: { database: "analytics" }, first: 100) {
        edges {
          node {
            executionInfo {
              compileCompletedAt
              compileStartedAt
              executeCompletedAt
              executeStartedAt
              executionTime
              lastRunStatus
              lastRunId
            }
            fqn
            name
          }
        }
      }
    }
  }
}
```

##### Fields

When querying for `snapshots`, refer to the schema for the full list of available fields, including key fields from nodes.

---

### Snapshots object schema

The snapshots object allows you to query information about all snapshots in a given job.

##### Arguments

When querying for `snapshots`, you can use arguments such as the database, schema, and identifier.

Below are illustrative example queries outlining the schema of the snapshots object.

##### Example query

The database, schema, and identifier arguments are optional.
This means that with this endpoint you can:

* Find a specific snapshot by providing `<database>.<schema>.<identifier>`
* Find all of the snapshots in a database and/or schema by providing `<database>` and/or `<schema>`

###### Find snapshots information for a job

The example query below returns information about all snapshots in a given job:

```graphql
{
  job(id: 123) {
    snapshots {
      uniqueId
      name
      executionTime
      environmentId
      executeStartedAt
      executeCompletedAt
    }
  }
}
```

##### Fields

When querying for `snapshots`, refer to the schema for the full list of available fields.

---

### Source object schema

The source object allows you to query information about a particular source in a given job.

##### Arguments

When querying for a `source`, you can use arguments such as the source's `uniqueId`.

Below are illustrative example queries outlining the schema of the source object.

##### Example query

The query below pulls relevant information about a given source. For instance, you can view the load time and the state (pass, fail, error) of that source:

```graphql
{
  job(id: 123) {
    source(uniqueId: "source.jaffle_shop.snowplow.event") {
      uniqueId
      sourceName
      name
      state
      maxLoadedAt
      criteria {
        warnAfter {
          period
          count
        }
        errorAfter {
          period
          count
        }
      }
      maxLoadedAtTimeAgoInS
    }
  }
}
```

##### Fields

When querying for a `source`, refer to the schema for the full list of available fields.
---

### Sources object schema

[Sources](https://docs.getdbt.com/docs/build/sources.md) make it possible to name and describe the data loaded into your warehouse by your extract and load tools. You can query sources through the Discovery API.

The [Example query](#example-query) illustrates a few fields you can query with the `sources` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `sources`, you can use arguments to filter the results and paginate across them.

##### Example query

You can specify the `environmentId` and filter on the database name to return the freshness and execution status for the first 100 sources from the given database:

```graphql
query {
  environment(id: 834) {
    applied {
      sources(filter: { database: "analytics" }, first: 100) {
        edges {
          node {
            name
            fqn
            description
            filePath
            freshness {
              freshnessChecked
              freshnessStatus
            }
            sourceName
            sourceDescription
            tests {
              name
              description
              testType
              executionInfo {
                lastRunStatus
              }
            }
          }
        }
      }
    }
  }
}
```

##### Fields

When querying for `sources`, refer to the schema for the full list of available fields, including key fields from nodes.
---

### Sources object schema

The sources object allows you to query information about all sources in a given job.

##### Arguments

When querying for `sources`, you can use arguments such as the database, schema, and identifier.

Below are illustrative example queries outlining the schema of the sources object.

##### Example queries

The database, schema, and identifier arguments are optional. This means that with this endpoint you can:

* Find a specific source by providing `<database>.<schema>.<identifier>`
* Find all of the sources in a database and/or schema by providing `<database>` and/or `<schema>`

###### Finding sources by their database, schema, and identifier

The example query below finds a source by its unique database, schema, and identifier:

```graphql
{
  job(id: 123) {
    sources(database: "analytics", schema: "analytics", identifier: "dim_customers") {
      uniqueId
    }
  }
}
```

###### Finding sources by their schema

The example query below finds all sources in this schema and their respective states (pass, error, fail):

```graphql
{
  job(id: 123) {
    sources(schema: "analytics") {
      uniqueId
      state
    }
  }
}
```

##### Fields

The sources object can access the *same fields* as the [source node](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-job-source.md).
The difference is that the sources object returns a list: instead of querying fields for one specific source, you can query them for all sources within a given job, database, and so on.

When querying for `sources`, refer to the schema for the full list of available fields.

---

### Tags object schema

[Tags](https://docs.getdbt.com/reference/resource-configs/tags.md) provide a mechanism to categorize and group resources within a dbt project, enabling selective execution and management of these resources. You can query tags through the Discovery API.

The [Example query](#example-query) illustrates a few fields you can query with the `tags` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Example query

You can use the `environmentId` to return the names of all the tags in your environment:

```graphql
query {
  environment(id: 834) {
    applied {
      tags {
        name
      }
    }
  }
}
```

##### Fields

When querying for `tags`, refer to the schema for the full list of available fields.

---

### Test object schema

The test object allows you to query information about a particular test.
##### Arguments

When querying for a `test`, you can use arguments such as the test's `uniqueId`.

Below are illustrative example queries outlining the schema (all possible fields you can query) of the test object.

##### Example query

The example query below outputs information about a test, including the state of the test result. In order of severity, the result can be one of: "error", "fail", "warn", or "pass".

```graphql
{
  job(id: 123) {
    test(uniqueId: "test.internal_analytics.not_null_metrics_id") {
      runId
      accountId
      projectId
      uniqueId
      name
      columnName
      state
    }
  }
}
```

##### Fields

When querying for a `test`, refer to the schema for the full list of available fields.

---

### Tests object schema

[Tests](https://docs.getdbt.com/docs/build/data-tests.md) are assertions you make about your models and other resources in your dbt project. When you run `dbt test`, dbt will tell you if each test in your project passes or fails. You can query tests through the Discovery API to understand information about them.

The [Example query](#example-query) illustrates a few fields you can query with the `tests` object. Refer to [Fields](#fields) to view the entire schema, which provides all possible fields you can query.

##### Arguments

When querying for `tests`, you can use arguments to filter the results and paginate across them.
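As a rough sketch of how any of these queries can be issued over HTTP: the endpoint and Bearer-token header below follow the pattern used in the Python example in the performance section of this guide, and the token and environment ID are placeholders. The query itself is a minimal tests query assumed for illustration:

```python
# Build the POST payload for a Discovery API GraphQL request.
# Endpoint and auth header follow the pattern used elsewhere in this guide;
# the token and environment ID values are placeholders.
DISCOVERY_URL = "https://metadata.cloud.getdbt.com/graphql"

def build_request(auth_token, gql_query, variables):
    return {
        "url": DISCOVERY_URL,
        "headers": {
            "authorization": "Bearer " + auth_token,
            "content-type": "application/json",
        },
        "json": {"query": gql_query, "variables": variables},
    }

# Minimal illustrative query (see the example queries for fuller field lists)
query = """
query ($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      tests(first: $first) { edges { node { name } } }
    }
  }
}
"""

req = build_request("<SERVICE_TOKEN_HERE>", query, {"environmentId": 834, "first": 100})
# Send with: requests.post(req["url"], headers=req["headers"], json=req["json"])
print(req["headers"]["authorization"])  # → Bearer <SERVICE_TOKEN_HERE>
```

Separating payload construction from sending makes the request easy to inspect or log before it goes over the wire.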
##### Example query

You can use the `environmentId` and filter by test type to return metadata for all tests in the environment:

```graphql
query {
  environment(id: 834) {
    applied {
      tests(
        filter: { testTypes: [GENERIC_DATA_TEST, SINGULAR_DATA_TEST, UNIT_TEST] }
        first: 100
      ) {
        edges {
          node {
            name
            model
            description
            expect
            resourceType
            testType
            given
          }
        }
      }
    }
  }
}
```

##### Fields

When querying for `tests`, refer to the schema for the full list of available fields, including key fields from nodes.

---

### Tests object schema

The tests object allows you to query information about all tests in a given job.

##### Arguments

When querying for `tests`, you can use arguments such as the job ID.

Below are illustrative example queries outlining the schema (all possible fields you can query) of the tests object.

##### Example query

The example query below finds all tests in this job and includes information about those tests:

```graphql
{
  job(id: 123) {
    tests {
      runId
      accountId
      projectId
      uniqueId
      name
      columnName
      state
    }
  }
}
```

##### Fields

When querying for `tests`, refer to the schema for the full list of available fields.
---

### Use cases and examples for the Discovery API

With the Discovery API, you can query the metadata in dbt to learn more about your dbt deployments and the data they generate, so you can analyze them and make improvements. You can use the API in a variety of ways to get answers to your business questions. The table below describes some uses of the API and is meant to give you an idea of the questions this API can help you answer.

| Use case | Outcome | Example questions |
| --- | --- | --- |
| [Performance](#performance) | Identify inefficiencies in pipeline execution to reduce infrastructure costs and improve timeliness. | What’s the latest status of each model?<br />Do I need to run this model?<br />How long did my DAG take to run? |
| [Quality](#quality) | Monitor data source freshness and test results to resolve issues and drive trust in data. | How fresh are my data sources?<br />Which tests and models failed?<br />What’s my project’s test coverage? |
| [Discovery](#discovery) | Find and understand relevant datasets and semantic nodes with rich context and metadata. | What do these tables and columns mean?<br />What's the full data lineage at a model level?<br />Which metrics can I query? |
| [Governance](#governance) | Audit data development and facilitate collaboration within and between teams. | Who is responsible for this model?<br />How do I contact the model’s owner?<br />Who can use this model? |
| [Development](#development) | Understand dataset changes and usage and gauge impacts to inform project definition. | How is this metric used in BI tools?<br />Which nodes depend on this data source?<br />How has a model changed, and what's the impact? |

#### Performance

You can use the Discovery API to identify inefficiencies in pipeline execution to reduce infrastructure costs and improve timeliness. Below are example questions and queries you can run.

For performance use cases, people typically query the historical or latest applied state across any part of the DAG (for example, models) using the `environment`, `modelHistoricalRuns`, or job-level endpoints.

##### How long did each model take to run?

It’s helpful to understand how long models (tables) take to build and how long tests take to execute during a dbt run. Longer model build times result in higher infrastructure costs and fresh data arriving later to stakeholders. Analyses like these can live in observability tools or ad hoc queries, like in a notebook.

![Model timing visualization in dbt](/img/docs/dbt-cloud/discovery-api/model-timing.png "Model timing visualization in dbt")

Data teams can monitor the performance of their models, identify bottlenecks, and optimize the overall data pipeline by fetching execution details like `executionTime` and `runElapsedTime`:

1. Use the latest-state environment-level API to get a list of all executed models and their execution times. Then, sort the models by `executionTime` in descending order.

```graphql
query AppliedModels($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first) {
        edges {
          node {
            name
            uniqueId
            materializedType
            executionInfo {
              lastSuccessRunId
              executionTime
              executeStartedAt
            }
          }
        }
      }
    }
  }
}
```

2. Get the most recent 20 run results for the longest running model.
Review the results of the model across runs or you can go to the job/run or commit itself to investigate further. ```graphql query ModelHistoricalRuns( $environmentId: BigInt! $uniqueId: String $lastRunCount: Int ) { environment(id: $environmentId) { applied { modelHistoricalRuns( uniqueId: $uniqueId lastRunCount: $lastRunCount ) { name runId runElapsedTime runGeneratedAt executionTime executeStartedAt executeCompletedAt status } } } } ``` 3. Use the query results to plot a graph of the longest running model’s historical run time and execution time trends. ```python # Import libraries import os import matplotlib.pyplot as plt import pandas as pd import requests # Set API key auth_token = *[SERVICE_TOKEN_HERE]* # Query the API def query_discovery_api(auth_token, gql_query, variables): response = requests.post('https://metadata.cloud.getdbt.com/graphql', headers={"authorization": "Bearer "+auth_token, "content-type": "application/json"}, json={"query": gql_query, "variables": variables}) data = response.json()['data'] return data # Get the latest run metadata for all models models_latest_metadata = query_discovery_api(auth_token, query_one, variables_query_one)['environment'] # Convert to dataframe models_df = pd.DataFrame([x['node'] for x in models_latest_metadata['applied']['models']['edges']]) # Unnest the executionInfo column models_df = pd.concat([models_df.drop(['executionInfo'], axis=1), models_df['executionInfo'].apply(pd.Series)], axis=1) # Sort the models by execution time models_df_sorted = models_df.sort_values('executionTime', ascending=False) print(models_df_sorted) # Get the uniqueId of the longest running model longest_running_model = models_df_sorted.iloc[0]['uniqueId'] # Define second query variables variables_query_two = { "environmentId": *[ENVR_ID_HERE]* "lastRunCount": 10, "uniqueId": longest_running_model } # Get the historical run metadata for the longest running model model_historical_metadata = query_discovery_api(auth_token, query_two, 
variables_query_two)['environment']['applied']['modelHistoricalRuns'] # Convert to dataframe model_df = pd.DataFrame(model_historical_metadata) # Filter dataframe to only successful runs model_df = model_df[model_df['status'] == 'success'] # Convert the runGeneratedAt, executeStartedAt, and executeCompletedAt columns to datetime model_df['runGeneratedAt'] = pd.to_datetime(model_df['runGeneratedAt']) model_df['executeStartedAt'] = pd.to_datetime(model_df['executeStartedAt']) model_df['executeCompletedAt'] = pd.to_datetime(model_df['executeCompletedAt']) # Plot the runElapsedTime over time plt.plot(model_df['runGeneratedAt'], model_df['runElapsedTime']) plt.title('Run Elapsed Time') plt.show() # # Plot the executionTime over time plt.plot(model_df['executeStartedAt'], model_df['executionTime']) plt.title(model_df['name'].iloc[0]+" Execution Time") plt.show() ``` Plotting examples: [![The plot of runElapsedTime over time](/img/docs/dbt-cloud/discovery-api/plot-of-runelapsedtime.png?v=2 "The plot of runElapsedTime over time")](#)The plot of runElapsedTime over time [![The plot of executionTime over time](/img/docs/dbt-cloud/discovery-api/plot-of-executiontime.png?v=2 "The plot of executionTime over time")](#)The plot of executionTime over time ##### What’s the latest state of each model?[​](#whats-the-latest-state-of-each-model "Direct link to What’s the latest state of each model?") The Discovery API provides information about the applied state of models and how they arrived in that state. You can retrieve the status information from the most recent run and most recent successful run (execution) from the `environment` endpoint and dive into historical runs using job-based and `modelByEnvironment` endpoints. Example query The API returns full identifier information (`database.schema.alias`) and the `executionInfo` for both the most recent run and most recent successful run from the database: ```graphql query ($environmentId: BigInt!, $first: Int!) 
{ environment(id: $environmentId) { applied { models(first: $first) { edges { node { uniqueId compiledCode database schema alias materializedType executionInfo { executeCompletedAt lastJobDefinitionId lastRunGeneratedAt lastRunId lastRunStatus lastRunError lastSuccessJobDefinitionId runGeneratedAt lastSuccessRunId } } } } } } } ``` ##### What happened with my job run?[​](#what-happened-with-my-job-run "Direct link to What happened with my job run?") You can query the metadata at the job level to review results for specific runs. This is helpful for historical analysis of deployment performance or optimizing particular jobs. Example query Deprecated example: ```graphql query ($jobId: Int!, $runId: Int!) { models(jobId: $jobId, runId: $runId) { name status tests { name status } } } ``` New example: ```graphql query ($jobId: BigInt!, $runId: BigInt!) { job(id: $jobId, runId: $runId) { models { name status tests { name status } } } } ``` ##### What’s changed since the last run?[​](#whats-changed-since-the-last-run "Direct link to What’s changed since the last run?") Unnecessary runs incur higher infrastructure costs and load on the data team and their systems. A model doesn’t need to be run if it’s a view and there's no code change since the last run, or if it’s a table/incremental with no code change since last run and source data has not been updated since the last run. Example query With the API, you can compare the `rawCode` between the definition and applied state, and review when the sources were last loaded (source `maxLoadedAt` relative to model `executeCompletedAt`) given the `materializedType` of the model: ```graphql query ($environmentId: BigInt!, $first: Int!) { environment(id: $environmentId) { applied { models( first: $first filter: { uniqueIds: "MODEL.PROJECT.MODEL_NAME" } ) { edges { node { rawCode ancestors(types: [Source]) { ... 
on SourceAppliedStateNestedNode { freshness { maxLoadedAt } } } executionInfo { runGeneratedAt executeCompletedAt } materializedType } } } } definition { models( first: $first filter: { uniqueIds: "MODEL.PROJECT.MODEL_NAME" } ) { edges { node { rawCode runGeneratedAt materializedType } } } } } } ``` #### Quality[​](#quality "Direct link to Quality") You can use the Discovery API to monitor data source freshness and test results to diagnose and resolve issues and drive trust in data. When used with [webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md), it can also help with detecting, investigating, and alerting on issues. Below are example questions and queries you can run. For quality use cases, people typically query the historical or latest applied state, often in the upstream part of the DAG (for example, sources), using the `environment` or `environment { applied { modelHistoricalRuns } }` endpoints. ##### Which models and tests failed to run?[​](#which-models-and-tests-failed-to-run "Direct link to Which models and tests failed to run?") By filtering on the latest status, you can get lists of models that failed to build and tests that failed during their most recent execution. This is helpful when diagnosing issues with the deployment that result in delayed or incorrect data. Example query with code 1. Get the latest run results across all jobs in the environment and return only the models and tests that errored/failed. ```graphql query ($environmentId: BigInt!, $first: Int!) { environment(id: $environmentId) { applied { models(first: $first, filter: { lastRunStatus: error }) { edges { node { name executionInfo { lastRunId } } } } tests(first: $first, filter: { status: "fail" }) { edges { node { name executionInfo { lastRunId } } } } } } } ``` 2. Review the historical execution and test failure rate (up to 20 runs) for a given model, such as a frequently used and important dataset.
```graphql query ($environmentId: BigInt!, $uniqueId: String!, $lastRunCount: Int) { environment(id: $environmentId) { applied { modelHistoricalRuns(uniqueId: $uniqueId, lastRunCount: $lastRunCount) { name executeStartedAt status tests { name status } } } } } ``` 3. Identify the runs and plot the historical trends of failure/error rates. ##### When was the data my model uses last refreshed?[​](#when-was-the-data-my-model-uses-last-refreshed "Direct link to When was the data my model uses last refreshed?") You can get the metadata on the latest execution for a particular model or across all models in your project. For instance, investigate when each model or snapshot that's feeding into a given model was last executed or the source or seed was last loaded to gauge the *freshness* of the data. Example query with code ```graphql query ($environmentId: BigInt!, $first: Int!) { environment(id: $environmentId) { applied { models( first: $first filter: { uniqueIds: "MODEL.PROJECT.MODEL_NAME" } ) { edges { node { name ancestors(types: [Model, Source, Seed, Snapshot]) { ... on ModelAppliedStateNestedNode { name resourceType materializedType executionInfo { executeCompletedAt } } ... on SourceAppliedStateNestedNode { sourceName name resourceType freshness { maxLoadedAt } } ... on SnapshotAppliedStateNestedNode { name resourceType executionInfo { executeCompletedAt } } ... 
on SeedAppliedStateNestedNode { name resourceType executionInfo { executeCompletedAt } } } } } } } ```

```python
# Import libraries (implied by this example)
from datetime import datetime, timezone

import networkx as nx
import pandas as pd

# Extract graph nodes from response
def extract_nodes(data):
    models = []
    sources = []
    groups = []
    for model_edge in data["applied"]["models"]["edges"]:
        models.append(model_edge["node"])
    for source_edge in data["applied"]["sources"]["edges"]:
        sources.append(source_edge["node"])
    for group_edge in data["definition"]["groups"]["edges"]:
        groups.append(group_edge["node"])

    models_df = pd.DataFrame(models)
    sources_df = pd.DataFrame(sources)
    groups_df = pd.DataFrame(groups)

    return models_df, sources_df, groups_df

# Construct a lineage graph with freshness info
def create_freshness_graph(models_df, sources_df):
    G = nx.DiGraph()
    current_time = datetime.now(timezone.utc)

    for _, model in models_df.iterrows():
        max_freshness = pd.Timedelta.min
        if "meta" in models_df.columns:
            freshness_sla = model["meta"]["freshness_sla"]
        else:
            freshness_sla = None
        if model["executionInfo"]["executeCompletedAt"] is not None:
            model_freshness = current_time - pd.Timestamp(model["executionInfo"]["executeCompletedAt"])
            for ancestor in model["ancestors"]:
                if ancestor["resourceType"] == "SourceAppliedStateNestedNode":
                    ancestor_freshness = current_time - pd.Timestamp(ancestor["freshness"]["maxLoadedAt"])
                elif ancestor["resourceType"] == "ModelAppliedStateNestedNode":
                    ancestor_freshness = current_time - pd.Timestamp(ancestor["executionInfo"]["executeCompletedAt"])
                if ancestor_freshness > max_freshness:
                    max_freshness = ancestor_freshness
            G.add_node(
                model["uniqueId"],
                name=model["name"],
                type="model",
                max_ancestor_freshness=max_freshness,
                freshness=model_freshness,
                freshness_sla=freshness_sla,
            )

    for _, source in sources_df.iterrows():
        if source["maxLoadedAt"] is not None:
            G.add_node(
                source["uniqueId"],
                name=source["name"],
                type="source",
                freshness=current_time - pd.Timestamp(source["maxLoadedAt"]),
            )

    for _, model in models_df.iterrows():
        for parent in model["parents"]:
            G.add_edge(parent["uniqueId"], model["uniqueId"])

    return G
```

Graph example: [![A lineage graph with source freshness information](/img/docs/dbt-cloud/discovery-api/lineage-graph-with-freshness-info.png?v=2 "A lineage graph with source freshness information")](#)A lineage graph with source freshness information ##### Are my data sources fresh?[​](#are-my-data-sources-fresh "Direct link to Are my data sources fresh?") Checking [source freshness](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness) allows you to ensure that sources loaded and used in your dbt project are compliant with expectations. The API provides the latest metadata about source loading and information about the freshness check criteria. [![Source freshness page in dbt](/img/docs/dbt-cloud/discovery-api/source-freshness-page.png?v=2 "Source freshness page in dbt")](#)Source freshness page in dbt Example query ```graphql query ($environmentId: BigInt!, $first: Int!) { environment(id: $environmentId) { applied { sources( first: $first filter: { freshnessChecked: true, database: "production" } ) { edges { node { sourceName name identifier loader freshness { freshnessJobDefinitionId freshnessRunId freshnessRunGeneratedAt freshnessStatus freshnessChecked maxLoadedAt maxLoadedAtTimeAgoInS snapshottedAt criteria { errorAfter { count period } warnAfter { count period } } } } } } } } } ``` ##### What’s the test coverage and status?[​](#whats-the-test-coverage-and-status "Direct link to What’s the test coverage and status?") [Data tests](https://docs.getdbt.com/docs/build/data-tests.md) are an important way to ensure that your stakeholders are reviewing high-quality data. You can execute tests during a dbt run. The Discovery API provides complete test results for a given environment or job, which it represents as the `children` of a given node that’s been tested (for example, a `model`).
Example query For the following example, the `parents` are the nodes (code) that are being tested and `executionInfo` describes the latest test results: ```graphql query ($environmentId: BigInt!, $first: Int!) { environment(id: $environmentId) { applied { tests(first: $first) { edges { node { name columnName parents { name resourceType } executionInfo { lastRunStatus lastRunError executeCompletedAt executionTime } } } } } } } ``` ##### How is this model contracted and versioned?[​](#how-is-this-model-contracted-and-versioned "Direct link to How is this model contracted and versioned?") To enforce the shape of a model's definition, you can define contracts on models and their columns. You can also specify model versions to keep track of discrete stages in a model's evolution and use the appropriate one. Example query ```graphql query { environment(id: 123) { applied { models(first: 100, filter: { access: public }) { edges { node { name latestVersion contractEnforced constraints { name type expression columns } catalog { columns { name type } } } } } } } } ``` #### Discovery[​](#discovery "Direct link to Discovery") You can use the Discovery API to find and understand relevant datasets and semantic nodes with rich context and metadata. Below are example questions and queries you can run. For discovery use cases, people typically query the latest applied or definition state, often in the downstream part of the DAG (for example, mart models or metrics), using the `environment` endpoint. ##### What does this dataset and its columns mean?[​](#what-does-this-dataset-and-its-columns-mean "Direct link to What does this dataset and its columns mean?") Query the Discovery API to map a table/view in the data platform to the model in the dbt project; then, retrieve metadata about its meaning, including descriptive metadata from its YAML file and catalog information from the warehouse schema. Example query ```graphql query ($environmentId: BigInt!, $first: Int!)
{ environment(id: $environmentId) { applied { models( first: $first filter: { database: "analytics" schema: "prod" identifier: "customers" } ) { edges { node { name description tags meta catalog { columns { name description type } } } } } } } } ``` ##### What's the full data lineage at a model level?[​](#whats-the-full-data-lineage-at-a-model-level "Direct link to What's the full data lineage at a model level?") The Discovery API enables access to comprehensive model-level data lineage by exposing: * Upstream dependencies of models, including relationships to [sources](https://docs.getdbt.com/docs/build/sources.md), [seeds](https://docs.getdbt.com/docs/build/seeds.md), and [snapshots](https://docs.getdbt.com/docs/build/snapshots.md) * Model execution metadata such as run status, execution time, and freshness * Column-level details, including tests and descriptions * References between models to reconstruct lineage across your project Example query Here's a GraphQL query example that retrieves full model-level data lineage using the Discovery API: ```graphql query ($environmentId: BigInt!, $first: Int!) { environment(id: $environmentId) { applied { models(first: $first) { edges { node { name ancestors(types: [Model, Source, Seed, Snapshot]) { ... on ModelAppliedStateNestedNode { name resourceType } ... on SourceAppliedStateNestedNode { sourceName name resourceType } } } } } } } } ``` ##### Which metrics are available?[​](#which-metrics-are-available "Direct link to Which metrics are available?") You can define and query metrics using the [Semantic Layer](https://docs.getdbt.com/docs/build/about-metricflow.md), use them for documentation purposes (like for a data catalog), and calculate aggregations (like in a BI tool that doesn’t query the SL). Example query ```graphql query ($environmentId: BigInt!, $first: Int!) 
{ environment(id: $environmentId) { definition { metrics(first: $first) { edges { node { name description type formula filter tags parents { name resourceType } } } } } } } ``` #### Governance[​](#governance "Direct link to Governance") You can use the Discovery API to audit data development and facilitate collaboration within and between teams. For governance use cases, people tend to query the latest definition state, often in the downstream part of the DAG (for example, public models), using the `environment` endpoint. ##### Who is responsible for this model?[​](#who-is-responsible-for-this-model "Direct link to Who is responsible for this model?") You can define and surface the groups each model is associated with. Groups contain information such as the owner. This can help you identify which team owns certain models and who to contact about them. Example query ```graphql query ($environmentId: BigInt!, $first: Int!) { environment(id: $environmentId) { applied { models(first: $first, filter: { uniqueIds: ["MODEL.PROJECT.NAME"] }) { edges { node { name description resourceType access group } } } } definition { groups(first: $first) { edges { node { name resourceType models { name } ownerName ownerEmail } } } } } } ``` ##### Who can use this model?[​](#who-can-use-this-model "Direct link to Who can use this model?") You can specify the level of access for a given model. In the future, public models will function like APIs to unify project lineage and enable reuse of models using cross-project refs. Example query ```graphql query ($environmentId: BigInt!, $first: Int!) { environment(id: $environmentId) { definition { models(first: $first) { edges { node { name access } } } } } } ``` *** ```graphql query ($environmentId: BigInt!, $first: Int!)
{ environment(id: $environmentId) { definition { models(first: $first, filter: { access: public }) { edges { node { name } } } } } } ``` #### Development[​](#development "Direct link to Development") You can use the Discovery API to understand dataset changes and usage and gauge impacts to inform project definition. Below are example questions and queries you can run. For development use cases, people typically query the historical or latest definition or applied state across any part of the DAG using the `environment` endpoint. ##### How is this model or metric used in downstream tools?[​](#how-is-this-model-or-metric-used-in-downstream-tools "Direct link to How is this model or metric used in downstream tools?") [Exposures](https://docs.getdbt.com/docs/build/exposures.md) provide a method to define how a model or metric is actually used in dashboards and other analytics tools and use cases. You can query an exposure’s definition to see how project nodes are used and query its upstream lineage results to understand the state of the data used in it, which powers use cases like a freshness and quality status tile. [![Embed data health tiles in your dashboards to distill trust signals for data consumers.](/img/docs/collaborate/dbt-explorer/data-tile-pass.jpg?v=2 "Embed data health tiles in your dashboards to distill trust signals for data consumers.")](#)Embed data health tiles in your dashboards to distill trust signals for data consumers. Example query Below is an example that reviews an exposure and the models used in it including when they were last executed. ```graphql query ($environmentId: BigInt!, $first: Int!) { environment(id: $environmentId) { applied { exposures(first: $first) { edges { node { name description ownerName url parents { name resourceType ... 
on ModelAppliedStateNestedNode { executionInfo { executeCompletedAt lastRunStatus } } } } } } } } } ``` ##### How has this model changed over time?[​](#how-has-this-model-changed-over-time "Direct link to How has this model changed over time?") The Discovery API provides historical information about any resource in your project. For instance, you can view how a model has evolved over time (across recent runs) given changes to its shape and contents. Example query Review the differences in `compiledCode` or `columns` between runs or plot the “Approximate Size” and “Row Count” `stats` over time: ```graphql query ( $environmentId: BigInt! $uniqueId: String! $lastRunCount: Int! $withCatalog: Boolean! ) { environment(id: $environmentId) { applied { modelHistoricalRuns( uniqueId: $uniqueId lastRunCount: $lastRunCount withCatalog: $withCatalog ) { name compiledCode columns { name } stats { label value } } } } } ``` ##### Which nodes depend on this data source?[​](#which-nodes-depend-on-this-data-source "Direct link to Which nodes depend on this data source?") dbt lineage begins with data sources. For a given source, you can look at which nodes are its children then iterate downstream to get the full list of dependencies. Currently, querying beyond 1 generation (defined as a direct parent-to-child) is not supported. To see the grandchildren of a node, you need to make two queries: one to get the node and its children, and another to get the children nodes and their children. Example query ```graphql query ($environmentId: BigInt!, $first: Int!) { environment(id: $environmentId) { applied { sources( first: $first filter: { uniqueIds: ["SOURCE_NAME.TABLE_NAME"] } ) { edges { node { loader children { uniqueId resourceType ... 
on ModelAppliedStateNestedNode { database schema alias } } } } } } } } ``` #### Related docs[​](#related-docs "Direct link to Related docs") * [Query Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-querying.md) --- ## Best Practices ### Available materializations Views and tables and incremental models, oh my! In this section we’ll start getting our hands dirty digging into the three basic materializations that ship with dbt. They are considerably less scary and more helpful than lions, tigers, or bears — although perhaps not as cute (can data be cute? We at dbt Labs think so). We’re going to define, implement, and explore: * 🔍 [**views**](https://docs.getdbt.com/docs/build/materializations.md#view) * ⚒️ [**tables**](https://docs.getdbt.com/docs/build/materializations.md#table) * 📚 [**incremental model**](https://docs.getdbt.com/docs/build/materializations.md#incremental) info 👻 There is a fourth default materialization available in dbt called [**ephemeral materialization**](https://docs.getdbt.com/docs/build/materializations.md#ephemeral). It is less broadly applicable than the other three, and better deployed for specific use cases that require weighing some tradeoffs. We chose to leave it out of this guide and focus on the three materializations that will power 99% of your modeling needs. **Views and Tables are the two basic categories** of object that we can create across warehouses. They exist natively as types of objects in the warehouse, as you can see from this screenshot of Snowflake (depending on your warehouse the interface will look a little different).
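Before digging into each type, it helps to see where this choice lives in a project: you select a materialization with the `materialized` config, either in a model file or for a whole folder in `dbt_project.yml`. A minimal sketch (the project and folder names are illustrative):

```yaml
# dbt_project.yml — folder-level defaults (project and folder names are illustrative)
models:
  my_project:
    staging:
      +materialized: view   # cheap, always-fresh building blocks
    marts:
      +materialized: table  # pay the build cost once, query quickly
```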
**Incremental models** and other materialization types are a little bit different. They tell dbt to **construct tables in a special way**. ![Tables and views in the browser on Snowflake.](/assets/images/tables-and-views-d510f9a1eecc0c54f4352182389f3435.png) ##### Views[​](#views "Direct link to Views") * ✅ **The default materialization in dbt**. A starting project has no configurations defined for materializations, which means *everything* is by default built as a view. * 👩‍💻 **Store *only the SQL logic* of the transformation in the warehouse, *not the data***. As such, they make a great default. They build almost instantly and cost almost nothing to build. * ⏱️ Always reflect the **most up-to-date** version of the input data, as they’re run freshly every time they’re queried. * 👎 **Have to be processed every time they’re queried, so slower to return results than a table of the same data.** That also means they can cost more over time, especially if they contain intensive transformations and are queried often. ##### Tables[​](#tables "Direct link to Tables") * 🏗️ **Tables store the data itself** as opposed to views which store the query logic. This means we can pack all of the transformation compute into a single run. A view is storing a *query* in the warehouse. Even to preview that data we have to query it. A table is storing the literal rows and columns on disk. * 🏎️ Querying lets us **access that transformed data directly**, so we get better performance. Tables feel **faster and more responsive** compared to views of the same logic. * 💸 **Reduces compute costs.** Compute is significantly more expensive than storage. So while tables use much more storage, it’s generally an economical tradeoff, as you only pay for the transformation compute when you build a table during a job, rather than every time you query it. * 🔍 **Ideal for models that get queried regularly**, due to the combination of these qualities.
* 👎 **Limited to the source data that was available when we did our most recent run.** We’re ‘freezing’ the transformation logic into a table. So if we run a model as a table every hour, at 10:59a we still only have data up to 10a, because that was what was available in our source data when we ran the table last at 10a. Only at the next run will the newer data be included in our rebuild. ##### Incremental models[​](#incremental-models "Direct link to Incremental models") * 🧱 **Incremental** models build a **table** in **pieces over time**, only adding and updating new or changed records. * 🏎️  **Builds more quickly** than a regular table of the same logic. * 🐢 **Initial runs are slow.** Typically we use incremental models on very large datasets, so building the initial table on the full dataset is time consuming and equivalent to the table materialization. * 👎 **Add complexity.** Incremental models require deeper consideration of layering and timing. * 👎 Can drift from source data over time. As we’re not processing all of the source data when we run an incremental model, extra effort is required to capture changes to historical data. 
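As a concrete sketch of the incremental pattern described above (model and column names are illustrative), an incremental model typically filters its input to new records using `is_incremental()`:

```sql
-- models/marts/fct_orders.sql (illustrative names)
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    ordered_at,
    amount
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- on incremental runs, only process rows newer than what's already built
where ordered_at > (select max(ordered_at) from {{ this }})
{% endif %}
```

On the first run (or with `--full-refresh`) the filter is skipped and the full table is built, which is why initial runs are as slow as a table materialization.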
##### Comparing the materialization types[​](#comparing-the-materialization-types "Direct link to Comparing the materialization types") | | view | table | incremental | | -------------------- | ------------------------------------ | -------------------------------------- | -------------------------------------- | | 🛠️⌛ **build time** | 💚  fastest — only stores logic | ❤️  slowest — linear to size of data | 💛  medium — builds flexible portion | | 🛠️💸 **build costs** | 💚  lowest — no data processed | ❤️  highest — all data processed | 💛  medium — some data processed | | 📊💸 **query costs** | ❤️  higher — reprocess every query | 💚  lower — data in warehouse | 💚  lower — data in warehouse | | 🍅🌱 **freshness** | 💚  best — up-to-the-minute of query | 💛  moderate — up to most recent build | 💛  moderate — up to most recent build | | 🧠🤔 **complexity** | 💚 simple - maps to warehouse object | 💚 simple - map to warehouse concept | 💛 moderate - adds logical complexity | info 🔑 **Time is money.** Notice in the above chart that the time and costs rows contain the same results. This is to highlight that when we’re talking about time in warehouses, we’re talking about compute time, which is the primary driver of costs.
--- ### Best practice guides #### [🗃️ How we structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md) [4 items](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md) --- ### Best practices #### Putting it all together[​](#putting-it-all-together "Direct link to Putting it all together") * 📊 We've walked through **creating semantic models and metrics** for basic coverage of a key business area. * 🔁 In doing so we've looked at how to **refactor a frozen rollup** into a dynamic, flexible new life in the Semantic Layer. #### Best practices[​](#best-practices "Direct link to Best practices") * ✅ **Prefer normalization** when possible to allow MetricFlow to denormalize dynamically for end users. * ✅ Use **marts to denormalize** when needed, for instance grouping tables together into richer components, or getting measures on dimensional tables attached to a table with a time spine. * ✅ When source data is **well normalized** you can **build semantic models on top of staging models**. * ✅ **Prefer** computing values in **measures and metrics** when possible as opposed to in frozen rollups. * ❌ **Don't directly refactor the code you have in production**, build in parallel so you can audit the Semantic Layer output and deprecate old marts gracefully. #### Key commands[​](#key-commands "Direct link to Key commands") * 🔑 Use `dbt parse` to generate a fresh semantic manifest. * 🔑 Use `dbt sl list dimensions --metrics [metric name]` to check that you're increasing dimensionality as you progress. * 🔑 Use `dbt sl query [query options]` to preview the output from your metrics as you develop. #### Next steps[​](#next-steps "Direct link to Next steps") * 🗺️ Use these best practices to map out your team's plan to **incrementally adopt the Semantic Layer**. * 🤗 Get involved in the community and ask questions, **help craft best practices**, and share your progress in building a Semantic Layer. 
* [Validate semantic nodes in CI](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci) to ensure code changes made to dbt models don't break these metrics. The Semantic Layer is the biggest paradigm shift thus far in the young practice of analytics engineering. It's ready to provide value right away, but is most impactful if you move your project towards increasing normalization, and allow MetricFlow to do the denormalization for you with maximum dimensionality. We will be releasing more resources soon covering implementation of the Semantic Layer in dbt with various integrated BI tools. This is just the beginning; hopefully this guide has given you a path forward for building your data platform in this new era. Refer to [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md) for more information. --- ### Best practices for dbt and Unity Catalog Your Databricks dbt project should be configured after following the ["How to set up your databricks dbt project guide"](https://docs.getdbt.com/guides/set-up-your-databricks-dbt-project.md). Now we’re ready to start building a dbt project using Unity Catalog. However, we should first consider how we want to allow dbt users to interact with our different catalogs.
We recommend the following best practices to ensure the integrity of your production data: #### Isolate your Bronze (aka source) data[​](#isolate-your-bronze-aka-source-data "Direct link to Isolate your Bronze (aka source) data") We recommend using Unity Catalog because it allows you to reference data across your organization from any other catalog, legacy Hive metastore, external metastore, or Delta Live Table pipeline outputs. Additionally, Databricks offers the capability to [interact with external data](https://docs.databricks.com/external-data/index.html#interact-with-external-data-on-databricks) and supports query federation to many [database solutions](https://docs.databricks.com/query-federation/index.html#what-is-query-federation-for-databricks-sql). This means your dev and prod environments will have access to your source data, even if it is defined in another catalog or external data source. Raw data in your Bronze layer should be defined as dbt [sources](https://docs.getdbt.com/docs/build/sources.md) and should be read-only for all dbt interactions in both development and production. By default, we recommend that all of these inputs should be accessible by all dbt users in all dbt environments. This ensures that transformations in all environments begin with the same input data, and the results observed in development will be replicated when that code is deployed. That being said, there are times when your company’s data governance requirements necessitate using multiple workspaces or data catalogs depending on the environment. If you have different data catalogs/schemas for your source data depending on your environment, you can use the [target.name](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md#use-targetname-to-change-your-source-database) to change the data catalog/schema you’re pulling from depending on the environment. 
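For example, a source definition can switch its catalog on `target.name` so that each environment reads from the appropriate location. A minimal sketch (the catalog, source, and table names are illustrative):

```yaml
# models/staging/sources.yml (illustrative names)
sources:
  - name: raw_jaffle_shop
    # read from the production catalog only when running against the prod target
    database: "{{ 'prod_raw' if target.name == 'prod' else 'dev_raw' }}"
    tables:
      - name: orders
```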
If you use multiple Databricks workspaces to isolate development from production, you can use dbt’s [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) in your connection config strings to reference multiple workspaces from one dbt project. You can also do the same thing for your SQL warehouse so you can have different sizes based on your environments. To do so, use dbt's [environment variable syntax](https://docs.getdbt.com/docs/build/environment-variables.md#special-environment-variables) for Server Hostname of your Databricks workspace URL and HTTP Path for the SQL warehouse in your connection settings. Note that Server Hostname still needs to appear to be a valid domain name to pass validation checks, so you will need to hard-code the domain suffix on the URL, eg `{{env_var('DBT_HOSTNAME')}}.cloud.databricks.com` and the path prefix for your warehouses, eg `/sql/1.0/warehouses/{{env_var('DBT_HTTP_PATH')}}`. [![Using environment variable syntax in connection configs](/img/guides/databricks-guides/databricks-connection-env-vars.png?v=2 "Using environment variable syntax in connection configs")](#)Using environment variable syntax in connection configs When you create environments in dbt, you can assign environment variables to populate the connection information dynamically. Don’t forget to make sure the tokens you use in the credentials for those environments were generated from the associated workspace. [![Defining default environment variable values](/img/guides/databricks-guides/databricks-env-variables.png?v=2 "Defining default environment variable values")](#)Defining default environment variable values #### Access Control[​](#access-control "Direct link to Access Control") For granting access to data consumers, use dbt’s [grants config](https://docs.getdbt.com/reference/resource-configs/grants.md) to apply permissions to database objects generated by dbt models. 
This lets you configure grants as a structured dictionary rather than writing all the SQL yourself and lets dbt take the most efficient path to apply those grants. As for permissions to run dbt and read non-consumer-facing data sources, the table below summarizes an access model. Effectively, all developers should get no more than read access on the prod catalog and write access in the dev catalog. When using dbt, schema creation is taken care of for you; unlike traditional data warehousing workflows, you do not need to manually create any Unity Catalog assets other than the top-level catalogs. The **prod** service principal should have “read” access to raw source data, and “write” access to the prod catalog. If you add a **test** catalog and associated dbt environment, you should create a dedicated service principal. The test service principal should have *read* on raw source data, and *write* on the **test** catalog but no permissions on the prod or dev catalogs. A dedicated test environment should be used for [CI testing](https://www.getdbt.com/blog/adopting-ci-cd-with-dbt-cloud/) only.

**Table-level grants:**

| | Source Data | Development catalog | Production catalog | Test catalog |
| ---------------------------- | ----------- | ------------------- | ------------------ | --------------- |
| developers | select | select & modify | select or none | none |
| production service principal | select | none | select & modify | none |
| test service principal | select | none | none | select & modify |
**Schema-level grants:**

| | Source Data | Development catalog | Production catalog | Test catalog |
| ---------------------------- | ----------- | --------------------------------- | -------------------------------- | -------------------------------- |
| developers | use | use, create schema, table, & view | use or none | none |
| production service principal | use | none | use, create schema, table & view | none |
| test service principal | use | none | none | use, create schema, table & view |

#### Next steps[​](#next-steps "Direct link to Next steps") Ready to start transforming your Unity Catalog datasets with dbt? Check out the resources below for guides, tips, and best practices:

* [How we structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md)
* [Self-paced dbt fundamentals training course](https://learn.getdbt.com/courses/dbt-fundamentals)
* [Customizing CI/CD](https://docs.getdbt.com/guides/custom-cicd-pipelines.md)
* [Debugging errors](https://docs.getdbt.com/guides/debug-errors.md)
* [Writing custom generic tests](https://docs.getdbt.com/best-practices/writing-custom-generic-tests.md)
* [dbt packages hub](https://hub.getdbt.com/)

---

### Best practices for materializations First, let’s consider some properties of various levels of our dbt project and materializations. * 🔍 **Views** return the freshest, real-time state of their input data when they’re queried, which makes them ideal as **building blocks** for larger models.
* 🧶  When we’re building a model that stitches lots of other models together, we don’t want to worry about all those models having different states of freshness because they were built into tables at different times. We want all those inputs to reflect all the underlying source data available. * 🤏 **Views** are also great for **small datasets** with minimally intensive logic that we want **near realtime** access to. * 🛠️ **Tables** are the **most performant** materialization, as they just return the transformed data when they’re queried, with no need to reprocess it. * 📊  This makes tables great for **things end users touch**, like a mart that services a popular dashboard. * 💪 Tables are also ideal for **frequently used, compute intensive** transformations. Making a table allows us to ‘freeze’ those transformations in place. * 📚  **Incremental models** are useful for the **same purposes as tables**; they just enable us to build them on larger datasets, so they can be **built** *and* **accessed** in a **performant** way. ##### Project-level configuration[​](#project-level-configuration "Direct link to Project-level configuration") Keeping these principles in mind, we can apply these materializations to a project. Earlier we looked at how to configure an individual model's materializations. In practice though, we'll want to set materializations at the folder level, and use individual model configs to override those as needed. This will keep our code DRY and avoid repeating the same config blocks in every model. * 📂  In the `dbt_project.yml` we have a `models:` section (by default at the bottom of the file) we can use to define various **configurations for entire directories**. * ⚙️  These are the **same configs that are passed to a `{{ config() }}` block** for individual models, but they get set for *every model in that directory and any subdirectories nested within it*.
* ➕  We demarcate between a folder name and a configuration by using a `+`, so `marketing`, `paid_ads`, and `google` below are folder names, whereas **`+materialized` is a configuration** being applied to those folders and all folders nested below them. * ⛲  Configurations set in this way **cascade**; when multiple scopes apply, the **more specific scope** is the one that wins. * 👇🏻  In the example below, all the models in the `marketing` and `paid_ads` folders would be views, but models in the `google` subfolder would be **tables**.

```yaml
models:
  jaffle_shop:
    marketing:
      +materialized: view
      paid_ads:
        google:
          +materialized: table
```

##### Staging views[​](#staging-views "Direct link to Staging views") We’ll start off simple with staging models. Let’s consider some aspects of staging models to determine the ideal materialization strategy: * 🙅‍♀️ Staging models are **rarely accessed** directly by our **end users.** * 🧱 They need to be always up-to-date and in sync with our source data as **building blocks** for later models. * 🔍  It’s clear we’ll want to keep our **staging models as views**. * 👍  Since views are the **default materialization** in dbt, we don’t *have* to do any specific configuration for this. * 💎  Still, for clarity, it’s a **good idea** to go ahead and **specify the configuration** to be explicit. We’ll want to make sure our `dbt_project.yml` looks like this:

```yaml
models:
  jaffle_shop:
    staging:
      +materialized: view
```

##### Ephemeral intermediate models[​](#ephemeral-intermediate-models "Direct link to Ephemeral intermediate models") Intermediate models sit between staging and marts, breaking up complex transformations into manageable pieces. Consider these aspects when choosing their materialization: * 🚫 Intermediate models are not accessed directly by end users. They exist to simplify mart logic. * 🧩 They serve as building blocks that get referenced by marts or other intermediate models.
* 👻 This makes them ideal candidates for ephemeral materialization, which doesn't create objects in your warehouse. Ephemeral models are interpolated as CTEs into the models that reference them. This keeps your warehouse clean of models that aren't meant for direct querying:

```yaml
models:
  jaffle_shop:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
```

When to avoid ephemeral models Ephemeral models can make troubleshooting more difficult since they don't exist as queryable objects. If you need to inspect intermediate results during development, consider materializing them as views in a custom schema with restricted permissions instead. This gives you visibility while keeping them separate from production models. For more details on intermediate model patterns, refer to [How we structure our dbt projects: Intermediate](https://docs.getdbt.com/best-practices/how-we-structure/3-intermediate.md). ##### Table and incremental marts[​](#table-and-incremental-marts "Direct link to Table and incremental marts") As we’ve learned, views store only the logic of the transformation in the warehouse, so our runs take only a couple seconds per model (or less). What happens when we go to query the data though?
*(Screenshot: a long query time from Snowflake.)* Our marts are slow to query!
Let’s contrast the same aspects of marts that we considered for staging models to assess the best materialization strategy: * 📊  Marts are **frequently accessed directly by our end users**, and need to be **performant.** * ⌛  They can often **function with intermittently refreshed data**; end-user decision making in many domains is **fine with hourly or daily data.** * 🛠️  Given the above properties we’ve got a great use case for **building the data itself** into the warehouse, not the logic. In other words, **a table**. * ❓ The only decision we need to make with our marts is whether we can **process the whole table at once or need to do it in chunks**; that is, are we going to use the `table` materialization or `incremental`? info 🔑 **Golden Rule of Materializations** Start with models as views; when they take too long to query, make them tables; when the tables take too long to build, make them incremental. --- ### Best practices for workflows This page contains the collective wisdom of experienced users of dbt on how to best use it in your analytics work. Observing these best practices will help your analytics team work as effectively as possible, while implementing the pro-tips will add some polish to your dbt projects! #### Best practice workflows[​](#best-practice-workflows "Direct link to Best practice workflows") ##### Version control your dbt project[​](#version-control-your-dbt-project "Direct link to Version control your dbt project") All dbt projects should be managed in version control. Git branches should be created to manage development of new features and bug fixes.
All code changes should be reviewed by a colleague (or yourself) in a Pull Request prior to merging into your production branch, such as `main`. Git guide We've codified our best practices in Git, in our [Git guide](https://github.com/dbt-labs/corp/blob/main/git-guide.md). ##### Use separate development and production environments[​](#use-separate-development-and-production-environments "Direct link to Use separate development and production environments") dbt makes it easy to maintain separate production and development environments through the use of targets within a profile. We recommend using a `dev` target when running dbt from your command line and only running against a `prod` target when running from a production deployment. You can read more [about managing environments here](https://docs.getdbt.com/docs/environments-in-dbt.md). ##### Use a style guide for your project[​](#use-a-style-guide-for-your-project "Direct link to Use a style guide for your project") SQL styles, field naming conventions, and other rules for your dbt project should be codified, especially on projects where multiple dbt users are writing code. Our style guide We've made our [style guide](https://docs.getdbt.com/best-practices/how-we-style/0-how-we-style-our-dbt-projects.md) public – these can act as a good starting point for your own style guide. #### Best practices in dbt projects[​](#best-practices-in-dbt-projects "Direct link to Best practices in dbt projects") ##### Use the ref function[​](#use-the-ref-function "Direct link to Use the ref function") The [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function is what makes dbt so powerful! Using the `ref` function allows dbt to infer dependencies, ensuring that models are built in the correct order. It also ensures that your current model selects from upstream tables and views in the same environment that you're working in. 
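As a minimal sketch of `ref` in action (the model and column names are illustrative), a downstream model selects from a staging model like this:

```sql
-- a sketch; model and column names are illustrative
select
    order_id,
    customer_id,
    order_total
from {{ ref('stg_orders') }}
```

dbt resolves `ref('stg_orders')` to the correct schema for your current target, and records the dependency so models are built in order.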
Always use the `ref` function when selecting from another model, rather than using the direct relation reference (e.g. `my_schema.my_table`). ##### Limit references to raw data[​](#limit-references-to-raw-data "Direct link to Limit references to raw data") Your dbt project will depend on raw data stored in your database. Since this data is normally loaded by third parties, the structure of it can change over time – tables and columns may be added, removed, or renamed. When this happens, it is easier to update models if raw data is only referenced in one place. Using sources for raw data references We recommend defining your raw data as [sources](https://docs.getdbt.com/docs/build/sources.md), and selecting from the source rather than using the direct relation reference. Our dbt projects don't contain any direct relation references in any models. ##### Rename and recast fields once[​](#rename-and-recast-fields-once "Direct link to Rename and recast fields once") Raw data is generally stored in a source-conformed structure, that is, following the schema and naming conventions that the source defines. Not only will this structure differ between different sources, it is also likely to differ from the naming conventions you wish to use for analytics. The first layer of transformations in a dbt project should: * Select from only one source * Rename fields and tables to fit the conventions you wish to use within your project, for example, ensuring all timestamps are named `_at`. These conventions should be declared in your project coding conventions (see above). * Recast fields into the correct data type, for example, changing dates into UTC and prices into dollar amounts. All subsequent data models should be built on top of these models, reducing the amount of duplicated code. What happened to base models? Earlier versions of this documentation recommended implementing “base models” as the first layer of transformation, and gave advice on the SQL within these models. 
We realized that while the reasons behind this convention were valid, the specific advice around "base models" represented an opinion, so we moved it out of the official documentation. You can instead find our opinions on [how we structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md). ##### Break complex models up into smaller pieces[​](#break-complex-models-up-into-smaller-pieces "Direct link to Break complex models up into smaller pieces") Complex models often include multiple Common Table Expressions (CTEs). In dbt, you can instead separate these CTEs into separate models that build on top of each other. It is often a good idea to break up complex models when: * A CTE is duplicated across two models. Breaking the CTE into a separate model allows you to reference the model from any number of downstream models, reducing duplicated code. * A CTE changes the grain of the data it selects from. It's often useful to test any transformations that change the grain (as in, what one record represents) of your data. Breaking a CTE into a separate model allows you to test this transformation independently of a larger model. * The SQL in a query contains many lines. Breaking CTEs into separate models can reduce the cognitive load when another dbt user (or your future self) is looking at the code. ##### Group your models in directories[​](#group-your-models-in-directories "Direct link to Group your models in directories") Within your `models/` directory, you can have any number of nested subdirectories. We leverage directories heavily, since using a nested structure within directories makes it easier to: * Configure groups of models, by specifying configurations in your `dbt_project.yml` file. * Run subsections of your DAG, by using the [model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md).
* Communicate modeling steps to collaborators * Create conventions around the allowed upstream dependencies of a model, for example, "models in the `marts` directory can only select from other models in the `marts` directory, or from models in the `staging` directory". ##### Add tests to your models[​](#add-tests-to-your-models "Direct link to Add tests to your models") dbt provides a framework to test assumptions about the results generated by a model. Adding tests to a project helps provide assurance that both: * your SQL is transforming data in the way you expect, and * your source data contains the values you expect Recommended tests Our [style guide](https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md) recommends that at a minimum, every model should have a primary key that is tested to ensure it is unique, and not null. ##### Consider the information architecture of your data warehouse[​](#consider-the-information-architecture-of-your-data-warehouse "Direct link to Consider the information architecture of your data warehouse") When a user connects to a data warehouse via a SQL client, they often rely on the names of schemas, relations, and columns, to understand the data they are presented with. To improve the information architecture of a data warehouse, we: * Use [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) to separate relations into logical groupings, or hide intermediate models in a separate schema. Generally, these custom schemas align with the directories we use to group our models, and are configured from the `dbt_project.yml` file. * Use prefixes in table names (for example, `stg_`, `fct_` and `dim_`) to indicate which relations should be queried by end users. 
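A sketch of those custom schemas configured at the directory level in `dbt_project.yml` (the project, directory, and schema names are illustrative):

```yaml
# dbt_project.yml: a sketch; project, directory, and schema names are illustrative
models:
  jaffle_shop:
    staging:
      +schema: staging      # source-conformed models, kept out of end users' way
    marts:
      +schema: analytics    # end-user-facing relations
```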
##### Choose your materializations wisely[​](#choose-your-materializations-wisely "Direct link to Choose your materializations wisely") [Materializations](https://docs.getdbt.com/docs/build/materializations.md) determine the way models are built through configuration. As a general rule: * Views are faster to build, but slower to query compared to tables. * Incremental models provide the same query performance as tables and are faster to build compared to the table materialization; however, they introduce complexity into a project. We often: * Use views by default * Use ephemeral models for lightweight transformations that shouldn't be exposed to end-users * Use tables for models that are queried by BI tools * Use tables for models that have multiple descendants * Use incremental models when the build time for table models exceeds an acceptable threshold #### Pro-tips for workflows[​](#pro-tips-for-workflows "Direct link to Pro-tips for workflows") ##### Use the model selection syntax when running locally[​](#use-the-model-selection-syntax-when-running-locally "Direct link to Use the model selection syntax when running locally") When developing, it often makes sense to only run the model you are actively working on and any downstream models. You can choose which models to run by using the [model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md). ##### Run only modified models to test changes ("slim CI")[​](#run-only-modified-models-to-test-changes-slim-ci "Direct link to Run only modified models to test changes (\"slim CI\")") To merge code changes with confidence, you want to know that those changes will not cause breakages elsewhere in your project. For that reason, we recommend running models and tests in a sandboxed environment, separated from your production data, as an automatic check in your git workflow. (If you use GitHub and dbt, read about [how to set up CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md).)
At the same time, it costs time (and money) to run and test all the models in your project. This inefficiency feels especially painful if your PR only proposes changes to a handful of models. By comparing to artifacts from a previous production run, dbt can determine which models are modified and build them on top of their unmodified parents.

```bash
dbt run -s state:modified+ --defer --state path/to/prod/artifacts
dbt test -s state:modified+ --defer --state path/to/prod/artifacts
```

By comparing to artifacts from a previous production run, dbt can determine model and test result statuses. * `result:fail` * `result:error` * `result:warn` * `result:success` * `result:skipped` * `result:pass` For smarter reruns, use the `result:` selector instead of manually overriding dbt commands with the models in scope.

```bash
dbt run --select state:modified+ result:error+ --defer --state path/to/prod/artifacts
```

* Rerun all my erroneous models AND run changes I made concurrently that may relate to the erroneous models for downstream use

```bash
dbt build --select state:modified+ result:error+ --defer --state path/to/prod/artifacts
```

* Rerun and retest all my erroneous models AND run changes I made concurrently that may relate to the erroneous models for downstream use

```bash
dbt build --select state:modified+ result:error+ result:fail+ --defer --state path/to/prod/artifacts
```

* Rerun all my erroneous models AND all my failed tests * Rerun all my erroneous models AND run changes I made concurrently that may relate to the erroneous models for downstream use * There's a failed test that's unrelated to modified or error nodes (think: source test that needs to refresh a data load in order to pass)

```bash
dbt test --select result:fail --exclude --defer --state path/to/prod/artifacts
```

* Rerun all my failed tests and exclude tests that I know will still fail * This can apply to updates in source data during the "EL" process that need to be rerun after they are refreshed >
Note: If you're using the `--state target/` flag, `result:error` and `result:fail` flags can only be selected concurrently (in the same command) if using the `dbt build` command. `dbt test` will overwrite the `run_results.json` from `dbt run` in a previous command invocation. Only supported by v1.1 or newer. By comparing a `sources.json` artifact from a previous production run to the current `sources.json` artifact, dbt can determine which sources are fresher and run downstream models based on them.

```bash
# job 1
dbt source freshness # must be run to get previous state
```

Test all my sources that are fresher than the previous run, and run and test all models downstream of them:

```bash
# job 2
dbt source freshness # must be run again to compare current to previous state
dbt build --select source_status:fresher+ --state path/to/prod/artifacts
```

To learn more, read the docs on [state](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection). #### Pro-tips for dbt Projects[​](#pro-tips-for-dbt-projects "Direct link to Pro-tips for dbt Projects") ##### Limit the data processed when in development[​](#limit-the-data-processed-when-in-development "Direct link to Limit the data processed when in development") In a development environment, faster run times allow you to iterate your code more quickly. We frequently speed up our runs by using a pattern that limits data based on the [target](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md) name:

```sql
select * from event_tracking.events
{% if target.name == 'dev' %}
where created_at >= dateadd('day', -3, current_date)
{% endif %}
```

Another option is to use the [environment variable `DBT_CLOUD_INVOCATION_CONTEXT`](https://docs.getdbt.com/docs/build/environment-variables.md#dbt-platform-context). This environment variable provides metadata about the execution context of dbt. The possible values are `prod`, `dev`, `staging`, and `ci`.
**Example usage**: ```text {% if env_var('DBT_CLOUD_INVOCATION_CONTEXT') != 'prod' %} ``` ##### Use grants to manage privileges on objects that dbt creates[​](#use-grants-to-manage-privileges-on-objects-that-dbt-creates "Direct link to Use grants to manage privileges on objects that dbt creates") Use `grants` in [resource configs](https://docs.getdbt.com/reference/resource-configs/grants.md) to ensure that permissions are applied to the objects created by dbt. By codifying these grant statements, you can version control and repeatably apply these permissions. ##### Separate source-centric and business-centric transformations[​](#separate-source-centric-and-business-centric-transformations "Direct link to Separate source-centric and business-centric transformations") When modeling data, we frequently find there are two stages: 1. Source-centric transformations to transform data from different sources into a consistent structure, for example, re-aliasing and recasting columns, or unioning, joining or deduplicating source data to ensure your model has the correct grain; and 2. Business-centric transformations that transform data into models that represent entities and processes relevant to your business, or implement business definitions in SQL. We find it most useful to separate these two types of transformations into different models, to make the distinction between source-centric and business-centric logic clear. ##### Managing whitespace generated by Jinja[​](#managing-whitespace-generated-by-jinja "Direct link to Managing whitespace generated by Jinja") If you're using macros or other pieces of Jinja in your models, your compiled SQL (found in the `target/compiled` directory) may contain unwanted whitespace. Check out the [Jinja documentation](http://jinja.pocoo.org/docs/2.10/templates/#whitespace-control) to learn how to control generated whitespace. 
#### Related docs[​](#related-docs "Direct link to Related docs") * [Updating our permissioning guidelines: grants as configs in dbt Core v1.2](https://docs.getdbt.com/blog/configuring-grants) --- ### Building metrics tip Note that this best practices guide doesn't yet use the [new YAML specification](https://docs.getdbt.com/docs/build/latest-metrics-spec.md). We're working on updating this guide to use the new spec and file structure soon! To read more about the new spec, see [Creating metrics](https://docs.getdbt.com/docs/build/metrics-overview.md). #### How to build metrics[​](#how-to-build-metrics "Direct link to How to build metrics") * 💹 We'll start with one of the most important metrics for any business: **revenue**. * 📖 For now, our metric for revenue will be **defined as the sum of order totals excluding tax**. #### Defining revenue[​](#defining-revenue "Direct link to Defining revenue") * 🔢 Metrics have four basic properties: * `name:` We'll use 'revenue' to reference this metric. * `description:` For documentation. * `label:` The display name for the metric in downstream tools. * `type:` one of `simple`, `ratio`, or `derived`. * 🎛️ Each type has different `type_params`. * 🛠️ We'll build a **simple metric** first to get the hang of it, and move on to ratio and derived metrics later. * 📏 Simple metrics are built on a **single measure defined as a type parameter**. * 🔜 Defining **measures as their own distinct component** on semantic models is critical to allowing the **flexibility of more advanced metrics**, though simple metrics act mainly as **pass-throughs that provide filtering** and labeling options.
models/marts/orders.yml

```yml
metrics:
  - name: revenue
    description: Sum of the order total.
    label: Revenue
    type: simple
    type_params:
      measure: order_total
```

#### Query your metric[​](#query-your-metric "Direct link to Query your metric") You can use the dbt CLI for metric validation or queries during development, via the `dbt sl` set of subcommands. Here are some useful examples:

```bash
dbt sl query --metrics revenue --group-by metric_time__month
dbt sl list dimensions --metrics revenue # list all dimensions available for the revenue metric
```

* Any time we update our Semantic Layer code, it's best practice to run `dbt parse` to update our development semantic manifest. * `dbt sl query` is not how you would typically use the tool in production; that's handled by the dbt Semantic Layer's features. It's available for testing results of various metric queries in development, exactly as we're using it now. * Note the structure of the above query. We select the metric(s) we want and the dimensions to group them by — we use dunders (double underscores, e.g. `metric_time__[time bucket]`) to designate time dimensions or other non-unique dimensions that need a specified entity path to resolve (e.g. if you have an orders location dimension and an employee location dimension both named 'location' you would need dunders to specify `orders__location` or `employee__location`). --- ### Building semantic models tip Note that this best practices guide doesn't yet use the [new YAML specification](https://docs.getdbt.com/docs/build/latest-metrics-spec.md). We're working on updating this guide to use the new spec and file structure soon!
To read more about the new spec, see [Creating metrics](https://docs.getdbt.com/docs/build/metrics-overview.md).

#### How to build a semantic model[​](#how-to-build-a-semantic-model "Direct link to How to build a semantic model")

A semantic model is the Semantic Layer equivalent of a logical-layer model (what has historically just been called a 'model' in dbt land). Just as configurations for models are defined on the `models:` YAML key, configurations for semantic models are housed under `semantic_models:`. A key difference is that while a logical model consists of configuration and SQL or Python code, a **semantic model is defined purely via YAML**. Rather than encoding a specific dataset, a **semantic model describes relationships and expressions** that let your end users select and refine their own datasets dynamically and reliably.

* ⚙️ Semantic models are **composed of three components**:
  * 🫂 **entities**: these describe the **relationships** between various semantic models (think ids).
  * 🔪 **dimensions**: these are the columns you want to **slice, dice, group, and filter by** (think timestamps, categories, booleans).
  * 📏 **measures**: these are the **quantitative values you want to aggregate**.
* 🪣 We define **columns as being an entity, dimension, or measure**. Columns will typically fit into one of these three buckets, or, if they're a complex aggregation expression, they might constitute a metric.

#### Defining orders[​](#defining-orders "Direct link to Defining orders")

Let's zoom in on how we might define an *orders* semantic model.

* 📗 We define it as a **YAML dictionary in the `semantic_models` list**.
* 📑 It will have a **name, entities list, dimensions list, and measures list**.
* ⏬ We recommend defining them **in this order consistently** as a style best practice.

models/marts/orders.yml

```yaml
semantic_models:
  - name: orders
    entities: ... # we'll define these later
    dimensions: ... # we'll define these later
    measures: ... # we'll define these later
```

* Next we'll point to the corresponding logical model by supplying a [`ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) in the `model:` property, and a `description` for documentation.

models/marts/orders.yml

```yml
semantic_models:
  - name: orders
    description: |
      Model containing order data. The grain of the table is the order id.
    model: ref('stg_orders')
    entities: ...
    dimensions: ...
    measures: ...
```

#### Establishing our entities[​](#establishing-our-entities "Direct link to Establishing our entities")

* 🫂 Entities are the **objects and concepts** in our data that *have* dimensions and measures. You can think of them as the **nouns** of our project, the **spines** of our queries that we may want to aggregate by, or simply the **join keys**.
* 🔀 Entities help MetricFlow understand **how various semantic models relate to one another**.
* ⛓️ Unlike many other semantic layers, in MetricFlow **we do not need to describe joins explicitly**; instead, the **relationships are implicitly described by entities**.
* 1️⃣ Each semantic model should have **one primary entity** defined for itself, and **any number of foreign entities** for other semantic models it may join to.
* 🫂 Entities require a **name and type**.
* 🔑 Types available are **primary**, **foreign**, **unique**, or **natural** — we'll be focused on the first two for now, but you can [read more about unique and natural keys](https://docs.getdbt.com/docs/build/entities.md#entity-types).

##### Entities in action[​](#entities-in-action "Direct link to Entities in action")

If we look at an example staging model for orders, we see that it has three id columns, so we'll need three entities.
models/staging/stg_orders.sql

```sql
renamed as (

    select

        ---------- ids
        id as order_id,
        store_id as location_id,
        customer as customer_id,

        ---------- properties
        (order_total / 100.0) as order_total,
        (tax_paid / 100.0) as tax_paid,

        ---------- timestamps
        ordered_at

    from source
```

* 👉 We add them with a **`name`, `type`, and optional `expr`** (expression). The expression can be any valid SQL expression on your platform.
* 📛 If you **don't add an expression**, MetricFlow will **assume the name is equal to the column name** in the underlying logical model.
* 👍 Our best practices pattern is to, whenever possible, provide a `name` that is the singular form of the subject or grain of the table, and use `expr` to specify the precise column name (with `_id` etc.). This will let us write **more readable metrics** on top of these semantic models. For example, we'll use `location` instead of `location_id`.

models/marts/orders.yml

```yml
semantic_models:
  - name: orders
    ...
    entities:
      # we use the column for the name here because order is a reserved word in SQL
      - name: order_id
        type: primary
      - name: location
        type: foreign
        expr: location_id
      - name: customer
        type: foreign
        expr: customer_id
    dimensions: ...
    measures: ...
```

#### Defining our dimensions[​](#defining-our-dimensions "Direct link to Defining our dimensions")

* 🧮 Dimensions are the columns that we want to **filter and group by**, **the adjectives of our project**. They come in three types:
  * **categorical**
  * **time**
  * slowly changing dimensions — [these are covered in the documentation](https://docs.getdbt.com/docs/build/dimensions.md#scd-type-ii) and are a little more complex. To focus on building your mental model of MetricFlow's fundamentals, we won't be using SCDs in this guide.
* ➕ We're **not limited to existing columns**; we can use the `expr` property to add simple computations in our dimensions.
* 📛 Categorical dimensions are the simplest: they just require a `name` and `type` (the type being categorical).
**If the `name` property matches the name of the dimension column**, that's it, you're done. If you want or need to use a `name` other than the column name, or do some filtering or computation, **you can supply an optional `expr` property** to evaluate for the dimension.

##### Dimensions in action[​](#dimensions-in-action "Direct link to Dimensions in action")

* 👀 Let's look at our staging model again and see what fields we have available.

models/staging/stg_orders.sql

```sql
select

    ---------- ids -> entities
    id as order_id,
    store_id as location_id,
    customer as customer_id,

    ---------- numerics -> measures
    (order_total / 100.0) as order_total,
    (tax_paid / 100.0) as tax_paid,

    ---------- timestamps -> dimensions
    ordered_at

from source
```

* ⏰ For now the only dimension to add is a **time dimension**: `ordered_at`.
* 🕰️ At least one **primary time dimension** is **required** for any semantic model that **has measures**.
* 1️⃣ We denote this with the `is_primary` property, or, if there is only one time dimension supplied, it is primary by default. Below we only have `ordered_at` as a timestamp, so we don't need to specify anything except the *minimum granularity* we're bucketing to (in this case, day). By this we mean that we're not going to be looking at orders at a finer granularity than a day.

models/marts/orders.yml

```yml
dimensions:
  - name: ordered_at
    expr: date_trunc('day', ordered_at)
    type: time
    type_params:
      time_granularity: day
```

tip

**Dimensional models**. You may have some models that do not contain measures, just dimensional data that enriches other facts. That's totally fine: a semantic model does not require dimensions or measures, it just needs a primary entity, and, if you do have measures, a primary time dimension. We'll discuss an alternate situation, dimensional tables that have static numeric values like supply costs or tax rates but no time dimensions, later in the guide.
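The dimensional-model case described in the tip above might look like the following minimal sketch. Note this is a hypothetical illustration, not part of this guide's example project: the `stg_locations` model and its columns are assumptions.

```yml
semantic_models:
  - name: locations # hypothetical dimensional semantic model
    description: Location dimension data; no measures, so no primary time dimension is needed.
    model: ref('stg_locations') # hypothetical staging model
    entities:
      - name: location
        type: primary
        expr: location_id
    dimensions:
      - name: location_name
        type: categorical
```

Because it has a primary entity, other semantic models can join to it and group their measures by its dimensions.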
* 🔢 We can also **make a dimension out of a numeric column** that would typically be a measure.
* 🪣 Using `expr`, we can **create buckets of values that we label** for our dimension. We'll add one of these to label 'large orders' as any order with a total over $50.

models/marts/orders.yml

```yml
dimensions:
  - name: ordered_at
    expr: date_trunc('day', ordered_at)
    type: time
    type_params:
      time_granularity: day
  - name: is_large_order
    type: categorical
    expr: case when order_total > 50 then true else false end
```

#### Making our measures[​](#making-our-measures "Direct link to Making our measures")

* 📏 Measures are the final component of a semantic model. They describe the **numeric values that we want to aggregate**.
* 🧱 Measures form **the building blocks of metrics**, with entities and dimensions helping us combine, group, and filter those metrics correctly.
* 🏃 You can think of them as something like the **verbs of a semantic model**.

##### Measures in action[​](#measures-in-action "Direct link to Measures in action")

* 👀 Let's look at **our staging model** one last time and see what **fields we want to measure**.

models/staging/stg_orders.sql

```sql
select

    ---------- ids -> entities
    id as order_id,
    store_id as location_id,
    customer as customer_id,

    ---------- numerics -> measures
    (order_total / 100.0) as order_total,
    (tax_paid / 100.0) as tax_paid,

    ---------- timestamps -> dimensions
    ordered_at

from source
```

* ➕ Here `order_total` and `tax_paid` are the **columns we want as measures**.
* 📝 We can describe them via the code below, specifying a **name, description, aggregation, and expression**.
* 👍 As before, MetricFlow will default to the **name being the name of a column when no expression is supplied**.
* 🧮 [Many different aggregations](https://docs.getdbt.com/docs/build/measures.md#aggregation) are available to us. Here we just want sums.

models/marts/orders.yml

```yml
measures:
  - name: order_total
    description: The total amount for each order including taxes.
    agg: sum
  - name: tax_paid
    description: The total tax paid on each order.
    agg: sum
```

* 🆕 We can also **create new measures using expressions**, for instance adding a count of individual orders as below.

models/marts/orders.yml

```yml
- name: order_count
  description: The count of individual orders.
  expr: 1
  agg: sum
```

#### Reviewing our work[​](#reviewing-our-work "Direct link to Reviewing our work")

Our completed code will look like this, our first semantic model! Here are two examples showing different organizational approaches:

Co-located approach

models/marts/orders.yml

```yml
semantic_models:
  - name: orders
    defaults:
      agg_time_dimension: ordered_at
    description: |
      Order fact table. This table is at the order grain with one row per order.
    model: ref('stg_orders')
    entities:
      - name: order_id
        type: primary
      - name: location
        type: foreign
        expr: location_id
      - name: customer
        type: foreign
        expr: customer_id
    dimensions:
      - name: ordered_at
        expr: date_trunc('day', ordered_at) # use date_trunc(ordered_at, DAY) if using BigQuery
        type: time
        type_params:
          time_granularity: day
      - name: is_large_order
        type: categorical
        expr: case when order_total > 50 then true else false end
    measures:
      - name: order_total
        description: The total revenue for each order.
        agg: sum
      - name: order_count
        description: The count of individual orders.
        expr: 1
        agg: sum
      - name: tax_paid
        description: The total tax paid on each order.
        agg: sum
```

Parallel sub-folder approach

models/semantic_models/sem_orders.yml

```yml
semantic_models:
  - name: orders
    defaults:
      agg_time_dimension: ordered_at
    description: |
      Order fact table. This table is at the order grain with one row per order.
    model: ref('stg_orders')
    entities:
      - name: order_id
        type: primary
      - name: location
        type: foreign
        expr: location_id
      - name: customer
        type: foreign
        expr: customer_id
    dimensions:
      - name: ordered_at
        expr: date_trunc('day', ordered_at) # use date_trunc(ordered_at, DAY) if using BigQuery
        type: time
        type_params:
          time_granularity: day
      - name: is_large_order
        type: categorical
        expr: case when order_total > 50 then true else false end
    measures:
      - name: order_total
        description: The total revenue for each order.
        agg: sum
      - name: order_count
        description: The count of individual orders.
        expr: 1
        agg: sum
      - name: tax_paid
        description: The total tax paid on each order.
        agg: sum
```

As you can see, the content of the semantic model is identical in both approaches. The key differences are:

1. **File location**
   * Co-located approach: `models/marts/orders.yml`
   * Parallel sub-folder approach: `models/semantic_models/sem_orders.yml`
2. **File naming**
   * Co-located approach: uses the same name as the corresponding mart (`orders.yml`)
   * Parallel sub-folder approach: prefixes the file with `sem_` (`sem_orders.yml`)

Choose the approach that best fits your project structure and team preferences. The co-located approach is often simpler for new projects, while the parallel sub-folder approach can be clearer for migrating large existing projects to the Semantic Layer.

#### Next steps[​](#next-steps "Direct link to Next steps")

Let's review the basics of semantic models:

* 🧱 They consist of **entities, dimensions, and measures**.
* 🫂 They describe the **semantics and relationships of objects** in the warehouse.
* 1️⃣ They correspond to a **single logical model** in your dbt project.

Next up, let's use our new semantic model to **build a metric**!
---

### Clone incremental models as the first step of your CI job

Before you begin, be aware of a few conditions:

* `dbt clone` is only available with dbt version 1.6 and newer. Refer to our [upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) for help enabling newer versions in dbt.
* This strategy only works for warehouses that support zero-copy cloning (otherwise `dbt clone` will just create pointer views).
* Some teams may want to test that their incremental models run in both incremental mode and full-refresh mode.

Imagine you've created a [Slim CI job](https://docs.getdbt.com/docs/deploy/continuous-integration.md) in dbt and it is configured to:

* Defer to your production environment.
* Run the command `dbt build --select state:modified+` to run and test all of the models you've modified and their downstream dependencies.
* Trigger whenever a developer on your team opens a PR against the main branch.

[![Example of a slim CI job with the above configurations](/img/best-practices/slim-ci-job.png?v=2 "Example of a slim CI job with the above configurations")](#)Example of a slim CI job with the above configurations

Now imagine your dbt project looks something like this in the DAG:

[![Sample project DAG](/img/best-practices/dag-example.png?v=2 "Sample project DAG")](#)Sample project DAG

When you open a pull request (PR) that modifies `dim_wizards`, your CI job will kick off and build *only the modified models and their downstream dependencies* (in this case, `dim_wizards` and `fct_orders`) into a temporary schema that's unique to your PR.
This build mimics the behavior of what will happen once the PR is merged into the main branch. It ensures you're not introducing breaking changes, without needing to build your entire dbt project.

#### What happens when one of the modified models (or one of their downstream dependencies) is an incremental model?[​](#what-happens-when-one-of-the-modified-models-or-one-of-their-downstream-dependencies-is-an-incremental-model "Direct link to What happens when one of the modified models (or one of their downstream dependencies) is an incremental model?")

Because your CI job is building modified models into a PR-specific schema, on the first execution of `dbt build --select state:modified+`, the modified incremental model will be built in its entirety *because it does not yet exist in the PR-specific schema* and [`is_incremental` will be false](https://docs.getdbt.com/docs/build/incremental-models.md#understand-the-is_incremental-macro). You're running in `full-refresh` mode.

This can be suboptimal because:

* Incremental models are typically your largest datasets, so building them in their entirety takes a long time, which slows down development and incurs high warehouse costs.
* There are situations where a `full-refresh` of the incremental model passes successfully in your CI job, but an *incremental* build of that same table in prod would fail when the PR is merged into main (think schema drift, where the [`on_schema_change`](https://docs.getdbt.com/docs/build/incremental-models.md#what-if-the-columns-of-my-incremental-model-change) config is set to `fail`).

You can alleviate these problems by zero-copy cloning the relevant, pre-existing incremental models into your PR-specific schema as the first step of the CI job, using the `dbt clone` command. This way, the incremental models already exist in the PR-specific schema when you first execute `dbt build --select state:modified+`, so the `is_incremental` flag will be `true`.
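For context, a typical incremental model gates its incremental filter on the `is_incremental()` macro, which is what evaluates to false on that first CI build. A minimal sketch (the model and column names are illustrative, not from this guide's example project):

```sql
{{
    config(
        materialized='incremental',
        unique_key='order_id'
    )
}}

select * from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- applied only on incremental runs; skipped on the first build
-- into a fresh schema and on --full-refresh runs
where ordered_at > (select max(ordered_at) from {{ this }})
{% endif %}
```

With the table cloned into the PR-specific schema first, this `where` clause is applied in CI just as it would be in production.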
You'll have two commands for your dbt CI check to execute:

1. Clone all of the pre-existing incremental models that have been modified or are downstream of another model that has been modified:

```shell
dbt clone --select state:modified+,config.materialized:incremental,state:old
```

2. Build all of the models that have been modified and their downstream dependencies:

```shell
dbt build --select state:modified+
```

Because of your first clone step, the incremental models selected in your `dbt build` in the second step will run in incremental mode.

[![Clone command in the CI config](/img/best-practices/clone-command.png?v=2 "Clone command in the CI config")](#)Clone command in the CI config

Your CI jobs will run faster, and you're more accurately mimicking the behavior of what will happen once the PR has been merged into main.

##### Expansion on "think schema drift where [`on_schema_change`](https://docs.getdbt.com/docs/build/incremental-models.md#what-if-the-columns-of-my-incremental-model-change) config is set to `fail`" from above[​](#expansion-on-think-schema-drift-where-on_schema_change-config-is-set-to-fail-from-above "Direct link to expansion-on-think-schema-drift-where-on_schema_change-config-is-set-to-fail-from-above")

Imagine you have an incremental model `my_incremental_model` with the following config:

```sql
{{
    config(
        materialized='incremental',
        unique_key='unique_id',
        on_schema_change='fail'
    )
}}
```

Now, let's say you open up a PR that adds a new column to `my_incremental_model`. In this case:

* An incremental build will fail.
* A `full-refresh` will succeed.

If you have a daily production job that just executes `dbt build` without a `--full-refresh` flag, once the PR is merged into main and the job kicks off, you will get a failure. So the question is: what do you want to happen in CI?
* Do you want to also get a failure in CI, so that you know that once this PR is merged into main you need to immediately execute a `dbt build --full-refresh --select my_incremental_model` in production in order to avoid a failure in prod? This will block your CI check from passing.
* Do you want your CI check to succeed, because once you run a `full-refresh` for this model in prod you will be in a successful state? This may lead to unpleasant surprises if your production job suddenly fails when you merge this PR into main because you didn't remember to execute a `dbt build --full-refresh --select my_incremental_model` in production.

There's probably no perfect solution here; it's all just tradeoffs! Our preference would be to have the failing CI job and have to manually override the blocking branch protection rule, so that there are no surprises and we can proactively run the appropriate command in production once the PR is merged.

##### Expansion on "why `state:old`"[​](#expansion-on-why-stateold "Direct link to expansion-on-why-stateold")

For brand new incremental models, you want them to run in `full-refresh` mode in CI, because they will run in `full-refresh` mode in production when the PR is merged into `main`. They also don't exist yet in the production environment... they're brand new! If you don't specify this, you won't get an error, just a "No relation found in state manifest for…" message. So it technically works without specifying `state:old`, but adding `state:old` is more explicit and means dbt won't even try to clone the brand new incremental models.
---

### Conclusion

You're now following best practices in your project, and have optimized the materializations of your DAG. You're equipped with the three main materializations that cover almost any analytics engineering situation!

There are more configs and materializations available, as well as specific materializations for certain platforms and adapters — and, like everything with dbt, materializations are extensible, meaning you can create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md) for your needs. So this is just the beginning of what you can do with these powerful configurations.

For the vast majority of users and companies, though, tables, views, and incremental models will handle everything you can throw at them. Develop your intuition and expertise for these materializations, and you'll be well on your way to tackling advanced analytics engineering problems.

---

### Configuring materializations

#### Configuring materializations[​](#configuring-materializations "Direct link to Configuring materializations")

Choosing a materialization is as simple as setting any other configuration in dbt. We'll look first at how we select materializations for individual models, then at more powerful ways of setting materializations for entire folders of models.
##### Configuring tables and views[​](#configuring-tables-and-views "Direct link to Configuring tables and views")

Let's look at how we can use tables and views to get started with materializations:

* ⚙️ We can configure an individual model's materialization using a **Jinja `config` block** and passing in the **`materialized` argument**. This tells dbt what materialization to use.
* 🚰 The underlying specifics of what is run depend on [which **adapter** you're using](https://docs.getdbt.com/docs/supported-data-platforms.md), but the end results will be equivalent.
* 😌 This is one of the many valuable aspects of dbt: it lets us use a **declarative** approach, specifying the *outcome* that we want in our code, rather than the *specific steps* to achieve it (the latter is an *imperative* approach, if you want to get computer science-y about it 🤓).
* 🔍 In the below case, we want to create a SQL **view**, and can **declare** that in a **single line of code**. Note that Python models [do not support materializing as views](https://docs.getdbt.com/docs/build/materializations.md#python-materializations) at this time.

```sql
{{
    config(
        materialized='view'
    )
}}

select ...
```

info

🐍 **Not all adapters support Python yet**; check [the docs here to be sure](https://docs.getdbt.com/docs/build/python-models.md#specific-data-platforms) before spending time writing Python models.

* Configuring a model to materialize as a `table` is simple, and possible for both SQL and Python models.

- SQL
- Python

```sql
{{
    config(
        materialized='table'
    )
}}

select ...
```

```python
def model(dbt, session):
    dbt.config(materialized="table")

    # model logic

    return model_df
```

Go ahead and try some of these out!
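The folder-level configuration mentioned at the start of this page can be set in `dbt_project.yml` using the `+materialized` config. A sketch, assuming a hypothetical project named `jaffle_shop` with `staging` and `marts` folders:

```yml
# dbt_project.yml (project and folder names are illustrative)
models:
  jaffle_shop:
    staging:
      +materialized: view # everything under models/staging builds as views
    marts:
      +materialized: table # everything under models/marts builds as tables
```

Individual models can still override these defaults with their own `config` blocks, since model-level configs take precedence over project-level ones.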
---

### Coordinating model versions

Coordinating model versions across your mesh is a critical part of the model versioning process. This guide will walk you through safe best practices for coordinating producers and consumers when introducing model versions.

An important part of our dbt Mesh workflow is [model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md). This enables better data model management and is critical in a scenario where multiple teams share models across projects. Releasing a new model version safely requires coordination between model producers (who build the models) and model consumers (who depend on them).

This guide goes over the following topics:

* [How producers introduce new model versions safely](#best-practices-for-producers)
* [How consumers evaluate and migrate to those new versions](#best-practices-for-consumers)

For how versioning works at a technical level (YAML structure, contracts, aliasing), see [model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md).

#### Best practices for producers[​](#best-practices-for-producers "Direct link to Best practices for producers")

Producers own the creation, rollout, communication, and deprecation of model versions. The following steps go over what producers should do when introducing a new version of a model.
* [Step 1: Decide when a change needs a new version](#step-1-decide-when-a-change-needs-a-new-version)
* [Step 2: Create the new version safely](#step-2-create-the-new-version-safely)
* [Step 3: Add a deprecation date](#step-3-add-a-deprecation-date)
* [Step 4: Communicate the new version](#step-4-communicate-the-new-version)
* [Step 5: Remove the old version](#step-5-remove-the-old-version)
* [Step 6: Clean up deprecated versions](#step-6-clean-up-deprecated-versions)

###### Step 1: Decide when a change needs a new version[​](#step-1-decide-when-a-change-needs-a-new-version "Direct link to Step 1: Decide when a change needs a new version")

When creating the original version of a model, use [model contracts](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md) to ensure that breaking changes produce errors during development. The model contract ensures you, as a producer, are not changing the shape or data types of the output model. If a change breaks the contract, like removing a column or changing a column's type, you should create a new model contract, and thus a new model version.

Here are some examples of breaking changes that might need a new version:

* Removing a column
* Renaming a column
* Changing a column type

Here are some examples of non-breaking changes:

* Adding a new column
* Fixing a bug in an existing column

Here are examples of changes that might be breaking depending on your business logic:

* Changing the logic behind a metric
* Changing granularity
* Modifying filters
* Rewriting `CASE` statements

###### Step 2: Create the new version safely[​](#step-2-create-the-new-version-safely "Direct link to Step 2: Create the new version safely")

After deciding that a change needs a new [version](https://docs.getdbt.com/reference/resource-properties/versions.md), follow these steps to create the new version without disrupting existing workflows. Let's say you're removing a column:

1. Create a new version of the model file.
   For example, `fishtown_analytics_orders_v2.sql`. Each version of a model must have its own SQL file.

2. Keep the default version stable. In the model's `properties.yml` file:

   * Set [`versions`](https://docs.getdbt.com/reference/resource-properties/versions.md) to include the old version and the new version: `- v: 1` and `- v: 2` respectively.
   * Set [`latest_version:`](https://docs.getdbt.com/reference/resource-properties/latest_version.md) to `latest_version: 1`. This ensures that downstream consumers using `ref(...)` won't accidentally start using v2. Without setting this, the default will be the highest numerical version, which could be a breaking change for consumers.

3. Set an [alias](https://docs.getdbt.com/reference/resource-configs/alias.md) or create a view over the latest model version.

   By aliasing or creating a view over the latest model version, you ensure `fishtown_analytics_orders` (without the version suffix) always exists as an object in the warehouse, pointing to the latest version. This also protects external tools and BI dashboards.

###### Step 3: Add a deprecation date[​](#step-3-add-a-deprecation-date "Direct link to Step 3: Add a deprecation date")

1. In the model's `properties.yml` file, set a [`deprecation_date`](https://docs.getdbt.com/reference/resource-properties/deprecation_date.md) for the model's old version.

   The `deprecation_date` is a date in the future that signifies when the old version will be removed. This notifies downstream consumers, and will appear in the `dbt run` logs as a warning that the old version is nearing deprecation and consumers will need to [migrate](#best-practices-for-consumers) to the new version.

info

If your model has an [enforced contract](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md), you cannot delete the model until after the `deprecation_date` has passed. dbt doesn't allow deleting models with enforced contracts before their `deprecation_date`, to protect downstream consumers.
If you try to delete a versioned model before its `deprecation_date`, dbt will raise an error during development runs and cause jobs to fail.

models/properties.yml

```yaml
models:
  - name: fishtown_analytics_orders
    latest_version: 1
    columns:
      - name: column_to_remove
      - name: column_to_keep
    versions:
      - v: 1 # old version — uses all top-level columns
        deprecation_date: "2025-12-31"
      - v: 2 # new version
        columns:
          - include: all
            exclude: [column_to_remove] # <— specify which columns were removed in v2
```

2. Merge the new version into the main branch.
3. Run the job to build the new version.
4. Verify that the new version builds successfully.
5. Verify that the deprecation date is set correctly in the `dbt run` logs.

If you try to reference models (for example, `{{ ref('upstream_project', 'model_name', v=1) }}`) using the `v=1` argument after the deprecation date, the `ref` call will fail once the producer project removes the `v1` version.

###### Step 4: Communicate the new version[​](#step-4-communicate-the-new-version "Direct link to Step 4: Communicate the new version")

After creating a new version and setting a deprecation date for the old version, communicate the new version to downstream consumers. Let them know that:

* A new version is available and a deprecation timeline exists.
* Consumers can test the new version and [migrate](#best-practices-for-consumers) over.
* To test the new version, consumers can use `v=2` when referencing the model. For example, `{{ ref('upstream_project', 'model_name', v=2) }}`.
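A downstream consumer testing the new version might write something like the following sketch (the model name `downstream_orders_report` and the selected columns are hypothetical):

```sql
-- models/marts/downstream_orders_report.sql (hypothetical consumer model)
select
    order_id,
    column_to_keep
from
    -- pin to v=2 to test the new version before it becomes the default
    {{ ref('upstream_project', 'fishtown_analytics_orders', v=2) }}
```

Building this model against v2 during the deprecation window surfaces any breakage before the default `ref` changes over.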
###### Step 5: Remove the old version[​](#step-5-remove-the-old-version "Direct link to Step 5: Remove the old version")

Once consumers confirm they've tested and migrated over to the new version, you can set the new version as the latest version:

models/properties.yml

```yaml
models:
  - name: fishtown_analytics_orders
    latest_version: 2 # update from 1 to 2 to set the new version as the latest version
    versions:
      - v: 1 # this represents the old version
      - v: 2 # this represents the new version
```

This updates the default `ref` to the new version. For example, `{{ ref('upstream_project', 'fishtown_analytics_orders') }}` will now resolve to the `fishtown_analytics_orders_v2` model in the `upstream_project`. If consumers want to use the old version, they can use `v=1` when referencing the model: `{{ ref('upstream_project', 'fishtown_analytics_orders', v=1) }}`.

###### Step 6: Clean up deprecated versions[​](#step-6-clean-up-deprecated-versions "Direct link to Step 6: Clean up deprecated versions")

After all consumers have [migrated](#best-practices-for-consumers) to the new version, you can clean up the deprecated version. You could choose to "hard delete" all old versions, or "soft delete" them for continuity.

info

As noted in Step 3, if your model has an [enforced contract](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md), you cannot delete it until after its `deprecation_date` has passed; attempting to do so will raise an error during development runs and cause jobs to fail.

- Hard delete (cleanest)
- Soft delete (retains continuity)

"Hard deleting" old versions is the cleanest approach and removes all old version artifacts from your project:

1. Delete the `fishtown_analytics_orders_v1.sql` file and rename the new version's file back to `fishtown_analytics_orders.sql`.
2. Delete all version specifications from your `.yml` file.
3. Drop the `fishtown_analytics_orders_v1` object from your warehouse with a manual script or a cleanup macro.

"Soft deleting" old versions retains all old version artifacts, which avoids confusion if more model versions get introduced in the future, and preserves continuity. Bear in mind that your version control platform will also have the history of all of these changes.

1. Repoint the `fishtown_analytics_orders` alias to your latest version file (for example, `fishtown_analytics_orders_v2`), or create a view on top of the latest model version.
2. Use the `enabled` [config option](https://docs.getdbt.com/reference/resource-configs/enabled.md) to disable the deprecated model version so that it doesn't run in dbt jobs and can't be referenced in a cross-project ref. For example:

models/properties.yml

```yaml
models:
  - name: fishtown_analytics_orders
    latest_version: 1
    columns:
      - name: column_to_remove
      - name: column_to_keep
    versions:
      - v: 1 # old version — uses all top-level columns
        deprecation_date: "2025-12-31"
        config:
          enabled: false # disable deprecated version so it no longer runs
      - v: 2 # new version
        columns:
          - include: all
            exclude: [column_to_remove] # <— specify which columns were removed in v2
```

3. Drop the `fishtown_analytics_orders_v1` object from your warehouse with a manual script or a cleanup macro.

... and that's it! You should now have a new version of the model and a deprecated version. The next section covers how consumers evaluate and migrate to the new version.

#### Best practices for consumers[​](#best-practices-for-consumers "Direct link to Best practices for consumers")

Consumers rely on upstream models and need to make sure that version transitions don't introduce unintended breakages. Refer to the following steps to migrate to the new version:

1. Begin writing a cross-project reference to use a public model from a different project.
In this case, `{{ ref('upstream_project', 'fishtown_analytics_orders') }}`.
2. Once you see deprecation warnings, test the latest version of a model by explicitly referencing it in your `ref`. For example, `{{ ref('upstream_project', 'fishtown_analytics_orders', v=2) }}`. Check whether it's a breaking change for you or has any unintended impacts on your project.
   * If it does, consider explicitly "pinning" to the current, working version of the model before the new version becomes the default: `{{ ref('upstream_project', 'fishtown_analytics_orders', v=1) }}`. Bear in mind that you will need to migrate at some point before the deprecation date.
3. Before the deprecation date, migrate to the new version of the model by removing the version specification in your cross-project reference: `{{ ref('upstream_project', 'fishtown_analytics_orders') }}`. Make any downstream logic changes needed to accommodate the new version.

Consumers should plan migrations to align with their own team's release cycles.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Quickstart with Mesh](https://docs.getdbt.com/guides/mesh-qs.md)

---

### dbt Mesh FAQs

Mesh is a new architecture enabled by dbt. It allows you to better manage complexity by deploying multiple interconnected dbt projects instead of a single large, monolithic project. It's designed to accelerate development without compromising governance.

#### Overview of Mesh[​](#overview-of-mesh "Direct link to Overview of Mesh")

**What are the main benefits of implementing dbt Mesh?**
Here are some benefits of implementing dbt Mesh:

* **Ship data products faster**: With a more modular architecture, teams can make changes rapidly and independently in specific areas without impacting the entire system, leading to faster development cycles.
* **Improve trust in data**: Adopting dbt Mesh helps ensure that changes in one domain's data models do not unexpectedly break dependencies in other domain areas, leading to a more secure and predictable data environment.
* **Reduce complexity**: By organizing transformation logic into distinct domains, dbt Mesh reduces the complexity inherent in large, monolithic projects, making them easier to manage and understand.
* **Improve collaboration**: Teams are able to share and build upon each other's work without duplicating efforts.

Most importantly, all this can be accomplished without the central data team losing the ability to see lineage across the entire organization, or compromising on governance mechanisms.

**What are model contracts?**

dbt [model contracts](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md) serve as a governance tool enabling the definition and enforcement of data structure standards in your dbt models. They allow you to specify and uphold data model guarantees, including column data types, allowing for the stability of dependent models. Should a model fail to adhere to its established contracts, it will not build successfully.

**What are model versions?**

dbt [model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md) are iterations of your dbt models made over time. In many cases, you might knowingly choose to change a model's structure in a way that "breaks" the previous model contract, and may break downstream queries depending on that model's structure. When you do so, creating a new version of the model is useful to signify this change. You can use model versions to:

* Test "prerelease" changes (in production, in downstream systems).
* Bump the latest version, to be used as the canonical "source of truth."
* Offer a migration window off the "old" version.

**What are model access modifiers?**

A [model access modifier](https://docs.getdbt.com/docs/mesh/govern/model-access.md) in dbt determines if a model is accessible as an input to other dbt models and projects. It specifies where a model can be referenced using [the `ref` function](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md). There are three types of access modifiers:

* **Private:** A model with a private access modifier is only referenceable by models within the same group. This is intended for models that are implementation details and are meant to be used only within a specific group of related models.
* **Protected:** Models with a protected access modifier can be referenced by any other model within the same dbt project or when the project is installed as a package. This is the default setting for all models, ensuring backward compatibility, especially when groups are assigned to an existing set of models.
* **Public:** A public model can be referenced across different groups, packages, or projects. This is suitable for stable and mature models that serve as interfaces for other teams or projects.

**What are model groups?**

A [model group](https://docs.getdbt.com/docs/mesh/govern/model-access.md#groups) in dbt is a concept used to organize models under a common category or ownership. This categorization can be based on various criteria, such as the team responsible for the models or the specific data source they model.

**What are some potential challenges when using dbt Mesh?**

This is a new way of working, and the intentionality required to build, and then maintain, cross-project interfaces and dependencies may feel like a slowdown versus what some developers are used to.
The intentional friction introduced promotes thoughtful changes, fostering a mindset that values stability and systematic adjustments over rapid transformations. Orchestration across multiple projects is also likely to be slightly more challenging for many organizations, although we're currently developing new functionality that will make this process simpler.

**How does this relate to the concept of data mesh?**

dbt Mesh allows you to better *operationalize* data mesh by enabling decentralized, domain-specific data ownership and collaboration. In data mesh, each business domain is responsible for its data as a product. This is the same goal that dbt Mesh facilitates by enabling organizations to break down large, monolithic data projects into smaller, domain-specific dbt projects. Each team or domain can independently develop, maintain, and share its data models, fostering a decentralized data environment.

dbt Mesh also enhances the interoperability and reusability of data across different domains, a key aspect of the data mesh philosophy. By allowing cross-project references and shared governance through model contracts and access controls, dbt Mesh ensures that while data ownership is decentralized, there is still a governed structure to the overall data architecture.

#### How dbt Mesh works[​](#how-dbt-mesh-works "Direct link to How dbt Mesh works")

**Can dbt Mesh handle cyclic dependencies between projects?**

You can enable bidirectional dependencies across projects so these relationships can go in either direction, meaning that the `jaffle_finance` project can add a new model that depends on any public models produced by the `jaffle_marketing` project, so long as the new dependency doesn't introduce any node-level cycles. dbt checks for cycles across projects and raises errors if any are detected. When setting up projects that depend on each other, it's important to do so in a stepwise fashion.
Each project must run and produce public models before the original producer project can take a dependency on the original consumer project. For example, the order of operations would be as follows for a simple two-project setup:

1. The `project_a` project runs in a deployment environment and produces public models.
2. The `project_b` project adds `project_a` as a dependency.
3. The `project_b` project runs in a deployment environment and produces public models.
4. The `project_a` project adds `project_b` as a dependency.

**Is it possible for multiple projects to directly reference a shared source?**

While it's not currently possible to share sources across projects, you could have a shared foundational project, with staging models on top of those sources, exposed as "public" models to other teams/projects.

**What if a model I've already built on from another project later becomes protected?**

This would be a breaking change for downstream consumers of that model. If the maintainers of the upstream project wish to remove the model (or "downgrade" its access modifier, effectively the same thing), they should mark that model for deprecation (using [deprecation\_date](https://docs.getdbt.com/reference/resource-properties/deprecation_date.md)), which will deliver a warning to all downstream consumers of that model.

In the future, we plan for dbt to also be able to proactively flag this scenario in [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md) for the maintainers of the upstream public model.

**If I run `dbt build --select +model`, will this trigger a run of upstream models in other projects?**

No, unless upstream projects are installed as [packages](https://docs.getdbt.com/docs/build/packages.md) (source code). In that case, the models in the project installed as a package become "your" models, and you can select or run them.
There are cases in which this can be desirable; see docs on [project dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md).

**If each project/domain has its own data warehouse, is it still possible to build models across them?**

Yes, as long as they're in the same data platform (BigQuery, Databricks, Redshift, Snowflake, etc.) and you have configured permissions and sharing in that data platform provider to allow this.

**Can I run tests that involve tables from multiple different projects?**

Yes. Because cross-project collaboration is done using the `{{ ref() }}` macro, you can use those models from other teams in [singular tests](https://docs.getdbt.com/docs/build/data-tests.md#singular-data-tests).

**Which team's data schema would dbt Mesh create?**

Each team defines their connection to the data warehouse, and the default schema names for dbt to use when materializing datasets. By default, each project belonging to a team will create:

* One schema for production runs (for example, `finance`).
* One schema per developer (for example, `dev_jerco`).

Depending on each team's needs, this can be customized with model-level [schema configurations](https://docs.getdbt.com/docs/build/custom-schemas.md), including the ability to define different rules by environment.

**Is it possible to apply model contracts to source data?**

No, contracts can only be applied at the [model level](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md). It is a recommended best practice to [define staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md) on top of sources, and it is possible to define contracts on top of those staging models.

**Can contracts be partially enforced?**

No. A contract applies to an entire model, including all columns in the model's output. This is the same set of columns that a consumer would see when viewing the model's details in Explorer, or when querying the model in the data platform.
* If you wish to contract only a subset of columns, you can create a separate model (materialized as a view) selecting only that subset.
* If you wish to limit which rows or columns a downstream consumer can see when they query the model's data, depending on who they are, some data platforms offer advanced capabilities around dynamic row-level access and column-level data masking.

**Can I have multiple owners in a group?**

No, a [group](https://docs.getdbt.com/docs/mesh/govern/model-access.md#groups) can only be assigned to a single owner. However, the assigned owner can be a *team*, rather than an individual.

**Can contracts be assigned individual owners?**

Not directly, but contracts are [assigned to models](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md) and models can be assigned to individual owners. You can use meta fields for this purpose.

**Can I make a model "public" only for specific team(s) to use?**

This is not currently possible, but it's something we hope to enable in the near future. If you're interested in this functionality, please reach out to your dbt Labs account team.

**Is it possible to orchestrate job runs across multiple different projects?**

dbt will soon offer the capability to trigger jobs on the completion of another job, including a job in a different project. This offers one mechanism for executing a pipeline from start to finish across projects.

**Are integrations available between the dbt Discovery API and other tools for cross-project lineage?**

Yes. In addition to being viewable natively through [Catalog](https://www.getdbt.com/product/dbt-explorer), it is possible to view cross-project lineage using partner integrations with data cataloging tools. For a list of available dbt integrations, refer to the [Integrations page](https://www.getdbt.com/product/integrations).

**How does data restatement work in dbt Mesh, particularly when fixing a data set bug?**
Tests and model contracts in dbt help eliminate the need to restate data in the first place. With these tools, you can incorporate checks at the source and output layers of your dbt projects to assess data quality in the most critical places. When there are changes in transformation logic (for example, the definition of a particular column is changed), restating the data is as easy as merging the updated code and running a dbt job. If a data quality issue does slip through, you also have the option of simply rolling back the git commit, and then re-running the dbt job with the old code.

**Does dbt handle job run logs, and can it feed them to standard monitoring tools and reports?**

Yes, all of this metadata is accessible via the [dbt Admin API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md). This metadata can be fed into a monitoring tool, or used to create reports and dashboards. We also expose some of this information in dbt itself in [jobs](https://docs.getdbt.com/docs/deploy/jobs.md), [environments](https://docs.getdbt.com/docs/environments-in-dbt.md), and in [Catalog](https://www.getdbt.com/product/dbt-explorer).

**Can dbt Mesh reference models in other accounts within the same data platform?**

You can reference models in other accounts within the same data platform by leveraging the data-sharing capabilities of that platform, as long as the database identifier of the public model is consistent across the producer and consumer. For example, [Snowflake cross-account data shares](https://docs.snowflake.com/en/user-guide/data-sharing-intro), [Databricks Unity Catalog across workspaces](https://docs.databricks.com/en/data-governance/unity-catalog/index.html), or multiple BigQuery projects.

#### Permissions and access[​](#permissions-and-access "Direct link to Permissions and access")

**How do user access permissions work in dbt Mesh?**
The existence of projects that have at least one public model will be visible to everyone in the organization with [read-only access](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md). Private or protected models require a user to have read-only access to the specific project to see its existence.

**How do all the different types of "access" interact?**

There's model-level access within dbt, role-based access for users and groups in dbt, and access to the underlying data within the data platform.

First things first: access to underlying data is always defined and enforced by the underlying data platform (for example, BigQuery, Databricks, Redshift, Snowflake, Starburst, etc.). This access is managed by executing "DCL statements" (namely `grant`). dbt makes it easy to [configure `grants` on models](https://docs.getdbt.com/reference/resource-configs/grants.md), which provision data access for other roles/users/groups in the data warehouse. However, dbt does *not* automatically define or coordinate those grants unless they are configured explicitly. Refer to your organization's system for managing data warehouse permissions.

[dbt Enterprise and Enterprise+ plans](https://www.getdbt.com/pricing) support [role-based access control (RBAC)](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control-) that manages granular permissions for users and user groups. You can control which users can see or edit all aspects of a dbt project. A user's access to dbt projects also determines whether they can "explore" that project in detail. Roles, users, and groups are defined within the dbt application via the UI or by integrating with an identity provider.

[Model access](https://docs.getdbt.com/docs/mesh/govern/model-access.md) defines where models can be referenced. It also informs the discoverability of those projects within Catalog.
Model `access` is defined in code, just like any other model configuration (`materialized`, `tags`, etc.).

* **Public:** Models with `public` access can be referenced everywhere. These are the "data products" of your organization.
* **Protected:** Models with `protected` access can only be referenced within the same project. This is the default level of model access. We are discussing a future extension to `protected` models to allow for their reference in *specific* downstream projects. Please read [the GitHub issue](https://github.com/dbt-labs/dbt-core/issues/9340), and upvote/comment if you're interested in this use case.
* **Private:** Model `groups` enable more granular control over where `private` models can be referenced. By defining a group, and configuring models to belong to that group, you can restrict other models (not in the same group) from referencing any `private` models the group contains. Groups also provide a standard mechanism for defining the `owner` of all the resources they contain.

Within Catalog, `public` models are discoverable for every user in the dbt account — every public model is listed in the "multi-project" view. By contrast, `protected` and `private` models in a project are visible only to users who have access to that project (including read-only access).

Because dbt does not implicitly coordinate data warehouse `grants` with model-level `access`, a mismatch between them is possible. For example, a `public` model's metadata is viewable to all dbt users and anyone can write a `ref` to that model, but when they actually run or preview it, they may find that they do not have access to the underlying data in the data warehouse. **This is intentional.** In this way, your organization can retain least-privileged access to underlying data, while providing visibility and discoverability for the wider organization.
Armed with the knowledge of which other "data products" (public models) exist — their descriptions, their ownership, which columns they contain — an analyst on another team can prepare a well-informed request for access to the underlying data.

**Is it possible to request access permissions from other teams within dbt?**

Not currently! But this is something we may evaluate in the future.

**As a central data team member, can I still maintain visibility on the entire organizational DAG?**

Yes! As long as a user has permissions (at least read-only access) on all projects in a dbt account, they can navigate across the entirety of the organization's DAG in Catalog, and see models at all levels of detail.

**How can I limit my developers from accessing sensitive production data when referencing from other projects?**

By default, cross-project references resolve to the "Production" deployment environment of the upstream project. If your organization has genuinely different data in production versus non-production environments, this poses an issue. For this reason, we rolled out a canonical type of deployment environment: "[Staging](https://docs.getdbt.com/docs/deploy/deploy-environments.md#staging-environment)." If a project defines both a "Production" environment and a "Staging" environment, then cross-project references from development and "Staging" environments will resolve to "Staging," whereas only references coming from "Production" environments will resolve to "Production." In this way, you are guaranteed separation of data environments, without needing to duplicate project configurations.

**Does dbt Mesh work if projects are 'duplicated' (dev project <> prod project)?**

The short answer is "no." Cross-project references require that each project `name` be unique in your dbt account. Historical limitations required customers to "duplicate" projects so that one actual dbt project (codebase) would map to more than one dbt project.
To that end, we are working to remove the historical limitations that required customers to "duplicate" projects in dbt — Staging environments for data isolation, environment-level permissions, and environment-level data warehouse connections (coming soon). Once those pieces are in place, it should no longer be necessary to define separate dbt projects to isolate data environments or permissions.

#### Compatibility with other features[​](#compatibility-with-other-features "Direct link to Compatibility with other features")

**How does the dbt Semantic Layer relate to and work with dbt Mesh?**

The [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) and dbt Mesh are complementary mechanisms enabled by dbt that work together to enhance the management, usability, and governance of data in large-scale data environments.

The Semantic Layer in dbt allows teams to centrally define business metrics and dimensions. It ensures consistent and reliable metric definitions across various analytics tools and platforms.

Mesh enables organizations to split their data architecture into multiple domain-specific projects, while retaining the ability to reference "public" models across projects. It is also possible to reference a "public" model from another project for the purpose of defining semantic models and metrics. Your organization can have multiple dbt projects feed into a unified semantic layer, ensuring that metrics and dimensions are consistently defined and understood across these domains.

When using the dbt Semantic Layer in a [dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) setting, we recommend the following:

* You have one standalone project that contains your semantic models and metrics.
* Then, as you build your Semantic Layer, you can [cross-reference dbt models](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md) across your various projects or packages to create your semantic models using the [two-argument `ref` function](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) (`ref('project_name', 'model_name')`).
* Your dbt Semantic Layer project serves as a global source of truth across the rest of your projects.

###### Usage example[​](#usage-example "Direct link to Usage example")

For example, let's say you have a public model (`fct_orders`) that lives in the `jaffle_finance` project. As you build your semantic model, set its `model` parameter with the two-argument `ref` function to reference the public model `fct_orders` defined in the `jaffle_finance` project: `model: ref('jaffle_finance', 'fct_orders')`.
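A minimal sketch of such a semantic model definition follows. The file path, semantic model name, entity, and measure are illustrative; only the two-argument `ref` to `fct_orders` in the `jaffle_finance` project comes from the example above:

```yaml
# models/marts/sem_orders.yml (hypothetical file)
semantic_models:
  - name: orders
    description: "Semantic model built on the public fct_orders model."
    # Two-argument ref resolves the public model from the jaffle_finance project
    model: ref('jaffle_finance', 'fct_orders')
    entities:
      - name: order_id
        type: primary
    measures:
      - name: order_count
        agg: count
        expr: order_id
```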
**How does dbt Catalog relate to and work with dbt Mesh?**

**[Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md)** is a tool within dbt that serves as a knowledge base and lineage visualization platform. It provides a comprehensive view of your dbt assets, including models, tests, sources, and their interdependencies. Used in conjunction with dbt Mesh, Catalog becomes a powerful tool for visualizing and understanding the relationships and dependencies between models across multiple dbt projects.

**How does the dbt CLI relate to and work with dbt Mesh?**

The [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) allows users to develop and run dbt commands from their preferred development environments, like VS Code, Sublime Text, or terminal interfaces. This flexibility is particularly beneficial in a dbt Mesh setup, where managing multiple projects can be complex. Developers can work in their preferred tools while leveraging the centralized capabilities of dbt.

**Can I upgrade Mesh projects to Fusion incrementally?**

Yes! You can upgrade select projects to the dbt Fusion engine while keeping others on dbt Core.

* Fusion projects can reference public models from dbt Core projects
* dbt Core projects can reference public models from Fusion projects

This works because dbt Mesh uses a publication artifact (not the manifest) to resolve cross-project references, and this artifact is identical between dbt Core and Fusion. You can upgrade dbt Mesh projects to Fusion in any order, and there's no requirement to start with upstream or downstream projects first.

Feature optimization

While basic Mesh functionality works in hybrid setups, some advanced platform features (like full Catalog lineage visibility across projects) work best when all projects use the same engine.

#### Availability[​](#availability "Direct link to Availability")

**Does dbt Mesh require me to be on a specific version of dbt?**
Yes, your account must be on [at least dbt v1.6](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) to take advantage of [cross-project dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md), one of the most crucial underlying capabilities required to implement a dbt Mesh.

**Is there a way to leverage dbt Mesh capabilities in dbt Core?**

While dbt Core defines several of the foundational elements for dbt Mesh, dbt offers an enhanced experience that leverages these elements for scaled collaboration across multiple teams, facilitated by multi-project discovery in Catalog that's tailored to each user's access.

Several key components that underpin the dbt Mesh pattern, including [model contracts, versions, and access modifiers](https://docs.getdbt.com/docs/mesh/govern/about-model-governance.md), are defined and implemented in dbt Core. We believe these are components of the core language, which is why their implementations are open source. We want to define a standard pattern that analytics engineers everywhere can adopt, extend, and help us improve.

To reference models defined in another project, users can also leverage [packages](https://docs.getdbt.com/docs/build/packages.md), a longstanding feature of dbt Core. By importing an upstream project as a package, dbt will import all models defined in that project, which enables the resolution of cross-project references to those models. They can be [optionally restricted](https://docs.getdbt.com/docs/mesh/govern/model-access.md#how-do-i-restrict-access-to-models-defined-in-a-package) to just the models with `public` access.

The major distinction comes with dbt's metadata service, which is unique to the dbt platform and allows for the resolution of references to only the public models in a project.
This service enables users to take dependencies on upstream projects, and reference just their `public` models, *without* needing to load the full complexity of those upstream projects into their local development environment.

**Does dbt Mesh require a specific dbt plan?**

Yes, a [dbt Enterprise-tier](https://www.getdbt.com/pricing) plan is required to set up multiple projects and reference models across them. Refer to [model governance](https://docs.getdbt.com/docs/mesh/govern/about-model-governance.md) for more information on the features available across dbt plans.

#### Tips on implementing dbt Mesh[​](#tips-on-implementing-dbt-mesh "Direct link to Tips on implementing dbt Mesh")

**Is there a recommended migration or implementation process?**

Refer to our developer guide on [How we structure our dbt Mesh projects](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md). You can also learn how to implement dbt Mesh by following our [Quickstart dbt Mesh](https://docs.getdbt.com/guides/mesh-qs.md) guide.

**My team isn't structured to require multiple projects today. What aspects of dbt Mesh are relevant to me?**

Let's say your organization has fewer than 500 models and fewer than a dozen regular contributors to dbt. You're operating at a scale well served by the monolith (a single project), and the larger pattern of dbt Mesh probably won't provide any immediate benefits.

It's never too early to think about how you're organizing models *within* that project. Use model `groups` to define clear ownership boundaries and `private` access to restrict purpose-built models from becoming load-bearing blocks in an unrelated section of the DAG. Your future selves will thank you for defining these interfaces, especially if you reach a scale where it makes sense to "graduate" the interfaces between `groups` into boundaries between projects.
---

### Deciding how to structure your dbt Mesh

#### Exploring mesh patterns[​](#exploring-mesh-patterns "Direct link to Exploring mesh patterns")

When adopting a multi-project architecture, where do you draw the lines between projects? How should you organize data workflows in a world where, instead of having a single dbt DAG, you have multiple projects speaking to each other, each with its own DAG?

Adopting the Mesh pattern is not a one-size-fits-all process. In fact, it's the opposite! It's about customizing your project structure to fit *your* team and *your* data. Now you can mold your organizational knowledge graph to your organizational people graph, bringing people and data closer together rather than compromising one for the other.

While there is not a single best way to implement this pattern, there are some common decision points that will be helpful for you to consider. At a high level, you'll need to decide:

* Where to draw the lines between your dbt projects -- that is, how do you determine where to split your DAG and which models go in which project?
* How to manage your code -- do you want multiple dbt projects living in the same repository (monorepo), or do you want multiple repositories with one repo per project?

tip

To help you get started, check out our [Quickstart with Mesh](https://docs.getdbt.com/guides/mesh-qs.md) or our online [Mesh course](https://learn.getdbt.com/courses/dbt-mesh) to learn more!

#### Define your project interfaces by splitting your DAG[​](#define-your-project-interfaces-by-splitting-your-dag "Direct link to Define your project interfaces by splitting your DAG")

The first (and perhaps most difficult!)
decision when migrating to a multi-project architecture is deciding where to draw the line in your DAG to define the interfaces between your projects. Let's explore some language for discussing the design of these patterns.

##### Vertical splits[​](#vertical-splits "Direct link to Vertical splits")

Vertical splits separate out layers of transformation in DAG order. Let's look at some examples.

* **Splitting up staging and mart layers** to create a more tightly controlled, shared set of components that other projects build on but can't edit.
* **Isolating earlier models for security and governance requirements.** Separating out and masking PII data so that downstream consumers can't access it is a common use case for a vertical split.
* **Protecting complex or expensive data.** Isolating large or complex models that are expensive to run keeps them safe from accidental selection, independently deployable, and easier to debug when they have issues.

![A simplified dbt DAG with a dotted line representing a vertical split.](/img/best-practices/how-we-mesh/vertical_split.png?v=2 "A simplified dbt DAG with a dotted line representing a vertical split.")

A simplified dbt DAG with a dotted line representing a vertical split.

##### Horizontal splits[​](#horizontal-splits "Direct link to Horizontal splits")

Horizontal splits separate your DAG based on source or domain. These splits are often based around the shape and size of the data and how it's used. Let's consider some possibilities for horizontal splitting.

* **Team consumption patterns.** For example, splitting out the marketing team's data flow into a separate project.
* **Data from different sources.** For example, clickstream event data and transactional ecommerce data may need to be modeled independently of each other.
* **Team workflows.** For example, if two embedded groups operate at different paces, you may want to split the projects up so they can move independently.
![A simplified dbt DAG with a dotted line representing a horizontal split.](/img/best-practices/how-we-mesh/horizontal_split.png?v=2 "A simplified dbt DAG with a dotted line representing a horizontal split.")

A simplified dbt DAG with a dotted line representing a horizontal split.

##### Combining these strategies[​](#combining-these-strategies "Direct link to Combining these strategies")

* **These are not either/or techniques.** You should consider both types of splits, and combine them in any way that makes sense for your organization.
* **Pick one type of split and focus on that first.** If you have a hub-and-spoke team topology, for example, handle breaking out the central platform project before you split the remainder into domains. Then, if you need to break those domains up horizontally, you can focus on that after the fact.
* **DRY applies to underlying data, not just code.** Regardless of your strategy, you should not be sourcing the same rows and columns into multiple nodes. When working within a mesh pattern, it becomes increasingly important that we don't duplicate logic or data.

![A simplified dbt DAG with two dotted lines representing both a vertical and horizontal split.](/img/best-practices/how-we-mesh/combined_splits.png?v=2 "A simplified dbt DAG with two dotted lines representing both a vertical and horizontal split.")

A simplified dbt DAG with two dotted lines representing both a vertical and horizontal split.

#### Determine your git strategy[​](#determine-your-git-strategy "Direct link to Determine your git strategy")

A multi-project architecture can exist in a single repo (monorepo) or as multiple projects, each in its own repository (multi-repo).

* If you're a **smaller team** looking primarily to speed up and simplify development, a **monorepo** is likely the right choice, but it can become unwieldy as the number of projects, models, and contributors grows.
* If you’re a **larger team with multiple groups**, and need to decouple projects for security and enablement of different development styles and rhythms, a **multi-repo setup** is your best bet.

#### Projects, splits, and teams[​](#projects-splits-and-teams "Direct link to Projects, splits, and teams")

Since the launch of Mesh, the most common pattern we've seen is one where projects are aligned 1:1 to teams, and each project has its own codebase in its own repository. This isn’t a hard-and-fast rule: some organizations want multiple teams working out of a single repo, and some teams own multiple domains that feel awkward to keep combined.

Users may need to contribute models across multiple projects, and this is fine. There will be some friction doing this, versus a single repo, but this is *useful* friction, especially when upstreaming a change from a “spoke” to a “hub.” This should be treated like making an API change, one that the other team will be living with for some time to come. You should be concerned if your teammates find they need to make a coordinated change across multiple projects very frequently (every week), or as a key prerequisite for ~20%+ of their work.

##### Cycle detection[​](#cycle-detection "Direct link to Cycle detection")

You can enable bidirectional dependencies across projects, meaning these relationships can go in either direction: the `jaffle_finance` project can add a new model that depends on any public models produced by the `jaffle_marketing` project, so long as the new dependency doesn't introduce any node-level cycles. dbt checks for cycles across projects and raises errors if any are detected.

When setting up projects that depend on each other, it's important to do so in a stepwise fashion. Each project must run and produce public models before the original producer project can take a dependency on the original consumer project. For example, the order of operations would be as follows for a simple two-project setup:
1. The `project_a` project runs in a deployment environment and produces public models.
2. The `project_b` project adds `project_a` as a dependency.
3. The `project_b` project runs in a deployment environment and produces public models.
4. The `project_a` project adds `project_b` as a dependency.

##### Tips and tricks[​](#tips-and-tricks "Direct link to Tips and tricks")

The [implementation](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-4-implementation.md) page provides more in-depth examples of how to split a monolithic project into multiple projects. Here are some tips to get you started when considering the splitting methods listed above on your own projects:

1. Start by drawing a diagram of your teams doing data work. Map each team to a single dbt project. If you already have an existing monolithic project and you’re onboarding *net-new teams*, this could be as simple as declaring the existing project your “hub” and creating new “spoke” sandbox projects for each team.
2. Split off common foundations when you know that multiple downstream teams will require the same data source. Those could be upstreamed into a centralized hub or split off into a separate foundational project. You may also need some splits to facilitate other splits: for example, source staging models in project A that are used in both B and C, since cycles between projects aren't allowed.
3. Split again to introduce intentional friction and encapsulate a particular set of models (for example, for external export).
4. Recombine if you have “hot path” subsets of the DAG that you need to deploy with low latency because they power in-app reporting or operational analytics. It might make sense to have a different dedicated team own these data models (see principle 1), similar to how software services with significantly different performance characteristics often warrant dedicated infrastructure, architecture, and staffing.
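As a toy illustration of the node-level cycle check described in the Cycle detection section above (this is *not* dbt's actual implementation, just a standard depth-first search over hypothetical model names):

```python
# Toy sketch: detect whether a set of cross-project model dependencies
# contains a cycle. `deps` maps each model to the models it depends on.
def has_cycle(deps):
    """Return True if the dependency graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {node: WHITE for node in deps}

    def visit(node):
        color[node] = GRAY
        for upstream in deps.get(node, []):
            state = color.get(upstream, WHITE)
            if state == GRAY:  # back edge: we looped onto our own path
                return True
            if state == WHITE and visit(upstream):
                return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in deps if color[n] == WHITE)


# jaffle_finance depending on a public marketing model is fine...
ok = {
    "finance.revenue": ["marketing.campaigns"],
    "marketing.campaigns": [],
}
# ...but a model-level loop between the two projects is rejected.
bad = {
    "finance.revenue": ["marketing.campaigns"],
    "marketing.campaigns": ["finance.revenue"],
}
print(has_cycle(ok), has_cycle(bad))  # False True
```

Note that the check operates on *nodes*, not projects: two projects may depend on each other's public models, as long as no individual chain of models loops back on itself.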
---

### Don't nest your curlies

##### Poetry[​](#poetry "Direct link to Poetry")

**Don't Nest Your Curlies**

> If dbt errors out early
>
> and your Jinja is making you surly
>
> don't post to the slack
>
> just take a step back
>
> and check if you're nesting your curlies.

##### Jinja[​](#jinja "Direct link to Jinja")

When writing Jinja code in a dbt project, it may be tempting to nest expressions inside of each other. Take this example:

```text
{{
    dbt_utils.date_spine(
        datepart="day",
        start_date=[ USE JINJA HERE ]
    )
}}
```

To nest a Jinja expression inside of another Jinja expression, simply place the desired code (without curly brackets) directly into the expression.

**Correct example**

Here, the return value of the `var()` context method is supplied as the `start_date` argument to the `date_spine` macro. Great!

```text
{{
    dbt_utils.date_spine(
        datepart="day",
        start_date=var('start_date')
    )
}}
```

**Incorrect example**

Once we've denoted that we're inside a Jinja expression (using the `{{` syntax), no further curly brackets are required inside of the Jinja expression. This code will supply a literal string value, `"{{ var('start_date') }}"`, as the `start_date` argument to the `date_spine` macro. This is probably not what you actually want to do!

```text
-- Do not do this! It will not work!
{{
    dbt_utils.date_spine(
        datepart="day",
        start_date="{{ var('start_date') }}"
    )
}}
```

Here's another example:

```sql
{# Either of these work #}
{% set query_sql = 'select * from ' ~ ref('my_model') %}

{% set query_sql %}
    select * from {{ ref('my_model') }}
{% endset %}

{# This does not #}
{% set query_sql = "select * from {{ ref('my_model') }}" %}
```

##### An exception[​](#an-exception "Direct link to An exception")

There is one exception to this rule: curlies inside of curlies are acceptable in hooks (i.e. `on-run-start`, `on-run-end`, `pre-hook`, and `post-hook`). Code like this is both valid and encouraged:

```text
{{ config(post_hook="grant select on {{ this }} to role bi_role") }}
```

So why are curlies inside of curlies allowed in this case? Here, we actually *want* the string literal `"grant select on {{ this }} ..."` to be saved as the configuration value for the post-hook in this model. This string will be re-rendered when the model runs, resulting in a sensible SQL expression like `grant select on "schema"."table"...` being executed against the database. These hooks are a special exception to the rule stated above.

---

### Examining our builds

#### Examining our builds[​](#examining-our-builds "Direct link to Examining our builds")

* ⌚ dbt keeps track of how **long each model took to build**, when it started, when it finished, its completion status (error, warn, or success), its materialization type, and *much* more.
* 🖼️ This information is stored in a couple of files which dbt calls **artifacts**.
* 📊 Artifacts contain a ton of information in JSON format, so they aren’t easy to read, but **dbt** packages the most useful bits of information into a tidy **visualization** for you.
* ☁️ If you’re not using Cloud, we can still use the output of the **dbt Core CLI to understand our runs**.

##### Model timing[​](#model-timing "Direct link to Model timing")

That’s where dbt’s Model Timing visualization comes in extremely handy. If we’ve set up a [Job](https://docs.getdbt.com/guides/bigquery.md) in dbt to run our models, we can use the Model Timing tab to pinpoint our longest-running models.

![Model Timing diagram](/assets/images/model-timing-diagram-2e9607735bba729089c9f5d780aca8bf.png)

* 🧵 This view lets us see our **models mapped out in threads** (up to 64 threads; we’re currently running with 4, so we get 4 tracks) over time. You can think of **each thread as a lane on a highway**.
* ⌛ We can see above that `stg_order_items` and `order_items` are **taking the most time**, so we may want to go ahead and **make those incremental**.
* 1️⃣ If a job has a single dbt invocation (for example `dbt build`), the model timing chart reflects the timing of all models.
* 🔢 If a job includes multiple dbt commands (for example, `dbt build` followed by `dbt compile`), the model timing chart reflects only the models from the final command (`dbt compile`). For models executed in both commands, the chart displays the timing from the last invocation. Models that were not re-invoked in the final command retain their timing from the earlier command (`dbt build`).

If you aren’t using Cloud, that’s okay! We don’t get a fancy visualization out of the box, but we can use the output from the dbt Core CLI to check our model times, and it’s a great opportunity to become familiar with that output.

##### dbt Core CLI output[​](#dbt-core-cli-output "Direct link to dbt Core CLI output")

If you’ve ever run dbt, whether `build`, `test`, `run` or something else, you’ve seen some output like below.
Let’s take a closer look at how to read this.

![CLI output from a dbt build command](/assets/images/dbt-build-output-a00c7bf04a1e0b13c2b797ca5fcb4676.png)

* There are two entries per model: the **start** of a model’s build and the **completion**, which includes **how long** the model took to run. The **type** of model is included as well. For example:

```shell
20:24:51  5 of 10 START sql view model main.stg_products ......... [RUN]
20:24:51  5 of 10 OK created sql view model main.stg_products .... [OK in 0.13s]
```

* 5️⃣  On **both rows** we can see that our `stg_products` model is the 5th of 10 objects being built, the timestamp it started at, that it was defined in SQL (as opposed to Python), and that it was a view.
* 🆕  On the **first row** we can see the timestamp of when the model **started**.
* ✅  On the **second row** — which does *not* necessarily come right after, since thanks to threads other models can be starting and finishing as this model runs — we see the **completion** entry, which adds the **status**, in this case `OK`, and the **time to build**, a lightning-fast 0.13s. That’s not unexpected considering what we know about views.
* 🏎️  **Views should typically take less than a second or two;** it’s tables and incremental models you’ll want to keep a closer eye on with these tools.

##### dbt Artifacts package[​](#dbt-artifacts-package "Direct link to dbt Artifacts package")

* 🎨  Lastly, when it comes to examining your dbt runs, you’re **not stuck without fancy visuals** if you’re using dbt Core. It’s not set up out-of-the-box, but if you want to introspect your project more deeply, you can use the [dbt Artifacts package](https://github.com/brooklyn-data/dbt_artifacts).
* 👩‍🎨  This provides models you can **visualize for every aspect of your project** at a very granular level.
* ⌚  You can use it to **create your own model timing visualization** in your BI tool, and any other reports you need to keep an eye on your materialization strategy.
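If you want to analyze a run's worth of this CLI output programmatically, the completion lines shown above are straightforward to parse. A small sketch; the regex is tuned to the example format shown in this guide and is an assumption, not an official log contract (dbt's structured JSON logs are the stable interface for this):

```python
import re

# Matches completion lines like:
#   20:24:51  5 of 10 OK created sql view model main.stg_products .... [OK in 0.13s]
LINE = re.compile(
    r"(?P<ts>\d{2}:\d{2}:\d{2})\s+\d+ of \d+ (?P<status>OK|ERROR|WARN)"
    r".*?model (?P<name>[\w.]+)\s+.*?\[(?:OK|ERROR|WARN) in (?P<secs>[\d.]+)s\]"
)


def parse_completion(line):
    """Return (model_name, status, seconds) for a completion line, else None."""
    m = LINE.search(line)
    if m is None:  # e.g. START lines carry no timing, so they don't match
        return None
    return (m.group("name"), m.group("status"), float(m.group("secs")))


line = "20:24:51  5 of 10 OK created sql view model main.stg_products .... [OK in 0.13s]"
print(parse_completion(line))  # ('main.stg_products', 'OK', 0.13)
```

Feeding every line of a run's output through a parser like this gives you a quick table of model build times to sort and eyeball, even without the Model Timing visualization.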
---

### How we structure our dbt projects

#### Why does structure matter?[​](#why-does-structure-matter "Direct link to Why does structure matter?")

Analytics engineering, at its core, is about helping groups of human beings collaborate on better decisions at scale. We have [limited bandwidth for making decisions](https://en.wikipedia.org/wiki/Decision_fatigue). We also, as a cooperative social species, rely on [systems and patterns to optimize collaboration](https://en.wikipedia.org/wiki/Pattern_language) with others.

This combination of traits means that for collaborative projects it's crucial to establish consistent and comprehensible norms, so that your team’s limited bandwidth for decision making can be spent on unique and difficult problems, not on deciding where folders should go or how to name files.

Building a great dbt project is an inherently collaborative endeavor, bringing together domain knowledge from every department to map the goals and narratives of the entire company. As such, it's especially important to establish a deep and broad set of patterns to ensure that as many people as possible are empowered to leverage their particular expertise in a positive way, and that the project remains approachable and maintainable as your organization scales.
Famously, Steve Jobs [wore the same outfit every day](https://images.squarespace-cdn.com/content/v1/5453c539e4b02ab5398ffc8f/1580381503218-E56FQDNFL1P4OBLQWHWW/ke17ZwdGBToddI8pDm48kJKedFpub2aPqa33K4gNUDwUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcxb5ZTIyC_D49_DDQq2Sj8YVGtM7O1i4h5tvKa2lazN4nGUQWMS_WcPM-ztWbVr-c/steve_jobs_outfit.jpg) to reduce decision fatigue. You can think of this guide similarly, as a black turtleneck and New Balance sneakers for your company’s dbt project. A dbt project’s power outfit, or more accurately its structure, is composed not of fabric but of files, folders, naming conventions, and programming patterns. How you label things, group them, split them up, or bring them together — the system you use to organize the [data transformations](https://www.getdbt.com/analytics-engineering/transformation/) encoded in your dbt project — this is your project’s structure.

This guide is just a starting point. You may decide that you prefer Birkenstocks or a purple hoodie for your project over Jobs-ian minimalism. That's fine. What's important is that you think through the reasoning for those changes in your organization, explicitly declare them in a thorough, accessible way for all contributors, and above all *stay consistent*.

One foundational principle that applies to all dbt projects, though, is the need to establish a cohesive arc moving data from *source-conformed* to *business-conformed*. Source-conformed data is shaped by external systems out of our control, while business-conformed data is shaped by the needs, concepts, and definitions we create. No matter what patterns or conventions you define within your project, this process remains the essential purpose of the transformation layer, with dbt as your tool within it.
This guide is an update to a seminal analytics engineering [post of the same name](https://discourse.getdbt.com/t/how-we-structure-our-dbt-projects/355) by the great Claire Carroll, and while some of the details have changed over time (as anticipated in that post), this fundamental trajectory holds true. Moving forward, this guide will be iteratively updated as new tools expand our viewpoints, new experiences sharpen our vision, and new voices strengthen our perspectives, but always in service of that aim.

##### Learning goals[​](#learning-goals "Direct link to Learning goals")

This guide has three main goals:

* Thoroughly cover our most up-to-date recommendations on how to structure typical dbt projects
* Illustrate these recommendations with comprehensive examples
* At each stage, explain *why* we recommend the approach that we do, so that you're equipped to decide when and where to deviate from these recommendations to better fit your organization’s unique needs

You should walk away from this guide with a deeper mental model of how the components of a dbt project fit together, such that the purpose and principles of analytics engineering feel clearer and more intuitive. By approaching our structure intentionally, we’ll gain a better understanding of foundational ideals like moving our data from the wide array of narrower source-conformed models that our systems give us to a smaller set of wider, richer business-conformed designs we create. As we move along that arc, we’ll understand how stacking our transformations in optimized, modular layers means we can apply each transformation in only one place. With a disciplined approach to the files, folders, and materializations that comprise our structure, we’ll find that we can create clear stories not only through our data, but also through our codebase and the artifacts it generates in our warehouse.
Our hope is that by deepening your sense of the connections between these patterns and the principles they flow from, you'll be able to translate them to fit your specific needs and craft customized documentation for your team to act on.

**Example project**

This guide walks through our recommendations using a very simple dbt project — similar to the one used for the Getting Started guide and many other demos — from a fictional company called the Jaffle Shop. You can read more about [jaffles](https://en.wiktionary.org/wiki/jaffle) if you want (they *are* a real thing), but that context isn’t important to understand the structure. We encourage you to follow along, try things out, make changes, and take notes on what works or doesn't work for you along the way.

We'll get a deeper sense of our project as we move through the guide, but for now we just need to know that the Jaffle Shop is a restaurant selling jaffles that has two main data sources:

* A replica of our transactional database, called `jaffle_shop`, with core entities like orders and customers.
* Synced data from [Stripe](https://stripe.com/), which we use for processing payments.

##### Guide structure overview[​](#guide-structure-overview "Direct link to Guide structure overview")

We'll walk through our topics in the same order that our data would move through transformation:

1. Dig into how we structure the files, folders, and models for our three primary layers in the `models` directory, which build on each other:
   1. **Staging** — creating our atoms, our initial modular building blocks, from source data
   2. **Intermediate** — stacking layers of logic with clear and specific purposes to prepare our staging models to join into the entities we want
   3. **Marts** — bringing together our modular pieces into a wide, rich vision of the entities our organization cares about
2. Explore how these layers fit into the rest of the project:
   1. Review the overall structure comprehensively
   2.
Expand on YAML configuration in-depth
   3. Discuss how to use the other folders in a dbt project: `tests`, `seeds`, and `analyses`

Below is the complete file tree of the project we’ll be working through. Don’t worry if this looks like a lot of information to take in at once - this is just to give you the full vision of what we’re building towards. We’ll focus in on each of the sections one by one as we break down the project’s structure.

```shell
jaffle_shop
├── README.md
├── analyses
├── seeds
│   └── employees.csv
├── dbt_project.yml
├── macros
│   └── cents_to_dollars.sql
├── models
│   ├── intermediate
│   │   └── finance
│   │       ├── _int_finance__models.yml
│   │       └── int_payments_pivoted_to_orders.sql
│   ├── marts
│   │   ├── finance
│   │   │   ├── _finance__models.yml
│   │   │   ├── orders.sql
│   │   │   └── payments.sql
│   │   └── marketing
│   │       ├── _marketing__models.yml
│   │       └── customers.sql
│   ├── staging
│   │   ├── jaffle_shop
│   │   │   ├── _jaffle_shop__docs.md
│   │   │   ├── _jaffle_shop__models.yml
│   │   │   ├── _jaffle_shop__sources.yml
│   │   │   ├── base
│   │   │   │   ├── base_jaffle_shop__customers.sql
│   │   │   │   └── base_jaffle_shop__deleted_customers.sql
│   │   │   ├── stg_jaffle_shop__customers.sql
│   │   │   └── stg_jaffle_shop__orders.sql
│   │   └── stripe
│   │       ├── _stripe__models.yml
│   │       ├── _stripe__sources.yml
│   │       └── stg_stripe__payments.sql
│   └── utilities
│       └── all_dates.sql
├── packages.yml
├── snapshots
└── tests
    └── assert_positive_value_for_total_amount.sql
```

---

### How we style our dbt models

#### Fields and model names[​](#fields-and-model-names "Direct link to Fields and model names")

* 👥 Models should be pluralized, for example, `customers`, `orders`, `products`.
* 🔑 Each model should have a primary key.
* 🔑 The primary key of a model should be named `<object>_id`, for example, `account_id`. This makes it easier to know what `id` is being referenced in downstream joined models.
* Use underscores for naming dbt models; avoid dots.
  * ✅ `models_without_dots`
  * ❌ `models.with.dots`
  * Most data platforms use dots to separate `database.schema.object`, so using underscores instead of dots reduces your need for [quoting](https://docs.getdbt.com/reference/resource-properties/quoting.md) as well as the risk of issues in certain parts of dbt. For more background, refer to [this GitHub issue](https://github.com/dbt-labs/dbt-core/issues/3246).
* 🔑 Keys should be string data types.
* 🔑 Consistency is key! Use the same field names across models where possible. For example, a key to the `customers` table should be named `customer_id` rather than `user_id` or `id`.
* ❌ Do not use abbreviations or aliases. Emphasize readability over brevity. For example, do not use `cust` for `customer` or `o` for `orders`.
* ❌ Avoid reserved words as column names.
* ➕ Booleans should be prefixed with `is_` or `has_`.
* 🕰️ Timestamp columns should be named `<event>_at` (for example, `created_at`) and should be in UTC. If a different timezone is used, this should be indicated with a suffix (`created_at_pt`).
* 📆 Dates should be named `<event>_date`, for example, `created_date`.
* 🔙 Event dates and times should be past tense — `created`, `updated`, or `deleted`.
* 💱 Price/revenue fields should be in decimal currency (`19.99` for $19.99; many app databases store prices as integers in cents). If a non-decimal currency is used, indicate this with a suffix (`price_in_cents`).
* 🐍 Schema, table and column names should be in `snake_case`.
* 🏦 Use names based on the *business* terminology, rather than the source terminology. For example, if the source database uses `user_id` but the business calls them `customer_id`, use `customer_id` in the model.
* 🔢 Versions of models should use the suffix `_v1`, `_v2`, etc. for consistency (`customers_v1` and `customers_v2`).
* 🗄️ Use a consistent ordering of data types and consider grouping and labeling columns by type, as in the example below. This will minimize join errors and make it easier to read the model, as well as help downstream consumers of the data understand the data types and scan models for the columns they need. We prefer to use the following order: ids, strings, numerics, booleans, dates, and timestamps.

#### Example model[​](#example-model "Direct link to Example model")

```sql
with

source as (

    select * from {{ source('ecom', 'raw_orders') }}

),

renamed as (

    select

        ---------- ids
        id as order_id,
        store_id as location_id,
        customer as customer_id,

        ---------- strings
        status as order_status,

        ---------- numerics
        (order_total / 100.0)::float as order_total,
        (tax_paid / 100.0)::float as tax_paid,

        ---------- booleans
        is_fulfilled,

        ---------- dates
        date(order_date) as ordered_date,

        ---------- timestamps
        ordered_at

    from source

)

select * from renamed
```

---

### How we style our dbt projects

#### Why does style matter?[​](#why-does-style-matter "Direct link to Why does style matter?")

Style might seem like a trivial, surface-level issue, but it's a deeply material aspect of a well-built project. A consistent, clear style enhances readability and makes your project easier to understand and maintain. Highly readable code helps build clear mental models, making it easier to debug and extend your project.
It's not just a favor to yourself, though; equally importantly, it lowers the effort required for others to understand and contribute to your project, which is essential for peer collaboration, open-source work, and onboarding new team members. [A style guide lets you focus on what matters](https://mtlynch.io/human-code-reviews-1/#settle-style-arguments-with-a-style-guide): the logic and impact of your project, rather than the superficialities of how it's written. This brings harmony and pace to your team's work, and makes reviews more enjoyable and valuable.

#### What's important about style?[​](#whats-important-about-style "Direct link to What's important about style?")

There are two crucial tenets of code style:

* Clarity
* Consistency

Style your code in such a way that you can quickly read and understand it. It's also important to consider code review and git diffs. If you're making a change to a model, you want reviewers to clearly see just the material changes you're making.

Once you've established a clear style, stay consistent. This is the most important thing. Everybody on your team needs to have a unified style, which is why having a style guide is so crucial. If you're writing a model, you should be able to look at other models in the project that your teammates have written and read them in the same style. If you're writing a macro or a test, you should see the same style as in your models. Consistency is key.

#### How should I style?[​](#how-should-i-style "Direct link to How should I style?")

You should style the project in a way you and your teammates or collaborators agree on. The most important thing is that you have a style guide and stick to it. This guide is just a suggestion to get you started and to give you a sense of what a style guide might look like. It covers various areas you may want to consider, with suggested rules. It emphasizes lots of whitespace, clarity, clear naming, and comments.
We believe one of the strengths of SQL is that it reads like English, so we lean into that declarative nature throughout our projects. Even within dbt Labs, though, there are differing opinions on how to style, including a small but passionate contingent of leading-comma enthusiasts! Again, the important thing is not to follow this style guide; it's to make *your* style guide and follow it. Lastly, be sure to include rules, tools, *and* examples in your style guide to make it as easy as possible for your team to follow.

#### Automation[​](#automation "Direct link to Automation")

Use formatters and linters as much as possible. We're all human; we make mistakes. Not only that, but we all have different preferences and opinions while writing code. Automation is a great way to ensure that your project is styled consistently and correctly, and that people can write in a way that's quick and comfortable for them, while still getting perfectly consistent output.

---

### How we style our Jinja

#### Jinja style guide[​](#jinja-style-guide "Direct link to Jinja style guide")

* 🫧 When using Jinja delimiters, use spaces on the inside of your delimiter, like `{{ this }}` instead of `{{this}}`.
* 🆕 Use newlines to visually indicate logical blocks of Jinja.
* 4️⃣ Indent 4 spaces into a Jinja block to indicate visually that the code inside is wrapped by that block.
* ❌ Don't worry (too much) about Jinja whitespace control; focus on your project code being readable. The time you save by not worrying about whitespace control will far outweigh the time you spend in your compiled code where it might not be perfect.
#### Examples of Jinja style[​](#examples-of-jinja-style "Direct link to Examples of Jinja style")

```jinja
{% macro make_cool(uncool_id) %}
    do_cool_thing({{ uncool_id }})
{% endmacro %}
```

```sql
select
    entity_id,
    entity_type,
    {% if this %}
        {{ that }},
    {% else %}
        {{ the_other_thing }},
    {% endif %}
    {{ make_cool('uncool_id') }} as cool_id
```

---

### How we style our Python

#### Python tooling[​](#python-tooling "Direct link to Python tooling")

* 🐍 Python has a more mature and robust ecosystem for formatting and linting (helped by the fact that it doesn't have a million distinct dialects). We recommend using those tools to format and lint your code in the style you prefer.
* 🛠️ Our current recommendations are
  * [black](https://pypi.org/project/black/) formatter
  * [ruff](https://pypi.org/project/ruff/) linter

info ☁️ dbt comes with the [black formatter built-in](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md) to automatically format your Python. You don't need to download or configure anything, just click `Format` in a Python model and you're good to go!
#### Example Python[​](#example-python "Direct link to Example Python")

```python
import pandas as pd


def model(dbt, session):
    # set length of time considered a churn
    churn_window = pd.Timedelta(days=2)

    dbt.config(enabled=False, materialized="table", packages=["pandas==1.5.2"])

    orders_relation = dbt.ref("stg_orders")

    # converting a DuckDB Python Relation into a pandas DataFrame
    orders_df = orders_relation.df()

    orders_df.sort_values(by="ordered_at", inplace=True)
    orders_df["previous_order_at"] = orders_df.groupby("customer_id")[
        "ordered_at"
    ].shift(1)
    orders_df["next_order_at"] = orders_df.groupby("customer_id")["ordered_at"].shift(
        -1
    )
    return orders_df
```

---

### How we style our SQL

#### Basics[​](#basics "Direct link to Basics")

* ☁️ Use [SQLFluff](https://sqlfluff.com/) to maintain these style rules automatically.
  * Customize `.sqlfluff` configuration files to your needs.
  * Refer to our [SQLFluff config file](https://github.com/dbt-labs/jaffle-shop-template/blob/main/.sqlfluff) for the rules we use in our own projects.
  * Exclude files and directories by using a standard `.sqlfluffignore` file. Learn more about the syntax in the [.sqlfluffignore syntax docs](https://docs.sqlfluff.com/en/stable/configuration/index.html).
  * Excluding unnecessary folders and files (such as `target/`, `dbt_packages/`, and `macros/`) can speed up linting, improve run times, and help you avoid irrelevant logs.
* 👻 Use Jinja comments (`{# #}`) for comments that should not be included in the compiled SQL.
* ⏭️ Use trailing commas.
* 4️⃣ Indents should be four spaces.
* 📏 Lines of SQL should be no longer than 80 characters.
* ⬇️ Field names, keywords, and function names should all be lowercase.
* 🫧 The `as` keyword should be used explicitly when aliasing a field or table.

info ☁️ dbt users can use the built-in [SQLFluff Studio IDE integration](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md) to automatically lint and format their SQL. The default style sheet is based on the dbt Labs style as outlined in this guide, but you can customize it to fit your needs. No need to set up any external tools, just hit `Lint`! The more opinionated [sqlfmt](http://sqlfmt.com/) formatter is also available if you prefer that style.

#### Fields, aggregations, and grouping[​](#fields-aggregations-and-grouping "Direct link to Fields, aggregations, and grouping")

* 🔙 Fields should be stated before aggregates and window functions.
* 🤏🏻 Aggregations should be executed as early as possible (on the smallest data set possible) before joining to another table to improve performance.
* 🔢 Ordering and grouping by a number (e.g. `group by 1, 2`) is preferred over listing the column names (see [this classic rant](https://www.getdbt.com/blog/write-better-sql-a-defense-of-group-by-1) for why). Note that if you are grouping by more than a few columns, it may be worth revisiting your model design.

#### Joins[​](#joins "Direct link to Joins")

* 👭🏻 Prefer `union all` to `union` unless you explicitly want to remove duplicates.
* 👭🏻 If joining two or more tables, *always* prefix your column names with the table name. If only selecting from one table, prefixes are not needed.
* 👭🏻 Be explicit about your join type (i.e. write `inner join` instead of `join`).
* 🥸 Avoid table aliases in join conditions (especially initialisms) — it's harder to understand what the table called "c" is compared to "customers".
* ➡️ Always move left to right to make joins easy to reason about - `right joins` often indicate that you should change which table you select `from` and which one you `join` to.
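Putting several of these rules together (fields before aggregates, an explicit `inner join`, full table-name prefixes, and grouping by number), a small illustrative query might look like the following; the `customers` and `orders` tables here are hypothetical:

```sql
select
    customers.customer_id,
    customers.customer_type,
    count(orders.order_id) as order_count

from customers

inner join orders
    on customers.customer_id = orders.customer_id

group by 1, 2
```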
#### 'Import' CTEs[​](#import-ctes "Direct link to 'Import' CTEs")

* 🔝 All `{{ ref('...') }}` statements should be placed in CTEs at the top of the file.
* 📦 'Import' CTEs should be named after the table they are referencing.
* 🤏🏻 Limit the data scanned by CTEs as much as possible. Where possible, only select the columns you're actually using and use `where` clauses to filter out unneeded data.
* For example:

```sql
with

orders as (

    select
        order_id,
        customer_id,
        order_total,
        order_date

    from {{ ref('orders') }}

    where order_date >= '2020-01-01'

)
```

#### 'Functional' CTEs[​](#functional-ctes "Direct link to 'Functional' CTEs")

* ☝🏻 Where performance permits, CTEs should perform a single, logical unit of work.
* 📖 CTE names should be as verbose as needed to convey what they do, e.g. `events_joined_to_users` instead of `user_events` (this could be a good model name, but does not describe a specific function or transformation).
* 🌉 CTEs that are duplicated across models should be pulled out into their own intermediate models. Look out for chunks of repeated logic that should be refactored into their own model.
* 🔚 The last line of a model should be a `select *` from your final output CTE. This makes it easy to materialize and audit the output from different steps in the model as you're developing it. You just change the CTE referenced in the `select` statement to see the output from that step.

#### Model configuration[​](#model-configuration "Direct link to Model configuration")

* 📝 Model-specific attributes (like sort/dist keys) should be specified in the model.
* 📂 If a particular configuration applies to all models in a directory, it should be specified in the `dbt_project.yml` file.
* 👓 In-model configurations should be specified like this for maximum readability:

```sql
{{
    config(
        materialized = 'table',
        sort = 'id',
        dist = 'id'
    )
}}
```

#### Example SQL[​](#example-sql "Direct link to Example SQL")

```sql
with

events as (

    ...

),

{# CTE comments go here #}

filtered_events as (

    ...

)

select * from filtered_events
```

##### Example SQL[​](#example-sql-1 "Direct link to Example SQL")

```sql
with

my_data as (

    select
        field_1,
        field_2,
        field_3,
        cancellation_date,
        expiration_date,
        start_date

    from {{ ref('my_data') }}

),

some_cte as (

    select
        id,
        field_4,
        field_5

    from {{ ref('some_cte') }}

),

some_cte_agg as (

    select
        id,
        sum(field_4) as total_field_4,
        max(field_5) as max_field_5

    from some_cte

    group by 1

),

joined as (

    select
        my_data.field_1,
        my_data.field_2,
        my_data.field_3,

        -- use line breaks to visually separate calculations into blocks
        case
            when my_data.cancellation_date is null
                and my_data.expiration_date is not null
                then expiration_date
            when my_data.cancellation_date is null
                then my_data.start_date + 7
            else my_data.cancellation_date
        end as cancellation_date,

        some_cte_agg.total_field_4,
        some_cte_agg.max_field_5

    from my_data

    left join some_cte_agg
        on my_data.id = some_cte_agg.id

    where my_data.field_1 = 'abc'
        and (
            my_data.field_2 = 'def' or
            my_data.field_2 = 'ghi'
        )

    having count(*) > 1

)

select * from joined
```

---

### How we style our YAML

#### YAML Style Guide[​](#yaml-style-guide "Direct link to YAML Style Guide")

* 2️⃣ Indents should be two spaces.
* ➡️ List items should be indented.
* 🔠 List items with a single entry can be a string. For example, `'select': 'other_user'`, but it's best practice to provide the argument as an explicit list. For example, `'select': ['other_user']`.
* 🆕 Use a new line to separate list items that are dictionaries where appropriate.
* 📏 Lines of YAML should be no longer than 80 characters.
* 🛠️ Use the [dbt JSON schema](https://github.com/dbt-labs/dbt-jsonschema) with any compatible Studio IDE and a YAML formatter (we recommend [Prettier](https://prettier.io/)) to validate your YAML files and format them automatically.

Refer to [YAML tips](https://docs.getdbt.com/docs/build/dbt-tips.md#yaml-tips) for more YAML information.

info ☁️ As with Python and SQL, the Studio IDE comes with built-in formatting for YAML files (Markdown and JSON too!), via Prettier. Just click the `Format` button and you're in perfect style. As with the other tools, you can [also customize the formatting rules](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md#format-yaml-markdown-json) to your liking to fit your company's style guide.

##### Example YAML[​](#example-yaml "Direct link to Example YAML")

```yaml
models:
  - name: events
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        data_tests:
          - unique
          - not_null
      - name: event_time
        description: "When the event occurred in UTC (eg. 2018-01-01 12:00:00)"
        data_tests:
          - not_null
      - name: user_id
        description: The ID of the user who recorded the event
        data_tests:
          - not_null
          - relationships:
              arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties.
                to: ref('users')
                field: id
```

---

### Implementing your mesh plan

##### Where should your mesh journey start?[​](#where-should-your-mesh-journey-start "Direct link to Where should your mesh journey start?")

Moving to a Mesh represents a meaningful change in development and deployment architecture.
Before any sufficiently complex software refactor or migration, it's important to ask, 'Why might this not work?' The two most common reasons we've seen stem from:

1. Lack of buy-in that a Mesh is the right long-term architecture
2. Lack of alignment on a well-scoped starting point

Creating alignment on your architecture and starting point are major steps in ensuring a successful migration. Deciding on the right starting point will look different for every organization, but there are some heuristics that can help you decide where to start.

In all likelihood, your organization already has logical components, and you may already be grouping, building, and deploying your project according to these interfaces. The goal is to define and formalize these organizational interfaces and use these boundaries to split your project apart by domain.

How do you find these organizational interfaces? Here are some steps to get you started:

* **Talk to teams** about what sort of separation naturally exists right now.
  * Are there various domains people are focused on?
  * Are there various sizes, shapes, and sources of data that get handled separately (such as click event data)?
  * Are there people focused on separate levels of transformation, such as landing and staging data or building marts?
  * Is there a single team that is *downstream* of your current dbt project, who could more easily migrate onto Mesh as a consumer?

When attempting to define your project interfaces, you should consider investigating:

* **Your jobs:** Which sets of models are most often built together?
* **Your lineage graph:** How are models connected?
* **Your selectors (defined in `selectors.yml`):** How do people already define resource groups?

Let's go through an example process of taking a monolithic project, using groups and access to define the interfaces, and then splitting it into multiple projects.
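For instance, a `selectors.yml` entry like the following sketch already encodes an organizational boundary you might promote to a group or project (the selector name and folder path here are hypothetical):

```yaml
# in selectors.yml (a hypothetical example)
selectors:
  - name: marketing
    description: Models owned and built by the marketing team
    definition:
      method: path
      value: models/marketing
```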
tip To help you get started, check out our [Quickstart with Mesh](https://docs.getdbt.com/guides/mesh-qs.md) or our online [Mesh course](https://learn.getdbt.com/courses/dbt-mesh) to learn more!

#### Defining project interfaces with groups and access[​](#defining-project-interfaces-with-groups-and-access "Direct link to Defining project interfaces with groups and access")

Once you have a sense of some initial groupings, you can first implement **group and access permissions** within a single project.

* First you can create a [group](https://docs.getdbt.com/docs/build/groups.md) to define the owner of a set of models.

```yml
# in models/__groups.yml
groups:
  - name: marketing
    owner:
      name: Ben Jaffleck
      email: ben.jaffleck@jaffleshop.com
```

* Then, we can add models to that group using the `group:` key in the model's YAML entry.

```yml
# in models/marketing/__models.yml
models:
  - name: fct_marketing_model
    config:
      group: marketing # changed to config in v1.10
  - name: stg_marketing_model
    config:
      group: marketing # changed to config in v1.10
```

* Once you've added models to the group, you can **add [access](https://docs.getdbt.com/docs/mesh/govern/model-access.md) settings to the models** based on their connections between groups, *opting for the most private access that will maintain current functionality*. This means that any model that has *only* relationships to other models in the same group should be `private`, and any model that has cross-group relationships, or is a terminal node in the group DAG, should be `protected` so that other parts of the DAG can continue to reference it.
```yml
# in models/marketing/__models.yml
models:
  - name: fct_marketing_model
    config:
      group: marketing # changed to config in v1.10
      access: protected # changed to config in v1.10
  - name: stg_marketing_model
    config:
      group: marketing # changed to config in v1.10
      access: private # changed to config in v1.10
```

* **Validate these groups by incrementally migrating your jobs** to execute these groups specifically via selection syntax. We would recommend doing this in parallel to your production jobs until you're sure about them. This will help you feel out whether you've drawn the lines in the right place.
* If you find yourself **consistently making changes across multiple groups** when you update logic, that's a sign that **you may want to rethink your groups**.

#### Split your projects[​](#split-your-projects "Direct link to Split your projects")

1. **Move your grouped models into a subfolder**. This will include any model in the selected group, its associated YAML entry, as well as its parent or child resources as appropriate depending on where this group sits in your DAG.
   1. Note that just like in your dbt project, circular references are not allowed! Project B cannot have parents and children in Project A, for example.
2. **Create a new `dbt_project.yml` file** in the subdirectory.
3. **Copy any macros** used by the resources you moved.
4. **Create a new `packages.yml` file** in your subdirectory with the packages that are used by the resources you moved.
5. **Update `{{ ref }}` functions** — For any model that has a cross-project dependency (this may be in the files you moved, or in the files that remain in your project):
   1. Update the `{{ ref() }}` function to have two arguments, where the first is the name of the source project and the second is the name of the model: e.g. `{{ ref('jaffle_shop', 'my_upstream_model') }}`
   2. Update the upstream, cross-project parents' `access` configs to `public`, ensuring any project can safely `{{ ref() }}` those models.
   3. We *highly* recommend adding a [model contract](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md) to the upstream models to ensure the data shape is consistent and reliable for your downstream consumers.
6. **Create a `dependencies.yml` file** ([docs](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md)) for the downstream project, declaring the upstream project as a dependency.

```yml
# in dependencies.yml
projects:
  - name: jaffle_shop
```

##### Best practices[​](#best-practices "Direct link to Best practices")

* When you've **confirmed the right groups**, it's time to split your projects.
* **Do *one* group at a time**!
* **Do *not* refactor as you migrate**, however tempting that may be. Focus on getting 1-to-1 parity and log any issues you find in doing the migration for later. Once you've fully migrated the project, then you can start optimizing it for its new life as part of your mesh.
* Start by splitting your project within the same repository for full git tracking and easy reversion if you need to start from scratch.

#### Connecting existing projects[​](#connecting-existing-projects "Direct link to Connecting existing projects")

Some organizations may already be coordinating across multiple dbt projects. Most often this is via:

1. Installing parent projects as dbt packages
2. Using `{{ source() }}` functions to read the outputs of a parent project as inputs to a child project.

This has a few drawbacks:

1. If using packages, each project has to include *all* resources from *all* projects in its manifest, slowing down dbt and the development cycle.
2. If using sources, there are breakages in the lineage, as there's no real connection between the parent and child projects.

The migration steps here are much simpler than splitting up a monolith!

1. If using the `package` method:
   1. In the parent project:
      1. Mark all models being referenced downstream as `public` and add a model contract.
   2. In the child project:
      1. Remove the package entry from `packages.yml`
      2. Add the upstream project to your `dependencies.yml`
      3. Update the `{{ ref() }}` functions to models from the upstream project to include the project name argument.
2. If using the `source` method:
   1. In the parent project:
      1. Mark all models being imported downstream as `public` and add a model contract.
   2. In the child project:
      1. Add the upstream project to your `dependencies.yml`
      2. Replace the `{{ source() }}` functions with cross-project `{{ ref() }}` functions.
      3. Remove the unnecessary `source` definitions.

#### Additional Resources[​](#additional-resources "Direct link to Additional Resources")

##### Our example projects[​](#our-example-projects "Direct link to Our example projects")

We've provided a set of example projects you can use to explore the topics covered here. We've split our [Jaffle Shop](https://github.com/dbt-labs/jaffle-shop) project into 3 separate projects in a multi-repo Mesh. Note that you'll need to leverage dbt to use multi-project architecture, as cross-project references are powered via dbt's APIs.

* **[Platform](https://github.com/dbt-labs/jaffle-shop-mesh-platform)** - containing our centralized staging models.
* **[Marketing](https://github.com/dbt-labs/jaffle-shop-mesh-marketing)** - containing our marketing marts.
* **[Finance](https://github.com/dbt-labs/jaffle-shop-mesh-finance)** - containing our finance marts.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Quickstart with Mesh](https://docs.getdbt.com/guides/mesh-qs.md)
---

### Incremental models in-depth

So far we've looked at tables and views, which map to the traditional objects in the data warehouse. As mentioned earlier, incremental models are a little different. This is where we start to deviate from this pattern with more powerful and complex materializations.

* 📚 **Incremental models generate tables.** They physically persist the data itself to the warehouse, just piece by piece. What's different is **how we build that table**.
* 💅 **Only apply our transformations to rows of data with new or updated information**; this maximizes efficiency.
* 🌍 If we have a very large set of data or compute-intensive transformations, or both, it can be very slow and costly to process the entire corpus of source data being input into a model or chain of models. If instead we can identify *only rows that contain new information* (that is, **new or updated records**), then we can process just those rows, building our models *incrementally*.
* 3️⃣ We need **3 key things** in order to accomplish the above:
  * a **filter** to select just the new or updated records
  * a **conditional block** that wraps our filter and only applies it when we want it
  * **configuration** that tells dbt we want to build incrementally and helps apply the conditional filter when needed

Let's dig into how exactly we can do that in dbt. Let's say we have an `orders` table that looks like the below:

| order\_id | order\_status | customer\_id | order\_item\_id | ordered\_at | updated\_at |
| --------- | ------------- | ------------ | --------------- | ----------- | ----------- |
| 123 | shipped | 7 | 5791 | 2022-01-30 | 2022-01-30 |
| 234 | confirmed | 15 | 1643 | 2022-01-31 | 2022-01-31 |

We did our last `dbt build` job on `2022-01-31`, so any new orders since that run won't appear in our table.
When we do our next run (for simplicity let's say the next day, although for an orders model we'd more realistically run this hourly), we have two options:

* 🏔️ build the table from the **beginning of time again — a *table materialization***
  * Simple and solid, if we can afford to do it (in terms of time, compute, and money — which are all directly correlated in a cloud warehouse). It's the easiest and most accurate option.
* 🤏 find a way to run **just new and updated rows since our previous run — an *incremental materialization***
  * If we *can't* realistically afford to run the whole table — due to complex transformations or big source data, it takes too long — then we want to build incrementally. We want to just transform and add the row with id 567 below, *not* the previous two with ids 123 and 234 that are already in the table.

| order\_id | order\_status | customer\_id | order\_item\_id | ordered\_at | updated\_at |
| --------- | ------------- | ------------ | --------------- | ----------- | ----------- |
| 123 | shipped | 7 | 5791 | 2022-01-30 | 2022-01-30 |
| 234 | confirmed | 15 | 1643 | 2022-01-31 | 2022-01-31 |
| 567 | shipped | 61 | 28 | 2022-02-01 | 2022-02-01 |

##### Writing incremental logic[​](#writing-incremental-logic "Direct link to Writing incremental logic")

Let's think through the information we'd need to build such a model that only processes new and updated data. We would need:

* 🕜 **a timestamp indicating when a record was last updated**, let's call it our `updated_at` timestamp, as that's a typical convention and what we have in our example above.
* ⌛ the **most recent timestamp from this table *in our warehouse*** — that is, the one created by the previous run — to act as a cutoff point. We'll call the model we're working in `this`, for 'this model we're working in'.
That would let us construct logic like this:

```sql
select * from orders

where updated_at > (select max(updated_at) from {{ this }})
```

Let's break down that `where` clause a bit, because this is where the action is with incremental models. Stepping through the code ***right-to-left*** we:

1. Get our **cutoff.**
   1. Select the `max(updated_at)` timestamp — the **most recent record**
   2. from `{{ this }}` — the table for this model as it exists in the warehouse, as **built in our last run**,
   3. so `max(updated_at) from {{ this }}` is the ***most recent record processed in our last run,***
   4. which is exactly what we want as a **cutoff**!
2. **Filter** the rows we're selecting to add in this run.
   1. Use the `updated_at` timestamp from our input, the equivalent column to the one in the warehouse, but in the up-to-the-minute **source data we're selecting from**, and
   2. check if it's **greater than our cutoff;**
   3. if so, it will satisfy our `where` clause, so we're **selecting all the rows more recent than our cutoff.**

This logic would let us isolate and apply our transformations to just the records that have come in since our last run, and I've got some great news: that magic `{{ this }}` keyword [does in fact exist in dbt](https://docs.getdbt.com/reference/dbt-jinja-functions/this.md), so we can write exactly this logic in our models.

##### Configuring incremental models[​](#configuring-incremental-models "Direct link to Configuring incremental models")

So we've found a way to isolate the new rows we need to process. How then do we handle the rest? We still need to:

* ➕ make sure dbt knows to ***add* new rows on top** of the existing table in the warehouse, **not replace** it.
* 👉 If there are **updated rows**, we need a way for dbt to know **which rows to update**.
* 🌍 Lastly, if we're building into a new environment and there's **no previous run to reference**, or we need to **build the model from scratch**, we'll want a means to skip the incremental logic and transform all of our input data like a regular table.
* 😎 **Visualized below**, we've figured out how to get the red 'new records' portion selected, but we need to sort out the step to the right, where we stick those on to our model.

![Diagram visualizing how incremental models work](/assets/images/incremental-diagram-8816eec2768f76dbb493f70c7ec25d99.png)

info 😌 Incremental models can be confusing at first; **take your time reviewing** this visual and the previous steps until you have a **clear mental model.** Be patient with yourself. This materialization will become second nature soon, but it's tough at first. If you're feeling confused, the [dbt Community is here for you on the Forum and Slack](https://www.getdbt.com/community/join-the-community).

Thankfully dbt has some additional configuration and special syntax just for incremental models. First, let's look at a config block for incremental materialization:

```sql
{{
    config(
        materialized='incremental',
        unique_key='order_id'
    )
}}

select ...
```

* 📚 The **`materialized` config** works just like tables and views; we just pass it the value `'incremental'`.
* 🔑 We've **added a new config option, `unique_key`,** that tells dbt that if it finds a record in our previous run — the data in the warehouse already — with the same unique id (in our case `order_id` for our `orders` table) that exists in the new data we're adding incrementally, to **update that record instead of adding it as a separate row**.
* 👯 This **hugely broadens the types of data we can build incrementally**, from just immutable tables (data where rows only ever get added, never updated) to mutable records (where rows might change over time).
As long as we've got a column that specifies when records were updated (such as `updated_at` in our example), we can handle almost anything.

* ➕ We're now **adding records** to the table **and updating existing rows**. That's 2 of 3 concerns.
* 🆕 We still need to **build the table from scratch** (via `dbt build` or `run` in a job) when necessary — whether because we're in a new environment and so don't have an initial table to build on, or because our model has drifted from the original over time due to data loading latency.
* 🔀 We need to wrap our incremental logic, that is, our `where` clause with our `updated_at` cutoff, in a **conditional statement that will only apply it when certain conditions are met**. If you're thinking this is **a case for a Jinja `{% if %}` statement**, you're absolutely right!

##### Incremental conditions[​](#incremental-conditions "Direct link to Incremental conditions")

So we're going to use an **if statement** to apply our cutoff filter **only when certain conditions are met**. We want to apply our cutoff filter *if* the **following things are true**:

* ➕ we've set the materialization **config** to incremental,
* 🛠️ there is an **existing table** for this model in the warehouse to build on,
* 🙅‍♀️ and the `--full-refresh` **flag was *not* passed.**
  * [full refresh](https://docs.getdbt.com/reference/resource-configs/full_refresh.md) is a configuration and flag that is specifically designed to let us override the incremental materialization and build a table from scratch again.

Thankfully, we don't have to dig into the guts of dbt to sort out each of these conditions individually.

* ⚙️ dbt provides us with a **macro, [`is_incremental`](https://docs.getdbt.com/docs/build/incremental-models.md#understand-the-is_incremental-macro),** that checks all of these conditions for this exact use case.
* 🔀 By **wrapping our cutoff logic** in this macro, it will only get applied when the macro returns true for all of the above conditions.
Let's take a look at all these pieces together:

```sql
{{
    config(
        materialized='incremental',
        unique_key='order_id'
    )
}}

select * from orders

{% if is_incremental() %}

where updated_at > (select max(updated_at) from {{ this }})

{% endif %}
```

Fantastic! We've got a working incremental model. On our first run, when there is no corresponding table in the warehouse, `is_incremental` will evaluate to false and we'll capture the entire table. On subsequent runs it will evaluate to true and we'll apply our filter logic, capturing only the newer data.

##### Late-arriving facts[​](#late-arriving-facts "Direct link to Late-arriving facts")

Our last concern specific to incremental models is what to do when data is inevitably loaded in a less-than-perfect way. Sometimes data loaders will, for a variety of reasons, load data late. Either an entire load comes in late, or some rows come in on a load after those with which they should have arrived. The following is a best practice for every incremental model to slow down the drift this can cause.

* 🕐 For example, if most of our records for `2022-01-30` come in the raw schema of our warehouse on the morning of `2022-01-31`, but a handful don't get loaded until `2022-02-02`, how might we tackle that? There will already be `max(updated_at)` timestamps of `2022-01-31` in the warehouse, filtering out those late records. **They'll never make it to our model.**
* 🪟 To mitigate this, we can add a **lookback window** to our **cutoff** point. By **subtracting a few days** from the `max(updated_at)`, we would capture any late data within the window of what we subtracted.
* 👯 As long as we have a **`unique_key` defined in our config**, we'll simply update existing rows and avoid duplication. We process more data this way, but in a fixed way, and it keeps our model hewing closer to the source data.
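In practice, a lookback window is just a small change to the cutoff inside the `is_incremental` block. A sketch of the idea, noting that the exact `interval` syntax for date math varies by warehouse:

```sql
{% if is_incremental() %}

-- subtract a lookback window from the cutoff so late-arriving rows
-- within that window get reprocessed; with unique_key configured,
-- reprocessed rows update in place rather than duplicating
where updated_at > (
    select max(updated_at) - interval '3 days' from {{ this }}
)

{% endif %}
```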
###### Using state-aware orchestration with incremental models[​](#using-state-aware-orchestration-with-incremental-models "Direct link to Using state-aware orchestration with incremental models") By default, [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) detects source freshness by checking warehouse metadata for any new rows. This may cause models to run more often than needed. To avoid this issue, configure a `loaded_at_field` for a specific timestamp column or use a `loaded_at_query` with custom SQL to tell dbt which field to check for freshness. This helps state-aware orchestration to detect only genuinely new data. For information on how to configure `loaded_at_field` and `loaded_at_query`, refer to [Source freshness](https://docs.getdbt.com/reference/resource-properties/freshness.md) and [Advanced configurations](https://docs.getdbt.com/docs/deploy/state-aware-setup.md#advanced-configurations). Even with a `loaded_at_field` or `loaded_at_query`, late arriving records may have an earlier event timestamp (for example, `event_date`). In this case, state-aware orchestration may skip rebuilding the incremental model, even though your lookback window would normally pick up those records. To ensure late-arriving data is detected, configure your `loaded_at_query` to align with the same lookback window used in your incremental filter. 
For example, if your incremental model uses a 3-day lookback window:

```yaml
sources:
  - name: raw_orders
    tables:
      - name: orders
        config:
          loaded_at_query: |
            select max(ingested_at)
            from {{ this }}
            where ingested_at >= current_timestamp - interval '3 days'
```

##### Long-term considerations[​](#long-term-considerations "Direct link to Long-term considerations")

Late-arriving facts point to the biggest tradeoff with incremental models:

* 🪢 In addition to extra **complexity**, they also inevitably **drift from the source data over time.** Due to the imperfection of loaders and the reality of late-arriving facts, we can’t help but miss some data in between our incremental runs, and this accumulates.
* 🪟 We can slow this entropy with the lookback window described above — **the longer the window the less efficient the model, but the slower the drift.** It’s important to note it will still occur, however slowly. If we have a lookback window of 3 days, and a record comes in 4 days late from the loader, we’re still going to miss it.
* 🌍 Thankfully, there is a way we can reset the relationship of the model to the source data. We can run the model with the **`--full-refresh` flag passed** (such as `dbt build --full-refresh -s orders`). As we saw in the `is_incremental()` conditions above, that will make our logic return false, and our `where` clause filter will not be applied, rebuilding the whole table.
* 🏗️ This will let us **rebuild the entire table from scratch,** a good practice to do regularly **if the size of the data will allow**.
* 📆 A common pattern for incremental models of manageable size is to run a **full refresh on the weekend** (or any low point in activity), either **weekly or monthly**, to consistently reset the drift from late-arriving facts.
---

### Incremental patterns for near real-time data

This section covers three core incremental patterns for achieving near real-time data freshness:

1. [Incremental MERGE from append-only tables](#incremental-merge-from-append-only-tables)
2. [CDC with Snowflake Streams](#cdc-with-snowflake-streams)
3. [Microbatch for large time-series tables](#microbatch-for-large-time-series-tables)

Snowflake-specific pattern

Some patterns in this guide use Snowflake-specific features. Other warehouses have similar features with different implementations. Refer to the [additional resources](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/3-warehouse-native-features.md#resources-by-warehouse) section for adapter-specific documentation.

#### Pattern 1: Incremental MERGE from append-only tables[​](#incremental-merge-from-append-only-tables "Direct link to Pattern 1: Incremental MERGE from append-only tables")

This pattern uses the `merge` incremental strategy to upsert (insert + update) new and updated rows into a target table. Most data platforms support the `merge` strategy. See the [supported incremental strategies by adapter](https://docs.getdbt.com/docs/build/incremental-strategy.md#supported-incremental-strategies-by-adapter) for details. "Append-only tables" refers to a data pattern where source data continuously receives new rows without updates or deletes.

##### When to use the merge strategy[​](#when-to-use-the-merge-strategy "Direct link to When to use the merge strategy")

Use this pattern when raw events continuously land in a staging table and you want a near real-time fact table updated every few minutes.
##### Example model[​](#example-model "Direct link to Example model")

In this example, assume you have raw events continuously landing in `raw.events` (using Snowpipe, Databricks Auto Loader, Kafka, or a similar ingestion mechanism) and you're looking for a near real‑time fact table `analytics.fct_events` updated every few minutes.

Configure the SQL model with the following settings:

* Use the `is_incremental()` filter to only scan rows newer than the latest timestamp already in the target.
* Use `incremental_strategy='merge'` with `unique_key='event_id'` to give you idempotent upserts (inserts + updates).
* Cluster by date using `cluster_by=['event_date']`, which helps with query pruning during `MERGE` operations (syntax varies by warehouse).
* Run the model every few minutes to achieve a freshness service level agreement (SLA) measured in minutes, depending on ingestion and job scheduling.

The following example uses Snowflake SQL syntax (`::` type casting, `timestamp_ntz`, `cluster_by` config). Make sure you adapt the SQL and clustering syntax for your warehouse.
models/fct\_events.sql

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'merge',   -- default on Snowflake
    unique_key = 'event_id',
    cluster_by = ['event_date']       -- helps MERGE performance (syntax varies by warehouse)
) }}

with source_events as (

    select
        event_id,
        event_ts::timestamp_ntz as event_ts,   -- Snowflake syntax for type casting
        to_date(event_ts) as event_date,
        user_id,
        event_type,
        payload
    from {{ source('raw', 'events') }}

    {% if is_incremental() %}
    -- Only pull new/changed rows since last successful load
    where event_ts > (select max(event_ts) from {{ this }})
    {% endif %}

),

deduped as (

    -- optional: if the raw feed can produce duplicates
    select * from (
        select
            *,
            row_number() over (
                partition by event_id
                order by event_ts desc
            ) as _rn
        from source_events
    )
    where _rn = 1

)

select
    event_id,
    event_ts,
    event_date,
    user_id,
    event_type,
    payload
from deduped
```

To ensure the best results:

* Use clustering keys wisely for better `MERGE` performance.
* Monitor `MERGE` performance as your table grows.
* Consider adding a lookback window (for example, `event_ts > max(event_ts) - interval '1 hour'`) to handle late-arriving data.

#### Pattern 2: CDC with Snowflake Streams[​](#cdc-with-snowflake-streams "Direct link to Pattern 2: CDC with Snowflake Streams")

This pattern leverages Snowflake's native Change Data Capture (CDC) capabilities through [Streams](https://docs.snowflake.com/en/user-guide/streams-intro), a Snowflake-specific feature that tracks changes (inserts, updates, deletes) to source tables.

##### When to use CDC[​](#when-to-use-cdc "Direct link to When to use CDC")

Use CDC when:

* You have source tables that receive frequent updates (not just appends).
* You need to capture both new records and changes to existing records.
* You want to avoid full table scans on large source tables.

##### Setup[​](#setup "Direct link to Setup")

To use this pattern, set up the stream in your data warehouse and then create a model to consume the stream.
1. Create the stream (one-time, outside dbt):

```sql
create or replace stream RAW.EVENTS_STREAM on table RAW.EVENTS;
```

2. Create a model consuming the stream:

models/fct\_events\_cdc.sql

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'merge',
    unique_key = 'event_id',
    cluster_by = ['event_date'],
    snowflake_warehouse = 'TRANSFORM_WH'
) }}

with changes as (

    select
        event_id,
        event_ts::timestamp_ntz as event_ts,
        to_date(event_ts) as event_date,
        user_id,
        event_type,
        payload,
        metadata$action as change_type,     -- 'INSERT' or 'DELETE'
        metadata$isupdate as is_update      -- true for both rows of an update
    from {{ source('raw', 'events_stream') }}   -- points at the STREAM, not the table

),

filtered as (

    select * from changes
    -- updates surface in the stream as DELETE + INSERT pairs; keeping the
    -- INSERT side lets the merge upsert both new and changed rows
    where change_type = 'INSERT'
    -- If you want to physically delete, you could also handle
    -- change_type = 'DELETE' and not is_update here

)

select
    event_id,
    event_ts,
    event_date,
    user_id,
    event_type,
    payload
from filtered
```

##### Pattern distinctions[​](#pattern-distinctions "Direct link to Pattern distinctions")

There are some key differences from [pattern 1](#incremental-merge-from-append-only-tables):

* Streams only return changed rows, so you don’t need an `is_incremental()` time filter. Each run processes only the changes available at the moment.
* Run the model every few minutes to pull new changes and merge them into `fct_events`.
* This gives you a CDC-style pipeline. Snowflake Streams captures changes, and dbt handles transformations, tests, and lineage.

#### Pattern 3: Microbatch for large time-series tables[​](#microbatch-for-large-time-series-tables "Direct link to Pattern 3: Microbatch for large time-series tables")

For large `fact` tables where backfills or long lookback windows are challenging, use `incremental_strategy='microbatch'` (available in dbt Core v1.9 or higher and the Latest release track in the dbt platform). Refer to [incremental microbatch](https://docs.getdbt.com/docs/build/incremental-microbatch.md) for more details. Note that Microsoft Fabric doesn't support microbatch yet.
See [incremental strategy by adapter](https://docs.getdbt.com/docs/build/incremental-strategy.md#supported-incremental-strategies-by-adapter) for more details.

microbatch must have event\_time

Every upstream model feeding this microbatch model must also be configured with `event_time` so dbt can push time-filters upstream. Otherwise, each batch could re-scan full upstream tables.

##### When to use microbatch[​](#when-to-use-microbatch "Direct link to When to use microbatch")

* You have massive time-series tables (billions of rows).
* Backfills are slow and risky with traditional incremental approaches.
* You need to reprocess data in manageable chunks.
* Late-arriving data is common.

##### Model configuration[​](#model-configuration "Direct link to Model configuration")

Let's say you have a `fct_events` table with an `event_ts` column and you want to process it in hourly chunks. You can configure the model as follows:

models/fct\_events\_microbatch.sql

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'microbatch',
    event_time = 'event_ts',        -- time column in this model
    batch_size = 'hour',            -- process in hourly chunks
    lookback = 1,                   -- reprocess 1 prior batch to catch late data
    unique_key = 'event_id',
    cluster_by = ['event_date'],
    full_refresh = false
) }}

select
    event_id,
    event_ts::timestamp_ntz as event_ts,
    to_date(event_ts) as event_date,
    user_id,
    event_type,
    payload
from {{ ref('stg_events') }}
```

##### Key behavior[​](#key-behavior "Direct link to Key behavior")

* Use microbatch for massive fact tables (clickstream, IoT, point-of-sale) with multi-year history.
* No `is_incremental()` block needed — dbt automatically generates the appropriate `where event_ts between ...` predicates per batch based on `event_time`, `batch_size`, `begin`, `lookback`, and so on.
* Each run processes multiple smaller queries (one per batch), making larger backfills safer and easier to retry.
* The `lookback` parameter automatically handles late-arriving data by reprocessing recent batches.
* Schedule jobs based on your SLA.

#### Choosing the right incremental pattern[​](#choosing-the-right-incremental-pattern "Direct link to Choosing the right incremental pattern")

The pattern you select will depend on your use case. Start with [pattern 1](#incremental-merge-from-append-only-tables) (`MERGE`), since it's appropriate for most use cases. Upgrade to [pattern 2](#cdc-with-snowflake-streams) (use your data warehouse's native CDC features) when you need efficient CDC. Reach for [pattern 3](#microbatch-for-large-time-series-tables) (Microbatch) when dealing with massive scale.

Use the following table to help you choose the right pattern:

| Pattern | Best for | Key benefit |
| ------------------------ | ---------------------------- | ---------------------------------- |
| `merge` from append-only | Most standard use cases | Simple, widely understood |
| CDC with Streams | Tables with frequent updates | Efficient change capture |
| Microbatch | Massive time-series tables | Safe backfills, late-data handling |

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Incremental models](https://docs.getdbt.com/docs/build/incremental-models-overview.md)
* [Microbatch incremental models](https://docs.getdbt.com/docs/build/incremental-microbatch.md)
* [Configuring incremental models in dbt](https://docs.getdbt.com/docs/build/incremental-models.md)
---

### Intermediate: Purpose-built transformation steps

Once we’ve got our atoms ready to work with, we’ll set about bringing them together into more intricate, connected molecular shapes. The intermediate layer is where these molecules live, creating varied forms with specific purposes on the way towards the more complex proteins and cells we’ll use to breathe life into our data products.

##### Intermediate: Files and folders[​](#intermediate-files-and-folders "Direct link to Intermediate: Files and folders")

Let’s take a look at the intermediate layer of our project to understand the purpose of this stage more concretely.

```shell
models/intermediate
└── finance
    ├── _int_finance__models.yml
    └── int_payments_pivoted_to_orders.sql
```

* **Folders**
  * ✅ **Subdirectories based on business groupings.** Much like the staging layer, we’ll house this layer of models inside their own `intermediate` subfolder. Unlike the staging layer, here we shift towards being business-conformed, splitting our models up into subdirectories not by their source system, but by their area of business concern.
* **File names**
  * ✅ `int_[entity]s_[verb]s.sql` - the variety of transformations that can happen inside of the intermediate layer makes it harder to dictate strictly how to name them. The best guiding principle is to think about *verbs* (e.g. `pivoted`, `aggregated_to_user`, `joined`, `fanned_out_by_quantity`, `funnel_created`, etc.) in the intermediate layer. In our example project, we use an intermediate model to pivot payments out to the order grain, so we name our model `int_payments_pivoted_to_orders`. It’s easy for anybody to quickly understand what’s happening in that model, even if they don’t know [SQL](https://mode.com/sql-tutorial/). That clarity is worth the long file name. It’s important to note that we’ve dropped the double underscores at this layer.
In moving towards business-conformed concepts, we no longer need to separate the system and the entity; we can simply reference the unified entity where possible. In cases where you need intermediate models to operate at the source system level (e.g. `int_shopify__orders_summed`, `int_core__orders_summed`, which you would later union), you’d preserve the double underscores. Some people like to separate the entity and verbs with double underscores as well. That’s a matter of preference, but in our experience, there is often an intrinsic connection between entities and verbs in this layer that makes that difficult to maintain.

Don’t over-optimize too early!

The example project is very simple for illustrative purposes. This level of division in our post-staging layers is probably unnecessary when dealing with these few models. Remember, our goal is a *single source of truth.* We don’t want finance and marketing operating on separate `orders` models; we want to use our dbt project as a means to bring those definitions together! As such, don’t split and optimize too early. If you have fewer than 10 mart models and aren’t having problems developing and using them, feel free to forgo subdirectories completely (except in the staging layer, where you should always implement them as you add new source systems to your project) until the project has grown to really need them. Using dbt is always about bringing simplicity to complexity.

##### Intermediate: Models[​](#intermediate-models "Direct link to Intermediate: Models")

Below is the lone intermediate model from our small example project. This represents an excellent use case per our principles above, serving a clear single purpose: grouping and pivoting a staging model to a different grain.
It utilizes a bit of Jinja to make the model DRY-er (striving to be DRY applies to the code we write inside a single model, in addition to transformations across the codebase), but don’t be intimidated if you’re not quite comfortable with [Jinja](https://docs.getdbt.com/docs/build/jinja-macros.md) yet. Looking at the name of the CTE, `pivot_and_aggregate_payments_to_order_grain`, we get a very clear idea of what’s happening inside this block. By descriptively labeling the transformations happening inside our CTEs within our model, just as we do with our files and folders, even a stakeholder who doesn’t know SQL would be able to grasp the purpose of this section, if not the code. As you begin to write more complex transformations moving out of the staging layer, keep this idea in mind. In the same way our models connect into a DAG and tell the story of our transformations on a macro scale, CTEs can do this on a smaller scale inside our model files.

```sql
-- int_payments_pivoted_to_orders.sql

{%- set payment_methods = ['bank_transfer','credit_card','coupon','gift_card'] -%}

with payments as (

    select * from {{ ref('stg_stripe__payments') }}

),

pivot_and_aggregate_payments_to_order_grain as (

    select
        order_id,
        {% for payment_method in payment_methods -%}
        sum(
            case
                when payment_method = '{{ payment_method }}' and status = 'success'
                then amount
                else 0
            end
        ) as {{ payment_method }}_amount,
        {%- endfor %}
        sum(case when status = 'success' then amount end) as total_amount
    from payments
    group by 1

)

select * from pivot_and_aggregate_payments_to_order_grain
```

* ❌ **Exposed to end users.** Intermediate models should generally not be exposed in the main production schema. They are not intended for output to final targets like dashboards or applications, so it’s best to keep them separated from models that are, so you can more easily control data governance and discoverability.
* ✅ **Materialized ephemerally.** Considering the above, one popular option is to default to intermediate models being materialized [ephemerally](https://docs.getdbt.com/docs/build/materializations.md#ephemeral). This is generally the best place to start for simplicity. It will keep unnecessary models out of your warehouse with minimum configuration. Keep in mind though that the simplicity of ephemerals does translate to a bit more difficulty in troubleshooting, as they’re interpolated into the models that `ref` them, rather than existing on their own in a way that you can view the output of.
* ✅ **Materialized as views in a custom schema with special permissions.** A more robust option is to materialize your intermediate models as views in a specific [custom schema](https://docs.getdbt.com/docs/build/custom-schemas.md), outside of your main production schema. This gives you added insight into development and easier troubleshooting as the number and complexity of your models grows, while remaining easy to implement and taking up negligible space.

Keep your warehouse tidy!

There are three interfaces to the organizational knowledge graph we’re encoding into dbt: the DAG, the files and folder structure of our codebase, and the output into the warehouse. As such, it’s really important that we consider that output intentionally! Think of the schemas, tables, and views we’re creating in the warehouse as *part of the UX,* in addition to the dashboards, ML, apps, and other use cases you may be targeting for the data. Ensuring that our output is named and grouped well, and that models not intended for broad use are either not materialized or built into special areas with specific permissions, is crucial to achieving this.

Intermediate models serve to break up complexity from our marts models, so their purposes can take as many forms as [data transformation](https://www.getdbt.com/analytics-engineering/transformation/) might require.
Some of the most common use cases of intermediate models include:

* ✅ **Structural simplification.** Bringing together a reasonable number (typically 4 to 6) of entities or concepts (staging models, or perhaps other intermediate models) that will be joined with another similarly purposed intermediate model to generate a mart — rather than have 10 joins in our mart, we can join two intermediate models that each house a piece of the complexity, giving us increased readability, flexibility, testing surface area, and insight into our components.
* ✅ **Re-graining.** Intermediate models are often used to fan out or collapse models to the right composite grain — if we’re building a mart for `order_items` that requires us to fan out our `orders` based on the `quantity` column, creating a new single row for each item, this would be ideal to do in a specific intermediate model to maintain clarity in our mart and more easily verify that our grain is correct before we mix it with other components.
* ✅ **Isolating complex operations.** It’s helpful to move any particularly complex or difficult-to-understand pieces of logic into their own intermediate models. This not only makes them easier to refine and troubleshoot, but simplifies later models that can reference this concept in a more clearly readable way. For example, in the `quantity` fan-out example above, we benefit by isolating this complex piece of logic so we can quickly debug and thoroughly test that transformation, and downstream models can reference `order_items` in a way that’s intuitively easy to grasp.

Narrow the DAG, widen the tables.

Until we get to the marts layer and start building our various outputs, we ideally want our DAG to look like an arrowhead pointed right. As we move from source-conformed to business-conformed, we’re also moving from numerous, narrow, isolated concepts to fewer, wider, joined concepts.
We’re bringing our components together into wider, richer concepts, and that creates this shape in our DAG. This way when we get to the marts layer we have a robust set of components that can quickly and easily be put into any configuration to answer a variety of questions and serve specific needs. One rule of thumb to ensure you’re following this pattern on an individual model level is allowing multiple *inputs* to a model, but **not** multiple *outputs*. Several arrows going *into* our post-staging models is great and expected; several arrows coming *out* is a red flag. There are absolutely situations where you need to break this rule, but it’s something to be aware of, careful about, and avoid when possible.

---

### Intro to dbt Mesh

#### What is dbt Mesh?[​](#what-is-dbt-mesh "Direct link to What is dbt Mesh?")

Organizations of all sizes rely upon dbt to manage their data transformations, from small startups to large enterprises. At scale, it can be challenging to coordinate all the organizational and technical requirements demanded by your stakeholders within the scope of a single dbt project. To date, there also hasn't been a first-class way to effectively manage the dependencies, governance, and workflows between multiple dbt projects. That's where **dbt Mesh** comes in - empowering data teams to work *independently and collaboratively*, sharing data, code, and best practices without sacrificing security or autonomy.
dbt Mesh is not a single product - it is a pattern enabled by a convergence of several features in dbt:

* **[Cross-project references](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref)** - this is the foundational feature that enables multi-project deployments. `{{ ref() }}`s now work across dbt projects on Enterprise and Enterprise+ plans.
* **[Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md)** - dbt's metadata-powered documentation platform, complete with full, cross-project lineage.
* **Governance** - dbt's governance features allow you to manage access to your dbt models both within and across projects.
  * **[Groups](https://docs.getdbt.com/docs/mesh/govern/model-access.md#groups)** - with groups, you can organize nodes in your dbt DAG that share a logical connection (for example, by functional area) and assign an owner to the entire group.
  * **[Access](https://docs.getdbt.com/docs/mesh/govern/model-access.md#access-modifiers)** - access configs allow you to control who can reference models.
* **[Model Versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md)** - when coordinating across projects and teams, we recommend treating your data models as stable APIs. Model versioning is the mechanism to allow graceful adoption and deprecation of models as they evolve.
* **[Model Contracts](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md)** - data contracts set explicit expectations on the shape of the data to ensure data changes upstream of dbt or within a project's logic don't break downstream consumers' data products.

#### When is the right time to use dbt Mesh?[​](#when-is-the-right-time-to-use-dbt-mesh "Direct link to When is the right time to use dbt Mesh?")

The multi-project architecture helps organizations with mature, complex transformation workflows in dbt increase the flexibility and performance of their dbt projects.
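As a concrete sketch of the cross-project references mentioned above (the project and model names here are illustrative assumptions): once the upstream project is declared in the downstream project's `dependencies.yml`, a model can use the two-argument form of `ref`:

```sql
-- models/marts/finance_rollup.sql in the downstream project
-- (assumes `jaffle_finance` is listed under `projects:` in dependencies.yml)
select
    order_id,
    order_total
from {{ ref('jaffle_finance', 'fct_orders') }}
```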
If you're already using dbt and your project has started to experience any of the following, you're likely ready to start exploring this paradigm:

* The **number of models** in your project is degrading performance and slowing down development.
* Teams have developed **separate workflows** and need to decouple development from each other.
* Teams are experiencing **communication challenges**, and the reliability of some of your data products has started to deteriorate.
* **Security and governance** requirements are increasing and would benefit from increased isolation.

dbt is designed to coordinate the features above and simplify their complexity to solve these problems. If you're just starting your dbt journey, don't worry about building a multi-project architecture right away. You can *incrementally* adopt the features in this guide as you scale. The collection of features works effectively as independent tools. Familiarizing yourself with the tooling and features that make up a multi-project architecture, and how they can apply to your organization, will help you make better decisions as you grow. For additional information, refer to the [Mesh FAQs](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-5-faqs.md).

#### Learning goals[​](#learning-goals "Direct link to Learning goals")

* Understand the **purpose and tradeoffs** of building a multi-project architecture.
* Develop an intuition for various **dbt Mesh patterns** and how to design a multi-project architecture for your organization.
* Establish recommended steps to **incrementally adopt** these patterns in your dbt implementation.

tip

To help you get started, check out our [Quickstart with Mesh](https://docs.getdbt.com/guides/mesh-qs.md) or our online [Mesh course](https://learn.getdbt.com/courses/dbt-mesh) to learn more!
---

### Intro to the dbt Semantic Layer

tip

Note that this best practices guide doesn't yet use the [new YAML specification](https://docs.getdbt.com/docs/build/latest-metrics-spec.md). We're working on updating this guide to use the new spec and file structure soon! To read more about the new spec, see [Creating metrics](https://docs.getdbt.com/docs/build/metrics-overview.md).

Flying cars, hoverboards, and true self-service analytics: this is the future we were promised. The first two might still be a few years out, but real self-service analytics is here today. With dbt's Semantic Layer, you can resolve the tension between accuracy and flexibility that has hampered analytics tools for years, empowering everybody in your organization to explore a shared reality of metrics. Best of all for analytics engineers, building with these new tools will significantly [DRY](https://docs.getdbt.com/terms/dry) up and simplify your codebase. As you'll see, the deep interaction between your dbt models and the Semantic Layer makes your dbt project the ideal place to craft your metrics.

#### Learning goals[​](#learning-goals "Direct link to Learning goals")

* ❓ Understand the **purpose and capabilities** of the **Semantic Layer**, particularly MetricFlow as the engine that powers it.
* 🧱 Become familiar with the core components of MetricFlow — **semantic models and metrics** — and how they work together.
* 🔁 Know how to **refactor** dbt models for the Semantic Layer.
* 🏅 Be aware of **best practices** to take maximum advantage of the Semantic Layer.

#### Guide structure overview[​](#guide-structure-overview "Direct link to Guide structure overview")

1.
Getting **set up** in your dbt project. 2. Building a **semantic model** and its fundamental parts: **entities, dimensions, and measures**. 3. Building a **metric**. 4. Defining **advanced metrics**: `ratio` and `derived` types. 5. **File and folder structure**: establishing a system for naming things. 6. **Refactoring** marts and roll-ups for the Semantic Layer. 7. Review **best practices**.

If you're ready to ship your users more power and flexibility with less code, let's dive in!

info

MetricFlow is the engine for defining metrics in dbt and one of the key components of the [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md). It handles SQL query construction and defines the specification for dbt semantic models and metrics. To fully experience the Semantic Layer, including the ability to query dbt metrics via external integrations, you'll need a [dbt Starter, Enterprise, or Enterprise+ account](https://www.getdbt.com/pricing/). Refer to [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md) for more information.

---

### Lambda views for near real-time dashboards

Snowflake examples ahead

This page uses Snowflake for code examples, but you can adapt the lambda view pattern to other warehouses.

A lambda view pattern combines a batch / incremental fact table with a small near real-time (NRT) slice of very recent data and exposes them through a single view. This is a legacy-but-still-useful pattern some teams have used to deliver near real‑time operational dashboards on top of dbt and a warehouse.
#### When to use lambda views[​](#when-to-use-lambda-views "Direct link to When to use lambda views")

* You need fresher reads than your normal incremental schedule, but
* You can't (or don't want to) use [dynamic tables](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#dynamic-tables) or [materialized views](https://docs.getdbt.com/docs/build/materializations.md#materialized-view), or you want to keep logic entirely in dbt SQL.

##### Assumptions[​](#assumptions "Direct link to Assumptions")

The examples used in this page assume the following setup:

* Raw events land continuously into `raw.events` using your warehouse's streaming ingestion feature (like Snowpipe, Databricks Auto Loader, Kafka, or a similar ingestion mechanism).
* You already maintain an [incremental fact table](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/2-incremental-patterns.md#incremental-merge-from-append-only-tables) that is rebuilt every few minutes using `incremental_strategy='merge'`.
* Most dashboards are fine reading from that incremental table, but a small set of operational dashboards want "as‑of‑now" data (for example, the last few minutes of events).

##### How this pattern works[​](#how-this-pattern-works "Direct link to How this pattern works")

* The base incremental table is rebuilt every few minutes using `incremental_strategy='merge'`.
* The NRT view is a view that selects only events newer than the max `event_ts` already persisted in the base incremental table.
* The lambda view `UNION ALL`s the base table and the NRT view; the NRT view's timestamp filter keeps the two sides from overlapping. Downstream BI or dashboards query only the lambda view.
#### Base incremental table[​](#base-incremental-table "Direct link to Base incremental table")

You can reuse the incremental `merge` from [Snowflake pattern 1](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/2-incremental-patterns.md#incremental-merge-from-append-only-tables) as your base table; for completeness:

```sql
-- models/fct_events.sql

{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'merge',
        unique_key = 'event_id',
        cluster_by = ['event_date'],
        snowflake_warehouse = 'TRANSFORM_WH'
    )
}}

with source_events as (

    select
        event_id,
        event_ts::timestamp_ntz as event_ts,
        to_date(event_ts) as event_date,
        user_id,
        event_type,
        payload
    from {{ source('raw', 'events') }}

    {% if is_incremental() %}
    -- Only pull new/changed rows since last successful load
    where event_ts > (select max(event_ts) from {{ this }})
    {% endif %}

)

select * from source_events
```

Schedule this model to run, for example, every 5–15 minutes as part of your near real-time job.

#### NRT view: rows more recent than the base table[​](#nrt-view-rows-more-recent-than-the-base-table "Direct link to NRT view: rows more recent than the base table")

The NRT view returns only events with `event_ts` greater than the maximum timestamp in the base table, so there is no overlap or double counting:

```sql
-- models/fct_events_nrt.sql

{{
    config(
        materialized = 'view'
    )
}}

with base_max as (

    select max(event_ts) as max_event_ts
    from {{ ref('fct_events') }}

),

fresh_events as (

    select
        e.event_id,
        e.event_ts::timestamp_ntz as event_ts,
        to_date(e.event_ts) as event_date,
        e.user_id,
        e.event_type,
        e.payload
    from {{ source('raw', 'events') }} as e
    cross join base_max
    where e.event_ts > base_max.max_event_ts

)

select * from fresh_events
```

Characteristics:

* No scheduling required — it's just a view over `raw.events` filtered by `max(event_ts)` from `fct_events`.
* Every query against `fct_events_nrt` scans only "since last batch" data, which should be a small time window (for example, a few minutes or hours, depending on your job cadence).

##### Lambda view: single read path for BI[​](#lambda-view-single-read-path-for-bi "Direct link to Lambda view: single read path for BI")

The lambda view combines historical data from the base incremental table with the most recent events from the NRT view.

```sql
-- models/fct_events_lambda.sql

{{
    config(
        materialized = 'view'
    )
}}

select
    event_id,
    event_ts,
    event_date,
    user_id,
    event_type,
    payload
from {{ ref('fct_events') }}

union all

select
    event_id,
    event_ts,
    event_date,
    user_id,
    event_type,
    payload
from {{ ref('fct_events_nrt') }}
```

Point your BI tools and dashboards to `analytics.fct_events_lambda`. Most data comes from the pre-computed incremental table, while the most recent events (since the last dbt run) come from a live query against `raw.events`.

This approach is outlined in [this original dbt lambda view blog post](https://discourse.getdbt.com/t/how-to-create-near-real-time-models-with-just-dbt-sql/1457), which describes how teams like JetBlue wired near real-time operational dashboards on Snowflake and dbt.

#### Considerations[​](#considerations "Direct link to Considerations")

Take the following into consideration when using this pattern:

* **Cost profile**
  * Every query against `fct_events_lambda` must read the NRT slice from `raw.events` in addition to the base table.
  * Use this pattern only for truly operational dashboards that justify the extra per-query cost.
* **Freshness**
  * Freshness is bounded by:
    * Your dbt incremental job frequency (age of `fct_events`), plus
    * Ingestion latency into `raw.events` (Snowpipe / streaming layer).
* **Complexity vs alternatives**
  * For many modern Snowflake implementations, a [dynamic table](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#dynamic-tables) or [materialized view](https://docs.getdbt.com/docs/build/materializations.md#materialized-view) with a small `target_lag` can provide similar "always within X minutes" service level agreements with less custom SQL and warehouse-managed incremental logic.
  * Lambda views are best positioned as an *advanced / legacy pattern* you can still use when you:
    * Want all logic in dbt SQL
    * Lack the right warehouse feature in your environment
    * Are extending an existing implementation already built this way

---

### Marts: Business-defined entities

info

Our guidance here diverges if you use the Semantic Layer. In a project without the Semantic Layer we recommend you denormalize heavily, per the best practices below. On the other hand, if you're using the Semantic Layer, we want to stay as normalized as possible to allow MetricFlow the most flexibility. See [The dbt Semantic Layer and marts](#the-dbt-semantic-layer-and-marts) for more information.

This is the layer where everything comes together and we start to arrange all of our atoms (staging models) and molecules (intermediate models) into full-fledged cells that have identity and purpose. We sometimes like to call this the *entity layer* or *concept layer*, to emphasize that all our marts are meant to represent a specific entity or concept at its unique grain.
For instance, an order, a customer, a territory, a click event, a payment — each of these would be represented with a distinct mart, and each row would represent a discrete instance of these concepts. Unlike in a traditional Kimball star schema though, in modern data warehousing — where storage is cheap and compute is expensive — we'll happily borrow and add any and all data from other concepts that are relevant to answering questions about the mart's core entity. Building the same data in multiple places, as we do with `orders` in our `customers` mart example below, is more efficient in this paradigm than repeatedly rejoining these concepts (this is a basic definition of denormalization in this context). Let's take a look at how we approach this first layer intended expressly for exposure to end users.

##### Marts: Files and folders[​](#marts-files-and-folders "Direct link to Marts: Files and folders")

The last layer of our core transformations is below, providing models for both `finance` and `marketing` departments.

```shell
models/marts
├── finance
│   ├── _finance__models.yml
│   ├── orders.sql
│   └── payments.sql
└── marketing
    ├── _marketing__models.yml
    └── customers.sql
```

✅ **Group by department or area of concern.** If you have fewer than 10 or so marts you may not have much need for subfolders, so as with the intermediate layer, don't over-optimize too early. If you do find yourself needing to insert more structure and grouping though, use useful business concepts here. In our marts layer, we're no longer worried about source-conformed data, so grouping by departments (marketing, finance, etc.) is the most common structure at this stage.

✅ **Name by entity.** Use plain English to name the file based on the concept that forms the grain of the mart: `customers`, `orders`. For pure marts, there should not be a time dimension (`orders_per_day`) here; that kind of rollup is typically best captured via metrics.
❌ **Build the same concept differently for different teams.** `finance_orders` and `marketing_orders` are typically considered an anti-pattern. There are, as always, exceptions — a common pattern we see is that finance may have specific needs, for example reporting revenue to the government in a way that diverges from how the company as a whole measures revenue day-to-day. Just make sure that these are clearly designed and understandable as *separate* concepts, not departmental views on the same concept: `tax_revenue` and `revenue`, not `finance_revenue` and `marketing_revenue`.

##### Marts: Models[​](#marts-models "Direct link to Marts: Models")

Finally we'll take a look at the best practices for models within the marts directory by examining two of our marts models. These are the business-conformed entities — that is, crafted to our vision and needs — that we've been bringing these transformed components together to create.

```sql
-- orders.sql

with orders as (

    select * from {{ ref('stg_jaffle_shop__orders') }}

),

order_payments as (

    select * from {{ ref('int_payments_pivoted_to_orders') }}

),

orders_and_order_payments_joined as (

    select
        orders.order_id,
        orders.customer_id,
        orders.order_date,
        coalesce(order_payments.total_amount, 0) as amount,
        coalesce(order_payments.gift_card_amount, 0) as gift_card_amount
    from orders
    left join order_payments on orders.order_id = order_payments.order_id

)

select * from orders_and_order_payments_joined
```

```sql
-- customers.sql

with customers as (

    select * from {{ ref('stg_jaffle_shop__customers') }}

),

orders as (

    select * from {{ ref('orders') }}

),

customer_orders as (

    select
        customer_id,
        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders,
        sum(amount) as lifetime_value
    from orders
    group by 1

),

customers_and_customer_orders_joined as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders,
        customer_orders.lifetime_value
    from customers
    left join customer_orders on customers.customer_id = customer_orders.customer_id

)

select * from customers_and_customer_orders_joined
```

* ✅ **Materialized as tables or incremental models.** Once we reach the marts layer, it's time to start building not just our logic into the warehouse, but the data itself. This gives end users much faster performance for these later models that are actually designed for their use, and saves us the cost of recomputing entire chains of models every time somebody refreshes a dashboard or runs a regression in Python. A good general rule of thumb regarding materialization is to always start with a view (as it takes up essentially no storage and always gives you up-to-date results); once that view takes too long to practically *query*, build it into a table; and finally, once that table takes too long to *build* and is slowing down your runs, [configure it as an incremental model](https://docs.getdbt.com/docs/build/incremental-models.md). As always, start simple and only add complexity as necessary. The models with the most data and compute-intensive transformations should absolutely take advantage of dbt's excellent incremental materialization options, but rushing to make all your marts models incremental by default will introduce superfluous difficulty. We recommend reading this [classic post from Tristan on the limits of incremental modeling](https://discourse.getdbt.com/t/on-the-limits-of-incrementality/303).
* ✅ **Wide and denormalized.** Unlike in old-school warehousing, in the modern data stack storage is cheap and it's compute that is expensive and must be prioritized, so the goal is to pack marts into very wide, denormalized tables that provide everything somebody needs to know about a concept.
* ❌ **Too many joins in one mart.** One good rule of thumb when building dbt transformations is to avoid bringing together too many concepts in a single mart. What constitutes 'too many' can vary. If you need to bring 8 staging models together with nothing but simple joins, that might be fine. Conversely, if you have 4 concepts you're weaving together with some complex and computationally heavy window functions, that could be too much. You need to weigh the number of models you're joining against the complexity of the logic within the mart, and if it's too much to read through and build a clear mental model of, then look to modularize. While this isn't a hard rule, if you're bringing together more than 4 or 5 concepts to create your mart, you may benefit from adding some intermediate models for added clarity. Two intermediate models that bring together three concepts each, and a mart that brings together those two intermediate models, will typically result in a much more readable chain of logic than a single mart with six joins.
* ✅ **Build on separate marts thoughtfully.** While we strive to preserve a narrowing DAG up to the marts layer, once here things may start to get a little less strict. A common example is passing information between marts at different grains, as we saw above, where we bring our `orders` mart into our `customers` mart to aggregate critical order data into a `customer` grain. Now that we're really 'spending' compute and storage by actually building the data in our outputs, it's sensible to leverage previously built resources to speed up and save costs on outputs that require similar data, versus recomputing the same views and CTEs from scratch. The right approach here is heavily dependent on your unique DAG, models, and goals — it's just important to note that using a mart in building another, later mart is okay, but requires careful consideration to avoid wasted resources or circular dependencies.

Marts are entity-grained.
The most important aspect of marts is that they contain all of the useful data about a *particular entity* at a granular level. That doesn't mean we don't bring in lots of other entities and concepts, like tons of `user` data into our `orders` mart — we do! It just means that individual `orders` remain the core grain of our table. If we start grouping `users` and `orders` along a [date spine](https://github.com/dbt-labs/dbt-utils#date_spine-source), into something like `user_orders_per_day`, we're moving past marts into *metrics*.

##### Marts: Other considerations[​](#marts-other-considerations "Direct link to Marts: Other considerations")

* **Troubleshoot via tables.** While stacking views and ephemeral models up until our marts — only building data into the warehouse at the end of a chain when we have the models we really want end users to work with — is ideal in production, it can present some difficulties in development. In particular, certain errors may seem to surface in our later models when they actually stem from much earlier dependencies in our model chain (ancestor models in our DAG that are built before the model that throws the error). If you're having trouble pinning down where or what a database error is telling you, it can be helpful to temporarily build a specific chain of models as tables so that the warehouse will throw the error where it's actually occurring.

##### The dbt Semantic Layer and marts[​](#the-dbt-semantic-layer-and-marts "Direct link to The dbt Semantic Layer and marts")

Our structural recommendations are impacted quite a bit by whether or not you're using the Semantic Layer. If you're using the Semantic Layer, we recommend a more normalized approach to your marts. If you're not using the Semantic Layer, we recommend the more denormalized approach that has become typical in dbt projects.
For the full list of recommendations on structure, naming, and organization in the Semantic Layer, check out the [How we build our metrics](https://docs.getdbt.com/best-practices/how-we-build-our-metrics/semantic-layer-1-intro.md) guide, particularly the [Refactoring an existing rollup](https://docs.getdbt.com/best-practices/how-we-build-our-metrics/semantic-layer-8-refactor-a-rollup.md) section.

---

### Materializations best practices

What *really* happens when you type `dbt build`? Contrary to popular belief, a crack team of microscopic data elves does *not* construct your data row by row, although the truth feels equally magical. This guide explores the real answer to that question, with an introductory look at the objects that get built into your warehouse, why they matter, and how dbt knows what to build.

Learn by video!

For video tutorials on snapshots, go to dbt Learn and check out the [Snapshots course](https://learn.getdbt.com/courses/snapshots).

The configurations that tell dbt how to construct these objects are called *materializations*, and knowing how to use them is a crucial skill for effective analytics engineering. When you've completed this guide, you will have the ability to use the three core materializations that cover most common analytics engineering situations.

info

😌 **Materializations abstract away DDL and DML**. Typically in raw SQL- or Python-based [data transformation](https://www.getdbt.com/analytics-engineering/transformation/), you have to write specific imperative instructions on how to build or modify your data objects.
dbt's materializations make this declarative: we tell dbt how we want things to be constructed, and it figures out how to do that given the unique conditions and qualities of our warehouse.

##### Learning goals[​](#learning-goals "Direct link to Learning goals")

By the end of this guide you should have a solid understanding of:

* 🛠️ what **materializations** are
* 👨‍👨‍👧 how the three main materializations that ship with dbt — **table**, **view**, and **incremental** — differ
* 🗺️ **when** and **where** to use specific materializations to optimize your development and production builds
* ⚙️ how to **configure materializations** at various scopes, from an individual model to an entire folder

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* 📒 You'll want to have worked through the [quickstart guide](https://docs.getdbt.com/guides.md) and have a project set up to work through these concepts.
* 🏃🏻‍♀️ Concepts like dbt runs, `ref()` statements, and models should be familiar to you.
* 🔧 \[**Optional**] Reading through the [How we structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md) guide will be beneficial for the last section of this guide, when we review best practices for materializations using the dbt project approach of staging models and marts.

##### Guiding principle[​](#guiding-principle "Direct link to Guiding principle")

We'll explore this in depth throughout, but the basic guideline is **start as simple as possible**. We'll follow a tiered approach, only moving up a tier when it's necessary.

* 🔍 **Start with a view.** When the view takes too long to *query* for end users,
* ⚒️ **Make it a table.** When the table takes too long to *build* in your dbt Jobs,
* 📚 **Build it incrementally.** That is, layer the data on in chunks as it comes in.
---

### More advanced metrics

#### More advanced metric types[​](#more-advanced-metric-types "Direct link to More advanced metric types")

We're not limited to just passing measures through to our metrics; we can also *combine* measures to model more advanced metrics.

* 🍊 **Ratio** metrics are, as the name implies, about **comparing two metrics as a numerator and a denominator** to form a new metric, for instance the percentage of order items that are food items instead of drinks.
* 🧱 **Derived** metrics are when we want to **write an expression** that calculates a metric **using multiple metrics**. A classic example here is gross profit, calculated by subtracting costs from revenue.
* ➕ **Cumulative** metrics calculate all of a **measure over a given window**, such as the past week, or if no window is supplied, the all-time total of that measure.

#### Ratio metrics[​](#ratio-metrics "Direct link to Ratio metrics")

* 🔢 We need to establish one measure that will be our **numerator**, and one that will be our **denominator**.
* 🥪 Let's calculate the **percentage** of our Jaffle Shop revenue that **comes from food items**.
* 💰 We already have our denominator, revenue, but we'll want to **make a new metric for our numerator** called `food_revenue`.

models/marts/orders.yml

```yml
- name: food_revenue
  description: The revenue from food in each order.
  label: Food Revenue
  type: simple
  type_params:
    measure: food_revenue
```

* 📝 Now we can set up our ratio metric.

models/marts/orders.yml

```yml
- name: food_revenue_pct
  description: The % of order revenue from food.
  label: Food Revenue %
  type: ratio
  type_params:
    numerator: food_revenue
    denominator: revenue
```

#### Derived metrics[​](#derived-metrics "Direct link to Derived metrics")

* 🆙 Now let's really have some fun. One of the most important metrics for any business is not just revenue, but *revenue growth*. Let's use a derived metric to build month-over-month revenue growth.
* ⚙️ A derived metric has a couple of key components:
  * 📚 A list of metrics to build on. These can be manipulated and filtered in various ways; here we'll use the `offset_window` property to lag by a month.
  * 🧮 An expression that performs a calculation with these metrics.
* With these parts we can assemble complex logic that would otherwise need to be 'frozen' in logical models.

models/marts/orders.yml

```yml
- name: revenue_growth_mom
  description: "Percentage growth of revenue compared to 1 month ago. Excludes tax."
  type: derived
  label: Revenue Growth % M/M
  type_params:
    expr: (current_revenue - revenue_prev_month) * 100 / revenue_prev_month
    metrics:
      - name: revenue
        alias: current_revenue
      - name: revenue
        offset_window: 1 month
        alias: revenue_prev_month
```

#### Cumulative metrics[​](#cumulative-metrics "Direct link to Cumulative metrics")

* ➕ Lastly, let's build a **cumulative metric**. In keeping with our theme of business priorities, let's continue with revenue and build an **all-time revenue metric** for any given time window.
* 🪟 All we need to do is indicate the type is `cumulative` and not supply a `window` in the `type_params`, which indicates we want the cumulative total over the entire time period our end users select.

models/marts/orders.yml

```yml
- name: cumulative_revenue
  description: The cumulative revenue for all orders.
  label: Cumulative Revenue (All Time)
  type: cumulative
  type_params:
    measure: revenue
```
---

### Near real-time data in dbt

By design, dbt is batch-oriented, with jobs having a defined start and end time. But did you know that you can also use dbt to get near real-time data by combining your data warehouse's continuous ingestion with frequent dbt transformations?

This guide covers multiple patterns for achieving near real-time data freshness with dbt:

1. [Incremental patterns](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/2-incremental-patterns.md) — `merge` strategies, Change Data Capture (CDC), and microbatch processing
2. [Warehouse-native features](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/3-warehouse-native-features.md) — When to use dynamic tables and materialized views
3. [Lambda views pattern](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/4-lambda-views.md) — Combining batch and real-time data in a single view
4. [Views-only pattern](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/5-views-only-pattern.md) — Maximum freshness for lightweight transformations
5. [Operational considerations](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/6-operational-considerations.md) — Challenges, risks, and cost management

Each pattern includes practical code examples, use cases, and tradeoffs to help you choose the right approach. Anyone can use this guide, but it's primarily for data engineers and architects who want to achieve near real-time data freshness with dbt.
#### Where does dbt fit?[​](#where-does-dbt-fit "Direct link to Where does dbt fit?")

There are two main freshness tiers to consider when using dbt for near real-time data:

* For near real-time (5–15 minutes) — dbt excels at this and is well-suited for most operational dashboards.
* For true real-time (sub-second) — This requires dedicated streaming databases (ClickHouse, Materialize, Rockset, and so on) in front of or alongside dbt; dbt still owns "analytic" tables and history but not the ultra-low-latency read path.

#### How dbt achieves near real-time data[​](#how-dbt-achieves-near-real-time-data "Direct link to How dbt achieves near real-time data")

To achieve near real-time data with dbt, we recommend using a two-layer architecture:

###### Ingestion layer[​](#ingestion-layer "Direct link to Ingestion layer")

Continuous data landing using your data warehouse's streaming ingestion features, such as [streaming tables](https://docs.databricks.com/en/sql/load-data-streaming-table.html), [Snowpipe](https://docs.snowflake.com/en/user-guide/snowpipe-streaming/data-load-snowpipe-streaming-overview), or the [Storage Write API](https://docs.cloud.google.com/bigquery/docs/write-api-streaming). To find streaming ingestion features for your warehouse, refer to the [additional resources](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/3-warehouse-native-features.md#resources-by-warehouse) section.

###### dbt transformation layer[​](#dbt-transformation-layer "Direct link to dbt transformation layer")

Run dbt every few minutes to transform the data, and use materialized views or dynamic tables for the lowest-latency reporting.
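As a sketch of the lowest-latency option, a Snowflake dynamic table can be defined directly as a dbt model; the model, staging model, and warehouse names below are hypothetical, and this assumes the dbt-snowflake adapter's `dynamic_table` materialization:

```sql
-- models/fct_events_rt.sql (hypothetical model name)
-- Snowflake keeps this table within roughly target_lag of its upstream data;
-- dbt manages the object, Snowflake manages the refreshes.

{{
    config(
        materialized = 'dynamic_table',
        snowflake_warehouse = 'TRANSFORM_WH',  -- assumed warehouse name
        target_lag = '1 minute'
    )
}}

select
    event_id,
    event_ts,
    user_id,
    event_type
from {{ ref('stg_events') }}  -- assumed staging model
```

With `target_lag` set, Snowflake decides when to refresh; there is no dbt job schedule to tune for this model.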
Specific transformation approaches include:

* [Incremental models](https://docs.getdbt.com/docs/build/incremental-models-overview.md) with merge or append strategies
* [Microbatch incremental strategy](https://docs.getdbt.com/docs/build/incremental-microbatch.md) for large time-series tables
* Jobs scheduled very frequently (like every 5 minutes)
* [Dynamic tables](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#dynamic-tables) or [materialized views](https://docs.getdbt.com/docs/build/materializations.md#materialized-view) with short refresh intervals

#### Key recommendations[​](#key-recommendations "Direct link to Key recommendations")

The following are some key recommendations to help you achieve near real-time data freshness with dbt:

* Ingest data continuously: Use your warehouse's native streaming or micro-batch ingestion to land raw data as soon as it arrives.
* Transform with dbt on a frequent schedule: Schedule dbt jobs to run as often as your business needs allow (for example, every 1–15 minutes). Balance freshness with cost and resource constraints.
* Materialized views and dynamic tables: For the lowest-latency reporting, use materialized views or dynamic tables. These can be refreshed as frequently as every minute.
* Incremental models and microbatching: Use dbt's incremental models to process only new or changed data, keeping transformations efficient and scalable.
* Decouple ingestion from transformation: Keep data acquisition and transformation flows separate. This allows you to optimize each independently.
* Monitor and test data freshness: Implement data quality checks and freshness monitoring to ensure your near real-time pipelines deliver accurate, up-to-date results.
* Cost and complexity considerations: Running dbt jobs more frequently drives up compute costs and operational complexity. Always weigh the business value against these trade-offs.
---

### Now it's your turn

#### BYO Styles[​](#byo-styles "Direct link to BYO Styles")

Now that you've seen how we style our dbt projects, it's time to build your own. Feel free to copy this guide and use it as a template for your own project. If you do, we'd love to hear about it! Reach out to us on [the Community Forum](https://discourse.getdbt.com/c/show-and-tell/22) or [Slack](https://www.getdbt.com/community) to share your style guide. We recommend co-locating your style guide with your code to make sure contributors can easily follow it. If you're using GitHub, you can add your style guide to your repository's wiki, or include it in your README.

#### Pre-commit hooks[​](#pre-commit-hooks "Direct link to Pre-commit hooks")

You can use [pre-commit hooks](https://pre-commit.com/) to automatically check your code for style violations (and often fix them automagically) before you commit. This is a great way to make sure all contributors follow your style guide. We recommend implementing this once you've settled on and published your style guide, and your codebase is conforming to it. This will ensure that all future commits follow the style guide. You can find an excellent set of open source pre-commit hooks for dbt from the community [here in the dbt-checkpoint project](https://github.com/dbt-checkpoint/dbt-checkpoint).

#### dbt Project Evaluator[​](#dbt-project-evaluator "Direct link to dbt Project Evaluator")

The [`dbt_project_evaluator`](https://github.com/dbt-labs/dbt-project-evaluator) is a package that checks compliance with [dbt's style guide and best practices](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md).
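As a sketch, you can install it like any other dbt package via `packages.yml`; the version range below is illustrative, so check the package's release page for the current version:

```yml
# packages.yml
packages:
  - package: dbt-labs/dbt_project_evaluator
    version: [">=0.8.0", "<1.0.0"]  # illustrative range; pin to the current release
```

After running `dbt deps`, building the package's models (for example, with `dbt build --select package:dbt_project_evaluator`) surfaces any best-practice violations it finds.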
The `dbt_project_evaluator` package highlights areas of a dbt project that are not aligned with dbt's best practices and provides recommendations on how to improve a project. This enables analytics engineers to determine exactly where their projects deviate from dbt's best practices and improve them on their own. The `dbt_project_evaluator` package covers the following categories:

* Modeling
* Testing
* Documentation
* Structure
* Performance
* Governance

For more information, see [Introducing the dbt\_project\_evaluator: Automatically evaluate your dbt project for alignment with best practices](https://docs.getdbt.com/blog/align-with-dbt-project-evaluator).

#### Style guide template[​](#style-guide-template "Direct link to Style guide template")

```markdown
# dbt Example Style Guide

## SQL Style

- Use lowercase keywords.
- Use trailing commas.

## Model Organization

Our models (typically) fit into two main categories:

- Staging — Contains models that clean and standardize data.
- Marts — Contains models which combine or heavily transform data.

Things to note:

- There are different types of models that typically exist in each of the above categories. See Model Layers for more information.
- Read How we structure our dbt projects for an example and more details around organization.

## Model Layers

- Only models in `staging` should select from sources.
- Models not in the `staging` folder should select from refs.

## Model File Naming and Coding

- All objects should be plural.
  Example: `stg_stripe__invoices.sql` vs. `stg_stripe__invoice.sql`

- All models should use the naming convention `<type>_<source>__<additional_context>`. See this article for more information.

- Models in the **staging** folder should use the source's name as the `<source>` and the entity name as the `<additional_context>`.

  Examples:

  - seed_snowflake_spend.csv
  - base_stripe__invoices.sql
  - stg_stripe__customers.sql
  - stg_salesforce__customers.sql
  - int_customers__unioned.sql
  - fct_orders.sql

- Schema, table, and column names should be in `snake_case`.

- Limit the use of abbreviations that are related to domain knowledge. An onboarding employee will understand `current_order_status` better than `current_os`.

- Use names based on the _business_ rather than the source terminology.

- Each model should have a primary key to identify the unique row and should be named `<object>_id`. For example, `account_id`. This makes it easier to know what `id` is referenced in downstream joined models.

- For `base` or `staging` models, columns should be ordered in categories, where identifiers are first and date/time fields are at the end.

- Date/time columns should be named according to these conventions:

  - Timestamps: `<event>_at`
    Format: UTC
    Example: `created_at`

  - Dates: `<event>_date`
    Format: Date
    Example: `created_date`

- Booleans should be prefixed with `is_` or `has_`.
  Example: `is_active_customer` and `has_admin_access`

- Price/revenue fields should be in decimal currency (for example, `19.99` for $19.99; many app databases store prices as integers in cents). If a non-decimal currency is used, indicate this with suffixes. For example, `price_in_cents`.

- Avoid using reserved words (such as these for Snowflake) as column names.

- Consistency is key! Use the same field names across models where possible. For example, a key to the `customers` table should be named `customer_id` rather than `user_id`.

## Model Configurations

- Model configurations at the folder level should be considered (and if applicable, applied) first.
- More specific configurations should be applied at the model level using one of these methods.
- Models within the `marts` folder should be materialized as `table` or `incremental`.
  - By default, `marts` should be materialized as `table` within `dbt_project.yml`.
  - If switching to `incremental`, this should be specified in the model's configuration.

## Testing

- At a minimum, `unique` and `not_null` tests should be applied to the expected primary key of each model.

## CTEs

For more information about why we use so many CTEs, read this glossary entry.

- Where performance permits, CTEs should perform a single, logical unit of work.
- CTE names should be as verbose as needed to convey what they do.
- CTEs with confusing or notable logic should be commented with SQL comments, as you would with any complex function, and the comments should be located above the CTE.
- CTEs duplicated across models should be pulled out and created as their own models.
```

---

### Operational considerations for near real-time data

Teams that implement very high-frequency dbt jobs tend to run into a consistent set of challenges, both at the dbt scheduler layer and in the warehouse itself.

tip

**Treat near real-time as a premium service.** Near real-time service level agreements (SLAs) require premium resources and add significant operational overhead. Pressure-test whether the business really needs minute-level freshness before committing.

#### Over-scheduled jobs and queue management[​](#over-scheduled-jobs-and-queue-management "Direct link to Over-scheduled jobs and queue management")

If a job's run duration is longer than its schedule frequency, the job becomes over-scheduled. The queue grows faster than the scheduler can process runs, and the dbt platform will start cancelling queued runs to avoid an ever-expanding backlog.
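The over-scheduled condition described above is simple arithmetic. As a quick sketch (not a dbt API; the function name and numbers are illustrative), you can sanity-check a proposed schedule against typical run durations before committing to it:

```python
# Illustrative helper (not part of dbt): a job is over-scheduled when runs
# take longer than the gap between scheduled starts, so the queue grows
# faster than the scheduler can drain it.
def is_over_scheduled(run_minutes: float, interval_minutes: float) -> bool:
    """True when typical run duration exceeds the schedule interval."""
    return run_minutes > interval_minutes

# The example scenario from this page: a 5-minute cron with 6-7 minute runs.
print(is_over_scheduled(run_minutes=6.5, interval_minutes=5))  # True — queue grows
print(is_over_scheduled(run_minutes=3.0, interval_minutes=5))  # False — keeps up
```

If the first check is true for your job, you're in the remediation territory described below: prune the selection, tune threads, or relax the schedule.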
This is easy to hit with near real-time patterns if your incremental build time creeps up (more models, more tests, more data) but the cron schedule stays aggressive (for example, every 2–5 minutes).

**Example scenario:**

* Your job is scheduled to run every 5 minutes.
* The job typically takes 6-7 minutes to complete.
* New runs queue up while previous runs are still executing.
* The dbt platform starts cancelling queued runs to prevent an infinite backlog.

When this happens, remediation is non-trivial. You need to either refactor the job to run faster (prune model selection, adjust threads, optimize SQL) or relax the schedule and accept a looser freshness SLA.

###### Related scheduler constraints[​](#related-scheduler-constraints "Direct link to Related scheduler constraints")

* Run slots limit how many jobs can run concurrently. Frequent near real-time jobs can starve other deployment jobs if slot usage isn't planned.
* The scheduler runs distinct executions of the same job serially. If one run is still in progress when the next cron fires, the second run must wait (or be cancelled in an over-scheduled scenario).

#### Warehouse cost and utilization[​](#warehouse-cost-and-utilization "Direct link to Warehouse cost and utilization")

As the gap between job runtime and schedule interval shrinks, your warehouse is effectively running continuously to keep up with back-to-back transformation windows.

**Cost scaling example:**

* Daily job: Warehouse runs 30 min/day = ~2% utilization
* Hourly job: Warehouse runs 30 min × 24 = 12 hours/day = 50% utilization
* 5-minute job: Warehouse runs nearly 24/7 = ~100% utilization

On platforms like Snowflake, ingestion options like Snowpipe for high-volume real-time feeds can be very expensive (cost per 1,000 files plus compute).
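The cost-scaling arithmetic above can be sketched as a small utilization function. This is illustrative only (the 30-minute and 6-minute run lengths are assumptions taken from the examples on this page, not measured values):

```python
# Illustrative arithmetic behind the cost-scaling example: what share of a
# 24-hour day the warehouse is busy, given run length and runs per day.
def utilization(run_minutes: float, runs_per_day: float) -> float:
    """Fraction of the day spent running, capped at 100%."""
    return min(run_minutes * runs_per_day / (24 * 60), 1.0)

print(f"{utilization(30, 1):.0%}")    # daily job, ~2%
print(f"{utilization(30, 24):.0%}")   # hourly job, 50%
print(f"{utilization(6, 288):.0%}")   # every 5 min (288 runs/day), capped at 100%
```

The takeaway matches the bullets above: shrinking the interval without shrinking the runtime drives utilization, and therefore cost, toward continuous operation.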
Warehouse-managed options for freshness (for example, dynamic tables and materialized views) can also be harder to predict and monitor from a cost perspective, especially when underlying data is changing frequently.

The net effect: you should treat near real-time SLAs as a premium service and pressure-test whether the business really needs minute-level freshness on each workload.

#### Lambda view DAG complexity and correctness[​](#lambda-view-dag-complexity-and-correctness "Direct link to Lambda view DAG complexity and correctness")

If you're using the [lambda views pattern](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/4-lambda-views.md), you face additional complexity:

* **Duplicated logic**: You either centralize SQL in macros (more DRY, less readable) or duplicate the same transformations in both history (HIST) and near real-time (NRT) flows (more readable, more to maintain).
* **Complex DAGs**: Every "product" model now has at least three artifacts (HIST table, NRT view, lambda union), plus supporting upstream layers.
* **Materialization brittleness**: The pattern depends on specific materializations (views vs incrementals). A seemingly harmless materialization change can break freshness or correctness.

On top of that, community experience has surfaced timing gaps between HIST and NRT flows:

* Views (NRT) often update much faster than incremental tables. During a run, the NRT side may start filtering on the new `max(event_ts)` before the incremental table has finished loading, producing temporary holes in the unioned lambda view where recent data disappears briefly.
* One way to mitigate this is to introduce an explicit dependency from the NRT view on the incremental model (for example, a `-- depends_on: {{ ref('fct_events') }}` comment), but this is somewhat brittle and increases coupling.
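The timing gap described above is easier to see with a toy simulation. This is not dbt code — the event timestamps and the mid-load state are invented to illustrate the race: if the NRT view's watermark reflects the newest arrived row before the incremental table has finished loading everything up to it, rows briefly fall out of the union.

```python
# Toy model of the lambda-view timing gap (illustrative, not dbt code).
# HIST is an incremental table; NRT is a view over raw events filtered to
# rows newer than HIST's watermark, max(event_ts).
raw_events = [1, 2, 3, 4, 5]  # event timestamps that have arrived in the warehouse

def lambda_union(hist_rows):
    watermark = max(hist_rows)                       # NRT filter: max(event_ts) from HIST
    nrt = [ts for ts in raw_events if ts > watermark]
    return sorted(set(hist_rows) | set(nrt))

# After the incremental run completes, the union is complete:
print(lambda_union([1, 2, 3, 4, 5]))   # [1, 2, 3, 4, 5]

# Mid-load, the newest row (5) has landed but row 4 hasn't. The NRT view
# already filters on the new watermark (5), so row 4 briefly disappears:
print(lambda_union([1, 2, 3, 5]))      # [1, 2, 3, 5] — a temporary hole at 4
```

Once the incremental load finishes, the hole closes on its own, which is why these gaps show up as intermittent, hard-to-reproduce data-quality flickers rather than persistent errors.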
#### Job reliability and resource limits[​](#job-reliability-and-resource-limits "Direct link to Job reliability and resource limits") High-frequency jobs are more likely to surface job-level failures: * **Memory limits** * Memory-heavy macros (for example, large `run_query()` results) or big doc-generation steps can hit account-level memory limits. * This causes runs to terminate with "memory limit" errors. * **Auto-deactivation** * A job that fails repeatedly can be auto-deactivated after 100 consecutive failures. * When this happens, scheduled triggers stop until someone manually intervenes. * **Smaller margin for error** * A flaky model, test, or small regression can quickly generate many failed runs. * This creates noisy alerts and can hit the auto-deactivation threshold faster. #### Ingestion architecture dependencies[​](#ingestion-architecture-dependencies "Direct link to Ingestion architecture dependencies") Lambda views and NRT dbt jobs sit on top of your ingestion architecture: * **The dependency** * If ingestion latency or throughput degrades (issues in a task/stream pipeline, backlogs in storage, intermittent Snowpipe delays), the lambda view can only union what has already arrived. * You can't make data fresher than your ingestion layer allows. * **What you end up tuning** * Task cadences and partition strategies in the landing zone * Lambda overlap windows and incremental look-backs * Which sources really need to participate in the NRT path #### Conclusion[​](#conclusion "Direct link to Conclusion") These challenges are why we position lambda views and ultra-frequent dbt schedules as special-case patterns. They're powerful when you truly need them, but they require deliberate design around scheduler behavior, cost, DAG structure, and ingestion architecture. 
In many cases, they're better replaced by [dynamic tables](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/3-warehouse-native-features.md#dynamic-tables), [materialized views](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/3-warehouse-native-features.md#materialized-views), or a dedicated streaming stack.

---

### Refactor an existing rollup

#### A new approach[​](#a-new-approach "Direct link to A new approach")

Now that we've set the stage, it's time to dig into the fun and messy part: how do we refactor an existing rollup in dbt into semantic models and metrics? Let's look at the differences we can observe in how we might approach this with MetricFlow supercharging dbt versus how we work without a Semantic Layer. These differences can then inform our structure.

* 🍊 In dbt, we tend to create **highly denormalized datasets** that bring **everything you want around a certain entity or process into a single table**.
* 💜 The problem is, this **limits the dimensionality available to MetricFlow**. The more we pre-compute and 'freeze' into place, the less flexible our data is.
* 🚰 In MetricFlow, we ideally want **highly normalized**, star schema-like data that then allows MetricFlow to shine as a **denormalization engine**.
* ∞ Another way to think about this is that instead of moving down a list of requested priorities trying to pre-make as many combinations of our marts as possible — increasing lines of code and complexity — we can **let MetricFlow present every combination possible without specifically coding it**.
* 🏗️ To resolve these approaches optimally, we'll need to shift some **fundamental aspects of our modeling strategy**.

#### Refactor steps outlined[​](#refactor-steps-outlined "Direct link to Refactor steps outlined")

We recommend an incremental implementation process that looks something like this:

1. 👉 Identify **an important output** (a revenue chart on a dashboard, for example) and the mart model(s) that supplies this output.
2. 🔍 Examine all the **entities that are components** of this rollup (for instance, an `active_customers_per_week` rollup may include customers, shipping, and product data).
3. 🛠️ **Build semantic models** for all the underlying component marts.
4. 📏 **Build metrics** for the required aggregations in the rollup.
5. 👯 Create a **clone of the output** on top of the Semantic Layer.
6. 💻 Audit to **ensure you get accurate outputs**.
7. 👉 Identify **any other outputs** that point to the rollup and **move them to the Semantic Layer**.
8. ✌️ Put a **deprecation plan** in place for the now extraneous frozen rollup.

You would then **continue this process** on other outputs and marts, moving down a list of **priorities**. Each model as you go along will be faster and easier, as you'll **reuse many of the same components** that will already have been semantically modeled.

#### Let's make a `revenue` metric[​](#lets-make-a-revenue-metric "Direct link to lets-make-a-revenue-metric")

So far we've been working in a new file pointing at a staging model to simplify things as we build new mental models for MetricFlow. In reality, unless you're implementing MetricFlow in a greenfield dbt project, you're probably going to have some refactoring to do. So let's get into that in detail.

1. 📚 Per the above steps, let's say we've identified our target as a revenue rollup that is built on top of `orders` and `order_items`. Now we need to identify all the underlying components — these will be all the 'import' CTEs at the top of these marts.
So in the Jaffle Shop project we'd need: `orders`, `order_items`, `products`, `locations`, and `supplies`.
2. 🗺️ We'll next make semantic models for all of these. Let's walk through a straightforward conversion first with `locations`.
3. ⛓️ We'll want to first decide if we need to do any joining to get this into the shape we want for our semantic model. The biggest determinants of this are two factors:
   * 📏 Does this semantic model **contain measures**?
   * 🕥 Does this semantic model have a **primary timestamp**?
   * 🫂 If a semantic model **has measures but no timestamp** (for example, supplies in the example project, which has static costs of supplies), you'll likely want to **sacrifice some normalization and join it on to another model** that has a primary timestamp to allow for metric aggregation.
4. 🔄 If we *don't* need any joins, we'll just go straight to the staging model for our semantic model's `ref`. Locations does have a `tax_rate` measure, but it also has an `opened_at` timestamp, so we can go **straight to the staging model** here.
5. 🥇 We specify our **primary entity** (based on `location_id`), dimensions (one categorical, `location_name`, and one **primary time dimension**, `opened_at`), and lastly our measures, in this case just `average_tax_rate`.

models/marts/locations.yml

```yaml
semantic_models:
  - name: locations
    description: |
      Location dimension table. The grain of the table is one row per location.
    model: ref('stg_locations')
    entities:
      - name: location
        type: primary
        expr: location_id
    dimensions:
      - name: location_name
        type: categorical
      - name: opened_at
        expr: date_trunc('day', opened_at)
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: average_tax_rate
        description: Average tax rate.
        expr: tax_rate
        agg: avg
```

#### Semantic and logical interaction[​](#semantic-and-logical-interaction "Direct link to Semantic and logical interaction")

Now, let's tackle a thornier situation. Products and supplies both have dimensions and measures but no time dimension.
Products has a one-to-one relationship with `order_items`, enriching that table, which is itself just a mapping table of products to orders. Additionally, products have a one-to-many relationship with supplies. The high-level ERD looks like the diagram below.

![](/img/best-practices/semantic-layer/orders_erd.png?v=2)

So to calculate, for instance, the cost of ingredients and supplies for a given order, we'll need to do some joining and aggregating, but again we **lack a time dimension for products and supplies**. This is the signal to us that we'll **need to build a logical mart** and point our semantic model at that.

tip

**dbt 🧡 MetricFlow.** This is where integrating your semantic definitions into your dbt project really starts to pay dividends. The interaction between the logical and semantic layers is so dynamic, you either need to house them in one codebase or facilitate a lot of cross-project communication and dependency.

1. 🎯 To start, let's aim at building a table at the `order_items` grain. We can aggregate supply costs up, map over the fields we want from products, such as price, and bring the `ordered_at` timestamp we need over from the orders table. You can see example code, copied below, in `models/marts/order_items.sql`.
models/marts/order\_items.sql

```sql
{{
    config(
        materialized = 'table',
    )
}}

with

order_items as (

    select * from {{ ref('stg_order_items') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

products as (

    select * from {{ ref('stg_products') }}

),

supplies as (

    select * from {{ ref('stg_supplies') }}

),

order_supplies_summary as (

    select
        product_id,
        sum(supply_cost) as supply_cost

    from supplies

    group by 1

),

joined as (

    select
        order_items.*,
        products.product_price,
        order_supplies_summary.supply_cost,
        products.is_food_item,
        products.is_drink_item,
        orders.ordered_at

    from order_items

    left join orders on order_items.order_id = orders.order_id
    left join products on order_items.product_id = products.product_id
    left join order_supplies_summary on order_items.product_id = order_supplies_summary.product_id

)

select * from joined
```

2. 🏗️ Now we've got a table that looks more like what we want to feed into the Semantic Layer. Next, we'll **build a semantic model on top of this new mart** in `models/marts/order_items.yml`. Again, we'll identify our **entities, then dimensions, then measures**.

models/marts/order\_items.yml

```yml
semantic_models:
  # The name of the semantic model.
  - name: order_items
    defaults:
      agg_time_dimension: ordered_at
    description: |
      Items contained in each order. The grain of the table is one row per order item.
    model: ref('order_items')
    entities:
      - name: order_item
        type: primary
        expr: order_item_id
      - name: order_id
        type: foreign
        expr: order_id
      - name: product
        type: foreign
        expr: product_id
    dimensions:
      - name: ordered_at
        expr: date_trunc('day', ordered_at)
        type: time
        type_params:
          time_granularity: day
      - name: is_food_item
        type: categorical
      - name: is_drink_item
        type: categorical
    measures:
      - name: revenue
        description: The revenue generated for each order item. Revenue is calculated as a sum of revenue associated with each product in an order.
        agg: sum
        expr: product_price
      - name: food_revenue
        description: The revenue generated for each order item. Revenue is calculated as a sum of revenue associated with each product in an order.
        agg: sum
        expr: case when is_food_item = 1 then product_price else 0 end
      - name: drink_revenue
        description: The revenue generated for each order item. Revenue is calculated as a sum of revenue associated with each product in an order.
        agg: sum
        expr: case when is_drink_item = 1 then product_price else 0 end
      - name: median_revenue
        description: The median revenue generated for each order item.
        agg: median
        expr: product_price
```

3. 📏 Finally, let's **build a simple revenue metric** on top of our semantic model.

models/marts/order\_items.yml

```yaml
metrics:
  - name: revenue
    description: Sum of the product revenue for each order item. Excludes tax.
    type: simple
    label: Revenue
    type_params:
      measure: revenue
```

#### Checking our work[​](#checking-our-work "Direct link to Checking our work")

* 🔍 We always start our **auditing** with a `dbt parse` to **ensure our code works** before we examine its output.
* 👯 If that parses successfully, we'll move on to trying out a `dbt sl query` that **replicates the logic of the output** we're trying to refactor.
* 💸 For our example we want to **audit monthly revenue**; to do that, we'd run the query below.

##### Example query[​](#example-query "Direct link to Example query")

```text
dbt sl query --metrics revenue --group-by metric_time__month
```

##### Example query results[​](#example-query-results "Direct link to Example query results")

```shell
✔ Success 🦄 - query completed after 1.02 seconds
| METRIC_TIME__MONTH   |   REVENUE |
|:---------------------|----------:|
| 2016-09-01 00:00:00  |  17032.00 |
| 2016-10-01 00:00:00  |  20684.00 |
| 2016-11-01 00:00:00  |  26338.00 |
| 2016-12-01 00:00:00  |  10685.00 |
```

* Try introducing some other dimensions from the semantic models into the `group-by` arguments to get a feel for this command.
---

### Semantic structure

tip

Note that this best practices guide doesn't yet use the [new YAML specification](https://docs.getdbt.com/docs/build/latest-metrics-spec.md). We're working on updating this guide to use the new spec and file structure soon! To read more about the new spec, see [Creating metrics](https://docs.getdbt.com/docs/build/metrics-overview.md).

#### Files and Folders[​](#files-and-folders "Direct link to Files and Folders")

The first thing you need to establish is how you're going to consistently structure your code. There are two recommended best practices to choose from:

* 🏡 **Co-locate your semantic layer code** in a one-YAML-file-per-marts-model system.
  * Puts documentation, data tests, unit tests, semantic models, and metrics into a unified file that corresponds to a dbt-modeled mart.
  * Trades larger file size for less clicking between files.
  * Simpler for greenfield projects that are building the Semantic Layer alongside dbt models.
* 🏘️ **Create a sub-folder** called `models/semantic_models/`.
  * Create a parallel file and folder structure within that specifically for semantic layer code.
  * Gives you more targeted files, but may involve switching between files more often.
  * Better for migrating large existing projects, as you can quickly see which marts have been codified into the Semantic Layer.

It's not terribly difficult to shift between these (it can be done with some relatively straightforward shell scripting), and this is purely a decision based on your developers' preference (that is, it has no impact on execution or performance), so don't feel locked in to either path.
Just pick the one that feels right and you can always shift down the road if you change your mind.

tip

Make sure to save all semantic models and metrics under the directory defined in [`model-paths`](https://docs.getdbt.com/reference/project-configs/model-paths.md) (or a subdirectory of it, like `models/semantic_models/`). If you save them outside of this path, it will result in an empty `semantic_manifest.json` file, and your semantic models or metrics won't be recognized.

#### Naming[​](#naming "Direct link to Naming")

Next, establish your system for consistent file naming:

* 1️⃣ If you're doing **one-YAML-file-per-mart**, then you'd have an `orders.sql` and an `orders.yml`.
* 📛 If you're using a **parallel subfolder approach**, for the sake of unique file names it's recommended to use the **prefix `sem_`, e.g. `sem_orders.yml`**, for the dedicated semantic models and metrics that build on `orders.sql` and `orders.yml`.

#### Can't decide?[​](#cant-decide "Direct link to Can't decide?")

Start with a dedicated subfolder for your semantic models and metrics, and then if you find that you're spending a lot of time clicking between files, you can always shift to a one-YAML-file-per-mart system. Our internal data team has found that the dedicated subfolder approach is more manageable for migrating existing projects, and this is the approach our documentation uses, so if you can't pick, go with that.
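If you adopt the dedicated subfolder, the `sem_` naming convention is easy to check mechanically. Here's a minimal, hypothetical sketch (not a dbt tool — the function name is ours, and the throwaway folder stands in for `models/semantic_models/`) that flags files missing the prefix:

```python
# Hypothetical convention check (not part of dbt): flag YAML files in the
# dedicated semantic-models subfolder that lack the `sem_` prefix, so file
# names stay unique alongside marts like orders.yml.
from pathlib import Path
import tempfile

def bad_names(semantic_dir) -> list[str]:
    """Return YAML file names in the subfolder missing the `sem_` prefix."""
    return sorted(
        p.name for p in Path(semantic_dir).glob("*.yml")
        if not p.name.startswith("sem_")
    )

# Demo on a throwaway folder standing in for models/semantic_models/:
demo = Path(tempfile.mkdtemp())
(demo / "sem_orders.yml").touch()   # follows the convention
(demo / "customers.yml").touch()    # missing the sem_ prefix
print(bad_names(demo))              # ['customers.yml']
```

A check like this could run in CI or a pre-commit hook; either way, `dbt parse` remains the authoritative validation that your semantic YAML is picked up.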
--- ### Set up the dbt Semantic Layer #### Getting started[​](#getting-started "Direct link to Getting started") There are two options for developing a dbt project, including the Semantic Layer: * [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) — MetricFlow commands are embedded in the dbt CLI under the `dbt sl` subcommand. This is the easiest, most full-featured way to develop Semantic Layer code for the time being. You can use the editor of your choice and run commands from the terminal. * [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) — You can create semantic models and metrics in the Studio IDE. #### Basic commands[​](#basic-commands "Direct link to Basic commands") * 🔍 A less common command that will come in handy with the Semantic Layer is `dbt parse`. This will parse your project and generate a **semantic manifest**, a representation of meaningful connections described by your project. This is uploaded to dbt, and used for running `dbt sl` commands in development. This file gives MetricFlow a **state of the world from which to generate queries**. * 🧰 `dbt sl query` is your other best friend, it will execute a query against your semantic layer and return a sample of the results. This is great for testing your semantic models and metrics as you build them. For example, if you're building a revenue model you can run `dbt sl query --metrics revenue --group-by metric_time__month` to validate that monthly revenue is calculating correctly. * 📝 Lastly, `dbt sl list dimensions --metrics [metric name]` will list all the dimensions available for a given metric. This is useful for checking that you're increasing dimensionality as you progress. You can `dbt sl list` other aspects of your Semantic Layer as well, run `dbt sl list --help` for the full list of options. 
For more information on the available commands, refer to the [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md) reference, or use `dbt sl --help` and `dbt sl [subcommand] --help` on the command line. If you need to set up a dbt project first, check out the [quickstart guides](https://docs.getdbt.com/docs/get-started-dbt.md).

#### Onward![​](#onward "Direct link to Onward!")

Throughout the rest of the guide, we'll show example code based on the Jaffle Shop project, a fictional chain of restaurants. You can check out the code yourself and try things out in the [Jaffle Shop repository](https://github.com/dbt-labs/jaffle-shop). So if you see us calculating metrics like `food_revenue` later in this guide, this is why!

---

### Staging: Preparing our atomic building blocks

The staging layer is where our journey begins. This is the foundation of our project, where we bring all the individual components we're going to use to build our more complex and useful models into the project.

We'll use an analogy for working with dbt throughout this guide: thinking modularly in terms of atoms, molecules, and more complex outputs like proteins or cells (we apologize in advance to any chemists or biologists for our inevitable overstretching of this metaphor). Within that framework, if our source system data is a soup of raw energy and quarks, then you can think of the staging layer as condensing and refining this material into the individual atoms we'll later build more intricate and useful structures with.
##### Staging: Files and folders[​](#staging-files-and-folders "Direct link to Staging: Files and folders")

Let's zoom into the staging directory from our `models` file tree [in the overview](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md) and walk through what's going on here.

```shell
models/staging
├── jaffle_shop
│   ├── _jaffle_shop__docs.md
│   ├── _jaffle_shop__models.yml
│   ├── _jaffle_shop__sources.yml
│   ├── base
│   │   ├── base_jaffle_shop__customers.sql
│   │   └── base_jaffle_shop__deleted_customers.sql
│   ├── stg_jaffle_shop__customers.sql
│   └── stg_jaffle_shop__orders.sql
└── stripe
    ├── _stripe__models.yml
    ├── _stripe__sources.yml
    └── stg_stripe__payments.sql
```

* **Folders.** Folder structure is extremely important in dbt. Not only do we need a consistent structure to find our way around the codebase, as with any software project, but our folder structure is also one of the key interfaces for understanding the knowledge graph encoded in our project (alongside the DAG and the data output into our warehouse). It should reflect how the data flows, step-by-step, from a wide variety of source-conformed models into fewer, richer business-conformed models. Moreover, we can use our folder structure as a means of selection in dbt [selector syntax](https://docs.getdbt.com/reference/node-selection/syntax.md). For example, with the above structure, if we got fresh Stripe data loaded and wanted to run all the models that build on our Stripe data, we can easily run `dbt build --select staging.stripe+` and we're all set for building more up-to-date reports on payments.
* ✅ **Subdirectories based on the source system.** Our internal transactional database is one system, the data we get from Stripe's API is another, and lastly the events from our Snowplow instrumentation.
We've found this to be the best grouping for most companies, as source systems tend to share similar loading methods and properties between tables, and this allows us to operate on those similar sets easily.
* ❌ **Subdirectories based on loader.** Some people attempt to group by how the data is loaded (Fivetran, Stitch, custom syncs), but this is too broad to be useful on a project of any real size.
* ❌ **Subdirectories based on business grouping.** Another approach we recommend against is splitting up by business groupings in the staging layer, and creating subdirectories like 'marketing', 'finance', etc. A key goal of any great dbt project should be establishing a single source of truth. By breaking things up too early, we open ourselves up to creating overlap and conflicting definitions (think marketing and finance having different fundamental tables for orders). We want everybody to be building with the same set of atoms, so in our experience, starting our transformations with our staging structure reflecting the source system structures is the best level of grouping for this step.
* **File names.** Creating a consistent pattern of file naming is [crucial in dbt](https://docs.getdbt.com/blog/on-the-importance-of-naming). File names must be unique and correspond to the name of the model when selected and created in the warehouse. We recommend putting as much clear information into the file name as possible, including a prefix for the layer the model exists in, important grouping information, and specific information about the entity or transformation in the model.
  * ✅ `stg_[source]__[entity]s.sql` - the double underscore between source system and entity helps visually distinguish the separate parts in the case of a source name having multiple words.
For instance, `google_analytics__campaigns` is always understandable, whereas to somebody unfamiliar `google_analytics_campaigns` could be `analytics_campaigns` from the `google` source system as easily as `campaigns` from the `google_analytics` source system. Think of it like an [Oxford comma](https://www.youtube.com/watch?v=P_i1xk07o4g): the extra clarity is very much worth the extra punctuation.
  * ❌ `stg_[entity].sql` - might be specific enough at first, but will break down in time. Adding the source system into the file name aids in discoverability, and allows understanding where a component model came from even if you aren't looking at the file tree.
* ✅ **Plural.** SQL, and particularly SQL in dbt, should read as much like prose as we can achieve. We want to lean into the broad clarity and declarative nature of SQL when possible. As such, unless there's a single order in your `orders` table, plural is the correct way to describe what is in a table with multiple rows.

##### Staging: Models[​](#staging-models "Direct link to Staging: Models")

Now that we've got a feel for how the files and folders fit together, let's look inside one of these files and dig into what makes for a well-structured staging model.

Below is an example of a standard staging model (from our `stg_stripe__payments` model) that illustrates the common patterns within the staging layer. We've organized our model into two CTEs: one pulling in a source table via the [source macro](https://docs.getdbt.com/docs/build/sources.md#selecting-from-a-source) and the other applying our transformations. While our later layers of transformation will vary greatly from model to model, every one of our staging models will follow this exact same pattern. As such, we need to make sure the pattern we've established is rock solid and consistent.
```sql
-- stg_stripe__payments.sql

with

source as (

    select * from {{ source('stripe','payment') }}

),

renamed as (

    select
        -- ids
        id as payment_id,
        orderid as order_id,

        -- strings
        paymentmethod as payment_method,
        case
            when payment_method in ('stripe', 'paypal', 'credit_card', 'gift_card') then 'credit'
            else 'cash'
        end as payment_type,
        status,

        -- numerics
        amount as amount_cents,
        amount / 100.0 as amount,

        -- booleans
        case
            when status = 'successful' then true
            else false
        end as is_completed_payment,

        -- dates
        date_trunc('day', created) as created_date,

        -- timestamps
        created::timestamp_ltz as created_at

    from source

)

select * from renamed
```

* Based on the above, the most standard types of staging model transformations are:
  * ✅ **Renaming**
  * ✅ **Type casting**
  * ✅ **Basic computations** (e.g. cents to dollars)
  * ✅ **Categorizing** (using conditional logic to group values into buckets or booleans, such as in the `case when` statements above)
  * ❌ **Joins** — the goal of staging models is to clean and prepare individual source-conformed concepts for downstream usage. We're creating the most useful version of a source system table, which we can use as a new modular component for our project. In our experience, joins are almost always a bad idea here — they create immediate duplicated computation and confusing relationships that ripple downstream — there are occasionally exceptions though (refer to [base models](#staging-other-considerations) for more info).
  * ❌ **Aggregations** — aggregations entail grouping, and we're not doing that at this stage. Remember - staging models are your place to create the building blocks you'll use all throughout the rest of your project — if we start changing the grain of our tables by grouping in this layer, we'll lose access to source data that we'll likely need at some point. We just want to get our individual concepts cleaned and ready for use, and will handle aggregating values downstream.
* ✅ **Materialized as views.** Looking at a partial view of our `dbt_project.yml` below, we can see that we’ve configured the entire staging directory to be materialized as views. As they’re not intended to be final artifacts themselves, but rather building blocks for later models, staging models should typically be materialized as views for two key reasons:
  * Any downstream model (discussed more in [marts](https://docs.getdbt.com/best-practices/how-we-structure/4-marts.md)) referencing our staging models will always get the freshest data possible from all of the component views it’s pulling together and materializing.
  * It avoids wasting space in the warehouse on models that are not intended to be queried by data consumers, and thus do not need to perform as quickly or efficiently.

```yaml
# dbt_project.yml

models:
  jaffle_shop:
    staging:
      +materialized: view
```

* Staging models are the only place we'll use the [`source` macro](https://docs.getdbt.com/docs/build/sources.md), and our staging models should have a 1-to-1 relationship to our source tables. That means for each source system table we’ll have a single staging model referencing it, acting as its entry point — *staging* it — for use downstream.

**Don’t Repeat Yourself.** Staging models help us keep our code DRY. dbt's modular, reusable structure means we can, and should, push any transformations that we’ll always want to use for a given component model as far upstream as possible. This saves us from potentially wasting code, complexity, and compute doing the same transformation more than once. For instance, if we know we always want our monetary values as floats in dollars, but the source system stores integers and cents, we want to do the division and type casting as early as possible so that we can reference the result rather than redo the computation repeatedly downstream. This is a welcome change for many of us who have become used to applying the same sets of SQL transformations in many places out of necessity!
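As a sketch of how one of these repeated transformations can be captured once, the cents-to-dollars conversion could live in a macro and be reused from every staging model that needs it. The example project's file tree does include a `macros/cents_to_dollars.sql` file, but the body and signature shown here are assumptions for illustration:

```sql
-- macros/cents_to_dollars.sql
-- Sketch only: the macro's existence is implied by the example project's
-- file tree, but this body and the `scale` argument are assumptions.
{% macro cents_to_dollars(column_name, scale=2) %}
    round({{ column_name }} / 100.0, {{ scale }})
{% endmacro %}
```

A staging model could then write `{{ cents_to_dollars('amount') }} as amount`, keeping the conversion logic defined in exactly one place.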
For us, the earliest point for these 'always-want' transformations is the staging layer, the initial entry point in our transformation process. The DRY principle is ultimately the litmus test for whether transformations should happen in the staging layer. If we'll want them in every downstream model and they help us eliminate repeated code, they're probably okay.

##### Staging: Other considerations

* **Base models when joins are necessary to stage concepts.** Sometimes, in order to maintain a clean and DRY staging layer, we do need to implement some joins to create a solid concept for our building blocks. In these cases, we recommend creating a sub-directory in the staging directory for the source system in question and building `base` models. These have all the same properties as regular staging models: they directly source the raw data and apply the non-joining transformations; then, in the staging models, we join the requisite base models. The most common use cases for building a base layer under a staging folder are:
  * ✅ **Joining in separate delete tables.** Sometimes a source system might store deletes in a separate table. Typically we’ll want to make sure we can mark or filter out deleted records for all our component models, so we’ll need to join these delete records up to any of our entities that follow this pattern. This is the pattern the example below illustrates.
```sql
-- base_jaffle_shop__customers.sql

with

source as (

    select * from {{ source('jaffle_shop','customers') }}

),

customers as (

    select
        id as customer_id,
        first_name,
        last_name

    from source

)

select * from customers
```

```sql
-- base_jaffle_shop__deleted_customers.sql

with

source as (

    select * from {{ source('jaffle_shop','customer_deletes') }}

),

deleted_customers as (

    select
        id as customer_id,
        deleted as deleted_at

    from source

)

select * from deleted_customers
```

```sql
-- stg_jaffle_shop__customers.sql

with

customers as (

    select * from {{ ref('base_jaffle_shop__customers') }}

),

deleted_customers as (

    select * from {{ ref('base_jaffle_shop__deleted_customers') }}

),

join_and_mark_deleted_customers as (

    select
        customers.*,
        case
            when deleted_customers.deleted_at is not null then true
            else false
        end as is_deleted

    from customers

    left join deleted_customers on customers.customer_id = deleted_customers.customer_id

)

select * from join_and_mark_deleted_customers
```

* ✅ **Unioning disparate but symmetrical sources.** A typical example here would be if you operate multiple ecommerce platforms in various territories via a SaaS platform like Shopify. You would have perfectly identical schemas, but all loaded separately into your warehouse. In this case, it’s easier to reason about our orders if *all* of our shops are unioned together, so we’d want to handle the unioning in a base model before we carry on with our usual staging model transformations on the (now complete) set — you can dig into [more detail on this use case here](https://discourse.getdbt.com/t/unioning-identically-structured-data-sources/921).
* **[Codegen](https://github.com/dbt-labs/dbt-codegen) to automate staging table generation.** It’s very good practice to learn to write staging models by hand: they’re straightforward and numerous, so they can be an excellent way to absorb the dbt style of writing SQL.
Also, we’ll invariably find ourselves needing to add special elements to specific models at times — for instance, in one of the situations above that require base models — so it’s helpful to deeply understand how they work. Once that understanding is established, though, because staging models are built largely following the same rote patterns and need to be built 1-to-1 for each source table in a source system, it’s preferable to start automating their creation. For this, we have the [codegen](https://github.com/dbt-labs/dbt-codegen) package. This will let you automatically generate all the source YAML and staging model boilerplate to speed up this step, and we recommend using it in every project.

* **Utilities folder.** While this is not in the `staging` folder, it’s useful to consider as part of our fundamental building blocks. The `models/utilities` directory is where we can keep any general-purpose models that we generate from macros or based on seeds that provide tools to help us do our modeling, rather than data to model itself. The most common use case is a [date spine](https://github.com/dbt-labs/dbt-utils#date_spine-source) generated with [the dbt utils package](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/).

**Development flow versus DAG order.** This guide follows the order of the DAG, so we can get a holistic picture of how these three primary layers build on each other towards fueling impactful data products. It’s important to note, though, that developing models does not typically move linearly through the DAG. Most commonly, we should start by mocking out a design in a spreadsheet so we know we’re aligned with our stakeholders on output goals. Then, we’ll want to write the SQL to generate that output, and identify what tables are involved. Once we have our logic and dependencies, we’ll make sure we’ve staged all the necessary atomic pieces into the project, then bring them together based on the logic we wrote to generate our mart.
Finally, with a functioning model flowing in dbt, we can start refactoring and optimizing that mart. By splitting the logic up and moving parts back upstream into intermediate models, we ensure all of our models are clean and readable, the story of our DAG is clear, and we have more surface area to apply thorough testing.

---

### Tactical terminology

The rest of this guide will focus on the process of migrating your existing dbt code to the Semantic Layer. To do this, we'll need to introduce some new terminology and concepts that are specific to the Semantic Layer. We want to define them up front, as we have specific meanings in mind applicable to the process of migrating code to the Semantic Layer. These terms can mean different things in different settings, but here we mean:

* 🔲 **Normalized** — can be defined with varying degrees of technical rigor, but used here we mean something that contains unique data stored only once in one place, so it can be efficiently joined and aggregated into various shapes. You can think of it as referring to tables that function as conceptual building blocks in your business, *not* in the sense of, say, strict [Codd 3NF](https://en.wikipedia.org/wiki/Third_normal_form).
* 🛒 **Mart** — also has a variety of definitions, but here we mean a table that is relatively normalized and functions as the source of truth for a core concept in your business.
* 🕸️ **Denormalized** — when we store the same data in multiple places for easier access without joins.
The most denormalized data modeling system is OBT (One Big Table), where we try to get every possible interesting column related to a concept (for instance, customers) into one big table so all an analyst needs to do is `select`.

* 🗞️ **Rollup** — used here as a catchall term meaning both denormalized tables built on top of normalized marts and those that perform aggregations to a certain grain. For example, `active_accounts_per_week` might aggregate `customers` and `orders` data to a weekly grain. Another example would be `customer_metrics`, which might denormalize a lot of the data from `customers` as well as aggregated data from `orders`. For the sake of brevity in this guide, we’ll call all these types of products built on top of your normalized concepts **rollups**.

We'll also use a couple of *new* terms for the sake of brevity. These aren't standard or official dbt-isms, but they're useful for communicating meaning in the context of refactoring code for the Semantic Layer:

* 🧊 **Frozen** — shorthand to indicate code that is statically built in dbt’s logical transformation layer. Does not refer to the materialization type: views, incremental models, and regular tables are all considered *frozen*, as they statically generate data or code that is stored in the warehouse, as opposed to dynamically querying, as with the Semantic Layer. This is *not* a bad thing! We want some portion of our transformation logic to be frozen and stable, as the *transformation logic* is not rapidly shifting, and we benefit in testing, performance, and stability.
* 🫠 **Melting** — the process of breaking up frozen structures into flexible Semantic Layer code. This allows them to create as many combinations and aggregations as possible, dynamically, in response to stakeholder needs and queries.

**Tip:** 🏎️ **The Semantic Layer is a denormalization engine.** dbt transforms your data into clean, normalized marts.
The Semantic Layer is a denormalization engine that dynamically connects and molds these building blocks into the maximum number of shapes available.

---

### The rest of the project

##### Project structure review

So far we’ve focused on the `models` folder, the primary directory of our dbt project. Next, we’ll zoom out and look at how the rest of our project files and folders fit in with this structure, starting with how we approach YAML configuration files.

```shell
models
├── intermediate
│   └── finance
│       ├── _int_finance__models.yml
│       └── int_payments_pivoted_to_orders.sql
├── marts
│   ├── finance
│   │   ├── _finance__models.yml
│   │   ├── orders.sql
│   │   └── payments.sql
│   └── marketing
│       ├── _marketing__models.yml
│       └── customers.sql
├── staging
│   ├── jaffle_shop
│   │   ├── _jaffle_shop__docs.md
│   │   ├── _jaffle_shop__models.yml
│   │   ├── _jaffle_shop__sources.yml
│   │   ├── base
│   │   │   ├── base_jaffle_shop__customers.sql
│   │   │   └── base_jaffle_shop__deleted_customers.sql
│   │   ├── stg_jaffle_shop__customers.sql
│   │   └── stg_jaffle_shop__orders.sql
│   └── stripe
│       ├── _stripe__models.yml
│       ├── _stripe__sources.yml
│       └── stg_stripe__payments.sql
└── utilities
    └── all_dates.sql
```

##### YAML in-depth

When structuring your YAML configuration files in a dbt project, you want to balance centralization and file size to make specific configs as easy to find as possible.
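To ground the discussion, here's a minimal sketch of what one of these per-directory config files — for example `_stripe__models.yml` from the tree above — might contain. The description, column, and tests shown are illustrative assumptions, not taken from the example project:

```yaml
# models/staging/stripe/_stripe__models.yml
# Sketch only: description, column list, and tests are assumptions.
version: 2

models:
  - name: stg_stripe__payments
    description: "One record per payment, staged from the Stripe source system."
    columns:
      - name: payment_id
        description: "Primary key for a payment."
        tests:
          - unique
          - not_null
```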
It’s important to note that while the top-level YAML files (`dbt_project.yml`, `packages.yml`) need to be specifically named and in specific locations, the files containing your `sources` and `models` dictionaries can be named, located, and organized however you want. It’s the internal contents that matter here. As such, we’ll lay out our primary recommendation, as well as the pros and cons of a popular alternative. Like many other aspects of structuring your dbt project, what’s most important here is consistency, clear intention, and thorough documentation of how and why you do what you do.

* ✅ **Config per folder.** As in the example above, create a `_[directory]__models.yml` per directory in your models folder that configures all the models in that directory. For staging folders, also include a `_[directory]__sources.yml` per directory.
  * The leading underscore ensures your YAML files will be sorted to the top of every folder, making them easy to separate from your models.
  * YAML files don’t need unique names in the way that SQL model files do, but including the directory (instead of simply `_sources.yml` in each folder) means you can fuzzy find the right file more quickly.
  * We’ve recommended several different naming conventions over the years, most recently calling these `schema.yml` files. We’ve simplified to recommend that these simply be labelled based on the YAML dictionary that they contain.
  * If you utilize [doc blocks](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) in your project, we recommend following the same pattern, and creating a `_[directory]__docs.md` markdown file per directory containing all your doc blocks for that folder of models.
* ❌ **Config per project.** Some people put *all* of their source and model YAML into one file.
While you can technically do this, and while it certainly simplifies knowing which file the config you’re looking for will be in (as there is only one file), it makes it much harder to find specific configurations within that file. We recommend balancing those two concerns.

* ⚠️ **Config per model.** On the other end of the spectrum, some people prefer to create one YAML file per model. This presents less of an issue than a single monolith file, as you can quickly search for files, know exactly where specific configurations exist, spot models without configs (and thus without tests) by looking at the file tree, and gain various other advantages. In our opinion, the extra files, tabs, and windows this requires creating, copying from, pasting to, closing, opening, and managing create a somewhat slower development experience that outweighs the benefits. Defining config per directory is the most balanced approach for most projects, but if you have compelling reasons to use config per model, there are definitely some great projects that follow this paradigm.
* ✅ **Cascade configs.** Leverage your `dbt_project.yml` to set default configurations at the directory level. Use the well-organized folder structure we’ve created thus far to define the baseline schemas and materializations, and use dbt’s cascading scope priority to define variations to this. For example, as below, define your marts to be materialized as tables by default, define separate schemas for your separate subfolders, and configure any models that need incremental materialization at the model level.

```yaml
# dbt_project.yml

models:
  jaffle_shop:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
      finance:
        +schema: finance
      marketing:
        +schema: marketing
```

**Define your defaults.** One of the many benefits this consistent approach to project structure confers to us is this ability to cascade default behavior.
Carefully organizing our folders and defining configuration at that level whenever possible frees us from configuring things like schema and materialization in every single model (not very DRY!) — we only need to configure exceptions to our general rules. Tagging is another area where this principle comes into play. Many people new to dbt will rely on tags rather than a rigorous folder structure, and quickly find themselves in a place where every model *requires* a tag. This creates unnecessary complexity. We want to lean on our folders as our primary selection and grouping mechanism, and use tags to define groups that are *exceptions*. A folder-based selection like `dbt build --select marts.marketing` is much simpler than trying to tag every marketing-related model, hoping all developers remember to add that tag for new models, and using `dbt build --select tag:marketing`.

###### Defining groups

A group is a collection of nodes within a dbt DAG. Groups enable intentional collaboration within and across teams by restricting [access to private](https://docs.getdbt.com/reference/resource-configs/access.md) models. Groups are defined in `.yml` files, nested under a `groups:` key. For more information about using groups, see [Add groups to your DAG](https://docs.getdbt.com/docs/build/groups.md).

##### How we use the other folders

```shell
jaffle_shop
├── analyses
├── seeds
│   └── employees.csv
├── macros
│   ├── _macros.yml
│   └── cents_to_dollars.sql
├── snapshots
└── tests
    └── assert_positive_value_for_total_amount.sql
```

We’ve focused heavily thus far on the primary area of action in our dbt project, the `models` folder. As you’ve probably observed, though, there are several other folders in our project.
While these are, by design, very flexible to your needs, we’ll discuss the most common use cases for these other folders to help get you started.

* ✅ `seeds` for lookup tables. The most common use case for seeds is loading lookup tables that are helpful for modeling but don’t exist in any source systems — think mapping zip codes to states, or UTM parameters to marketing campaigns. In this example project we have a small seed that maps our employees to their `customer_id`s, so that we can handle their purchases with special logic.
* ❌ `seeds` for loading source data. Do not use seeds to load data from a source system into your warehouse. If it exists in a system you have access to, you should be loading it with a proper EL tool into the raw data area of your warehouse. dbt is designed to operate on data in the warehouse, not to serve as a data-loading tool.
* ✅ `analyses` for storing auditing queries. The `analyses` folder lets you store any queries you want to use Jinja with and version control, but not build into models in your warehouse. There are limitless possibilities here, but the most common use case when we set up projects at dbt Labs is to keep queries that leverage the [audit helper](https://github.com/dbt-labs/dbt-audit-helper) package. This package is incredibly useful for finding discrepancies in output when migrating logic from another system into dbt.
* ✅ `tests` for testing multiple specific tables simultaneously. As dbt tests have evolved, writing singular tests has become less and less necessary. They're extremely useful for workshopping test logic, but more often than not you'll find yourself either migrating that logic into your own custom generic tests or discovering a pre-built test that meets your needs from the ever-expanding universe of dbt packages (between the extra tests in [`dbt-utils`](https://github.com/dbt-labs/dbt-utils) and [`dbt-expectations`](https://github.com/calogica/dbt-expectations), almost any situation is covered).
One area where singular tests still shine, though, is flexibly testing things that require a variety of specific models. If you're familiar with the difference between [unit tests](https://en.wikipedia.org/wiki/Unit_testing) [and](https://www.testim.io/blog/unit-test-vs-integration-test/) [integration](https://www.codecademy.com/resources/blog/what-is-integration-testing/) [tests](https://en.wikipedia.org/wiki/Integration_testing) in software engineering, you can think of generic and singular tests in a similar way. If you need to test the results of how several specific models interact or relate to each other, a singular test will likely be the quickest way to nail down your logic.

* ✅ `snapshots` for creating [Type 2 slowly changing dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row) records from [Type 1](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_1:_overwrite) (destructively updated) source data. Unlike the other folders, snapshots have a well-defined purpose; they're [covered thoroughly in the dbt Docs](https://docs.getdbt.com/docs/build/snapshots.md) and are out of scope for this guide, but we mention them here for completeness.
* ✅ `macros` for DRY-ing up transformations you find yourself doing repeatedly. Like snapshots, a full dive into macros is out of scope for this guide and well [covered elsewhere](https://docs.getdbt.com/docs/build/jinja-macros.md), but one important structure-related recommendation is to [write documentation for your macros](https://docs.getdbt.com/faqs/Docs/documenting-macros.md). We recommend creating a `_macros.yml` and documenting the purpose and arguments of your macros once they’re ready for use.

##### Project splitting

One important, growing consideration in the analytics engineering ecosystem is how and when to split a codebase into multiple dbt projects.
Currently, our advice for most teams, especially those just starting, is fairly simple: in most cases, we recommend doing so with [Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md)! Mesh allows organizations to handle complexity by connecting several dbt projects rather than relying on one big, monolithic project. This approach is designed to speed up development while maintaining governance. As it becomes easier to break up monolithic dbt projects into smaller, connected projects (potentially within a modern monorepo), the scenarios we currently advise against may soon become feasible. So watch this space!

* ✅ **Business groups or departments.** Conceptual separations within the project are the primary reason to split up your project. This allows your business domains to own their own data products while still collaborating using Mesh. For more information about Mesh, refer to our [Mesh FAQs](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-5-faqs.md).
* ✅ **Data governance.** Structural, organizational needs — such as data governance and security — are one of the few worthwhile reasons to split up a project. If, for instance, you work at a healthcare company where only a small team is cleared to access raw data with PII in it, you may need to split out your staging models into their own project to preserve those policies. In that case, you would import your staging project into the project that builds on those staging models as a [private package](https://docs.getdbt.com/docs/build/packages.md#private-packages).
* ✅ **Project size.** At a certain point, your project may grow to have simply too many models to present a viable development experience. If you have thousands of models, it absolutely makes sense to find a way to split up your project.
* ❌ **ML vs Reporting use cases.** Splitting a project up based on different use cases, particularly more standard BI versus ML features, is a common idea.
We tend to discourage it for the time being. As with the previous point, a foundational goal of implementing dbt is to create a single source of truth in your organization. The features you’re providing to your data science teams should come from the same marts and metrics that serve reports on executive dashboards.

#### Final considerations

Overall, consistency is more important than any of these specific conventions. As your project grows and your experience with dbt deepens, you will undoubtedly find aspects of the above structure you want to change. While we recommend this approach for the majority of projects, every organization is unique! The only dogmatic advice we’ll put forward here is that when you find aspects of the above structure you wish to change, think intently about your reasoning and document for your team *how* and *why* you are deviating from these conventions. To that end, we highly encourage you to fork this guide and add it to your project’s README, wiki, or docs so you can quickly create and customize those artifacts.

Finally, we emphasize that this guide is a living document! It will certainly change and grow as dbt and dbt Labs evolve. We invite you to join in — discuss, comment, and contribute regarding suggested changes or new elements to cover.

---

### Views-only pattern for maximum freshness

**Snowflake examples ahead.** This page uses Snowflake for code examples, but you can adapt the views-only pattern to other warehouses.
For some workloads, the simplest and most "real-time" pattern is to materialize everything as views on top of a continuously updated source table. When transformations are very lightweight and the source is already being updated in near real-time, this can preserve the source's latency almost perfectly.

#### When to use the views-only pattern

Use this pattern when:

* Source freshness is already "good enough" (for example, an ingestion service or operational system writes into a warehouse table every few seconds or minutes).
* You have very lightweight transformations, such as:
  * Simple projections/renames
  * One or two joins to small reference tables
  * Minimal or no heavy aggregations or window functions
* You care most about preserving the source table's latency and are willing to trade off some query performance at read time.
* Your tables are small to medium sized and your queries are simple.

Typical examples:

* Dashboards showing current system status (like active user sessions, current queue depth, or recent device heartbeats) where you need to see the latest data immediately.
* Event data that you're forwarding to other tools with minimal transformation (raw data with just a bit of normalization, like cleaning up field names or adding a few reference fields).

If your transformation logic becomes heavier, multiple teams depend on the data, or you need better cost and performance control, transition to [incremental models](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/2-incremental-patterns.md) or [dynamic tables/materialized views](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/3-warehouse-native-features.md). Reserve this pattern for the smallest, most latency‑sensitive use cases.
###### Assumptions

The examples on this page assume the following setup:

* A non‑dbt system (ETL, streaming pipeline, app) is already writing into a warehouse‑resident table such as `raw.realtime_events` or `ops.active_sessions`, and that table meets your service level agreement for latency.
* Querying that table directly is acceptable from a performance and cost standpoint for your expected concurrency.
* You don't need dbt to persist intermediate tables; you mainly care about:
  * Consistent SQL logic (column naming, type casting)
  * Tests, contracts, and lineage
  * Exposures to BI / downstream tools

All dbt models in this path are materialized as views, not tables or incremental models.

#### Example implementation

Here's an example implementation of the views-only pattern, which has the following structure:

* [Source table](#source-table-definition) (continuously updated): `raw.realtime_events`
* [Thin staging view](#staging-view): `analytics.stg_realtime_events_v`
* [Domain view(s)](#domain-view-definition): `analytics.vw_realtime_events_enriched`

##### Source table definition

```yaml
# models/sources.yml
version: 2

sources:
  - name: raw
    schema: raw
    tables:
      - name: realtime_events
        description: "Continuously updated event table from streaming pipeline."
        loaded_at_field: event_ts
```

##### Staging view

```sql
-- models/staging/stg_realtime_events.sql

{{ config(materialized = 'view') }}

select
    event_id,
    event_ts::timestamp_ntz as event_ts,  -- Snowflake syntax for type casting
    to_date(event_ts) as event_date,
    user_id,
    event_type,
    payload
from {{ source('raw', 'realtime_events') }}
```

##### Domain view definition

```sql
-- models/marts/vw_realtime_events_enriched.sql

{{ config(materialized = 'view') }}

with base as (

    select * from {{ ref('stg_realtime_events') }}

),

user_dim as (

    select
        user_id,
        user_segment,
        signup_date
    from {{ ref('dim_user') }}  -- can be a table or incremental model

)

select
    b.event_id,
    b.event_ts,
    b.event_date,
    b.user_id,
    u.user_segment,
    b.event_type,
    b.payload
from base as b
left join user_dim as u
    on b.user_id = u.user_id
```

Downstream tools query `analytics.vw_realtime_events_enriched`. As long as `raw.realtime_events` is continuously updated, this view stack is as fresh as the source.

#### Benefits

* Maximum freshness: The view reflects new data as soon as it lands in `raw.realtime_events`.
* Simple operations: No incremental logic to tune and no extra dbt job needed just to keep the data fresh. You still schedule jobs for tests, docs, and so on.
* Best for small datasets: Works well when tables are small and queries are simple. Computing the view on the fly is cheap and fast.
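Because the source definition declares `loaded_at_field: event_ts`, a natural extension (sketched below; the thresholds are illustrative assumptions, not part of the original example) is to add a `freshness` block so that `dbt source freshness` can warn you when the upstream pipeline falls behind its latency SLA:

```yaml
# models/sources.yml
# Sketch: warn/error thresholds are assumptions; tune them to your SLA.
version: 2

sources:
  - name: raw
    schema: raw
    tables:
      - name: realtime_events
        loaded_at_field: event_ts
        freshness:
          warn_after: {count: 5, period: minute}
          error_after: {count: 30, period: minute}
```

Running `dbt source freshness` on a schedule then gives you an explicit signal that the "views are as fresh as the source" assumption still holds.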
#### Limitations and risks

This pattern is only safe under tight constraints and has several important limitations:

* [Doesn't scale to heavy transformations](#doesnt-scale-to-heavy-transformations)
* [No "frozen" intermediate tables](#no-frozen-intermediate-tables)
* [Schema change sensitivity](#schema-change-sensitivity)
* [Potential impact on operational systems](#potential-impact-on-operational-systems)

##### Doesn't scale to heavy transformations

If your logic evolves into large joins, deep view chains, or expensive aggregations, you'll quickly run into performance issues:

* Every query must re‑execute all the logic.
* The warehouse has to optimize and execute the full stack of views every time.

In those cases, use either of the following:

* [Incremental models](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/2-incremental-patterns.md)
* [Dynamic tables or materialized views](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/3-warehouse-native-features.md), where appropriate

##### No "frozen" intermediate tables

Because everything is a view:

* There's no persisted intermediate layer to debug or profile.
* You can't easily "rerun yesterday's logic" if upstream data changes — everything always reflects the current state.

##### Schema change sensitivity

Schema changes in the source table propagate immediately through the view stack, which:

* Can break downstream BI if columns are dropped or types change.
* Makes tests and model contracts more important, since there’s no batch boundary to catch issues before users see them.
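One way to guard against this is to enforce a model contract on the staging view, so that an upstream type change fails the build rather than silently flowing into BI. Below is a sketch assuming the staging model from the example implementation; the file name and the declared data types are assumptions (contracts require every column to be listed with a `data_type`):

```yaml
# models/staging/_realtime__models.yml
# Sketch: file name and data types are assumptions based on the example staging model.
version: 2

models:
  - name: stg_realtime_events
    config:
      contract:
        enforced: true
    columns:
      - name: event_id
        data_type: varchar
      - name: event_ts
        data_type: timestamp_ntz
      - name: event_date
        data_type: date
      - name: user_id
        data_type: varchar
      - name: event_type
        data_type: varchar
      - name: payload
        data_type: variant
```

With the contract enforced, a dropped column or changed type in `raw.realtime_events` surfaces as a build failure at the staging layer instead of a broken dashboard.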
##### Potential impact on operational systems[​](#potential-impact-on-operational-systems "Direct link to Potential impact on operational systems")

If the continuously‑updated source is itself a live operational store (not a warehouse landing table), you must be careful not to overload it with analytics queries. In most cases, it is recommended to:

1. Replicate into a warehouse table first (Snowflake, BigQuery, Databricks, and so on).
2. Apply this views‑only pattern within the warehouse, not directly on the Online Transaction Processing system.

---

### Warehouse-native features for real-time data

Modern data warehouses offer native features that can simplify near real-time data patterns. Instead of managing incremental logic yourself, you can declare the desired freshness and let the warehouse handle the refresh mechanics. This section covers when to use dynamic tables and materialized views instead of incremental models for near real-time data.

* [Dynamic tables](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#dynamic-tables) are a warehouse-specific feature in Snowflake that lets the warehouse keep a table updated for you. You define what the table should look like, and the warehouse keeps the table fresh automatically.
* [Materialized views](https://docs.getdbt.com/docs/build/materializations.md#materialized-view) are a warehouse-specific feature that lets the warehouse save the results of a query so they’re faster to read, and refresh them as the underlying data changes. Note that the exact behavior depends on the warehouse.
* [Incremental models](https://docs.getdbt.com/docs/build/incremental-models.md) are a dbt feature that lets dbt update a table by processing only new data. You tell dbt how new data should be added using your incremental logic SQL, and dbt runs the right SQL when the model is built.

###### When to consider warehouse-native features[​](#when-to-consider-warehouse-native-features "Direct link to When to consider warehouse-native features")

**Use dynamic tables or materialized views when:**

* Your requirement is "data always within X minutes of real time" and you don't need precise scheduling control.
* You want to simplify operational complexity by offloading refresh logic to the warehouse.
* Your transformations are relatively straightforward.
* You're willing to trade some control for convenience.

**Stick with incremental models when:**

* You need fine-grained control over scheduling and refresh logic.
* You have complex business rules requiring custom incremental strategies (like microbatching).
* You need to coordinate refreshes across multiple models in a specific order.

#### Dynamic tables[​](#dynamic-tables "Direct link to Dynamic tables")

Warehouse support

Dynamic tables are currently supported in Snowflake, with similar features available in other warehouses under different names. Check your warehouse documentation for availability.

With dynamic tables, you can define the target state with SQL, and the warehouse automatically handles incremental refreshes.
For example, the following SQL model uses a dynamic table to keep a table up to date for you:

```sql
-- snowflake_warehouse is Snowflake-specific syntax
{{ config(
    materialized = 'dynamic_table',
    target_lag = '5 minutes',
    snowflake_warehouse = 'TRANSFORM_WH'
) }}

select
    event_id,
    event_ts::timestamp_ntz as event_ts,
    to_date(event_ts) as event_date,
    user_id,
    event_type,
    payload
from {{ source('raw', 'events') }}
where event_ts >= current_timestamp() - interval '7 days'
```

###### target\_lag config[​](#target_lag-config "Direct link to target_lag config")

The [`target_lag` parameter](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#target-lag) tells the warehouse the maximum acceptable staleness of the dynamic table relative to its sources, and helps determine when the table should be refreshed. For example:

* `target_lag = '1 minute'` - Warehouse keeps the table within one minute of its source data, refreshing automatically as needed.
* `target_lag = '5 minutes'` - Table may lag up to five minutes behind its sources.
* `target_lag = 'downstream'` - Table refreshes only when a downstream table depends on it.

The warehouse automatically determines when to refresh, whether to do a full or incremental update, and how to optimize the refresh query.
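If many models share the same freshness target, the same settings can be declared once for a whole folder in `dbt_project.yml` instead of per-model `config()` blocks. A minimal sketch; the project and folder names here are hypothetical:

```yaml
# dbt_project.yml -- a sketch; `my_project` and `real_time` are placeholder names
models:
  my_project:
    real_time:
      +materialized: dynamic_table
      +target_lag: '5 minutes'
      +snowflake_warehouse: TRANSFORM_WH
```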
###### Benefits[​](#benefits "Direct link to Benefits")

* Declarative freshness: specify "how fresh" not "when to refresh"
* Warehouse-managed optimization
* Cost predictability: refreshes run only when needed to meet `target_lag`
* Simple setup

###### Limitations[​](#limitations "Direct link to Limitations")

* Less control over exact timing or orchestration logic
* Costs can be harder to predict than with scheduled dbt jobs
* Less visibility into refresh decisions compared to dbt's explicit incremental logic
* Currently warehouse-specific (implementation varies by platform)

#### Materialized views[​](#materialized-views "Direct link to Materialized views")

Materialized views are available in most modern data warehouses and cache query results that automatically refresh when underlying data changes.

Materialized views work like this:

* The warehouse detects changes to source tables and refreshes the materialized view.
* Many warehouses can incrementally update the view rather than recomputing everything.
* Queries against the materialized view read cached results, not the underlying tables.
For example, the following SQL model uses a materialized view to keep a table up to date for you:

```sql
{{ config( materialized = 'materialized_view' ) }}

select
    user_id,
    date_trunc('hour', event_ts) as event_hour,
    count(*) as event_count
from {{ source('raw', 'events') }}
group by 1, 2
```

#### Resources by warehouse[​](#resources-by-warehouse "Direct link to Resources by warehouse")

Here are some resources for each warehouse:

###### BigQuery[​](#bigquery "Direct link to BigQuery")

* [dbt developer docs: BigQuery materialized views](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md#materialized-views)
* [BigQuery docs: Materialized views intro](https://cloud.google.com/bigquery/docs/materialized-views-intro)
* [BigQuery docs: Streaming API](https://docs.cloud.google.com/bigquery/docs/write-api)

###### Databricks[​](#databricks "Direct link to Databricks")

* [dbt developer docs: Databricks materialized views and streaming tables](https://docs.getdbt.com/reference/resource-configs/databricks-configs.md#materialized-views-and-streaming-tables)
* [Databricks docs: Materialized views](https://docs.databricks.com/en/views/materialized.html)

###### Postgres[​](#postgres "Direct link to Postgres")

* [dbt developer docs: Postgres materialized views](https://docs.getdbt.com/reference/resource-configs/postgres-configs.md#materialized-views)
* [Postgres docs: Materialized views](https://www.postgresql.org/docs/current/rules-materializedviews.html)

###### Redshift[​](#redshift "Direct link to Redshift")

* [dbt developer docs: Redshift materialized views](https://docs.getdbt.com/reference/resource-configs/redshift-configs.md#materialized-views)
* [Redshift docs: Materialized views overview](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html)
* [Redshift docs: Streaming ingestion to a materialized view](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html)

###### Snowflake[​](#snowflake "Direct link to Snowflake")

* [dbt developer docs: Dynamic tables configurations](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#dynamic-tables)
* [Snowflake docs: Dynamic tables intro](https://docs.snowflake.com/en/user-guide/dynamic-tables-intro)
* [Snowflake blog: Dynamic tables for streaming pipelines](https://www.snowflake.com/en/blog/dynamic-tables-delivering-declarative-streaming-data-pipelines/)
* [Snowflake docs: Materialized views](https://docs.snowflake.com/en/user-guide/views-materialized)

#### Related docs[​](#related-docs "Direct link to Related docs")

* [dbt blog: Announcing materialized views](https://docs.getdbt.com/blog/announcing-materialized-views)
* [dbt blog: Optimizing query run time with materialization schedules](https://www.getdbt.com/blog/optimizing-query-run-time-with-materialization-schedules/)

---

### Who is dbt Mesh for?

Before embarking on a Mesh implementation, it's important to understand if Mesh is the right fit for your team. Here, we outline three common organizational structures to help teams identify whether Mesh might fit your organization's needs.

#### The enterprise data mesh[​](#the-enterprise-data-mesh "Direct link to The enterprise data mesh")

Some data teams operate on a global scale. By definition, the team needs to manage, deploy, and distribute data products across a large number of teams. Central IT may own some data products or simply own the platform upon which data products are built.
Often, these organizations have “architects” who can advise line-of-business teams on their work while keeping track of what’s happening globally (regarding tooling and the substance of work). This is a lot like how software organizations work beyond a certain scale. The headcount ratio of domain teams to platform teams in this scenario is roughly ≥10:1. For each member of the central platform team, there might be dozens of members of domain-aligned data teams.

Is Mesh a good fit in this scenario? Absolutely! There is no other way to share data products at scale. One dbt project would not keep up with the global demands of an organization like this.

##### Tips and tricks[​](#tips-and-tricks "Direct link to Tips and tricks")

* **Managing shared macros**: Teams operating at this scale will benefit from a separate repository containing a dbt package of reusable utility macros that all other projects will install. This is different from public models, which provide data-as-a-service (a set of “API endpoints”) — this is distributed as a **library**. This package can also standardize imports of other third-party packages, as well as provide wrappers or shims for those macros. This package should have a dedicated team of maintainers — probably the central platform team, or a set of “superusers” from domain-aligned data modeling teams.

tip

To help you get started, check out our [Quickstart with Mesh](https://docs.getdbt.com/guides/mesh-qs.md) or our online [Mesh course](https://learn.getdbt.com/courses/dbt-mesh) to learn more!

##### Adoption challenges[​](#adoption-challenges "Direct link to Adoption challenges")

* Onboarding hundreds of people and dozens of projects is full of friction! The challenges of a scaled, global organization are not to be underestimated. To start the migration, prioritize teams that have strong dbt familiarity and fundamentals. Mesh is an advancement of core dbt deployments, so these teams are likely to have a smoother transition.
Additionally, prioritize teams that manage strategic data assets that need to be shared widely. This ensures that Mesh will help your teams deliver concrete value quickly.

If this sounds like your organization, Mesh is the architecture you should pursue. ✅

#### Hub and spoke[​](#hub-and-spoke "Direct link to Hub and spoke")

Some slightly smaller organizations still operate with a central data team serving several business-aligned analytics teams in a ~5:1 headcount ratio. These central teams look less like an IT function and more like a modern data platform team of analytics engineers. This team provides the majority of the data products to the rest of the org, as well as the infrastructure for downstream analytics teams to spin up their own spoke projects, while ensuring the quality and maintenance of the core platform.

Is Mesh a good fit in this scenario? Almost certainly! If your central data team starts to bottleneck analysts’ work, you need a way for those teams to operate relatively independently while still ensuring the quality of the most used data products. Mesh is designed to solve this exact problem.

##### Tips and tricks[​](#tips-and-tricks-1 "Direct link to Tips and tricks")

* **Data products by some, for all:** The spoke teams shouldn’t produce public models. By contrast, development in the hub team project should be slower, more careful, and focus on producing foundational public models shared across domains. We’d recommend giving hub team members access (at least read-only) to downstream projects, which will help with more granular impact analysis within Catalog. If a public model, or a specific column in that model, isn’t used in any downstream project, the hub team can feel better about removing it. However, they should still utilize the dbt governance features like `deprecation_date` and `version` as appropriate to set expectations.
If there is a need for a public model in a spoke project to be shared across multiple projects, consider first whether it could or should be moved to the hub project.

* **Sources:** Spokes should be allowed/encouraged to define and use *domain-specific* data sources. The platform team should not need to worry about, say, `Thinkific` data when building core data marts, but the Training project may need to. *No two sources anywhere in a dbt mesh should point to the same relation object.* If a spoke feels like they need to use a source the hub already uses, the interfaces should change so that the spoke can get what they need from the platform project.
* **Project quality:** More analyst-focused teams will have different skill levels and quality bars. Owning their data means they own the consequences as well. Rather than being accountable for the end-to-end delivery of data assets, the Hub team is an enablement team: their role is to provide guardrails and quality checks, but not to fix all the issues exactly to their liking (which would keep them a bottleneck).

##### Adoption challenges[​](#adoption-challenges-1 "Direct link to Adoption challenges")

There are trade-offs to using this architecture, especially for the hub team managing and maintaining public models. This workflow has intentional friction to reduce the chances of unintentional model changes that break unspoken data contracts. These assurances may come at the cost of some conveniences, such as faster onboarding or more flexible development workflows. Compared to having a single project, where a select few are doing all the development work, this architecture optimizes for slower development from a wider group of people.

If this sounds like your organization, it's very likely that Mesh is a good fit for you. ✅

#### Single team monolith[​](#single-team-monolith "Direct link to Single team monolith")

Some organizations operate on an even smaller scale.
If your data org is a single small team that controls the end-to-end process of building and maintaining all data products at the organization, Mesh may not be required. The complexity in projects comes from having a wide variety of data sources and stakeholders. However, given the team's size, operating on a single codebase may be the most efficient way to manage data products.

Generally, if a team of this size and scope is looking to implement Mesh, it's likely that they are looking for better interface design and/or performance improvements for certain parts of their dbt DAG, and not because they necessarily have an organizational pain point to solve.

*Is Mesh a good fit?* Maybe! There are reasons to separate out parts of a large monolithic project into several to better orchestrate and manage the models. However, if the same people are managing each project, they may find that the overhead of managing multiple projects is not worth the benefits.

If this sounds like your organization, it's worth considering whether Mesh is a good fit for you.

---

### Writing custom generic data tests

dbt ships with [Not Null](https://docs.getdbt.com/reference/resource-properties/data-tests.md#not-null), [Unique](https://docs.getdbt.com/reference/resource-properties/data-tests.md#unique), [Relationships](https://docs.getdbt.com/reference/resource-properties/data-tests.md#relationships), and [Accepted Values](https://docs.getdbt.com/reference/resource-properties/data-tests.md#accepted-values) generic data tests. (These used to be called "schema tests," and you'll still see that name in some places.)
Under the hood, these generic data tests are defined as `test` blocks (like macros).

info

There are tons of generic data tests defined in open source packages, such as [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) and [dbt-expectations](https://hub.getdbt.com/calogica/dbt_expectations/latest/) — the test you're looking for might already be there!

##### Generic tests with standard arguments[​](#generic-tests-with-standard-arguments "Direct link to Generic tests with standard arguments")

Generic tests are defined in SQL files. Those files can live in two places:

* `tests/generic/`: that is, a special subfolder named `generic` within your [test paths](https://docs.getdbt.com/reference/project-configs/test-paths.md) (`tests/` by default)
* `macros/`: Why? Generic tests work a lot like macros, and historically, this was the only place they could be defined. If your generic test depends on complex macro logic, you may find it more convenient to define the macros and the generic test in the same file.

To define your own generic test, create a `test` block with the name of your test. All generic tests should accept one or both of the standard arguments:

* `model`: The resource on which the test is defined, templated out to its relation name. (Note that the argument is always named `model`, even when the resource is a source, seed, or snapshot.)
* `column_name`: The column on which the test is defined. Not all generic tests operate on the column level, but if they do, they should accept `column_name` as an argument.

Here's an example of an `is_even` schema test that uses both arguments:

tests/generic/test\_is\_even.sql

```sql
{% test is_even(model, column_name) %}

with validation as (
    select
        {{ column_name }} as even_field
    from {{ model }}
),

validation_errors as (
    select
        even_field
    from validation
    -- if this is true, then even_field is actually odd!
    where (even_field % 2) = 1
)

select *
from validation_errors

{% endtest %}
```

If this `select` statement returns zero records, then every record in the supplied `model` argument is even! If a nonzero number of records is returned instead, then at least one record in `model` is odd, and the test has failed.

To use this generic test, specify it by name in the `data_tests` property of a model, source, snapshot, or seed. With one line of code, you've created a test: add `is_even` under a column named `favorite_number` in a model named `users`, and `users` will be passed to the `is_even` test as the `model` argument, and `favorite_number` will be passed in as the `column_name` argument. You could add the same line for other columns, other models—each will add a new test to your project, *using the same generic test definition*.

##### Add description to generic data test logic[​](#add-description-to-generic-data-test-logic "Direct link to Add description to generic data test logic")

You can add a description to the Jinja macro that provides the core logic for a data test by including the `description` key under the `macros:` section. You can add descriptions directly to the macro, including descriptions for macro arguments. Here's an example:

macros/generic/schema.yml

```yaml
macros:
  - name: test_not_empty_string
    description: Complementary test to default `not_null` test as it checks that there is not an empty string. It only accepts columns of type string.
    arguments:
      - name: model
        type: string
        description: Model Name
      - name: column_name
        type: string
        description: Column name that should not be an empty string
```

In this example:

* When documenting custom test macros in a `schema.yml` file, add the `test_` prefix to the macro name. For example, if the test block's name is `not_empty_string`, then the macro's name would be `test_not_empty_string`.
* We've provided a description at the macro level, explaining what the test does and any relevant notes.
* Each argument (like `model`, `column_name`) also includes a description to clarify its purpose.

##### Generic tests with additional arguments[​](#generic-tests-with-additional-arguments "Direct link to Generic tests with additional arguments")

The `is_even` test works without needing to specify any additional arguments. Other tests, like `relationships`, require more than just `model` and `column_name`. If your custom test requires more than the standard arguments, include those arguments in the test signature, as `field` and `to` are included below:

tests/generic/test\_relationships.sql

```sql
{% test relationships(model, column_name, field, to) %}

with parent as (
    select {{ field }} as id
    from {{ to }}
),

child as (
    select {{ column_name }} as id
    from {{ model }}
)

select *
from child
where id is not null
  and id not in (select id from parent)

{% endtest %}
```

When calling this test from a `.yml` file, supply the arguments to the test in a dictionary. Note that the standard arguments (`model` and `column_name`) are provided by the context, so you do not need to define them again.

##### Generic tests with default config values[​](#generic-tests-with-default-config-values "Direct link to Generic tests with default config values")

It is possible to include a `config()` block in a generic test definition. Values set there will set defaults for all specific instances of that generic test, unless overridden within the specific instance's `.yml` properties.
tests/generic/warn\_if\_odd.sql

```sql
{% test warn_if_odd(model, column_name) %}

{{ config(severity = 'warn') }}

select *
from {{ model }}
where ({{ column_name }} % 2) = 1

{% endtest %}
```

Any time the `warn_if_odd` test is used, it will *always* have warning-level severity, unless the specific test instance overrides that value.

##### Customizing dbt's built-in tests[​](#customizing-dbts-built-in-tests "Direct link to Customizing dbt's built-in tests")

To change the way a built-in generic test works—whether to add additional parameters, re-write the SQL, or for any other reason—you simply add a `test` block with the same name as the built-in test to your own project. dbt will favor your version over the global implementation!

tests/generic/unique.sql

```sql
{% test unique(model, column_name) %}

-- whatever SQL you'd like!

{% endtest %}
```

##### Examples[​](#examples "Direct link to Examples")

Here are some additional examples of custom generic ("schema") tests from the community:

* [Creating a custom schema test with an error threshold](https://discourse.getdbt.com/t/creating-an-error-threshold-for-schema-tests/966)
* [Using custom schema tests to only run tests in production](https://discourse.getdbt.com/t/conditionally-running-dbt-tests-only-running-dbt-tests-in-production/322)
* [Additional examples of custom schema tests](https://discourse.getdbt.com/t/examples-of-custom-schema-tests/181)
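As a companion to the `is_even` walkthrough above, here is roughly what the one-line usage looks like in a properties file. A sketch: the `users` model and `favorite_number` column come from that example, and the file path is illustrative:

```yaml
# models/schema.yml -- sketch using the names from the is_even example above
models:
  - name: users
    columns:
      - name: favorite_number
        data_tests:
          - is_even
```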
--- ## Category ### Account FAQs #### [📄️ Account-specific features](https://docs.getdbt.com/faqs/Accounts/account-specific-features.md) [Account-specific features](https://docs.getdbt.com/faqs/Accounts/account-specific-features.md) --- ### Available flags #### [📄️ Anonymous usage stats](https://docs.getdbt.com/reference/global-configs/usage-stats.md) [dbt Labs is on a mission to build the best version of dbt possible, and a crucial part of that is understanding how users work with dbt. To this end, we've added some simple event tracking (or telemetry) to dbt using Snowplow. Importantly, we do not track credentials, raw model contents, or model names: we consider these private, and frankly none of our business.](https://docs.getdbt.com/reference/global-configs/usage-stats.md) --- ### Cost Insights FAQs #### [📄️ Actual vs displayed costs]() [Explanation of why actual warehouse costs may differ from displayed costs]() --- ### dbt Core FAQs #### [📄️ Installing dbt Core with pip](https://docs.getdbt.com/faqs/Core/install-pip-best-practices.md) [Instructions on how to install dbt Core with pip](https://docs.getdbt.com/faqs/Core/install-pip-best-practices.md) --- ### Documentation FAQs #### [📄️ Types of columns included in doc site](https://docs.getdbt.com/faqs/Docs/document-all-columns.md) [All columns appear in your docs site](https://docs.getdbt.com/faqs/Docs/document-all-columns.md) --- ### Environments FAQs #### [📄️ Custom branch settings](https://docs.getdbt.com/faqs/Environments/custom-branch-settings.md) [Use custom code from your repository](https://docs.getdbt.com/faqs/Environments/custom-branch-settings.md) --- ### General configs #### [📄️ Advanced usage](https://docs.getdbt.com/reference/advanced-config-usage.md) [Alternative SQL file config syntax](https://docs.getdbt.com/reference/advanced-config-usage.md) --- ### General properties #### [📄️ anchors](https://docs.getdbt.com/reference/resource-properties/anchors.md) 
[Definition](https://docs.getdbt.com/reference/resource-properties/anchors.md) --- ### Git FAQs #### [📄️ How to migrate git providers](https://docs.getdbt.com/faqs/Git/git-migration.md) [Learn how to migrate git providers in dbt with minimal disruption.](https://docs.getdbt.com/faqs/Git/git-migration.md) --- ### Jinja FAQs #### [📄️ Compiled sql has a lot of white space](https://docs.getdbt.com/faqs/Jinja/jinja-whitespace.md) [Managing whitespace control](https://docs.getdbt.com/faqs/Jinja/jinja-whitespace.md) --- ### Jinja reference #### [🗃️ dbt Jinja context functions](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md) [47 items](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md) --- ### List of commands #### [📄️ build](https://docs.getdbt.com/reference/commands/build.md) [The dbt build command will:](https://docs.getdbt.com/reference/commands/build.md) --- ### Models FAQs #### [📄️ Model configurations](https://docs.getdbt.com/faqs/Models/available-configurations.md) [Learning about model configurations](https://docs.getdbt.com/faqs/Models/available-configurations.md) --- ### Project configs #### [📄️ dbt\_project.yml](https://docs.getdbt.com/reference/dbt_project.yml.md) [Reference guide for configuring the dbt\_project.yml file.](https://docs.getdbt.com/reference/dbt_project.yml.md) --- ### Project FAQs #### [📄️ Add a seed file](https://docs.getdbt.com/faqs/Project/add-a-seed.md) [Learn how to add a seed file to your project](https://docs.getdbt.com/faqs/Project/add-a-seed.md) --- ### Runs FAQs #### [📄️ Reviewing SQL that dbt runs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) [Review logs to check the SQL dbt is running](https://docs.getdbt.com/faqs/Runs/checking-logs.md) --- ### Seeds FAQs #### [📄️ Build one seed at a time](https://docs.getdbt.com/faqs/Seeds/build-one-seed.md) [Use select flag to build one seed at a time](https://docs.getdbt.com/faqs/Seeds/build-one-seed.md) --- ### Setting flags #### [📄️ 
Command line options](https://docs.getdbt.com/reference/global-configs/command-line-options.md) [For consistency, command-line interface (CLI) flags should come right after the dbt prefix and its subcommands. This includes "global" flags (supported for all commands). For a list of all flags you can set, refer to Available flags. When set, CLI flags override environment variables and project flags.](https://docs.getdbt.com/reference/global-configs/command-line-options.md)

---

### Snapshots FAQs

#### [📄️ Use hooks to run with snapshots](https://docs.getdbt.com/faqs/Snapshots/snapshot-hooks.md)

[Run hooks with snapshots](https://docs.getdbt.com/faqs/Snapshots/snapshot-hooks.md)

---

### Tests FAQs

#### [📄️ Available data tests to use in dbt](https://docs.getdbt.com/faqs/Tests/available-tests.md)

[Types of data tests to use in dbt](https://docs.getdbt.com/faqs/Tests/available-tests.md)

---

### Troubleshooting FAQs

#### [📄️ Generate HAR files](https://docs.getdbt.com/faqs/Troubleshooting/generate-har-file.md)

[How to generate HAR files for debugging](https://docs.getdbt.com/faqs/Troubleshooting/generate-har-file.md)

---

### Warehouse FAQs

#### [📄️ Setting up permissions in BigQuery](https://docs.getdbt.com/faqs/Warehouse/bq-impersonate-service-account-setup.md)

[Use service account to set up permissions in BigQuery](https://docs.getdbt.com/faqs/Warehouse/bq-impersonate-service-account-setup.md)

---

## Community

### Alan Cruickshank

![Alan Cruickshank](/img/community/spotlight/alan-cruickshank.jpg?v=2)

he/him

Insights Director, tails.com

Location: London, UK

Organizations: Author & Maintainer of SQLFluff

[LinkedIn](https://www.linkedin.com/in/amcruickshank/ "LinkedIn") | [SQLFluff](https://sqlfluff.com "SQLFluff")

#### About

I've
been around in the dbt community, especially the [London dbt Meetup](https://www.meetup.com/london-dbt-meetup/ "London dbt Meetup"), since early 2019—around the time that we started using dbt at tails.com. My background is in the startup/scaleup space and building data teams in a context where there is a lot of growth going on but there isn't a lot of money around to support that. That's a topic that I've written and spoken about on several occasions in podcasts and blog posts, and even at Coalesce [2020](https://www.getdbt.com/coalesce-2020/presenting-sqlfluff/ "2020") and [2021](https://www.getdbt.com/coalesce-2021/this-is-just-the-beginning/ "2021")!

Aside from my work at tails.com, my other main focus at the moment is [SQLFluff](https://sqlfluff.com/ "SQLFluff"), the open source SQL linter, which I started developing as part of a hackday at tails.com in late 2019 and which is now the most starred SQL linter on GitHub, with almost 1M downloads a month.

#### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?")

I [joined the community](https://www.getdbt.com/community/?utm_medium=internal\&utm_source=docs\&utm_campaign=q3-2024_dbt-spotlight_aw\&utm_content=____\&utm_term=all___) in 2019 and it's been an invaluable source of advice and wisdom, especially operating on the bleeding edge of open source data tooling. It's been a place to meet like-minded people, even find new colleagues, and certainly one of the places I look to when thinking about how to approach hairy data problems. In London it's also been one of the most vibrant meetup groups in person, compared to many others which are either very, very specialized or more focussed on larger organisations.

#### What dbt community leader do you identify with?
How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") I just want to be useful 😁. I've learned a lot from the community over the years, and now I want to be able to give back to it. My primary vehicle for that is SQLFluff, both as a tool for the community to use and as a way of encouraging a wider group of people to feel welcome and able to contribute to open source software and build the tools of the future. I also see SQLFluff as a vehicle to drive more consistency in the way we write SQL, and through that drive better communication and lower the barrier for new people to enter this field and find their own success. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") For better or worse, I spend most of my day job on people and organisational things: less on how to solve individual problems, and more on how to enable and support groups of people in making great decisions themselves. In some ways, if I have to touch the keyboard too much, it's a sign that I've failed in that calling. dbt itself is a tool which enables better collaboration—and the community is full of people with great ideas on how to better enable other people around us. I hope that I'm able to pass some of that knowledge, and the experience of applying it in a scaleup environment, back to others also treading this path.
More specifically from the dbt community, if I were to pick one recommendation, it would be Emilie Schario’s talk from Coalesce 2022 on [“Data Led is Dumb”](https://www.youtube.com/watch?v=WsMHPALc8Vg\&t=1s). I think it should be essential watching for anyone who’s hearing “Data Led” a lot and wants to turn that excitement into practical action. #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") If you're not using SQLFluff on your dbt project, you probably should be. --- ### Alison Stanton [Home](https://docs.getdbt.com/index.md "Home")[Community spotlight](https://docs.getdbt.com/community/spotlight.md "Community spotlight")[Alison Stanton](https://docs.getdbt.com/community/spotlight/alison-stanton.md "Alison Stanton") Community Award Recipient 2023 ![Alison Stanton](/img/community/spotlight/alison.jpg?v=2) she/her AVP, Analytics Engineering Lead Location: Chicago, IL, USA Organizations: Advocates for SOGIE Data Collection [LinkedIn](https://www.linkedin.com/in/alisonstanton/ "LinkedIn") | [Github](https://github.com/alison985/ "Github") #### About I started programming 20+ years ago. I moved from web applications into transforming data and business intelligence reporting because it's both hard and useful. The majority of my career has been in engineering for SaaS companies. For my last few positions I've been brought in to transition larger, older companies to a modern data platform and ways of thinking. I am dbt Certified. I attend Coalesce and other dbt events virtually. I speak up in [dbt Slack](https://www.getdbt.com/community/join-the-community) and on the dbt-core, dbt-redshift, and dbt-sqlserver repositories. dbt Slack is my happy place, especially #advice-for-dbt-power-users. I care a lot about the dbt documentation and dbt docs.
#### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I joined the dbt community when I joined an employer in mid-2020. To summarize the important things that dbt has given me: it allowed me to focus on the next set of data challenges instead of staying in toil. Data folks joke that we're plumbers, but we're digital plumbers and that distinction should enable us to be DRY. That means not only writing DRY code like dbt allows, but also having tooling automation to DRY up repetitive tasks like dbt provides. dbt's existence flipped the experience of data testing on its head for me. I went from a) years of instigating tech discussions on how to systematize data quality checks and b) building my own SQL tests and design patterns, to having built-in mechanisms for data testing. dbt and the dbt community materials are assets I can use to provide validation for things I have said, do say, and will say about data. Having outside voices to point to when requesting investment in data up-front - to avoid problems later - is an under-appreciated tool for data leaders' toolboxes. dbt's community has given me access to both a) high-quality, seasoned SMEs in my field to learn from and b) newer folks I can help. Both are gifts that I cherish. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") Who I want to be when I grow up: * MJ, who was the first person to ever say "data build tool" to me.
If I'd listened to her then I could have been part of the dbt community years sooner. * Christine Dixon who presented ["Could You Defend Your Data in Court?"](https://www.youtube.com/watch?v=vD6IrGtxNAM) at Coalesce 2023. In your entire data career, that is the most important piece of education you'll get. * The dbt community team in general. Hands-down the most important work they do is the dbt Slack community, which gives me and others the accessibility we need to participate. Gwen Windflower (Winnie) for her extraordinary ability to bridge technical nuance with business needs on-the-fly. Dave Connors for being the first voice for "a node is a node is a node". Joel Labes for creating the ability to emoji-react to send a post to the #best-of-slack channel. And so on. The decision to foster a space for data instead of just for their product, because that enhances their product. The extremely impressive ability to maintain a problem-solving-is-cool, participate-as-you-can, chorus-of-voices, international, not-only-cis-men, and we're-all-in-this-together community. * Other (all?) dbt Labs employees who engage with the community, instead of maintaining a false separation from it, as most software companies do. Welcoming feedback, listening to it, and actioning or filtering it out (ex. Mirna Wong, account reps). Thinking holistically about the ecosystem, not just one feature at a time (ex. Anders). Responsiveness and the ability to translate diverse items into technical clarity and focused actions (ex. Doug Beatty, the dbt support team). I've been in software and open source and online communities for a long time - these are rare things we should not take for granted. * Josh Devlin for prolificness that demonstrates expertise and dedication to helping. * The maintainers of dbt packages like dbt-utils, dbt-expectations, dbt-date, etc. * Everyone who gets over their fear to ask a question, propose an answer that may not work, or otherwise take a risk by sharing their voice.
I hope I can support my employer, my professional development, and my dbt community through the following: * Elevate dbt understanding of and support for Enterprise-size company use cases through dialogue, requests, and examples. * Emphasize rigor with defensive coding and comprehensive testing practices. * Improve the onboarding and up-skilling of dbt engineers through feedback and edits on [docs.getdbt.com](https://docs.getdbt.com/index.md). * Contribute to the maintenance of a collaborative and helpful dbt community as the number of dbt practitioners reaches various growth stages and tipping points. * Engage in dialogue. Provide feedback. Champion developer experience as a priority. Be a good open-source citizen on GitHub. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") I have learned: * Details on DAG sequencing. * How to make an engineering proposal a community conversation. * The [dbt semantic layer](https://www.getdbt.com/product/semantic-layer). And so many things that are now so engrained in me that I can't remember not knowing them. I can teach and share about: * Naming new concepts and how to choose those names. * Reproducibility, reconciliation, and audits. * Data ethics. * Demographic questions for sexual orientation and/or gender identity on a form. I'm happy to be your shortcut to the most complicated data and most engrained tech debt in history. I also geek out talking about: * reusing functionality in creative ways, * balancing trade-offs in data schema modeling, * dealing with all of an organization's data holistically, * tracking instrumentation, and * the philosophy on prioritization. The next things on my agenda to learn about: * Successes and failures in data literacy work.
The best I've found so far is 1:1 interactions, and that doesn't scale. * How to reduce the amount of time running `dbt test` takes while maintaining coverage. * Data ethics. * The things you think are most important, which you can tell me about by giving them an emoji reaction in Slack. #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") My gratitude to each community member for this community. --- ### Anya Prosvetova [Home](https://docs.getdbt.com/index.md "Home")[Community spotlight](https://docs.getdbt.com/community/spotlight.md "Community spotlight")[Anya Prosvetova](https://docs.getdbt.com/community/spotlight/anya-prosvetova.md "Anya Prosvetova") ![Anya Prosvetova](/img/community/spotlight/anya-prosvetova.jpg?v=2) she/her Senior Data Engineer, Aimpoint Digital Location: Amsterdam, Netherlands Organizations: Tableau Visionary & DataDev Ambassador [Twitter](https://www.twitter.com/anyalitica "Twitter") | [LinkedIn](https://uk.linkedin.com/in/annaprosvetova "LinkedIn") | [Website](https://prosvetova.com "Website") #### About I’m a Data Engineer with a background in SaaS, consulting, financial services and the creative industries. I help organisations convert data into value, developing data pipelines and automating processes. I’m also a Tableau Visionary and DataDev Ambassador, and one of the organisers of the Data + Women Netherlands community. I became an active member of the dbt Community about a year ago, and it has been a great place to learn and ask questions. It was really inspiring to speak at the first Amsterdam dbt Meetup recently and meet the local community of fellow Analytics and Data Engineers.
#### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") A few years back, I became a member of the dbt Community, but it wasn't until about a year ago, when I started using dbt at work, that I began actively engaging with it. Being the only data person in my company, the Community became a valuable resource for me to learn and ask questions. It's an excellent platform to gain insights from others, exchange experiences, and stay up-to-date with the latest product features. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") What I enjoy about the dbt Community is that its thought leaders are focused on working together to create a culture of mutual support and shared learning. Everyone is welcome to ask a question or share their latest blog without the fear of being judged. I believe that everyone has something valuable to contribute to the community, and I hope to help facilitate this supportive and collaborative environment where we can all learn from each other. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") I've learned a lot about best practices for working with dbt and data in general, as well as tips and tricks for specific use cases. 
I've also gained a better understanding of the diverse range of data challenges that people face in different industries and contexts. As for what I hope others can learn from me, I aim to share my own experiences and knowledge in a way that is approachable and useful to people of all skill levels and backgrounds. --- ### Become a contributor #### Want to get involved? Start here[​](#want-to-get-involved-start-here "Direct link to Want to get involved? Start here") The dbt Community predates dbt Labs as an organization and harkens back to the days when a scrappy analytics consultancy of a few [pissed off data analysts](https://www.hashpath.com/2020/12/an-analytics-engineer-is-really-just-a-pissed-off-data-analyst/#:~:text=Often%20times%2C%20an%20analytics%20engineer,necessity%20\(and%20genius%20branding\).) started hacking together an open source project, around which gathered a community that would change how the world uses data. The dbt Community exists to allow analytics practitioners to share their knowledge, help others, and collectively drive forward the discipline of analytics engineering. This is something that can’t be done by any one individual or any one organization - to create a new discipline is necessarily a community effort. The only reason that dbt has become as widespread as it has is because people like you choose to get involved and share your knowledge. Contributing to the community can also be a great way to learn new skills, build up a public portfolio and make friends with other practitioners. There are opportunities here for everyone to get involved, whether you are just beginning your analytics engineering journey or you are a seasoned data professional. Contributing isn’t about knowing all of the answers, it’s about learning things together. Below you’ll find a sampling of the ways to get involved. There are a lot of options but these are ultimately just variations on the theme of sharing knowledge with the broader community.
[![](/img/icons/pencil-paper.svg)](https://docs.getdbt.com/community/contributing/contributing-writing.md) ###### [Writing contributions](https://docs.getdbt.com/community/contributing/contributing-writing.md) [Learn how to share and grow the collective knowledge of the dbt Community through blogs, guides, and documentation.](https://docs.getdbt.com/community/contributing/contributing-writing.md) [![](/img/icons/folder.svg)](https://docs.getdbt.com/community/contributing/contributing-coding.md) ###### [Coding contributions](https://docs.getdbt.com/community/contributing/contributing-coding.md) [The dbt Community supports a wide variety of open source and source-available projects, and this software is at the heart of everything we do. Learn how to get involved with projects in the dbt ecosystem.](https://docs.getdbt.com/community/contributing/contributing-coding.md) [![](/img/icons/discussions.svg)](https://docs.getdbt.com/community/contributing/contributing-online-community.md) ###### [Online community building](https://docs.getdbt.com/community/contributing/contributing-online-community.md) [Getting involved in the dbt Community Forum or Slack is one of the best entry points for contributing. Share your knowledge and learn from others.](https://docs.getdbt.com/community/contributing/contributing-online-community.md) [![](/img/icons/calendar.svg)](https://docs.getdbt.com/community/contributing/contributing-realtime-events.md) ###### [Realtime event participation](https://docs.getdbt.com/community/contributing/contributing-realtime-events.md) [Want to speak at a Meetup or conference? Learn how to get involved and check out best practices for crafting a talk that everyone will remember.](https://docs.getdbt.com/community/contributing/contributing-realtime-events.md)
--- ### Bruno de Lima [Home](https://docs.getdbt.com/index.md "Home")[Community spotlight](https://docs.getdbt.com/community/spotlight.md "Community spotlight")[Bruno de Lima](https://docs.getdbt.com/community/spotlight/bruno-de-lima.md "Bruno de Lima") Community Award Recipient 2024 ![Bruno de Lima](/img/community/spotlight/bruno-souza-de-lima-newimage.jpg?v=2) he/him Senior Data Engineer, phData Location: Florianópolis, Brazil [LinkedIn](https://www.linkedin.com/in/brunoszdl/ "LinkedIn") | [Medium](https://medium.com/@bruno.szdl "Medium") #### About Hey all! I was born and raised in Florianópolis, Brazil, and I'm a Senior Data Engineer at phData. I live with my fiancée and I enjoy music, photography, and powerlifting. I started my career in early 2022 at Indicium as an Analytics Engineer, working with dbt from day 1. By 2023, my path took a global trajectory as I joined phData as a Data Engineer, expanding my experiences and creating connections beyond Brazil. While dbt is my main expertise, because of my work in consultancy I have experience with a wide range of tools, especially those related to Snowflake, Databricks, AWS and GCP; but I have already tried several other modern data stack tools too.
I actively participate in the dbt community, having organized dbt Meetups in Brazil (in [Floripa](https://www.meetup.com/en-AU/florianopolis-dbt-meetup) and [São Paulo](https://www.meetup.com/sao-paulo-dbt-meetup-group/)); writing about dbt-related topics in my Medium and LinkedIn profiles; contributing to the dbt Core code and to the docs; and frequently checking [dbt Slack](https://www.getdbt.com/community/join-the-community/) and [Discourse](https://discourse.getdbt.com/), helping (and being helped by) other dbt practitioners. If you are a community member, you may have seen me around! #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I was not truly happy with my academic life. My career took a new turn when I enrolled in the Analytics Engineer course by Indicium. That was my first contact with dbt, and I didn't realize how much it would transform my career. After that, I was hired at the company as an Analytics Engineer and worked extensively with dbt from day one. It took me some time to become an active member of the dbt community. I started working with dbt at the beginning of 2022 and became more involved towards the end of that year, encouraged by Daniel Avancini. I regret not doing this earlier, because being an active community member has been a game-changer for me, as my knowledge of dbt has grown exponentially just by participating in daily discussions on Slack. I have found #advice-dbt-help and #advice-dbt-for-power-users channels particularly useful, as well as the various database-specific channels. Additionally, the #i-made-this and #i-read-this channels have allowed me to learn about the innovative things that community members are doing. 
Inspired by other members, especially Josh Devlin and Owen Prough, I began answering questions on Slack and Discourse. For questions I couldn't answer, I would try engaging in discussions about possible solutions or provide useful links. I also started posting dbt tips on LinkedIn to help practitioners learn about new features or to refresh their memories about existing ones. By being more involved in the community, I felt more connected and supported. I received help from other members, and now I could help others, too. I was happy with this arrangement, but more unexpected surprises came my way. My active participation in Slack, Discourse, and LinkedIn opened doors to new connections and career opportunities. I had the pleasure of meeting a lot of incredible people and receiving exciting job offers, including offers to work at phData and to teach at Zach Wilson's data engineering bootcamp. Thanks to the dbt community, I went from feeling uncertain about my career prospects to having a solid career and being surrounded by incredible people. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") I identify with Gwen Windflower and Joel Labes, or at least they are the kind of leaders I admire. Their strong presence and continuous interaction with all types of dbt enthusiasts make everyone feel welcomed in the community. They uplift those who contribute to the community, whether it's through a LinkedIn post or answering a question, and provide constructive feedback to help them improve. And of course they show very strong knowledge of dbt and data in general, which is reflected in their contributions.
And that is how I aspire to grow as a leader in the dbt Community. Despite being an introvert, I like interacting with people, helping solve problems, and providing suggestions. Recognizing and acknowledging the achievements of others is also important to me, as it fosters a positive environment where everyone's contributions are valued. And I am continuously learning about dbt to improve my skills and to become a trustworthy reference for others to rely on. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") I have learned that regardless of one's level of expertise, each person's voice is valued and respected in the community. I have also learned the importance of helping others and thinking critically: not just answering questions, but making sure they are the right questions. By actively engaging with others and sharing knowledge and insights, we can collectively improve our understanding and use of dbt. Moreover, I have discovered that having fun with dbt and fostering a positive, supportive community culture can greatly enhance the learning experience. I hope others can learn from me that it doesn't matter who you are, where you're from, or how old you are: you can make a difference in the community. I hope to inspire others to become more involved in the community, and not to be afraid to share their thoughts or ideas, or to hold back a post because they think it isn't cool enough. Through this process of mutual learning and support, we can accelerate our professional development and achieve our goals. So don't hold back, take initiative, and be an active contributor to this amazing community!
#### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") I would like to mention my very first contribution to the community, a dbt commands cheatsheet. I made it because I was very new to dbt and wanted a resource where I could quickly check the available commands and what I could do with them. I made it for myself, but then I thought it could help other beginners, so I shared it. I was incredibly surprised when it appeared in a dbt newsletter, and I think that was the starting point for me in the community. At that point, I knew everyone could contribute, and I felt more comfortable doing more of that. --- ### Christophe Oudar [Home](https://docs.getdbt.com/index.md "Home")[Community spotlight](https://docs.getdbt.com/community/spotlight.md "Community spotlight")[Christophe Oudar](https://docs.getdbt.com/community/spotlight/christophe-oudar.md "Christophe Oudar") Community Award Recipient 2024 ![Christophe Oudar](/img/community/spotlight/christophe-oudar.jpg?v=2) he/him Staff Engineer, Teads Location: Montpellier, France [X](https://x.com/Kayrnt "X") | [LinkedIn](https://www.linkedin.com/in/christopheoudar/ "LinkedIn") | [Substack](https://smallbigdata.substack.com/ "Substack") #### About I joined the dbt Community in November 2021 after discussing some issues on GitHub. I currently work as a staff engineer at a scaleup in the ad tech industry called Teads, which I joined 11 years ago as a new grad. I've been using dbt Core on BigQuery since then. I write about data engineering both on [Medium](https://medium.com/@kayrnt) and [Substack](https://smallbigdata.substack.com/). I contribute to [dbt-bigquery](https://github.com/dbt-labs/dbt-bigquery/).
I wrote an article that was then featured on the Developer Blog called [BigQuery ingestion-time partitioning and partition copy with dbt](https://docs.getdbt.com/blog/bigquery-ingestion-time-partitioning-and-partition-copy-with-dbt). #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I joined the community in November 2021 as a way to explore how to move our in-house data modeling layer to dbt. The transition took over a year while we ensured we could cover all our bases and add missing features to dbt-bigquery. That project was one of the stepping stones that helped me move from senior to staff level at my current job. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") I identify with leaders who have strong convictions about how data engineering should move forward but remain open to innovation and ideas from everyone, to bring the best to the field and make it as inclusive as possible to all cultures and profiles. I think that could mean people like Jordan Tigani or Mark Raasveldt. In the dbt community, my leadership has looked like helping people who are struggling and offering better ways to simplify one's day-to-day work when possible. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members?
What do you hope others can learn from you?") I read a lot of articles about dbt, especially when I got started with it. It helped me a lot to build a proper Slim CI that could fit my company's ways of working. I also got to see how data pipelines were done in other companies, and the pros and cons of my approaches. I hope I can share more of that knowledge so people can pick what's best for their needs. --- ### Coding contributions ##### Contribute to dbt Packages[​](#contribute-to-dbt-packages "Direct link to Contribute to dbt Packages") ###### Overview[​](#overview "Direct link to Overview") [dbt Packages](https://docs.getdbt.com/docs/build/packages.md) are the easiest way for analytics engineers to get involved with contributing code to the dbt Community, because dbt Packages are just standard [dbt Projects](https://docs.getdbt.com/docs/build/projects.md). If you can create a dbt Project, write a macro, and ref a model, you can make a dbt Package. Packages function much like libraries do in other programming languages. They allow for prewritten, modularized development of code to solve common problems in analytics engineering. You can view all dbt Packages on the [dbt Package Hub](https://hub.getdbt.com/). ###### Contribution opportunities[​](#contribution-opportunities "Direct link to Contribution opportunities") * Create a new package for the dbt Package Hub. This might be a new set of macros or tests that have been useful to you in your projects, a set of models for engaging with a commonly used data source, or anything else that can be done from within a dbt project. * Improve an existing package. This can be done by creating and engaging with Issues, or by adding functionality to address an existing issue via a PR.
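Because a package is just a dbt project, any macro it ships becomes callable from projects that install it. As a minimal sketch of what a package macro might look like (the macro name `cents_to_dollars` and its logic are illustrative, not from a real Hub package):

```sql
-- macros/cents_to_dollars.sql inside the package project.
-- Wraps a column expression so consuming projects can reuse the
-- conversion logic anywhere they write SQL.
{% macro cents_to_dollars(column_name, scale=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}
```

A consuming project would then list the package in its `packages.yml` (for example, via a `git` entry pointing at your repository), run `dbt deps`, and call `{{ cents_to_dollars('amount_cents') }}` in a model as if the macro were defined locally.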
###### Sample contributions[​](#sample-contributions "Direct link to Sample contributions") * [dbt Expectations](https://hub.getdbt.com/calogica/dbt_expectations/latest/) * [dbt Artifacts](https://hub.getdbt.com/brooklyn-data/dbt_artifacts/latest/) ###### Get started[​](#get-started "Direct link to Get started") * Use packages in your own projects! The best way to know how to improve a package is to use it in a production environment, then look for ways it can be modified or improved. * Read the following resources on package development: * [So You Want to Build a dbt Package](https://docs.getdbt.com/blog/so-you-want-to-build-a-package) * [Package Best Practices](https://github.com/dbt-labs/hubcap/blob/main/package-best-practices.md) * Need help? Visit #package-ecosystem in the dbt Slack. ##### Contribute to dbt open source or source-available software[​](#contribute-to-dbt-open-source-or-source-available-software "Direct link to Contribute to dbt open source or source-available software") ###### Overview[​](#overview-1 "Direct link to Overview") dbt Core and the dbt Fusion engine, adapters, tooling, and the sites powering the Package Hub and Developer Hub are all vibrant community projects. Unlike dbt Packages, contributing code to these projects typically requires some working knowledge of programming languages outside of SQL and Jinja, but the supportive community around these repositories can help you advance those skills. Even without contributing code, there are many ways to be part of communal development in these projects, detailed below. You can find a curated list of the most active OSS/SA projects that dbt Labs supports [here](https://docs.getdbt.com/community/resources/oss-sa-projects.md). ###### Contribution opportunities[​](#contribution-opportunities-1 "Direct link to Contribution opportunities") There are three primary ways to contribute to the dbt projects.
We’ll use the dbt Fusion engine as an example, as it is the "front door" to the dbt ecosystem and a great place for newcomers to start: * [Open an issue](https://github.com/dbt-labs/dbt-fusion/issues/new/choose) to suggest an improvement or give feedback. * Comment on or engage with existing [issues](https://github.com/dbt-labs/dbt-fusion/issues) or [discussions](https://github.com/dbt-labs/dbt-fusion/discussions). This could be upvoting issues that would be helpful for your organization, commenting to add nuance to a feature request, or sharing how a feature would impact your dbt usage. * Create a pull request that resolves an open Issue. This involves writing the code and tests that add the feature or resolve the bug described in an Issue, and then going through the code review process asynchronously with a dbt Labs engineer. Note that signed commits are required when contributing to dbt Core. For steps on how to sign commits, see [Signing commits](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits). ###### Sample contributions[​](#sample-contributions-1 "Direct link to Sample contributions") * Check out [this issue](https://github.com/dbt-labs/dbt-core/issues/3612) about improving error messages and [the PR that the community contributed to fix it](https://github.com/dbt-labs/dbt-core/pull/3703). * From the above issue, [another issue was generated](https://github.com/dbt-labs/dbt-bigquery/issues/202) to change not just the error message but improve the behavior. This is the virtuous cycle of community-driven development! Bit by bit we, the community, craft the tool to better fit our needs. ###### Get started[​](#get-started-1 "Direct link to Get started") * Read the dbt Core [contribution guide](https://github.com/dbt-labs/dbt-core/blob/main/CONTRIBUTING.md) and the [Contributor Expectations](https://docs.getdbt.com/community/resources/contributor-expectations.md).
* If contributing to the dbt Fusion engine, find an issue labeled “[good first issue](https://github.com/dbt-labs/dbt-fusion/issues?q=is%3Aopen+is%3Aissue+label%3Agood_first_issue)”, or look for similar labels on other repositories. If in doubt, feel free to ask the maintainers for a good first issue; they’ll be excited to welcome you! ###### Need help?[​](#need-help "Direct link to Need help?") The following channels in the dbt Community Slack are a great place to ask questions: * \#dbt-core-development * \#adapter-ecosystem --- ### Community Forum guidelines #### What is the dbt Community Forum?[​](#what-is-the-dbt-community-forum "Direct link to What is the dbt Community Forum?") [The dbt Community Forum](https://discourse.getdbt.com) is the platform for getting support with dbt, as well as for long-lived discussions about dbt, analytics engineering, and analytics. It's a place for us to build up a long-lasting knowledge base around the common challenges, opportunities, and patterns we work with every day. The forum is different from the dbt Community Slack in a few key ways. Most importantly, it is: * **Asynchronous** and **long-lived** - sometimes conversations continue over weeks, months, and beyond. * **Intentional** - we recommend spending at least 5 to 10 minutes thinking about and shaping your initial post and any comments. * **Citable** - Slack conversations tend to be great in the moment but get lost in the flow, while forum posts can be more easily shared and referenced.
**Guidelines for engaging on the Forum** The community [Rules of the Road](https://docs.getdbt.com/community/resources/community-rules-of-the-road.md) apply, and following them is the best way to get momentum behind your discussion or answers to your questions. The following guidelines will set you up for success: * Be respectful * Put effort into your posts * Mark replies as Solutions in the Help section * Don’t double post #### Categories[​](#categories "Direct link to Categories") The forum is broken down into three categories: * [Help](https://discourse.getdbt.com/c/help/19) * This is a Q\&A style forum where you can ask the dbt Community for help with specific questions about dbt, data modeling, or anything else you want a definitive answer on. * This category is for questions that can plausibly have a *single correct answer*. * ✅ How do I debug this Jinja error? * ✅ How do I set up CI in dbt on GitHub? * ❌ What is the best way to do marketing attribution? (More general discussions like this are perfect for the [In-Depth Discussions](https://discourse.getdbt.com/c/discussions/21) category) * [Show and Tell](https://discourse.getdbt.com/c/show-and-tell/22) * This is the place to show off all of the cool things you are doing in dbt. Whether it’s a new macro, design pattern, or package, post here to show the community what you are up to! * [In-Depth Discussions](https://discourse.getdbt.com/c/discussions/21) * Share anything you’re thinking about that has to do with dbt or analytics engineering! This is a great place to jot down some thoughts to share with the community or spark a discussion on a topic that currently interests you. #### Inclusivity on the Community Forum[​](#inclusivity-on-the-community-forum "Direct link to Inclusivity on the Community Forum") We are **strongly** committed to building a community where everyone can feel welcome.
The dbt community started with people who were not traditionally considered “technical”, did not have ownership over technical systems, and were often left out of organizational decision-making. We came together to learn from each other, solve hard problems, and help build a new discipline where data folks have greater ownership over our own work. It really matters to us that everyone feels like they can ask questions and engage, no matter their professional or personal background. Technical forums have the potential to replicate harmful power structures, and can feel intimidating or hostile. We are working hard to create and sustain an inclusive environment through community-building, technological solutions, inclusive content, and diverse contributors. This is a long-term project, and we will continue to iterate and make improvements. If you have any ideas or feedback on how to make this space friendlier or more inclusive please let us know, either on the community Slack in the #community-strategy channel or via email at . We want to hear from you! #### Following new and ongoing Discussions[​](#following-new-and-ongoing-discussions "Direct link to Following new and ongoing Discussions") The best way to stay up to date is to [browse the forum](https://discourse.getdbt.com/) directly. You can also Track or Watch specific threads or the whole category to receive updates on them without commenting. Each category also has a companion Slack channel (#advice-dbt-for-beginners, #show-and-tell and #in-depth-discussions). You can reply to the initial post in Slack and it will be added as a comment on the forum thread, allowing you to participate from inside Slack if you prefer.
--- ### Community spotlight The dbt Community is where analytics engineering lives and grows, and you're a part of it! Every quarter we'll be highlighting community members in the dbt Community Spotlight. These are individuals who have gone above and beyond to contribute to the community in a variety of ways. We all see you. We appreciate you. You are awesome. Community Award Recipient 2024 [![Christophe Oudar](/img/community/spotlight/christophe-oudar.jpg?v=2)](https://docs.getdbt.com/community/spotlight/christophe-oudar.md) #### [Christophe Oudar](https://docs.getdbt.com/community/spotlight/christophe-oudar.md) he/him I joined the dbt Community in November 2021 after exchanging some issues in GitHub. I currently work as a staff engineer at a scaleup in the ad tech industry called Teads, which I joined 11 years ago as a new grad. I've been using dbt Core on BigQuery since then. I write about data engineering bo... [Read More](https://docs.getdbt.com/community/spotlight/christophe-oudar.md) Community Award Recipient 2024 [![Ruth Onyekwe](/img/community/spotlight/ruth-onyekwe.jpeg?v=2)](https://docs.getdbt.com/community/spotlight/ruth-onyekwe.md) #### [Ruth Onyekwe](https://docs.getdbt.com/community/spotlight/ruth-onyekwe.md) she/her I've been working in the world of Data Analytics for over 5 years and have been part of the dbt community for the last 4. With a background in International Business and Digital Marketing, I experienced firsthand the need for reliable data to fuel business decisions. This inspired a career move ...
[Read More](https://docs.getdbt.com/community/spotlight/ruth-onyekwe.md) Community Award Recipient 2024 [![The Original dbt-athena Maintainers](/img/community/spotlight/dbt-athena-groupheadshot.jpg?v=2)](https://docs.getdbt.com/community/spotlight/original-dbt-athena-maintainers.md) #### [The Original dbt-athena Maintainers](https://docs.getdbt.com/community/spotlight/original-dbt-athena-maintainers.md) The original dbt-athena Maintainers is a group of 5 people—Jérémy Guiselin, Mattia, Jesse Dobbelaere, Serhii Dimchenko, and Nicola Corda—who met via dbt Slack in the #db-athena channel, with the aim to make [dbt-athena](https://docs.getdbt.com/docs/local/connect-data-platform/athena-setup) a production-ready adapter. In the first periods, Winter 2022 and Spring 202... [Read More](https://docs.getdbt.com/community/spotlight/original-dbt-athena-maintainers.md) Community Award Recipient 2024 [![Mike Stanley](/img/community/spotlight/mike-stanley.jpg?v=2)](https://docs.getdbt.com/community/spotlight/mike-stanley.md) #### [Mike Stanley](https://docs.getdbt.com/community/spotlight/mike-stanley.md) he/him I've split my time between financial services and the video games industry. Back when I wrote code every day, I worked in marketing analytics and marketing technology. I've been in the dbt community for about two years. I haven't authored any extensions to dbt's adapters yet but I've given feedba... [Read More](https://docs.getdbt.com/community/spotlight/mike-stanley.md) Community Award Recipient 2024 [![Meagan Palmer](/img/community/spotlight/Meagan-Palmer.png?v=2)](https://docs.getdbt.com/community/spotlight/meagan-palmer.md) #### [Meagan Palmer](https://docs.getdbt.com/community/spotlight/meagan-palmer.md) she/her I first started using dbt in 2016 or 2017 (I can't remember exactly). Since then, I have moved into data and analytics consulting and have dipped in and out of the dbt Community.
Late last year, I started leading dbt Cloud training courses and spending more time in the [dbt Slack](https://www.getdbt.com/community/join-the-community/). In consulting,... [Read More](https://docs.getdbt.com/community/spotlight/meagan-palmer.md) Community Award Recipient 2024 [![Bruno de Lima](/img/community/spotlight/bruno-souza-de-lima-newimage.jpg?v=2)](https://docs.getdbt.com/community/spotlight/bruno-de-lima.md) #### [Bruno de Lima](https://docs.getdbt.com/community/spotlight/bruno-de-lima.md) he/him Hey all! I was born and raised in Florianopolis, Brazil, and I'm a Senior Data Engineer at phData. I live with my fiancée and I enjoy music, photography, and powerlifting. I started my career in early 2022 at Indicium as an Analytics Engineer, working with dbt from day 1. By 2023, my path took a... [Read More](https://docs.getdbt.com/community/spotlight/bruno-de-lima.md) Community Award Recipient 2024 [![Opeyemi Fabiyi](/img/community/spotlight/fabiyi-opeyemi.jpg?v=2)](https://docs.getdbt.com/community/spotlight/fabiyi-opeyemi.md) #### [Opeyemi Fabiyi](https://docs.getdbt.com/community/spotlight/fabiyi-opeyemi.md) he/him I’m an Analytics Engineer with Data Culture, a Data Consulting firm where I use dbt regularly to help clients build quality-tested data assets. Before Data Culture, I worked at Cowrywise, one of the leading Fintech companies in Nigeria, where I was a solo data team member, and that was my first ... [Read More](https://docs.getdbt.com/community/spotlight/fabiyi-opeyemi.md) Community Award Recipient 2024 [![Jenna Jordan](/img/community/spotlight/jenna-jordan.jpg?v=2)](https://docs.getdbt.com/community/spotlight/jenna-jordan.md) #### [Jenna Jordan](https://docs.getdbt.com/community/spotlight/jenna-jordan.md) she/her I am a Senior Data Management Consultant with Analytics8, where I advise clients on dbt best practices (especially regarding dbt Mesh and the various shifts in governance and strategy that come with it). 
My experiences working within a dbt Mesh architecture and all of the difficulties organizatio... [Read More](https://docs.getdbt.com/community/spotlight/jenna-jordan.md) #### Previously on the Spotlight [![Mikko Sulonen](/img/community/spotlight/Mikko-Sulonen.png?v=2)](https://docs.getdbt.com/community/spotlight/mikko-sulonen.md) #### [Mikko Sulonen](https://docs.getdbt.com/community/spotlight/mikko-sulonen.md) he/him I've been working with data since 2016. I first started with the on-prem SQL Server stack of SSIS, SSAS, SSRS. I did some QlikView and Qlik Sense, and some Power BI. Nowadays, I work mostly with Snowflake, Databricks, Azure, and dbt, of course. While tools and languages have come and gone, SQL ... [Read More](https://docs.getdbt.com/community/spotlight/mikko-sulonen.md) [![Radovan Bacovic](/img/community/spotlight/Radovan-Bacovic.png?v=2)](https://docs.getdbt.com/community/spotlight/radovan-bacovic.md) #### [Radovan Bacovic](https://docs.getdbt.com/community/spotlight/radovan-bacovic.md) he/him My professional journey and friendship with data started 20 years ago. I've experienced many tools and modalities: from good old RDBMS systems and various programming languages like Java and C# in the early days of my career, through private clouds, to MPP databases and multitier architecture. I a... [Read More](https://docs.getdbt.com/community/spotlight/radovan-bacovic.md) [![Johann de Wet](/img/community/spotlight/johann-dewett.jpg?v=2)](https://docs.getdbt.com/community/spotlight/johann-de-wet.md) #### [Johann de Wet](https://docs.getdbt.com/community/spotlight/johann-de-wet.md) he/him I'm forever indebted to my manager, John Pienaar, who introduced me to both dbt and its community when I joined his team as an Analytics Engineer at the start of 2022. I often joke about my career before dbt and after dbt. Our stack includes Fivetran, Segment, Airflow and BigQuery to name a few....
[Read More](https://docs.getdbt.com/community/spotlight/johann-de-wet.md) [![Tyler Rouze](/img/community/spotlight/tyler-rouze.jpg?v=2)](https://docs.getdbt.com/community/spotlight/tyler-rouze.md) #### [Tyler Rouze](https://docs.getdbt.com/community/spotlight/tyler-rouze.md) he/him My journey in data started all the way back in college where I studied Industrial Engineering. One of the core topics you learn in this program is mathematical optimization, where we often use data files as inputs to model constraints on these kinds of problems! Since then, I've been a data analy... [Read More](https://docs.getdbt.com/community/spotlight/tyler-rouze.md) [![Juan Manuel Perafan](/img/community/spotlight/juan-manuel-perafan.jpg?v=2)](https://docs.getdbt.com/community/spotlight/juan-manuel-perafan.md) #### [Juan Manuel Perafan](https://docs.getdbt.com/community/spotlight/juan-manuel-perafan.md) he/him Born and raised in Colombia! Living in the Netherlands since 2011. I've been working in the realm of analytics since 2017, focusing on Analytics Engineering, dbt, SQL, data governance, and business intelligence (BI). Besides consultancy work, I am very active in the data community. I co-authore... [Read More](https://docs.getdbt.com/community/spotlight/juan-manuel-perafan.md) [![Mariah Rogers](/img/community/spotlight/mariah-rogers.jpg?v=2)](https://docs.getdbt.com/community/spotlight/mariah-rogers.md) #### [Mariah Rogers](https://docs.getdbt.com/community/spotlight/mariah-rogers.md) she/her I got my start in the data world helping create a new major and minor in Data Science at my alma mater. I then became a data engineer, learned a ton, and propelled myself into the clean energy sector. Now I do data things at a clean energy company and geek out on solar energy at work and at home!... 
[Read More](https://docs.getdbt.com/community/spotlight/mariah-rogers.md) [![Yasuhisa Yoshida](/img/community/spotlight/yasuhisa-yoshida.jpg?v=2)](https://docs.getdbt.com/community/spotlight/yasuhisa-yoshida.md) #### [Yasuhisa Yoshida](https://docs.getdbt.com/community/spotlight/yasuhisa-yoshida.md) he/him I currently work as a data engineer at a startup called [10X.](https://10x.co.jp/) Specifically, I work with BigQuery to provide data marts for business users. Before using dbt, the queries for creating data marts were overly complex and lengthy, resulting in low data quality. With dbt, we have improved our proces... [Read More](https://docs.getdbt.com/community/spotlight/yasuhisa-yoshida.md) [![Safiyy Momen](/img/community/spotlight/safiyy-momen.jpg?v=2)](https://docs.getdbt.com/community/spotlight/safiyy-momen.md) #### [Safiyy Momen](https://docs.getdbt.com/community/spotlight/safiyy-momen.md) he/him I've been in the dbt community for ~4 years now. My experience is primarily in leading data teams, previously at a healthcare startup where I migrated the stack. The dbt Community was invaluable during that time. More recently I've built a product, Aero, that helps Snowflake users optimize costs ... [Read More](https://docs.getdbt.com/community/spotlight/safiyy-momen.md) Community Award Recipient 2023 [![Josh Devlin](/img/community/spotlight/josh-devlin.jpg?v=2)](https://docs.getdbt.com/community/spotlight/josh-devlin.md) #### [Josh Devlin](https://docs.getdbt.com/community/spotlight/josh-devlin.md) he/him Josh Devlin has a rich history of community involvement and technical expertise in both the dbt and wider analytics communities. Discovering dbt in early 2020, he quickly became an integral member of its [community](https://www.getdbt.com/community/join-the-community), leveraging the platform as a learning tool and aiding others along their dbt journe... 
[Read More](https://docs.getdbt.com/community/spotlight/josh-devlin.md) Community Award Recipient 2023 [![Sydney Burns](/img/community/spotlight/sydney.jpg?v=2)](https://docs.getdbt.com/community/spotlight/sydney-burns.md) #### [Sydney Burns](https://docs.getdbt.com/community/spotlight/sydney-burns.md) she/her In 2019, I started as an analytics intern at a healthcare tech startup. I learned about dbt in 2020 and [joined the community](https://www.getdbt.com/community/join-the-community/) to self-teach. The following year, I started using dbt professionally as a consultant, and was able to pick up various parts of the stack and dive into different implementations. That experience empowered me to strike a better balance between "best practices" and what suits a specific team best. I also [spoke at Coalesce 2022](https://coalesce.getdbt.com/blog/babies-and-bathwater-is-kimball-still-relevant), a highlight of my career! Now, I collaborate with other data professionals at Webflow, where I'm focused on enhancing and scaling our data operations. I strive to share the same enthusias... [Read More](https://docs.getdbt.com/community/spotlight/sydney-burns.md) Community Award Recipient 2023 [![Dakota Kelley](/img/community/spotlight/dakota.jpg?v=2)](https://docs.getdbt.com/community/spotlight/dakota-kelley.md) #### [Dakota Kelley](https://docs.getdbt.com/community/spotlight/dakota-kelley.md) he/him For the last ~2 years I've worked at phData. Before that I spent 8 years working as a Software Developer in the public sector. Currently I'm a Solution Architect, helping our customers and clients implement dbt on Snowflake, working across multiple cloud providers. I first started reading about ...
[Read More](https://docs.getdbt.com/community/spotlight/dakota-kelley.md) Community Award Recipient 2023 [![Alison Stanton](/img/community/spotlight/alison.jpg?v=2)](https://docs.getdbt.com/community/spotlight/alison-stanton.md) #### [Alison Stanton](https://docs.getdbt.com/community/spotlight/alison-stanton.md) she/her I started programming 20+ years ago. I moved from web applications into transforming data and business intelligence reporting because it's both hard and useful. The majority of my career has been in engineering for SaaS companies. For my last few positions I've been brought in to transition large... [Read More](https://docs.getdbt.com/community/spotlight/alison-stanton.md) Community Award Recipient 2023 [![Karen Hsieh](/img/community/spotlight/karen-hsieh.jpg?v=2)](https://docs.getdbt.com/community/spotlight/karen-hsieh.md) #### [Karen Hsieh](https://docs.getdbt.com/community/spotlight/karen-hsieh.md) she/her I’m a Product Manager who builds company-wide data literacy and empowers the product team to create values for people and grow the company. Utilizing dbt, I replaced time-consuming spreadsheets by creating key business metric dashboards that improved data literacy, enabling conversations about p... [Read More](https://docs.getdbt.com/community/spotlight/karen-hsieh.md) Community Award Recipient 2023 [![Sam Debruyn](/img/community/spotlight/sam.jpg?v=2)](https://docs.getdbt.com/community/spotlight/sam-debruyn.md) #### [Sam Debruyn](https://docs.getdbt.com/community/spotlight/sam-debruyn.md) he/him I have a background of about 10 years in software engineering and moved to data engineering in 2020. Today, I lead dataroots's data & cloud unit on a technical level, allowing me to share knowledge and help multiple teams and customers, while still being hands-on every day. In 2021 and 2022, I di... 
[Read More](https://docs.getdbt.com/community/spotlight/sam-debruyn.md) Community Award Recipient 2023 [![Oliver Cramer](/img/community/spotlight/oliver.jpg?v=2)](https://docs.getdbt.com/community/spotlight/oliver-cramer.md) #### [Oliver Cramer](https://docs.getdbt.com/community/spotlight/oliver-cramer.md) he/him When I joined Aquila Capital in early 2022, I had the Modern Data Stack with SqlDBM, dbt & Snowflake available. During the first half year I joined the dbt community. I have been working in the business intelligence field for many years. In 2006 I founded the first TDWI Roundtable in the DACH region... [Read More](https://docs.getdbt.com/community/spotlight/oliver-cramer.md) Community Award Recipient 2023 [![Stacy Lo](/img/community/spotlight/stacy.jpg?v=2)](https://docs.getdbt.com/community/spotlight/stacy-lo.md) #### [Stacy Lo](https://docs.getdbt.com/community/spotlight/stacy-lo.md) she/her I began my career as a data analyst, then transitioned to a few different roles in data and software development. Analytics Engineer is the best title to describe my expertise in data. I’ve been in the dbt Community for almost a year. In April, I shared my experience adopting dbt at the [Taipei dbt Meetup](https://www.meetup.com/taipei-dbt-meetup/),... [Read More](https://docs.getdbt.com/community/spotlight/stacy-lo.md) [![Jing Yu Lim](/img/community/spotlight/jing-lim.jpg?v=2)](https://docs.getdbt.com/community/spotlight/jing-yu-lim.md) #### [Jing Yu Lim](https://docs.getdbt.com/community/spotlight/jing-yu-lim.md) she/her For ~3 years, I was a Product Analyst at Grab, a ride-hailing and food delivery app in Southeast Asia, before taking on an Analytics Engineering role in Spenmo, a B2B Fintech startup. I joined a tech company as an analyst in June 2023, but was recently impacted by a layoff. I'm also one of the co...
[Read More](https://docs.getdbt.com/community/spotlight/jing-yu-lim.md) [![Alan Cruickshank](/img/community/spotlight/alan-cruickshank.jpg?v=2)](https://docs.getdbt.com/community/spotlight/alan-cruickshank.md) #### [Alan Cruickshank](https://docs.getdbt.com/community/spotlight/alan-cruickshank.md) he/him I've been around in the dbt community, especially the [London dbt Meetup](https://www.meetup.com/london-dbt-meetup/ "London dbt Meetup"), since early 2019—around the time that we started using dbt at tails.com. My background is the startup/scaleup space and building data teams in a context where there is a lot of growth going on but there isn't a lot of money around to support that. That's a topic that I've written and spoken about on several occasions on podcasts, blogposts and even at Coalesce [2020](https://www.getdbt.com/coalesce-2020/presenting-sqlfluff/ "2020") and [2021](https://www.getdbt.com/coalesce-2021/this-is-just-the-beginning/ "2021")! Aside from my work at tails.com, my other main focus at the moment is [SQLFluff](https://sqlfluff.com/ "SQLFluff"), the open source SQL linter which I started developing as p... [Read More](https://docs.getdbt.com/community/spotlight/alan-cruickshank.md) [![Faith Lierheimer](/img/community/spotlight/faith-lierheimer.jpg?v=2)](https://docs.getdbt.com/community/spotlight/faith-lierheimer.md) #### [Faith Lierheimer](https://docs.getdbt.com/community/spotlight/faith-lierheimer.md) she/her I've been a dbt Community member for around a year and a half. I come to the data world from teaching and academic research. Working in data fuses the aspects of those careers that I like the most, which are technical problem solving, and helping non-technical audiences understand data and what t... 
[Read More](https://docs.getdbt.com/community/spotlight/faith-lierheimer.md) [![Owen Prough](/img/community/spotlight/owen-prough.jpg?v=2)](https://docs.getdbt.com/community/spotlight/owen-prough.md) #### [Owen Prough](https://docs.getdbt.com/community/spotlight/owen-prough.md) he/him Well met, data adventurer! My professional data history is mostly USA healthcare-related (shout out to ANSI X12 claim files) while working with large (10k+ employee) software companies and small (but growing!) startups. My constant companion for the last decade has been SQL of various flavors [https://xkcd.com/927/...](https://xkcd.com/927/) [Read More](https://docs.getdbt.com/community/spotlight/owen-prough.md) [![Shinya Takimoto](/img/community/spotlight/shinya-takimoto.jpg?v=2)](https://docs.getdbt.com/community/spotlight/shinya-takimoto.md) #### [Shinya Takimoto](https://docs.getdbt.com/community/spotlight/shinya-takimoto.md) he/him I have about 3 years of dbt experience. I used to be in a large organization where the challenge was to create a quality analysis infrastructure for EC data managed by my department with a limited number of staff. It was then that I learned about dbt and I still remember the shock I felt when I r... [Read More](https://docs.getdbt.com/community/spotlight/shinya-takimoto.md) [![Anya Prosvetova](/img/community/spotlight/anya-prosvetova.jpg?v=2)](https://docs.getdbt.com/community/spotlight/anya-prosvetova.md) #### [Anya Prosvetova](https://docs.getdbt.com/community/spotlight/anya-prosvetova.md) she/her I’m a Data Engineer with a background in SaaS, consulting, financial services and the creative industries. I help organisations convert data into value, developing data pipelines and automating processes. I’m also a Tableau Visionary and DataDev Ambassador, and one of the organisers of Data + Wom... 
[Read More](https://docs.getdbt.com/community/spotlight/anya-prosvetova.md) [![David Effiong](/img/community/spotlight/david-effiong.jpg?v=2)](https://docs.getdbt.com/community/spotlight/david-effiong.md) #### [David Effiong](https://docs.getdbt.com/community/spotlight/david-effiong.md) he/him I started my career as a data analyst but I currently work as a data engineer in a financial Institution. I have experience working in both large organisations and startups. I have been in the dbt community for about 1 year and 6 months. I found out about dbt while working at a startup where I im... [Read More](https://docs.getdbt.com/community/spotlight/david-effiong.md) [![Emily Riederer](/img/community/spotlight/emily-riederer.jpg?v=2)](https://docs.getdbt.com/community/spotlight/emily-riederer.md) #### [Emily Riederer](https://docs.getdbt.com/community/spotlight/emily-riederer.md) she/her I'm a long-time dbt user and have been an active community member for a few years. Professionally, I've led a variety of data teams at Capital One spanning analytics, modeling, innersource data tools, and data infrastructure. The common denominator of all of these roles has been the overwhelmin... [Read More](https://docs.getdbt.com/community/spotlight/emily-riederer.md) --- ### Contributor License Agreements #### Why we have a CLA[​](#why-we-have-a-cla "Direct link to Why we have a CLA") As the sponsor of dbt, dbt Labs would like to ensure the long-term viability of dbt and its community. The Contributor License Agreement helps ensure everyone can enjoy dbt with confidence that dbt is here to stay.
Specifically, our Contributor License Agreements (CLAs) grant the contributor and dbt Labs joint copyright interest in contributed code. Further, they provide assurance from the contributor that contributions are original work that does not violate any third-party license agreement. The agreement between contributors and the project is explicit, so dbt users can be confident in the legal status of the source code and their right to use it. #### Our CLAs[​](#our-clas "Direct link to Our CLAs") For all code contributions to dbt, we ask that contributors complete and sign a Contributor License Agreement. We have two different CLAs, depending on whether you are contributing to dbt in a personal or professional capacity: * [Individual Contributor License Agreement v1.0](https://docs.google.com/forms/d/e/1FAIpQLScfOV7K4enYRHozrDRP6BBIXjOij-JDGca6WBTHyP_ANXSqlg/viewform?usp=sf_link) * [Software Grant and Corporate Contributor License Agreement v1.0](https://docs.google.com/forms/d/e/1FAIpQLScDSTwGIlVyGWCMMvmszaXSE5IhIIRyeLQkgWf1-CSC2RnLww/viewform?usp=sf_link) * [Licenses FAQ](http://www.getdbt.com/licenses-faq) #### For Lawyers[​](#for-lawyers "Direct link to For Lawyers") Our individual and corporate CLAs are based on the Contributor License Agreements published by the [Apache Software Foundation](http://www.apache.org/), with modifications: * [Individual Contributor License Agreement ("Agreement") V2.0](http://www.apache.org/licenses/icla.txt) * [Software Grant and Corporate Contributor License Agreement ("Agreement") v r190612](http://www.apache.org/licenses/cla-corporate.txt) * [Licenses FAQ](http://www.getdbt.com/licenses-faq) If you have questions about these CLAs, please contact us at .
--- ### Dakota Kelley Community Award Recipient 2023 ![Dakota Kelley](/img/community/spotlight/dakota.jpg?v=2) he/him Solution Architect, phData Location: Edmond, USA [LinkedIn](https://www.linkedin.com/in/dakota-kelley/ "LinkedIn") #### About For the last ~2 years I've worked at phData. Before that I spent 8 years working as a Software Developer in the public sector. Currently I'm a Solution Architect, helping our customers and clients implement dbt on Snowflake, working across multiple cloud providers. I first started reading about dbt when I was in grad school about 3 years ago. When I began with phData I had a fantastic opportunity to work with dbt. From there I fell in love with the engineering practices and structure that I always felt were missing from data work. Since then, I've been fortunate enough to speak at Coalesce 2022 and at [Coalesce 2023](https://www.youtube.com/watch?v=414-URZnZVY). On top of this, I've written numerous blogs about dbt as well. #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I [joined the dbt Community](https://www.getdbt.com/community/join-the-community/) not too long after my first working experience. 
One of my passions is giving back and helping others, and being a part of the community allows me to help others with problems I've tackled before. Along the way it helps me learn new ways and see different methods to solve a wide variety of problems. Every time I interact with the community I learn something new, and that energizes me. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") This is a tough one. I know there are several, but the main qualities I resonate with are from those who dig in and help each other. There are always nuances to others' situations, and it's good to dig in together, understand those, and seek a solution. The other quality I look for is someone who is trying to pull others up with them. At the end of the day we should all be striving to make all things better than they were when we arrived, regardless of whether that's the dbt Community or the local park we visit for rest and relaxation. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") The thing I hope others take away from me is to genuinely support others and tackle problems with curiosity. There used to be a time when I was always worried about being wrong, so I wouldn't get too involved. It's okay to be wrong; that's how we learn new ways to handle problems and find new ways to grow. We just all have to be open to learning and trying our best to help and support each other. 
--- ### David Effiong ![David Effiong](/img/community/spotlight/david-effiong.jpg?v=2) he/him Data Engineer, Sterling Bank PLC Location: Lagos, Nigeria Organizations: Young Data Professionals [Twitter](https://www.twitter.com/@david_uforo "Twitter") | [LinkedIn](https://www.linkedin.com/in/david-effiong "LinkedIn") | [YouTube](https://www.youtube.com/@daviddata "YouTube") #### About I started my career as a data analyst but I currently work as a data engineer in a financial institution. I have experience working in both large organisations and startups. I have been in the dbt community for about 1 year and 6 months. I found out about dbt while working at a startup where I implemented a modern data stack using BigQuery, Airbyte, Metabase, and dbt. Currently my stack in my large organisation includes Azure tools + dbt. (😁 Of course I had to use dbt!) I have a YouTube channel where I share learnings about data and productivity. The name of my channel is David Data; please check it out. I spoke at the first in-person Lagos dbt meetup and I am currently an organiser of the Lagos dbt meetup. #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I joined the dbt community in late 2021 when I joined the startup. I was a data team of one with little experience in the domain, and the dbt community was and has remained impactful to my career. With the help of the community I was able to build a data stack as a team of one because there was always support to answer questions I posted in the community. 
The community is so rich with value from conversations that you can read through threads and learn best practices or diverse approaches to problem solving. The dbt community has also been of great help to me in my current organisation in implementing dbt as part of the stack for data quality assurance purposes. The community is open to supporting anyone regardless of nationality or skill level, and I am happy and grateful to be a part of this community. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") I identify with Opeyemi Fabiyi as a community leader. Opeyemi introduced me to dbt as a tool and as a community. Based on his belief in the power of communities, he went on to start Young Data Professionals, pioneered dbt meetups in Lagos, Nigeria, and also spoke at Coalesce 2022. I am looking to grow my leadership in the community by interacting more in community conversations, organizing more dbt meetups this year, and also by continuing to share my dbt learning videos on my YouTube Channel. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") I have learned solutions to technical problems from community members. I have also learned empathy and patience from community members while interacting with others. I hope I can provide technical solutions to other community members and also do it with patience and empathy. 
I also hope others can learn to be more involved in the community because the community has only grown because of people, and as more people get involved, more impact is made. #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") Outside of work, I play the piano and sing in the choir. I also write a faith-based blog, [The Writings of David Uforo](https://daviduforo.wordpress.com/). You may want to check it out. --- ### dbt Community Code of Conduct This Code of Conduct applies to all dbt Community spaces, both online and offline. This includes Slack, Discourse, code repositories (dbt Core, the dbt Fusion engine, dbt packages, etc.), dbt Labs virtual events, and Meetups. Participants are responsible for knowing and abiding by this Code of Conduct. This Code of Conduct has two sections: * **dbt Community Values:** These values apply to all of our community spaces, and all of our guidelines are based on these values. * **Anti-harassment policy:** We are dedicated to providing a harassment-free experience for everyone in our community — here, we outline exactly what that means. We appreciate your support in continuing to build a community we’re all proud of. — The dbt Community Admin Team. #### dbt Community Values[​](#dbt-community-values "Direct link to dbt Community Values") ##### Create more value than you capture.[​](#create-more-value-than-you-capture "Direct link to Create more value than you capture.") Each community member should strive to create more value in the community than they capture. This is foundational to being a community. Ways to demonstrate this value: * [Coding contributions](https://docs.getdbt.com/community/contributing/contributing-coding.md): Contribute to dbt Core, a package, or an adapter. Beyond implementing new functionality, you can also open issues or participate in discussions. 
* [Writing contributions](https://docs.getdbt.com/community/contributing/contributing-writing.md): You can suggest edits to every page of the dbt documentation, or suggest a topic for the dbt Developer Blog. * [Join in online](https://docs.getdbt.com/community/contributing/contributing-online-community.md): Ask and answer questions on the Discourse forum, kick off a lively discussion in Slack, or even maintain a Slack channel of your own. * [Participate in events](https://docs.getdbt.com/community/contributing/contributing-realtime-events.md): Organise a community Meetup, speak at an event, or provide office space/sponsorship for an existing event. ##### Be you.[​](#be-you "Direct link to Be you.") Some developer communities allow and even encourage anonymity — we prefer it when people identify themselves clearly. It helps to build empathy and form relationships. Ways to demonstrate this value: * Update your profile on dbt Community platforms to include your name and a clear picture of yourself. Where available, use the "what I do" section to add your role, title and current company. * Join your `#local-` channel in Slack, or if it doesn't exist then propose a new one. * Write in your own voice, and offer your own advice, rather than speaking in your company’s marketing or support voice. ##### Encourage diversity and participation.[​](#encourage-diversity-and-participation "Direct link to Encourage diversity and participation.") People with different mindsets and experiences, working together, create better outcomes. This includes diversity of race and gender, as well as the diversity of academic and career backgrounds, socio-economic backgrounds, geographic backgrounds, ideologies, and interests. Ways to demonstrate this value: * Make everyone in our community feel welcome, regardless of their background, and do everything possible to encourage participation in our community. 
* Demonstrate empathy for a community member’s experience — not everyone comes from the same career background, so adjust answers accordingly. * If you are sourcing speakers for events, put in additional effort to find speakers from underrepresented groups. ##### Be curious.[​](#be-curious "Direct link to Be curious.") Always ask yourself "why?" and strive to be continually learning. Ways to demonstrate this value: * Try solving a problem yourself before asking for help, e.g. rather than asking "what happens when I do X", experiment and observe the results! * When asking questions, explain the "why" behind your decisions, e.g. "I’m trying to solve X problem, by writing Y code. I’m getting Z problem" * When helping someone else, explain why you chose that solution, or if no solution exists, elaborate on the reason for that, e.g. "That’s not possible in dbt today — but here’s a workaround / check out this GitHub issue for a relevant discussion" #### Anti-harassment policy[​](#anti-harassment-policy "Direct link to Anti-harassment policy") We are dedicated to providing a harassment-free experience for everyone. We do not tolerate harassment of participants in any form. Harassment includes: * Offensive comments related to gender, gender identity and expression, sexual orientation, disability, mental illness, neuro(a)typicality, physical appearance, body size, age, race, or religion. * Unwelcome comments regarding a person’s lifestyle choices and practices, including those related to food, health, parenting, drugs, and employment. * Deliberate misgendering or use of ‘dead’ or rejected names. * Gratuitous or off-topic sexual images or behaviour in spaces where they’re not appropriate. * Physical contact and simulated physical contact (e.g., textual descriptions like "*hug*" or "*backrub*") without consent or after a request to stop. * Threats of violence. 
* Incitement of violence towards any individual, including encouraging a person to commit suicide or to engage in self-harm. * Deliberate intimidation. * Stalking or following. * Harassing photography or recording, including logging online activity for harassment purposes. * Sustained disruption of discussion. * Unwelcome sexual attention. * Pattern of inappropriate social contact, such as requesting/assuming inappropriate levels of intimacy with others. * Continued one-on-one communication after requests to cease. * Deliberate "outing" of any aspect of a person’s identity without their consent except as necessary to protect vulnerable people from intentional abuse. * Publication of non-harassing private communication. Be mindful that others may not want their image or name on social media. Ask permission prior to posting about another person at in-person events. The dbt Community prioritizes marginalized people’s safety over privileged people’s comfort. The dbt Community Admin team reserves the right not to act on complaints regarding: * ‘Reverse’ -isms, including ‘reverse racism,’ ‘reverse sexism,’ and ‘cisphobia’ * Reasonable communication of boundaries, such as "leave me alone," "go away," or "I’m not discussing this with you." * Communicating in a ‘tone’ you don’t find congenial * Criticizing racist, sexist, cissexist, or otherwise oppressive behavior or assumptions ##### Reporting harassment[​](#reporting-harassment "Direct link to Reporting harassment") If you are being harassed by a member of the dbt Community, notice that someone else is being harassed, or have any other concerns, please contact us at or use the workflows in [#moderation-and-administration](https://getdbt.slack.com/archives/C02JJ8N822H) on Slack. We will respect confidentiality requests for the purpose of protecting victims of abuse. 
At our discretion, we may publicly name a person about whom we’ve received harassment complaints, or privately warn third parties about them, if we believe that doing so will increase the safety of dbt community members or the general public. We will not name harassment victims without their affirmative consent. ##### Consequences[​](#consequences "Direct link to Consequences") Participants asked to stop any harassing behavior are expected to comply immediately. If a participant engages in harassing behavior, the dbt Community Admin team may take any action they deem appropriate, up to and including expulsion from all dbt Community spaces and identification of the participant as a harasser to other dbt Community members or the general public. #### dbt Summit and other events[​](#dbt-summit-and-other-events "Direct link to dbt Summit and other events") We reserve the right to prohibit certain directly competitive companies from attending and/or sponsoring dbt Summit and other dbt Labs-hosted events. While we actively support the broader ecosystem around dbt, it is not the right business decision for us to allow directly competitive companies to market to or sell to our users and customers at events that we invest very significant company resources into hosting. Any event declines will be handled directly with the individuals/companies in question, and full refunds will be issued. Events that are hosted by other members of the dbt community, such as most dbt Community Meetups, are free to make their own guidelines about attendance. #### Credits[​](#credits "Direct link to Credits") Credit to [01.org](https://01.org/community/participation-guidelines), [Tizen.org](https://www.tizen.org/community/guidelines), and [Geek Feminism](https://geekfeminism.wikia.org/wiki/Community_anti-harassment/Policy) for some of the wording used in this Code of Conduct. 
--- ### dbt Community Rules of the Road As of June 2023, the dbt Community includes over 50,000 data professionals and is still growing. People genuinely love this community. It's filled with smart, kind, and helpful people who share our commitment to elevating the analytics profession. We are committed to maintaining the spirit of this community, and have written these rules alongside its members to help everyone understand how to best participate. We appreciate your support in continuing to build a community we're all proud of. #### Expectations for all members[​](#expectations-for-all-members "Direct link to Expectations for all members") ##### Rule 1: Be respectful[​](#rule-1-be-respectful "Direct link to Rule 1: Be respectful") We want everyone in this community to have a fulfilling and positive experience. Therefore, this first rule is serious and straightforward; we simply will not tolerate disrespectful behavior of any kind. Everyone interacting on a dbt platform – including Slack, the forum, codebase, issue trackers, and mailing lists – is expected to follow the [Community Code of Conduct](https://docs.getdbt.com/community/resources/code-of-conduct.md). If you are unable to abide by the code of conduct set forth here, we encourage you not to participate in the community. ##### Rule 2: Keep it in public spaces[​](#rule-2-keep-it-in-public-spaces "Direct link to Rule 2: Keep it in public spaces") Unless you have someone's express permission to contact them directly, do not directly message other community members, whether on a dbt Community platform or other spaces like LinkedIn. 
We highly value the time community members put into helping each other, and we have precisely zero tolerance for people who abuse their access to experienced professionals. If you are being directly messaged with requests for assistance without your consent, let us know in the [#moderation-and-administration](https://getdbt.slack.com/archives/C02JJ8N822H) Slack channel. We will remove that person from the community. Your time and attention are valuable. ##### Rule 3: Follow messaging etiquette[​](#rule-3-follow-messaging-etiquette "Direct link to Rule 3: Follow messaging etiquette") In short: put effort into your question, use threads, post in the right channel, and do not seek extra attention by tagging individuals or double-posting. For more information, see our [guide on getting help](https://docs.getdbt.com/community/resources/getting-help.md). ##### Rule 4: Do not solicit community members[​](#rule-4-do-not-solicit-community-members "Direct link to Rule 4: Do not solicit community members") This community is built for data practitioners to discuss the work that they do, the ideas that they have, and the things that they are learning. It is decidedly not intended to be lead generation for vendors or recruiters. Vendors and recruiters are subject to additional rules to ensure this space remains welcoming to everyone. These requirements are detailed below and are enforced vigorously. #### Vendor expectations[​](#vendor-expectations "Direct link to Vendor expectations") As a vendor/dbt partner, you are also a member of this community, and we encourage you to participate fully in the space. We have seen folks grow fantastic user relationships for their products when they come in with the mindset to share rather than pushing a pitch. At the same time, active community members have a finely honed sense of when they are being reduced to an audience or a resource to be monetized, and their response is reliably negative. Who is a vendor? 
Vendors are generally individuals belonging to companies that are creating products or services primarily targeted at data professionals, but this title also includes recruiters, investors, open source maintainers (with or without a paid offering), consultants and freelancers. If in doubt, err on the side of caution. ##### Rule 1: Identify yourself[​](#rule-1-identify-yourself "Direct link to Rule 1: Identify yourself") Include your company in your display name, e.g. "Alice (DataCo)". When joining a discussion about your product (after the waiting period below), be sure to note your business interests. ##### Rule 2: Let others speak first[​](#rule-2-let-others-speak-first "Direct link to Rule 2: Let others speak first") If a community member asks a question about your product directly, or mentions that they have a problem that your product could help with, wait 1 business day before responding to allow other members to share their experiences and recommendations. (This doesn't apply to unambiguously support-style questions from existing users, or in your `#tools-` channel if you have one). ##### Rule 3: Keep promotional content to specified spaces[​](#rule-3-keep-promotional-content-to-specified-spaces "Direct link to Rule 3: Keep promotional content to specified spaces") As a space for professional practice, the dbt Community is primarily a non-commercial space. 
However, as a service to community members who want to be able to keep up to date with the data industry, there are several areas available on the Community Slack for vendors to share promotional material: * [#vendor-content](https://getdbt.slack.com/archives/C03B0Q4EBL3) * [#events](https://getdbt.slack.com/archives/C80RCAZ5E) * \#tools-\* (post in [#moderation-and-administration](https://getdbt.slack.com/archives/C02JJ8N822H) to request a channel for your tool/product) Recruiters may also post in [#jobs](https://getdbt.slack.com/archives/C7A7BARGT)/[#jobs-eu](https://getdbt.slack.com/archives/C04JMHHK6CD) but may not solicit applications in DMs. The definition of "vendor content" can be blurry at the edges, and we defer to members' instincts in these scenarios. As a rule, if something is hosted on a site controlled by that company or its employees (including platforms like Substack and Medium), or contains a CTA such as signing up for a mailing list or trial account, it will likely be considered promotional. ##### One more tip: Be yourself[​](#one-more-tip-be-yourself "Direct link to One more tip: Be yourself") Speak in your own voice, and join in any or all of the conversations that interest you. Share your expertise as a data professional. Make a meme if you're so inclined. Get in a (friendly) debate. You are not limited to only your company's products and services, and making yourself known as a familiar face outside of commercial contexts is one of the most effective ways of building trust with the community. Put another way, [create more value than you capture](https://docs.getdbt.com/community/resources/code-of-conduct.md#create-more-value-than-you-capture). Because unaffiliated community members are able to share links in any channel, the most effective way to have your work reach a wider audience is to create things that are genuinely useful to the community. 
#### Handling violations[​](#handling-violations "Direct link to Handling violations") The point of these rules is not to find opportunities to punish people, but to ensure the longevity of the community. Participation in this community is a privilege, and we reserve the right to remove people from it. To report an issue or appeal a judgement, email or use the workflows in [#moderation-and-administration](https://getdbt.slack.com/archives/C02JJ8N822H) on Slack. Violations related to our anti-harassment policy will result in immediate removal. Other issues are handled in proportion to their impact, and may include: * a friendly, but public, reminder that the behavior is inappropriate according to our guidelines. * a private message with a warning that any additional violations will result in removal from the community. * temporary or permanent suspension of your account. --- ### dbt Labs Community #jobs Channels Terms and Conditions I agree to abide by the [dbt Community Code of Conduct](https://docs.getdbt.com/community/resources/code-of-conduct.md) and all laws applicable to me in my use of the dbt Community's #jobs channels. I further agree: * dbt Labs is not responsible for, nor does it warrant or guarantee, the validity, accuracy, completeness, legality, or reliability of any functionality of any #jobs channel, any posting's content, or any application and/or solicitation of any kind of employment. * dbt Labs does not review and approve job-related content. 
* dbt Labs disclaims liability of any kind whatsoever for any type of damage that occurs while using the community Slack for job-related reasons, and I waive any type of claim (including actual, special or consequential damages) to the maximum extent permitted by law. * Without limitation, dbt Labs disclaims liability for quality, performance, merchantability, and fitness for a particular purpose, express or implied, that may arise out of my use of the community Slack for job-related content, my reliance on such information, and/or my provision/receipt of job-related information. * I understand that no internet-based site is without risk, and my use is at my own risk. * My use of any job-posting template (or other forum for providing job-related information) confirms my consent to provide the data posted, confirms that I have permission to post such data, and is subject to the terms of the [dbt Labs privacy policy](https://www.getdbt.com/cloud/privacy-policy). For further information, please contact . 
--- ### Emily Riederer ![Emily Riederer](/img/community/spotlight/emily-riederer.jpg?v=2) she/her Senior Manager - Data Science & Analytics, Capital One Location: Chicago, IL Organizations: rOpenSci Editorial Board [Twitter](https://twitter.com/emilyriederer "Twitter") | [LinkedIn](https://linkedin.com/in/emilyriederer "LinkedIn") | [Website](https://emilyriederer.com "Website") | [GitHub](https://github.com/emilyriederer "GitHub") | [Mastodon](https://mastodon.social/@emilyriederer "Mastodon") #### About I'm a long-time dbt user and have been an active community member for a few years. Professionally, I've led a variety of data teams at Capital One spanning analytics, modeling, innersource data tools, and data infrastructure. The common denominator of all of these roles has been the overwhelming importance of high-quality data processing pipelines. Outside of work, I enjoy doing pro bono projects and applying my same skillset to scrappier environments. My work with the dbt community is motivated by a passion for data quality and developer tooling. Some of my recent contributions include maintaining the `dbtplyr` package, speaking at Coalesce 2021, and [writing a dbt Developer Blog post](https://docs.getdbt.com/blog/grouping-data-tests "writing a dbt Developer Blog post") about my PR to the `dbt-utils` test suite. #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I have been involved in the dbt community for a few years now. 
While I enjoy being actively engaged, one of my favorite parts is simply "lurking" on the Slack channels. The data space is moving so fast right now with so many different competing frameworks, tools, and ideas. At the same time, data work tends to be less discussed and publicly shared than analysis methods (e.g. new modeling packages) due to data privacy and IP. I've found no better place to "drink from the firehose" and benefit from the insights of others' questions, challenges, and successes. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") Two community members that really inspire me are Claire Carroll and Joel Labes. I think both showcase excellence in technical best practices, crystal-clear communication of technical concepts in their prolific writing, and a passion for building community and creating on-ramps. That mix of so-called 'hard' and 'soft' skills adds so much to the community and helps empower every member to be their best. I'm always looking to balance the time I spend growing my skills along both dimensions. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") Given my passion for data quality and the design side of data, I particularly enjoy thinking about data modeling and learning from the community's experience with the variety of classical and novel frameworks for designing resilient, flexible datamarts. 
As a passionate fan of open-source (and the also-thriving #rstats community), I hope to inspire others to create more packages and PRs that expand the developer toolkit. I also particularly enjoy discussing my thoughts on data quality and avoiding data disasters. #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") My passion for open-source tools and open-knowledge extends beyond dbt. I also enjoy serving on the editorial board for rOpenSci to champion the creation of open-source research software, reviewing technical books for CRC Press, doing pro-bono data projects, and sharing my own learnings through conference talks and writing (including on my website, guest blogs, and books including [R Markdown Cookbook](https://bookdown.org/yihui/rmarkdown-cookbook/) and [97 Things Every Data Engineer Should Know](https://www.oreilly.com/library/view/97-things-every/9781492062400/)). --- ### Expectations for dbt contributors Whether it's `dbt-core`, `dbt-fusion`, adapters, packages, or this very documentation site, contributing to the open source or source-available code that supports the dbt ecosystem is a great way to share your knowledge, level yourself up as a developer, and to give back to the community. The goal of this page is to help you understand what to expect when contributing to dbt ecosystem projects. Have you seen things in other projects that you like, and think we could learn from? [Open a discussion on the dbt Community Forum](https://discourse.getdbt.com), or start a conversation in the [dbt Community Slack](https://www.getdbt.com/community/join-the-community) (for example: `#community-strategy`, `#dbt-core-development`, `#package-ecosystem`, `#adapter-ecosystem`). We always appreciate hearing from you! 
#### Principles[​](#principles "Direct link to Principles") ##### dbt is a team sport[​](#dbt-is-a-team-sport "Direct link to dbt is a team sport") We all build dbt together -- whether you write code or contribute your ideas. By using dbt, you're invested in the future of the tool, and have an active role in pushing forward the standard of analytics engineering. You already benefit from using code and documentation contributed by community members. Contributing to the dbt community is your way to be an active participant in the thing we're all creating together. There's a very practical reason, too: building in public prioritizes our collective knowledge and experience over any one person's. We don't have experience using every database, operating system, security environment, ... We rely on the community of users to hone our product capabilities and documentation to the wide variety of contexts in which it operates. In this way, dbt gets to be the handiwork of thousands, rather than a few dozen. ##### We take seriously our role as maintainers of a standard[​](#we-take-seriously-our-role-as-maintainers-of-a-standard "Direct link to We take seriously our role as maintainers of a standard") As a standard, dbt must be reliable and consistent. Our first priority is ensuring the continued high quality of existing dbt capabilities before we introduce net-new capabilities. We also believe dbt as a framework should be extensible enough to ["make the easy things easy, and the hard things possible"](https://en.wikipedia.org/wiki/Perl#Philosophy). To that end, we *don't* believe it's appropriate for dbt to have an out-of-the-box solution for every niche problem. Users have the flexibility to achieve many custom behaviors by defining their own macros, materializations, hooks, and more. We view it as our responsibility as maintainers to decide when something should be "possible" — via macros, packages, etc. — and when something should be "easy" — built into the dbt standard. 
So when will we say "yes" to new capabilities for dbt? The signals we look for include: * Upvotes on issues in our GitHub repos * Open source dbt packages trying to close a gap * Technical advancements in the ecosystem In the meantime — we'll do our best to respond to new issues with: * Clarity about whether the proposed feature falls into the intended scope of dbt's source-available components * Context (including links to related issues) * Alternatives and workarounds * When possible, pointers to code that would aid a community contributor ##### Initiative is everything[​](#initiative-is-everything "Direct link to Initiative is everything") Given that we, as maintainers, will not be able to resolve every bug or flesh out every feature request, we empower you, as a community member, to initiate a change. * If you open the bug report, it's more likely to be identified. * If you open the feature request, it's more likely to be discussed. * If you comment on the issue, engaging with ideas and relating it to your own experience, it's more likely to be prioritized. * If you open a PR to fix an identified bug, it's more likely to be fixed. * If you comment on an existing PR, to confirm it solves the concrete problem for your team in practice, it's more likely to be merged. Sometimes, this can feel like shouting into the void, especially if you aren't met with an immediate response. We promise that there are dozens (if not hundreds) of folks who will read your comment, including us as maintainers. It all adds up to a real difference. #### Practicalities[​](#practicalities "Direct link to Practicalities") ##### Discussions[​](#discussions "Direct link to Discussions") A discussion is best suited to propose a Big Idea, such as brand-new capability in the dbt Fusion engine or an adapter. Anyone can open a discussion, comment on an existing one, or reply in a thread. 
When you open a new discussion, you might be looking for validation from other members of the community — folks who identify with your problem statement, who like your proposed idea, and who may have their own ideas for how it could be improved. The most helpful comments propose nuances or desirable user experiences to be considered in design and refinement. Unlike an **issue**, there is no specific code change that would “resolve” a discussion. If, over the course of a discussion, we reach a consensus on specific elements of a proposed design, we can open new implementation issues that reference the discussion for context. Those issues will connect desired user outcomes to specific implementation details, acceptance testing, and remaining questions that need answering. ##### Issues[​](#issues "Direct link to Issues") An issue could be a bug you've identified while using the product or reading the documentation. It could also be a specific idea you've had for a narrow extension of existing functionality. ###### Best practices for issues[​](#best-practices-for-issues "Direct link to Best practices for issues") * Issues are **not** for support / troubleshooting / debugging help. Please see [dbt support](https://docs.getdbt.com/docs/dbt-support.md) for more details and suggestions on how to get help. * Always search existing issues first, to see if someone else had the same idea / found the same bug you did. * Many dbt repositories offer templates for creating issues, such as reporting a bug or requesting a new feature. If available, please select the relevant template and fill it out to the best of your ability. This information helps us (and others) understand your issue. ###### You've found an existing issue that interests you. What should you do?[​](#youve-found-an-existing-issue-that-interests-you-what-should-you-do "Direct link to You've found an existing issue that interests you. What should you do?") Comment on it! 
Explain that you've run into the same bug, or had a similar idea for a new feature. If the issue includes a detailed proposal for a change, say which parts of the proposal you find most compelling, and which parts give you pause. ###### You've opened a new issue. What can you expect to happen?[​](#youve-opened-a-new-issue-what-can-you-expect-to-happen "Direct link to You've opened a new issue. What can you expect to happen?") In our most critical repositories (such as `dbt-core` and `dbt-fusion`), our goal is to respond to new issues as soon as possible. This initial response will often be a short acknowledgement that the maintainers are aware of the issue, signalling our perception of its urgency. Depending on the nature of your issue, it might be well suited to an external contribution, from you or another community member. **What if you're opening an issue in a different repository?** We have engineering teams dedicated to active maintenance of [`dbt-core`](https://github.com/dbt-labs/dbt-core) and its component libraries ([`dbt-common`](https://github.com/dbt-labs/dbt-common) + [`dbt-adapters`](https://github.com/dbt-labs/dbt-adapters) (also includes the dbt Labs managed adapters)), as well as [`dbt-fusion`](https://github.com/dbt-labs/dbt-fusion) (the next-generation engine powering the dbt standard). We've open-sourced a number of other software projects over the years, and the majority of them do not have the same activity or maintenance guarantees. Check to see if other recent issues have responses, or when the last commit was added to the `main` branch. **You're not sure about the status of your issue.** If your issue is in an actively maintained repo and has a `triage` label attached, we're aware it's something that needs a response. 
If the issue has been triaged, but not prioritized, this could mean: * The intended scope or user experience of a proposed feature requires further refinement from a maintainer * We believe the required code change is too tricky for an external contributor We'll do our best to explain the open questions or complexity, and when / why we could foresee prioritizing it. **Automation that can help us:** In many repositories, we use a bot that marks issues as stale if they haven't had any activity for 180 days. This helps us keep our backlog organized and up-to-date. We encourage you to comment on older open issues that you're interested in, to keep them from being marked stale. You're also always welcome to comment on closed issues to say that you're still interested in the proposal. ###### Issue labels[​](#issue-labels "Direct link to Issue labels") In all likelihood, the maintainer who responds will also add a number of labels. Not all of these labels are used in every repository. In some cases, the right resolution to an open issue might be tangential to the codebase. The right path forward might be in another codebase (we'll transfer it), a documentation update, or a change that you can make yourself in user-space code. In other cases, the issue might describe functionality that the maintainers are unwilling or unable to incorporate into the main codebase. In these cases, a maintainer will close the issue (perhaps using a `wontfix` label) and explain why. Some of the most common labels are explained below:

| tag | description |
| --- | --- |
| `triage` | This is a new issue which has not yet been reviewed by a maintainer. This label is removed when a maintainer reviews and responds to the issue. |
| `bug` | This issue represents a defect or regression from the behavior that's documented. |
| `enhancement` | This issue represents a narrow extension of an existing capability. |
| `good_first_issue` | This issue does not require deep knowledge of the codebase to implement, and it is appropriate for a first-time contributor. |
| `help_wanted` | This issue is trickier than a "good first issue." The required changes are scattered across the codebase, or more difficult to test. The maintainers are happy to help an experienced community contributor; they aren't planning to prioritize this issue themselves. |
| `duplicate` | This issue is functionally identical to another open issue. The maintainers will close this issue and encourage community members to focus conversation on the other one. |
| `stale` | This is an old issue which has not recently been updated. In repositories with a lot of activity, stale issues will periodically be closed. |
| `wontfix` | This issue does not require a code change in the repository, or the maintainers are unwilling to merge a change which implements the proposed behavior. |

##### Pull requests[​](#pull-requests "Direct link to Pull requests") **Every PR should be associated with an issue.** Why? Before you spend a lot of time working on a contribution, we want to make sure that your proposal will be accepted. You should open an issue first, describing your desired outcome and outlining your planned change. If you've found an older issue that's already open, comment on it with an outline for your planned implementation *before* putting in the work to open a pull request. **PRs must include robust testing.** Comprehensive testing within pull requests is crucial for the stability of dbt. By prioritizing robust testing, we ensure the reliability of our codebase, minimize unforeseen issues, and safeguard against potential regressions.
**We cannot merge changes that risk breaking backward compatibility of existing documented behaviors.** We understand that creating thorough tests often requires significant effort, and your dedication to this process greatly contributes to the project's overall reliability. Thank you for your commitment to maintaining the integrity of our codebase and the experience of everyone using dbt! **PRs go through two review steps.** First, we aim to respond with feedback on whether we think the implementation is appropriate from a product & usability standpoint. At this point, we will close PRs that we believe fall outside the scope of dbt Core or the public components of the dbt Fusion engine, or which might lead to an inconsistent user experience. This is an important part of our role as maintainers; we're always open to hearing disagreement. If a PR passes this first review, we will queue it up for code review, at which point we aim to test it ourselves and provide thorough feedback. **We receive more PRs than we can thoroughly review, test, and merge.** Our teams have finite capacity, and our top priority is maintaining a well-scoped, high-quality framework for the tens of thousands of people who use it every week. To that end, we must prioritize overall stability and planned improvements over a long tail of niche potential features. For best results, say what in particular you'd like feedback on, and explain what it would mean to you, your team, and other community members to have the proposed change merged. Smaller PRs tackling well-scoped issues tend to be easier and faster to review.
Two examples of community-contributed PRs: * [(dbt-core#9347) Fix configuration of turning test warnings into failures](https://github.com/dbt-labs/dbt-core/pull/9347) * [(dbt-core#9863) Better error message when trying to select a disabled model](https://github.com/dbt-labs/dbt-core/pull/9863) **Automation that can help us:** Many repositories have a template for pull request descriptions, which will include a checklist that must be completed before the PR can be merged. You don't have to do all of these things before opening an initial PR, but leaving them incomplete will delay our review process. Those include: * **Tests, tests, tests.** When you open a PR, some tests and code checks will run. (For security reasons, some may need to be approved by a maintainer.) We will not merge any PRs with failing tests. If you're not sure why a test is failing, please say so, and we'll do our best to get to the bottom of it together. * **Contributor License Agreement** (CLA): This ensures that we can merge your code, without worrying about unexpected implications for the copyright or license of open source or source-available dbt software. For more details, read: ["Contributor License Agreements"](https://docs.getdbt.com/community/resources/contributor-license-agreements.md) * **Changelog:** In projects that include a number of changes in each release, we need a reliable way to signal what's been included. The mechanism for this will vary by repository, so keep an eye out for notes about how to update the changelog. ##### Inclusion in release versions[​](#inclusion-in-release-versions "Direct link to Inclusion in release versions") ###### dbt Core Both bug fixes and backwards-compatible new features will be included in the [next minor release of dbt Core](https://docs.getdbt.com/docs/dbt-versions/core.md#how-dbt-core-uses-semantic-versioning).
Fixes for regressions and net-new bugs that were present in the minor version's original release will be backported to versions with [active support](https://docs.getdbt.com/docs/dbt-versions/core.md). Other bug fixes may be backported when we have high confidence that they're narrowly scoped and won't cause unintended side effects. ###### dbt Fusion engine[​](#dbt-fusion-engine "Direct link to dbt Fusion engine") During the dbt Fusion engine's public beta process, new releases will be cut regularly. After the new engine reaches General Availability, we will update this document with a longer-term release strategy, although you can expect it to be similar to dbt Core's. --- ### Faith Lierheimer ![Faith Lierheimer](/img/community/spotlight/faith-lierheimer.jpg?v=2) she/her Data Analyst II, Parsyl Location: Denver, CO, USA Organizations: Data Angels [Twitter](https://twitter.com/FaithLierheimer "Twitter") | [LinkedIn](https://www.linkedin.com/in/faithlierheimer/ "LinkedIn") | [Substack](https://faithfacts.substack.com/ "Substack") | [Data Folks](https://data-folks.masto.host/@faithlierheimer "Data Folks") #### About I've been a dbt Community member for around a year and a half. I come to the data world from teaching and academic research.
Working in data fuses the aspects of those careers that I like the most, which are technical problem solving, and helping non-technical audiences understand data and what they can do with it. I have a dream stack with Databricks, dbt, and Looker. Professionally, I help shippers of perishable goods (everything from blueberries to childhood vaccinations) understand the risks their goods face in transit and how to mitigate them. This reduces food and medical waste worldwide. You can read more about these interests at [faithfacts.substack.com](https://faithfacts.substack.com/ "faithfacts.substack.com"). #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I joined the dbt community early in 2022 hoping to find technical help with dbt, and instead found a wide support network of career-minded data professionals. Being in the dbt community has helped me find my niche in the data world, and has helped me discover ways I can grow my career and technical acumen. Being in this community has been huge in easing my career transition from teaching into data. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") I primarily conceptualize leadership as raising the floor beneath everyone, rather than enabling a few to touch its vaulted ceiling. As I gain more experience, I'd be delighted to be a resource for fellow career changers and teachers in transition. And, I love to goof in #roast-my-graph in the dbt Slack.
[Come join](https://www.getdbt.com/community/join-the-community/?utm_medium=internal\&utm_source=docs\&utm_campaign=q3-2024_dbt-spotlight_aw\&utm_content=____\&utm_term=all___) that channel, it's a hoot and a holler. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") I've learned a lot from community members, but most notably and concretely, I've actually gotten excellent visualization advice in #roast-my-graph. I've taken graphs there several times where I felt stuck on the presentation and have learned a lot about effective vizzes from my peers there. As I continue to gain experience, I hope others can learn from me what a successful career change looks like. And, ultimately, to take the work seriously but to not take ourselves that seriously. #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") I have a black cat with one eye named Gus and my purpose is now to give him the best existence possible. --- ### Getting help #### Community help[​](#community-help "Direct link to Community help") dbt is powered by open source and source-available software, and has a generous community behind it. Asking questions well contributes to the community by building our collective body of knowledge. By following these steps, you'll be more likely to receive help from another community member. ##### 1. Try to solve your problem first before asking for help[​](#1-try-to-solve-your-problem-first-before-asking-for-help "Direct link to 1. 
Try to solve your problem first before asking for help") ###### Search the existing documentation[​](#search-the-existing-documentation "Direct link to Search the existing documentation") The docs site you're on is highly searchable, so make sure to explore it for the answer here as a first step. If you're new to dbt, try working through the [quickstart guide](https://docs.getdbt.com/guides.md) first to get a firm foundation on the essential concepts. ###### Try to debug the issue yourself[​](#try-to-debug-the-issue-yourself "Direct link to Try to debug the issue yourself") We have a handy guide on [debugging errors](https://docs.getdbt.com/guides/debug-errors.md) to help out! This guide also helps explain why errors occur, and which docs you might need to search for help. ###### Search for answers using your favorite search engine[​](#search-for-answers-using-your-favorite-search-engine "Direct link to Search for answers using your favorite search engine") We're committed to making more errors searchable, so it's worth checking if there's a solution already out there! Further, some errors related to installing dbt, the SQL in your models, or getting YAML right are not specific to dbt, so there may be other resources to check. ###### Experiment\![​](#experiment "Direct link to Experiment!") If the question you have is "What happens when I do `X`", try doing `X` and see what happens! Assuming you have a solid dev environment set up, making mistakes in development won't affect your end users. ##### 2. Take a few minutes to formulate your question well[​](#2-take-a-few-minutes-to-formulate-your-question-well "Direct link to 2. Take a few minutes to formulate your question well") Explaining the problems you are facing clearly will help others help you. ###### Include relevant details in your question[​](#include-relevant-details-in-your-question "Direct link to Include relevant details in your question") Include exactly what's going wrong!
When asking your question, you should: * Paste the error message or relevant code inside three backticks in your question, instead of sharing a screenshot * Include the version of dbt you're on (which you can check with `dbt --version`) * Let us know which warehouse you're using ###### Avoid generalizing your code[​](#avoid-generalizing-your-code "Direct link to Avoid generalizing your code") While we understand that you may wish to generalize your problem, or that you may have sensitive information you wish to anonymize, often replacing references in SQL can result in invalid code that creates an error different to the one you're hitting. This makes it harder for us to understand your problem. Wherever possible, share the exact code that you're trying to run. ###### Let us know what you've already tried[​](#let-us-know-what-youve-already-tried "Direct link to Let us know what you've already tried") In general, people are much more willing to help when they know you've already given something your best shot! ###### Share the context of the problem you're trying to solve[​](#share-the-context-of-the-problem-youre-trying-to-solve "Direct link to Share the context of the problem you're trying to solve") Sometimes you might hit a boundary of dbt because you're trying to use it in a way that doesn't align with the opinions we've built into dbt. By sharing the context of the problem you're trying to solve, we might be able to share insight into whether there's an alternative way to think about it. ###### Post a single message and use threads[​](#post-a-single-message-and-use-threads "Direct link to Post a single message and use threads") The dbt Slack's culture revolves around threads. When posting a message, try drafting it to yourself first to make sure you have included all the context. Include big code blocks in a thread to avoid overwhelming the channel. 
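Putting those suggestions together, a well-formed question might look like the sketch below. This is an invented example: the model names, version number, warehouse, error text, and channel context are placeholders, not output from a real project.

```markdown
Hi all! I'm stuck on an error and would appreciate a pointer. Details:

- **dbt version** (from `dbt --version`): 1.8.0
- **Warehouse:** Snowflake
- **What I ran:** `dbt run --select my_model`
- **The error** (pasted as text inside three backticks, not a screenshot):
  Compilation Error: model 'my_model' depends on a node named 'stg_orders'
  which was not found
- **What I've already tried:** confirmed `stg_orders.sql` exists in
  `models/staging/`, and re-ran after `dbt clean` and `dbt deps`
- **Context:** we're restructuring our staging layer, so if there's a better
  way to think about this, I'm all ears!
```

Drafting the message to yourself first, as suggested above, is an easy way to check that each of these details made it in before you post, and to move the big code blocks into a thread.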
###### Don't tag individuals to demand help[​](#dont-tag-individuals-to-demand-help "Direct link to Don't tag individuals to demand help") If someone feels inclined to answer your question, they will do so. We are a community of volunteers, and we're generally pretty responsive and helpful! If nobody has replied to your question, consider if you've asked a question that helps us understand your problem. If you require in-depth, ongoing assistance, we have a wonderful group of experienced dbt consultants in our ecosystem. You can find a full list [below](#receiving-dedicated-support). ##### 3. Choose the right medium for your question[​](#3-choose-the-right-medium-for-your-question "Direct link to 3. Choose the right medium for your question") We use a number of different mediums to share information: * If your question is roughly "I've hit this error and am stuck", please ask it on [the dbt Community Forum](https://discourse.getdbt.com). * If you think you've found a bug, please report it on the relevant GitHub repo (e.g. [dbt repo](https://github.com/dbt-labs/dbt), [dbt-utils repo](https://github.com/dbt-labs/dbt-utils)) * If you are looking for a more wide-ranging conversation (e.g. "What's the best approach to X?", "Why is Y done this way?"), join our [Slack community](https://getdbt.com/community). Channels are consistently named with prefixes to aid discoverability. #### Receiving dedicated support[​](#receiving-dedicated-support "Direct link to Receiving dedicated support") If you need dedicated support to build your dbt project, consider reaching out regarding [professional services](https://www.getdbt.com/contact/), or engaging one of our [consulting partners](https://www.getdbt.com/partner-directory). #### dbt Training[​](#dbt-training "Direct link to dbt Training") If you want to receive dbt training, check out our [dbt Learn](https://learn.getdbt.com/) program.
#### dbt support[​](#dbt-support "Direct link to dbt support") **Note:** If you are a **dbt user** and need help with one of the following issues, please reach out to us by clicking [**Create a support ticket**](https://docs.getdbt.com/docs/dbt-support.md#create-a-support-ticket) through the dbt navigation or [emailing support@getdbt.com](mailto:support@getdbt.com): * Account setup (e.g. connection issues, repo connections) * Billing * Bug reports related to the web interface As a rule of thumb, if you are using dbt, but your problem is related to code within your dbt project, then please follow the above process or check out the [FAQs](https://docs.getdbt.com/docs/faqs.md) rather than reaching out to support. Refer to [dbt support](https://docs.getdbt.com/docs/dbt-support.md) for more information. --- ### How to deliver a fantastic meetup talk **Speaking at a dbt meetup? Here are all the details you’ll need to know. If you’re speaking at another event, check out our additional tips at the end of the article.** #### Understanding dbt meetups[​](#understanding-dbt-meetups "Direct link to Understanding dbt meetups") dbt meetups are an opportunity for the dbt community to learn from each other. We’re typically on the lookout for talks that last for ~15 minutes, and we reserve an additional 5-10 minutes for Q\&A after your talk. We’re not *just* looking for talks that feature dbt — if your topic feels relevant to analytics engineers, we’d love to chat. In general, you can assume that around three quarters of the audience are dbt users.
When shaping your talk, consider whether there’s something in there that might be new to an experienced dbt user, and, on the other end of the scale, something that feels relevant to a data practitioner that isn’t yet a dbt user. If you feel that your talk idea requires in-depth knowledge of dbt, consider speaking on Office Hours instead. Similarly, if you’re interested in giving a more introductory talk about dbt, consider reaching out to a local data meetup to see if it’s the right fit. For topic inspiration, you can find videos of past dbt meetup presentations [here](https://www.youtube.com/playlist?list=PL0QYlrC86xQn-jxWmEqtQRbZoyjq_ffq5). If you want to present at a dbt meetup, let us know [here](https://docs.google.com/forms/d/e/1FAIpQLScU4c0UvXLsasc7uwFBrzt6YzuGiMzEH_EyFfXGnIYDmTBDfQ/viewform). If we haven’t met you before, we might book a call to say hi and help shape your topic! We’ll also book a meeting before the event for a dry-run of the presentation to give any additional feedback. #### Recognize when you’re ready to give a talk[​](#recognize-when-youre-ready-to-give-a-talk "Direct link to Recognize when you’re ready to give a talk") Below, we’ve listed four signs that you’re ready to give a talk (originally based on [this article](https://thinkgrowth.org/how-to-write-about-your-work-652441747f41) from our Head of Marketing, Janessa — read that too!). We’ve also included examples for each category — where possible these are dbt meetup talks, but some of them are also links to blog posts from members in our community. ##### You recently finished a high-impact project[​](#you-recently-finished-a-high-impact-project "Direct link to You recently finished a high-impact project") These are a great option for first-time speakers as they mix together both big-picture thinking and tactics.
For example: * "Improving data reliability" — Andrea Kopitz ([video](https://www.youtube.com/watch?v=M_cNspn2XsE), [slides](https://docs.google.com/presentation/d/1gHChax5aM3tqKkhepX7Mghmg0DTDbY5yoDBCfUR23lg/)) * "Predicting customer conversions using dbt + machine learning" — Kenny Ning ([video](https://www.youtube.com/watch?v=BF7HH8JDUS0), [slides](https://docs.google.com/presentation/d/1iqVjzxxRggMnRoI40ku88miDKw795djpKV_v4bbLpPE/)) * "Migrating 387 models from Redshift to Snowflake" — Sam Swift and Travis Dunlop ([video](https://www.youtube.com/watch?v=VhH614WVufM), [slides](https://docs.google.com/presentation/d/1wE8NSkFPLFKGQ8fvFUUKoZFVoUhws_FhFip-9mDhoPU/)) ##### You hit an inflection point in your career[​](#you-hit-an-inflection-point-in-your-career "Direct link to You hit an inflection point in your career") Have you recently changed something about your career that you think others can learn from? Started a new job, grown in your role? These topics might not mention dbt at all, but will be relevant to many people in the audience. For example: * “Getting hired as an analytics engineer: a candidate’s perspective” — Danielle Leong ([video](https://www.youtube.com/watch?v=6VCr30ZFxZ0)) * “One analyst's guide for going from good to great” — Jason Ganz ([blog post](https://blog.getdbt.com/one-analysts-guide-for-going-from-good-to-great/)) Other ideas: * You moved from a team of many to a team of one (or vice-versa), and want to share what each can learn from the other * You started to manage others and learned some things along the way ##### You’re digging deep into a topic[​](#youre-digging-deep-into-a-topic "Direct link to You’re digging deep into a topic") If you’ve spent many hours going deep on a topic, it could be a good idea to share what you’ve learned. 
For example: * “The farm-to-table testing framework” — Andrea Fabry ([blog post](https://blog.getdbt.com/data-testing-framework/)) * “How to create a career ladder” — Caitlin Moorman ([blog post](https://locallyoptimistic.com/post/career-ladders-part-2/)) ##### You have a strong opinion about something[​](#you-have-a-strong-opinion-about-something "Direct link to You have a strong opinion about something") Is there a “best practice” that you think is outdated? Want to convince others to see your point of view? In the data-space, we’ve seen this in topics like: * “Engineers shouldn’t write ETL” — Jeff Magnusson ([blog post](https://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/)) * “You probably don’t need a data dictionary” — Michael Kaminsky & Alexander Jia ([blog post](https://locallyoptimistic.com/post/data_dictionaries/)) * “Write better SQL: In defense of `group by 1`” — Claire Carroll ([blog post](https://blog.getdbt.com/write-better-sql-a-defense-of-group-by-1/)) #### Checkpoint: Is someone else well-placed to give this talk?[​](#checkpoint-is-someone-else-well-placed-to-give-this-talk "Direct link to Checkpoint: Is someone else well-placed to give this talk?") Once you have a topic idea, stop for a moment and consider whether someone else on your team might also be a great fit for delivering this talk. Individuals from underrepresented groups are far less likely to self-nominate to give a talk — sometimes a shoulder tap is the nudge that’s needed. #### Shaping your talk[​](#shaping-your-talk "Direct link to Shaping your talk") Now, it’s time to write! Rather than starting with a slide deck, open up a blank document (or use [our template](https://docs.google.com/document/d/16aog0VitdLSScgxSNKe36q1C92QmG2vjXmtXYcPAhfw/edit#)), and start writing some notes. This helps you clarify your thinking, and is a great way to get feedback early, rather than investing the time into creating slides that might later be reworked. 
Don’t get too hung up on a title at this stage — we’re happy to work with you on that later in the process.

#### The basic structure

Below, we’ve outlined a common structure used for meetup talks — if this is your first talk, it’s a great way to get started (in fact, even experienced speakers often use a structure like this). Use it as a starting point, rather than an exact formula!

1. What is the business problem? Relating to a business problem helps audience members understand why you undertook a project. For example:
   * The finance team didn’t trust our numbers
   * We were never sure what led to an increase in customer conversion
   * The data team couldn’t find a balance between ad hoc requests and roadmap work
   * Our tracking across mobile and web was completely inconsistent
2. How did this manifest? Include evidence that this is a genuine problem — this helps create buy-in from the audience. Slack screenshots, quotes, charts, and the like are all good here!
3. What tactics were used to solve the problem? Three feels like a good number here. Make sure to emphasize people and process solutions as well as technology solutions.
4. What was the impact on the business problem? Since you set out a problem to be solved, it’s worth revisiting it. It’s okay if you found that your project didn’t go as planned — there’s a valuable lesson in there. Again, including evidence of improvement is valuable.
5. What other things were learned, and/or what next steps are you taking? Summarize high-level lessons that others can take away, and potentially talk about what you’d do differently, or what you plan on doing next.

##### Why does this structure work?

This structure might seem formulaic, but we’ve seen it work a number of times.
In our opinion, this structure works because:

* **Your presentation has the structure of a story** — problem, journey, solution. Human beings love stories, so the flow feels natural and easy for your audience to follow.
* **It increases the target audience.** Sharing a few different tactics makes it more likely there will be something in your talk for different audience members. Compare that to narrowly scoping a talk on “[Writing packages when a source table may or may not exist](https://discourse.getdbt.com/t/writing-packages-when-a-source-table-may-or-may-not-exist/1487)” — it’s not going to feel relevant to most people in the room.
* **It covers both theory and application.** Too much theory and you’re giving a TED Talk; too much application and you’re just giving a product demo. The best meetup talks help people understand how you thought through a problem and why you made certain decisions, so they can apply your knowledge within their unique context.

#### Examples that follow this structure

Here are a few of our favorite talks mapped to the structure — trust us, it works!

##### Improving data reliability — Andrea Kopitz, Envoy

*[Video](https://www.youtube.com/watch?v=M_cNspn2XsE), [slides](https://docs.google.com/presentation/d/1gHChax5aM3tqKkhepX7Mghmg0DTDbY5yoDBCfUR23lg/).*

1. What is the business problem? Envoy’s financial data appeared inconsistent.
2. How did this manifest? Respondents to the team’s data survey said they no longer trusted the data.
3. What tactics were used to solve the problem?
   1. Determine responsibility
   2. Build more specific dbt tests
   3. Track progress
4. What was the impact on the business problem? In their next data survey, the satisfaction rating increased, and there was no mention of financial data accuracy.
5. What other things were learned, and/or what next steps are you taking? Lesson: send out a data survey to your company to inform your roadmap.

##### Predicting customer conversions with dbt + machine learning — Kenny Ning, Better.com

*[Video](https://www.youtube.com/watch?v=BF7HH8JDUS0), [slides](https://docs.google.com/presentation/d/1iqVjzxxRggMnRoI40ku88miDKw795djpKV_v4bbLpPE/).*

1. What is the business problem? No one knew why conversion rates for Better.com customers would improve or worsen, making it difficult to know the value of different parts of the business.
2. How did this manifest? Different parts of the business took responsibility when conversion improved; no one took responsibility when it worsened.
3. What tactics were used to solve the problem?
   1. Use a different approach to conversion rates — Kaplan-Meier conversion rates
   2. Sketch out an ideal ML solution and see if it theoretically solves the problem
   3. Build it! (ft. a demonstration of the solution)
4. What was the impact on the business problem? In the end — not as valuable as originally hoped (and that’s OK!). Editor’s note: [this article](https://better.engineering/2020-06-24-wizard-part-ii/) was a great follow-up on the initial project.
5. What other things were learned, and/or what next steps are you taking?
   * Focus on end-to-end solutions
   * Materialize your clean dataset to improve collaboration
   * Sell to the business

##### Migrating 387 models from Redshift to Snowflake — Bowery Farming Data Team

*[Video](https://www.youtube.com/watch?v=VhH614WVufM), [slides](https://docs.google.com/presentation/d/1wE8NSkFPLFKGQ8fvFUUKoZFVoUhws_FhFip-9mDhoPU/).*

1. What is the business problem? A new Bowery Farming site had increased the amount of data the team was dealing with, which put a strain on their data stack.
2. How did this manifest? Charts showed increased dbt run times and increased Redshift costs.
3. What tactics were used to solve the problem?
   1. Push Redshift to its limits: leverage Athena, Redshift configurations, separate clusters, and Python pre-processing
   2. Trial Snowflake for cost and performance
   3. Commit to a migration with strong project management
4. What was the impact on the business problem? Yet to be determined (at the time, they had just finished the project). But the team showed evidence that the project had been successfully completed!
5. What other things were learned, and/or what next steps are you taking?
   * Differences between Redshift and Snowflake SQL syntax
   * Teamwork and coordination are key to completing a migration

#### Turn it into a presentation

Now, it's time to take your idea and turn it into a presentation.

##### Structuring your slides

As well as the slides that directly support your content, consider including:

* At the start:
  * An intro slide for yourself (and teammates)
  * An intro slide for your company — you might also include some impressive numbers about your business; after all, your audience is full of people who love numbers!
  * Potentially your tech stack for context — there’s no need to spend too much time on this, as most audience members will be familiar with the tools.
* Before diving into the specific tactics used:
  * A slide listing the three tactics at a high level — this signposting helps set expectations for audience members.
* At the end:
  * A closing slide to prompt questions and list your contact details.
  * If your company is hiring, mention that too!
If available, use your corporate-branded slide deck. We also have dbt-branded slides if you want to use those.

##### Making your presentation shine

When turning your story into a presentation, also consider doing the following:

###### Use full sentences in your slide headings

When presenting (especially virtually), it’s hard to hold everyone’s focus. That’s OK! By using full sentences as your headings, people can “hook” back into the presentation. For example, rather than having a slide on “Slide headings”, use a title like “Use full sentences in your slide headings” (woah — meta!).

###### Make your slides accessible

This is a [great guide](https://www.smashingmagazine.com/2018/11/inclusive-design-accessible-presentations/) on making your slides accessible — read it!

###### Use evidence in your slides

Evidence is a key part of getting buy-in that the story you’re telling is valuable. Consider including:

* Screenshots of Slack conversations
* Quotes, survey results, charts
* If talking about a complex transformation, small samples of data to demonstrate the concept. You may need to generate some fake data to simplify the problem (example)
* If one of your tactics is heavily code-based, consider sharing that code in a separate piece so that interested folks can refer back to it later (Discourse is great for this)

###### (Virtual events) Create moments for interactivity

For virtual events: is there a poll you can launch, or a question you can throw out to the chat?
This can help create a sense of community at the event.

#### Pair it with a blog post

The hardest part of nailing a great talk is the content, so if you’ve made it this far, you’ve already done most of the work. Turning your content into a blog post is a great way to solidify your thinking and get some extra exposure. If you’d like to be featured on the [dbt Blog](https://blog.getdbt.com/), please email us at . We’ll also be adding more resources on how to write about your work soon!

#### Speaking at a non-dbt event

Above, we’ve given specific advice for speaking at a dbt meetup. If you’re a dbt community member who wants to speak at a non-dbt meetup or conference, there are a few extra ways you can adjust your process.

##### Questions to ask the event organizer

###### What is the technical baseline for the audience?

Do they know about dbt? If not, are they familiar with SQL? You’ll likely have a range of people in the audience, so there won’t be one exact answer, but gathering information about the median knowledge is useful. As a guideline, aim to teach something new to at least half of the audience.

###### What kind of talks have been the most successful?

Is the event oriented around technical talks or strategic talks? Is there an expectation of demoing code? Do they have past examples of talks that were well received, or any tips?
###### What are the event logistics?

How long is your talk supposed to go for? Is there an opportunity to do Q&A?

If the event is virtual, what is the software setup like? How will questions be moderated?

If the event is in-person, will you be able to use your own computer, or will you use someone else’s? What sort of screen is there? How do you connect to it? And do you have the right dongle for your MacBook Pro?

###### Is there an opportunity for topic feedback?

Is the organizer interested in working with you to make your topic great? If not, can they point you to someone in their community who might be interested in helping out?

###### Are there any additional accessibility considerations you should be aware of?

Do any audience members use a communication device? Can you share your slides ahead of time to make them easier for audience members to access? Will the event be recorded for those who can’t attend in person?

##### Responding to a conference Call for Speakers

If you’re submitting a response to a Call for Speakers and talking about dbt, we’re happy to work with you on this. You may email us at for more information.
---

### Jenna Jordan

Community Award Recipient 2024

![Jenna Jordan](/img/community/spotlight/jenna-jordan.jpg?v=2)

she/her

Senior Data Management Consultant, Analytics8

Location: Asheville, USA

[LinkedIn](https://www.linkedin.com/in/jennajordan1/ "LinkedIn") | [Personal website](https://jennajordan.me/ "Personal website")

#### About

I am a Senior Data Management Consultant with Analytics8, where I advise clients on dbt best practices (especially regarding dbt Mesh and the various shifts in governance and strategy that come with it). My experiences working within a dbt Mesh architecture, and all of the difficulties organizations can run into with such a major paradigm shift, inspired my peer exchange (a role-playing/simulation game) at [Coalesce 2024](https://coalesce.getdbt.com/agenda): "Governance co-lab: We the people, in order to govern data, do establish processes." I also experimented with bringing role-playing scenarios to data problems at the September 2024 [Chicago dbt Meetup](https://www.meetup.com/chicago-dbt-meetup/), hosted by Analytics8. I occasionally write long blog posts on my website, if you're up for the read.

#### When did you join the dbt community and in what way has it impacted your career?

My dbt learning journey kicked off with the CoRise (now Uplimit) course [Analytics Engineering with dbt](https://uplimit.com/course/analytics-engineering-with-dbt/), taught by Emily Hawkins and Jake Hannan, in February 2022 – less than a month after starting as a data engineer with the City of Boston Analytics Team.
About a year later, I spearheaded the adoption of dbt at the City and got to build the project and associated architecture from scratch – which is probably the best learning experience you could ask for! I saw the value dbt could bring to improving data management processes at the City, and I knew there were other cities and local governments that could benefit from dbt as well. That motivated me to find my fellow co-speakers Ian Rose and Laurie Merrell and give a talk at Coalesce 2023 called ["From Coast to Coast: Implementing dbt in the public sector."](https://www.youtube.com/watch?v=6aX7tAfMmIM&) As part of our goal to identify and cultivate a community of dbt practitioners in the public (and adjacent) sectors, we also started the dbt Community Slack channel [#industry-public-sector](https://getdbt.slack.com/archives/C05MNU6QB5L/). That experience allowed me to continue to grow my career and find my current role – as well as connect with so many amazing data folks!

#### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?

There are many leaders in the dbt community whom I admire and identify with – I won’t list them all because I will invariably miss someone (but… you probably know who you are). Technical prowess is always enviable, but I most admire those who bring the human element to data work: those who aren’t afraid to be their authentic selves, cultivate a practice of empathy and compassion, and are driven by curiosity and a desire to help others. I’ve never set out to be a leader, and I still don’t really consider myself one – I’m much more comfortable in the role of a librarian.
I just want to help people by connecting them to the information and resources they may need.

#### What have you learned from community members? What do you hope others can learn from you?

Pretty much everything I’ve learned about dbt and working in a mature analytics ecosystem, I’ve learned from dbt community members. The [dbt Community Slack](https://www.getdbt.com/community/join-the-community/) is full of useful information and advice, and it has also helped me identify experts on certain topics whom I can chat with to learn even more. When I find someone sharing useful information, I usually try to find and follow them on social media so I can see more of their content.

If there is one piece of advice I want to share, it is this: don’t be afraid to engage. Ask for help when you need it, but also offer help freely. Engage with the community with the same respect and grace you would offer your friends and coworkers.

#### Anything else interesting you want to tell us?

Library Science is so much more than the Dewey Decimal System (seriously, ask a librarian about Dewey for a juicy rant). RDF triples (for knowledge graphs) are queried using SPARQL (pronounced “sparkle”). An antelope can be a document. The correct way to write a date/time is ISO-8601. The oldest known table (of the spreadsheet variety) is from 5,000 years ago – record-keeping predates literature by a significant margin. Zip codes aren’t polygons – they don’t contain an area or have boundaries. Computers don’t always return 0.3 when asked to add 0.1 + 0.2. SQL was the sequel to SQUARE. Before computers, people programmed looms (weaving is binary). What? You asked!!
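The 0.1 + 0.2 fun fact above is easy to check for yourself. A minimal sketch in Python (the same behavior appears in any language that uses IEEE 754 binary floating point):

```python
# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum is slightly more than 0.3.
total = 0.1 + 0.2
print(total)          # prints 0.30000000000000004
print(total == 0.3)   # prints False

# For exact decimal arithmetic (e.g. money), use the decimal module.
from decimal import Decimal
exact = Decimal("0.1") + Decimal("0.2")
print(exact == Decimal("0.3"))  # prints True
```

This is why comparing floats for equality in data tests is risky; a tolerance check or decimal types are the usual workarounds.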
On a more serious note – data teams: start hiring librarians. No, seriously. No degree could have prepared me better for what I do in the data field than my M.S. in Library & Information Science. I promise, you want the skillset and mindset that a librarian will bring to your team.

---

### Jing Yu Lim

![Jing Yu Lim](/img/community/spotlight/jing-lim.jpg?v=2)

she/her

I'm open to work!

Location: Singapore, Singapore

[LinkedIn](https://www.linkedin.com/in/limjingyu/ "LinkedIn")

#### About

For ~3 years, I was a Product Analyst at Grab, a ride-hailing and food delivery app in Southeast Asia, before taking on an analytics engineering role at Spenmo, a B2B fintech startup. I joined a tech company as an analyst in June 2023, but was recently impacted by a layoff. I'm also one of the co-organisers of the [Singapore dbt Meetup](https://www.meetup.com/singapore-dbt-meetup/ "Singapore dbt Meetup")!

My story with dbt started in January 2022, when I joined Spenmo and taught myself dbt, mainly via [dbt's documentation](https://docs.getdbt.com/docs/introduction "dbt's documentation") and the [Slack community](https://www.getdbt.com/community/join-the-community/?utm_medium=internal\&utm_source=docs\&utm_campaign=q3-2024_dbt-spotlight_aw\&utm_content=____\&utm_term=all___ "Slack community"). We used Snowflake as our data warehouse, and Holistics for BI. I spoke about data self-serve and Spenmo's journey with dbt at multiple meetups.
#### When did you join the dbt community and in what way has it impacted your career?

I joined the dbt community in late January 2022, while setting up Spenmo's first dbt project. I was completely new to dbt, and relied heavily on the #advice-dbt-help channel in dbt Slack whenever I got stuck. I have learnt so much from reading discussions in other channels as well (e.g. #leading-data-teams, #advice-mock-interviews, #db-snowflake, #tools-holistics).

The dbt community also helped me expand my professional network, where I met so many amazing individuals! It all started with #local-singapore, which was created by community member Jolanda Zwagemaker sometime in April 2022. We organised dinners to connect with one another, which eventually led to an opportunity to run the Singapore dbt Meetup (HUGE thank you to dbt) – it is heartwarming to see connections forged between attendees of the meetup, where we also learn from one another. It really does feel like a community!

#### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?

Claire Carroll and Mila Page! My very first touchpoint with dbt was their articles in [The Analytics Engineering Guide](https://www.getdbt.com/analytics-engineering/). I remember relating to them so much that I was saying "YES" to every other line I read, and sending text snippets to my friends. To me, analytics engineering could help overcome certain challenges I face as an analyst, and make the job feel less like a "hamster wheel."
As the concept of analytics engineering is fairly new in Singapore, I feel the need to spread the word and bring about a mindset shift among not just data teams, but anyone who needs to work with a data team.

#### What have you learned from community members? What do you hope others can learn from you?

One of my favourite presentations from the Singapore dbt Meetup was ["How would the ideal Semantic Layer look like?"](https://docs.google.com/presentation/d/1t1ts04b7qA-BVlV3qbNZ4fI-MSZn0iL6_FhsaWhJk_0/edit?usp=sharing) by fellow community member Thanh Dinh from Holistics. It taught me a new perspective on metrics: they could be like dbt models, where dependencies can be set up between metric models.

I definitely have so much more to learn as an individual, but I hope to share some of my tips and lessons on data modelling with others.

#### Anything else interesting you want to tell us?

Thank you dbt for enabling us to run meetups! It has been critical for ensuring a great experience for the Singapore community. Also a huge shoutout to Amada, the Global Community Development Lead, for always being super helpful and supportive despite the 12-hour time difference!
---

### Johann de Wet

![Johann de Wet](/img/community/spotlight/johann-dewett.jpg?v=2)

he/him

Staff Analytics Engineer, Yoco

Location: Paarl, South Africa

[LinkedIn](https://www.linkedin.com/in/johann-de-wet-21a6128/ "LinkedIn")

#### About

I'm forever indebted to my manager, John Pienaar, who introduced me to both dbt and its community when I joined his team as an Analytics Engineer at the start of 2022. I often joke about my career before dbt and after dbt. Our stack includes Fivetran, Segment, Airflow and BigQuery, to name a few.

Prior to that I was a business intelligence consultant for 16 years, working at big financial corporates. During this time I had the opportunity to work in many different roles, from front-end development to data engineering and data warehouse platform development. The only two constants in my career have been SQL and Ralph Kimball's dimensional modeling methodology... which probably makes me a bit partial to those.

#### When did you join the dbt community and in what way has it impacted your career?

I joined the dbt community at the start of 2022, when I joined my current company. Having the community available to me from the start of my dbt career was invaluable, not only in helping me get up to speed on dbt quickly but also in making me aware of so many capabilities, optimisations and solutions to problems that others have implemented. The knowledge gained here goes much further than just dbt itself.

#### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?

It feels almost unfair to single out members when there are so many, but people like Dave Connors and Benoit Perigaud really represent what this community is about, in my opinion. Prompt, proficient, passionate and polite... all the P's you could want. Someone like Sara Leon also deserves a shoutout for keeping us abreast of the newest developments in dbt and making us feel even more connected to the community. As for myself, I would like to make some contributions to dbt's codebase in the future, to assist the dbt engineers who have offered so much to us dbt users over the years.

#### What have you learned from community members? What do you hope others can learn from you?

I learned that community members are incredibly passionate about dbt, about helping other members by sharing their knowledge, and about contributing to dbt itself, which is ultimately to everyone's benefit. I try to help regardless of how "simple" a question may be. Even if it's been asked and answered a thousand times before, for that person it's new and potentially a real blocker in their world. I remind myself that we were all new to dbt at some stage, and every bit of help along the way got us to where we are now.

#### Anything else interesting you want to tell us?

I'm hoping to attend my first Coalesce event in Las Vegas later this year.
---

### Join the Community

Want to learn how organizations around the world are tackling the biggest challenges in data while making new friends from the best analytics teams? Join the dbt Community — data practitioners’ favorite place to learn new skills, keep on top of industry trends, and forge connections.

[![](/img/icons/slack.svg)](https://www.getdbt.com/community/join-the-community/)

###### [Join us on Slack](https://www.getdbt.com/community/join-the-community/)

[Follow the pulse of the dbt Community! Chat with other practitioners in your city, country or worldwide about data work, tech stacks, or simply share a killer meme.](https://www.getdbt.com/community/join-the-community/)

[![](/img/icons/discussions.svg)](https://docs.getdbt.com/community/forum)

###### [Community Forum](https://docs.getdbt.com/community/forum)

[Have a question about how to do something in dbt? Hop into the Community Forum and work with others to create long-lived community knowledge.](https://docs.getdbt.com/community/forum)

[![](/img/icons/pencil-paper.svg)](https://docs.getdbt.com/community/contribute.md)

###### [How to contribute](https://docs.getdbt.com/community/contribute.md)

[Want to get involved? This is the place! Learn how to contribute to our public repositories, write for the blog, speak at a meetup and more.](https://docs.getdbt.com/community/contribute.md)

[![](/img/icons/folder.svg)](https://docs.getdbt.com/community/resources/code-of-conduct.md)

###### [Code of Conduct](https://docs.getdbt.com/community/resources/code-of-conduct.md)

[We are committed to creating a space where everyone can feel welcome and safe. Our Code of Conduct reflects the agreement that all Community members make to uphold these ideals.](https://docs.getdbt.com/community/resources/code-of-conduct.md)

[![](/img/icons/calendar.svg)](https://docs.getdbt.com/community/events)

###### [Upcoming events](https://docs.getdbt.com/community/events)

[Whether it's in-person Meetups in your local area, dbt Summit – the annual Analytics Engineering Conference – or online Office Hours, there are options for everyone.](https://docs.getdbt.com/community/events)

[![](/img/icons/star.svg)](https://www.youtube.com/playlist?list=PL0QYlrC86xQl1DGKBopQZiZ6tSqrMlD2M)

###### [Watch past events](https://www.youtube.com/playlist?list=PL0QYlrC86xQl1DGKBopQZiZ6tSqrMlD2M)

[Get a taste for the energy of our live events, get inspired, or prepare for an upcoming event by watching recordings from our YouTube archives.](https://www.youtube.com/playlist?list=PL0QYlrC86xQl1DGKBopQZiZ6tSqrMlD2M)

---

### Josh Devlin

Community Award Recipient 2023

![Josh Devlin](/img/community/spotlight/josh-devlin.jpg?v=2)

he/him

Senior Analytics Engineer, Canva

Location: Melbourne, Australia (but spent most of the last decade in Houston, USA)

[Twitter](https://twitter.com/JayPeeDevlin "Twitter") | [LinkedIn](https://www.linkedin.com/in/josh-devlin/ "LinkedIn")

#### About

Josh Devlin has a rich history of community involvement and technical expertise in both the dbt and wider analytics communities.
Discovering dbt in early 2020, he quickly became an integral member of its [community](https://www.getdbt.com/community/join-the-community), leveraging the platform as a learning tool and aiding others along their dbt journey. Josh has helped thousands of dbt users with his advice and near-encyclopaedic knowledge of dbt. Beyond the online community, he transitioned from being an attendee at the first virtual Coalesce conference in December 2020 to a [presenter at the first in-person Coalesce event](https://coalesce.getdbt.com/blog/babies-and-bathwater-is-kimball-still-relevant) in New Orleans in 2022. He has also contributed to the dbt-core and dbt-snowflake codebases, helping improve the product in the most direct way. His continuous contributions echo his philosophy of learning through teaching, a principle that has not only enriched the dbt community but also significantly bolstered his proficiency with the tool, making him a valuable community member. Aside from his technical endeavors, Josh carries a heart for communal growth and an individual's ability to contribute to a larger whole, a trait mirrored in his earlier pursuits as an orchestral musician. His story is a blend of technical acumen, communal involvement, and a nuanced appreciation for the symbiotic relationship between teaching and learning, making him a notable figure in the analytics engineering space. #### When did you join the dbt community and in what way has it impacted your career? I have been a subscriber to 'The Data Science Roundup' (now ['The Analytics Engineering Roundup'](https://roundup.getdbt.com/)) since its inception, so I knew that dbt existed from the very beginning, back when dbt Labs was still called Fishtown Analytics.
Despite that, I never really understood what the tool was or how it fit in until early 2020, when I first started experimenting with it. I immediately joined the community and found it warm and welcoming, so I started to help people where I could and never stopped! #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? I like to think I represent the warm, helpful vibes of the early days of the Community, where folks like Claire Carroll warmly welcomed me and others! #### What have you learned from community members? What do you hope others can learn from you? I've learned that the more you give, the more you get. I've put hundreds of hours into helping other people in the community, but I've gotten all that back and much more. I hope I can encourage others to give of themselves and reap the rewards later! #### Anything else interesting you want to tell us? In a previous life I was an orchestral musician!
--- ### Juan Manuel Perafan ![Juan Manuel Perafan](/img/community/spotlight/juan-manuel-perafan.jpg?v=2) he/him Analytics Engineer, Xebia Location: Amsterdam, Netherlands Organizations: Co-author of *Fundamentals of Analytics Engineering* [LinkedIn](https://www.linkedin.com/in/jmperafan/ "LinkedIn") | [Amazon Profile](https://www.amazon.com/author/jmperafan "Amazon Profile") #### About Born and raised in Colombia! Living in the Netherlands since 2011. I've been working in the realm of analytics since 2017, focusing on Analytics Engineering, dbt, SQL, data governance, and business intelligence (BI). Besides consultancy work, I am very active in the data community. I co-authored the book *Fundamentals of Analytics Engineering* and have spoken at various conferences and meetups worldwide, including [Coalesce](https://coalesce.getdbt.com/), Linux Foundation OS Summit, Big Data Summit Warsaw, Dutch Big Data Expo, and Developer Week Latin America. I also love meetups! I am the founder of the Analytics Engineering Meetup and co-founder of the [Netherlands dbt Meetup](https://www.meetup.com/amsterdam-dbt-meetup/). #### When did you join the dbt community and in what way has it impacted your career? I've been a dbt user since 2020, but it wasn't until I attended Coalesce 2021 in New Orleans that I truly felt part of the community. The experience inspired me to start a dbt Meetup in Amsterdam. I thoroughly enjoy organizing Meetups! They provide a platform to network and learn from some of the most experienced data professionals in your area.
Additionally, it's rewarding to see how attendees bond. Often, in our day-to-day jobs, we're surrounded by people who don't fully grasp our work, so having deeper conversations at Meetups is refreshing. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? I am not the most knowledgeable Analytics Engineer out there, but I'm good at building communities. Starting from scratch is daunting. So, for me, leading means doing the hard work to make it easier for others to join. When it comes to Meetups, it means being ready to handle every aspect of an event alone, but also letting team members pitch in where they're comfortable. It's okay if they only want a small part at first, but maybe eventually, they'll feel comfortable running anything (not just meetups) on their own. #### What have you learned from community members? What do you hope others can learn from you? Along the way, I've gained insights into technical areas like data ops, data modeling, and lots of SQL best practices, as well as broader fields such as stakeholder management and data strategy. It is easy to focus on the things you don't know. But each of us brings unique expertise and skills to the table. Analytics engineering is a diverse field, and mastering every aspect is not expected. #### Anything else interesting you want to tell us? I am always up for a chat.
If you see me at a conference or want to DM me, please don't hesitate. It is always a pleasure to network with fellow data professionals. --- ### Karen Hsieh Community Award Recipient 2023 ![Karen Hsieh](/img/community/spotlight/karen-hsieh.jpg?v=2) she/her Director of Tech & Data, ALPHA Camp Location: Taipei, Taiwan [Twitter](https://twitter.com/ijac_wei "Twitter") | [LinkedIn](https://www.linkedin.com/in/karenhsieh/ "LinkedIn") | [Medium](https://medium.com/@ijacwei "Medium") #### About I’m a Product Manager who builds company-wide data literacy and empowers the product team to create value for people and grow the company. Utilizing dbt, I replaced time-consuming spreadsheets by creating key business metric dashboards that improved data literacy, enabling conversations about product and business. Since joining the dbt community in 2019, I’ve led the creation of the #local-taiwan dbt Slack channel, organized 10 [Taipei dbt Meetups](https://www.meetup.com/taipei-dbt-meetup/ "Taipei dbt Meetups"), and [spoken at Coalesce 2022](https://youtu.be/VMlrT4wXTgg "spoken at Coalesce 2022"). I write about how data empowers products on [Medium](https://medium.com/@ijacwei "Medium"). I focus on understanding how users utilize and think about the product based on facts. #### When did you join the dbt community and in what way has it impacted your career? As a Product Manager with a passion for data, I began using dbt in 2019 when it was introduced to me by Richard Lee, CTO & co-founder of [iCook.tw](http://icook.tw/).
We were likely one of the first companies in Taiwan to use dbt, and as the sole dbt model writer in our startup, I worked with BigQuery, dbt, and Metabase, with the occasional use of Data Studio (now Looker Studio). I joined the dbt Slack community to connect with others who use dbt and found the forum to be more informative than formal documentation. Documentation is formal; the questions from people are real! I love the sense of community that is built around building things together. dbt demonstrates how I want to build products—by creating a direct connection between the users and those who build the product. I conduct user interviews in a style that resembles chatting with friends who share similar interests. This approach encourages me to get to know users more directly. In January 2022, Laurence Chen (REPLWARE) and a friend asked me if I knew anyone else in Taiwan using dbt, which inspired me to request and maintain a local channel for the Taiwan community in the dbt Slack, #local-taipei. I had no prior experience with community-building and wasn't an engineer, but I searched the #introductions channel and found Allen Wang (Zitara Technologies). With him and Laurence, we got started on organizing our first meetup in Taiwan. After running a few successful meetups on our own, we got in touch with dbt Labs, and are now the official Taiwan dbt Meetup organizers. We have now run 10 meetups in total, with two more planned in May and June 2023. We've never actively sought to increase membership, but the meetups have continued to grow and attract passionate people who are always willing to share their experiences. (See more about the story [here](https://medium.com/dbt-local-taiwan/how-does-dbt-local-taipei-get-started-ff58489c80fa).) Through dbt, I've learned that people are friendly and willing to help when asked.
Publicly asking questions is a great way to practice describing questions more effectively and simplifying them for others to understand, a skill that has come in handy for solving problems beyond those related to data. I've also developed a support system where I know where to go for help with various questions, such as data, leadership, product, business, and hiring. It all started with the dbt community 💜. What’s more, Laurence pushed me to introduce dbt at [COSCUP Taiwan 2022](https://coscup.org/2022/zh-TW/session/SRKVLQ), the largest open-source event in Taiwan, and I was thrilled to be accepted to speak at Coalesce 2022. Preparing talks and answering questions were valuable training, and with the help of many people, I revised my presentation multiple times, refining my opinions and structuring my thinking. I learned a lot from these experiences. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? I admire dbt Labs co-founder Tristan Handy’s openness to chat and share his thoughts, as well as his open-minded approach to product-building and his mission to encourage data analysts to work more like software developers. When I spoke with dbt employees at Coalesce 2022, it was clear that they aligned with the company mission and loved the product that they worked on. I am also inspired by Benn Stancil's (Mode) ability to [write a newsletter every Friday](https://benn.substack.com/) that uncovers questions and trends in a clear and concise way, sparking discussions and inspiring people.
While attending Coalesce annually from 2020 to 2022, I found Emilie Schario’s (co-founder, Turbine) presentations enlightening and I could see the evolution of how data teams work over the course of her three successive talks. Additionally, I am inspired by Jolanda Zwagemaker (Wise), who created #local-singapore to connect with others when she moved from London to Singapore and even flew to Taiwan to share her experiences with Incremental. I never think about leadership in the community. I want to share my thoughts, ideas, and experiences to inspire others and start discussions. I also hope to encourage more people in Taiwan to join conversations with others from around the world, as I believe this will help us learn and grow together. #### What have you learned from community members? What do you hope others can learn from you? Being a part of the dbt community has taught me the value of collaboration and learning from others. As someone who started using dbt as a solo PM, it's been reassuring to find that I'm not alone. It's inspiring to see people from diverse backgrounds, including those without engineering experience, eager to learn and become analytics engineers. It’s incredible to see the progress occurring in people’s skills and even in their careers. I hope that my journey from learning SQL in 2019 to leading a small data team now can inspire others. I believe that sharing knowledge and asking questions are great ways to learn and grow. I want others to know that it's okay not to have all the answers and that their experiences are valuable. Everyone has a unique perspective and experience that can help others.
#### Anything else interesting you want to tell us? Thank you 💕. I gain a lot from the community and dbt. --- ### Maintaining a Slack channel #### TL;DR There are three things you should do to be a good channel maintainer in the [dbt Slack community](https://community.getdbt.com/): * Once you see some folks in the channel, post initial conversation topics to help them get to know each other. * Keep an eye out in #introductions for folks who might benefit from your new channel. For example, if someone works in the space or on the same problems, then direct them to the channel. * Make sure folks follow the [Rules of the Road](https://docs.getdbt.com/community/resources/community-rules-of-the-road.md) in the channel. If you notice someone is not following one, gently remind them of the rule in thread and, ideally, provide an example of how they can rephrase their message or where they can redirect it. If you have a question about how to proceed, just post it in #moderation-and-administration with a screenshot or link to the thread and someone will give you advice. #### Scope of the role A maintainer can be a dbt Labs employee but does not have to be. *Slack channel maintainer* is philosophically similar to OSS maintainer. At the onset, the channel maintainer will help build up this new space in Slack and stir up conversation during the first few weeks of the channel's existence. They are someone who stays on top of feedback and encourages generative contributions. This is not necessarily someone who is the generator of content and contributions, or who answers every question. #### Initial instructions 1.
Review the [Rules of the Road](https://docs.getdbt.com/community/resources/community-rules-of-the-road.md) and [Code of Conduct](https://docs.getdbt.com/community/resources/code-of-conduct.md) and please let the folks who created the channel know that you read both documents and you agree to be mindful of them. 2. To request a new channel, go to the [#moderation-and-administration channel](https://getdbt.slack.com/archives/C02JJ8N822H) and click **Workflows** at the top of the channel description. Click **Request a New Channel**. * Fill out the fields and click **Submit** to submit your request. Someone will get in touch from there. ![request-slack-chnl](https://github.com/siljamardla/docs.getdbt.com/assets/89008547/b14abc52-4164-40a8-b48a-e8061fb4b51a) 3. If you are a vendor, review the [Vendor Expectations](https://docs.getdbt.com/community/resources/community-rules-of-the-road.md#vendor-expectations). 4. Add the Topic and Description to the channel. @Mention your name in the channel Description, identifying yourself as the maintainer. Ex: *Maintainer: First Last (pronouns).* If you are a vendor, make sure your Handle contains your affiliation. 5. Complete or update your Slack profile by making sure your Company (in the ‘What I do’ field), Pronouns, and Handle, if you’re a vendor, are up-to-date. 6. Post initial conversation topics once a few folks get in the channel to help folks get to know each other. Check out this [example introductory post](https://getdbt.slack.com/archives/C02FXAZRRDW/p1632407767005000). 7. Stir up conversation during the first few weeks of the channel's existence. As you get started, answer the questions you can or help find someone with answers, seed discussions about once a week, and make sure folks follow the Rules of the Road.
#### Long-term expectations * Maintain the channel, check in, and be active on a regular basis by answering folks' questions and seeding discussions. Want an example? Check out [this poll](https://getdbt.slack.com/archives/C022A67TLFL/p1628279819038800). * For guidance on how to answer a question, see [Answering Community Questions](https://www.getdbt.com/community/answering-community-questions). If you are not sure how to answer a lingering or unanswered question, you can post about it in #moderation-and-administration or direct it to another channel, if relevant. * If the channel is an industry channel, it’s helpful to monitor [#introductions](https://getdbt.slack.com/archives/CETJLH1V3) and invite people. Keep an eye out for folks who might benefit from being in the new channel if they mention they are working in the space, or are thinking about some of these problems. * Make sure folks follow the [Rules of the Road](https://docs.getdbt.com/community/resources/community-rules-of-the-road.md). For example, if you notice someone is not following one, gently remind them of the rule in thread and, ideally, provide an example of how they can rephrase their message or where they can redirect it. If you have a question about how to proceed, just post about it in #moderation-and-administration with a link to the thread or a screenshot and someone will give you advice. * In tools channels, sharing customer stories and product updates is okay because folks expect that when they join. However, please avoid any direct sales campaigns, pricing offers, etc. * If you have any questions or doubts about the [Rules of the Road and Vendor Expectations](https://docs.getdbt.com/community/resources/community-rules-of-the-road.md), please post a question in #moderation-and-administration about what sort of things the community expects from interactions with vendors.
* A reminder that we never DM anyone in Slack unless they have given permission in a public channel or there is some prior relationship. * A reminder that @ here/all/channel are disabled. * Use and encourage the use of threads 🧵 to keep conversations tidy! --- ### Mariah Rogers ![Mariah Rogers](/img/community/spotlight/mariah-rogers.jpg?v=2) she/her Senior Analytics Engineer, Arcadia Location: Irvine, California [LinkedIn](https://linkedin.com/in/mariahjrogers "LinkedIn") #### About I got my start in the data world helping create a new major and minor in Data Science at my alma mater. I then became a data engineer, learned a ton, and propelled myself into the clean energy sector. Now I do data things at a clean energy company and geek out on solar energy at work and at home! I attended my first Coalesce virtually in 2021, when my former colleague Emily Ekdahl gave a talk about some cool things we'd been working on. She inspired me to propose a talk the following year, so I submitted two topics and, surprisingly, both were accepted! I ultimately chose to speak about Testing in dbt in New Orleans in 2022, and the community's reception of that talk continues to be a highlight of my career.
#### When did you join the dbt community and in what way has it impacted your career? My colleague and I discovered dbt in mid-2020 while hunting down documentation for Jinja2, because we had just spent four months writing our own Jinja2-powered SQL templating tool. We immediately backtracked on that project, migrated everything to dbt, and never looked back. I joined the dbt Slack community shortly after we started our migration (fun fact: my first access log into the community Slack was on October 12, 2020). The tool, and the wonderful, supportive community around it, changed the way I work with data every single day and has helped me launch a career more successful and fulfilling than I could have imagined. I have spoken at Coalesce and a local Meetup, made countless online and IRL friends who are a blast to run into each year at the various conferences and events, and paved a path for my career long-term that I did not know could have existed before I found dbt and became an Analytics Engineer. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? Sometimes it feels hard to exist in this space as "just" a practitioner living in a vendor-ruled world. I am most inspired by my fellow practitioners who have their day jobs and still make the time and exude passion for the analytics craft both in- and outside of work, and share that with the community.
I also greatly appreciate and aspire to be the type of community leader who crafts safe spaces and supportive networks like the women leading Data Angels, Women in Data, and other such groups. I have found such a profound sense of community in these groups, and I hope to give back to these networks everything I have gotten out of them and more. #### What have you learned from community members? What do you hope others can learn from you? They say that there are builders and there are optimizers. I am firmly in the optimizer camp. Everything I do, I do to make my life easier, to make the lives of my collaborators or stakeholders easier, or to help others in the community do the same for themselves. At least half of the expertise I have working with dbt or as an Analytics Engineer I learned from community members' blog posts, substacks, tweets, Coalesce talks, or even casual conversations at Meetups. The other half I earned out of curiosity about whether there was a better or more efficient way to accomplish the task at hand than the way I first learned to do it or first tried to complete it (and there usually is!). It is my goal to help spread this sense of curiosity and wonder around the community, as well as to share some of my more substantive learnings and opinions in the form of articles or conference talks in the future! I still feel like a caterpillar in my cocoon, trying to figure out what shape I'll take before I make my imprint on the world, but I will emerge soon! #### Anything else interesting you want to tell us? Catch me in the purple jean jacket covered in data patches in Vegas this October at Coalesce 2024!
--- ### Meagan Palmer Community Award Recipient 2024 ![Meagan Palmer](/img/community/spotlight/Meagan-Palmer.png?v=2) she/her Principal Consultant, Altis Consulting Location: Sydney, Australia [LinkedIn](https://www.linkedin.com/in/meaganpalmer/ "LinkedIn") #### About I first started using dbt in 2016 or 2017 (I can't remember exactly). Since then, I have moved into data and analytics consulting and have dipped in and out of the dbt Community. Late last year, I started leading dbt Cloud training courses and spending more time in the [dbt Slack](https://www.getdbt.com/community/join-the-community/). In consulting, I get to use a range of stacks. I've used dbt with Redshift, Snowflake, and Databricks in production settings with a range of loaders & reporting tools, and I've been enjoying using DuckDB for some home experimentation. To share some of these experiences, I regularly post to LinkedIn and have recently started [Analytics Engineering Today](https://www.linkedin.com/newsletters/analytics-engineering-today-7210968984693690370/), a twice-monthly newsletter about dbt in practice. #### When did you join the dbt community and in what way has it impacted your career? I was fortunate that Jon Bradley at Nearmap had the vision to engage the then Fishtown Analytics team (as the dbt Labs team was formerly called) as consultants and begin using dbt in our stack. I can't thank him enough.
It was a turning point for my career, where I could combine my interests and experiences in delivering business value, data, product management, and software engineering practices. #### Which dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? Being in Australia, I often see replies from [Jeremy Yeo](https://www.linkedin.com/in/jeremyyeo/) to people in the dbt Slack. His clarity of communication is impressive. For growth, I'm hoping that others can benefit from the wide range of experience I have. My LinkedIn newsletter, [Analytics Engineering Today](https://www.linkedin.com/newsletters/analytics-engineering-today-7210968984693690370/), aims to upskill the dbt Community and shed some light on useful features that might not be well known. I was at [Coalesce Online](https://coalesce.getdbt.com/) and am doing some webinars/events later in the year. Come say hi; I love talking dbt and analytics engineering with people. #### What have you learned from community members? What do you hope others can learn from you? The community members are amazing. It's great to be among a group of people who want to learn and improve. I've learned a lot, both from other members helping with my queries and from reading how other businesses have implemented dbt, including their stories on the organizational & technical issues they face. I hope I can help instill a sense that simple, clean solutions are possible and preferable.
I want to highlight that it is important to focus on the actual problem you are trying to solve, and that it's worth asking for help when you're starting to get stuck. I'm keen to help more women get the courage to work & lead in STEM. There has been a lot of progress made over the course of my career, which is great to see. Australian/NZ women, please connect with me; I'm happy to chat. --- ### Mike Stanley Community Award Recipient 2024 ![Mike Stanley](/img/community/spotlight/mike-stanley.jpg?v=2) he/him Manager, Data, Freetrade Location: London, United Kingdom [LinkedIn](https://www.linkedin.com/in/mike-stanley-31616994/ "LinkedIn") #### About I've split my time between financial services and the video games industry. Back when I wrote code every day, I worked in marketing analytics and marketing technology. I've been in the dbt community for about two years. I haven't authored any extensions to dbt's adapters yet, but I've given feedback on proposed changes! #### When did you join the dbt community and in what way has it impacted your career? I've led data teams for almost ten years now, and it can be a challenge to stay current on new technology when you're spending a lot of time on leadership and management. I joined the dbt Community to learn how to get more from it, how to solve problems and use more advanced features, and to learn best practices. I find that answering questions is the way I learn best, so I started helping people! #### Which dbt Community leader do you identify with?
How are you looking to grow your leadership in the dbt community? I hope that we can all continue to level up our dbt skills and leave the data environments that we work in better than we found them. #### What have you learned from community members? What do you hope others can learn from you? Everything! People share so much about their best practices and when and how to deviate from them, interesting extensions to dbt that they've worked on, common bugs and problems, and how to think in a "dbtish" way. I couldn't have learned any of that without the community! --- ### Mikko Sulonen ![Mikko Sulonen](/img/community/spotlight/Mikko-Sulonen.png?v=2) he/him Data Architect, Partner, Recordly Oy Location: Tampere, Finland [LinkedIn](https://www.linkedin.com/in/mikkosulonen/ "LinkedIn") #### About I've been working with data since 2016. I first started with the on-prem SQL Server stack of SSIS, SSAS, and SSRS. I did some QlikView and Qlik Sense, and some Power BI. Nowadays, I work mostly with Snowflake, Databricks, Azure, and dbt, of course. While tools and languages have come and gone, SQL has stayed. I've been a consultant for all of my professional life.
#### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I started using dbt around 2019-2020, I think. Snapshots (newly renamed from "archives") were a new thing, along with the RPC server! I asked around my then-company: pretty much nobody had used dbt, though some commented that it did look promising. That left me looking elsewhere for experiences and best practices around the tool, and I found different blog writers and eventually the [dbt Slack](https://www.getdbt.com/community/join-the-community). I quickly noticed I could learn much more from the experiences of others than by trying everything myself. After just lurking for a while, I started to answer people's questions and give my own thoughts. This was completely new to me: voicing my input and opinions to people I had never met or who were not my colleagues. #### Which dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#which-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to Which dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") There are quite a few. I started to write some names here, but felt the list was getting a bit long and I'd still forget somebody! What the community leaders have in common is that they are approachable, knowledgeable, and passionate. They want to help others, they want to drive the community forward, and they are down to earth. I've had the pleasure of meeting many of them in person at the past two [Coalesces](https://coalesce.getdbt.com/), and I hope to meet many more! Growing my own leadership in the community... 
That's an interesting question: I hadn't really identified myself as a leader in the community before. Maybe I should come out of the Slack channels and join and/or organize some [dbt Meetups](https://www.meetup.com/pro/dbt/)? I always try to answer even the simplest questions, even if they've been answered a hundred times already. Every day, new people are introduced to dbt, and they are facing issues for the first time. Every one of us was new at one point! #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") I've learnt a surprising amount about the different teams and ways of working with and around data. I've learnt that it is highly probable that somebody somewhere has already had, and likely solved, the problem you are having. All that is needed to connect the dots is for people to speak and listen. When asking and answering questions, I try to home in on what they're really trying to get at. I ask, why is it that you want to do something in that particular way? I like to say, don't love the solutions, love the problems! --- ### Oliver Cramer Community Award Recipient 2023 ![Oliver Cramer](/img/community/spotlight/oliver.jpg?v=2) he/him Lead Data Warehouse Architect, Aquila Capital Location: Celle, Germany Organizations: TDWI Germany [LinkedIn](https://www.linkedin.com/in/oliver-cramer/ "LinkedIn") #### About When I joined Aquila Capital in early 2022, I had the modern data stack with SqlDBM, dbt & Snowflake available. During my first half year there, I joined the dbt community. 
I have been working in the business intelligence field for many years. In 2006, I founded the first TDWI Roundtable in the DACH region. I often speak at conferences, such as the Snowflake Summit and the German TDWI conference. I have been very involved in the data vault community for over 20 years, and I do a lot of work with dbt Labs’ Sean McIntyre and Victoria Perez Mola to promote Data Vault in EMEA. I have even travelled to Canada and China to meet data vault community members! Currently, I have a group looking at the Data Vault dbt packages. The German Data Vault User Group (DDVUG) has published a sample database to test Data Warehouse Automation tools. In addition, I founded the Analytics Engineering Northern Germany Meetup Group, which will transition into an official dbt Meetup, the [Northern Germany dbt Meetup](https://www.meetup.com/norther-germany-dbt-meetup/). #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I joined the dbt community in 2022. My current focus is on building modern data teams. There is no magic formula for structuring your analytics function. Given the pace of technological change in our industry, the structure of a data team must evolve over time. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") I like working with dbt Labs' Sean McIntyre to promote Data Vault in Europe, and Victoria Perez Mola, also from dbt Labs, is always a great help when I have questions about dbt. 
#### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") I just think it's good to have a community, to be able to ask questions and get good answers. #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") The Data Vault community is actively looking forward to supporting the message that dbt (plus packages) is a real alternative that works. --- ### Online community building ##### Maintaining a channel in the dbt Community Slack[​](#maintaining-a-channel-in-the-dbt-community-slack "Direct link to Maintaining a channel in the dbt Community Slack") ###### Overview[​](#overview "Direct link to Overview") The dbt Slack is the place for real-time conversations with the dbt Community. Slack channels exist for specific locations, tools, industries, and methodologies. To make sure that every channel has dedicated attention from a committed community member, we have Community Maintainers who oversee the discussion in particular channels. ###### Contribution opportunities[​](#contribution-opportunities "Direct link to Contribution opportunities") Every channel can benefit from people who are engaged and committed to making it a more interesting place to hang out! If there's a channel that you're interested in, spend time there. For new channels that you'd like to create and maintain, post a message in the #channel-requests channel. ###### Sample contributions:[​](#sample-contributions "Direct link to Sample contributions:") * Karen Hsieh's [contributions](https://getdbt.slack.com/archives/C02TU2DSKND/p1661483529756289) to the #local-taipei channel are a fantastic example to learn from. 
###### Get started[​](#get-started "Direct link to Get started") * Read the guide to [Maintaining a Slack Channel](https://docs.getdbt.com/community/resources/maintaining-a-channel.md) ##### Participating on the Community Forum[​](#participating-on-the-community-forum "Direct link to Participating on the Community Forum") ###### Overview[​](#overview-1 "Direct link to Overview") [The dbt Community Forum](https://discourse.getdbt.com) is the preferred platform for support questions as well as a space for long-lived discussions about dbt, analytics engineering, and the analytics profession. It's a place for us to build up a long-lasting knowledge base around the common challenges, opportunities, and patterns we work with every day. ###### Contribution opportunities[​](#contribution-opportunities-1 "Direct link to Contribution opportunities") Participate in the Forum by asking and answering questions. These discussions are what allow us to find gaps in our best practices, documentation, and other recommendations, as well as to get folks onboarded and understanding dbt. Remember, it’s a mitzvah to answer a question. If you see a great question or answer, be generous with your 💜 reactions. Click the Solved button when your question is answered, so others can benefit. ###### Sample contributions[​](#sample-contributions-1 "Direct link to Sample contributions") * An analytics engineer wrote about [how they modified dbt to automatically put models into the correct schema](https://discourse.getdbt.com/t/extracting-schema-and-model-names-from-the-filename/575) based on their filename. * Here's [an example of the supportive, thorough answers](https://discourse.getdbt.com/t/is-it-possible-to-have-multiple-files-with-the-same-name-in-dbt/647) you can receive when you take the time to ask a question well. 
###### Get started[​](#get-started-1 "Direct link to Get started") * Read the [Community Forum Guidelines](https://docs.getdbt.com/community/resources/forum-guidelines.md) to understand what topics are a good fit and why this space is important in building long-term community knowledge. * Head over to the “[Help](https://discourse.getdbt.com/c/help/19)” section of the forum and look for areas to hop in! You don’t need to know the exact answer to a question to be able to provide a helpful pointer. --- ### Open source and source-available projects Looking for a good place to get involved contributing code? dbt Labs supports the following repositories, organized by the language primarily needed for contribution: #### Rust[​](#rust "Direct link to Rust") * [dbt Fusion engine](https://github.com/dbt-labs/dbt-fusion) - the next-generation engine powering dbt #### Python[​](#python "Direct link to Python") * [dbt Core](https://github.com/dbt-labs/dbt-core) - the original engine powering dbt * [hubcap](https://github.com/dbt-labs/hubcap) - the code powering the dbt Package hub #### dbt[​](#dbt "Direct link to dbt") * [dbt Labs' packages](https://hub.getdbt.com/dbt-labs/) - the dbt packages created and supported by dbt Labs. Packages are just dbt projects, so if you know the SQL, Jinja, and YAML necessary to work in dbt, you can contribute to packages. #### YAML and JSON Config[​](#yaml-and-json-config "Direct link to YAML and JSON Config") * [dbt-jsonschema](https://github.com/dbt-labs/dbt-jsonschema) - powering completion and linting for YAML configuration in dbt projects. 
#### Shell[​](#shell "Direct link to Shell") * [dbt-completion.bash](https://github.com/dbt-labs/dbt-completion.bash) - provides shell completion of CLI commands and selectors such as models and tests for bash and zsh. --- ### Opeyemi Fabiyi Community Award Recipient 2024 ![Opeyemi Fabiyi](/img/community/spotlight/fabiyi-opeyemi.jpg?v=2) he/him Analytics Manager, Data Culture Location: Lagos, Nigeria Organizations: Young Data Professionals (YDP) [Twitter](https://twitter.com/Opiano_1 "Twitter") | [LinkedIn](https://www.linkedin.com/in/opeyemifabiyi/ "LinkedIn") #### About I’m an Analytics Engineer with Data Culture, a data consulting firm where I use dbt regularly to help clients build quality-tested data assets. Before Data Culture, I worked at Cowrywise, one of the leading fintech companies in Nigeria, where I was a solo data team member, and that was my first introduction to dbt and Analytics Engineering. Before that, I was doing data science and analytics at Deloitte Nigeria. It’s been an exciting journey since I started using dbt and joined the community. Outside of work, I’m very passionate about community building and data advocacy. 
I founded one of Nigeria’s most vibrant data communities, “The Young Data Professional Community.” I’m also the founder of the [Lagos dbt Meetup](https://www.meetup.com/lagos-dbt-meetup/) and one of the organizers of the largest data conference in Africa, the [DataFest Africa Conference](https://www.datacommunityafrica.org/datafestafrica/). I became an active member of the dbt community in 2021 & [spoke at Coalesce 2022](https://coalesce.getdbt.com/on-demand/how-to-leverage-dbt-community-as-the-first-and-only-data-hire-to-survive). So when I’m not actively working, I’m involved in one community activity or another. #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I joined the [dbt Slack community](https://www.getdbt.com/community/join-the-community/?utm_medium=internal\&utm_source=docs\&utm_campaign=q3-2024_dbt-spotlight_aw\&utm_content=____\&utm_term=all___) in 2021, and it has been an experience getting to learn from thought leaders in the space and stay in touch with cutting-edge innovation in the data space. Reading different responses to questions on Slack has helped me become a better engineer, and seeing community members' genuine support help others tackle and solve difficult problems is inspiring; it has allowed me to model my own communities (YDP & the Lagos dbt Meetup) through that lens. I drop into the dbt Slack daily to read and learn from different channels. I love the sense of community that resonates in the dbt Slack, and the good news is that I got my current role from the #jobs channel, from a post by Data Culture's co-founder. So stay glued to that channel if you are looking for a new role. The dbt community greatly impacted my previous role as a one-person data team. 
The community became the team I didn't have, providing all the support and guidance I needed to deliver great value for the company, and my experience with the community was the inspiration for my Coalesce talk in 2022 on how to leverage the dbt community as a data team of one. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") Many great leaders inspire me in the dbt community: Joel Labes, for constantly interacting with new folks and providing a safe space for everyone to ask any question, no matter how dumb you may think it is; he will give a response that solves your problem. And Benn Stancil, for his vast experience and how well he communicates it, with humour, in his Friday night Substack, a newsletter I look forward to and which helps me stay current with recent trends in the global data space. Both of them resonate with the kind of leader I want to grow into in the dbt Community: broadly experienced and readily available to provide support and guidance, helping people solve problems and grow their careers. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") I've learned how to show empathy as a data professional, and how to be a great engineer, from the community's many best practices around working with data. 
I also want others to know that irrespective of their current level of expertise or maturity in their data career, they can make an impact by getting involved in the community and helping others grow. #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") Maybe I will consider DevRel as a career sometime because of my innate passion and love for community and people. Several folks tell me I’m a strong DevRel talent and a valuable asset for any product-led company. If you need someone to bounce ideas off of or discuss your community engagement efforts, please feel free to reach out. On a side note, it was really exciting for me to attend Coalesce 2024 in Vegas in person, which allowed me not only to learn but, most importantly, to meet amazing people I’ve only interacted with online, like Bruno, Kuberjain, Dakota and many more. Shout-out to [Zenlytic](https://www.zenlytic.com/) and [Lightdash](https://www.lightdash.com/) for making that possible and, most importantly, a huge shout-out to the dbt Labs community team: Amada, Natasha, and everyone else on the community team for their constant support in helping make the dbt Lagos (Nigeria) Meetup a success. --- ### Owen Prough ![Owen Prough](/img/community/spotlight/owen-prough.jpg?v=2) he/him Data Engineer, Sift Healthcare Location: Milwaukee, USA [LinkedIn](https://linkedin.com/in/owen-prough "LinkedIn") #### About Well met, data adventurer! My professional data history is mostly USA healthcare-related (shout out to ANSI X12 claim files) while working with large (10k+ employee) software companies and small (but growing!) startups. 
My constant companion for the last decade has been SQL of various flavors, and these days I mostly work with PostgreSQL, AWS Athena, and Snowflake. I think SQL is a great tool to solve interesting problems. Oh, and also dbt. I haven't done anything too fancy with dbt, but I have contributed to the [dbt-athena adapter](https://docs.getdbt.com/docs/local/connect-data-platform/athena-setup "dbt-athena adapter") and a few different packages. Mostly I lurk on Slack, cleverly disguised as a duck. It's a professional goal of mine to someday attend [Coalesce](https://coalesce.getdbt.com/?utm_medium=internal\&utm_source=docs\&utm_campaign=q3-2024_coalesce-2023_aw\&utm_content=coalesce____\&utm_term=all_all__ "Coalesce"). #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I committed dbt_project.yml to the company git repo in July 2021, so I've been hanging out with all of you for about 2 years. What I love the most about dbt is how easy it is to write data tests. Writing data tests without dbt was painful, but now, with all the tests we have in dbt, I have dramatically improved confidence in our data quality. The wider dbt community is also a reliable and constant source of education. I only interact in a few Slack channels, but I read *many* Slack channels to see what others are doing in the Analytics Engineering space and to get ideas about how to improve the processes/pipelines at my company. Y'all are great. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? 
How are you looking to grow your leadership in the dbt community?") This is an interesting question. I think I most identify with or am inspired by [Josh Devlin](https://docs.getdbt.com/community/spotlight/josh-devlin.md), who seems to be everywhere on Slack and very knowledgeable/helpful. I also want to know things and pay it forward. Also shout out to [Faith Lierheimer](https://docs.getdbt.com/community/spotlight/faith-lierheimer.md), whose contributions to [#roast-my-graph](https://www.getdbt.com/community/join-the-community/?utm_medium=internal\&utm_source=docs\&utm_campaign=q3-2024_dbt-spotlight_aw\&utm_content=____\&utm_term=all___) always make me laugh and/or weep. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") The [public documentation for dbt](https://docs.getdbt.com/docs/introduction.md) is quite good. You should bookmark it and make it a personal goal to read through it all. There are a lot of cool things that dbt can do. Also I think it's really cool to see newcomers asking questions on Slack/[Discourse](https://discourse.getdbt.com/) and then see those same people answering others' questions. It speaks to the value we all get from dbt that folks want to give back to the community. #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") Did you notice how I avoided starting a sentence with "dbt"? That's because I know the standard is lowercase, but starting a sentence with a lowercase word looks weird to my eyes. 
--- ### Radovan Bacovic ![Radovan Bacovic](/img/community/spotlight/Radovan-Bacovic.png?v=2) he/him Staff Data Engineer, GitLab Location: Novi Sad, Serbia Organizations: Permifrost maintainer [X](https://x.com/svenmurtinson "X") | [LinkedIn](https://www.linkedin.com/in/radovan-ba%C4%87ovi%C4%87-6498603/ "LinkedIn") | [Gitlab Profile](https://gitlab.com/rbacovic "Gitlab Profile") #### About My professional journey and friendship with data started 20 years ago. I've experienced many tools and modalities: from good old RDBMS systems and various programming languages like Java and C# in the early days of my career, through private clouds, to MPP databases and multitier architecture. I also saw the emergence of cloud technology, and experienced the changes that came with it, up to the contemporary approach to data, which includes AI tools and practices. I am still excited to gain new knowledge and solve problems in the data world. I always enjoy using SQL and Python as my primary "weapons" of choice, together with other building blocks for successful data products like Snowflake, dbt, Tableau, Fivetran, Stitch, Monte Carlo, DuckDB and more. I consider myself an experienced data engineer and a wannabe "best bad Conference speaker." #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I have been in the dbt Community for almost 3 years now. 
The biggest impact that dbt has had on my professional journey is that it has given me a trustworthy partner for data transformation, and a single source of truth for all our data modification needs. As a public speaker who travels internationally, I recognized the interest of the data community in dbt around the world and, in response, organised several workshops and talks to help people use dbt. Let's just say that jumping into a great partnership with the dbt Community has been the greatest takeaway for me! #### Which dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#which-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to Which dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") A leader from the dbt Community who I have found to be the most prominent is [Sean McIntyre](https://www.linkedin.com/in/boxysean/) from dbt Labs, as I've had the privilege to collaborate with him many times. We recognized that we had a similar passion and energy; it looks like we are on the same journey. I want to be more involved in the dbt Community, spread the word, and share my journey through tutorials, conference talks, blogs, and meetups. I think I can direct my influence that way. I am also interested in extending dbt's functionality and automating the deployment, testing, and execution of dbt. In other words, I try to find good ways to leverage DevSecOps for dbt to make our development faster and make dbt our trustworthy partner. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? 
What do you hope others can learn from you?") Being part of the community is always a two-way street and always has a positive result for all of us. Sharing great ideas, vision, and energy is the number one thing. On the other hand, I always find quick, best-in-class answers to my questions from the community, and I try to return the favour and be helpful whenever possible. As I said, talks and tutorials are where I think I can benefit the community most. #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") Nothing to add. Still passionate about discovering the tool and can't wait to see what the AI uprising will bring to us! --- ### Real-time event participation ##### Speak at a Meetup[​](#speak-at-a-meetup "Direct link to Speak at a Meetup") ###### Overview[​](#overview "Direct link to Overview") Meetups are all about knowledge sharing; they are a place to connect and learn with your fellow dbt Community members. They usually take place in person, with some happening virtually. The Meetups take place across the globe, and you can check them all out [here](https://www.meetup.com/pro/dbt/). ###### Contribution opportunities[​](#contribution-opportunities "Direct link to Contribution opportunities") * Become a Meetup organizer * Speak at an event * Sponsor an event or provide your office space For all of these opportunities, please fill out an [interest form](https://docs.google.com/forms/d/e/1FAIpQLScdzuz9Ouo1b07BMHveEBJsJ3rJAYuFvbTKep2fXDL0iZTZUg/viewform) and we will get back to you. 
###### Sample contributions[​](#sample-contributions "Direct link to Sample contributions") * Take a look at [the slides](https://docs.google.com/presentation/d/1iqVjzxxRggMnRoI40ku88miDKw795djpKV_v4bbLpPE/edit#slide=id.g553a984de0_0_19) and [watch the video](https://www.youtube.com/watch?v=BF7HH8JDUS0) from Kenny Ning's 2020 Meetup talk on predicting customer conversions with dbt and ML for Better.com. * Dig into [the deck](https://docs.google.com/presentation/d/1wE8NSkFPLFKGQ8fvFUUKoZFVoUhws_FhFip-9mDhoPU/edit#slide=id.p) and [the video](https://www.youtube.com/watch?v=VhH614WVufM) from Bowery Farming's talk on migrating dbt models from Redshift to Snowflake. ###### Get Started[​](#get-started "Direct link to Get Started") * Read [How to Deliver a Fantastic Meetup Talk](https://docs.getdbt.com/community/resources/speaking-at-a-meetup.md). * Find a [Meetup near you](https://www.meetup.com/pro/dbt/), start attending, and let the organizers know you are interested! ##### Speak at dbt Summit[​](#speak-at-dbt-summit "Direct link to Speak at dbt Summit") ###### Overview[​](#overview-1 "Direct link to Overview") [dbt Summit](https://www.getdbt.com/dbt-summit) is the annual analytics engineering conference hosted by dbt Labs. While Meetups are focused on sharing knowledge with a specific local hub of the Community, dbt Summit is the way to share ideas with everyone. Each year we gather together, take stock of what we’ve learned, and pool our best ideas about analytics. ###### Contribution opportunities[​](#contribution-opportunities-1 "Direct link to Contribution opportunities") * Attend dbt Summit: * dbt Summit is the once-a-year gathering for analytics engineers. Whether you choose to join online or at one of our in-person events, attending dbt Summit is the best way to get an immersive experience of what the dbt Community is like. * Speak at dbt Summit! * We’d love to hear what you’ve been working on, thinking about and dreaming up in the analytics engineering space. 
dbt Summit talks can be forward-looking views on the industry, deep dives into particular technical solutions, or personal stories about your journey in data. ###### Sample contributions[​](#sample-contributions-1 "Direct link to Sample contributions") * [Run Your Data Team as a Product Team](https://www.getdbt.com/coalesce-2020/run-your-data-team-as-a-product-team/) * [Tailoring dbt's incremental_strategy to Artsy's data needs](https://www.getdbt.com/coalesce-2021/tailoring-dbts-incremental-strategy-to-artsys-data-needs/) ###### Get started[​](#get-started-1 "Direct link to Get started") * If registrations are open, register on the [dbt Summit website](https://www.getdbt.com/dbt-summit) * Join #dbt-summit-updates on the dbt Community Slack --- ### Ruth Onyekwe Community Award Recipient 2024 ![Ruth Onyekwe](/img/community/spotlight/ruth-onyekwe.jpeg?v=2) she/her Data Analytics Manager, Spaulding Ridge Location: Madrid, Spain [LinkedIn](https://www.linkedin.com/in/ruth-onyekwe/ "LinkedIn") #### About I've been working in the world of Data Analytics for over 5 years and have been part of the dbt community for the last 4. With a background in International Business and Digital Marketing, I experienced first-hand the need for reliable data to fuel business decisions. This inspired a career move into the technology space to be able to work with the tools and the people that were facilitating this process. 
Today I am leading teams to deliver data modernization projects, as well as helping grow the analytics arm of my company on a day-to-day basis. I also have the privilege of organising the [dbt Meetups in Barcelona, Spain](https://www.meetup.com/barcelona-dbt-meetup/), and am excited to continue to grow the community across Europe.

#### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?")

I joined the dbt community in 2021, after meeting dbt Labs reps at a conference. Through partnering with dbt Labs and learning the technology, we (Spaulding Ridge) were able to open a whole new offering in our service catalogue and meet the growing needs of our customers.

#### Which dbt Community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#which-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to Which dbt Community leader do you identify with? How are you looking to grow your leadership in the dbt community?")

I identify with the transparent leaders: those willing to share their learnings, knowledge, and experiences. I want to encourage other dbt enthusiasts to stretch themselves professionally and actively participate in the analytics community.

#### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?")

I've learnt that most of us working in data have experienced the same struggles, be it searching for the best testing frameworks, deciding how to build optimised and scalable models, or searching for answers to non-technical questions like how to best organise teams or how to communicate with business stakeholders and translate their needs. We're all faced with the same dilemmas. And the great thing I've learned from being in the dbt community is that if you're brave enough to share your stories, you'll connect with someone who has already gone through those experiences and can help you reach a solution a lot faster than if you tried to start from scratch.

---

### Safiyy Momen

![Safiyy Momen](/img/community/spotlight/safiyy-momen.jpg?v=2)

he/him

Founder, Aero

Location: Salt Lake City, Utah, USA

[Twitter](https://twitter.com/safiyy_m "Twitter") | [LinkedIn](https://www.linkedin.com/in/safiyy-momen/ "LinkedIn")

#### About

I've been in the dbt community for ~4 years now. My experience is primarily in leading data teams, previously at a healthcare startup where I migrated the stack. The dbt Community was invaluable during that time. More recently I've built a product, Aero, that helps Snowflake users optimize costs with a Native extension. I'm exploring ways to automate analytics engineering workflows. I've spoken at various meetups, including the [New York dbt Meetup](https://www.meetup.com/nyc-dbt-meetup/), on data warehouse cost optimization.
#### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I joined a few years ago. It's helped me level up on best practices while making new friends and connections along the way. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") I remember Claire Carroll early on being a great example for community mentorship. I identify with leaders who bring constructive, opinionated, experience-driven insight to conversations. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") I've learned lessons on scaling a large community and creating a culture of curiosity. Hoping others can learn new ways to be useful and build the courage to get themselves out there—to give talks, share their ideas, and engage in debate! #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") If you're not doing the basics to control costs on Snowflake, you should be! Try out [Aero Cost Optimizer](https://app.snowflake.com/marketplace/listing/GZT1ZYS85V/aero-aero-cost-optimizer). 
---

### Sam Debruyn

Community Award Recipient 2023

![Sam Debruyn](/img/community/spotlight/sam.jpg?v=2)

he/him

Tech Lead Data & Cloud, dataroots

Location: Heist-op-den-Berg, Belgium

[Twitter](https://twitter.com/s_debruyn "Twitter") | [LinkedIn](https://www.linkedin.com/in/samueldebruyn/ "LinkedIn") | [Blog](https://debruyn.dev/ "Blog")

#### About

I have a background of about 10 years in software engineering and moved to data engineering in 2020. Today, I lead dataroots's data & cloud unit on a technical level, allowing me to share knowledge and help multiple teams and customers, while still being hands-on every day. In 2021 and 2022, I did a lot of work on dbt-core and the dbt adapters for Microsoft SQL Server, Azure SQL, Azure Synapse, and now also Microsoft Fabric. I spoke at a few meetups and conferences about dbt and other technologies which I'm passionate about. Sharing knowledge is what drives me, so in 2023 I founded the [Belgium dbt Meetup](https://www.meetup.com/analytics-engineering-belgium/). Every meetup has reached its maximum capacity ever since.

#### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?")

I [joined the dbt Community](https://www.getdbt.com/community/join-the-community/) at the end of 2020, when we had dbt 0.18. At first, I was a bit suspicious. I thought to myself, how could a tool this simple make such a big difference? But after giving it a try, I was convinced: this is what we'll all be using for our data transformations in the future. dbt shines in its simplicity and very low learning curve. Thanks to dbt, a lot more people can become proficient in data analytics. I became a dbt evangelist, both at my job as well as in local and online data communities. I think that data holds the truth. And I think that the more people we can give access to work with data, so that they don't have to depend on others to work with complex tooling, the more we can achieve together.

#### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?")

It's hard to pick one person. There are lots of folks who inspired me along the way. There is Anders Swanson (known as dataders on GitHub), with whom I've spent countless hours discussing how we can bring two things I like together: dbt and the Microsoft SQL products. It's amazing to look back on what we achieved now that dbt Labs and Microsoft are working together to bring dbt support for Fabric and Synapse. There is also Jeremy Cohen (jerco), whose lengthy GitHub discussions bring inspiration on how you can do even more with dbt and what the future might hold. Cor Zuurmond (JCZuurmond) inspired me to start contributing to dbt Core, adapters, and related packages. He did an impressive amount of work by making dbt-spark even better, building pytest integration for dbt, and of course by bringing dbt to the world's most used database: dbt-excel.

#### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?")

dbt doesn't only shine when you're using it, but also under the hood.
dbt's codebase is very approachable and consistently well written, with code that is clean, elegant, and easy to understand. When you're thinking about a potential feature, a bugfix, or building integrations with dbt, just go to [Slack](https://www.getdbt.com/community/join-the-community/) or [GitHub](https://github.com/dbt-labs) and see what you can do to make that happen. You can contribute by discussing potential features, adding documentation, writing code, and more. You don't need to be a Python expert to get started.

#### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?")

The dbt community is one of the biggest data communities globally, but also the most welcoming one. It's amazing how nice, friendly, and approachable everyone is. It's great to be part of this community.

---

### Shinya Takimoto

![Shinya Takimoto](/img/community/spotlight/shinya-takimoto.jpg?v=2)

he/him

Analytics Engineer, 10X, Inc.

Location: Tokyo, Japan

Organizations: Data Build Japan

[Twitter](https://twitter.com/takimo "Twitter") | [LinkedIn](https://www.linkedin.com/in/shinya-takimoto-2793483a/ "LinkedIn") | [Website](https://takimo.tokyo/ "Website")

#### About

I have about 3 years of dbt experience. I used to be in a large organization where the challenge was to create a quality analysis infrastructure for EC data managed by my department with a limited number of staff. It was then that I learned about dbt, and I still remember the shock I felt when I ran `dbt run` for the first time. Currently, I work for a startup called 10X. We provide a system that allows retailers to seamlessly launch online grocery services in an O2O model.
I am also actively involved in dbt Community activities: starting the #local-tokyo channel in dbt Slack, organizing the Tokyo dbt Meetup event, and writing translations of dbt-related articles. In addition, I run a podcast called ModernDataStackRadio.

#### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?")

I joined dbt Slack in late 2021 and led the creation of the #local-tokyo channel soon after, which I still maintain. Community activities gave me the opportunity to meet more dbt users. By sharing the knowledge and insights I had gained in my own company, I was able to connect with people who were struggling with the same issues and difficulties. The shared sense of common data usage challenges has led to networking and recognition of individuals and the companies they work for, which has increased my opportunities to work with many people.

#### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?")

We have been working to create a place in Japan where people who feel the same way about the potential of dbt can interact with each other. Now we have almost 400 members. We would like to support the creation of an environment where more information can be shared through articles and discussions, with a focus on companies and players who are working on advanced projects. In that way, we can increase the number of connections among dbt users.

#### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?")

Members from enterprise companies to start-ups, with various business sizes and a wide variety of business activities, have joined the #local-tokyo channel in dbt Slack. Ideas and knowledge about data modeling and testing therefore differ from one business domain to another, and I believe they provide the local community with a variety of new and surprising insights and perspectives. As someone whose company uses dbt in many production environments, I hope to share a lot of knowledge with the dbt Community.

---

### Stacy Lo

Community Award Recipient 2023

![Stacy Lo](/img/community/spotlight/stacy.jpg?v=2)

she/her

Senior IT Developer, Teamson

Location: Taipei, Taiwan

[LinkedIn](https://www.linkedin.com/in/olycats/ "LinkedIn")

#### About

I began my career as a data analyst, then transitioned to a few different roles in data and software development. Analytics Engineer is the best title to describe my expertise in data. I’ve been in the dbt Community for almost a year. In April, I shared my experience adopting dbt at the [Taipei dbt Meetup](https://www.meetup.com/taipei-dbt-meetup/), which inspired me to write technical articles. In Taiwan, the annual "iThome Iron Man Contest" happens in September, where participants post a technical article written in Mandarin every day for 30 consecutive days. Since no one has ever written about dbt in the contest, I'd like to be the first person, and that’s what I have been busy with for the past couple of months.
#### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?")

I joined dbt Slack in November 2022. It was the time when our company decided to use dbt in our data architecture, so I joined the #local-taipei channel in [dbt Slack](https://www.getdbt.com/community/join-the-community) and introduced myself. To my surprise, I was immediately invited to share my experience at a [Taipei dbt Meetup](https://www.meetup.com/taipei-dbt-meetup/). I had just joined the community, never joined any other meetups, did not know anyone there, and was very new to dbt.

The biggest impact on my career is that I gained a lot of visibility! I got to know a lot of great data people, and now I have a [meetup presentation recorded on YouTube](https://youtu.be/KWfoT1nnexc?t=291), 30 technical articles from the iThome Iron Man Contest, and a feature in the dbt Community Spotlight!

#### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?")

Karen Hsieh is the best! She not only brought me into the dbt Community by way of the #local-taipei channel in dbt Slack, but she also encouraged me to contribute to the community in many ways, without making me feel pressured. With her passion and leading style, Karen successfully built a friendly and diverse group of people in #local-taipei.

I’d also like to recommend [Bruno de Lima](https://docs.getdbt.com/community/spotlight/bruno-de-lima.md)'s LinkedIn posts. His 'dbt Tips of the Day' effectively deliver knowledge in a user-friendly way. In addition, I really enjoyed the dbt exam practice polls. Learning dbt can be a challenge, but Bruno makes it both easy and fun!

#### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?")

I learned that there are many ways to contribute to the community, regardless of our background or skill level. Everyone has something valuable to offer, and we should never be afraid to share. Let's find our own ways to make an impact!

#### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?")

Although the #local-taipei channel in dbt Slack does not have a huge number of members, we still managed to assemble a team of 7 people to join the Iron Man Contest. We produced a total of 200 articles in 30 days on topics around dbt and data. I don’t know how many people will find them useful, but it's definitely a great start to raising awareness of dbt in Taiwan.

---

### Sydney Burns

Community Award Recipient 2023

![Sydney Burns](/img/community/spotlight/sydney.jpg?v=2)

she/her

Senior Analytics Engineer, Webflow

Location: Panama City, FL, USA

[LinkedIn](https://www.linkedin.com/in/sydneyeburns/ "LinkedIn")

#### About

In 2019, I started as an analytics intern at a healthcare tech startup. I learned about dbt in 2020 and [joined the community](https://www.getdbt.com/community/join-the-community/) to self-teach.
The following year, I started using dbt professionally as a consultant, and was able to pick up various parts of the stack and dive into different implementations. That experience empowered me to strike a better balance between "best practices" and what suits a specific team best. I also [spoke at Coalesce 2022](https://coalesce.getdbt.com/blog/babies-and-bathwater-is-kimball-still-relevant), a highlight of my career! Now, I collaborate with other data professionals at Webflow, where I'm focused on enhancing and scaling our data operations. I strive to share the same enthusiasm, support, and knowledge with my team that I've gained from the broader community!

#### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?")

The stack I used in my first data role was outdated and highly manual. Where I live, modern tech companies are few and far between, and I didn't have many in-person resources or enough knowledge to realize that another world was possible at my skill level. I was thrilled to find a pocket of the Internet where similarly frustrated but creative data folks were sharing thoughtful solutions to problems I'd been struggling with!

#### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?")

Christine Berger was my first ever (best ever!) data colleague, and the one who first introduced me to dbt.
There are certain qualities I've always valued in her, that I've found in many others across the community, and strive to cultivate in myself — earnestness, curiosity, creativity, and consistently doing good work with deep care. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") I spent too much time in my early career feeling scared to ask for help because I didn't want others to think I was incompetent. I'd spin my wheels on something for hours before finally asking someone to help me. The community has proven one thing to me time and time again: there are people here who will not only help you, but will be palpably *excited* to help you and share what they know, especially if it's clear you've made efforts to use your resources and try things on your own first. I'm one of those people now! --- ### The dbt Viewpoint Building a Mature Analytics Workflow: The dbt Viewpoint! In 2015-2016, a team of folks at RJMetrics had the opportunity to observe, and participate in, a significant evolution of the analytics ecosystem. The seeds of dbt were conceived in this environment, and the viewpoint below was written to reflect what we had learned and how we believed the world should be different. **dbt is our attempt to address the workflow challenges we observed, and as such, this viewpoint is the most foundational statement of the dbt project's goals.** The remainder of this document is largely unedited from [the original post](https://getdbt.com/blog/building-a-mature-analytics-workflow). 
#### Analytics today[​](#analytics-today "Direct link to Analytics today")

The center of gravity in mature analytics organizations has shifted away from proprietary, end-to-end tools towards more composable solutions made up of:

* data integration scripts and/or tools,
* high-performance analytic databases,
* SQL, R, and/or Python, and
* visualization tools.

This change has unlocked significant possibility, but analytics teams (ours included) have still faced challenges in consistently delivering high-quality, low-latency analytics. We believe that analytics teams have a workflow problem. Too often, analysts operate in isolation, and this creates suboptimal outcomes. Knowledge is siloed. We too often rewrite analyses that a colleague had already written. We fail to grasp the nuances of datasets that we’re less familiar with. We differ in our calculations of a shared metric. We have convinced ourselves after hundreds of conversations that these conditions are by and large the status quo for even sophisticated analytics teams. As a result, organizations suffer from reduced decision speed and reduced decision quality.

Analytics doesn’t have to be this way. In fact, the playbook for solving these problems already exists — on our software engineering teams. The same techniques that software engineering teams use to collaborate on the rapid creation of quality applications can apply to analytics. We believe it’s time to build an open set of tools and processes to make that happen.

#### Analytics is collaborative[​](#analytics-is-collaborative "Direct link to Analytics is collaborative")

We believe a mature analytics team’s techniques and workflow should have the following collaboration features:

##### Version Control[​](#version-control "Direct link to Version Control")

Analytic code — whether it’s Python, SQL, Java, or anything else — should be version controlled. Analysis changes as data and businesses evolve, and it’s important to know who changed what, when.
##### Quality Assurance[​](#quality-assurance "Direct link to Quality Assurance")

Bad data can lead to bad analyses, and bad analyses can lead to bad decisions. Any code that generates data or analysis should be reviewed and tested.

##### Documentation[​](#documentation "Direct link to Documentation")

Your analysis is a software application, and, like every other software application, people are going to have questions about how to use it. Even though it might seem simple, in reality the “Revenue” line you’re showing could mean dozens of things. Your code should come packaged with a basic description of how it should be interpreted, and your team should be able to add to that documentation as additional questions arise.

##### Modularity[​](#modularity "Direct link to Modularity")

If you build a series of analyses about your company’s revenue, and your colleague does as well, you should use the same input data. Copy-paste is not a good approach here — if the definition of the underlying set changes, it will need to be updated everywhere it was used. Instead, think of the schema of a data set as its public interface. Create tables, views, or other data sets that expose a consistent schema and can be modified if business logic changes.

#### Analytic code is an asset[​](#analytic-code-is-an-asset "Direct link to Analytic code is an asset")

The code, processes, and tooling required to produce that analysis are core organizational investments. We believe a mature analytics organization’s workflow should have the following characteristics so as to protect and grow that investment:

##### Environments[​](#environments "Direct link to Environments")

Analytics requires multiple environments. Analysts need the freedom to work without impacting users, while users need service level guarantees so that they can trust the data they rely on to do their jobs.
##### Service level guarantees[​](#service-level-guarantees "Direct link to Service level guarantees")

Analytics teams should stand behind the accuracy of all analysis that has been promoted to production. Errors should be treated with the same level of urgency as bugs in a production product. Any code being retired from production should go through a deprecation process.

##### Design for maintainability[​](#design-for-maintainability "Direct link to Design for maintainability")

Most of the cost involved in software development is in the maintenance phase. Because of this, software engineers write code with an eye towards maintainability. Analytic code, however, is often fragile. Changes in underlying data break most analytic code in ways that are hard to predict and to fix. Analytic code should be written with an eye towards maintainability. Future changes to the schema and data should be anticipated and code should be written to minimize the corresponding impact.

#### Analytics workflows require automated tools[​](#analytics-workflows-require-automated-tools "Direct link to Analytics workflows require automated tools")

Frequently, much of an analytic workflow is manual. Piping data from source to destination, from stage to stage, can eat up a majority of an analyst’s time. Software engineers build extensive tooling to support the manual portions of their jobs. In order to implement the analytics workflows we are suggesting, similar tools will be required. Here’s one example of an automated workflow:

* models and analysis are downloaded from multiple source control repositories,
* code is configured for the given environment,
* code is tested, and
* code is deployed.

Workflows like this should be built to execute with a single command.
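The four steps above can be sketched as a single-command shell script. This is a minimal illustration only: each stage is a placeholder function, and the real commands hinted at in the comments (cloning a repository, rendering settings, running tests, deploying) are assumptions, not part of the original post.

```shell
#!/bin/sh
# Sketch of a single-command analytics workflow.
# Each function is a stand-in for the real step named in its comment.

set -e  # abort the whole workflow if any step fails

download() {
  # e.g. clone models and analysis from source control repositories
  echo "download: fetching models and analysis"
}

configure() {
  # e.g. render environment-specific settings (dev vs. prod)
  echo "configure: applying environment settings"
}

run_tests() {
  # e.g. run data and schema tests before anything ships
  echo "test: running checks"
}

deploy() {
  # e.g. build the models in the target environment
  echo "deploy: building models in target"
}

# The entire workflow behind one command:
download && configure && run_tests && deploy
```

Saved as, say, `workflow.sh`, the whole pipeline runs with one invocation and stops at the first failing stage, which is the behavior the workflow description above calls for.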
---

### The Original dbt-athena Maintainers

Community Award Recipient 2024

![The Original dbt-athena Maintainers](/img/community/spotlight/dbt-athena-groupheadshot.jpg?v=2)

A group of data engineers, from a mix of companies

Location: Europe

Organizations: dbt-athena (since November 2022)

[Jérémy's LinkedIn](https://www.linkedin.com/in/jrmyy/ "Jérémy's LinkedIn") | [Mattia's LinkedIn](https://www.linkedin.com/in/mattia-sappa/ "Mattia's LinkedIn") | [Jesse's LinkedIn](https://www.linkedin.com/in/dobbelaerejesse/ "Jesse's LinkedIn") | [Serhii's LinkedIn](https://www.linkedin.com/in/serhii-dimchenko-075b3061/ "Serhii's LinkedIn") | [Nicola's LinkedIn](https://www.linkedin.com/in/nicolacorda/ "Nicola's LinkedIn")

#### About

The original dbt-athena maintainers are a group of 5 people—Jérémy Guiselin, Mattia, Jesse Dobbelaere, Serhii Dimchenko, and Nicola Corda—who met via dbt Slack in the #db-athena channel, with the aim of making [dbt-athena](https://docs.getdbt.com/docs/local/connect-data-platform/athena-setup) a production-ready adapter. In the first period, Winter 2022 and Spring 2023, we focused on contributing directly to the adapter, adding relevant features like Iceberg and Lake Formation support, and stabilizing some internal behaviour. In a second iteration, our role was triaging, providing community support, and bug fixing.
We encouraged community members to make their first contributions, and helped them to merge their PRs. #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") The dbt community allowed the dbt-athena maintainers to meet each other, and share the common goal of making the dbt-athena adapter production-ready. #### Which dbt Community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#which-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to Which dbt Community leader do you identify with? How are you looking to grow your leadership in the dbt community?") As we grow, we are looking to embody democratic leadership. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") We learned that the power of the community was endless. People started to share best practises, and some of the best practises were incorporated directly in dbt-athena, allowing people to run the adapter smoothly in their production environment. We reached a point where people started to ask advice for their AWS architecture, which we found pretty awesome. --- ### Tips for organizing inclusive events The dbt community is filled with dedicated community leaders who create opportunities for connection, learning and professional development within the analytics community. This guide is a resource to help organizers execute **inclusive digital events**. We understand that organizers, presenters, speakers, etc. 
might not be able to apply these tips to *every* event, but this guide will offer some food for thought. Additionally, this list can grow. If you would like to contribute a tip, please email . #### General logistics[​](#general-logistics "Direct link to General logistics") * Try to choose a date that does not overlap with [holidays](http://www.holidayscalendar.com/months/) or other major events. Don’t forget to check international holidays (if applicable) * Avoid very large national/local events (e.g. the World Cup) #### Marketing[​](#marketing "Direct link to Marketing") * If you are using photos, share images that include community members with a wide range of presentations, including people from underrepresented groups. * Put event accessibility information on your event page (e.g. “closed captioning available for all video resources”) * In the registration process, provide an opportunity for attendees to: * share pronouns * ask questions in advance * request specific needs or other accommodations (interpreting services, braille transcription, dietary restrictions, etc.) * If this is a paid event (e.g. a conference), create a scholarship for attendees that might need financial support * Think about how you are promoting your event — are you reaching underrepresented communities, marginalized populations and people who might not have access to the internet? #### Programming[​](#programming "Direct link to Programming") * Book diverse speakers. Include speakers that represent underrepresented and marginalized populations. * Do research on your speakers. Is there any reason that your speakers would make the audience uncomfortable? * Design an [accessible presentation](https://www.smashingmagazine.com/2018/11/inclusive-design-accessible-presentations/) * If possible, share a recording after the event for community members who are not able to make it and add closed captioning. 
* Ask speakers to introduce themselves before starting their presentation, so that transcription services can capture who is talking. #### Digital platforms for online events[​](#digital-platforms-for-online-events "Direct link to Digital platforms for online events") * At the beginning of the event, take a minute or two to explain the features of the platform that attendees will be using * Offer the option for attendees to dial in by phone and participate without a computer or internet * Explore the accessibility features your platform offers and apply them where necessary (e.g. closed captioning, automatic transcripts, screen reader support, etc.) * Check if your platform is compatible with assistive technology #### Attendee communication[​](#attendee-communication "Direct link to Attendee communication") * Make sure that attendees have any addresses, links, codes, or numbers they need to access the event beforehand * Share the agenda of the event beforehand so that attendees are able to make arrangements (if necessary) * Share contact information with attendees so that they are able to reach out with questions before and after the event. * Ask attendees for feedback in a post-event survey so that you are able to improve future experiences. #### Speaker communication[​](#speaker-communication "Direct link to Speaker communication") * Ask speakers how to pronounce their names before the event * Ask speakers for their pronouns before the event * Suggest that speakers use headphones to ensure clear audio * Ask speakers to use plain language and avoid jargon, slang, idioms, etc. 
--- ### Tyler Rouze ![Tyler Rouze](/img/community/spotlight/tyler-rouze.jpg?v=2) he/him Managing Consultant, Analytics8 Location: Austin, TX, USA Organizations: Chicago dbt Meetup [Website](https://tylerrouze.com "Website") | [LinkedIn](https://www.linkedin.com/in/tylerrouze "LinkedIn") #### About My journey in data started all the way back in college, where I studied Industrial Engineering. One of the core topics you learn in this program is mathematical optimization, where we often use data files as inputs to model constraints on these kinds of problems! Since then, I've been a data analyst on both small and large teams, and more recently a consultant shepherding our firm's dbt-based projects towards success. Since joining the dbt Community, I've spoken at the [Chicago dbt Meetup](https://www.meetup.com/chicago-dbt-meetup/), [Coalesce](https://coalesce.getdbt.com/speakers/tyler-rouze) (a milestone for my career!), dbt's Data Leaders Series, and even made open source contributions to `dbt-core`! It has been the joy of my career to be a part of this vibrant community. #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I joined the dbt community a few years back, and if you've seen me speak you'll know that discovering dbt was a very "lightbulb on" moment for me as a data analyst in a past life. Making the data transformation process more visible and accessible to people from less technical backgrounds made a lot of sense to me; after all, data analysts understand best how to derive value from your organization's data! 
While I do still get to be hands-on with dbt, I now spend more of my time thinking about architecting dbt implementations and building analytics teams around it. The beautiful thing about this industry is that we've made great strides in solving our technical problems; the biggest challenges we now face are more socio-technical and process-based, which is just as interesting to me! #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") Leadership is a weird thing because you don't really realize you are a leader until you've been performing "leadership activities" for a while, so I'm honestly not sure what kind of leader I am yet. I made a commitment to myself a while back to share and talk about the things I was working on publicly in the event that someone else might find it useful, and hopefully had ideas of their own to offer me. The benefit of this is that it opens up a dialogue for our industry to evolve our best practices over time. If you've seen me speak, it's not uncommon for me to posit ideas during the talks I've given that aren't fully developed. My hope is that through a little bit of vulnerability, others will feel that there's a place for them to share what they're working on and thinking about too, regardless of how polished it is! #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? 
What do you hope others can learn from you?") In keeping with the kind of leadership I try to exemplify, I've found that the more I give to the community, the more I get in return. Contributing to the dbt Community has given me the ability to have numerous conversations with other practitioners about the problems they face in their roles and how they've solved them, which in turn makes me better at my job. There is so much we can learn from others, but someone has to start the conversation! --- ### Writing contributions ##### Contribute to the product documentation[​](#contribute-to-the-product-documentation "Direct link to Contribute to the product documentation") ###### Overview[​](#overview "Direct link to Overview") The [dbt Product Documentation](https://docs.getdbt.com/docs/introduction.md) sits at the heart of how people learn to use and engage with dbt. From explaining dbt to newcomers to providing references for advanced functionality and APIs, the product docs are a frequent resource for *every* dbt Developer. ###### Contribution opportunities[​](#contribution-opportunities "Direct link to Contribution opportunities") We strive to create pathways that inspire you to learn more about dbt and enable you to continuously improve the way you solve data problems. We always appreciate the vigilance of the Community in helping us accurately represent the functionality and capabilities of dbt. You can participate by: * [Opening an issue](https://github.com/dbt-labs/docs.getdbt.com/issues/new/choose) when you see something that can be fixed, whether it’s large or small. * Creating a PR when you see something you want to fix, or to address an existing issue. You can do this by clicking **Edit this page** at the bottom of any page on [docs.getdbt.com](http://docs.getdbt.com). 
###### Sample contributions[​](#sample-contributions "Direct link to Sample contributions") We appreciate these contributions because they contain context in the original post (OP) that helps us understand their relevance. They also add value to the docs, even in small ways! * Larger contribution: * Smaller contribution: ###### Get started[​](#get-started "Direct link to Get started") * You can contribute to [docs.getdbt.com](http://docs.getdbt.com) by looking at our repository’s [README](https://github.com/dbt-labs/docs.getdbt.com#readme) or clicking **Edit this page** at the bottom of most pages at docs.getdbt.com. * Read the [Contributor Expectations](https://docs.getdbt.com/community/resources/contributor-expectations.md). * Find an issue labeled “[good first issue](https://github.com/dbt-labs/docs.getdbt.com/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22+).” * Need help? Visit #community-writers on the Community Slack or mention `@dbt-labs/product-docs` in a pull request or issue comment. ##### Write a Developer Blog Post[​](#write-a-developer-blog-post "Direct link to Write a Developer Blog Post") ###### Overview[​](#overview-1 "Direct link to Overview") The [dbt Developer Blog](https://docs.getdbt.com/blog) is the place for analytics practitioners to talk about *what it’s like to do data work right now.* This is the place to share tips and tricks, hard-won knowledge, and stories from the trenches with the dbt Community. ###### Contribution opportunities[​](#contribution-opportunities-1 "Direct link to Contribution opportunities") We want to hear your stories! Did you recently solve a cool problem, discover an interesting bug, or lead an organizational change? Come tell the story on the dbt Developer Blog. 
###### Sample contributions[​](#sample-contributions-1 "Direct link to Sample contributions") * [Founding an Analytics Engineering Team From Scratch](https://docs.getdbt.com/blog/founding-an-analytics-engineering-team-smartsheet#our-own-take-on-data-mesh) * [Tackling the Complexity of Joining Snapshots](https://docs.getdbt.com/blog/joining-snapshot-complexity) ###### Get started[​](#get-started-1 "Direct link to Get started") * [Read the contribution guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/developer-blog.md) * [Open up an issue with your idea for a post](https://github.com/dbt-labs/docs.getdbt.com/issues/new?assignees=\&labels=content%2Cdeveloper+blog\&template=contribute-to-developer-blog.yml) ###### Need help?[​](#need-help "Direct link to Need help?") Visit #community-writers in the dbt Community Slack --- ### Yasuhisa Yoshida ![Yasuhisa Yoshida](/img/community/spotlight/yasuhisa-yoshida.jpg?v=2) he/him Data Engineer, 10X, Inc Location: Kyoto, Japan Organizations: datatech-jp [Twitter](https://twitter.com/syou6162 "Twitter") | [LinkedIn](https://jp.linkedin.com/in/yasuhisa-yoshida-077b7b43 "LinkedIn") | [Personal Website](https://www.yasuhisay.info "Personal Website") #### About I currently work as a data engineer at a startup called [10X.](https://10x.co.jp/) Specifically, I work with BigQuery to provide data marts for business users. 
Before using dbt, the queries for creating data marts were overly complex and lengthy, resulting in low data quality. With dbt, we have improved our process by breaking down queries into manageable parts, visualizing data lineage, and enabling easy creation of tests. I am actively involved in the dbt community and share our insights on using dbt at #local-tokyo. Specifically, I shared our experiences with efficient metadata management using dbt-osmosis, and visualizing data quality using elementary. #### When did you join the dbt community and in what way has it impacted your career?[​](#when-did-you-join-the-dbt-community-and-in-what-way-has-it-impacted-your-career "Direct link to When did you join the dbt community and in what way has it impacted your career?") I joined dbt's Slack at the end of 2021. I often follow dbt's Slack because it offers insights into dbt that I could not have gained on my own. I also enjoy presenting my company's findings at the [Tokyo dbt Meetup](https://www.meetup.com/ja-JP/tokyo-dbt-meetup/) and receiving feedback from dbt community members outside my company. Participating in the dbt community frequently sparks ideas on how to enhance my company's data infrastructure. #### What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?[​](#what-dbt-community-leader-do-you-identify-with-how-are-you-looking-to-grow-your-leadership-in-the-dbt-community "Direct link to What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community?") Although dbt is already a widely used tool, it is still young and actively being developed. Therefore, there may occasionally be times when there are small bugs, or you might not understand how to use it. In such cases, you can ask questions on the dbt Slack or submit an issue on GitHub. 
These actions can lead to a direct solution to the problem and often help other community members who are struggling with similar issues—you are not the only one! I have been helped many times by seeing these types of questions from others on dbt Slack and GitHub issues. Of course, answering questions and submitting pull requests on GitHub is also a great way to contribute to the community. However, starting with small contributions is perfectly fine. I would like to continue doing these kinds of activities to make the community more active. #### What have you learned from community members? What do you hope others can learn from you?[​](#what-have-you-learned-from-community-members-what-do-you-hope-others-can-learn-from-you "Direct link to What have you learned from community members? What do you hope others can learn from you?") dbt is widely used by various types of businesses, from startups needing fast development to enterprises requiring robust data quality. Different business phases and industries have diverse data requirements. In the dbt community, members actively share best practices and lessons from failures, helping you adapt dbt to your company's needs. Having observed many such use cases, we have learned to deliver value to business users through dimensional modeling and tools like [AutomateDV](https://automate-dv.readthedocs.io/en/latest/). I am confident that you can find use cases on dbt Slack that match your company’s business needs, and I encourage you to share the practices and insights you gain from using these tools with the community. #### Anything else interesting you want to tell us?[​](#anything-else-interesting-you-want-to-tell-us "Direct link to Anything else interesting you want to tell us?") dbt and the surrounding ecosystem are powerful allies for data engineers and analytics engineers. Let's work together as dbt community members to further enhance these tools and build a better world! 
--- ## Docs ### 2022 dbt platform release notes Archived release notes for dbt from 2022 #### December 2022[​](#december-2022 "Direct link to December 2022") ##### Threads default value changed to 4[​](#threads-default-value-changed-to-4 "Direct link to Threads default value changed to 4") Threads help parallelize node execution in the dbt directed acyclic graph [(DAG)](https://docs.getdbt.com/terms/dag). Previously, the thread value defaulted to 1, which could increase the runtime of your project. To help reduce runtimes, the default value for threads in user profiles is now set to 4. You can supply a custom thread count if you'd prefer more or less parallelization. For more information, read [Understanding threads](https://docs.getdbt.com/docs/running-a-dbt-project/using-threads.md). ##### Creating a new job no longer triggers a run by default[​](#creating-a-new-job-no-longer-triggers-a-run-by-default "Direct link to Creating a new job no longer triggers a run by default") To help save compute time, new jobs are no longer triggered to run by default. When you create a new job in dbt, you can trigger the job to run by selecting **Run on schedule** and completing the desired schedule and timing information. For more information, refer to [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md). ##### Private packages must be cloned using access tokens provided by environment variables[​](#private-packages-must-be-cloned-using-access-tokens-provided-by-environment-variables "Direct link to Private packages must be cloned using access tokens provided by environment variables") The supported method for cloning private GitHub packages is the [git token method](https://docs.getdbt.com/docs/build/packages.md#git-token-method), where an appropriate access token is passed into the package repository URL with an environment variable. 
A small number of people have been able to clone private packages using dbt's native GitHub application without explicitly providing an access token. This functionality is being deprecated as it’s limited in flexibility. If you have been using a package hosted in a private repository on GitHub, you must start passing an access token into the URL. An example of passing an access token: packages.yml ```yaml packages: - git: "https://{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" ``` #### November 2022[​](#november-2022 "Direct link to November 2022") ##### The dbt Cloud + Databricks experience is getting even better[​](#the-dbt-cloud--databricks-experience-is-getting-even-better "Direct link to The dbt Cloud + Databricks experience is getting even better") dbt is the easiest and most reliable way to develop and deploy a dbt project. It helps remove complexity while also giving you more features and better performance. A simpler Databricks connection experience with support for Databricks’ Unity Catalog and better modeling defaults is now available for your use. For all the Databricks customers already using dbt with the dbt-spark adapter, you can now [migrate](https://docs.getdbt.com/guides/migrate-from-spark-to-databricks.md) your connection to the [dbt-databricks adapter](https://docs.getdbt.com/docs/local/connect-data-platform/databricks-setup.md) to get the benefits. [Databricks](https://www.databricks.com/blog/2022/11/17/introducing-native-high-performance-integration-dbt-cloud.html) is committed to maintaining and improving the adapter, so this integrated experience will continue to provide the best of dbt and Databricks. Check out our [live blog post](https://www.getdbt.com/blog/dbt-cloud-databricks-experience/) to learn more. 
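The `packages.yml` entry above resolves its environment variable at runtime via `env_var()`. As a minimal sketch of supplying that variable from a local shell (the token value is a placeholder; `DBT_ENV_SECRET_GIT_CREDENTIAL` is the name used in the example above):

```shell
# Placeholder token for illustration only; store the real token as a secret,
# never in version control. dbt masks environment variables prefixed with
# DBT_ENV_SECRET_ in its logs.
export DBT_ENV_SECRET_GIT_CREDENTIAL="ghp_example_token"

# env_var() interpolates the token into the package URL, producing:
echo "https://${DBT_ENV_SECRET_GIT_CREDENTIAL}@github.com/dbt-labs/awesome_repo.git"

# With the variable set in the same environment, `dbt deps` can clone
# the private package:
# dbt deps
```

In the dbt platform, the equivalent is defining the variable in your project's environment variable settings so the scheduler and IDE can resolve it.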
##### Extra features in new and refreshed IDE[​](#extra-features-in-new-and-refreshed-ide "Direct link to Extra features in new and refreshed IDE") The refreshed version of the Studio IDE launched four brand-new features, making it easier and faster for you to develop in the Studio IDE. The new features are: * **Formatting** — Format your dbt SQL files to a single code style with a click of a button. This uses the tool [sqlfmt](https://github.com/tconbeer/sqlfmt). * **Git diff view** — Highlights the changes in a file before opening a pull request. * **dbt autocomplete** — There are four new types of autocomplete features to help you develop faster: * Use `ref` to autocomplete your model names * Use `source` to autocomplete your source name + table name * Use `macro` to autocomplete your arguments * Use `env var` to autocomplete environment variables * **Dark mode** — Use dark mode in the Studio IDE for low-light environments. Read more about all the [Cloud Studio IDE features](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#cloud-ide-features). ##### Classic IDE deprecation notice[​](#classic-ide-deprecation-notice "Direct link to Classic IDE deprecation notice") In December 2022, dbt Labs will deprecate the classic Studio IDE. The [new and refreshed Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) will be available for *all* dbt users. You will no longer be able to access the classic Studio IDE, and dbt Labs might introduce changes that break it. With deprecation, dbt Labs will only support the refreshed version of the Studio IDE. Virtual Private Cloud (VPC) customers with questions about when this change will affect their account can contact their account team or support contact for assistance. 
#### October 2022[​](#october-2022 "Direct link to October 2022") ##### Announcing dbt Cloud’s native integration with Azure DevOps[​](#announcing-dbt-clouds-native-integration-with-azure-devops "Direct link to Announcing dbt Cloud’s native integration with Azure DevOps") dbt now offers a native integration with Azure DevOps for dbt customers on the enterprise plan. We built this integration to remove friction, increase security, and unlock net new product experiences for our customers. [Setting up the Azure DevOps integration](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) in dbt provides: * easy dbt project setup, * an improved security posture, * repo permissions enforcement in Studio IDE, and * dbt Slim CI. Check out our [live blog post](https://www.getdbt.com/blog/dbt-cloud-integration-azure-devops/) to learn more! ##### Introducing a snappier, improved, and powerful Cloud IDE[​](#introducing-a-snappier-improved-and-powerful-cloud-ide "Direct link to Introducing a snappier, improved, and powerful Cloud IDE") The new version of the Cloud Studio IDE makes it easy for you to build data models without thinking much about environment setup and configuration. The new Cloud Studio IDE includes performance upgrades, ergonomics improvements, and some delightful enhancements! Some of the improvements include: * Improved Cloud Studio IDE startup time (starting the Studio IDE), interaction time (saving and committing), and reliability. * Better organization and navigation with features like drag and drop of files, breadcrumbs, a build button drop-down, and more. * New features like file auto-formatting, model name auto-completion, and git diff view to see your changes before making a pull request. 
Read more about the new [Cloud Studio IDE features](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#cloud-ide-features) and check out the [New and improved Cloud Studio IDE](https://www.getdbt.com/blog/new-improved-cloud-ide/) blog for more info! #### September 2022[​](#september-2022 "Direct link to September 2022") ##### List Steps API endpoint deprecation warning[​](#list-steps-api-endpoint-deprecation-warning "Direct link to List Steps API endpoint deprecation warning") On October 14, 2022, dbt Labs is deprecating the List Steps API endpoint. From that date, any GET requests to this endpoint will fail. Please prepare to stop using the List Steps endpoint as soon as possible. dbt Labs will continue to maintain the [Retrieve Run](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/Retrieve%20Run) endpoint, which is a viable alternative depending on the use case. You can fetch run steps for an individual run with a GET request to the following URL, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan: `https://YOUR_ACCESS_URL/api/v2/accounts/{accountId}/runs/{runId}/?include_related=["run_steps"]` ##### Query the previous three months of data using the metadata API[​](#query-the-previous-three-months-of-data-using-the-metadata-api "Direct link to Query the previous three months of data using the metadata API") To make the metadata API more scalable and improve its latency, we’ve implemented data retention limits. The metadata API can now query data from the previous three months. For example, if today were March 1, you could query data back to January 1. 
For more information, see [Metadata API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md). #### August 2022[​](#august-2022 "Direct link to August 2022") ##### Support for cross-database sources on Redshift RA3 instances[​](#support-for-cross-database-sources-on-redshift-ra3-instances "Direct link to Support for cross-database sources on Redshift RA3 instances") Cross-database queries for RA3 instances are now supported by dbt projects using a Redshift connection. With cross-database queries, you can seamlessly query data from any database in the cluster, regardless of which database you are connected to with dbt. The [connection configuration](https://docs.getdbt.com/docs/local/connect-data-platform/redshift-setup.md) `ra3_node` now defaults to `true`. This allows users to: * benefit from the full capabilities of RA3 nodes, * generate appropriate dbt documentation. #### July 2022[​](#july-2022 "Direct link to July 2022") ##### Large DAG feature[​](#large-dag-feature "Direct link to Large DAG feature") You can now select **Render Lineage** to visualize large DAGs. Large DAGs can take a long time (10 or more seconds, if not minutes) to render and can cause browsers to crash. The new button prevents large DAGs from rendering automatically. Instead, you can select **Render Lineage** to load the visualization. This should affect about 15% of the DAGs. #### May 2022[​](#may-2022 "Direct link to May 2022") ##### Refresh expired access tokens in the IDE when using GitLab[​](#refresh-expired-access-tokens-in-the-ide-when-using-gitlab "Direct link to Refresh expired access tokens in the IDE when using GitLab") On May 22, GitLab changed how they treat [OAuth access tokens that don't expire](https://docs.gitlab.com/ee/update/deprecations.html#oauth-tokens-without-expiration). We updated our Studio IDE logic to handle OAuth token expiration more gracefully. 
Now, the first time your token expires after 2 hours of consecutive Studio IDE usage, you will have to re-authenticate in GitLab to refresh your expired OAuth access token. We will handle subsequent refreshes for you if you provide the authorization when you re-authenticate. This additional security layer in the Studio IDE is available only on the dbt enterprise plan. #### April 2022[​](#april-2022 "Direct link to April 2022") ##### Audit log[​](#audit-log "Direct link to Audit log") To review actions performed by people in your organization, dbt provides logs of audited user and system events. The dbt audit log lists events triggered in your organization within the last 90 days. The audit log includes details such as who performed the action, what the action was, and when it was performed. For more details, review [the audit log for dbt Enterprise](https://docs.getdbt.com/docs/cloud/manage-access/audit-log.md) documentation. ##### Credentials no longer accidentally wiped when editing an environment[​](#credentials-no-longer-accidentally-wiped-when-editing-an-environment "Direct link to Credentials no longer accidentally wiped when editing an environment") We resolved a bug where updating unencrypted fields (for example, threads or schema name) in an environment setting would cause secret fields (for example, password, keypair, or credential details) to be deleted from that environment. Now users can freely update environment settings without fear of unintentionally wiping credentials. ##### Email verification[​](#email-verification "Direct link to Email verification") To enhance the security of user creation, dbt users created using SAML Just-in-Time (JIT) will now confirm their identity via email to activate their account. Using email to confirm identity ensures the user still has access to the same email address they use to log in via SAML. 
##### Scheduler performance improvements[​](#scheduler-performance-improvements "Direct link to Scheduler performance improvements") We rolled out our new distributed scheduler, which has much faster prep times, especially at the top of the hour. We share more about our work and improvements in our [product news blog post](https://www.getdbt.com/blog/a-good-problem-to-have/). #### March 2022[​](#march-2022 "Direct link to March 2022") ##### Spotty internet issues no longer cause a session time out message[​](#spotty-internet-issues-no-longer-cause-a-session-time-out-message "Direct link to Spotty internet issues no longer cause a session time out message") We fixed an issue where a spotty internet connection could cause the “Studio IDE session timed out” message to appear unexpectedly. People using a VPN were most likely to see this issue. We updated the health check logic so it now excludes client-side connectivity issues from the Studio IDE session check. If you lose your internet connection, we no longer update the health-check state, so losing internet connectivity will no longer cause this unexpected message. ##### Dividing queue time into waiting and prep time[​](#dividing-queue-time-into-waiting-and-prep-time "Direct link to Dividing queue time into waiting and prep time") dbt now shows "waiting time" and "prep time" for a run, which used to be expressed in aggregate as "queue time". Waiting time captures the time dbt waits to run your job if there isn't an available run slot or if a previous run of the same job is still running. Prep time represents the time it takes dbt to ready your job to run in your cloud data warehouse. 
[![New prep time and waiting time](/img/docs/dbt-cloud/v1.1.46releasenotes_img1.png?v=2 "New prep time and waiting time")](#)New prep time and waiting time #### February 2022[​](#february-2022 "Direct link to February 2022") ##### DAG updates and performance improvements[​](#dag-updates-and-performance-improvements "Direct link to DAG updates and performance improvements") Love the DAG in the Studio IDE as much as we do? Now when you click on a node in the DAG, the model or config file will open as a new tab in the Studio IDE, so you can directly view or edit the code. We'll continue to ship better developer ergonomics throughout the year. ###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements "Direct link to Performance improvements and enhancements") * Updated recommended dbt commands in the Studio IDE to include dbt Core v1.0 commands, such as `build` and the `--select` argument. ##### Service tokens and bug fixes[​](#service-tokens-and-bug-fixes "Direct link to Service tokens and bug fixes") Service tokens can now be assigned granular permissions to enforce least privilege access. If you're on Enterprise, you can assign any enterprise permission set to newly issued service tokens. If you're on Teams, you can assign the Job Admin permission set to newly issued service tokens. We highly recommend you re-issue service tokens with these new permissions to increase your security posture! See the docs [here](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md#permissions-for-service-account-tokens). ###### New products and features[​](#new-products-and-features "Direct link to New products and features") * We are joining the [GitHub secret scanning partner program](https://docs.github.com/en/developers/overview/secret-scanning-partner-program) to better secure your token against accidental public exposure and potential fraudulent usage. 
###### Bug fixes[​](#bug-fixes "Direct link to Bug fixes")

* Credentials are no longer accidentally deleted when a user updates an environment setting.

#### January 2022[​](#january-2022 "Direct link to January 2022")

##### Autocomplete snippets for SQL and YAML files in IDE[​](#autocomplete-snippets-for-sql-and-yaml-files-in-ide "Direct link to Autocomplete snippets for SQL and YAML files in IDE")

Some noteworthy improvements include autocomplete snippets for SQL and YAML files in the IDE, which are available for use now! We also added a [new metric layer page](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) to docs.getdbt.com to help you begin thinking about the metrics layer in dbt Cloud.

###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements-1 "Direct link to Performance improvements and enhancements")

* Branch names now default to "main" instead of "master" in new managed and unmanaged Git repositories.
* Updated IDE autocomplete snippets.

##### Model timing for Multi-tenant Team and Enterprise accounts[​](#model-timing-for-multi-tenant-team-and-enterprise-accounts "Direct link to Model timing for Multi-tenant Team and Enterprise accounts")

We started the new year with a gift! Multi-tenant Team and Enterprise accounts can now use the new [Model timing](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#model-timing) tab in dbt. You can use this tab to further explore long-running models to see if they need refactoring or rescheduling.

###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements-2 "Direct link to Performance improvements and enhancements")

* We added client-side naming validation for file or folder creation.
---

### 2023 dbt platform release notes

Archived release notes for dbt from 2023

#### December 2023[​](#december-2023 "Direct link to December 2023")

* Semantic Layer updates

The dbt Labs team continues to work on adding new features, fixing bugs, and increasing reliability for the dbt Semantic Layer. The following list explains the updates and fixes for December 2023 in more detail.

#### Bug fixes[​](#bug-fixes "Direct link to Bug fixes")

* Tableau integration — The dbt Semantic Layer integration with Tableau now supports queries that resolve to a "NOT IN" clause. This applies to using "exclude" in the filtering user interface. Previously, this wasn't supported.
* `BIGINT` support — The dbt Semantic Layer now supports `BIGINT` values with precision greater than 18. Previously, it would return an error.
* Memory leak — Fixed a memory leak in the JDBC API that previously led to intermittent errors when querying it.
* Data conversion support — Added support for converting various Redshift and Postgres-specific data types. Previously, the driver would throw an error when encountering columns with those types.

#### Improvements[​](#improvements "Direct link to Improvements")

* Deprecation — We deprecated dbt Metrics and the legacy dbt Semantic Layer, both supported on dbt version 1.5 or lower. This change came into effect on December 15th, 2023.
* Improved dbt converter tool — The [dbt converter tool](https://github.com/dbt-labs/dbt-converter) can now help automate some of the work in converting from LookML (Looker's modeling language) for those who are migrating. Previously, this wasn't available.
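One of the fixes above concerns `BIGINT` values with precision greater than 18. As a quick illustration of why that threshold matters (this is illustrative Python, not dbt code): a signed 64-bit integer holds every 18-digit value, but not every larger one, so drivers that map `BIGINT` to int64 can overflow.

```python
# Illustration only: why BIGINT precision greater than 18 needs special handling.
# Any 18-digit value fits in a signed 64-bit integer, but larger values may not,
# which is why a driver mapping BIGINT to int64 would previously error.
INT64_MAX = 2**63 - 1          # 9223372036854775807

largest_18_digit = 10**18 - 1  # 999999999999999999 — always fits in int64
smallest_20_digit = 10**19     # 20 digits — never fits in int64

print(largest_18_digit <= INT64_MAX)   # True
print(smallest_20_digit > INT64_MAX)   # True
```
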
* Extended attributes

The extended attributes feature in dbt Cloud is now GA! It allows for an environment-level override of any YAML attribute that a dbt adapter accepts in its `profiles.yml`. You can provide a YAML snippet to add or replace any [profile](https://docs.getdbt.com/docs/local/profiles.yml.md) value. To learn more, refer to [Extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes). The **Extended Attributes** text box is available from your environment's settings page:

[![Example of the Extended attributes text box](/img/docs/dbt-cloud/using-dbt-cloud/extended-attributes.png?v=2 "Example of the Extended attributes text box")](#)Example of the Extended attributes text box

* Legacy semantic layer

dbt Labs has deprecated dbt Metrics and the legacy dbt Semantic Layer, both supported on dbt version 1.5 or lower. This change starts on December 15th, 2023. This deprecation means dbt Metrics and the legacy Semantic Layer are no longer supported. We also removed the feature from the dbt Cloud user interface and documentation site.

##### Why this change?[​](#why-this-change "Direct link to Why this change?")

The [re-released dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), powered by MetricFlow, offers enhanced flexibility, performance, and user experience, marking a significant advancement for the dbt community.

##### Key changes and impact[​](#key-changes-and-impact "Direct link to Key changes and impact")

* **Deprecation date** — The legacy Semantic Layer and dbt Metrics will be officially deprecated on December 15th, 2023.
* **Replacement** — [MetricFlow](https://docs.getdbt.com/docs/build/build-metrics-intro.md) replaces dbt Metrics for defining semantic logic. The `dbt_metrics` package will no longer be supported post-deprecation.
* **New feature** — Exports replaces the functionality of materializing data with `metrics.calculate` and will be available in dbt Cloud in December or January.

##### Breaking changes and recommendations[​](#breaking-changes-and-recommendations "Direct link to Breaking changes and recommendations")

* For users on dbt version 1.5 and lower with dbt Metrics and the Snowflake proxy:
  * **Impact**: Post-deprecation, queries using the proxy *will not* run.
* For users on dbt version 1.5 and lower using dbt Metrics without the Snowflake proxy:
  * **Impact**: No immediate disruption, but the package will not receive updates or support after deprecation.
  * **Recommendation**: Plan migration to the re-released Semantic Layer for compatibility with dbt version 1.6 and higher.

##### Engage and support[​](#engage-and-support "Direct link to Engage and support")

* Feedback and community support — Engage and share feedback with the dbt Labs team in the dbt Community Slack using channels like [#dbt-cloud-semantic-layer](https://getdbt.slack.com/archives/C046L0VTVR6) and [#dbt-metricflow](https://getdbt.slack.com/archives/C02CCBBBR1D). Or reach out to your dbt Cloud account representative.
* Resources for upgrading — Refer to this additional info and these resources to help you upgrade your dbt version:
  * [Upgrade version in dbt Cloud](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md)
  * [Version migration guides](https://docs.getdbt.com/docs/dbt-versions/core-upgrade.md)

#### November 2023[​](#november-2023 "Direct link to November 2023")

* Job notifications

There are new quality-of-life improvements in dbt Cloud for email and Slack notifications about your jobs:

* You can add external email addresses and send job notifications to them. External emails can be:
  * Addresses that are outside of your dbt Cloud account
  * Third-party integration addresses for configuring notifications to services like Microsoft Teams or PagerDuty
* You can configure notifications for multiple Slack channels. Previously, you could only configure one Slack channel.
* Any account admin can now edit Slack notifications, not just the person who created them.

To learn more, check out [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md).

* Repo caching

Now available for dbt Cloud Enterprise plans is a new option to enable Git repository caching for your job runs. When enabled, dbt Cloud caches your dbt project's Git repository and uses the cached copy instead if there's an outage with the Git provider. This feature improves the reliability and stability of your job runs. To learn more, refer to [Repo caching](https://docs.getdbt.com/docs/cloud/account-settings.md#git-repository-caching).
[![Example of the Repository caching option](/img/docs/deploy/account-settings-repository-caching.png?v=2 "Example of the Repository caching option")](#)Example of the Repository caching option

#### October 2023[​](#october-2023 "Direct link to October 2023")

* dbt Cloud APIs

Beginning December 1, 2023, the [Administrative API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) v2 and v3 will expect you to limit all "list" or `GET` API methods to 100 results per API request. This limit enhances the efficiency and stability of our services. If you need to handle more than 100 results, use the `limit` and `offset` query parameters to paginate those results; otherwise, you will receive an error.

This maximum limit applies to [multi-tenant instances](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) only, and *does not* apply to single-tenant instances.

Refer to the [API v3 Pagination](https://docs.getdbt.com/dbt-cloud/api-v3#/) or [API v2 Pagination](https://docs.getdbt.com/dbt-cloud/api-v2#/) sections for more information on how to paginate your API responses.

* dbt CLI

We are excited to announce that the dbt CLI, the **unified command line for dbt**, is available in public preview. It's a local development experience, powered by dbt Cloud. It's easy to get started: `pip3 install dbt` or `brew install dbt` and you're ready to go.

We will continue to invest in the dbt Cloud IDE as the easiest and most accessible way to get started using dbt, especially for data analysts who have never developed software using the command line before. We will keep improving the speed, stability, and feature richness of the IDE, as we have been [all year long](https://www.getdbt.com/blog/improvements-to-the-dbt-cloud-ide/).

We also know that many people developing in dbt have a preference for local development, where they can use their favorite terminal, text editor, keybindings, color scheme, and so on.
This includes people with data engineering backgrounds, as well as those analytics engineers who started writing code in the dbt Cloud IDE and have expanded their skills. The new dbt CLI offers the best of both worlds, including:

* The power of developing against the dbt Cloud platform
* The flexibility of your own local setup

Run whichever community-developed plugins, pre-commit hooks, or other arbitrary scripts you like. Some of the unique capabilities of this dbt CLI include:

* Automatic deferral of build artifacts to your Cloud project's production environment
* Secure credential storage in the dbt Cloud platform
* Support for dbt Mesh ([cross-project `ref`](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md))
* Development workflow for the dbt Semantic Layer
* Speedier, lower-cost builds

Refer to [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) to learn more.

* Custom branch fix

If you don't set a [custom branch](https://docs.getdbt.com/docs/dbt-cloud-environments.md#custom-branch-behavior) for your dbt Cloud environment, it now defaults to the default branch of your Git repository (for example, `main`). Previously, [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) would run for pull requests (PRs) that were opened against *any branch* or updated with new commits if the **Custom Branch** option wasn't set.

#### Azure DevOps[​](#azure-devops "Direct link to Azure DevOps")

Your Git pull requests (PRs) might not trigger against your default branch if you're using Azure DevOps and the default branch isn't `main` or `master`. To resolve this, [set up a custom branch](https://docs.getdbt.com/faqs/Environments/custom-branch-settings.md) with the branch you want to target.

* dbt deps auto install

The dbt Cloud IDE and dbt CLI now automatically install dependencies (`dbt deps`) when your environment starts or when necessary. Previously, they would prompt you to run `dbt deps` during initialization.
This improved workflow is available to all multi-tenant dbt Cloud users (single-tenant support is coming next week) and applies to all dbt versions. However, you should still run the `dbt deps` command in these situations:

* When you make changes to the `packages.yml` or `dependencies.yml` file during a session
* When you update the package version in the `packages.yml` or `dependencies.yml` file
* If you edit the `dependencies.yml` file and the number of packages remains the same (note that this is a known bug dbt Labs will fix in the future)

* Native retry support

Previously in dbt Cloud, you could only rerun an errored job from the start, but now you can also rerun it from its point of failure. You can view which job failed to complete successfully, which command failed in the run step, and choose how to rerun it. To learn more, refer to [Retry jobs](https://docs.getdbt.com/docs/deploy/retry-jobs.md).

[![Example of the Rerun options in dbt Cloud](/img/docs/deploy/native-retry.gif?v=2 "Example of the Rerun options in dbt Cloud")](#)Example of the Rerun options in dbt Cloud

* Product docs updates

Hello from the dbt Docs team: @mirnawong1, @matthewshaver, @nghi-ly, and @runleonarun! First, we'd like to thank the 15 new community contributors to docs.getdbt.com. We merged [107 PRs](https://github.com/dbt-labs/docs.getdbt.com/pulls?q=is%3Apr+merged%3A2023-09-01..2023-09-31) in September.

Here's what's new to [docs.getdbt.com](http://docs.getdbt.com/):

* Migrated docs.getdbt.com from Netlify to Vercel.

#### ☁ Cloud projects[​](#cloud-projects "Direct link to ☁ Cloud projects")

* Continuous integration jobs are now generally available and no longer in beta!
* Added a [Postgres PrivateLink set up page](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-postgres.md).
* Published beta docs for [dbt Explorer](https://docs.getdbt.com/docs/explore/explore-projects.md).
* Added a new Semantic Layer [GraphQL API doc](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md) and updated the [integration docs](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) to include Hex. Responded to dbt community feedback and clarified MetricFlow use cases for dbt Core and dbt Cloud.
* Added an [FAQ](https://docs.getdbt.com/faqs/Git/git-migration.md) describing how to migrate from one Git provider to another in dbt Cloud.
* Clarified an example and added a [troubleshooting section](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-snowflake.md#troubleshooting) to the Snowflake connection docs to address common errors and provide solutions.

#### 🎯 Core projects[​](#core-projects "Direct link to 🎯 Core projects")

* Deprecated dbt Core v1.0 and v1.1 from the docs.
* Added configuration instructions for the [AWS Glue](https://docs.getdbt.com/docs/local/connect-data-platform/glue-setup.md) community plugin.
* Revised the dbt Core quickstart, making it easier to follow. Divided this guide into steps that align with the [other guides](https://docs.getdbt.com/guides/manual-install.md?step=1).

#### New 📚 Guides, ✏️ blog posts, and FAQs[​](#newguides️blog-posts-and-faqs "Direct link to New 📚 Guides, ✏️ blog posts, and FAQs")

Added a [style guide template](https://docs.getdbt.com/best-practices/how-we-style/6-how-we-style-conclusion.md#style-guide-template) that you can copy and paste to make sure you adhere to best practices when styling dbt projects!

#### Upcoming changes[​](#upcoming-changes "Direct link to Upcoming changes")

Stay tuned for a flurry of releases in October and a filterable guides section that will make guides easier to find!

* Semantic layer GA

If you're using the legacy Semantic Layer, we *highly* recommend you [upgrade your dbt version](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) to dbt v1.6 or higher and migrate to the latest Semantic Layer.
dbt Labs is thrilled to announce that the [dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) is now generally available. It offers consistent data organization, improved governance, reduced costs, enhanced efficiency, and accessible data for better decision-making and collaboration across organizations.

It aims to bring the best of modeling and semantics to downstream applications by introducing:

* Brand new [integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) such as Tableau, Google Sheets, Hex, Mode, and Lightdash.
* New [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) using GraphQL and JDBC to query metrics and build integrations.
* dbt Cloud [multi-tenant regional](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) support for North America, EMEA, and APAC. Single-tenant support coming soon.
* Coming soon — Schedule exports (a way to build tables in your data platform) as part of your dbt Cloud job. Use the APIs to call an export, then access the tables in your preferred BI tool.

[![Use the universal dbt Semantic Layer to define and query metrics in integration tools.](/img/docs/dbt-cloud/semantic-layer/sl-architecture.jpg?v=2 "Use the universal dbt Semantic Layer to define and query metrics in integration tools.")](#)Use the universal dbt Semantic Layer to define and query metrics in integration tools.

The dbt Semantic Layer is available to [dbt Cloud Team or Enterprise](https://www.getdbt.com/) multi-tenant plans on dbt v1.6 or higher.

* Team and Enterprise customers can use 1,000 Queried Metrics per month at no additional cost on a limited trial basis, subject to reasonable use limitations. Refer to [Billing](https://docs.getdbt.com/docs/cloud/billing.md#what-counts-as-a-queried-metric) for more information.
* dbt Developer plans and dbt Core users can define metrics but won't be able to query them with integrated tools.
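The Semantic Layer APIs mentioned above can be queried over GraphQL. The Python sketch below only shapes the request payload rather than sending it; the `metrics(environmentId:)` query fields and the idea of posting this body with your token are assumptions about the API's general shape, so verify them against the Semantic Layer GraphQL API reference before use:

```python
import json

def build_metrics_query(environment_id: int) -> dict:
    """Build a hypothetical GraphQL payload listing metrics for an environment.

    The query fields below are illustrative assumptions, not a guaranteed
    schema; check the Semantic Layer GraphQL API docs for the real fields.
    """
    query = """
    query GetMetrics($environmentId: BigInt!) {
      metrics(environmentId: $environmentId) {
        name
        description
      }
    }
    """
    return {"query": query, "variables": {"environmentId": environment_id}}

payload = build_metrics_query(42)   # 42 is a placeholder environment ID
body = json.dumps(payload)          # what you would POST with your API token
print(payload["variables"])
```
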
#### September 2023[​](#september-2023 "Direct link to September 2023")

* CI updates

dbt Cloud now has two distinct job types: [deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) for building production data assets, and [continuous integration (CI) jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) for checking code changes. These jobs perform fundamentally different tasks, so dbt Labs improved the setup experience with better defaults for each.

With two types of jobs, instead of one generic type, we can better guide you through the setup flow. Best practices are built into the default settings so you can go from curious to set up in seconds. And we now have more efficient state comparisons on CI checks: never waste a build or test on code that hasn't changed. We now diff between the Git pull request (PR) code and what's running in production more efficiently with the introduction of deferral to an environment versus a job. To learn more, refer to [Continuous integration in dbt](https://docs.getdbt.com/docs/deploy/continuous-integration.md).

Below is a comparison table that describes how deploy jobs and CI jobs behave differently:

| | Deploy jobs | CI jobs |
| --- | --- | --- |
| Purpose | Builds production data assets. | Builds and tests new code before merging changes into production. |
| Trigger types | Triggered by a schedule or by API. | Triggered by a commit to a PR or by API. |
| Destination | Builds into a production database and schema. | Builds into a staging database and ephemeral schema, which lives for the lifetime of the PR. |
| Execution mode | Runs execute sequentially, so as to not have collisions on the underlying DAG. | Runs execute in parallel to promote team velocity. |
| Efficiency run savings | Detects over-scheduled jobs and cancels unnecessary runs to avoid queue clog. | Cancels existing runs when a newer commit is pushed to avoid redundant work. |
| State comparison | Only sometimes needs to detect state. | Almost always needs to compare state against the production environment to build on modified code and its dependents. |

#### What you need to update[​](#what-you-need-to-update "Direct link to What you need to update")

* If you want to set up a CI environment for your jobs, dbt Labs recommends that you create your CI job in a dedicated [deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#create-a-deployment-environment) that's connected to a staging database. To learn more about these environment best practices, refer to the guide [Get started with continuous integration tests](https://docs.getdbt.com/guides/set-up-ci.md).
* If you set up a CI job before October 2, 2023, the job might've been misclassified as a deploy job with this update. Here's how to fix the job type:
  * If you used the [Create Job](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/Create%20Job) API endpoint but didn't set `"triggers":triggers.git_provider_webhook`, the job was misclassified as a deploy job and you must re-create it as described in [Trigger a CI job with the API](https://docs.getdbt.com/docs/deploy/ci-jobs.md#trigger-a-ci-job-with-the-api).
  * If you used the dbt UI but didn't enable the **Run on Pull Requests** option that was in the **Continuous Integration** (CI) tab, the job was misclassified as a deploy job and you must re-create it as described in [Set up CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md#set-up-ci-jobs).
To check the job type, review your CI jobs in dbt's [Run History](https://docs.getdbt.com/docs/deploy/run-visibility.md#run-history) and look for the **CI Job** tag below the job name. If it doesn't have this tag, it was misclassified and you need to re-create the job.

**CI update phase 3 — Improved automatic deletion of temporary schemas**

Temporary schemas are now automatically deleted (dropped) for all adapters (such as Databricks), for PrivateLink connections, and for connections that use environment variables in their connection strings. dbt Labs has rearchitected how schema deletion works for [continuous integration (CI)](https://docs.getdbt.com/docs/deploy/continuous-integration.md) runs. We created a new service to delete any schema with a prefix of `dbt_cloud_pr_` that's been generated by a PR run.

However, temporary schemas will not be automatically deleted if:

* Your project overrides the [`generate_schema_name` macro](https://docs.getdbt.com/docs/build/custom-schemas.md) but it doesn't contain the required prefix `dbt_cloud_pr_`. For details, refer to [Troubleshooting](https://docs.getdbt.com/docs/deploy/ci-jobs.md#troubleshooting).
* You're using a [non-native Git integration](https://docs.getdbt.com/docs/deploy/ci-jobs.md#trigger-a-ci-job-with-the-api). This is because automatic deletion relies on incoming webhooks from Git providers, which are only available through the native integrations.

* Product docs updates

Hello from dbt's Product Documentation team (the stewards of the docs.getdbt.com site): @mirnawong1, @matthewshaver, @nghi-ly, and @runleonarun. What a busy summer! We merged 256 PRs between July 1st and August 31st.

We'd like to recognize all of the docs and support from our partner team, Developer Experience: @jasnonaz @gwenwindflower @dbeatty10 @dataders @joellabes @Jstein77 @dave-connors-3! We'd also like to give a special thanks to the 22 community members who contributed to the [dbt Product docs](https://docs.getdbt.com) for the first time.
🙏

Based on feedback from the dbt community, we made these changes:

* Added a [permissions table](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) for Enterprise accounts.
* Added a [browser session page](https://docs.getdbt.com/docs/cloud/about-cloud/browsers.md#browser-sessions) that clarifies dbt Cloud's browser session time and when it logs users off.

You can provide feedback by opening a pull request or issue in [our repo](https://github.com/dbt-labs/docs.getdbt.com) or by reaching out in the dbt Community Slack channel [#dbt-product-docs](https://getdbt.slack.com/archives/C0441GSRU04).

#### ⚡ General docs projects[​](#zap-general-docs-projects "Direct link to zap-general-docs-projects")

* Added the ability to collapse sections you're not currently looking at. There were quite a few people who wanted this, and it bugged us too, so we were happy to get this shipped!
* Introduced the idea of ["Trusted" adapters](https://docs.getdbt.com/docs/supported-data-platforms.md#types-of-adapters).

#### ☁ Cloud projects[​](#cloud-projects-1 "Direct link to ☁ Cloud projects")

* The **What's new?** product update widget is back in the dbt Cloud UI! The Docs team will begin updating the content to keep you informed about new features.
* Launched the re-released [Semantic Layer beta docs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), which introduce users to the new API, add a new guide to set up MetricFlow and the new Semantic Layer, and revamp the 'Use the dbt Semantic Layer' section.
* Updated the [Admin API v2 and v3](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) docs to help you understand the differences between them and which version includes the endpoints you use.
* To improve discoverability, the docs team made changes to the [deploy dbt sidebar](https://docs.getdbt.com/docs/deploy/deployments.md). We added cards and aligned better with the dbt Cloud UI and the way it's used.
* Deprecated legacy job schemas in the [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md).
* Added a page to describe [experimental and beta features](https://docs.getdbt.com/docs/dbt-versions/experimental-features.md) in dbt Cloud and what you need to know about them.
* Added a section to introduce a new beta feature, [**Extended Attributes**](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes-beta), which allows users to set a flexible `profiles.yml` snippet in their dbt Cloud environment settings.

#### 🎯 Core projects[​](#core-projects-1 "Direct link to 🎯 Core projects")

* We released dbt 1.6! We added docs for the new commands `dbt retry` and `dbt clone`.

#### New 📚 Guides, ✏️ blog posts, and FAQs[​](#newguides️blog-posts-and-faqs-1 "Direct link to New 📚 Guides, ✏️ blog posts, and FAQs")

* Check out how these community members use the dbt community in the [Community spotlight](https://docs.getdbt.com/community/spotlight.md).
* Blog posts published this summer include [Optimizing Materialized Views with dbt](https://docs.getdbt.com/blog/announcing-materialized-views), [Data Vault 2.0 with dbt Cloud](https://docs.getdbt.com/blog/data-vault-with-dbt-cloud), and [Create dbt Documentation and Tests 10x faster with ChatGPT](https://docs.getdbt.com/blog/create-dbt-documentation-10x-faster-with-ChatGPT).
* We now have two new best practice guides: [How we build our metrics](https://docs.getdbt.com/best-practices/how-we-build-our-metrics/semantic-layer-1-intro.md) and [Set up Continuous Integration](https://docs.getdbt.com/guides/set-up-ci.md).

* Removing prerelease versions

Previously, when dbt Labs released a new [version](https://docs.getdbt.com/docs/dbt-versions/core.md#how-dbt-core-uses-semantic-versioning) in dbt Cloud, the older patch *prerelease* version and the *latest* version both remained as options in the dropdown menu available in the **Environment settings**.
Now, when the *latest* version is released, the *prerelease* version will be removed and all customers remaining on it will be migrated seamlessly. There will be no interruptions to service when this migration occurs.

To see which version you are currently using and to upgrade, select **Deploy** in the top navigation bar and select **Environments**. Choose the preferred environment and click **Settings**. Click **Edit** to make a change to the current dbt version. dbt Labs recommends always using the latest version whenever possible to take advantage of new features and functionality.

#### August 2023[​](#august-2023 "Direct link to August 2023")

* Deprecation of endpoints in the Discovery API

dbt Labs is deprecating certain query patterns and replacing them with new conventions to enhance the performance of the dbt Cloud [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md). All these changes will be in effect on *September 7, 2023*.

We understand that these changes might require adjustments to your existing integration with the Discovery API. Please [contact us](mailto:support@getdbt.com) with any questions. We're here to help you during this transition period.

#### Job-based queries[​](#job-based-queries "Direct link to Job-based queries")

Job-based queries that use the data type `Int` for IDs will be deprecated. They will be marked as deprecated in the [GraphQL explorer](https://metadata.cloud.getdbt.com/graphql). The new convention is to use the data type `BigInt` instead. This change will be in effect starting September 7, 2023.

Example of query before deprecation:

```graphql
query ($jobId: Int!) {
  models(jobId: $jobId) {
    uniqueId
  }
}
```

Example of query after deprecation:

```graphql
query ($jobId: BigInt!) {
  job(id: $jobId) {
    models {
      uniqueId
    }
  }
}
```

#### modelByEnvironment queries[​](#modelbyenvironment-queries "Direct link to modelByEnvironment queries")

The `modelByEnvironment` object has been renamed and moved into the `environment` object. This change has been in effect since August 15, 2023.

Example of query before deprecation:

```graphql
query ($environmentId: Int!, $uniqueId: String) {
  modelByEnvironment(environmentId: $environmentId, uniqueId: $uniqueId) {
    uniqueId
    executionTime
    executeCompletedAt
  }
}
```

Example of query after deprecation:

```graphql
query ($environmentId: BigInt!, $uniqueId: String) {
  environment(id: $environmentId) {
    applied {
      modelHistoricalRuns(uniqueId: $uniqueId) {
        uniqueId
        executionTime
        executeCompletedAt
      }
    }
  }
}
```

#### Environment and account queries[​](#environment-and-account-queries "Direct link to Environment and account queries")

Environment and account queries that use `Int` as a data type for IDs have been deprecated. IDs must now be in `BigInt`. This change has been in effect since August 15, 2023.

Example of query before deprecation:

```graphql
query ($environmentId: Int!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first) {
        edges {
          node {
            uniqueId
            executionInfo {
              lastRunId
            }
          }
        }
      }
    }
  }
}
```

Example of query after deprecation:

```graphql
query ($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first) {
        edges {
          node {
            uniqueId
            executionInfo {
              lastRunId
            }
          }
        }
      }
    }
  }
}
```

* dbt Cloud IDE v1.2

We're excited to announce that we replaced the backend service that powers the Cloud IDE with a more reliable server: dbt-server. Because this release contains foundational changes, IDE v1.2 requires dbt v1.6 or higher. This significant update follows the rebuild of the IDE frontend last year. We're committed to improving the IDE to provide you with a better experience.
Previously, the Cloud IDE used dbt-rpc, an outdated service that was unable to stay up to date with changes from dbt-core. The dbt-rpc integration used legacy dbt-core entry points and logging systems, causing it to be sluggish, brittle, and poorly tested. The Core team had been working around this outdated technology to avoid breaking it, which prevented them from developing with velocity and confidence.

#### New features

* **Better dbt-core parity:** The Cloud IDE has better command parity with dbt-core, including support for commands like `dbt list` and improved treatment of flags like `--vars`, `--fail-fast`, etc.
* **Improved maintainability:** With the new dbt-server, it's easier to fix bugs and improve the overall quality of the product. With dbt-rpc, fixing bugs was a time-consuming and challenging process that required extensive testing. With the new service, we can identify and fix bugs more quickly, resulting in a more stable and reliable IDE.
* **A more reliable service:** Simplified architecture that's less prone to failure.

##### Product refinements

* Improved `Preview` capabilities with Core v1.6 + IDE v1.2. [This Loom](https://www.loom.com/share/12838feb77bf463c8585fc1fc6aa161b) provides more information.
##### Bug fixes

* Global page can become "inert" and stop handling clicks
* Switching back and forth between files in the git diff view can cause an overwrite
* Browser gets stuck during markdown preview for a doc with a large table
* Editor right-click menu is offset
* Unable to cancel on the Save New File component when closing all files in the IDE
* Mouse flicker in the modal's file tree makes it difficult to select a folder where you want to save a new file
* Snapshots not showing in Lineage when inside a subfolder with a mixed-case name
* Tooltips do not work for Format and Save
* When a dbt invocation is in progress or parsing is ongoing, attempting to switch branches causes the **Git Branch** dropdown to close automatically

##### Known issues

* The `{{this}}` function does not display properly in preview/compile with dbt-server

#### July 2023

* **Faster runs and unlimited job concurrency for Enterprise accounts**

We’ve introduced significant improvements to the dbt Cloud Scheduler, offering improved performance, durability, and scalability. Read on to learn how you can experience faster run starts and how Enterprise users can now run as many jobs concurrently as they want.

#### Faster run starts

The Scheduler takes care of preparing each dbt Cloud job to run in your cloud data platform. This [prep](https://docs.getdbt.com/docs/deploy/job-scheduler.md#scheduler-queue) involves readying a Kubernetes pod with the right version of dbt installed, setting environment variables, loading data platform credentials, and authorizing the git provider, amongst other environment-setting tasks. Only after the environment is set up can dbt execution begin. We display this time to the user in dbt Cloud as “prep time”.
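To make “prep time” concrete: it is the gap between a run being created and dbt execution starting. A minimal sketch of the arithmetic, where the `created_at`/`started_at` field names and timestamp format are assumed for illustration (check the dbt Cloud v2 runs API for the exact schema):

```python
from datetime import datetime

def prep_time_seconds(run: dict) -> float:
    """Approximate 'prep time' as the gap between run creation and the
    start of dbt execution. Field names are assumptions, not the
    documented dbt Cloud schema."""
    created = datetime.fromisoformat(run["created_at"])
    started = datetime.fromisoformat(run["started_at"])
    return (started - created).total_seconds()

run = {"created_at": "2023-07-01 12:00:00", "started_at": "2023-07-01 12:00:27"}
print(prep_time_seconds(run))  # → 27.0, in line with the July 2023 figure below
```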
[![The scheduler prepares a job for execution and displays it as 'prep time' in dbt Cloud.](/img/run-start.jpg?v=2 "The scheduler prepares a job for execution and displays it as 'prep time' in dbt Cloud.")](#)

For all its strengths, Kubernetes has challenges, especially with pod management impacting run execution time. We’ve rebuilt our scheduler to ensure faster job execution by maintaining a ready pool of pods to execute customers’ jobs. This means you won't experience long prep times at the top of the hour, and we’re determined to keep runs starting near instantaneously. Don’t just take our word for it; review the data yourself.

[![Job prep time data has seen a 75% speed improvement from Jan 2023 to July 2023. Prep time took 106 secs in Jan and now takes 27 secs as of July.](/img/prep-start.jpg?v=2 "Job prep time data has seen a 75% speed improvement from Jan 2023 to July 2023. Prep time took 106 secs in Jan and now takes 27 secs as of July.")](#)

Jobs scheduled at the top of the hour used to take over 106 seconds to prepare because of the volume of runs the scheduler had to process. Now, even with increased run volume, we have reduced prep time to 27 seconds at most, a 75% speed improvement for runs at peak traffic times!

#### Unlimited job concurrency for Enterprise accounts

Our enhanced scheduler offers more durability and empowers users to run jobs effortlessly. Enterprise, multi-tenant accounts can now enjoy the advantages of unlimited job concurrency. Previously limited to a fixed number of run slots, Enterprise accounts now have the freedom to operate without constraints. Single-tenant support is coming soon.
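The run-slot model behind these limits can be sketched with a toy dispatcher (illustrative only, not dbt Labs' scheduler code): each job takes a free slot or waits in a FIFO queue, and an unlimited-concurrency account behaves as if the slot count were infinite.

```python
import math
from collections import deque

def dispatch(jobs, slots):
    """Toy run-slot scheduler: each arriving job takes a free slot,
    otherwise it waits in a FIFO queue. Returns (running, queued)."""
    running, queued = [], deque()
    for job in jobs:
        if len(running) < slots:
            running.append(job)
        else:
            queued.append(job)
    return running, list(queued)

# A legacy plan with 2 run slots queues the third job...
print(dispatch(["a", "b", "c"], slots=2))         # → (['a', 'b'], ['c'])
# ...while unlimited concurrency never queues.
print(dispatch(["a", "b", "c"], slots=math.inf))  # → (['a', 'b', 'c'], [])
```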
Note that each running job occupies a run slot for its duration, and if all slots are occupied, jobs will queue accordingly. For more feature details, refer to the [dbt pricing page](https://www.getdbt.com/pricing/).

Team accounts created after July 2023 also benefit from unlimited job concurrency:

* Legacy Team accounts have a fixed number of run slots.
* Both Team and Developer plans are limited to one project each.

For larger-scale needs, our [Enterprise plan](https://www.getdbt.com/pricing/) offers features such as audit logging, unlimited job concurrency and projects, and more.

#### June 2023

* **Lint and format**

dbt Labs is excited to announce that you can now lint and format your dbt code in the dbt Cloud IDE. This enhanced development workflow empowers you to prioritize code quality effortlessly. You can lint and format five different file types: SQL, YAML, Markdown, Python, and JSON.

For SQL files, you can easily lint and format your code using [SQLFluff](https://sqlfluff.com/) and apply consistent formatting using [sqlfmt](http://sqlfmt.com/). For other file types (YAML, Markdown, JSON, and Python), you can use the respective tools powered by [Prettier](https://prettier.io/) and [Black](https://black.readthedocs.io/en/latest/) to ensure clean and standardized code formatting. For more info, read [Lint and format your code](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md).

[![Use SQLFluff to lint/format your SQL code, and view code errors in the Code Quality tab.](/img/docs/dbt-cloud/cloud-ide/sqlfluff.gif?v=2 "Use SQLFluff to lint/format your SQL code, and view code errors in the Code Quality tab.")](#)

[![Use sqlfmt to format your SQL code.](/img/docs/dbt-cloud/cloud-ide/sqlfmt.gif?v=2 "Use sqlfmt to format your SQL code.")](#)
[![Format YAML, Markdown, and JSON files using Prettier.](/img/docs/dbt-cloud/cloud-ide/prettier.gif?v=2 "Format YAML, Markdown, and JSON files using Prettier.")](#)

* **CI updates**

dbt Cloud CI is a critical part of the analytics engineering workflow. Large teams rely on process to ensure code quality is high, and they look to dbt Cloud CI to automate testing code changes efficiently, enabling speed while keeping the bar high. With status checks posted directly to their dbt PRs, developers gain confidence that their code changes will work as expected in production, and once you’ve grown accustomed to seeing that green status check in your PR, you won’t be able to work any other way.

[![CI checks directly from within Git](/img/docs/release-notes/ci-checks.png?v=2 "CI checks directly from within Git")](#)

What separates dbt Cloud CI from other CI providers is its ability to keep track of the state of what’s running in your production environment, so that when you run a CI job, only the modified data assets in your pull request and their downstream dependencies get built and tested in a staging schema. dbt aims to make each CI check as efficient as possible, so as not to waste any data warehouse resources. As soon as the CI run completes, its status posts directly back to the PR in GitHub, GitLab, or Azure DevOps, depending on which Git provider you’re using. Teams can set up guardrails to let only PRs with successful CI checks be approved for merging, and the peer review process is greatly streamlined because dbt does the first testing pass.

We're excited to introduce a few critical capabilities to dbt Cloud CI that will improve productivity and collaboration in your team’s testing and integration workflow. As of this week, you can now:

* **Run multiple CI checks in parallel.** If more than one contributor makes changes to the same dbt project in dbt Cloud in short succession, the later-arriving CI check no longer has to wait for the first check to complete. Both checks will execute concurrently.
* **Automatically cancel stale CI runs.** If you push multiple commits to the same PR, dbt will automatically cancel older, now-out-of-date CI checks. No resources are wasted on checking stale code.
* **Run CI checks without blocking production runs.** CI checks no longer consume run slots, meaning you can have as many CI checks running as you want without impeding your production jobs.

To learn more, refer to [Continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md) and [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md).

* **Admin API**

dbt Labs updated the docs for the [dbt Cloud Administrative API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md), and they are now available for both [v2](https://docs.getdbt.com/dbt-cloud/api-v2#/) and [v3](https://docs.getdbt.com/dbt-cloud/api-v3#/):

* Now using Stoplight for improved UI and UX.
* All endpoints are now documented for v2 and v3, with automation added so the docs remain up to date.
* Documented many of the request and response bodies.
* You can now test endpoints directly from within the API docs. And, you can choose which [regional server](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) to use (North America, APAC, or EMEA).
* With the new UI, you can more easily generate code for any endpoint.

* **Product docs updates**

Hello from the dbt Docs team: @mirnawong1, @matthewshaver, @nghi-ly, and @runleonarun! First, we’d like to thank the 17 new community contributors to docs.getdbt.com: ✨ @aaronbini, @sjaureguimodo, @aranke, @eiof, @tlochner95, @mani-dbt, @iamtodor, @monilondo, @vrfn, @raginjason, @AndrewRTsao, @MitchellBarker, @ajaythomas, @smitsrr, @leoguyaux, @GideonShils, @michaelmherrera!
Here's what's new to [docs.getdbt.com](http://docs.getdbt.com/) in June:

#### ☁ Cloud projects

* We clarified the nuances of [CI and CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md), updated the [Scheduler content](https://docs.getdbt.com/docs/deploy/job-scheduler.md), added two new pages for job settings and run visibility, moved the project state page to the [Syntax page](https://docs.getdbt.com/reference/node-selection/syntax.md), and provided a landing page for [Deploying with Cloud](https://docs.getdbt.com/docs/deploy/jobs.md) to help readers navigate the content better.
* We reformatted the [Supported data platforms page](https://docs.getdbt.com/docs/supported-data-platforms.md) by adding dbt Cloud to the page, splitting it into multiple pages, using cards to display verified adapters, and moving the [Warehouse setup pages](https://docs.getdbt.com/docs/local/connect-data-platform/about-dbt-connections.md) to the Docs section.
* We launched a new [Lint and format page](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md), which highlights the awesome new dbt Cloud IDE linting/formatting function.
* We enabled a connection between [dbt Cloud release notes](https://docs.getdbt.com/docs/dbt-versions/dbt-cloud-release-notes.md) and the dbt Slack community. New dbt Cloud release notes are automatically sent to the Slack community's [#dbt-cloud channel](https://getdbt.slack.com/archives/CMZ2V0X8V) via RSS feed, keeping users up to date with changes that may affect them.
* We added two new docs links, [job commands](https://docs.getdbt.com/docs/deploy/job-commands.md) and job triggers, in the dbt Cloud job settings user interface (UI) to provide additional guidance and help users succeed when setting up a dbt Cloud job.
* We added information related to the newly created [IT license](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#license-based-access-control), available for Team and Enterprise plans.
* We added a new [Supported browser page](https://docs.getdbt.com/docs/cloud/about-cloud/browsers.md), which lists the recommended browsers for dbt Cloud.
* We launched a new page informing users of the new [Experimental features option](https://docs.getdbt.com/docs/dbt-versions/experimental-features.md) in dbt Cloud.
* We worked with dbt Engineering to help publish new beta versions of the [dbt Cloud Administrative API docs](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md).

#### 🎯 Core projects

* We launched the new [MetricFlow docs](https://docs.getdbt.com/docs/build/build-metrics-intro.md) on dbt Core v1.6 beta.
* We split [Global configs](https://docs.getdbt.com/reference/global-configs/about-global-configs.md) into individual pages, making them easier to find, especially when using search.

#### New 📚 Guides, ✏️ blog posts, and FAQs

* Added an Azure DevOps example to the [Customizing CI/CD with custom pipelines](https://docs.getdbt.com/guides/custom-cicd-pipelines.md) guide.

#### May 2023

* **dbt Cloud IDE**

To continue improving your [Cloud IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) development experience, the dbt Labs team continues to work on adding new features, fixing bugs, and increasing reliability ✨. Stay up to date with [IDE-related changes](https://docs.getdbt.com/tags/ide.md).
#### New features

* Linting via SQLFluff is now available in beta (GA over the next 2-3 weeks)
* Format markdown files with prettier
* Leverage developer experience shortcuts, including ``Ctrl + ` `` (toggle history drawer), `CMD + Option + /` (toggle block comment), `CMD + Shift + P` (open command palette), and `Option + W` (close editor tab)
* Display the parent folder name for files with the same name in the Changes section
* Navigate the new IDE features quickly using the [IDE User Interface](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md) help page
* Use `top X` in SQL when previewing in the IDE
* Opt into the new IDE backend layer (still with dbt-rpc), available over the past month and ready for beta later in June!

#### Product refinements

* Performance-related upgrades:
  * Reduced cold start time by 60+%
  * Improved render time of modals in the IDE by 98%
  * Improved IDE performance with dbt Core v1.5+ (faster and snappier; we highly encourage you to [upgrade your dbt version](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md)!)
* Upgraded sqlfmt (which powers the Format button) to 0.18.0
* Updated the Build button to change menu options based on file/model type (snapshot, macro, etc.)
* Display a message to disable the adblocker on file contents errors
* Moved the Format button to the console bar
* Made many security enhancements in the IDE

#### Bug fixes

* File icon sizes no longer get wonky on small screens
* Toast notifications no longer take over the command bar menu
* Hover info inside the text editor no longer gets cut off
* Transitioning between a file and a recently modified scratchpad no longer triggers a console error
* The IDE can now be accessed with dbt v1.5+
* The Confirm button on the Unsaved Changes modal now closes after clicking it
* Long node names no longer overflow in the parsed logs section of the history drawer
* The status pill in the history drawer no longer scales with longer commands
* The tooltip for a tab with a long file name is no longer cut off
* The Lint button is no longer available on the main branch

* **Run history improvements**

New usability and design improvements to the **Run History** dashboard in dbt Cloud are now available. These updates allow people to discover the information they need more easily by reducing the number of clicks, surfacing more relevant information, keeping people in a flow state, and making the look and feel more intuitive to use. Highlights include:

* Usability improvements for CI runs with hyperlinks to the branch, PR, and commit SHA, along with more discoverable temporary schema names
* Previews of runs' error messages on hover
* Hyperlinks to the environment
* Better iconography on run status
* Clearer run trigger cause (API, scheduled, pull request, triggered by user)
* More details on the scheduled time on hover
* Run timeout visibility

dbt Labs is making a change to the metadata retrieval policy for Run History in dbt Cloud. **Beginning June 1, 2023,** developers on the dbt multi-tenant application will be able to self-serve access to their account’s run history through the dbt user interface (UI) and API for only 365 days, on a rolling basis.
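The rolling window works out to simple date arithmetic; a minimal sketch (ISO timestamps assumed for illustration, not the actual dbt Cloud implementation):

```python
from datetime import datetime, timedelta

def within_retention(run_created_at: str, today: datetime, days: int = 365) -> bool:
    """Return True if a run falls inside the rolling retention window."""
    created = datetime.fromisoformat(run_created_at)
    return created >= today - timedelta(days=days)

now = datetime(2023, 6, 1)
print(within_retention("2022-07-15T00:00:00", now))  # → True: self-serve via UI/API
print(within_retention("2021-12-31T00:00:00", now))  # → False: request via Support
```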
Older run history will be available for download by reaching out to Customer Support. We're seeking to minimize the amount of metadata we store while maximizing application performance.

Specifically, all `GET` requests to the dbt Cloud [Runs endpoint](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/List%20Runs) will return information on runs, artifacts, logs, and run steps only for the past 365 days. Additionally, the run history displayed in the dbt Cloud UI will only show runs for the past 365 days.

[![The dbt Cloud UI displaying a Run History](/img/docs/dbt-cloud/rn-run-history.jpg?v=2 "The dbt Cloud UI displaying a Run History")](#)

We will retain older run history in cold storage and can make it available to customers who reach out to our Support team. To request older run history info, contact the Support team at [support@getdbt.com](mailto:support@getdbt.com) or use the dbt Cloud application chat by clicking the `?` icon in the dbt Cloud UI.

* **Run details and log improvements**

New usability and design improvements to the run details and logs in dbt Cloud are now available. The ability to triage errors in logs is a big benefit of using dbt Cloud's job and scheduler functionality. These updates make the process of finding the root cause much easier. Highlights include:

* Surfacing a warn state on a run step
* Search in logs
* Easier discoverability of errors and warnings in logs
* Lazy loading of logs, making the whole run details page load faster and feel more performant
* A cleaner look and feel with iconography
* Helpful tooltips

* **Product docs updates**

Hello from the dbt Docs team: @mirnawong1, @matthewshaver, @nghi-ly, and @runleonarun! First, we’d like to thank the 13 new community contributors to docs.getdbt.com!
Here's what's new to [docs.getdbt.com](http://docs.getdbt.com/) in May:

#### 🔎 Discoverability

* We made sure everyone knows that Cloud users don’t need a [profiles.yml file](https://docs.getdbt.com/docs/local/profiles.yml.md) by adding a callout on several key pages.
* We fleshed out the [model Jinja variable page](https://docs.getdbt.com/reference/dbt-jinja-functions/model.md), which originally lacked conceptual info and didn’t link to the schema page.
* We added a new [Quickstarts landing page](https://docs.getdbt.com/guides.md). This new format sets us up for future iterations that will include filtering! For now, we are excited that you can step through quickstarts in a focused way.

#### Cloud projects

* We launched the [dbt Cloud IDE user interface doc](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md), which provides a thorough walkthrough of the IDE UI elements and their definitions.
* We launched a sparkling new [dbt Cloud Scheduler page](https://docs.getdbt.com/docs/deploy/job-scheduler.md) ✨! We went from having little content around the scheduler to a subsection that breaks down the scheduler's features and how it works.
* We updated the [dbt Cloud user license page](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#licenses) to clarify how to add or remove cloud users.
* We shipped these Discovery API docs to coincide with the launch of the Discovery API:
  * [About the Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md)
  * [Use cases and examples for the Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md)
  * [Query the Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-querying.md)

#### 🎯 Core projects

* See what’s coming up [in Core v1.6](https://github.com/dbt-labs/docs.getdbt.com/issues?q=is%3Aissue+label%3A%22dbt-core+v1.6%22)!
* We turned the [`profiles.yml` page](https://docs.getdbt.com/docs/local/profiles.yml.md) into a landing page, added more context to the `profiles.yml` page, and moved the ‘About CLI’ page higher up in the `Set up dbt` section.

#### New 📚 Guides, ✏️ blog posts, and FAQs

If you want to contribute to a blog post, we’re focusing on content

* Published a blog post: [Accelerate your documentation workflow: Generate docs for whole folders at once](https://docs.getdbt.com/blog/generating-dynamic-docs-dbt)
* Published a blog post: [Data engineers + dbt v1.5: Evolving the craft for scale](https://docs.getdbt.com/blog/evolving-data-engineer-craft)
* Added an [FAQ](https://docs.getdbt.com/faqs/Warehouse/db-connection-dbt-compile.md) to clarify the common question: *Why does dbt compile need to connect to the database?*
* Published a [Discourse article](https://discourse.getdbt.com/t/how-to-configure-external-user-email-notifications-in-dbt-cloud/8393) about configuring job notifications for non-dbt Cloud users

#### April 2023

* **dbt Cloud IDE**

#### New features

* New warning message suggests you invoke `dbt deps` when it's needed (as informed by `dbt-score`).
* New warning message appears when you select models but don't save them before clicking **Build** or invoking dbt (like `dbt build`/`run`/`test`).
* Previews of Markdown and CSV files are now available in the IDE console.
* The file tree menu now includes a Duplicate File option.
* Display loading time when previewing a model.

#### Product refinements

* Enhance the autocomplete experience, which had performed slowly for people with large projects, by implementing a limit on the max `manifest.json` size for this feature
* Introduce pagination for the invocation node summary view (displaying 100 nodes at a time)
* Improve rendering of the Changes / Version Control section of the IDE
* Update icons to be consistent in dbt Cloud
* Add table support to the Markdown preview
* Add the lineage tab back to seed resources in the IDE
* Implement modal priority when there are multiple warning modals
* Improve a complex command's description in the command palette

#### Bug fixes

* File tree no longer collapses on first click when there's a project subdirectory defined
* **Revert all** button now works as expected
* CSV preview no longer fails when there is only one column
* Cursor and scroll bar positions are now persisted
* `git diff` view now shows just the change diffs and no longer shows the full diff (as if the file were new) until the page refreshes
* The ToggleMinimap command no longer runs another command at the same time
* `git diff` view no longer shows infinite spinners in specific scenarios (new file, etc.)
* File contents no longer get mixed up when using the diff view and one file has unsaved changes
* YML lineage now renders models without tests (in dbt Core v1.5 and above)
* Radio buttons for **Summary** and **Details** in the logs section now consistently update to show the accurate tab selection
* The IDE no longer throws the console error `Error: Illegal argument` and redirects to the `Something went wrong` page

* **API updates**

Starting May 15, 2023, we will support only the following `order_by` functionality for the List Runs endpoint:

* `id` and `-id`
* `created_at` and `-created_at`
* `finished_at` and `-finished_at`

We recommend that you change your API requests to `https://<your access URL>/api/v2/accounts/{accountId}/runs/` to use a supported `order_by` before this date.

**Access URLs:** dbt Cloud is hosted in multiple regions around the world, and each region has a different access URL. Users on Enterprise plans can choose to have their account hosted in any one of these regions. For a complete list of available dbt Cloud access URLs, refer to [Regions & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md).

For more info, refer to our [documentation](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/List%20Runs).

* **Scheduler optimization**

The dbt Cloud Scheduler now prevents queue clog by canceling unnecessary runs of over-scheduled jobs.

The duration of a job run tends to grow over time, usually caused by growing amounts of data in the warehouse. If the run duration becomes longer than the frequency of the job’s schedule, the queue grows faster than the scheduler can process the job’s runs, leading to a runaway queue of runs that don’t need to be processed.

Previously, when a job was in this over-scheduled state, the scheduler would stop queuing runs after 50 were already in the queue. This led to a poor user experience where the scheduler canceled runs indiscriminately.
You’d have to log into dbt Cloud to manually cancel all the queued runs and change the job schedule to "unclog" the scheduler queue. Now, the dbt Cloud Scheduler detects when a scheduled job is set to run too frequently and appropriately cancels runs that don’t need to be processed. Specifically, a scheduled job can only ever have one run in the queue, and if a more recent run gets queued, the earlier queued run will be canceled with a helpful error message. Users will still need to either refactor the job so it runs faster or change the job schedule to run less often if the job frequently gets into an over-scheduled state.

* **Starburst adapter GA**

The Starburst (Trino compatible) connection is now generally available in dbt Cloud. This means you can now use dbt Cloud to connect with Starburst Galaxy, Starburst Enterprise, and self-hosted Trino. This feature is powered by the [`dbt-trino`](https://github.com/starburstdata/dbt-trino) adapter. To learn more, check out our Quickstart guide for [dbt Cloud and Starburst Galaxy](https://docs.getdbt.com/guides/starburst-galaxy.md).

* **Product docs updates**

Hello from the dbt Docs team: @mirnawong1, @matthewshaver, @nghi-ly, and @runleonarun! We want to share some highlights introduced to docs.getdbt.com in the last month:

#### 🔎 Discoverability

* [API docs](https://docs.getdbt.com/docs/dbt-cloud-apis/overview.md) now live in the left sidebar to improve discoverability.
* [The deploy dbt jobs sidebar](https://docs.getdbt.com/docs/deploy/deployments.md) has had a glow up 💅: it splits ‘about deployment’ into two paths (deploy with dbt Cloud and deploy with other tools), adds more info about the dbt Cloud Scheduler, its features, and how to create a job, and adds ADF deployment guidance. We hope the changes improve the user experience and provide users with guidance when deploying with other tools.
#### ☁ Cloud projects

* Added Starburst/Trino adapter docs, including the [dbt Cloud quickstart guide](https://docs.getdbt.com/guides/starburst-galaxy.md), [connection page](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-starburst-trino.md), [setup page](https://docs.getdbt.com/docs/local/connect-data-platform/trino-setup.md), and [config page](https://docs.getdbt.com/reference/resource-configs/trino-configs.md).
* Enhanced the [dbt Cloud jobs page](https://docs.getdbt.com/docs/deploy/jobs.md) and section to include conceptual info on queue time, improvements made around it, and failed jobs.
* Check out the April [dbt Cloud release notes](https://docs.getdbt.com/docs/dbt-versions/dbt-cloud-release-notes.md).

#### 🎯 Core projects

* Clearer descriptions in the [Jinja functions page](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md), improving the content for each card.
* The 1.5 docs have been released as a Release Candidate (RC)!
* See the beautiful [work captured in Core v1.5](https://github.com/dbt-labs/docs.getdbt.com/issues?q=is%3Aissue+label%3A%22dbt-core+v1.5%22+is%3Aclosed).
#### New 📚 Guides and ✏️ blog posts

* [Use Databricks workflows to run dbt Cloud jobs](https://docs.getdbt.com/guides/how-to-use-databricks-workflows-to-run-dbt-cloud-jobs.md)
* [Refresh Tableau workbook with extracts after a job finishes](https://docs.getdbt.com/guides/zapier-refresh-tableau-workbook.md)
* [dbt Python Snowpark workshop/tutorial](https://docs.getdbt.com/guides/dbt-python-snowpark.md)
* [How to optimize and troubleshoot dbt Models on Databricks](https://docs.getdbt.com/guides/optimize-dbt-models-on-databricks.md)
* [The missing guide to debug() in dbt](https://docs.getdbt.com/blog/guide-to-jinja-debug)
* [dbt Squared: Leveraging dbt Core and dbt Cloud together at scale](https://docs.getdbt.com/blog/dbt-squared)
* [Audit\_helper in dbt: Bringing data auditing to a higher level](https://docs.getdbt.com/blog/audit-helper-for-migration)

#### March 2023

* **dbt v1.0 deprecation**

dbt Cloud now requires dbt version 1.0 or later. As of March 1, 2023, we removed all instances of older dbt versions from dbt Cloud. Any environments or jobs configured with a dbt version lower than 1.0 were automatically updated to dbt v1.4, the latest minor version available on dbt Cloud. For more info on dbt versions, releases, and the dbt Cloud support timeline, refer to [About dbt Core versions](https://docs.getdbt.com/docs/dbt-versions/core.md#latest-releases).
Refer to this additional info and these resources to help you upgrade your dbt version:

* [How to upgrade dbt without fear](https://docs.getdbt.com/blog/upgrade-dbt-without-fear)
* [Upgrade Q\&A on breaking changes](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#upgrading-legacy-versions-under-10)
* [Version migration guides](https://docs.getdbt.com/docs/dbt-versions/core-upgrade.md)

* **dbt Cloud IDE**

To continue improving your [Cloud IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) development experience, the dbt Labs team continues to work on adding new features, fixing bugs, and increasing reliability ✨. Read more about the [upcoming improvements to the Cloud IDE](https://www.getdbt.com/blog/improvements-to-the-dbt-cloud-ide/) and stay up to date with [IDE-related changes](https://docs.getdbt.com/tags/ide.md).

#### New features

* Commit and revert individual files under **Version Control**.
* Use the [command palette](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#cloud-ide-features) to invoke common complex dbt commands, such as resuming from the last failure.
* Create PRs even when there are uncommitted changes (under the **git** dropdown).
* The IDE displays more autocomplete suggestions when editing a YML file, powered by [dbt-jsonschema](https://github.com/dbt-labs/dbt-jsonschema).
* The file tree now has additional options in the right-click menu, such as Copy model as ref or Copy file path.
* The DAG view has been adjusted to a default of `2+model+2`.
* A lineage selector has been implemented in the DAG/lineage sub-tab.
* Edit directly in the git diff view located in the right pane.
* A warning message now appears when you press Command-W/Control-W with unsaved changes.
* A new onboarding flow guide is now available.
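The `2+model+2` default mentioned above limits the lineage view to two generations of ancestors and two generations of descendants around the selected model. A toy breadth-first sketch of that hop-limited selection (illustrative only; dbt's actual selector logic lives in dbt-core):

```python
from collections import deque

def within_hops(edges, start, hops, reverse=False):
    """Nodes reachable from `start` within `hops` edges, following
    parent->child edges (or child->parent when reverse=True)."""
    adj = {}
    for parent, child in edges:
        a, b = (child, parent) if reverse else (parent, child)
        adj.setdefault(a, set()).add(b)
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # hop budget spent; don't expand further
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

# A simple chain a -> b -> c -> d -> e; select "2+c+2":
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
selection = within_hops(edges, "c", 2, reverse=True) | within_hops(edges, "c", 2)
print(sorted(selection))  # → ['a', 'b', 'c', 'd', 'e']
```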
#### Product refinements[​](#product-refinements-3 "Direct link to Product refinements") * The DAG selector now uses `name` instead of `file_uri` to build selectors. * The DAG is now vertically centered under the new Selector Input element. * sqlfmt has been upgraded to v0.17.0. * When the Format button fails, a toast notification will display a syntax error. * The editor now has the option to toggle minimap/word-wrap via right-click. * The history drawer displays elapsed time in real-time and s/m/h increments. * When deleting development environments, the delete modal will now warn users that any uncommitted changes will be lost. * The context for the Git button has been adjusted to show that it will link to an external site (such as GitHub or GitLab) when users create a pull request. #### Bug fixes[​](#bug-fixes-4 "Direct link to Bug fixes") * The IDE now displays an error message when the git repository is not reachable. Previously, it failed silently. * The kebab menu is now visible when the invocation history drawer is open. Previously, it wasn't showing. * DAGs are now updated/populated consistently. Previously, it occasionally failed. * The purple highlight for DAG selection is now consistent across files. Previously, it was inconsistent. * Users can now rename files back to their original name. Previously, this wasn't possible. * The link to the IDE from the project setup page has been corrected. * The IDE no longer has issues with single-space file names. * Adding invalid characters in the sub-directory config no longer causes the IDE to fail. * YML autocomplete triggers consistently now. Previously, it occasionally didn't trigger. * Reverting single files now reloads the file contents in the tab. Previously, it didn't reload. * The file tree no longer collapses on the first click when there is a project subdirectory defined. 
*  API updates To make the API more scalable and reliable, we've implemented a maximum limit of `100` for all API requests to our `list` endpoints. If API requests exceed the maximum limit parameter of `100`, a user will receive an API error message. This maximum limit applies to [multi-tenant instances](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) only, and *does not* apply to single tenant instances. Refer to the [Pagination](https://docs.getdbt.com/dbt-cloud/api-v2#/) section of the overview for more information on this change. #### Feb 2023[​](#feb-2023 "Direct link to Feb 2023") *  Disable partial parsing in job commands You can now use the `--no-partial-parse` flag to disable partial parsing in your dbt Cloud job commands.  Previously, the [`--no-partial-parse` global config](https://docs.getdbt.com/reference/global-configs/parsing.md) was only available in dbt Core. For more information, refer to [partial parsing](https://docs.getdbt.com/reference/parsing.md#partial-parsing). *  dbt Cloud IDE To continue improving our [Cloud IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) experience, the dbt Labs team worked on fixing bugs, increasing reliability, and adding new features ✨. Learn more about the [February changes](https://getdbt.slack.com/archives/C03SAHKKG2Z/p1677605383451109). 
#### New features[​](#new-features-4 "Direct link to New features") * Support for custom node colors in the IDE DAG visualization * Ref autocomplete includes models from seeds and snapshots * Prevent menus from getting cropped (git controls dropdown, file tree dropdown, build button, editor tab options) * Additional option to access the file menu by right-clicking on the files and folders in the file tree * Rename files by double-clicking on files in the file tree and the editor tabs * Right-clicking on file tabs has new options and will now open at your cursor instead of in the middle of the tab * The git branch name above **Version Control** links to the repo for specific git providers * Currently available for all [multi-tenant](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) instances using GitHub or GitLab providers #### Product refinements[​](#product-refinements-4 "Direct link to Product refinements") * Added an error modal for RPC parsing errors when users attempt to invoke dbt commands (preview, compile, or general dbt invocations) * Enabled syntax highlighting for Jinja expression and statement delimiters * Clarified and renamed the options under the **Build** button * Changed the term for RPC status from `Compiling` to `Parsing` to match dbt-core construct * Implemented a new File Tree component to improve render time by 60% * Disabled the Local Storage of File Tree to prevent users from running into max LocalStorage issue for large projects * Changed snapshot snippet template (`__snapshot`) to a select from source #### Bug fixes[​](#bug-fixes-5 "Direct link to Bug fixes") * You no longer have file contents carrying over when you switch to a different project that has the same file name * The preview max limit no longer allows you to override the maximum * You no longer encounter node statuses failing to update in the history drawer for those on version 1.4 core. 
(This is a partial fix that may be fully addressed by core version 1.5) * You can now use the **Copy File Name** option to copy up to the last dot, rather than the first dot #### January 2023[​](#january-2023 "Direct link to January 2023") *  dbt Cloud IDE In the spirit of continuing to improve our [Cloud IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) experience, the dbt Labs team worked on fixing bugs, increasing reliability, and adding new features ✨. Learn more about the [January changes](https://getdbt.slack.com/archives/C03SAHKKG2Z/p1675272600286119) and what's coming soon. #### New features[​](#new-features-5 "Direct link to New features") * Improved syntax highlighting within the IDE for better Jinja-SQL combination (double quotes now show proper syntax highlight!) * Adjusted the routing URL for the IDE page and removed the `next` from the URL * Added a *new* easter egg within the IDE 🐶🦆 #### Product refinements[​](#product-refinements-5 "Direct link to Product refinements") * Performance improvements and reduced IDE slowness. The IDE should feel faster and snappier. 
* Reliability improvements – Improved error handling that previously put IDE in a bad state * Corrected the list of dropdown options for the Build button * Adjusted startup page duration * Added code snippets for `unique` and `not_null` tests for YAML files * Added code snippets for metrics based on environment dbt versions * Changed “commit and push” to “commit and sync” to better reflect the action * Improved error message when saving or renaming files to duplicate names #### Bug fixes[​](#bug-fixes-6 "Direct link to Bug fixes") * You no longer arbitrarily encounter an `RPC server got an unknown async ID` message * You can now see the build button dropdown, which had been hidden behind the placeholder DAG screen * You can now close toast notifications for command failure when the history drawer is open * You no longer encounter a `Something went wrong` message when previewing a model * You can now see repository status in the IDE, and the IDE finds the SSH folder * Scroll bars and download CSV no longer flicker within the preview pane --- ### 2024 dbt platform release notes dbt release notes for recent and historical changes. 
Release notes fall into one of the following categories: * **New:** New products and features * **Enhancement:** Performance improvements and feature enhancements * **Fix:** Bug and security fixes * **Behavior change:** A change to existing behavior that doesn't fit into the other categories, such as feature deprecations or changes to default settings Release notes are grouped by month for both multi-tenant and virtual private cloud (VPC)\* environments. \* The official release date for this new format of release notes is May 15th, 2024. Historical release notes for prior dates may not reflect all available features released earlier this year or their tenancy availability. #### December 2024[​](#december-2024 "Direct link to December 2024") * **New**: Saved queries now support [tags](https://docs.getdbt.com/reference/resource-configs/tags.md), which allow you to categorize your resources and filter them. Add tags to your [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) in the `semantic_model.yml` file or `dbt_project.yml` file. For example, in `dbt_project.yml`:

```yml
saved-queries:
  jaffle_shop:
    customer_order_metrics:
      +tags: order_metrics
```

* **New**: [Dimensions](https://docs.getdbt.com/reference/resource-configs/meta.md) now support the `meta` config property in [dbt Cloud **Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) and from dbt Core 1.9. You can add metadata to your dimensions to provide additional context and information about the dimension. Refer to [meta](https://docs.getdbt.com/reference/resource-configs/meta.md) for more information. * **New**: [Downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md) are now generally available to dbt Enterprise plans. Downstream exposures integrate natively with Tableau (Power BI coming soon) and auto-generate downstream lineage in dbt Explorer for a richer experience. 
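The `meta` config on dimensions described in the note above could be sketched in a semantic model YAML file along these lines (a hedged sketch assuming the dbt Core 1.9 / **Latest** release track syntax; the model, dimension, and metadata keys are hypothetical):

```yml
semantic_models:
  - name: orders
    model: ref('fct_orders')
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
        config:
          meta:
            # hypothetical metadata keys for illustration
            owner: finance_team
            certified: true
```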
* **New**: The Semantic Layer supports Sigma as a [partner integration](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md), available in Preview. Refer to [Sigma](https://help.sigmacomputing.com/docs/configure-a-dbt-semantic-layer-integration) for more information. * **New**: The Semantic Layer now supports Azure Single-tenant deployments. Refer to [Set up the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) for more information on how to get started. * **Fix**: Resolved intermittent issues in Single-tenant environments affecting Semantic Layer and query history. * **Fix**: [The dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) now respects the BigQuery [`execution_project` attribute](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#execution-project), including for exports. * **New**: [Model notifications](https://docs.getdbt.com/docs/deploy/model-notifications.md) are now generally available in dbt. These notifications alert model owners through email about any issues encountered by models and tests as soon as they occur while running a job. * **New**: You can now use your [Azure OpenAI key](https://docs.getdbt.com/docs/cloud/account-integrations.md?ai-integration=azure#ai-integrations) (available in beta) to use dbt features like [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) and [Ask dbt](https://docs.getdbt.com/docs/cloud-integrations/snowflake-native-app.md). Additionally, you can use your own [OpenAI API key](https://docs.getdbt.com/docs/cloud/account-integrations.md?ai-integration=openai#ai-integrations) or use the [dbt Labs-managed OpenAI](https://docs.getdbt.com/docs/cloud/account-integrations.md?ai-integration=dbtlabs#ai-integrations) key. Refer to [AI integrations](https://docs.getdbt.com/docs/cloud/account-integrations.md#ai-integrations) for more information. 
* **New**: The [`hard_deletes`](https://docs.getdbt.com/reference/resource-configs/hard-deletes.md) config gives you more control over how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. Note that `new_record` will create a new metadata column in the snapshot table. #### November 2024[​](#november-2024 "Direct link to November 2024") * **Enhancement**: Data health signals in dbt Explorer are now available for Exposures, providing a quick view of data health while browsing resources. To view trust signal icons, go to dbt Explorer and click **Exposures** under the **Resource** tab. Refer to [Data health signals for resources](https://docs.getdbt.com/docs/explore/data-health-signals.md) for more info. * **Fix**: Identified and fixed an error with Semantic Layer queries that take longer than 10 minutes to complete. * **Fix**: Job environment variable overrides in credentials are now respected for Exports. Previously, they were ignored. * **Behavior change**: If you use a custom microbatch macro, set a [`require_batched_execution_for_custom_microbatch_strategy` behavior flag](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#custom-microbatch-strategy) in your `dbt_project.yml` to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag as dbt will handle microbatching automatically for any model using the [microbatch strategy](https://docs.getdbt.com/docs/build/incremental-microbatch.md#how-microbatch-compares-to-other-incremental-strategies). * **Enhancement**: If you have Advanced CI's [compare changes](https://docs.getdbt.com/docs/deploy/advanced-ci.md#compare-changes) feature enabled, you can optimize performance when running comparisons by using custom dbt syntax to customize deferral usage, exclude specific large models (or groups of models with tags), and more. 
Refer to [Compare changes custom commands](https://docs.getdbt.com/docs/deploy/job-commands.md#compare-changes-custom-commands) for examples of how to customize the comparison command. * **New**: SQL linting in CI jobs is now generally available in dbt. You can enable SQL linting in your CI jobs, using [SQLFluff](https://sqlfluff.com/), to automatically lint all SQL files in your project as a run step before your CI job builds. SQLFluff linting is available on [dbt release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) and to dbt [Team or Enterprise](https://www.getdbt.com/pricing/) accounts. Refer to [SQL linting](https://docs.getdbt.com/docs/deploy/continuous-integration.md#sql-linting) for more information. * **New**: Use the [`dbt_valid_to_current`](https://docs.getdbt.com/reference/resource-configs/dbt_valid_to_current.md) config to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table. This feature is available in [the dbt Cloud **Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) (formerly called `Versionless`) and dbt Core v1.9 and later. * **New**: Use the [`event_time`](https://docs.getdbt.com/reference/resource-configs/event-time.md) configuration to specify "at what time did the row occur." This configuration is required for [Incremental microbatch](https://docs.getdbt.com/docs/build/incremental-microbatch.md) and can be added to ensure you're comparing overlapping times in [Advanced CI's compare changes](https://docs.getdbt.com/docs/deploy/advanced-ci.md). Available in [the dbt Cloud **Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) (formerly called `Versionless`) and dbt Core v1.9 and higher. 
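As a sketch of the `event_time` configuration from the note above, it could be set in `dbt_project.yml` like this (project, model, and column names are hypothetical):

```yml
models:
  my_project:
    web_sessions:
      # "at what time did the row occur" — used by microbatch models
      # and Advanced CI's compare changes
      +event_time: session_started_at
```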
* **Fix**: This update improves the [Semantic Layer Tableau integration](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md), making query parsing more reliable. Some key fixes include: * Error messages for unsupported joins between saved queries and ALL tables. * Improved handling of queries when multiple tables are selected in a data source. * Fixed a bug when an IN filter contained a large number of values. * Better error messaging for queries that can't be parsed correctly. * **Enhancement**: The Semantic Layer supports creating new credentials for users who don't have permissions to create service tokens. In the **Credentials & service tokens** side panel, the **+Add Service Token** option is unavailable for those users who don't have permission. Instead, the side panel displays a message indicating that the user doesn't have permission to create a service token and should contact their administrator. Refer to [Set up Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) for more details. 
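The snapshot configs mentioned in the notes above (`dbt_valid_to_current`, plus the `hard_deletes` option from the December notes) could be combined in a YAML snapshot definition roughly like this (a sketch assuming the YAML snapshot spec available in the **Latest** release track and dbt Core v1.9; the source, snapshot, and column names are hypothetical):

```yml
snapshots:
  - name: orders_snapshot
    relation: source('jaffle_shop', 'orders')
    config:
      unique_key: order_id
      strategy: timestamp
      updated_at: updated_at
      # use a far-future date instead of NULL for current records
      dbt_valid_to_current: "to_date('9999-12-31')"
      # track source deletions as new rows;
      # other options: ignore (default), invalidate
      hard_deletes: new_record
```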
#### October 2024[​](#october-2024 "Direct link to October 2024")  Coalesce 2024 announcements Documentation for new features and functionality announced at Coalesce 2024: * Iceberg table support for [Snowflake](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#iceberg-table-format) * [Athena](https://docs.getdbt.com/reference/resource-configs/athena-configs.md) and [Teradata](https://docs.getdbt.com/reference/resource-configs/teradata-configs.md) adapter support in dbt Cloud * dbt Cloud now hosted on [Azure](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) * Get comfortable with [dbt Cloud Release Tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) that keep your project up-to-date, automatically, on a cadence appropriate for your team * Scalable [microbatch incremental models](https://docs.getdbt.com/docs/build/incremental-microbatch.md) * Advanced CI [features](https://docs.getdbt.com/docs/deploy/advanced-ci.md) * [Linting with CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md#sql-linting) * dbt Assist is now [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) * Developer blog on [Snowflake Feature Store and dbt: A bridge between data pipelines and ML](https://docs.getdbt.com/blog/snowflake-feature-store) * [Downstream exposures with Tableau](https://docs.getdbt.com/docs/explore/view-downstream-exposures.md) * Semantic Layer integration with [Excel desktop and M365](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md) * [Data health tiles](https://docs.getdbt.com/docs/explore/data-tile.md) * [Semantic Layer and Cloud IDE integration](https://docs.getdbt.com/docs/build/metricflow-commands.md#metricflow-commands) * Query history in [Explorer](https://docs.getdbt.com/docs/explore/model-query-history.md#view-query-history-in-explorer) * Semantic Layer MetricFlow improvements, including [improved granularity and custom calendar](https://docs.getdbt.com/docs/build/metricflow-time-spine.md#custom-calendar) * [Python SDK](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python.md) is now generally available. - **Behavior change:** [Multi-factor authentication](https://docs.getdbt.com/docs/cloud/manage-access/mfa.md) is now enforced on all users who log in with username and password credentials. - **Enhancement**: The dbt Semantic Layer JDBC now allows users to paginate `semantic_layer.metrics()` and `semantic_layer.dimensions()` for metrics and dimensions using `page_size` and `page_number` parameters. Refer to [Paginate metadata calls](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md#querying-the-api-for-metric-metadata) for more information. - **Enhancement**: The dbt Semantic Layer JDBC now allows you to filter your metrics to include only those that contain a specific substring, using the `search` parameter. If no substring is provided, the query returns all metrics. Refer to [Fetch metrics by substring search](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md#querying-the-api-for-metric-metadata) for more information. - **Fix**: The [Semantic Layer Excel integration](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md) now correctly surfaces errors when a query fails to execute. Previously, it was not clear why a query failed to run. - **Fix:** Previously, POST requests to the Jobs API with invalid `cron` strings would return HTTP response status code 500s but would update the underlying entity. Now, POST requests to the Jobs API with invalid `cron` strings will result in status code 400s, without the underlying entity being updated. - **Fix:** Fixed an issue where the `Source` view page in dbt Explorer did not correctly display source freshness status if older than 30 days. - **Fix:** The UI now indicates when the description of a model is inherited from a catalog comment. - **Behavior change:** User API tokens have been deprecated. 
Update to [personal access tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) if you have any still in use. - **New**: The Cloud IDE supports signed commits for Git, available for Enterprise plans. You can sign your Git commits when pushing them to the repository to prevent impersonation and enhance security. Supported Git providers are GitHub and GitLab. Refer to [Git commit signing](https://docs.getdbt.com/docs/cloud/studio-ide/git-commit-signing.md) for more information. - **New:** With Mesh, you can now enable bidirectional dependencies across your projects. Previously, dbt enforced dependencies to only go in one direction. dbt checks for cycles across projects and raises errors if any are detected. For details, refer to [Cycle detection](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#cycle-detection). There's also the [Intro to Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) guide to help you learn best practices. - **New**: The [Semantic Layer Python software development kit](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python.md) is now [generally available](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md). It provides users with easy access to the Semantic Layer with Python and enables developers to interact with the Semantic Layer APIs to query metrics/dimensions in downstream tools. - **Enhancement**: You can now add a description to a singular data test. Use the [`description` property](https://docs.getdbt.com/reference/resource-properties/description.md) to document [singular data tests](https://docs.getdbt.com/docs/build/data-tests.md#singular-data-tests). You can also use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to capture your test description. 
The enhancement is available now in [the **Latest** release track in dbt Cloud](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md), and it will be included in dbt Core v1.9. - **New**: Introducing the [microbatch incremental model strategy](https://docs.getdbt.com/docs/build/incremental-microbatch.md) (beta), available now in [dbt Cloud Latest](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) and will soon be supported in dbt Core v1.9. The microbatch strategy allows for efficient, batch-based processing of large time-series datasets for improved performance and resiliency, especially when you're working with data that changes over time (like new records being added daily). To enable this feature in dbt Cloud, set the `DBT_EXPERIMENTAL_MICROBATCH` environment variable to `true` in your project. - **New**: The dbt Semantic Layer supports custom calendar configurations in MetricFlow, available in [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md#dbt-cloud). Custom calendar configurations allow you to query data using non-standard time periods like `fiscal_year` or `retail_month`. Refer to [custom calendar](https://docs.getdbt.com/docs/build/metricflow-time-spine.md#custom-calendar) to learn how to define these custom granularities in your MetricFlow timespine YAML configuration. - **New**: In the **Latest** release track in dbt, [Snapshots](https://docs.getdbt.com/docs/build/snapshots.md) have been updated to use YAML configuration files instead of SQL snapshot blocks. This new feature simplifies snapshot management and improves performance, and will soon be released in dbt Core 1.9. * Who does this affect? Users of the **Latest** release track in dbt can define snapshots using the new YAML specification. Users upgrading to **Latest** who have existing snapshot definitions can keep their existing configurations, or they can choose to migrate their snapshot definitions to YAML. 
* Users on older versions: No action is needed; existing snapshots will continue to work as before. However, we recommend upgrading to the **Latest** release track to take advantage of the new snapshot features. - **Behavior change:** Set [`state_modified_compare_more_unrendered_values`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#source-definitions-for-state) to true to reduce false positives for `state:modified` when configs differ between `dev` and `prod` environments. - **Behavior change:** Set the [`skip_nodes_if_on_run_start_fails`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#failures-in-on-run-start-hooks) flag to `True` to skip all selected resources from running if there is a failure on an `on-run-start` hook. - **Enhancement**: In the **Latest** release track in dbt Cloud, snapshots defined in SQL files can now use `config` defined in `schema.yml` YAML files. This update resolves the previous limitation that required snapshot properties to be defined exclusively in `dbt_project.yml` and/or a `config()` block within the SQL file. This will also be released in dbt Core 1.9. - **New**: In the **Latest** release track in dbt Cloud, the `snapshot_meta_column_names` config allows for customizing the snapshot metadata columns. This feature allows an organization to align these automatically-generated column names with their conventions, and will be included in the upcoming dbt Core 1.9 release. - **Enhancement**: the **Latest** release track in dbt Cloud infers a model's `primary_key` based on configured data tests and/or constraints within `manifest.json`. The inferred `primary_key` is visible in dbt Explorer and utilized by the dbt Cloud [compare changes](https://docs.getdbt.com/docs/deploy/run-visibility.md#compare-tab) feature. This will also be released in dbt Core 1.9. 
Read about the [order dbt infers columns can be used as primary key of a model](https://github.com/dbt-labs/dbt-core/blob/7940ad5c7858ff11ef100260a372f2f06a86e71f/core/dbt/contracts/graph/nodes.py#L534-L541). - **New:** dbt Explorer now includes trust signal icons, which are currently available as a [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md#dbt-cloud). Trust signals offer a quick, at-a-glance view of data health when browsing your dbt models in dbt Explorer. These icons indicate whether a model is **Healthy**, **Caution**, **Degraded**, or **Unknown**. For accurate health data, ensure the resource is up-to-date and has had a recent job run. Refer to [Data health signals](https://docs.getdbt.com/docs/explore/data-health-signals.md) for more information. - **New:** Downstream exposures are now available in Preview in dbt. Downstream exposures help users understand how their models are used in downstream analytics tools to inform investments and reduce incidents. It imports and auto-generates exposures based on Tableau dashboards, with user-defined curation. To learn more, refer to [Downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md). #### September 2024[​](#september-2024 "Direct link to September 2024") * **Fix**: MetricFlow updated `get_and_expire` to replace the unsupported `GETEX` command with a `GET` and conditional expiration, ensuring compatibility with Azure Redis 6.0. * **Enhancement**: The [dbt Semantic Layer Python SDK](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python.md) now supports `TimeGranularity` custom grain for metrics. This feature allows you to define custom time granularities for metrics, such as `fiscal_year` or `retail_month`, to query data using non-standard time periods. * **New**: Use the Copilot AI engine to generate semantic models for your models, now available in beta. 
Copilot automatically generates documentation, tests, and now semantic models based on the data in your model. To learn more, refer to [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md). * **New**: Use the new recommended syntax for [defining `foreign_key` constraints](https://docs.getdbt.com/reference/resource-properties/constraints.md) using `refs`, available in the **Latest** release track in dbt Cloud. This will soon be released in dbt Core v1.9. This new syntax captures dependencies and works across different environments. * **Enhancement**: You can now run [Semantic Layer commands](https://docs.getdbt.com/docs/build/metricflow-commands.md) in the [dbt Cloud IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md). The supported commands are `dbt sl list`, `dbt sl list metrics`, `dbt sl list dimension-values`, `dbt sl list saved-queries`, `dbt sl query`, `dbt sl list dimensions`, `dbt sl list entities`, and `dbt sl validate`. * **New**: Microsoft Excel, a Semantic Layer integration, is now generally available. The integration allows you to connect to Microsoft Excel to query metrics and collaborate with your team. Available for [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100\&rs=en-US\&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) or [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100\&rs=en-US\&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26\&isWac=True). For more information, refer to [Microsoft Excel](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md). * **New**: [Data health tile](https://docs.getdbt.com/docs/explore/data-tile.md) is now generally available in dbt Explorer. Data health tiles provide a quick at-a-glance view of your data quality, highlighting potential issues in your data. You can embed these tiles in your dashboards to quickly identify and address data quality issues in your dbt project. 
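The ref-based `foreign_key` constraint syntax mentioned above might look like the following (a sketch; the model and column names are hypothetical, and the exact property names should be checked against the constraints reference):

```yml
models:
  - name: orders
    columns:
      - name: customer_id
        constraints:
          # ref() captures the dependency and resolves
          # correctly across environments
          - type: foreign_key
            to: ref('customers')
            to_columns: [customer_id]
```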
* **New**: dbt Explorer's Model query history feature is now in Preview for dbt Enterprise customers. Model query history allows you to view the count of consumption queries for a model based on the data warehouse's query logs. This feature gives data teams insight, so they can focus their time and infrastructure spend on the most heavily used data products. To learn more, refer to [Model query history](https://docs.getdbt.com/docs/explore/model-query-history.md). * **Enhancement**: You can now use [Extended Attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) and [Environment Variables](https://docs.getdbt.com/docs/build/environment-variables.md) when connecting to the Semantic Layer. If you set a value directly in the Semantic Layer Credentials, it will have a higher priority than Extended Attributes. When using environment variables, the default value for the environment will be used. If you're using exports, job environment variable overrides aren't supported yet, but they will be soon. * **New:** There are two new [environment variable defaults](https://docs.getdbt.com/docs/build/environment-variables.md#dbt-cloud-context) — `DBT_CLOUD_ENVIRONMENT_NAME` and `DBT_CLOUD_ENVIRONMENT_TYPE`. * **New:** The [Amazon Athena warehouse connection](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-amazon-athena.md) is available as a public preview for dbt accounts that have upgraded to [the **Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md). #### August 2024[​](#august-2024 "Direct link to August 2024") * **Fix:** Fixed an issue in [dbt Explorer](https://docs.getdbt.com/docs/explore/explore-projects.md) where navigating to a consumer project from a public node resulted in displaying a random public model rather than the original selection. * **New**: You can now configure metrics at finer time grains, such as hour, minute, or even second. 
This is particularly useful for more detailed analysis and for datasets where high-resolution time data is required, such as minute-by-minute event tracking. Refer to [dimensions](https://docs.getdbt.com/docs/build/dimensions.md) for more information about time granularity.

* **Enhancement**: Microsoft Excel now supports [saved selections](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md#using-saved-selections) and [saved queries](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md#using-saved-queries). Use saved selections to save your query selections within the Excel application. The application also clears stale data in [trailing rows](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md#other-settings) by default. To return your results and keep any previously selected data intact, deselect the **Clear trailing rows** option.
* **Behavior change:** GitHub is no longer supported for OAuth login to dbt. Use a supported [SSO or OAuth provider](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md) to securely manage access to your dbt account.

#### July 2024[​](#july-2024 "Direct link to July 2024")

* **Behavior change:** `target_schema` is no longer a required configuration for [snapshots](https://docs.getdbt.com/docs/build/snapshots.md). You can now target different schemas for snapshots across development and deployment environments using the [schema config](https://docs.getdbt.com/reference/resource-configs/schema.md).
* **New:** [Connections](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md#connection-management) are now available under **Account settings** as a global setting. Previously, they were found under **Project settings**. This is being rolled out in phases over the coming weeks.
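With `target_schema` optional, a snapshot can rely on the standard `schema` config instead, which resolves per environment like any other schema setting. A minimal sketch, assuming a project named `my_project`:

```yaml
# dbt_project.yml — illustrative project name
snapshots:
  my_project:
    +schema: snapshots   # resolved relative to each environment's default schema
```

This lets development and deployment environments write snapshots to different schemas without any environment-specific configuration.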
* **New:** Admins can now assign [environment-level permissions](https://docs.getdbt.com/docs/cloud/manage-access/environment-permissions.md) to groups for specific roles.
* **New:** [Merge jobs](https://docs.getdbt.com/docs/deploy/merge-jobs.md) for implementing [continuous deployment (CD)](https://docs.getdbt.com/docs/deploy/continuous-deployment.md) workflows are now GA in dbt. Previously, you had to either set up a custom GitHub action or manually build the changes every time a pull request was merged.
* **New**: The ability to lint your SQL files from the dbt CLI is now available. To learn more, refer to [Lint SQL files](https://docs.getdbt.com/docs/cloud/configure-cloud-cli.md#lint-sql-files).
* **Behavior change:** The dbt Cloud IDE automatically adds a `--limit 100` to preview queries to avoid slow and expensive queries during development. Recently, dbt Core changed how the `limit` is applied to ensure that `order by` clauses are consistently respected. Because of this, queries that already contain a limit clause might now cause errors in IDE previews. To address this, dbt Labs plans to provide an option soon to disable the limit from being applied. Until then, dbt Labs recommends removing the (duplicate) limit clause from your queries during previews to avoid these IDE errors.
* **Enhancement**: Introducing a revamped overview page for dbt Explorer, available in beta. It includes a new design and layout for the dbt Explorer homepage. The new layout provides a more intuitive experience for users to navigate their dbt projects, as well as a new **Latest updates** section to view the latest changes or issues related to project resources. To learn more, refer to [Overview page](https://docs.getdbt.com/docs/explore/explore-projects.md#overview-page).
###### dbt Semantic Layer[​](#dbt-semantic-layer "Direct link to dbt Semantic Layer")

* **New**: Introduced the [`dbt-sl-sdk`](https://github.com/dbt-labs/semantic-layer-sdk-python) Python software development kit (SDK), which provides easy access to the dbt Semantic Layer from Python. It allows developers to interact with the dbt Semantic Layer APIs and query metrics and dimensions in downstream tools. Refer to the [dbt Semantic Layer Python SDK](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python.md) for more information.
* **New**: Introduced semantic validations in CI pipelines. Automatically test your semantic nodes (metrics, semantic models, and saved queries) during code reviews by adding warehouse validation checks in your CI job using the `dbt sl validate` command. You can also validate modified semantic nodes to guarantee code changes made to dbt models don't break these metrics. Refer to [Semantic validations in CI](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci) to learn about the additional commands and use cases.
* **New**: The `meta` field within the [config property](https://docs.getdbt.com/reference/resource-configs/meta.md) is now exposed for dbt Semantic Layer metrics in the [JDBC and GraphQL APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md).
* **New**: Added a new command in the dbt CLI called `export-all`, which allows you to export multiple or all of your saved queries. Previously, you had to explicitly specify the [list of saved queries](https://docs.getdbt.com/docs/build/metricflow-commands.md#list-saved-queries).
* **Enhancement**: The Semantic Layer now offers more granular control by supporting multiple data platform credentials, which can represent different roles or service accounts. Available for dbt Enterprise plans, you can map credentials to service tokens for secure authentication.
Refer to [Set up Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#set-up-dbt-semantic-layer) for more details.

* **Fix**: Addressed a bug where unicode query filters (such as Chinese characters) were not working correctly in the Semantic Layer Tableau integration.
* **Fix**: Resolved a bug with parsing certain private keys for BigQuery when running an export.
* **Fix**: Addressed a bug that caused a "closed connection" error to be returned when querying or running an export.
* **Fix**: Resolved an issue in dbt Core where, during partial parsing, all generated metrics in a file were incorrectly deleted instead of just those related to the changed semantic model. Now, only the metrics associated with the modified model are affected.

#### June 2024[​](#june-2024 "Direct link to June 2024")

* **New:** Introduced new granularity support for cumulative metrics in MetricFlow. Granularity options for cumulative metrics are slightly different from those for other metric types. For other metrics, we use the `date_trunc` function to implement granularity. However, because cumulative metrics are non-additive (values can't be added up), we can't use the `date_trunc` function to change their time grain. Instead, we use the `first()`, `last()`, and `avg()` aggregation functions to aggregate cumulative metrics over the requested period. By default, we take the first value of the period. You can change this behavior by using the `period_agg` parameter. For more information, refer to [Granularity options for cumulative metrics](https://docs.getdbt.com/docs/build/cumulative.md#granularity-options).

###### dbt Semantic Layer[​](#dbt-semantic-layer-1 "Direct link to dbt Semantic Layer")

* **New:** Added support for predicate pushdown SQL optimization in MetricFlow. We will now push down categorical dimension filters to the metric source table. Previously, filters were applied after we selected from the metric source table.
This change helps reduce full table scans on certain query engines.

* **New:** Enabled `where` filters on dimensions (included in saved queries) to use the cache during query time. This means you can now dynamically filter your dashboards without losing the performance benefits of caching. Refer to [caching](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md#result-caching) for more information.
* **Enhancement:** In [Google Sheets](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md), we added information icons and descriptions to metric and dimension options in the Query Builder menu. Click the **Info** icon button to view a description of the metric or dimension. Available in the following Query Builder menu sections: metric, group by, where, saved selections, and saved queries.
* **Enhancement:** In [Google Sheets](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md), you can now apply granularity to all time dimensions, not just metric time. This update uses our [APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) to support granularity selection on any chosen time dimension.
* **Enhancement**: MetricFlow time spine warnings now prompt users to configure missing or smaller-grain time spines. An error message is displayed when there are multiple time spines per granularity.
* **Enhancement**: Errors now display if no time spine is configured at the requested or a smaller granularity.
* **Enhancement:** Improved the error message displayed when querying with no Semantic Layer credentials set.
* **Enhancement:** Querying grains for cumulative metrics now returns multiple granularity options (day, week, month, quarter, year), like all other metric types. Previously, you could only query one grain option for cumulative metrics.
* **Fix:** Removed errors that prevented querying cumulative metrics with other granularities.
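The `period_agg` behavior for cumulative metrics described under June above can be sketched in a metric definition like this (metric and measure names are illustrative, and the exact nesting under `cumulative_type_params` is an assumption based on the cumulative-metric spec):

```yaml
# models/metrics.yml — illustrative sketch
metrics:
  - name: cumulative_revenue
    label: Cumulative revenue
    type: cumulative
    type_params:
      measure: revenue
      cumulative_type_params:
        # how values aggregate to coarser grains:
        # first (default), last, or average
        period_agg: last
```

With `period_agg: last`, querying this metric at month grain returns the last cumulative value within each month rather than the first.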
* **Fix:** Fixed various Tableau errors when querying certain metrics or when using calculated fields.
* **Fix:** In Tableau, we relaxed naming field expectations to better identify calculated fields.
* **Fix:** Fixed an error when refreshing database metadata for columns that we can't convert to Arrow. These columns will now be skipped. This mainly affected Redshift users with custom types.
* **Fix:** Fixed Private Link connections for Databricks.

###### Also available this month:[​](#also-available-this-month "Direct link to Also available this month:")

* **Enhancement:** Updates to the UI when [creating merge jobs](https://docs.getdbt.com/docs/deploy/merge-jobs.md) are now available. The updates include improvements to helper text, new deferral settings, and performance improvements.
* **New**: The Semantic Layer now offers a seamless integration with Microsoft Excel, available in [preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md#dbt-cloud). Build semantic layer queries and return data on metrics directly within Excel, through a custom menu. To learn more and install the add-on, check out [Microsoft Excel](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md).
* **New:** [Job warnings](https://docs.getdbt.com/docs/deploy/job-notifications.md) are now GA. Previously, you could receive email or Slack alerts about your jobs when they succeeded, failed, or were canceled. Now with the new **Warns** option, you can also receive alerts when jobs have encountered warnings from tests or source freshness checks during their run. This gives you more flexibility on *when* to be notified.
* **New:** A [preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md#dbt-cloud) of the dbt Snowflake Native App is now available. With this app, you can access dbt Explorer, the **Ask dbt** chatbot, and orchestration observability features, extending your dbt experience into the Snowflake UI.
To learn more, check out [About the dbt Snowflake Native App](https://docs.getdbt.com/docs/cloud-integrations/snowflake-native-app.md) and [Set up the dbt Snowflake Native App](https://docs.getdbt.com/docs/cloud-integrations/set-up-snowflake-native-app.md).

#### May 2024[​](#may-2024 "Direct link to May 2024")

* **Enhancement:** We've now introduced a new **Prune branches** [Git button](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md#prune-branches-modal) in the IDE. This button allows you to delete local branches that have been deleted from the remote repository, keeping your branch management tidy. Available in all regions now and will be released to single tenant accounts during the next release cycle.

###### dbt Cloud Launch Showcase event[​](#dbt-cloud-launch-showcase-event "Direct link to dbt Cloud Launch Showcase event")

The following features are new or enhanced as part of our [dbt Launch Showcase](https://www.getdbt.com/resources/webinars/dbt-cloud-launch-showcase) event on May 14th, 2024:

* **New:** [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) is a powerful AI engine helping you generate documentation, tests, and semantic models, saving you time as you deliver high-quality data. Available in private beta for a subset of dbt Enterprise users and in the IDE. [Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta.
* **New:** The new low-code editor, now in private beta, enables less SQL-savvy analysts to create or edit dbt models through a visual, drag-and-drop experience inside of dbt. These models compile directly to SQL and are indistinguishable from other dbt models in your projects: they are version-controlled, can be accessed across projects in Mesh, and integrate with dbt Explorer and the Cloud IDE.
[Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta.

* **New:** [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) is now Generally Available (GA) to all users. The dbt CLI is a command-line interface that allows you to interact with dbt, use automatic deferral, leverage Mesh, and more!
* **New:** [Unit tests](https://docs.getdbt.com/docs/build/unit-tests.md) are now GA in dbt. Unit tests enable you to test your SQL model logic against a set of static inputs.
* **New:** Native support in dbt Cloud for Azure Synapse Analytics is now available as a [preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md#dbt-cloud)! To learn more, refer to [Connect Azure Synapse Analytics](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-azure-synapse-analytics.md) and [Microsoft Azure Synapse DWH configurations](https://docs.getdbt.com/reference/resource-configs/azuresynapse-configs.md). Also, check out the [Quickstart for dbt Cloud and Azure Synapse Analytics](https://docs.getdbt.com/guides/azure-synapse-analytics.md?step=1). The guide walks you through:
  * Loading the Jaffle Shop sample data (provided by dbt Labs) into Azure Synapse Analytics.
  * Connecting dbt Cloud to Azure Synapse Analytics.
  * Turning a sample query into a model in your dbt project. A model in dbt is a SELECT statement.
  * Adding tests to your models.
  * Documenting your models.
  * Scheduling a job to run.
* **New:** MetricFlow now enables you to add metrics as dimensions to your metric filters to create more complex metrics and gain more insights. Available for all Semantic Layer users.
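Using a metric as a dimension in a metric filter, as described above, is done with the `Metric(...)` template in a `filter` expression. A hedged sketch with illustrative metric, measure, and entity names:

```yaml
# models/metrics.yml — illustrative sketch
metrics:
  - name: revenue_large_accounts
    label: Revenue (large accounts)
    type: simple
    type_params:
      measure: revenue
    # filter this metric by the value of another metric,
    # grouped by an entity (names are assumptions)
    filter: |
      {{ Metric('lifetime_value', group_by=['account']) }} > 1000
```

The same `Metric(...)` template can be used in ad hoc `where` filters at query time.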
* **New:** [Staging environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#staging-environment) is now GA. Use staging environments to grant developers access to deployment workflows and tools while controlling access to production data. Available to all dbt users.
* **New:** OAuth login support via [Databricks](https://docs.getdbt.com/docs/cloud/manage-access/set-up-databricks-oauth.md) is now GA to Enterprise customers.
* **New:** dbt Explorer's current capabilities — including column-level lineage, model performance analysis, and project recommendations — are now Generally Available for dbt Cloud Enterprise and Team plans. With Explorer, you can more easily navigate your dbt Cloud project – including models, sources, and their columns – to gain a better understanding of its latest production or staging state. To learn more about its features, check out:
  * [Explore projects](https://docs.getdbt.com/docs/explore/explore-projects.md)
  * [Explore multiple projects](https://docs.getdbt.com/docs/explore/explore-multiple-projects.md)
  * [Column-level lineage](https://docs.getdbt.com/docs/explore/column-level-lineage.md)
  * [Model performance](https://docs.getdbt.com/docs/explore/model-performance.md)
  * [Project recommendations](https://docs.getdbt.com/docs/explore/project-recommendations.md)
* **New:** Native support for Microsoft Fabric in dbt is now GA. This feature is powered by the [dbt-fabric](https://github.com/Microsoft/dbt-fabric) adapter. To learn more, refer to [Connect Microsoft Fabric](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-microsoft-fabric.md) and [Microsoft Fabric DWH configurations](https://docs.getdbt.com/reference/resource-configs/fabric-configs.md). There's also a [quickstart guide](https://docs.getdbt.com/guides/microsoft-fabric.md?step=1) to help you get started.
* **New:** Mesh is now GA to dbt Enterprise users.
Mesh is a framework that helps organizations scale their teams and data assets effectively. It promotes governance best practices and breaks large projects into manageable sections. Get started with Mesh by reading the [Mesh quickstart guide](https://docs.getdbt.com/guides/mesh-qs.md?step=1).

* **New:** The Semantic Layer [Tableau Desktop, Tableau Server](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md), and [Google Sheets](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md) integrations are now GA to dbt Team or Enterprise accounts. These first-class integrations allow you to query and unlock valuable insights from your data ecosystem.
* **Enhancement:** As part of our ongoing commitment to improving the [IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#considerations), the filesystem now comes with improvements to speed up dbt development, such as introducing a Git repository limit of 10GB.

###### Also available this month:[​](#also-available-this-month-1 "Direct link to Also available this month:")

* **Update**: The [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) is now available for Azure single tenant and is accessible in all [deployment regions](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for both multi-tenant and single-tenant accounts.
* **New**: The [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) introduces [declarative caching](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md), allowing you to cache common queries to speed up performance and reduce query compute costs. Available for dbt Team or Enterprise accounts.
* **New:** The **Latest** Release Track is now Generally Available (previously Public Preview).
On this release track, you get automatic upgrades of dbt, including early access to the latest features, fixes, and performance improvements for your dbt project. dbt Labs will handle upgrades behind-the-scenes, as part of testing and redeploying the dbt Cloud application — just like other dbt Cloud capabilities and other SaaS tools that you're using. No more manual upgrades and no more need for *a second sandbox project* just to try out new features in development. To learn more about the new setting, refer to [Release Tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) for details.

[![Example of the Latest setting](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-environment-settings.png?v=2 "Example of the Latest setting")](#)Example of the Latest setting

* **Behavior change:** Introduced the `require_resource_names_without_spaces` flag, opt-in and disabled by default. If set to `True`, dbt will raise an exception if it finds a resource name containing a space in your project or an installed package. This will become the default in a future version of dbt. Read [No spaces in resource names](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#no-spaces-in-resource-names) for more information.

#### April 2024[​](#april-2024 "Direct link to April 2024")

* **New:** [Merge jobs](https://docs.getdbt.com/docs/deploy/merge-jobs.md) are now in [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles"). You can now set up a continuous deployment (CD) workflow for your projects natively in dbt Cloud. Merge jobs are a new [job type](https://docs.getdbt.com/docs/deploy/jobs.md) that enables you to trigger dbt job runs as soon as changes (via Git pull requests) merge into production.
[![Example of creating a merge job](/img/docs/dbt-cloud/using-dbt-cloud/example-create-merge-job.png?v=2 "Example of creating a merge job")](#)Example of creating a merge job

* **Behavior change:** Introduced the `require_explicit_package_overrides_for_builtin_materializations` flag, opt-in and disabled by default. If set to `True`, dbt will only use built-in materializations defined in the root project or within dbt, rather than implementations in packages. This will become the default in May 2024 (dbt Core v1.8 and dbt Cloud release tracks). Read [Package override for built-in materialization](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#package-override-for-built-in-materialization) for more information.

**Semantic Layer**

* **New**: Use saved selections to [save your query selections](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md#using-saved-selections) within the [Google Sheets application](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md). They can be made private or public and refresh upon loading.
* **New**: Metrics are now displayed by their labels instead of their `metric_name`.
* **Enhancement**: [Metrics](https://docs.getdbt.com/docs/build/metrics-overview.md) now support the [`meta` option](https://docs.getdbt.com/reference/resource-configs/meta.md) under the [config](https://docs.getdbt.com/reference/resource-properties/config.md) property. Previously, we only supported the now-deprecated `meta` tag.
* **Enhancement**: In the Google Sheets application, we added [support](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md#using-saved-queries) to allow jumping off from or exploring MetricFlow-defined saved queries directly.
* **Enhancement**: In the Google Sheets application, we added support to query dimensions without metrics. Previously, you needed a metric.
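The `meta` option under the `config` property, noted above, attaches arbitrary key-value metadata to a metric. A minimal sketch with illustrative keys:

```yaml
# models/metrics.yml — illustrative metric and metadata keys
metrics:
  - name: revenue
    label: Revenue
    type: simple
    type_params:
      measure: revenue
    config:
      meta:
        owner: finance_team   # any key-value pairs you need
        tier: gold
```

Putting `meta` under `config` replaces the deprecated top-level `meta` tag mentioned above.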
* **Enhancement**: In the Google Sheets application, we added support for time presets and complex time range filters such as "between", "after", and "before".
* **Enhancement**: In the Google Sheets application, we added support to automatically populate dimension values when you select a "where" filter, removing the need to manually type them. Previously, you needed to manually type the dimension values.
* **Enhancement**: In the Google Sheets application, we added support to directly query entities, expanding the flexibility of data requests.
* **Enhancement**: In the Google Sheets application, we added an option to exclude column headers, which is useful for populating templates with only the required data.
* **Deprecation**: For the Tableau integration, the [`METRICS_AND_DIMENSIONS` data source](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md#using-the-integration) has been deprecated for all accounts not actively using it. We encourage users to transition to the "ALL" data source for future integrations.

#### March 2024[​](#march-2024 "Direct link to March 2024")

* **New:** The Semantic Layer services now support using PrivateLink for customers who have it enabled.
* **New:** You can now develop against and test your Semantic Layer in the dbt CLI if your developer credential uses SSO.
* **Enhancement:** You can select entities to Group By, Filter By, and Order By.
* **Fix:** `dbt parse` no longer shows an error when you use a list of filters (instead of just a string filter) on a metric.
* **Fix:** `join_to_timespine` now properly gets applied to conversion metric input measures.
* **Fix:** Fixed an issue where exports in Redshift were not always committing to the DWH, which also had the side effect of leaving table locks open.
* **Behavior change:** Introduced the `source_freshness_run_project_hooks` flag, opt-in and disabled by default. If set to `True`, dbt will include `on-run-*` project hooks in the `source freshness` command.
This will become the default in a future version of dbt. Read [Project hooks with source freshness](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#project-hooks-with-source-freshness) for more information.

#### February 2024[​](#february-2024 "Direct link to February 2024")

* **New:** [Exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md#define-exports) allow you to materialize a saved query as a table or view in your data platform. By using exports, you can unify metric definitions in your data platform and query them as you would any other table or view.
* **New:** You can access a list of your [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) with the new `list saved-queries` command by adding the `--show-exports` flag.
* **New:** The Semantic Layer and [Tableau Connector](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md) now support relative date filters in Tableau.
* **New:** You can now use the [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) feature with the [dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), allowing you to query reliable metrics and enable fast data reporting. Exports enhance the saved queries feature, allowing you to write commonly used queries directly within your data platform using dbt Cloud's job scheduler. By exposing tables of metrics and dimensions, exports enable you to integrate with additional tools that don't natively connect with the dbt Semantic Layer, such as PowerBI. Exports are available for dbt Cloud multi-tenant [Team or Enterprise](https://www.getdbt.com/pricing/) plans on dbt versions 1.7 or newer. Refer to the [exports blog](https://www.getdbt.com/blog/announcing-exports-for-the-dbt-semantic-layer) for more details.
[![Add an environment variable to run exports in your production run.](/img/docs/dbt-cloud/semantic-layer/deploy_exports.png?v=2 "Add an environment variable to run exports in your production run.")](#)Add an environment variable to run exports in your production run.

* **New:** Trigger on job completion. Now available for dbt Cloud Team and Enterprise plans is the ability to trigger deploy jobs when other deploy jobs are complete. You can enable this feature [in the UI](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) with the **Run when another job finishes** option in the **Triggers** section of your job or with the [Create Job API endpoint](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/Create%20Job). When enabled, your job will run after the specified upstream job completes. You can configure which run status(es) will trigger your job. It can be just on `Success` or on all statuses. If you have dependencies between your dbt projects, this allows you to *natively* orchestrate your jobs within dbt Cloud — no need to set up a third-party tool. An example of the **Triggers** section when creating the job:

[![Example of Triggers on the Deploy Job page](/img/docs/dbt-cloud/using-dbt-cloud/example-triggers-section.png?v=2 "Example of Triggers on the Deploy Job page")](#)Example of Triggers on the Deploy Job page

* **New:** The Latest Release Track is now in [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles"). *Now available in the dbt version dropdown in dbt Cloud — starting with select customers, rolling out to wider availability through February and March.* On this release track, you get automatic upgrades of dbt, including early access to the latest features, fixes, and performance improvements for your dbt project.
dbt Labs will handle upgrades behind-the-scenes, as part of testing and redeploying the dbt Cloud application — just like other dbt Cloud capabilities and other SaaS tools that you're using. No more manual upgrades and no more need for *a second sandbox project* just to try out new features in development. To learn more about the new setting, refer to [Release Tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) for details.

[![Example of the Latest setting](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-environment-settings.png?v=2 "Example of the Latest setting")](#)Example of the Latest setting

* **New:** Override dbt version with new User development settings. You can now [override the dbt version](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#override-dbt-version) that's configured for the development environment within your project and use a different version — affecting only your user account. This lets you test new dbt features without impacting other people working on the same project. And when you're satisfied with the test results, you can safely upgrade the dbt version for your project(s). Use the **dbt version** dropdown to specify the version to override with. It's available on your project's credentials page in the **User development settings** section. For example:

[![Example of overriding the dbt version on your user account](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png?v=2 "Example of overriding the dbt version on your user account")](#)Example of overriding the dbt version on your user account

* **Enhancement:** Edit in primary git branch in the IDE. You can now edit, format, or lint files and execute dbt commands directly in your primary git branch in the [dbt Cloud IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md).
This enhancement is available across various repositories, including native integrations, imported git URLs, and managed repos. It is currently available in all dbt Cloud multi-tenant regions and will soon be available to single-tenant accounts.

The primary branch of the connected git repo has traditionally been *read-only* in the IDE. This update changes the branch to *protected* and allows direct edits. When a commit is made, dbt Cloud will prompt you to create a new branch and pre-populate the new branch name with `GIT_USERNAME-patch-#`; however, you can edit the field with a custom branch name.

Previously, the primary branch was displayed as read-only, but now the branch is displayed with a lock icon to identify it as protected:

[![Previous read-only experience](/img/docs/dbt-cloud/using-dbt-cloud/read-only.png?v=2 "Previous read-only experience")](#)Previous read-only experience

[![New protected experience](/img/docs/dbt-cloud/using-dbt-cloud/protected.png?v=2 "New protected experience")](#)New protected experience

When you make a commit while on the primary branch, a modal window will open prompting you to create a new branch and enter a commit message:

[![Create new branch window](/img/docs/dbt-cloud/using-dbt-cloud/create-new-branch.png?v=2 "Create new branch window")](#)Create new branch window

* **Enhancement:** The Semantic Layer [Google Sheets integration](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md) now exposes a note on the cell where the data was requested, making data requests clearer. The integration also exposes a new **Time Range** option, which allows you to quickly select date ranges.
* **Enhancement:** The [GraphQL API](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md) includes a `requiresMetricTime` parameter to better handle metrics that must be grouped by time. (Certain metrics defined in MetricFlow can't be looked at without a time dimension.)
* **Enhancement:** Enable querying offset and cumulative metrics with the time dimension name, instead of `metric_time`. [Issue #1000](https://github.com/dbt-labs/metricflow/issues/1000) * Enable querying `metric_time` without metrics. [Issue #928](https://github.com/dbt-labs/metricflow/issues/928) * **Enhancement:** Added support for consistent SQL query generation, which enables ID generation consistency between otherwise identical MetricFlow queries. Previously, the SQL generated by `MetricFlowEngine` was not completely consistent between identical queries. [Issue #1020](https://github.com/dbt-labs/metricflow/issues/1020) * **Fix:** The Tableau Connector now returns a date filter when filtering by dates. Previously, it erroneously returned a timestamp filter. * **Fix:** MetricFlow now validates that each query contains `metrics`, `group by`, or `saved_query` items. Previously, there was no validation. [Issue #1002](https://github.com/dbt-labs/metricflow/issues/1002) * **Fix:** Measures using `join_to_timespine` in MetricFlow now have filters applied correctly after the time spine join. * **Fix:** Querying multiple granularities with offset metrics: * Previously, if you queried a time offset metric with multiple instances of `metric_time`/`agg_time_dimension`, only one of the instances was offset when all of them should have been. * Previously, if you queried a time offset metric with one instance of `metric_time`/`agg_time_dimension` but filtered by a different one, the query failed. * **Fix:** MetricFlow now prioritizes a candidate join type over the default type when evaluating nodes to join. For example, the default join type for distinct values queries is `FULL OUTER JOIN`; however, time spine joins require the more appropriate `CROSS JOIN`. * **Fix:** Fixed a bug that previously caused errors when entities were referenced in `where` filters.
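The object syntax referenced in the last fix above is the same one used when filtering metrics. A minimal sketch of a filtered metric definition, assuming MetricFlow's `Dimension()`/`Entity()` object syntax; the metric, measure, and dimension names are illustrative, not from any real project:

```yaml
metrics:
  - name: us_order_count            # illustrative metric name
    label: US order count
    type: simple
    type_params:
      measure: order_count          # illustrative measure name
    # Object syntax in a filter: Dimension()/Entity() references
    # are resolved by MetricFlow when the query is generated.
    filter: |
      {{ Dimension('order__country') }} = 'US'
```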
#### January 2024[​](#january-2024 "Direct link to January 2024") *  January docs updates Hello from the dbt Docs team: @mirnawong1, @matthewshaver, @nghi-ly, and @runleonarun! First, we’d like to thank the 10 new community contributors to docs.getdbt.com 🙏 What a busy start to the year! We merged 110 PRs in January. Here's how we improved the [docs.getdbt.com](http://docs.getdbt.com/) experience: * Added new hover behavior for images * Added new expandables for FAQs * Pruned outdated notices and snippets as part of the docs site maintenance January saw some great new content: * New [dbt Mesh FAQs](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-5-faqs.md) page * Beta launch of [Explorer’s column-level lineage](https://docs.getdbt.com/docs/explore/column-level-lineage.md) feature * Developer blog posts: * [More time coding, less time waiting: Mastering defer in dbt](https://docs.getdbt.com/blog/defer-to-prod) * [Deprecation of dbt Server](https://docs.getdbt.com/blog/deprecation-of-dbt-server) * From the community: [Serverless, free-tier data stack with dlt + dbt core](https://docs.getdbt.com/blog/serverless-dlt-dbt-stack) * The Extrica team added docs for the [dbt-extrica community adapter](https://docs.getdbt.com/docs/local/connect-data-platform/extrica-setup.md) * Semantic Layer: New [conversion metrics docs](https://docs.getdbt.com/docs/build/conversion.md) and added the parameter `fill_nulls_with` to all metric types (launched the week of January 12, 2024) * New [dbt environment command](https://docs.getdbt.com/reference/commands/dbt-environment.md) and its flags for the dbt CLI January also saw some refreshed content, either aligning with new product features or requests from the community: * Native support for [partial parsing in dbt Cloud](https://docs.getdbt.com/docs/cloud/account-settings.md#partial-parsing) * Updated guidance on using dots or underscores in the [Best practice guide for 
models](https://docs.getdbt.com/best-practices/how-we-style/1-how-we-style-our-dbt-models.md) * Updated [PrivateLink for VCS docs](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-self-hosted.md) * Added a new `job_runner` role in our [Enterprise project role permissions docs](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#project-role-permissions) * Added saved queries to [Metricflow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md#list-saved-queries) * Removed [as\_text docs](https://github.com/dbt-labs/docs.getdbt.com/pull/4726) that were wildly outdated * **New:** New metric type that allows you to measure conversion events. For example, users who viewed a web page and then filled out a form. For more details, refer to [Conversion metrics](https://docs.getdbt.com/docs/build/conversion.md). * **New:** Instead of specifying the fully qualified dimension name (for example, `order__user__country`) in the group by or filter expression, you now only need to provide the primary entity and dimension name, like `user__country`. * **New:** You can now query the [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) you've defined in the Semantic Layer using [Tableau](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md), [GraphQL API](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md), [JDBC API](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md), and the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md). *  New: Native support for partial parsing By default, dbt parses all the files in your project at the beginning of every dbt invocation. Depending on the size of your project, this operation can take a long time to complete. With the new partial parsing feature in dbt Cloud, you can reduce the time it takes for dbt to parse your project.
When enabled, dbt Cloud parses only the changed files in your project instead of parsing all the project files. As a result, your dbt invocations will take less time to run. To learn more, refer to [Partial parsing](https://docs.getdbt.com/docs/cloud/account-settings.md#partial-parsing). [![Example of the Partial parsing option](/img/docs/deploy/account-settings-partial-parsing.png?v=2 "Example of the Partial parsing option")](#)Example of the Partial parsing option * **Enhancement:** The YAML spec parameter `label` is now available for Semantic Layer metrics in [JDBC and GraphQL APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md). This means you can conveniently use `label` as a display name for your metrics when exposing them. * **Enhancement:** Added support for `create_metric: true` for a measure, which is a shorthand to quickly create metrics. This is useful in cases when metrics are only used to build other metrics. * **Enhancement:** Added support for Tableau parameter filters. You can use the [Tableau connector](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md) to create and use parameters with your Semantic Layer data. * **Enhancement:** Added support to expose `expr` and `agg` for [Measures](https://docs.getdbt.com/docs/build/measures.md) in the [GraphQL API](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md). * **Enhancement:** Improved error messages in the command-line interface when querying a dimension that isn't reachable for a given metric. * **Enhancement:** You can now query entities using our Tableau integration (similar to querying dimensions). * **Enhancement:** A new data source called "ALL" is available in our Tableau integration; it contains all defined semantic objects and has the same information as "METRICS\_AND\_DIMENSIONS". In the future, we will deprecate "METRICS\_AND\_DIMENSIONS" in favor of "ALL" for clarity.
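A sketch of how the `create_metric: true` shorthand and the new conversion metric type might look together in semantic model YAML; all model, measure, dimension, and entity names here are illustrative, not from any real project:

```yaml
semantic_models:
  - name: web_events
    model: ref('web_events')        # illustrative model name
    defaults:
      agg_time_dimension: event_time
    entities:
      - name: user
        type: primary
    dimensions:
      - name: event_time
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: page_views
        agg: count
        expr: event_id
        # Shorthand: also exposes a metric named `page_views`
        # without a separate metrics: entry.
        create_metric: true
      - name: form_fills
        agg: count
        expr: form_fill_id

metrics:
  # Conversion metric: users who viewed a page and then
  # filled out a form within 7 days of the view.
  - name: view_to_form_conversion
    type: conversion
    type_params:
      conversion_type_params:
        base_measure:
          name: page_views
        conversion_measure:
          name: form_fills
        entity: user
        window: 7 days
```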
* **Fix:** Support for numeric types with precision greater than 38 (like `BIGDECIMAL`) in BigQuery is now available. Previously, these types were unsupported and returned an error. * **Fix:** In some instances, large numeric dimensions were being interpreted by Tableau in scientific notation, making them hard to use. These are now displayed as numbers, as expected. * **Fix:** Dimension values are now preserved accurately instead of being inadvertently converted into strings. * **Fix:** Resolved naming collisions in queries involving multiple derived metrics that use the same metric input. Input metrics are now deduplicated, ensuring each is referenced only once. * **Fix:** Resolved warnings caused by using duplicate input measures in a derived metric. Input measures are now deduplicated, enhancing query processing and clarity. * **Fix:** Resolved an error where referencing an entity in a filter using the object syntax would fail. For example, `{{Entity('entity_name')}}` would fail to resolve. --- ### 2025 dbt platform release notes dbt release notes for recent and historical changes.
Release notes fall into one of the following categories: * **New:** New products and features * **Enhancement:** Performance improvements and feature enhancements * **Fix:** Bug and security fixes * **Behavior change:** A change to existing behavior that doesn't fit into the other categories, such as feature deprecations or changes to default settings Release notes are grouped by month for both multi-tenant and virtual private cloud (VPC) environments. #### December 2025[​](#december-2025 "Direct link to December 2025") * **New**: [Global navigation](https://docs.getdbt.com/docs/explore/global-navigation.md) is now the default experience for Catalog, providing a unified search experience that lets you find dbt resources across all your projects, as well as non-dbt resources in Snowflake. Global navigation is now generally available to all users. You can access Catalog by clicking **Catalog** in the top-level navigation. * **Enhancement**: dbt SSO slugs are now system-generated during SSO setup and aren't customizable. SSO slug configurations currently in use will remain valid; they will be read-only and cannot be changed. If you delete your existing SSO configuration and create a new one, you'll be provided with a new system-generated SSO slug. This change enhances security and prevents accounts from setting slugs that "impersonate" other organizations. * **Enhancement**: For users in the default region (`US1`) who previously created a dbt account, the dbt VS Code extension now supports registering with OAuth. This makes it easier to register the extension for users who may have forgotten their password or are locked out of their account. For more information, see [Register the extension](https://docs.getdbt.com/docs/install-dbt-extension.md#register-the-extension).
* **New and enhancements:** The dbt [Studio IDE user interface](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md) has been enhanced to bring more powerful development features to your fingertips: * A newly designed toolbar that groups all of your action and project insight tabs for easy access. * A dedicated inline **Commands** tab for history and logs. * When you upgrade your development environment to the dbt Fusion engine, the environment includes a new **Problems** tab that gives you live error detection on issues that could block your project from running successfully. #### November 2025[​](#november-2025 "Direct link to November 2025") * **Behavior change**: [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) now requires all input files to use UTF-8 encoding. Files that use other encodings will return an error. If you're working with legacy files that use a different encoding, convert them to UTF-8 before using Copilot. * **Enhancement**: dbt Copilot now has improved reliability when working with OpenAI. This includes longer timeouts, better retry behavior, and improved handling of reasoning messages for long code generations, resulting in fewer failures and more successful completions. * **New**: The Snowflake adapter now supports basic table materialization on Iceberg tables registered in a Glue catalog through a [catalog-linked database](https://docs.snowflake.com/en/user-guide/tables-iceberg-catalog-linked-database#label-catalog-linked-db-create). For more information, see [Glue Data Catalog](https://docs.getdbt.com/docs/mesh/iceberg/snowflake-iceberg-support.md#external-catalogs). * **New**: You can use the `platform_detection_timeout_seconds` parameter to control how long the Snowflake connector waits when detecting the cloud platform where the connection is being made. 
For more information, see [Snowflake setup](https://docs.getdbt.com/docs/local/connect-data-platform/snowflake-setup.md#platform_detection_timeout_seconds). * **New**: The `cluster_by` configuration is supported in dynamic tables. For more information, see [Dynamic table clustering](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#dynamic-table-clustering). * **New**: When jobs exceed their configured timeout, the BigQuery adapter sends a cancellation request to the BigQuery job. For more information, see [Connect BigQuery](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-bigquery.md#job-creation-timeout-seconds). #### October 2025[​](#october-2025 "Direct link to October 2025") * **Enhancement**: dbt enforces cumulative log size limits on run endpoints. If logs exceed this limit, dbt omits them and displays a banner. For more information, see [Run visibility](https://docs.getdbt.com/docs/deploy/run-visibility.md#log-size-limits). * **New**: The [docs.getdbt.com](http://docs.getdbt.com/) documentation site has introduced an LLM Context menu on all product documentation and guide pages. This menu provides users with quick options to interact with the current page using LLMs. You can now: * Copy the page as raw Markdown — This makes it easier to reference or reuse documentation content. * Open the page directly in ChatGPT or Claude — This redirects you to a chat with the LLM and automatically loads a message asking it to read the page, helping you start a conversation with context from the page. [![LLM Context menu on documentation pages](/img/llm-menu.png?v=2 "LLM Context menu on documentation pages")](#)LLM Context menu on documentation pages * **Enhancement**: The CodeGenCodeLens feature has been re-introduced to the Studio IDE. This feature was [temporarily](#pre-coalesce) removed in the previous release due to compatibility issues.
##### Coalesce 2025 announcements[​](#coalesce-2025-announcements "Direct link to Coalesce 2025 announcements") The following features are new or enhanced as part of [dbt's Coalesce analytics engineering conference](https://coalesce.getdbt.com/event/21662b38-2c17-4c10-9dd7-964fd652ab44/summary) from October 13-16, 2025: * **New**: The [dbt MCP server](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md) is now generally available (GA). For more information on the dbt MCP server and dbt Agents, refer to the [Announcing dbt Agents and the remote dbt MCP Server: Trusted AI for analytics](https://www.getdbt.com/blog/dbt-agents-remote-dbt-mcp-server-trusted-ai-for-analytics) blog post. * **Private preview**: The [dbt platform (powered by Fusion)](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine) is now in private preview. If you have any questions, please reach out to your account manager. * [About data platform connections](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md) lists all available dbt platform connections on Fusion and the supported authentication methods per connection. * **New**: Fusion‑specific configuration is now available for BigQuery, Databricks, Redshift, and Snowflake. For more information, see [Connect Fusion to your data platform](https://docs.getdbt.com/docs/local/profiles.yml.md). * **Alpha**: The `dbt-salesforce` adapter is available via the dbt Fusion engine CLI. Note that this connection is in the Alpha product stage and is not production-ready. For more information, see [Salesforce Data Cloud setup](https://docs.getdbt.com/docs/local/connect-data-platform/salesforce-data-cloud-setup.md). * **Private preview**: [State-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) is now in private preview! 
* **New**: You can now [enable state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-setup.md) by selecting **Enable Fusion cost optimization features** in your job settings. Previously, you had to disable **Force node selection** to enable state-aware orchestration. * **Private beta**: The [Efficient Testing feature](https://docs.getdbt.com/docs/deploy/state-aware-about.md#efficient-testing-in-state-aware-orchestration) is now available in private beta. This feature reduces warehouse costs by avoiding redundant data tests and combining multiple tests in a single query. * **New**: To improve visibility into state‑aware orchestration and provide better control when you need to reset cached state, the following [UI enhancements](https://docs.getdbt.com/docs/deploy/state-aware-interface.md) are introduced: * **Models built and reused chart** on your **Account home** * New charts in the **Overview** section of your job that display **Recent runs**, **Total run duration**, **Models built**, and **Models reused** * A new structure to view logs grouped by models, with a **Reused** tab to quickly find reused models * **Reused** tag in **Latest status** lineage lens to see reused models in your DAG * **Clear cache** button on the **Environments** page to reset cached state when needed * **New**: [dbt Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md) is now generally available (GA)! * **Private beta**: The [Analyst agent](https://docs.getdbt.com/docs/explore/navigate-dbt-insights.md#dbt-copilot) is now available in dbt Insights. The Analyst agent is a conversational AI feature where you can ask natural language prompts and receive analysis in real-time. For more information, see [Analyze data with the Analyst agent](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md#analyze-data-with-the-analyst-agent). 
* **Beta**: The [Semantic Layer querying](https://docs.getdbt.com/docs/explore/navigate-dbt-insights.md#semantic-layer-querying) within dbt Insights is now available in beta. With this feature, you can build SQL queries against the Semantic Layer without writing SQL code. It guides you in creating queries based on available metrics, dimensions, and entities. * **Enhancement**: In [dbt Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md), projects upgraded to the [dbt Fusion engine](https://docs.getdbt.com/docs/fusion.md) get [Language Server Protocol (LSP) features](https://docs.getdbt.com/docs/explore/navigate-dbt-insights.md#lsp-features) and their compilation running on Fusion. * **New**: [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md) is now developed and maintained as part of the [Open Semantic Interchange (OSI)](https://www.snowflake.com/en/blog/open-semantic-interchange-ai-standard/) initiative, and is distributed under the [Apache 2.0 license](https://github.com/dbt-labs/metricflow/blob/main/LICENSE). For more information, see the blog post about [Open sourcing MetricFlow](https://www.getdbt.com/blog/open-source-metricflow-governed-metrics). ##### Pre-Coalesce[​](#pre-coalesce "Direct link to Pre-Coalesce") * **Behavior change**: dbt platform [access URLs](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for accounts in the US multi-tenant (US MT) region are transitioning from `cloud.getdbt.com` to dedicated domains on `dbt.com` (for example, `us1.dbt.com`). Users will be automatically redirected, which means no action is required. EMEA and APAC MT accounts are not impacted by this change and will be updated by the end of November 2025. Organizations that use network allow-listing should add `YOUR_ACCESS_URL.dbt.com` to their allow list (for example, if your access URL is `ab123.us1.dbt.com`, add the entire domain `ab123.us1.dbt.com` to your allow list). 
All OAuth, Git, and public API integrations will continue to work with the previous domain. View the updated access URL in dbt platform's **Account settings** page. For questions, contact . * **Enhancement**: * **Fusion MCP tools** — Added Fusion tools that support `compile_sql` and `get_column_lineage` (Fusion-exclusive) for both [Remote](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md#fusion-tools-remote) and [Local](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md#fusion-tools-local) usage. Remote Fusion tools defer to your prod environment by default (set with `x-dbt-prod-environment-id`); you can disable deferral with `x-dbt-fusion-disable-defer=true`. Refer to [set up remote MCP](https://docs.getdbt.com/docs/dbt-ai/setup-remote-mcp.md) for more info. * **Local MCP OAuth** — You can now authenticate the local dbt MCP server to the dbt platform with OAuth (supported docs for [Claude](https://docs.getdbt.com/docs/dbt-ai/integrate-mcp-claude.md), [Cursor](https://docs.getdbt.com/docs/dbt-ai/integrate-mcp-cursor.md), and [VS Code](https://docs.getdbt.com/docs/dbt-ai/integrate-mcp-vscode.md)), reducing local secret management and standardizing setup. Refer to [dbt platform authentication](https://docs.getdbt.com/docs/dbt-ai/setup-local-mcp.md#dbt-platform-authentication) for more information. * **Behavior change**: The CodeGenCodeLens feature for creating models from your sources with a click of a button has been temporarily removed from the Studio IDE due to compatibility issues. We plan to reintroduce this feature in the near future for both the IDE and the VS Code extension. #### September 2025[​](#september-2025 "Direct link to September 2025") * **Fix**: Improved how [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md) handles [offset metrics](https://docs.getdbt.com/docs/build/derived.md) for more accurate results when querying time-based data. MetricFlow now joins data *after* aggregation when the query grain matches the offset grain. 
Previously, when querying offset metrics, the offset join was applied *before* aggregation, which could exclude some values from the total time period. #### August 2025[​](#august-2025 "Direct link to August 2025") * **Fix**: Resolved a bug that caused [saved query](https://docs.getdbt.com/docs/build/saved-queries.md) exports to fail during `dbt build` with `Unable to get saved_query` errors. * **New**: The Semantic Layer GraphQL API now has a [`queryRecords`](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md#query-records) endpoint. With this endpoint, you can view the query history both for Insights and Semantic Layer queries. * **Fix**: Resolved a bug that caused Semantic Layer queries with a trailing whitespace to produce an error. This issue mostly affected [Push.ai](https://docs.push.ai/data-sources/semantic-layers/dbt) users and is fixed now. * **New**: You can now use [personal access tokens (PATs)](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) to authenticate in the Semantic Layer. This enables user-level authentication and reduces the need for sharing tokens between users. When you authenticate using PATs, queries are run using your personal development credentials. For more information, see [Set up the dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md). #### July 2025[​](#july-2025 "Direct link to July 2025") * **New**: The [Tableau Cloud](https://www.tableau.com/products/cloud-bi) integration with Semantic Layer is now available. For more information, see [Tableau](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md). * **Preview**: The [Semantic Layer Power BI integration](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/power-bi.md) is now available in Preview. * **Enhancement:** You can now use `limit` and `order_by` parameters when creating [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md). 
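A sketch of a saved query using the new `limit` and `order_by` parameters, assuming they sit under `query_params` alongside `metrics` and `group_by` (check the saved-queries reference for exact placement); the query, metric, and dimension names are illustrative:

```yaml
saved_queries:
  - name: top_countries_by_orders       # illustrative name
    description: Ten countries with the most orders.
    query_params:
      metrics:
        - order_count
      group_by:
        - Dimension('order__country')
      order_by:
        - -order_count                  # leading '-' for descending
      limit: 10
```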
* **Enhancement:** Users assigned IT [licenses](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) can now edit and manage [global connections settings](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md#connection-management). * **New**: Paginated [GraphQL](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md) endpoints for metadata queries in Semantic Layer are now available. This improves integration load times for large manifests. For more information, see [Metadata calls](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md#metadata-calls). #### June 2025[​](#june-2025 "Direct link to June 2025") * **New**: [System for Cross-Domain Identity Management](https://docs.getdbt.com/docs/cloud/manage-access/scim.md#scim-configuration-for-entra-id) (SCIM) through Microsoft Entra ID is now GA. Also available on legacy Enterprise plans. * **Enhancement:** You can now set the [compilation environment](https://docs.getdbt.com/docs/explore/access-dbt-insights.md#set-jinja-environment) to control how Jinja functions are rendered in dbt Insights. * **Beta**: The dbt Fusion engine supports the BigQuery adapter in beta. * **New:** You can now view the history of settings changes for [projects](https://docs.getdbt.com/docs/cloud/account-settings.md), [environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md), and [jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md). * **New:** Added support for the latest version of BigQuery credentials in Semantic Layer and MetricFlow. * **New:** Snowflake External OAuth is now supported for Semantic Layer queries. Snowflake connections that use External OAuth for user credentials can now emit queries for Insights, dbt CLI, and Studio IDE through the Semantic Layer Gateway. This enables secure, identity-aware access via providers like Okta or Microsoft Entra ID. 
* **New:** You can now [download your managed Git repo](https://docs.getdbt.com/docs/cloud/git/managed-repository.md#download-managed-repository) from the dbt platform. * **New**: The Semantic Layer now supports Trino as a data platform. For more details, see [Set up the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md). * **New**: The dbt Fusion engine supports Databricks in beta. * **Enhancement**: Group owners can now specify multiple email addresses for model-level notifications, enabling broader team alerts. Previously, only a single email address was supported. Check out the [Configure groups](https://docs.getdbt.com/docs/deploy/model-notifications.md#configure-groups) section to learn more. * **New**: The Semantic Layer GraphQL API now has a [`List a saved query`](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md#list-a-saved-query) endpoint. #### May 2025[​](#may-2025 "Direct link to May 2025") ##### 2025 dbt Launch Showcase[​](#2025-dbt-launch-showcase "Direct link to 2025 dbt Launch Showcase") The following features are new or enhanced as part of our [dbt Launch Showcase](https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase) on May 28th, 2025: * **New**: The dbt Fusion engine is the brand new dbt engine re-written from the ground up to provide incredible speed, cost-savings tools, and comprehensive SQL language tools. The dbt Fusion engine is now available in beta for Snowflake users. * Read more [about Fusion](https://docs.getdbt.com/docs/fusion.md). * Understand what actions you need to take to get your projects Fusion-ready with the [upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md). * Begin testing today with the [quickstart guide](https://docs.getdbt.com/guides/fusion.md). * Know [where we're headed with the dbt Fusion engine](https://getdbt.com/blog/where-we-re-headed-with-the-dbt-fusion-engine). 
* **New**: The dbt VS Code extension is a powerful new tool that brings the speed and productivity of the dbt Fusion engine into your Visual Studio Code editor. This is a free download that will forever change your dbt development workflows. The dbt VS Code extension is now available in beta [alongside Fusion](https://getdbt.com/blog/get-to-know-the-new-dbt-fusion-engine-and-vs-code-extension). Check out the [installation instructions](https://docs.getdbt.com/docs/install-dbt-extension.md) and read more [about the features](https://docs.getdbt.com/docs/about-dbt-extension.md) to get started enhancing your dbt workflows today! * **New**: dbt Explorer is now Catalog! Learn more about the change [here](https://getdbt.com/blog/updated-names-for-dbt-platform-and-features). * Catalog's global navigation provides a search experience that lets you find dbt resources across all your projects, as well as non-dbt resources in Snowflake. * External metadata ingestion allows you to connect directly to your data warehouse, giving you visibility into tables, views, and other resources that aren't defined in dbt. * **New**: [dbt Canvas is now generally available](https://getdbt.com/blog/dbt-canvas-is-ga) (GA). Canvas is the intuitive visual editing tool that enables anyone to create dbt models with an easy-to-understand drag-and-drop interface. Read more [about Canvas](https://docs.getdbt.com/docs/cloud/canvas.md) to begin empowering your teams to build more, faster! * **New**: [State-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) is now in beta! Every time a new job in Fusion runs, state-aware orchestration automatically determines which models to build by detecting changes in code or data. * **New**: With Hybrid Projects, your organization can adopt complementary dbt Core and dbt Cloud workflows and seamlessly integrate these workflows by automatically uploading dbt Core artifacts into dbt Cloud.
[Hybrid Projects](https://docs.getdbt.com/docs/deploy/hybrid-projects.md) is now available as a preview to [dbt Enterprise accounts](https://www.getdbt.com/pricing). * **New**: [System for Cross-Domain Identity Management (SCIM)](https://docs.getdbt.com/docs/cloud/manage-access/scim.md) through Okta is now GA. * **New**: dbt now acts as a [Model Context Protocol](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md) (MCP) server, allowing seamless integration of AI tools with data warehouses through a standardized framework. * **New**: The [quickstart guide for data analysts](https://docs.getdbt.com/guides/analyze-your-data.md) is now available. With dbt, data analysts can use built-in, AI-powered tools to build governed data models, explore how they’re built, and run their own analysis. * **New**: You can view your [usage metering and limiting in dbt Copilot](https://docs.getdbt.com/docs/cloud/billing.md#dbt-copilot-usage-metering-and-limiting) on the billing page of your dbt Cloud account. * **New**: You can use Copilot to create a `dbt-styleguide.md` for dbt projects. The generated style guide template includes SQL style guidelines, model organization and naming conventions, model configurations and testing practices, and recommendations to enforce style rules. For more information, see [Copilot style guide](https://docs.getdbt.com/docs/cloud/copilot-styleguide.md). * **New**: Copilot chat is an interactive interface within the Studio IDE where you can generate SQL code from natural language prompts and ask analytics-related questions. It integrates contextual understanding of your dbt project and assists in streamlining SQL development. For more information, see [Copilot chat](https://docs.getdbt.com/docs/cloud/copilot-chat-in-studio.md). * **New**: Leverage dbt Copilot to generate SQL queries in [Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md) from natural language prompts, enabling efficient data exploration within a context-aware interface. 
* **New**: The dbt platform Cost management dashboard was available as a preview for Snowflake users on Enterprise and Enterprise Plus plans. On November 25, 2025, we retired the cost management dashboard to focus on building a more scalable and integrated cost-insights experience, expected in early 2026.
* **New**: Apache Iceberg catalog integration support is now available on Snowflake and BigQuery! This is essential to making your dbt Mesh interoperable across platforms, built on Iceberg. Read more about [Iceberg](https://docs.getdbt.com/docs/mesh/iceberg/apache-iceberg-support.md) to begin creating Iceberg tables.
* **Update**: Product renaming and other changes. For more information, refer to [Updated names for dbt platform and features](https://getdbt.com/blog/updated-names-for-dbt-platform-and-features). Product names key:
  * Canvas (previously Visual Editor)
  * Catalog (previously Explorer)
  * Copilot
  * Cost Management
  * dbt Fusion engine
  * Insights
  * Mesh
  * Orchestrator
  * Studio IDE (previously Cloud IDE)
  * Semantic Layer
* **Update**: Pricing plan changes. For more information, refer to [One dbt](https://www.getdbt.com/product/one-dbt).

#### April 2025[​](#april-2025 "Direct link to April 2025")

* **Enhancement**: The [Python SDK](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python.md) now supports lazy loading of large fields (`dimensions`, `entities`, and `measures`) on `Metric` objects. For more information, see [Lazy loading for large fields](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python.md#lazy-loading-for-large-fields).
* **Enhancement**: The Semantic Layer now supports SSH tunneling for [Postgres](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-postgresql-alloydb.md#connecting-using-an-ssh-tunnel) or [Redshift](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-redshift.md#connecting-using-an-ssh-tunnel) connections.
Refer to [Set up the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) for more information.
* **Behavior change**: Users assigned the [`job admin` permission set](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#job-admin) now have access to set up integrations for projects, including the [Tableau](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md) integration to populate downstream exposures.

#### March 2025[​](#march-2025 "Direct link to March 2025")

* **Behavior change**: As of March 31st, 2025, dbt Core versions 1.0, 1.1, and 1.2 have been deprecated from dbt. They are no longer available to select as versions for dbt projects. Workloads currently on these versions will be automatically upgraded to v1.3, which may cause new failures.
* **Enhancement**: [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) users on single-tenant configurations no longer need to contact their account representative to enable this feature. Setup is now self-service and available across all tenant configurations.
* **New**: The Semantic Layer now supports Postgres as a data platform. For more details on how to set up the Semantic Layer for Postgres, see [Set up the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md).
* **New**: New [environment variable default](https://docs.getdbt.com/docs/build/environment-variables.md#dbt-cloud-context) `DBT_CLOUD_INVOCATION_CONTEXT`.
* **Enhancement**: Users assigned [read-only licenses](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#licenses) can now view the [Deploy](https://docs.getdbt.com/docs/deploy/deployments.md) section of their dbt account and click into the individual sections, but can't edit or otherwise make any changes.
###### dbt Developer day[​](#dbt-developer-day "Direct link to dbt Developer day")

The following features are new or enhanced as part of our [dbt Developer day](https://www.getdbt.com/resources/webinars/dbt-developer-day) on March 19th and 20th, 2025:

* **New**: The [`--sample` flag](https://docs.getdbt.com/docs/build/sample-flag.md), now available for the `run` and `build` commands, helps reduce build times and warehouse costs by running dbt in sample mode. It generates filtered refs and sources using time-based sampling, allowing developers to validate outputs without building entire models.
* **New**: Copilot, an AI-powered assistant, is now generally available in the Cloud IDE for all dbt Enterprise accounts. Check out [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) for more information.

###### Also available this month[​](#also-available-this-month "Direct link to Also available this month")

* **New**: Bringing your own [Azure OpenAI key](https://docs.getdbt.com/docs/cloud/enable-dbt-copilot.md#bringing-your-own-openai-api-key-byok) for [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) is now generally available. Your organization can configure Copilot to use your own Azure OpenAI keys, giving you more control over data governance and billing.
* **New**: The Semantic Layer supports Power BI as a [partner integration](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md), available in private beta. To join the private beta, please reach out to your account representative. Check out the [Power BI](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/power-bi.md) integration for more information.
* **New**: [dbt release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) are generally available. Depending on their plan, customers may select among the Latest, Compatible, or Extended tracks to manage the update cadences for development and deployment environments.
* **New:** The dbt-native integration with Azure DevOps now supports [Entra ID service principals](https://docs.getdbt.com/docs/cloud/git/setup-service-principal.md). Unlike a service user, which represents a real user object in Entra ID, the service principal is a secure identity associated with your dbt app to access resources in Azure unattended. Please [migrate your service user](https://docs.getdbt.com/docs/cloud/git/setup-service-principal.md#migrate-to-service-principal) to a service principal for Azure DevOps as soon as possible.

#### February 2025[​](#february-2025 "Direct link to February 2025")

* **Enhancement**: The [Python SDK](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python.md) added a new timeout parameter to the Semantic Layer client and the underlying GraphQL clients. Set a timeout number or use the `total_timeout` parameter in the global `TimeoutOptions` to control connect, execute, and close timeouts granularly. `ExponentialBackoff.timeout_ms` is now deprecated.
* **New**: The [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) integration for Git now supports [Entra service principal apps](https://docs.getdbt.com/docs/cloud/git/setup-service-principal.md) on dbt Enterprise accounts. Microsoft is enforcing MFA across user accounts, including service users, which will impact existing app integrations. This is a phased rollout, and dbt Labs recommends [migrating to a service principal](https://docs.getdbt.com/docs/cloud/git/setup-service-principal.md#migrate-to-service-principal) on existing integrations once the option becomes available in your account.
* **New**: Added the `dbt invocation` command to the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md). This command allows you to view and manage active invocations, which are long-running sessions in the dbt CLI. For more information, see [dbt invocation](https://docs.getdbt.com/reference/commands/invocation.md).
* **New**: Users can now switch themes directly from the user menu, available [in Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md#dbt-cloud). We have added support for **Light mode** (default), **Dark mode**, and automatic theme switching based on system preferences. The selected theme is stored in the user profile and will follow users across all devices.
  * Dark mode is currently available on the Developer plan and will be available for all [plans](https://www.getdbt.com/pricing) in the future. We’ll be rolling it out gradually, so stay tuned for updates. For more information, refer to [Change your dbt theme](https://docs.getdbt.com/docs/cloud/about-cloud/change-your-dbt-cloud-theme.md).
* **Fix**: Semantic Layer errors in the Cloud IDE are now displayed with proper formatting, fixing an issue where newlines appeared broken or difficult to read. This fix ensures error messages are more user-friendly and easier to parse.
* **Fix**: Fixed an issue where [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) with no [exports](https://docs.getdbt.com/docs/build/saved-queries.md#configure-exports) would fail with an `UnboundLocalError`. Previously, attempting to process a saved query without any exports would cause an error due to an undefined relation variable. Exports are optional, and this fix ensures saved queries without exports don't fail.
* **New**: You can now query metric alias in Semantic Layer [GraphQL](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md) and [JDBC](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md) APIs.
  * For the JDBC API, refer to [Query metric alias](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md#query-metric-alias) for more information.
  * For the GraphQL API, refer to [Query metric alias](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md#query-metric-alias) for more information.
* **Enhancement**: Added support to automatically refresh access tokens when Snowflake's SSO connection expires. Previously, users would get the following error: `Connection is not available, request timed out after 30000ms` and would have to wait 10 minutes to try again.
* **Enhancement**: The [`dbt_version` format](https://docs.getdbt.com/reference/commands/version.md#versioning) in dbt Cloud now better aligns with [semantic versioning rules](https://semver.org/). Leading zeroes have been removed from the month and day (`YYYY.M.D+`). For example:
  * New format: `2024.10.8+996c6a8`
  * Previous format: `2024.10.08+996c6a8`

---

### About continuous integration (CI) in dbt

Use [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) in dbt to set up automation for testing code changes before merging to production. Additionally, [enable Advanced CI features](https://docs.getdbt.com/docs/cloud/account-settings.md#account-access-to-advanced-ci-features) for these jobs to evaluate whether the code changes produce the data changes you want by reviewing the comparison differences dbt provides. Refer to the guide [Get started with continuous integration tests](https://docs.getdbt.com/guides/set-up-ci.md?step=1) for more information.
[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/continuous-integration.md) ###### [Continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md) [Set up CI checks to test every single change prior to deploying the code to production.](https://docs.getdbt.com/docs/deploy/continuous-integration.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/advanced-ci.md) ###### [Advanced CI](https://docs.getdbt.com/docs/deploy/advanced-ci.md) [Compare the differences between what's in the production environment and the pull request before merging those changes, ensuring that you're always shipping trusted data products.](https://docs.getdbt.com/docs/deploy/advanced-ci.md)
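Under the hood, a CI run typically builds only the models modified in the pull request, plus their downstream dependents, by comparing against production state with dbt's node selection syntax. A sketch of the command pattern (the artifacts path here is illustrative; in dbt platform CI jobs, deferral to production artifacts is handled for you):

```text
$ dbt build --select state:modified+ --defer --state ./prod-run-artifacts
```

`state:modified+` selects changed nodes and everything downstream of them, and `--defer` resolves unbuilt upstream references against the production manifest supplied via `--state`.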
---

### About dbt AI and intelligence

dbt AI and intelligence is a suite of features that helps you use AI to accelerate your data analytics and intelligence workflows. Whether you're generating code, tests, and documentation with inline AI assistance, delegating complex workflows to autonomous agents, or building your own custom agents with MCP — dbt has you covered.

[![](/img/icons/dbt-copilot.svg)](https://docs.getdbt.com/docs/cloud/dbt-copilot-overview.md) ###### [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot-overview.md) [dbt's AI experience — bringing inline code assistance and autonomous agents (like the Developer agent) together across your analytics development lifecycle.](https://docs.getdbt.com/docs/cloud/dbt-copilot-overview.md)

[![](/img/icons/dbt-copilot.svg)](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md) ###### [dbt MCP](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md) [Build your own custom agents and copilots with the local or remote dbt MCP server.](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md)

---

### About dbt Core versions

Learn about versioning for the dbt Core engine (Python-based CLI). If you run the dbt Core engine locally (for example, using `pip`), then this page is for you. dbt Core releases follow [semantic versioning](https://semver.org/).
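Because dbt Core follows semantic versioning, projects that manage their own dbt Core version locally can pin a compatible range in `dbt_project.yml` using the [`require-dbt-version`](https://docs.getdbt.com/reference/project-configs/require-dbt-version.md) project config. A minimal sketch (the project name is illustrative):

```yaml
# dbt_project.yml
name: my_project      # illustrative project name
version: "1.0.0"

# Fail fast if this project runs on an unsupported dbt Core version.
# Allows any 1.x release from v1.8.0 onward, but not v2.
require-dbt-version: [">=1.8.0", "<2.0.0"]
```

Pinning an upper bound below the next major version protects against breaking changes, since semver only permits them in major releases.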
If you're using the dbt platform (including the dbt CLI), you don't need to manage dbt versions yourself. [Release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) automatically keep you up to date and provide early access to new features before they’re available in dbt Core.

> **Note**: If you want to use the dbt Fusion engine, locally or in the dbt platform, then read [Get Started](https://docs.getdbt.com/docs/local/install-dbt.md?version=2).

If you manage your own dbt Core versions locally, read on. Each minor version moves through the following support levels:

* **[Active](https://docs.getdbt.com/docs/dbt-versions/core.md#current-version-support)**: In the first few months after a minor version's initial release, we patch it with bugfix releases. These include fixes for regressions, new bugs, and older bugs or quality-of-life improvements. We implement these changes when we have high confidence that they're narrowly scoped and won't cause unintended side effects.
* **[Critical](https://docs.getdbt.com/docs/dbt-versions/core.md#current-version-support)**: When a newer minor version ships, the previous one transitions to "Critical Support" for the remainder of its one-year window. Patches during this period are limited to critical security and installation fixes. After the one-year window ends, the version reaches end of life.
* **[End of Life](https://docs.getdbt.com/docs/dbt-versions/core.md#end-of-life-versions)**: Minor versions that have reached EOL no longer receive new patch releases.
* **Deprecated**: dbt Core versions that are no longer maintained by dbt Labs, nor supported in the dbt platform.
##### Latest releases[​](#latest-releases "Direct link to Latest releases")

| dbt Core | Initial release | Support level and end date |
| -------- | --------------- | -------------------------- |
| [**v1.11**](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.11.md) | Dec 19, 2025 | **Active Support — Dec 18, 2026** |
| [**v1.10**](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.10.md) | Jun 16, 2025 | **Critical Support — Jun 15, 2026** |
| [**v1.9**](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.9.md) | Dec 9, 2024 | Deprecated ⛔️ |
| [**v1.8**](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.8.md) | May 9, 2024 | Deprecated ⛔️ |
| [**v1.7**](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.7.md) | Nov 2, 2023 | End of Life ⚠️ |
| **v1.6** | Jul 31, 2023 | End of Life ⚠️ |
| **v1.5** | Apr 27, 2023 | End of Life ⚠️ |
| **v1.4** | Jan 25, 2023 | End of Life ⚠️ |
| **v1.3** | Oct 12, 2022 | End of Life ⚠️ |
| **v1.2** | Jul 26, 2022 | Deprecated ⛔️ |
| **v1.1** | Apr 28, 2022 | Deprecated ⛔️ |
| **v1.0** | Dec 3, 2021 | Deprecated ⛔️ |
| **v0.X** ⛔️ | (Various dates) | Deprecated ⛔️ |

All functionality in dbt Core since the v1.7 release is available in [dbt release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md), which provide automated upgrades at a cadence appropriate for your team.¹

¹ Release tracks are required for the Developer and Starter plans on dbt. Accounts using older dbt versions will be migrated to the **Latest** release track.
For customers of the dbt platform: dbt Labs strongly recommends migrating environments on older and unsupported versions to [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) or a supported version. In 2025, dbt Labs will remove the oldest dbt Core versions from availability in the dbt platform, starting with v1.0 through v1.2.

##### Further reading[​](#further-reading "Direct link to Further reading")

* [Choosing a dbt Core version in dbt](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md): Learn how you can use dbt Core versions in dbt.
* [How to install dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md): Learn about installing dbt Core.
* [`require-dbt-version`](https://docs.getdbt.com/reference/project-configs/require-dbt-version.md) and [`dbt_version`](https://docs.getdbt.com/reference/dbt-jinja-functions/dbt_version.md): Restrict your project to only work with a range of dbt Core versions, or use the currently running version.

#### End-of-life versions[​](#end-of-life-versions "Direct link to End-of-life versions")

Once a dbt Core version reaches end-of-life (EOL), it no longer receives patches, including for known bugs. We recommend upgrading to a newer version in [dbt](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) or [dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md#upgrading-dbt-core). All versions prior to v1.0 have been deprecated.

#### Current version support[​](#current-version-support "Direct link to Current version support")

dbt supports each minor version (for example, v1.8) for *one year* from its initial release. During that window, we release patches with bug fixes and security updates. When we refer to a minor version, we mean its latest available patch (v1.8.x). After a newer minor version ships, the previous one transitions to **critical support** (security and installation fixes only) for the remainder of its one-year window.
After the one-year window ends, the version reaches **end of life** and no longer receives patches.

While a minor version is officially supported:

* You can use it in dbt. For more on dbt versioning, see [Choosing a dbt version](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md).
* You can select it from the version dropdown on this website to see documentation that is accurate for that minor version.

For upcoming releases, refer to the [`dbt-core` milestones](https://github.com/dbt-labs/dbt-core/milestones).

#### Upgrading[​](#upgrading "Direct link to Upgrading")

Upgrade to new patch versions as soon as they're available. Upgrade to new minor versions when you're ready; some features and fixes are only available on the latest minor version. dbt makes all versions available as prereleases before the final release. For minor versions, we aim to release one or more betas 4+ weeks before the final release so you can try new features and share feedback. Release candidates are available about two weeks before the final release for testing in production-like environments. Refer to the [`dbt-core` milestones](https://github.com/dbt-labs/dbt-core/milestones) for details.

#### How dbt Core uses semantic versioning[​](#how-dbt-core-uses-semantic-versioning "Direct link to How dbt Core uses semantic versioning")

dbt Core follows [semantic versioning](https://semver.org/):

* **Major versions** (for example, v1 to v2) may include breaking changes. Deprecated functionality will stop working.
* **Minor versions** (for example, v1.8 to v1.9) add features and are backwards compatible. They will not break project code that relies on documented functionality.
* **Patch versions** (for example, v1.8.0 to v1.8.1) include fixes only: bug fixes, security fixes, or installation fixes.

We are committed to avoiding breaking changes in minor versions for end users of dbt.
There are two types of breaking changes that may be included in minor versions:

* Changes to the Python interface for adapter plugins. These changes are relevant only to adapter maintainers, and they will be clearly communicated in documentation and release notes. For more information, refer to the [Build, test, document, and promote adapters guide](https://docs.getdbt.com/guides/adapter-creation.md).
* Changes to metadata interfaces, including [artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) and [logging](https://docs.getdbt.com/reference/events-logging.md), signaled by a version bump. Those version upgrades may require you to update external code that depends on these interfaces, or to coordinate upgrades between dbt orchestrations that share metadata, such as [state-powered selection](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection).

##### Adapter plugin versions[​](#adapter-plugin-versions "Direct link to Adapter plugin versions")

dbt releases `dbt-core` and adapter plugins (such as `dbt-snowflake`) independently. Their minor and patch version numbers may not match, but they coordinate through the `dbt-adapters` interface so you won't get a broken experience. For example, `dbt-core==1.8.0` can work with `dbt-snowflake==1.9.0`. If you're building or maintaining an adapter, refer to the [adapter creation guide](https://docs.getdbt.com/guides/adapter-creation.md) for details on the `dbt-adapters` interface.

Run `dbt --version` to check your installed versions:

```text
$ dbt --version
Core:
- installed: 1.8.0
- latest: 1.8.0 - Up to date!
Plugins:
- snowflake: 1.9.0 - Up to date!
```

You can also find the registered adapter version in [logs](https://docs.getdbt.com/reference/global-configs/logs.md).
For example, in `logs/dbt.log`:

```text
13:13:48.572182 [info ] [MainThread]: Registered adapter: snowflake=1.9.0
```

Refer to [Supported data platforms](https://docs.getdbt.com/docs/supported-data-platforms.md) for the full list of adapters.

---

### About dbt Insights

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Learn how to query data with Insights and view documentation in Catalog. Insights in dbt empowers users to seamlessly explore and query data with an intuitive, context-rich interface. It bridges technical and business users by combining metadata, documentation, AI-assisted tools, and powerful querying capabilities into one unified experience. Insights in dbt integrates with [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md), [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md), and [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) to make it easier for you to perform exploratory data analysis, leverage AI-assisted tools, make faster decisions, and collaborate across teams.
[![Overview of dbt Insights and its features](/img/docs/dbt-insights/insights-main.gif?v=2 "Overview of dbt Insights and its features")](#)Overview of dbt Insights and its features

#### Key benefits[​](#key-benefits "Direct link to Key benefits")

Key benefits include:

* Quickly write, run, and iterate on SQL queries with tools like syntax highlighting, tabbed editors, and query history.
* Leverage dbt metadata, trust signals, and lineage from Catalog for informed query construction.
* Make data accessible to users of varied technical skill levels with SQL, Semantic Layer queries, and visual tools.
* Use Copilot's AI assistance to generate or edit SQL queries, descriptions, and more.

Some example use cases include:

* Analysts can quickly construct queries to analyze sales performance metrics across regions and view results.
* All users have a rich development experience powered by Catalog's end-to-end exploration experience.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* Be on a dbt [Enterprise-tier](https://www.getdbt.com/pricing) plan — [book a demo](https://www.getdbt.com/contact) to learn more about Insights.
* Available on all [tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md) configurations.
* Have a dbt [developer license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) with access to Insights.
* Have configured [developer credentials](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#get-started-with-the-cloud-ide).
* Ensure your production and development [environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md) are on dbt’s ‘Latest’ [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) or a supported dbt version.
* Use a supported data platform: Snowflake, BigQuery, Databricks, Redshift, or Postgres.
* Single sign-on (SSO) for development user accounts is supported.
Deployment environments are queried using the user's development credentials.
* (Optional) To query [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) metrics from Insights, you must also:
  * [Configure](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) the Semantic Layer for your dbt project.
  * Have a successful job run in the environment where you configured the Semantic Layer.
* (Optional) To enable [Language Server Protocol (LSP) features](https://docs.getdbt.com/docs/explore/navigate-dbt-insights.md#lsp-features-in-dbt-insights) in Insights and run your compilations on the dbt Fusion engine, set your development environment to use the **Latest Fusion** dbt version.

---

### About dbt installation

dbt enables data teams to transform data using analytics engineering best practices. Choose your local development experience from these tools:

**Local command line interface (CLI)**

Leverage the speed and scale of the dbt Fusion engine or use dbt Core:

* [Install dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md) — Uses the Python-based dbt Core engine for traditional workflows. Does not include LSP features found in the dbt VS Code extension like autocomplete, hover insights, lineage, and more.
* [Install dbt Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) — Provides Fusion performance benefits (faster parsing, compilation, and execution) but does not include LSP features.
**dbt VS Code extension**

* [Install the official dbt VS Code extension](https://docs.getdbt.com/docs/install-dbt-extension.md), which combines dbt Fusion engine performance with visual LSP features when developing locally to make dbt development smoother and more efficient.

#### Getting started[​](#getting-started "Direct link to Getting started")

After installing your local development experience, you can get started:

* Explore a detailed first-time setup guide for the [dbt Fusion engine](https://docs.getdbt.com/guides/fusion.md?step=1).
* [Connect to a data platform](https://docs.getdbt.com/docs/local/connect-data-platform/about-dbt-connections.md).
* Learn [how to run your dbt projects](https://docs.getdbt.com/docs/running-a-dbt-project/run-your-dbt-projects.md).

If you're interested in using the dbt platform, our feature-rich, browser-based UI, you can learn more in [About dbt set up](https://docs.getdbt.com/docs/cloud/about-cloud-setup.md).

---

### About dbt integrations

Many data applications integrate with dbt, enabling you to leverage the power of dbt for a variety of use cases and workflows.

#### Integrations with dbt[​](#integrations-with-dbt "Direct link to Integrations with dbt")

[![](/img/icons/vsce.svg)](https://docs.getdbt.com/docs/about-dbt-extension.md) ###### [dbt VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md) [The dbt extension brings a hyper-fast, intelligent, and cost-efficient dbt development experience to VS Code.
The best way to experience all the power of the new dbt Fusion engine while developing locally.](https://docs.getdbt.com/docs/about-dbt-extension.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md) ###### [Visualize and orchestrate downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md) [Configure downstream exposures automatically from dashboards and understand how models are used in downstream tools. Proactively refresh the underlying data sources during scheduled dbt jobs.](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) ###### [dbt Semantic Layer integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) [Review a wide range of partners you can integrate and query with the dbt Semantic Layer.](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md)

---

### About dbt LSP

The dbt Fusion engine offers benefits beyond the speed and power of the framework. The dbt VS Code extension, Studio IDE, and Insights all contain a powerful set of features backed by our Language Server Protocol (LSP) implementation that enable fast, efficient development workflows.
The following features are supported across these tools:

| Feature | VS Code extension | Studio IDE | Insights |
| ---------------------------- | ----------------- | ---------- | -------- |
| Autocomplete function names | ✅ | ✅ | ❌ |
| Autocomplete ref/source args | ✅ | ✅ | ✅ |
| CTE Preview | ✅ | ✅ | ✅ |
| Column-level lineage | ✅ | ❌ | ❌ |
| Command palette | ✅ | N/A | ❌ |
| Error detection | ✅ | ✅ | ✅ |
| Go-to definition | ✅ | ✅ | ❌ |
| Go-to reference | ✅ | ✅ | ❌ |
| Incremental compilation | ✅ | ✅ | ❌ |
| Preview query results | ✅ | N/A | ❌ |
| Problems tab | ✅ | ✅ | ❌ |
| Propagate column renames | ✅ | ❌ | ❌ |
| Propagate model renames | ✅ | ❌ | ❌ |
| Show column type on hover | ✅ | ✅ | ✅ |
| Show compiled SQL | ✅ | ✅ | ❌ |
| View table lineage | ✅ | N/A | ❌ |
| Warning detection | ✅ | ✅ | ❌ |

---

### About dbt Mesh

Organizations of all sizes rely upon dbt to manage their data transformations, from small startups to large enterprises. At scale, it can be challenging to coordinate all the organizational and technical requirements demanded by your stakeholders within the scope of a single dbt project. To date, there also hasn't been a first-class way to effectively manage the dependencies, governance, and workflows between multiple dbt projects. That's where **Mesh** comes in: it empowers data teams to work *independently and collaboratively*, sharing data, code, and best practices without sacrificing security or autonomy.
Mesh is not a single product; it is a pattern enabled by a convergence of several features in dbt:

* **[Cross-project references](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref)**: The foundational feature that enables multi-project deployments. `{{ ref() }}` now works across dbt projects on Enterprise and Enterprise+ plans.
* **[Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md)**: dbt's metadata-powered documentation platform, complete with full, cross-project lineage.
* **Governance**: dbt's governance features allow you to manage access to your dbt models both within and across projects.
  * **[Groups](https://docs.getdbt.com/docs/mesh/govern/model-access.md#groups)**: With groups, you can organize nodes in your dbt DAG that share a logical connection (for example, by functional area) and assign an owner to the entire group.
  * **[Access](https://docs.getdbt.com/docs/mesh/govern/model-access.md#access-modifiers)**: Access configs allow you to control who can reference models.
* **[Model Versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md)**: When coordinating across projects and teams, we recommend treating your data models as stable APIs. Model versioning is the mechanism that allows graceful adoption and deprecation of models as they evolve.
* **[Model Contracts](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md)**: Data contracts set explicit expectations on the shape of the data, ensuring that data changes upstream of dbt or within a project's logic don't break downstream consumers' data products.

#### When is the right time to use dbt Mesh?

The multi-project architecture helps organizations with mature, complex transformation workflows in dbt increase the flexibility and performance of their dbt projects.
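Cross-project references use a two-argument form of `ref()` that names the upstream project as well as the model. The following is an illustrative sketch; the project name (`core_platform`), model names, and columns are hypothetical:

```sql
-- models/finance_summary.sql in a downstream project (hypothetical names)
select
    order_id,
    amount
from {{ ref('core_platform', 'fct_orders') }}  -- two-argument ref: upstream project, then model
```

The upstream model must be marked as `access: public` for a downstream project to reference it.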
If you're already using dbt and your project has started to experience any of the following, you're likely ready to start exploring this paradigm:

* The **number of models** in your project is degrading performance and slowing down development.
* Teams have developed **separate workflows** and need to decouple development from each other.
* Teams are experiencing **communication challenges**, and the reliability of some of your data products has started to deteriorate.
* **Security and governance** requirements are increasing and would benefit from increased isolation.

dbt is designed to coordinate the features above and simplify this complexity to solve these problems.

If you're just starting your dbt journey, don't worry about building a multi-project architecture right away. You can adopt the features *incrementally* as you scale, and the collection of features works effectively as independent tools. Familiarizing yourself with the tooling and features that make up a multi-project architecture, and how they can apply to your organization, will help you make better decisions as you grow.

For additional information, refer to the [Mesh FAQs](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-5-faqs.md).

---

### About dbt Model Context Protocol (MCP) server

As AI becomes more deeply integrated into data workflows, dbt users need a seamless way to access and integrate dbt's structured metadata and execution context effectively.
This page provides an overview of dbt's MCP server, which exposes this context, supporting use cases such as conversational access to data, agent-driven automation of dbt workflows, and AI-assisted development.

The [dbt MCP server](https://github.com/dbt-labs/dbt-mcp) provides a standardized framework that lets you integrate AI applications with dbt‑managed data assets across different data platforms. This ensures consistent, governed access to models, metrics, lineage, and freshness across your AI tools. The MCP server provides access to the dbt CLI, the [dbt APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/overview.md), the [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md), and the [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), as well as private APIs, text-to-SQL, and SQL execution.

For more information on MCP itself, see [Get started with the Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction).

#### Server access

You can use the dbt MCP server in two ways: locally or remotely. Choose the setup that best fits your workflow.

##### Local MCP server

The local MCP server provides the best experience for development workflows, like authoring dbt models, tests, and documentation. The [local MCP server](https://docs.getdbt.com/docs/dbt-ai/setup-local-mcp.md) runs on your machine and requires installing `uvx` (which installs `dbt-mcp` locally).
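Most MCP clients are configured with a small JSON file that tells them how to launch the server. The entry below is an illustrative sketch, not an authoritative configuration: the server key name, environment variable, and path are placeholders, so follow the local MCP server setup guide for your client's exact format.

```json
{
  "mcpServers": {
    "dbt": {
      "command": "uvx",
      "args": ["dbt-mcp"],
      "env": {
        "DBT_PROJECT_DIR": "/path/to/your/dbt/project"
      }
    }
  }
}
```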
This option provides:

* Full access to dbt CLI commands (`dbt run`, `dbt build`, `dbt test`, and more)
* Support for dbt Core, the dbt CLI, and the dbt Fusion engine
* The ability to work with local dbt projects without requiring a dbt platform account
* Optional integration with dbt platform APIs for metadata discovery and Semantic Layer access

##### Remote MCP server

The remote MCP server from dbt supports data consumption use cases without local setup. The [remote MCP server](https://docs.getdbt.com/docs/dbt-ai/setup-remote-mcp.md) connects to the dbt platform over HTTP and requires no local installation. This option is useful when:

* You either don't want to install, or are restricted from installing, additional software on your system.
* Your use case is primarily consumption-based (for example, querying metrics, exploring metadata, viewing lineage).

info

Only [`text_to_sql`](#sql) consumes dbt Copilot credits; other MCP tools do not. When your account runs out of Copilot credits, the remote MCP server blocks all tools that run through it, including tools invoked from a local MCP server and [proxied](https://github.com/dbt-labs/dbt-mcp/blob/main/src/dbt_mcp/tools/toolsets.py#L24) to the remote server (like SQL and remote Fusion tools), until your Copilot credits reset. If you need help, reach out to your account manager.

#### Available tools

The following tool list is auto-fetched from the [dbt MCP server README on GitHub](https://github.com/dbt-labs/dbt-mcp#tools) when the docs are built, so it stays in sync with each release.

##### SQL

Tools for executing and generating SQL on dbt Platform infrastructure.

* `execute_sql`: Executes SQL on dbt Platform infrastructure with Semantic Layer support.
* `text_to_sql`: Generates SQL from natural language using project context.

##### Semantic Layer

To learn more, refer to the [dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl) documentation.

* `get_dimensions`: Gets dimensions for specified metrics.
* `get_entities`: Gets entities for specified metrics.
* `get_metrics_compiled_sql`: Returns compiled SQL for metrics without executing the query.
* `list_metrics`: Retrieves all defined metrics.
* `list_saved_queries`: Retrieves all saved queries.
* `query_metrics`: Executes metric queries with filtering and grouping options.

##### Discovery

To learn more, refer to the [dbt Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api) documentation.

* `get_all_macros`: Retrieves macros; option to filter by package or return package names only.
* `get_all_models`: Retrieves name and description of all models.
* `get_all_sources`: Gets all sources with freshness status; option to filter by source name.
* `get_exposure_details`: Gets exposure details including owner, parents, and freshness status.
* `get_exposures`: Gets all exposures (downstream dashboards, apps, or analyses).
* `get_lineage`: Gets full lineage graph (ancestors and descendants) with type and depth filtering.
* `get_macro_details`: Gets details for a specific macro.
* `get_mart_models`: Retrieves all mart models.
* `get_model_children`: Gets downstream dependents of a model.
* `get_model_details`: Gets model details including compiled SQL, columns, and schema.
* `get_model_health`: Gets health signals: run status, test results, and upstream source freshness.
* `get_model_parents`: Gets upstream dependencies of a model.
* `get_model_performance`: Gets execution history for a model; option to include test results.
* `get_related_models`: Finds similar models using semantic search.
* `get_seed_details`: Gets details for a specific seed.
* `get_semantic_model_details`: Gets details for a specific semantic model.
* `get_snapshot_details`: Gets details for a specific snapshot.
* `get_source_details`: Gets source details including columns and freshness.
* `get_test_details`: Gets details for a specific test.
* `search`: (Alpha) Searches for resources across the dbt project (not generally available).

##### dbt CLI

Allowing your client to run dbt commands through the MCP tooling could modify your data models, sources, and warehouse objects. Proceed only if you trust the client and understand the potential impact.

* `build`: Executes models, tests, snapshots, and seeds in DAG order.
* `compile`: Generates executable SQL from models/tests/analyses; useful for validating Jinja logic.
* `docs`: Generates documentation for the dbt project.
* `get_lineage_dev`: Retrieves lineage from the local `manifest.json` with type and depth filtering.
* `get_node_details_dev`: Retrieves node details from the local `manifest.json` (models, seeds, snapshots, sources).
* `list`: Lists resources in the dbt project by type with selector support.
* `parse`: Parses and validates project files for syntax correctness.
* `run`: Executes models to materialize them in the database.
* `show`: Executes SQL against the database and returns results.
* `test`: Runs tests to validate data and model integrity.

##### Admin API

To learn more, refer to the [dbt Administrative API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api) documentation.

* `cancel_job_run`: Cancels a running job.
* `get_job_details`: Gets job configuration including triggers, schedule, and dbt commands.
* `get_job_run_artifact`: Downloads a specific artifact file from a job run.
* `get_job_run_details`: Gets run details including status, timing, steps, and artifacts.
* `get_job_run_error`: Gets error and/or warning details for a job run; option to include or show warnings only.
* `get_project_details`: Gets project information for a specific dbt project.
* `list_job_run_artifacts`: Lists available artifacts from a job run.
* `list_jobs`: Lists jobs in a dbt Platform account; option to filter by project or environment.
* `list_jobs_runs`: Lists job runs; option to filter by job or status, or order by field.
* `list_projects`: Lists all projects in the dbt Platform account.
* `retry_job_run`: Retries a failed job run.
* `trigger_job_run`: Triggers a job run; option to override git branch, schema, or other settings.

##### dbt Codegen

These tools help automate boilerplate code generation for dbt project files.

* `generate_model_yaml`: Generates model YAML with columns; option to inherit upstream descriptions.
* `generate_source`: Generates source YAML by introspecting database schemas; option to include columns.
* `generate_staging_model`: Generates staging model SQL from a source table.

##### dbt LSP

A set of tools that leverage the Fusion engine for advanced SQL compilation and column-level lineage analysis.

* `fusion.compile_sql`: Compiles SQL in project context via dbt Platform.
* `fusion.get_column_lineage`: Traces column-level lineage via dbt Platform.
* `get_column_lineage`: Traces column-level lineage locally (requires dbt-lsp via dbt Labs VSCE).

##### Product Docs

Tools for searching and fetching content from the official dbt documentation at docs.getdbt.com.

* `get_product_doc_pages`: Fetches the full Markdown content of one or more docs.getdbt.com pages by path or URL.
* `search_product_docs`: Searches docs.getdbt.com for pages matching a query; returns titles, URLs, and descriptions ranked by relevance. Use `get_product_doc_pages` to fetch full content.

##### MCP Server Metadata

These tools provide information about the MCP server itself.
* `get_mcp_server_branch`: Returns the current git branch of the running dbt MCP server.
* `get_mcp_server_version`: Returns the current version of the dbt MCP server.

##### Supported tools by MCP server type

The dbt MCP server has access to many parts of the dbt experience related to development, deployment, and discovery. The categories of tools available depend on which form of the MCP server you connect to; the sections above detail the exact commands and queries available to the LLM. Note that access to the [dbt APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/overview.md) is limited depending on your [plan type](https://www.getdbt.com/pricing).

| Tools | Local | Remote |
| ------------------ | ----- | ------ |
| dbt CLI | ✅ | ❌ |
| Semantic Layer | ✅ | ✅ |
| SQL | ✅ | ✅ |
| Metadata Discovery | ✅ | ✅ |
| Administrative API | ✅ | ❌ |
| Codegen Tools | ✅ | ❌ |
| Fusion Tools | ✅ | ✅ |

#### MCP integrations

The dbt MCP server integrates with any [MCP client](https://modelcontextprotocol.io/clients) that supports token authentication and tool-use capabilities. We have also created integration guides for the following clients:

* [Claude](https://docs.getdbt.com/docs/dbt-ai/integrate-mcp-claude.md)
* [Cursor](https://docs.getdbt.com/docs/dbt-ai/integrate-mcp-cursor.md)
* [VS Code](https://docs.getdbt.com/docs/dbt-ai/integrate-mcp-vscode.md)

#### Resources

* For more information, refer to our blog post [Introducing the dbt MCP Server](https://docs.getdbt.com/blog/introducing-dbt-mcp-server#getting-started).
---

### About dbt models

dbt Core and the dbt platform are composed of different moving parts working harmoniously. All of them are important to what dbt does: transforming data, the 'T' in ELT. When you execute `dbt run`, you are running a model that transforms your data without that data ever leaving your warehouse.

Models are where your developers spend most of their time within a dbt environment. A model is primarily written as a `select` statement and saved as a `.sql` file. While the definition is straightforward, the complexity of the execution varies from environment to environment. Models will be written and rewritten as needs evolve and your organization finds new ways to maximize efficiency.

SQL is the language most dbt users will use, but it is not the only one for building models. Starting in version 1.3, dbt Core and the dbt platform support Python models. Python models are useful for training or deploying data science models, for complex transformations, or where a specific Python package meets a need, such as using the `dateutil` library to parse dates.

##### Models and modern workflows

The top level of a dbt workflow is the project. A project is a directory containing a `.yml` file (the project configuration) and `.sql` or `.py` files (the models). The project file tells dbt the project context, and the models let dbt know how to build a specific data set. For more details on projects, refer to [About dbt projects](https://docs.getdbt.com/docs/build/projects.md).
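As a sketch of the idea above, a model is simply a named `select` statement saved to a file. The file name, columns, and upstream model below are hypothetical; only the `{{ ref() }}` syntax is dbt's:

```sql
-- models/customer_orders.sql (hypothetical model)
select
    customer_id,
    min(order_date) as first_order_date,
    count(*) as order_count
from {{ ref('stg_orders') }}  -- ref() creates a dependency on another model in the project
group by customer_id
```

Running `dbt run` materializes this query as a table or view in your warehouse, after building the models it depends on.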
Your organization may need only a few models, but more likely you'll need a complex structure of nested models to transform the required data. A model is a single file containing a final `select` statement; a project can have multiple models, and models can reference each other. Scale that across numerous projects, and the effort required to transform complex data sets drops drastically compared to older methods.

Learn more about models on the [SQL models](https://docs.getdbt.com/docs/build/sql-models.md) and [Python models](https://docs.getdbt.com/docs/build/python-models.md) pages. If you'd like to begin with a bit of practice, visit our [Getting Started Guide](https://docs.getdbt.com/guides.md) for instructions on setting up the Jaffle Shop sample data so you can get hands-on with the power of dbt.

---

### About dbt projects

A dbt project informs dbt about the context of your project and how to transform your data (build your data sets). By design, dbt enforces the top-level structure of a dbt project, such as the `dbt_project.yml` file, the `models` directory, the `snapshots` directory, and so on. Within those top-level directories, you can organize your project in any way that meets the needs of your organization and data pipeline.

At a minimum, all a project needs is the `dbt_project.yml` project configuration file.
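To make that concrete, a minimal `dbt_project.yml` might contain little more than a name and a connection profile. The values below are illustrative; the keys themselves are standard project configurations:

```yaml
# dbt_project.yml -- a minimal, illustrative configuration
name: jaffle_shop        # project name, in snake case
version: "1.0.0"
profile: jaffle_shop     # the profile dbt uses to connect to your data platform
model-paths: ["models"]  # directories where your model files live
```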
dbt supports a number of different resources, so a project may also include:

| Resource | Description |
| -------- | ----------- |
| [models](https://docs.getdbt.com/docs/build/models.md) | Each model lives in a single file and contains logic that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation. |
| [snapshots](https://docs.getdbt.com/docs/build/snapshots.md) | A way to capture the state of your mutable tables so you can refer to it later. |
| [seeds](https://docs.getdbt.com/docs/build/seeds.md) | CSV files with static data that you can load into your data platform with dbt. |
| [data tests](https://docs.getdbt.com/docs/build/data-tests.md) | SQL queries that you can write to test the models and resources in your project. |
| [macros](https://docs.getdbt.com/docs/build/jinja-macros.md) | Blocks of code that you can reuse multiple times. |
| [docs](https://docs.getdbt.com/docs/build/documentation.md) | Docs for your project that you can build. |
| [sources](https://docs.getdbt.com/docs/build/sources.md) | A way to name and describe the data loaded into your warehouse by your Extract and Load tools. |
| [exposures](https://docs.getdbt.com/docs/build/exposures.md) | A way to define and describe a downstream use of your project. |
| [metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md) | A way for you to define metrics for your project. |
| [groups](https://docs.getdbt.com/docs/build/groups.md) | Groups enable collaborative node organization in restricted collections.
| [analysis](https://docs.getdbt.com/docs/build/analyses.md) | A way to organize analytical SQL queries in your project, such as the general ledger from your QuickBooks. |
| [semantic models](https://docs.getdbt.com/docs/build/semantic-models.md) | Semantic models define the foundational data relationships in [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md) and the [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), enabling you to query metrics using a semantic graph. |
| [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) | Saved queries organize reusable queries by grouping metrics, dimensions, and filters into nodes visible in the dbt DAG. |
| [user-defined functions](https://docs.getdbt.com/docs/build/udfs.md) | User-defined functions (UDFs) let you create reusable custom functions in your warehouse, shareable across dbt, BI tools, data science workflows, and more. |

When building out the structure of your project, consider these impacts on your organization's workflow:

* **How people will run dbt commands**: selecting a path
* **How people will navigate within the project**: whether as developers in the Studio IDE or as stakeholders from the docs
* **How people will configure models**: some bulk configurations are easier done at the directory level, so people don't have to remember to add everything in a config block with each new model

#### Project configuration

Every dbt project includes a project configuration file called `dbt_project.yml`. It defines the directory of the dbt project and other project configurations.
Edit `dbt_project.yml` to set up common project configurations such as:

| YAML key | Value description |
| -------- | ----------------- |
| [name](https://docs.getdbt.com/reference/project-configs/name.md) | Your project's name in [snake case](https://en.wikipedia.org/wiki/Snake_case) |
| [version](https://docs.getdbt.com/reference/project-configs/version.md) | Version of your project |
| [require-dbt-version](https://docs.getdbt.com/reference/project-configs/require-dbt-version.md) | Restrict your project to only work with a range of [dbt Core versions](https://docs.getdbt.com/docs/dbt-versions/core.md) |
| [profile](https://docs.getdbt.com/reference/project-configs/profile.md) | The profile dbt uses to connect to your data platform |
| [model-paths](https://docs.getdbt.com/reference/project-configs/model-paths.md) | Directories where your model and source files live |
| [seed-paths](https://docs.getdbt.com/reference/project-configs/seed-paths.md) | Directories where your seed files live |
| [test-paths](https://docs.getdbt.com/reference/project-configs/test-paths.md) | Directories where your test files live |
| [analysis-paths](https://docs.getdbt.com/reference/project-configs/analysis-paths.md) | Directories where your analyses live |
| [macro-paths](https://docs.getdbt.com/reference/project-configs/macro-paths.md) | Directories where your macros live |
| [snapshot-paths](https://docs.getdbt.com/reference/project-configs/snapshot-paths.md) | Directories where your snapshots live |
| [docs-paths](https://docs.getdbt.com/reference/project-configs/docs-paths.md) | Directories where your docs blocks live |
| [vars](https://docs.getdbt.com/docs/build/project-variables.md) | Project variables you want to use for data compilation |
For complete details on project configurations, see [`dbt_project.yml`](https://docs.getdbt.com/reference/dbt_project.yml.md).

#### Project subdirectories

You can use the Project subdirectory option in dbt to specify a subdirectory in your git repository that dbt should use as the root directory for your project. This is helpful when you have multiple dbt projects in one repository or when you want to organize your dbt project files into subdirectories for easier management.

To use the Project subdirectory option in dbt, follow these steps:

1. Click your account name in the bottom left and select **Your profile**.
2. Under **Projects**, select the project you want to configure as a project subdirectory.
3. Select **Edit** in the lower right-hand corner of the page.
4. In the **Project subdirectory** field, add the name of the subdirectory. For example, if your project YAML files are located in a subdirectory called `/finance`, you would enter `finance` as the subdirectory.
   * You can also reference nested subdirectories. For example, if your project YAML files are located in `/teams/finance`, you would enter `teams/finance` as the subdirectory.

   **Note**: You do not need a leading or trailing `/` in the Project subdirectory field.
5. Click **Save** when you've finished.

After configuring the Project subdirectory option, dbt uses it as the root directory for your dbt project. This means that dbt commands, such as `dbt run` or `dbt test`, operate on files within the specified subdirectory. If there is no `dbt_project.yml` file in the project subdirectory, you will be prompted to initialize the dbt project.
**Project support in dbt plans**

Some [plans](https://www.getdbt.com/pricing) support only one dbt project, while [Enterprise-tier plans](https://www.getdbt.com/contact) allow multiple projects and [cross-project references](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) with Mesh.

#### New projects

You can create new projects and [share them](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) with other people by making them available on a hosted git repository like GitHub, GitLab, or Bitbucket. After you set up a connection with your data platform, you can [initialize your new project in dbt](https://docs.getdbt.com/guides.md) and start developing. Or, run [`dbt init` from the command line](https://docs.getdbt.com/reference/commands/init.md) to set up your new project. During project initialization, dbt creates sample model files in your project directory to help you start developing quickly.

#### Sample projects

If you want to explore dbt projects in more depth, you can clone dbt Labs' [Jaffle Shop](https://github.com/dbt-labs/jaffle_shop) on GitHub. It's a runnable project that contains sample configurations and helpful notes. If you want to see what a mature, production project looks like, check out the [GitLab Data Team public repo](https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt).

#### Related docs

* [Best practices: How we structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md)
* [Quickstarts for dbt](https://docs.getdbt.com/guides.md)
* [Quickstart for dbt Core](https://docs.getdbt.com/guides/manual-install.md)
---

### About documentation

Good documentation for your dbt models helps downstream consumers discover and understand the datasets you curate for them. dbt provides a way to generate documentation for your dbt project and render it as a website.

Tip

Use [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md), available for dbt Enterprise and Enterprise+ accounts, to generate documentation (in the Studio IDE only).

#### Related documentation

* [Declaring properties](https://docs.getdbt.com/reference/configs-and-properties.md)
* [`dbt docs` command](https://docs.getdbt.com/reference/commands/cmd-docs.md)
* [`doc` Jinja function](https://docs.getdbt.com/reference/dbt-jinja-functions/doc.md)
* If you're new to dbt, we recommend that you check out our [quickstart guide](https://docs.getdbt.com/guides.md) to build your first dbt project, complete with documentation.

#### Assumed knowledge

* [Data tests](https://docs.getdbt.com/docs/build/data-tests.md)

#### Overview

dbt provides a scalable way to [generate](#generating-documentation) documentation for your dbt project using descriptions and commands. The documentation for your project includes:

* **Information about your project**: model code, a DAG of your project, any tests you've added to a column, and more.
* **Information about your data warehouse**: column data types and table sizes. This information is generated by running queries against the information schema.
* Importantly, dbt also provides a way to add **descriptions** to models, columns, sources, and more, to further enhance your documentation.

The following sections describe how to [add descriptions](#adding-descriptions-to-your-project) to your project, [generate documentation](#generating-documentation), use [docs blocks](#using-docs-blocks), and set a [custom overview](#setting-a-custom-overview) for your documentation.

#### Adding descriptions to your project

Before generating documentation, add [descriptions](https://docs.getdbt.com/reference/resource-properties/description.md) to your project resources. Add the `description:` key to the same YAML files where you declare [data tests](https://docs.getdbt.com/docs/build/data-tests.md). For example:

models/\<filename>.yml

```yaml
models:
  - name: events
    description: This table contains clickstream events from the marketing website
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        data_tests:
          - unique
          - not_null
      - name: user-id
        quote: true
        description: The user who performed the event
        data_tests:
          - not_null
```

##### FAQs

**Are there any example dbt documentation sites?**

Yes!
* **Quickstart Tutorial:** You can build your own example dbt project in the [quickstart guide](https://docs.getdbt.com/docs/get-started-dbt.md) * **Jaffle Shop:** A demonstration project (closely related to the tutorial) for a fictional e-commerce store ([main source code](https://github.com/dbt-labs/jaffle-shop) and [source code using duckdb](https://github.com/dbt-labs/jaffle_shop_duckdb)) * **GitLab:** GitLab's internal dbt project is open source and is a great example of how to use dbt at scale ([source code](https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt)) * **dummy-dbt:** A containerized dbt project that populates the Sakila database in Postgres and populates dbt seeds, models, snapshots, and tests. The project can be used for testing and experimentation purposes ([source code](https://github.com/gmyrianthous/dbt-dummy)) * **Google Analytics 4:** A demonstration project that transforms the Google Analytics 4 BigQuery exports to various models ([source code](https://github.com/stacktonic-com/stacktonic-dbt-example-project), [docs](https://stacktonic.com/article/google-analytics-big-query-and-dbt-a-dbt-example-project)) * **Make Open Data:** A production-grade ELT with tests, documentation, and CI/CD (GHA) about French open data (housing, demography, geography, etc). It can be used to learn with voluminous and ambiguous data. Contributions are welcome ([source code](https://github.com/make-open-data/make-open-data), [docs](https://make-open-data.fr/)) Do I need to add a YAML entry for a column for it to appear in the docs site? Fortunately, no! dbt will introspect your warehouse to generate a list of columns in each relation, and match it with the list of columns in your `.yml` files. As such, any undocumented columns will still appear in your documentation! How do I write long-form explanations in my descriptions?
If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions:

```yml
models:
  - name: customers
    description: >
      Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod
      tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
      quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
```

2. Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions:

```yml
models:
  - name: customers
    description: |
      ### Lorem ipsum

      * dolor sit amet, consectetur adipisicing elit, sed do eiusmod
      * tempor incididunt ut labore et dolore magna aliqua.
```

3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. How do I access documentation in dbt Catalog? If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state. Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project. dbt Developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but doesn't offer the same speed, metadata, or visibility as Catalog. Can I document things other than models, like sources, seeds, and snapshots? Yes!
You can document almost everything in your project using the `description:` key. Check out the reference docs on [descriptions](https://docs.getdbt.com/reference/resource-properties/description.md) for more info! #### Generating documentation[​](#generating-documentation "Direct link to Generating documentation") Generate documentation for your project by following these steps: 1. Run the `dbt docs generate` [command](https://docs.getdbt.com/reference/commands/cmd-docs.md#dbt-docs-generate) to compile relevant information about your dbt project and warehouse into `manifest.json` and `catalog.json` files, respectively. 2. Ensure you've created the models with `dbt run` or `dbt build` to view the documentation for all columns, not just those described in your project. 3. Run the `dbt docs serve` [command](https://docs.getdbt.com/reference/commands/cmd-docs.md#dbt-docs-serve) if you're developing locally to use these `.json` files to populate a local website. dbt provides two complementary ways to [view documentation](https://docs.getdbt.com/docs/build/view-documentation.md), and your descriptions, after it's generated: * [**dbt Docs**:](https://docs.getdbt.com/docs/build/view-documentation.md#dbt-docs) A static documentation site with model lineage, metadata, and documentation that can be hosted on your web server (like S3 or Netlify). Available for dbt Core or dbt Developer plans. * [**Catalog**](https://docs.getdbt.com/docs/explore/explore-projects.md): Builds upon dbt Docs to provide a dynamic, real-time interface with enhanced metadata, customizable views, deeper project insights, and collaboration tools. Available on dbt [Starter, Enterprise, or Enterprise+ plans](https://www.getdbt.com/pricing). See [View documentation](https://docs.getdbt.com/docs/build/view-documentation.md) to get the most out of your dbt project's documentation. 
#### Using docs blocks[​](#using-docs-blocks "Direct link to Using docs blocks") Docs blocks provide a robust method for documenting models and other resources using Jinja and markdown. Docs block files can contain arbitrary markdown, but they must be uniquely named. ##### Syntax[​](#syntax "Direct link to Syntax") To declare a docs block, use the Jinja `docs` tag. The name of a docs block can't start with a digit and may contain: * Uppercase and lowercase letters (A-Z, a-z) * Digits (0-9) * Underscores (\_) events.md

```markdown
{% docs table_events %}

This table contains clickstream events from the marketing website.

The events in this table are recorded by Snowplow and piped into the warehouse on an hourly basis. The following pages of the marketing site are tracked:
- /
- /about
- /team
- /contact-us

{% enddocs %}
```

In this example, a docs block named `table_events` is defined with some descriptive markdown contents. There is nothing significant about the name `table_events` — docs blocks can be named however you like, as long as the name only contains alphanumeric and underscore characters and doesn't start with a numeric character. ##### Placement[​](#placement "Direct link to Placement") Docs blocks should be placed in files with a `.md` file extension. By default, dbt searches your project's resource paths for docs blocks; you can customize this behavior with the [`docs-paths`](https://docs.getdbt.com/reference/project-configs/docs-paths.md) project config. ##### Usage[​](#usage "Direct link to Usage") To use a docs block, reference it from your `schema.yml` file with the [doc()](https://docs.getdbt.com/reference/dbt-jinja-functions/doc.md) function in place of a markdown string. Using the examples above, the `table_events` docs can be included in the `schema.yml` file as shown here: schema.yml

```yaml
models:
  - name: events
    description: '{{ doc("table_events") }}'
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        data_tests:
          - unique
          - not_null
```

In the resulting documentation, `'{{ doc("table_events") }}'` will be expanded to the markdown defined in the `table_events` docs block.
#### Setting a custom overview[​](#setting-a-custom-overview "Direct link to Setting a custom overview") *Currently available for dbt Docs only.* The "overview" shown in the dbt Docs website can be overridden by supplying your own docs block called `__overview__`. * By default, dbt supplies an overview with helpful information about the docs site itself. * Depending on your needs, it may be a good idea to override this docs block with specific information about your company style guide, links to reports, or information about who to contact for help. * To override the default overview, create a docs block that looks like this: models/overview.md

```markdown
{% docs __overview__ %}

# Monthly Recurring Revenue (MRR) playbook

This dbt project is a worked example to demonstrate how to model subscription revenue.

**Check out the full write-up [here](https://blog.getdbt.com/modeling-subscription-revenue/), as well as the repo for this project [here](https://github.com/dbt-labs/mrr-playbook/).**

...

{% enddocs %}
```

##### Custom project-level overviews[​](#custom-project-level-overviews "Direct link to Custom project-level overviews") *Currently available for dbt Docs only.* You can set different overviews for each dbt project/package included in your documentation site by creating a docs block named `__[project_name]__`. For example, in order to define custom overview pages that appear when a viewer navigates inside the `dbt_utils` or `snowplow` package: models/overview.md

```markdown
{% docs __dbt_utils__ %}

# Utility macros

Our dbt project heavily uses this suite of utility macros, especially:

- `surrogate_key`
- `test_equality`
- `pivot`

{% enddocs %}

{% docs __snowplow__ %}

# Snowplow sessionization

Our organization uses this package of transformations to roll Snowplow events up to page views and sessions.

{% enddocs %}
```
--- ### About Hybrid projects [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") With Hybrid projects, your organization can adopt complementary dbt Core and dbt workflows (where some teams deploy projects in dbt Core and others in dbt) and seamlessly integrate these workflows by automatically uploading dbt Core [artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md) into dbt. Available in public preview Hybrid projects are available in public preview to [dbt Enterprise accounts](https://www.getdbt.com/pricing). dbt Core users can seamlessly upload [artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md) (like [run_results.json](https://docs.getdbt.com/reference/artifacts/run-results-json.md), [manifest.json](https://docs.getdbt.com/reference/artifacts/manifest-json.md), [catalog.json](https://docs.getdbt.com/reference/artifacts/catalog-json.md), and [sources.json](https://docs.getdbt.com/reference/artifacts/sources-json.md)) into dbt after executing a run in the dbt Core command line interface (CLI), which helps: * Collaborate with dbt + dbt Core users by enabling them to visualize and perform [cross-project references](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref) to dbt models that live in Core projects. * (Coming soon) New users interested in the [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md) can build off of dbt models already created by a central data team in dbt Core rather than having to start from scratch.
* dbt Core and dbt users can navigate to [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) and view their models and assets. To view Catalog, you must have a [read-only seat](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md). #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") To upload artifacts, make sure you meet these prerequisites: * Your organization is on a [dbt Enterprise+ plan](https://www.getdbt.com/pricing) * You're on [dbt's release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) and your dbt Core project is on dbt v1.10 or higher * You've [configured](https://docs.getdbt.com/docs/deploy/hybrid-setup.md#connect-project-in-dbt-cloud) a hybrid project in dbt * You've updated your existing dbt Core project with the latest changes and [configured it with model access](https://docs.getdbt.com/docs/deploy/hybrid-setup.md#make-dbt-core-models-public): * Ensure models that you want to share with other dbt projects use `access: public` in their model configuration. This makes the models more discoverable and shareable * Learn more about [access modifiers](https://docs.getdbt.com/docs/mesh/govern/model-access.md#access-modifiers) and how to set the [`access` config](https://docs.getdbt.com/reference/resource-configs/access.md) * You've updated [dbt permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) so you can create a new project in dbt **Note:** Uploading artifacts doesn't count against dbt run slots.
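To illustrate the `access: public` prerequisite, a model properties file might look like the following sketch (`fct_orders` and its description are hypothetical):

```yaml
# models/schema.yml (hypothetical model name)
models:
  - name: fct_orders
    access: public   # allows other dbt projects to reference this model
    description: Order facts shared with downstream teams
```

Models without an explicit `access` setting default to `protected`, which limits references to the same project.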
--- ### About Iceberg catalogs Data catalogs have recently risen to the top of the data industry’s mind, especially given the excitement about Iceberg and data governance for AI. “Data catalog” has become an overused term that represents a broad set of tools. So, before we dive into Iceberg catalogs, let’s start at the beginning: #### About data catalogs[​](#about-data-catalogs "Direct link to About data catalogs") The short answer: a data catalog is **data about your data**. More precisely, it’s a centralized metadata management layer that enables users and tools to discover, understand, and govern data effectively. At its core, it organizes metadata about datasets, including information about the schemas, lineage, access controls, and business context to help technical and non-technical users work with data more efficiently. ##### History of data catalogs[​](#history-of-data-catalogs "Direct link to History of data catalogs") Data catalogs aren’t a new concept. Data dictionaries were the earliest forms of catalogs, and they were part of relational databases. These dictionaries stored schema-level metadata (like table names). They weren’t made for business users and were very manual. Fast forward to the early 2010s, and the industry began to delve deeply into [Hadoop](https://hadoop.apache.org/) and data lakes. [Hive Metastore](https://hive.apache.org/) became the standard for managing schema metadata in Hadoop ecosystems. However, it was still limited to structural metadata, as it lacked lineage, discovery, and business context metadata. Next came the emergence of open source technical catalogs like [Iceberg](https://iceberg.apache.org/terms/), [Polaris](https://polaris.apache.org/), and [Unity Catalog](https://www.unitycatalog.io/), and business catalogs like [Atlan](https://atlan.com/what-is-a-data-catalog/). In the era of AI, it’s more important than ever to have catalogs that can support structural metadata and business logic.
For data teams, the catalogs can fall into two buckets: * **Technical data catalogs:** Focus on structural metadata, including information about data like table and column names, data types, storage locations (particularly important for open table formats), and access controls. They usually come either “built-in” (no setup needed) or externally managed and integrated into your data platform. They are used by compute engines to locate and interact with data. * **Business data catalogs:** Serve broader organizational users (BI analysts, product managers, etc.). They enrich technical metadata with business context in the form of metrics, business definitions, data quality indicators, usage patterns, and ownership. ##### Why data catalogs are important to dbt[​](#why-data-catalogs-are-important-to-dbt "Direct link to Why data catalogs are important to dbt") For dbt users working in a lakehouse or multi-engine architecture, understanding and interacting with data catalogs is essential for several reasons, including: * **Table Discovery:** dbt models are registered in catalogs. Understanding the catalog structure is critical for managing datasets and informing dbt about what has already been built and where it resides. * **Cross-Engine Interoperability:** Iceberg catalogs allow datasets created by one compute engine to be read by another. This is what dbt Mesh’s cross-platform functionality is built on. #### About Iceberg catalogs[​](#about-iceberg-catalogs "Direct link to About Iceberg catalogs") Apache Iceberg is an open table format designed for petabyte-scale analytic datasets. It supports schema evolution, time travel, partition pruning, and transactional operations across distributed compute engines. Iceberg catalogs are a critical abstraction layer that maps logical table names to their metadata locations and provides a namespace mechanism. 
They decouple compute engines from the physical layout of data, enabling multiple tools to interoperate consistently on the same dataset. There are multiple types of Iceberg catalogs: * Iceberg REST * Iceberg REST compatible * Delta/Iceberg Hybrid\* Hybrid catalogs support storing duplicate table metadata in Iceberg and Delta Lake formats, enabling workflows like an Iceberg engine reading from Delta Lake or vice versa. There will be limitations specific to how the platform has implemented this. ##### How dbt works with Iceberg catalogs[​](#how-dbt-works-with-iceberg-catalogs "Direct link to How dbt works with Iceberg catalogs") dbt interacts with Iceberg catalogs through the adapters in two ways: * **Model Materialization:** When dbt materializes a model as a table or view, if the catalog integration is declared, the underlying adapter (Spark, Trino, Snowflake, etc.) creates an Iceberg table entry in the specified catalog, whether built-in or external. * **Catalog Integration**: With our initial release of the new catalog framework, users can declare which catalog the table's metadata is written to. Why is this important? dbt uses and creates a significant amount of metadata. Before every run, dbt needs to know what already exists so it knows how to compile code (for example, resolving your `{{ ref() }}` to the actual table name) and where to materialize the object. By supporting these two methods, dbt can cleverly adjust based on the environment, code logic, and use case defined in your dbt project. ##### Limitations[​](#limitations "Direct link to Limitations") To ensure that your compute engine has access to the catalog, you must ensure that networking and permissions are set up correctly. This means that if you are using X warehouse with Y catalog but want to read Y catalog from Z warehouse, you need to ensure that Z warehouse can connect to Y catalog.
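As a rough sketch of the catalog integration idea, a top-level `catalogs.yml` can declare where table metadata is written. The names and keys below are illustrative assumptions; check your adapter's catalog documentation for the exact fields it supports:

```yaml
# catalogs.yml (illustrative values; exact keys vary by adapter and dbt version)
catalogs:
  - name: my_iceberg_catalog              # referenced from model configs
    active_write_integration: my_rest_integration
    write_integrations:
      - name: my_rest_integration
        catalog_type: iceberg_rest        # e.g. an Iceberg REST catalog
        table_format: iceberg
```

A model can then be pointed at the declared catalog so its metadata lands in the right place, while dbt continues to handle compilation and materialization as usual.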
If IP restrictions are turned on, you must resolve this by adjusting the allowlist (only possible if the warehouse supports static IP addresses) or by setting up something like PrivateLink. --- ### About incremental models This is an introduction to incremental models: when to use them and how they work in dbt. Incremental models in dbt are a [materialization](https://docs.getdbt.com/docs/build/materializations.md) strategy designed to efficiently update your data warehouse tables by only transforming and loading new or changed data since the last run. Instead of processing your entire dataset every time, incremental models append or update only the new rows, significantly reducing the time and resources required for your data transformations. This page provides a brief overview of incremental models, their importance in data transformations, and the core concepts of incremental materializations in dbt. ![A visual representation of how incremental models work](/img/docs/building-a-dbt-project/incremental-diagram.jpg?v=2 "A visual representation of how incremental models work") *A visual representation of how incremental models work. Source: [Materialization best practices guide](https://docs.getdbt.com/best-practices/materializations/1-guide-overview)* Learn by video!
For video tutorials on incremental models, go to dbt Learn and check out the [Incremental models course](https://learn.getdbt.com/courses/incremental-models). #### Understand incremental models[​](#understand-incremental-models "Direct link to Understand incremental models") Incremental models enable you to significantly reduce build time by transforming only new records. This is particularly useful for large datasets, where the cost of processing the entire dataset is high. Incremental models [require extra configuration](https://docs.getdbt.com/docs/build/incremental-models.md) and are an advanced usage of dbt. We recommend using them when your dbt runs are becoming too slow. ##### When to use an incremental model[​](#when-to-use-an-incremental-model "Direct link to When to use an incremental model") Building models as tables in your data warehouse is often preferred for better query performance. However, using `table` materialization can be computationally intensive, especially when: * Source data has millions or billions of rows. * Data transformations on the source data are computationally expensive (take a long time to execute) and complex, like when using Regex or UDFs. Incremental models offer a balance between complexity and performance: compared to `view` and `table` materializations, they can substantially improve the performance of your dbt runs. In addition to these considerations for incremental models, it's important to understand their limitations and challenges, particularly with large datasets. For more insights into efficient strategies, performance considerations, and the handling of late-arriving data in incremental models, refer to the [On the Limits of Incrementality](https://discourse.getdbt.com/t/on-the-limits-of-incrementality/303) discourse discussion or to our [Materialization best practices](https://docs.getdbt.com/best-practices/materializations/2-available-materializations.md) page.
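To make the idea concrete, here is a minimal sketch of an incremental model that filters to new records with `is_incremental()` (table and column names are hypothetical):

```sql
-- models/stg_events.sql (hypothetical names)
{{ config(materialized='incremental', unique_key='event_id') }}

select event_id, user_id, event_time
from {{ ref('raw_events') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what's already loaded
  where event_time > (select max(event_time) from {{ this }})
{% endif %}
```

On the first run (or with `--full-refresh`), the `is_incremental()` block is skipped and the whole table is built; on subsequent runs, only the filtered rows are processed.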
##### How incremental models work in dbt[​](#how-incremental-models-work-in-dbt "Direct link to How incremental models work in dbt") dbt's [incremental materialization strategy](https://docs.getdbt.com/docs/build/incremental-strategy.md) works differently on different databases. Where supported, a `merge` statement is used to insert new records and update existing records. On warehouses that do not support `merge` statements, a merge is implemented by first using a `delete` statement to delete records in the target table that are to be updated, and then an `insert` statement. Transaction management, a process used in certain data platforms, ensures that a set of actions is treated as a single unit of work (or task). If any part of the unit of work fails, dbt will roll back open transactions and restore the database to a good state. #### Related docs[​](#related-docs "Direct link to Related docs") * [Incremental models](https://docs.getdbt.com/docs/build/incremental-models.md) to learn how to configure incremental models in dbt. * [Incremental strategies](https://docs.getdbt.com/docs/build/incremental-strategy.md) to understand how dbt implements incremental models on different databases. * [Microbatch](https://docs.getdbt.com/docs/build/incremental-microbatch.md) to understand a new incremental strategy intended for efficient and resilient processing of very large time-series datasets. * [Materializations best practices](https://docs.getdbt.com/best-practices/materializations/1-guide-overview.md) to learn about the best practices for using materializations in dbt.
--- ### About incremental strategy Incremental strategies for materializations optimize performance by defining how to handle new and changed data. There are various strategies to implement the concept of incremental materializations. The value of each strategy depends on: * The volume of data. * The reliability of your `unique_key`. * The support of certain features in your data platform. An optional `incremental_strategy` config is provided in some adapters that controls the code that dbt uses to build incremental models. Microbatch The [`microbatch` incremental strategy](https://docs.getdbt.com/docs/build/incremental-microbatch.md) is intended for large time-series datasets. dbt will process the incremental model in multiple queries (or "batches") based on a configured `event_time` column. Depending on the volume and nature of your data, this can be more efficient and resilient than using a single query for adding new data. ##### Supported incremental strategies by adapter[​](#supported-incremental-strategies-by-adapter "Direct link to Supported incremental strategies by adapter") This table shows the support of each incremental strategy across adapters available on dbt's [Latest release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md). Some strategies may be unavailable if you're not on **Latest** and the feature hasn't been released to the **Compatible** track. If you're interested in an adapter available in dbt Core only, check out the [adapter's individual configuration page](https://docs.getdbt.com/reference/resource-configs.md) for more details. 
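For example, a microbatch model is configured with an `event_time` column plus batch settings that control how dbt slices the work into per-period queries. A sketch with illustrative names and values:

```sql
-- models/sessions.sql (illustrative names)
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',  -- column dbt uses to slice batches
    begin='2024-01-01',          -- earliest data to process on a full build
    batch_size='day'             -- one query per day of data
) }}

select * from {{ ref('stg_sessions') }}
```

Because each batch is an independent query, a failed batch can be retried without rebuilding the whole table.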
Click the name of the adapter in the following table for more information about supported incremental strategies:

| Data platform adapter | `append` | `merge` | `delete+insert` | `insert_overwrite` | `microbatch` |
| --- | --- | --- | --- | --- | --- |
| [dbt-postgres](https://docs.getdbt.com/reference/resource-configs/postgres-configs.md#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | ✅ |
| [dbt-redshift](https://docs.getdbt.com/reference/resource-configs/redshift-configs.md#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | ✅ |
| [dbt-bigquery](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md#merge-behavior-incremental-models) | | ✅ | | ✅ | ✅ |
| [dbt-spark](https://docs.getdbt.com/reference/resource-configs/spark-configs.md#incremental-models) | ✅ | ✅ | | ✅ | ✅ |
| [dbt-databricks](https://docs.getdbt.com/reference/resource-configs/databricks-configs.md#incremental-models) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [dbt-snowflake](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#merge-behavior-incremental-models) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [dbt-trino](https://docs.getdbt.com/reference/resource-configs/trino-configs.md#incremental) | ✅ | ✅ | ✅ | | ✅ |
| [dbt-fabric](https://docs.getdbt.com/reference/resource-configs/fabric-configs.md#incremental) | ✅ | ✅ | ✅ | | |
| [dbt-athena](https://docs.getdbt.com/reference/resource-configs/athena-configs.md#incremental-models) | ✅ | ✅ | | ✅ | ✅ |
| [dbt-teradata](https://docs.getdbt.com/reference/resource-configs/teradata-configs.md#valid_history-incremental-materialization-strategy) | ✅ | ✅ | ✅ | | ✅ |
##### Configuring incremental strategy[​](#configuring-incremental-strategy "Direct link to Configuring incremental strategy") The `incremental_strategy` config can either be defined in specific models or for all models in your `dbt_project.yml` file: dbt_project.yml

```yaml
models:
  +incremental_strategy: "insert_overwrite"
```

or: models/my_model.sql

```sql
{{
  config(
    materialized='incremental',
    unique_key='date_day',
    incremental_strategy='delete+insert',
    ...
  )
}}

select ...
```

##### Strategy-specific configs[​](#strategy-specific-configs "Direct link to Strategy-specific configs") If you use the `merge` strategy and specify a `unique_key`, by default, dbt will entirely overwrite matched rows with new values. On adapters which support the `merge` strategy, you may optionally pass a list of column names to a `merge_update_columns` config. In that case, dbt will update *only* the columns specified by the config, and keep the previous values of other columns. models/my_model.sql

```sql
{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    merge_update_columns = ['email', 'ip_address'],
    ...
  )
}}

select ...
```

Alternatively, you can specify a list of columns to exclude from being updated by passing a list of column names to a `merge_exclude_columns` config. models/my_model.sql

```sql
{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    merge_exclude_columns = ['created_at'],
    ...
  )
}}

select ...
```

##### About incremental\_predicates[​](#about-incremental_predicates "Direct link to About incremental_predicates") `incremental_predicates` is an advanced use of incremental models, where data volume is large enough to justify additional investments in performance. This config accepts a list of any valid SQL expression(s). dbt does not check the syntax of the SQL statements.
This is an example of a model configuration in a `yml` file you might expect to see on Snowflake:

```yml
models:
  - name: my_incremental_model
    config:
      materialized: incremental
      unique_key: id
      # this will affect how the data is stored on disk, and indexed to limit scans
      cluster_by: ['session_start']
      incremental_strategy: merge
      # this limits the scan of the existing table to the last 7 days of data
      incremental_predicates: ["DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)"]
      # `incremental_predicates` accepts a list of SQL statements.
      # `DBT_INTERNAL_DEST` and `DBT_INTERNAL_SOURCE` are the standard aliases for the target table
      # and temporary table, respectively, during an incremental run using the merge strategy.
```

Alternatively, here are the same configurations configured within a model file:

```sql
-- in models/my_incremental_model.sql
{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    cluster_by = ['session_start'],
    incremental_strategy = 'merge',
    incremental_predicates = [
      "DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)"
    ]
  )
}}

...
```

This will template (in the `dbt.log` file) a `merge` statement like:

```sql
merge into DBT_INTERNAL_DEST
from DBT_INTERNAL_SOURCE
on
    -- unique key
    DBT_INTERNAL_DEST.id = DBT_INTERNAL_SOURCE.id
    and
    -- custom predicate: limits data scan in the "old" data / existing table
    DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)
when matched then update ...
when not matched then insert ...
```

Limit the data scan of *upstream* tables within the body of your incremental model SQL, which will limit the amount of "new" data processed/transformed.

```sql
with large_source_table as (

    select * from {{ ref('large_source_table') }}

    {% if is_incremental() %}
    where session_start >= dateadd(day, -3, current_date)
    {% endif %}

),

...
```

info The syntax depends on how you configure your `incremental_strategy`: * If using the `merge` strategy, you may need to explicitly alias any columns with either `DBT_INTERNAL_DEST` ("old" data) or `DBT_INTERNAL_SOURCE` ("new" data). * There's a decent amount of conceptual overlap with the `insert_overwrite` incremental strategy. ##### Built-in strategies[​](#built-in-strategies "Direct link to Built-in strategies") Before diving into [custom strategies](#custom-strategies), it's important to understand the built-in incremental strategies in dbt and their corresponding macros:

| `incremental_strategy` | Corresponding macro |
| --- | --- |
| [`append`](https://docs.getdbt.com/docs/build/incremental-strategy.md#append) | `get_incremental_append_sql` |
| [`delete+insert`](https://docs.getdbt.com/docs/build/incremental-strategy.md#deleteinsert) | `get_incremental_delete_insert_sql` |
| [`merge`](https://docs.getdbt.com/docs/build/incremental-strategy.md#merge) | `get_incremental_merge_sql` |
| [`insert_overwrite`](https://docs.getdbt.com/docs/build/incremental-strategy.md#insert_overwrite) | `get_incremental_insert_overwrite_sql` |
| [`microbatch`](https://docs.getdbt.com/docs/build/incremental-strategy.md#microbatch) | `get_incremental_microbatch_sql` |
For example, a built-in strategy such as `append` can be defined and used with the following files:

macros/append.sql

```sql
{% macro get_incremental_append_sql(arg_dict) %}

  {% do return(some_custom_macro_with_sql(arg_dict["target_relation"], arg_dict["temp_relation"], arg_dict["unique_key"], arg_dict["dest_columns"], arg_dict["incremental_predicates"])) %}

{% endmacro %}


{% macro some_custom_macro_with_sql(target_relation, temp_relation, unique_key, dest_columns, incremental_predicates) %}

    {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

    insert into {{ target_relation }} ({{ dest_cols_csv }})
    (
        select {{ dest_cols_csv }}
        from {{ temp_relation }}
    )

{% endmacro %}
```

Define a model in models/my\_model.sql:

```sql
{{ config(
    materialized="incremental",
    incremental_strategy="append",
) }}

select * from {{ ref("some_model") }}
```

###### About built-in incremental strategies[​](#about-built-in-incremental-strategies "Direct link to About built-in incremental strategies")

###### `append`[​](#append "Direct link to append")

The `append` strategy is simple to implement and has low processing costs. It inserts selected records into the destination table without updating or deleting existing data.

This strategy doesn't align directly with type 1 or type 2 [slowly changing dimensions](https://www.thoughtspot.com/data-trends/data-modeling/slowly-changing-dimensions-in-data-warehouse) (SCD). It differs from SCD1, which overwrites existing records, and only loosely resembles SCD2: while it adds new rows (like SCD2), it doesn't manage versioning or track historical changes explicitly.

Importantly, `append` doesn't check for duplicates or verify whether a record already exists in the destination. If the same record appears multiple times in the source, it will be inserted again, potentially resulting in duplicate rows. This may or may not be an issue depending on your use case and data quality requirements.
###### `delete+insert`[​](#deleteinsert "Direct link to deleteinsert")

The `delete+insert` strategy deletes from the target table any records whose `unique_key` appears in the new data, then inserts the new records. This can be less efficient for larger datasets, but it ensures updated records are fully replaced, avoiding partial updates, and it can be useful when a `unique_key` isn't truly unique or when `merge` is unsupported.

`delete+insert` doesn't map directly to SCD logic (type 1 or 2) because it overwrites data without tracking history. For SCD2, use [dbt snapshots](https://docs.getdbt.com/docs/build/snapshots.md#what-are-snapshots), not `delete+insert`.

###### `merge`[​](#merge "Direct link to merge")

`merge` inserts records with a `unique_key` that don't exist yet in the destination table and updates records with keys that do exist, mirroring the logic of SCD1, where changes are overwritten rather than historically tracked. This strategy shouldn't be confused with `delete+insert`, which deletes matching records before inserting new ones.

By specifying a `unique_key` (which can be composed of one or more columns), `merge` can also help resolve duplicates. If the `unique_key` already exists in the destination table, `merge` will update the record, so you won't have duplicates. If the records don't exist, `merge` will insert them. Note that if you use `merge` without specifying a `unique_key`, it behaves like the `append` strategy.

While the `merge` strategy is useful for keeping tables current, it's best suited for smaller tables or incremental datasets. It can be expensive for large tables because it scans the entire destination table to determine what to update or insert.
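To make the row-level behavior of these strategies concrete, here is a small illustrative Python simulation. This is not dbt internals — it's just a sketch of the semantics described above, with rows modeled as dictionaries keyed on `unique_key` where relevant:

```python
# Illustrative simulation of the row-level semantics of the built-in
# strategies described above. NOT dbt's implementation -- a sketch only.

def append(dest, new_rows):
    # Insert everything; never checks for existing keys, so duplicates can occur.
    return dest + new_rows

def delete_insert(dest, new_rows, key):
    # Delete destination rows whose key appears in the new data, then insert.
    incoming_keys = {row[key] for row in new_rows}
    return [r for r in dest if r[key] not in incoming_keys] + new_rows

def merge(dest, new_rows, key):
    # Update rows whose key already exists; insert the rest (SCD1-style overwrite).
    by_key = {r[key]: r for r in dest}
    for row in new_rows:
        by_key[row[key]] = row  # update-or-insert
    return list(by_key.values())

dest = [{"id": 1, "total": 10}, {"id": 2, "total": 20}]
new = [{"id": 2, "total": 25}, {"id": 3, "total": 30}]

print(len(append(dest, new)))               # 4 rows: id 2 now appears twice
print(len(delete_insert(dest, new, "id")))  # 3 rows: id 2 fully replaced
print(len(merge(dest, new, "id")))          # 3 rows: id 2 updated in place
```

Note how `append` is the only strategy that can produce duplicate keys, which matches the caveat above about source duplicates.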
###### `insert_overwrite`[​](#insert_overwrite "Direct link to insert_overwrite")

The [`insert_overwrite`](https://downloads.apache.org/spark/docs/3.1.1/sql-ref-syntax-dml-insert-overwrite-table.html) strategy efficiently updates partitioned tables by replacing entire partitions with new data, rather than merging or updating individual rows. It overwrites only the affected partitions, not the whole table.

Because it is designed for partitioned data and replaces entire partitions wholesale, it does not align with typical SCD logic, which tracks row-level history or changes. It's ideal for tables partitioned by date or another key, and useful for refreshing recent or corrected data without full table rebuilds.

###### `microbatch`[​](#microbatch "Direct link to microbatch")

[`microbatch`](https://docs.getdbt.com/docs/build/incremental-microbatch.md#what-is-microbatch-in-dbt) is an incremental strategy designed for processing large time-series datasets by splitting the data into time-based batches (for example, daily or hourly). It supports [parallel batch execution](https://docs.getdbt.com/docs/build/parallel-batch-execution.md#how-parallel-batch-execution-works) for faster runs.

For details on which incremental strategies are supported by each adapter, refer to [Supported incremental strategies by adapter](https://docs.getdbt.com/docs/build/incremental-strategy.md#supported-incremental-strategies-by-adapter).

##### Custom strategies[​](#custom-strategies "Direct link to Custom strategies")

limited support

Custom strategies are not currently supported on the BigQuery and Spark adapters.

From dbt v1.2 onwards, users have an easier alternative to [creating an entirely new materialization](https://docs.getdbt.com/guides/create-new-materializations.md). They can define and use their own "custom" incremental strategies by:

1. Defining a macro named `get_incremental_STRATEGY_sql`.
   Note that `STRATEGY` is a placeholder; replace it with the name of your custom incremental strategy.
2. Configuring `incremental_strategy: STRATEGY` within an incremental model.

dbt won't validate user-defined strategies; it simply looks for a macro with that name and raises an error if it can't find one. For example, a user-defined strategy named `insert_only` can be defined and used with the following files:

macros/my\_custom\_strategies.sql

```sql
{% macro get_incremental_insert_only_sql(arg_dict) %}

  {% do return(some_custom_macro_with_sql(arg_dict["target_relation"], arg_dict["temp_relation"], arg_dict["unique_key"], arg_dict["dest_columns"], arg_dict["incremental_predicates"])) %}

{% endmacro %}


{% macro some_custom_macro_with_sql(target_relation, temp_relation, unique_key, dest_columns, incremental_predicates) %}

    {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

    insert into {{ target_relation }} ({{ dest_cols_csv }})
    (
        select {{ dest_cols_csv }}
        from {{ temp_relation }}
    )

{% endmacro %}
```

models/my\_model.sql

```sql
{{ config(
    materialized="incremental",
    incremental_strategy="insert_only",
    ...
) }}

...
```

If you use a custom microbatch macro, set the [`require_batched_execution_for_custom_microbatch_strategy` behavior flag](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#custom-microbatch-strategy) in your `dbt_project.yml` to enable batched execution of your custom strategy.
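The naming convention above can be sketched as a simple lookup: dbt derives a macro name from the configured strategy and errors if no macro by that name exists. The following Python sketch is purely illustrative (the `MACROS` mapping and helper names are hypothetical, not dbt's resolution code):

```python
# Sketch of the macro-name convention described above: the configured
# strategy `insert_only` resolves to `get_incremental_insert_only_sql`.
# Illustrative only -- MACROS and resolve_strategy_macro are made-up names.

MACROS = {
    "get_incremental_append_sql": "<built-in append SQL>",
    "get_incremental_merge_sql": "<built-in merge SQL>",
    "get_incremental_insert_only_sql": "<user-defined insert_only SQL>",
}

def resolve_strategy_macro(strategy):
    name = f"get_incremental_{strategy}_sql"
    if name not in MACROS:
        # Mirrors dbt's behavior of raising an error when the macro is missing.
        raise ValueError(f"No macro named '{name}' found for strategy '{strategy}'")
    return MACROS[name]

print(resolve_strategy_macro("insert_only"))  # user-defined strategy resolves fine
```

A misspelled strategy name (say, `insert_onyl`) would fail at this lookup step rather than producing wrong SQL, which is why the macro name and the `incremental_strategy` config must match exactly.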
##### Custom strategies from a package[​](#custom-strategies-from-a-package "Direct link to Custom strategies from a package")

To use the `merge_null_safe` custom incremental strategy from the `example` package:

* [Install the package](https://docs.getdbt.com/docs/build/packages.md#how-do-i-add-a-package-to-my-project)
* Add the following macro to your project:

macros/my\_custom\_strategies.sql

```sql
{% macro get_incremental_merge_null_safe_sql(arg_dict) %}
    {% do return(example.get_incremental_merge_null_safe_sql(arg_dict)) %}
{% endmacro %}
```

---

### About MetricFlow

This guide introduces MetricFlow's fundamental ideas for people new to this feature. MetricFlow, which powers the Semantic Layer, helps you define and manage the logic for your company's metrics. It's an opinionated set of abstractions that helps data consumers retrieve metric datasets from a data platform quickly and efficiently.

MetricFlow handles SQL query construction and defines the specification for dbt semantic models and metrics. It allows you to define metrics in your dbt project and query them with [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md), whether in dbt or dbt Core.
#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

Before you start, consider the following guidelines:

#### MetricFlow[​](#metricflow "Direct link to MetricFlow")

MetricFlow is a SQL query generation tool designed to streamline metric creation across different data dimensions for diverse business needs.

* It operates through YAML files, where a semantic graph links language to data. This graph comprises [semantic models](https://docs.getdbt.com/docs/build/semantic-models.md) (data entry points) and [metrics](https://docs.getdbt.com/docs/build/metrics-overview.md) (functions for creating quantitative indicators).
* MetricFlow is developed and maintained as part of the [Open Semantic Interchange (OSI)](https://www.snowflake.com/en/blog/open-semantic-interchange-ai-standard/) initiative.
* MetricFlow is compatible with dbt version 1.6 and higher.
* MetricFlow is distributed under the [Apache 2.0 license](https://github.com/dbt-labs/metricflow/blob/main/LICENSE). Data practitioners and enthusiasts are highly encouraged to contribute. Read more about [MetricFlow's license history](https://github.com/dbt-labs/metricflow?tab=readme-ov-file#license-history).
* As part of the Semantic Layer, MetricFlow empowers organizations to define metrics using YAML abstractions.
* To query metric dimensions, dimension values, and validate configurations, use [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md).

note

MetricFlow doesn't support dbt [builtin functions or packages](https://docs.getdbt.com/reference/dbt-jinja-functions/builtins.md) at this time; however, support is planned for the future.

MetricFlow abides by these principles:

* **Flexibility with completeness**: Define metric logic using flexible abstractions on any data model.
* **DRY (Don't Repeat Yourself)**: Minimize redundancy by enabling the reuse of metric definitions wherever possible.
* **Simplicity with gradual complexity**: Approach MetricFlow using familiar data modeling concepts.
* **Performance and efficiency**: Optimize performance while supporting centralized data engineering and distributed logic ownership.

##### Semantic graph[​](#semantic-graph "Direct link to Semantic graph")

We're introducing a new concept: a "semantic graph". It's the relationship between semantic models and YAML configurations that creates a data landscape for building metrics. You can think of it like a map, where tables are like locations and the connections between them (edges) are like roads. Although it operates under the hood, the semantic graph is a subset of the DAG, and you can see the semantic models as nodes on the DAG.

The semantic graph helps us decide which information is available for consumption and which is not. The connections between tables in the semantic graph describe relationships between the information. This differs from the DAG, where the connections show dependencies between tasks.

When MetricFlow generates a metric, it uses its SQL engine to figure out the best path between tables using the framework defined in YAML files for semantic models and metrics. When these models and metrics are correctly defined, they can be used downstream with the Semantic Layer's integrations.

##### Semantic models[​](#semantic-models "Direct link to Semantic models")

Semantic models are the starting points of your data and correspond to models in your dbt project. You can create multiple semantic models from each model. Semantic models have metadata, like a data table, that defines important information such as the table name and primary keys so the graph can be navigated correctly.

For a semantic model, there are three main pieces of metadata:

* [Entities](https://docs.getdbt.com/docs/build/entities.md): The join keys of your semantic model (think of these as the traversal paths, or edges, between semantic models).
* [Dimensions](https://docs.getdbt.com/docs/build/dimensions.md): These are the ways you want to group or slice/dice your metrics.
* [Measures](https://docs.getdbt.com/docs/build/measures.md): The aggregation functions that give you a numeric result and can be used to create your metrics.

##### Metrics[​](#metrics "Direct link to Metrics")

MetricFlow supports different metric types:

#### Use case[​](#use-case "Direct link to Use case")

In the upcoming sections, we'll show how data practitioners currently calculate metrics and compare it to how MetricFlow makes defining metrics easier and more flexible.

The following example data is based on the Jaffle Shop repo. You can view the complete [dbt project](https://github.com/dbt-labs/jaffle-sl-template). The tables we're using in our example model are:

* `orders` is a production data platform export that has been cleaned up and organized for analytical consumption.
* `customers` is a partially denormalized table, in this case with a column derived from the orders table through some upstream process.

To make this more concrete, consider the metric `order_total`, which is defined using the SQL expression:

`select sum(order_total) as order_total from orders`

This expression calculates the total revenue for all orders by summing the `order_total` column in the orders table. In a business setting, the metric `order_total` is often calculated according to different categories, such as:

* Time, for example `date_trunc(ordered_at, 'day')`
* Order Type, using the `is_food_order` dimension from the `orders` table

##### Calculate metrics[​](#calculate-metrics "Direct link to Calculate metrics")

Next, we'll compare how data practitioners currently calculate metrics with multiple queries versus how MetricFlow simplifies and streamlines the process.

* Calculate with multiple queries
* Calculate with MetricFlow

The following example shows how data practitioners would typically calculate the aggregated `order_total` metric. It's also likely that analysts are asked for more details on a metric, like how much revenue came from new customers.
Using the following query creates a situation where multiple analysts work on the same data, each using their own query method, which can lead to confusion, inconsistencies, and a headache for data management.

```sql
select
    date_trunc('day', orders.ordered_at) as day,
    case
        when customers.first_ordered_at is not null then true
        else false
    end as is_new_customer,
    sum(orders.order_total) as order_total
from orders
left join customers on orders.customer_id = customers.customer_id
group by 1, 2
```

In the following three example tabs, use MetricFlow to define a semantic model that uses `order_total` as a metric and a sample schema to create consistent and accurate results, eliminating confusion and code duplication and streamlining your workflow.

* Revenue example
* More dimensions example
* Advanced example

Similarly, you can add additional dimensions like `is_food_order` to your semantic models to incorporate even more ways to slice and dice your revenue `order_total`.

Imagine an even more complex metric is needed, such as the amount of money earned each day from food orders from returning customers. Without MetricFlow, the data practitioner's original SQL might look like this:

```sql
select
    date_trunc('day', orders.ordered_at) as day,
    sum(case when is_food_order = true then order_total else null end) as food_order,
    sum(orders.order_total) as sum_order_total,
    food_order / sum_order_total
from orders
left join customers on orders.customer_id = customers.customer_id
where case when customers.first_ordered_at is not null then true else false end = true
group by 1
```

MetricFlow simplifies the SQL process through metric YAML configurations, as shown below. You can also commit them to your git repository to ensure everyone on the data and business teams can see and approve them as the true and only source of information.

#### FAQs[​](#faqs "Direct link to FAQs")

Do my datasets need to be normalized?

Not at all!
While a cleaned and well-modeled dataset can be extraordinarily powerful and is the ideal input, you can use any dataset, from raw to fully denormalized.

It's recommended that you apply data consistency measures, such as filtering bad data, normalizing common objects, and modeling keys and tables, in upstream applications. The Semantic Layer is more efficient at denormalizing data than at normalizing it.

If you have not invested in data consistency, that is okay. The Semantic Layer can take SQL queries or expressions to define consistent datasets.

Why is normalized data the ideal input?

MetricFlow is built to do denormalization efficiently. There are better tools for taking raw datasets and accomplishing the various tasks required to build data consistency and organized data models. On the other end, by putting in denormalized data you are potentially creating redundancy that is technically challenging to manage, and you are reducing the potential granularity that MetricFlow can use to aggregate metrics.

How does the dbt Semantic Layer handle joins?

The dbt Semantic Layer, powered by MetricFlow, builds joins based on the types of keys and parameters that are passed to entities. To better understand how joins are constructed, see the documentation on [join types](https://docs.getdbt.com/docs/build/join-logic.md#types-of-joins).

Rather than capturing arbitrary join logic, MetricFlow captures the types of each identifier and then helps users navigate to appropriate joins. This allows us to avoid the construction of fan-out and chasm joins as well as generate legible SQL.

Are entities and join keys the same thing?

If it helps you to think of entities as join keys, that is very reasonable. Entities in MetricFlow have applications beyond joining two tables, such as acting as a dimension.

Can a table without a primary or unique entity have dimensions?
Yes, but because a dimension is considered an attribute of the primary or unique entity of the table, it is only usable by the metrics that are defined in that table. It cannot be joined to metrics from other tables. This is common in event logs.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Joins](https://docs.getdbt.com/docs/build/join-logic.md)
* [Validations](https://docs.getdbt.com/docs/build/validation.md)

---

### About microbatch incremental models

Use microbatch incremental models to process large time-series datasets efficiently.

info

Available for [dbt **Latest**](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) and dbt Core v1.9 or higher.

If you use a custom microbatch macro, set a [distinct behavior flag](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#custom-microbatch-strategy) in your `dbt_project.yml` to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag, as dbt will handle microbatching automatically for any model using the [microbatch strategy](#how-microbatch-compares-to-other-incremental-strategies).

Read and participate in the discussion: [dbt Core#10672](https://github.com/dbt-labs/dbt-core/discussions/10672).

Refer to [Supported incremental strategies by adapter](https://docs.getdbt.com/docs/build/incremental-strategy.md#supported-incremental-strategies-by-adapter) for a list of supported adapters.
#### What is "microbatch" in dbt?[​](#what-is-microbatch-in-dbt "Direct link to What is \"microbatch\" in dbt?")

Incremental models in dbt are a [materialization](https://docs.getdbt.com/docs/build/materializations.md) designed to efficiently update your data warehouse tables by only transforming and loading *new or changed data* since the last run. Instead of reprocessing an entire dataset every time, incremental models process a smaller number of rows, and then append, update, or replace those rows in the existing table. This can significantly reduce the time and resources required for your data transformations.

Microbatch is an incremental strategy designed for large time-series datasets:

* It relies solely on a time column ([`event_time`](https://docs.getdbt.com/reference/resource-configs/event-time.md)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models). Note that this differs from `partition_by`, which groups rows into partitions.

  Required

  For incremental microbatch models, if your upstream models don't have `event_time` configured, dbt *cannot* automatically filter them during batch processing and will perform full table scans on every batch run. To avoid this, configure `event_time` on every upstream model that should be filtered. Learn how to exclude a model from auto-filtering by [opting out of auto-filtering](https://docs.getdbt.com/docs/build/incremental-microbatch.md#opting-out-of-auto-filtering).

* It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing.
* Unlike traditional incremental strategies, microbatch enables you to [reprocess failed batches](https://docs.getdbt.com/docs/build/incremental-microbatch.md#retry), auto-detect [parallel batch execution](https://docs.getdbt.com/docs/build/parallel-batch-execution.md), and eliminate the need to implement complex conditional logic for [backfilling](#backfills). * Note that microbatch might not be the best [strategy](https://docs.getdbt.com/docs/build/incremental-strategy.md) for all use cases. Consider other strategies for use cases such as not having a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies). #### How microbatch works[​](#how-microbatch-works "Direct link to How microbatch works") When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure. Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and idempotent. This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills), concurrently, and [retry](#retry) them independently. ##### Adapter-specific behavior[​](#adapter-specific-behavior "Direct link to Adapter-specific behavior") dbt's microbatch strategy uses the most efficient mechanism available for "full batch" replacement on each adapter. This can vary depending on the adapter: * `dbt-postgres`: Uses the `merge` strategy, which performs "update" or "insert" operations. 
* `dbt-redshift`: Uses the `delete+insert` strategy, which "inserts" or "replaces."
* `dbt-snowflake`: Uses the `delete+insert` strategy, which "inserts" or "replaces."
* `dbt-bigquery`: Uses the `insert_overwrite` strategy, which "inserts" or "replaces."
* `dbt-spark`: Uses the `insert_overwrite` strategy, which "inserts" or "replaces."
* `dbt-databricks`: Uses the `replace_where` strategy, which "inserts" or "replaces."

Check out the [supported incremental strategies by adapter](https://docs.getdbt.com/docs/build/incremental-strategy.md#supported-incremental-strategies-by-adapter) for more info.

#### Example[​](#example "Direct link to Example")

A `sessions` model aggregates and enriches data that comes from two other models:

* `page_views` is a large, time-series table. It contains many rows, new records almost always arrive after existing ones, and existing records rarely update. It uses the `page_view_start` column as its `event_time`.
* `customers` is a relatively small dimensional table. Customer attributes update often, and not in a time-based manner; that is, older customers are just as likely to change column values as newer customers. The customers model doesn't configure an `event_time` column.

As a result:

* Each batch of `sessions` will filter `page_views` to the equivalent time-bounded batch.
* The `customers` table isn't filtered, resulting in a full scan for every batch.

tip

In addition to configuring `event_time` for the target table, you should also specify it for any upstream models that you want to filter, even if they have different time columns.

models/staging/page\_views.yml

```yaml
models:
  - name: page_views
    config:
      event_time: page_view_start
```

We run the `sessions` model for October 1, 2024, and then again for October 2.
It produces the following queries:

* Model definition
* Compiled (Oct 1, 2024)
* Compiled (Oct 2, 2024)

The [`event_time`](https://docs.getdbt.com/reference/resource-configs/event-time.md) for the `sessions` model is set to `session_start`, which marks the beginning of a user's session on the website. This setting allows dbt to combine multiple page views (each tracked by their own `page_view_start` timestamps) into a single session. This way, `session_start` differentiates the timing of individual page views from the broader timeframe of the entire user session.

models/sessions.sql

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',
    begin='2020-01-01',
    batch_size='day'
) }}

with page_views as (

    -- this ref will be auto-filtered
    select * from {{ ref('page_views') }}

),

customers as (

    -- this ref won't
    select * from {{ ref('customers') }}

)

select
    page_views.id as session_id,
    page_views.page_view_start as session_start,
    customers.*
from page_views
left join customers
    on page_views.customer_id = customers.id
```

target/compiled/sessions.sql

```sql
with page_views as (

    select * from (
        -- filtered on configured event_time
        select * from "analytics"."page_views"
        where page_view_start >= '2024-10-01 00:00:00'  -- Oct 1
          and page_view_start < '2024-10-02 00:00:00'
    )

),

customers as (

    select * from "analytics"."customers"

),

...
```

target/compiled/sessions.sql

```sql
with page_views as (

    select * from (
        -- filtered on configured event_time
        select * from "analytics"."page_views"
        where page_view_start >= '2024-10-02 00:00:00'  -- Oct 2
          and page_view_start < '2024-10-03 00:00:00'
    )

),

customers as (

    select * from "analytics"."customers"

),

...
```

dbt will instruct the data platform to take the result of each batch query and [insert, update, or replace](#adapter-specific-behavior) the contents of the `analytics.sessions` table for the same day of data.
To perform this operation, dbt will use the most efficient atomic mechanism for "full batch" replacement that is available on each data platform. For details, see [How microbatch works](#how-microbatch-works).

It does not matter whether the table already contains data for that day. Given the same input data, the resulting table is the same no matter how many times a batch is reprocessed.

[![Each batch of sessions filters page\_views to the matching time-bound batch, but doesn't filter sessions, performing a full scan for each batch.](/img/docs/building-a-dbt-project/microbatch/microbatch_filters.png?v=2 "Each batch of sessions filters page_views to the matching time-bound batch, but doesn't filter sessions, performing a full scan for each batch.")](#)Each batch of sessions filters page\_views to the matching time-bound batch, but doesn't filter sessions, performing a full scan for each batch.

#### Relevant configs[​](#relevant-configs "Direct link to Relevant configs")

Several configurations are relevant to microbatch models, and some are required:

| Config | Description | Default | Type | Required |
| --- | --- | --- | --- | --- |
| [`event_time`](https://docs.getdbt.com/reference/resource-configs/event-time.md) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A | Column | Required |
| [`begin`](https://docs.getdbt.com/reference/resource-configs/begin.md) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01'` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | Date | Required |
| [`batch_size`](https://docs.getdbt.com/reference/resource-configs/batch-size.md) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year`. | N/A | String | Required |
| [`lookback`](https://docs.getdbt.com/reference/resource-configs/lookback.md) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | Integer | Optional |
| [`concurrent_batches`](https://docs.getdbt.com/reference/resource-properties/concurrent_batches.md) | Overrides dbt's auto-detection for running batches concurrently (at the same time). Read more about [configuring concurrent batches](https://docs.getdbt.com/docs/build/parallel-batch-execution.md#configure-concurrent_batches). Setting to `true` runs batches concurrently (in parallel); `false` runs batches sequentially (one after the other). | `None` | Boolean | Optional |

[![The event\_time column configures the real-world time of this record](/img/docs/building-a-dbt-project/microbatch/event_time.png?v=2 "The event_time column configures the real-world time of this record")](#)The event\_time column configures the real-world time of this record

##### Required configs for specific adapters[​](#required-configs-for-specific-adapters "Direct link to Required configs for specific adapters")

Some adapters require additional configurations for the microbatch strategy. This is because each adapter implements the microbatch strategy differently.

The following table lists the required configurations for the specific adapters, in addition to the standard microbatch configs:

| Adapter | `unique_key` config | `partition_by` config |
| --- | --- | --- |
| [`dbt-postgres`](https://docs.getdbt.com/reference/resource-configs/postgres-configs.md#incremental-materialization-strategies) | ✅ Required | N/A |
| [`dbt-spark`](https://docs.getdbt.com/reference/resource-configs/spark-configs.md#incremental-models) | N/A | ✅ Required |
| [`dbt-bigquery`](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md#merge-behavior-incremental-models) | N/A | ✅ Required |
For example, if you're using `dbt-postgres`, configure `unique_key` as follows:

models/sessions.sql

```sql
-- unique_key is required for dbt-postgres
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    unique_key='sales_id',
    event_time='transaction_date',
    begin='2023-01-01',
    batch_size='day'
) }}

select
    sales_id,
    transaction_date,
    customer_id,
    product_id,
    total_amount
from {{ source('sales', 'transactions') }}
```

In this example, `unique_key` is required because `dbt-postgres` microbatch uses the `merge` strategy, which needs a `unique_key` to identify which rows dbt should merge in the data warehouse. Without a `unique_key`, dbt can't match rows between the incoming batch and the existing table.

##### Full refresh[​](#full-refresh "Direct link to Full refresh")

As a best practice, we recommend [configuring `full_refresh: false`](https://docs.getdbt.com/reference/resource-configs/full_refresh.md) on microbatch models so that they ignore invocations with the `--full-refresh` flag. Note that running `dbt run --full-refresh` on a microbatch model by itself won't reset or reload data unless you have a `begin` datetime config for the model. If you need to reprocess historical data, we recommend using a targeted backfill with `--event-time-start` and `--event-time-end`. You must configure both for the full refresh to run successfully.

```bash
dbt run --full-refresh --event-time-start "2024-01-01" --event-time-end "2024-02-01"
```

#### Usage[​](#usage "Direct link to Usage")

**You must write your model query to process (read and return) exactly one "batch" of data**.
This is a simplifying assumption and a powerful one:

* You don’t need to think about `is_incremental` filtering
* You don't need to pick among DML strategies (upserting/merging/replacing)
* You can preview your model and see the exact records for a given batch that will appear when that batch is processed and written to the table

When you run a microbatch model, dbt will evaluate which batches need to be loaded, break them up into a SQL query per batch, and load each one independently. dbt will automatically filter upstream inputs (`source` or `ref`) that define `event_time`, based on the `lookback` and `batch_size` configs for this model.

Note that dbt doesn't know the minimum `event_time` in your data — it only uses the configs you provide (like `begin` and `lookback`) to decide which batches to run. If you want to process data from the actual start of your dataset, you *must* explicitly define it using the `begin` config or the `--event-time-start` flag.

During standard incremental runs, dbt will process batches according to the current timestamp and the configured `lookback`, with one query per batch.

[![Configure a lookback to reprocess additional batches during standard incremental runs](/img/docs/building-a-dbt-project/microbatch/microbatch_lookback.png?v=2 "Configure a lookback to reprocess additional batches during standard incremental runs")](#)Configure a lookback to reprocess additional batches during standard incremental runs

###### Opting out of auto-filtering[​](#opting-out-of-auto-filtering "Direct link to Opting out of auto-filtering")

If there's an upstream model that configures `event_time`, but you *don't* want the reference to it to be filtered, you can specify `ref('upstream_model').render()` to opt out of auto-filtering. This isn't generally recommended — most models that configure `event_time` are fairly large, and if you don't filter the reference, each batch performs a full scan of this input table.
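As a sketch of the difference (the model name `upstream_model` is a placeholder for illustration):

```sql
-- Auto-filtered: dbt wraps this reference in a per-batch event_time filter
select * from {{ ref('upstream_model') }}

-- Opted out: .render() resolves the reference without the batch filter,
-- so every batch scans the entire upstream table
select * from {{ ref('upstream_model').render() }}
```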
#### Backfills[​](#backfills "Direct link to Backfills")

Whether to fix erroneous source data or retroactively apply a change in business logic, you may need to reprocess a large amount of historical data. Backfilling a microbatch model is as simple as selecting it to run or build and specifying a "start" and "end" for `event_time`. Note that `--event-time-start` and `--event-time-end` are mutually necessary, meaning that if you specify one, you must specify the other. As always, dbt will process the batches between the start and end as independent queries.

```bash
dbt run --event-time-start "2024-09-01" --event-time-end "2024-09-04"
```

[![Specify event-time start and end flags to backfill a specific range of batches](/img/docs/building-a-dbt-project/microbatch/microbatch_backfill.png?v=2 "Specify event-time start and end flags to backfill a specific range of batches")](#)Specify event-time start and end flags to backfill a specific range of batches

#### Retry[​](#retry "Direct link to Retry")

If one or more of your batches fail, you can use `dbt retry` to reprocess *only* the failed batches.

![Partial retry](https://github.com/user-attachments/assets/f94c4797-dcc7-4875-9623-639f70c97b8f)

#### Timezones[​](#timezones "Direct link to Timezones")

For now, dbt assumes that all values supplied are in UTC:

* `event_time`
* `begin`
* `--event-time-start`
* `--event-time-end`

While we may consider adding support for custom time zones in the future, we also believe that defining these values in UTC makes everyone's lives easier.

#### How microbatch compares to other incremental strategies[​](#how-microbatch-compares-to-other-incremental-strategies "Direct link to How microbatch compares to other incremental strategies")

As data warehouses roll out new operations for concurrently replacing/upserting data partitions, we may find that the new operation for the data warehouse is more efficient than what the adapter uses for microbatch.
In such instances, we reserve the right to update the default operation for microbatch, so long as it works as intended/documented for models that fit the microbatch paradigm.

Most incremental models rely on the end user (you) to explicitly tell dbt what "new" means, in the context of each model, by writing a filter in an `{% if is_incremental() %}` conditional block. You are responsible for crafting this SQL in a way that queries [`{{ this }}`](https://docs.getdbt.com/reference/dbt-jinja-functions/this.md) to check when the most recent record was last loaded, with an optional look-back window for late-arriving records.

Other incremental strategies will control *how* the data is being added into the table — whether append-only `insert`, `delete` + `insert`, `merge`, `insert overwrite`, etc. — but they all have this in common. As an example:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='date_day'
) }}

select * from {{ ref('stg_events') }}

{% if is_incremental() %}
-- this filter will only be applied on an incremental run
-- add a lookback window of 3 days to account for late-arriving records
where date_day >= (select {{ dbt.dateadd("day", -3, "max(date_day)") }} from {{ this }})
{% endif %}
```

For this incremental model:

* "New" records are those with a `date_day` greater than the maximum `date_day` that has previously been loaded
* The lookback window is 3 days
* When there are new records for a given `date_day`, the existing data for `date_day` is deleted and the new data is inserted

Let’s take our same example from before, and instead use the new `microbatch` incremental strategy:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at',
    batch_size='day',
    lookback=3,
    begin='2020-01-01',
    full_refresh=false
) }}

select * from {{ ref('stg_events') }} -- this ref will be auto-filtered
```

Where you’ve also set an `event_time` for
the model’s direct parents - in this case, `stg_events`:

models/staging/stg\_events.yml

```yaml
models:
  - name: stg_events
    config:
      event_time: my_time_field
```

And that’s it! When you run the model, each batch templates a separate query. For example, if you were running the model on October 1, dbt would template separate queries for each day between September 28 and October 1, inclusive — four batches in total. The query for `2024-10-01` would look like:

target/compiled/staging/stg\_events.sql

```sql
select * from (
    select * from "analytics"."stg_events"
    where my_time_field >= '2024-10-01 00:00:00'
      and my_time_field < '2024-10-02 00:00:00'
)
```

Based on your data platform, dbt will choose the most efficient atomic mechanism to insert, update, or replace these four batches (`2024-09-28`, `2024-09-29`, `2024-09-30`, and `2024-10-01`) in the existing table.

#### Was this page helpful?

YesNo

[Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues)

This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply.

---

### About model governance

dbt supports model governance to help you control who can access models, what data they contain, how they change over time, and how they're referenced across projects. dbt supports model governance in dbt Core and the dbt platform, with some differences in the features available across environments/plans.

* Use model governance to define model structure and visibility in dbt Core and the dbt platform.
* dbt builds on this with features like [cross-project ref](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md) that enable collaboration at scale across multiple projects, powered by its metadata service and [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md). Available in dbt Enterprise or Enterprise+ plans.
All of the following features are available in dbt Core and the dbt platform, *except* project dependencies, which is only available to [dbt Enterprise-tier plans](https://www.getdbt.com/pricing).

* [**Model access**](https://docs.getdbt.com/docs/mesh/govern/model-access.md) — Mark models as "public" or "private" to distinguish between mature data products and implementation details — and to control who can `ref` each.
* [**Model contracts**](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md) — Guarantee the shape of a model (column names, data types, constraints) before it builds, to prevent surprises for downstream data consumers.
* [**Model versions**](https://docs.getdbt.com/docs/mesh/govern/model-versions.md) — When a breaking change is unavoidable, provide a smoother upgrade pathway and deprecation window for downstream data consumers.
* [**Model namespaces**](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) — Organize models into [groups](https://docs.getdbt.com/docs/build/groups.md) and [packages](https://docs.getdbt.com/docs/build/packages.md) to delineate ownership boundaries. Models in different packages can share the same name, and the `ref` function can take the project/package namespace as its first argument.
* [**Project dependencies**](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md) — Resolve references to public models in other projects ("cross-project ref") using an always-on stateful metadata service, instead of importing all models from those projects as packages. Each project serves data products (public model references) while managing its own implementation details, enabling an [enterprise data mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md).
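The two-argument form of `ref` mentioned in the model namespaces bullet looks like this (the project and model names are placeholders for illustration):

```sql
-- The first argument is the project or package namespace;
-- 'finance' and 'fct_orders' are hypothetical names.
select * from {{ ref('finance', 'fct_orders') }}
```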
###### Considerations[​](#considerations "Direct link to Considerations")

There are some considerations to keep in mind when using model governance features:

* Model governance features like model access, contracts, and versions strengthen trust and stability in your dbt project. Because they add structure, they can make rollbacks harder (for example, removing model access) and increase maintenance if adopted too early. Before adding governance features, consider whether your dbt project is ready to benefit from them. Introducing governance while models are still changing can complicate future changes.
* Governance features are model-specific. They don't apply to other resource types, including snapshots, seeds, or sources. This is because these objects can change structure over time (for example, snapshots capture evolving historical data) and aren't suited to guarantees like contracts, access, or versioning.

---

### About profiles.yml

If you're using dbt from the command line, you need a `profiles.yml` file that contains the connection details for your data platform.

dbt platform accounts

dbt platform projects don't require a `profiles.yml` file unless you're developing from your local machine instead of the cloud-based UI.
#### About profiles.yml[​](#about-profilesyml "Direct link to About profiles.yml")

The `profiles.yml` file stores database connection credentials and configuration for dbt projects, including:

* **Connection details** — Account identifiers, hosts, ports, and authentication credentials.
* **Target definitions** — Define different environments (dev, staging, prod) within a single profile.
* **Default target** — Set which environment to use by default.
* **Execution parameters** — Thread count, timeouts, and retry settings.
* **Credential separation** — Keep sensitive information out of version control.

The `profile` field in [`dbt_project.yml`](https://docs.getdbt.com/reference/dbt_project.yml.md) references a profile name defined in `profiles.yml`.

#### Location of profiles.yml[​](#location-of-profilesyml "Direct link to Location of profiles.yml")

Only one `profiles.yml` file is required, and it can manage multiple projects and connections.

* dbt Fusion
* dbt Core

Fusion searches for the parent directory of `profiles.yml` in the following order and uses the first location it finds:

1. `--profiles-dir` flag — Override for CI/CD or testing.
2. Project root directory — Project-specific credentials.
3. `~/.dbt/` directory (recommended location) — Shared across all projects.

dbt Core searches for the parent directory of `profiles.yml` in the following order and uses the first location it finds:

1. `--profiles-dir` flag
2. `DBT_PROFILES_DIR` environment variable
3. Current working directory
4. `~/.dbt/` directory (recommended location)

Note: dbt Core supports using the `DBT_PROFILES_DIR` environment variable or a `profiles.yml` file in the current working directory. These options aren't currently supported in Fusion.

`~/.dbt/profiles.yml` is the recommended location for the following reasons:

* **Security** — Keeps credentials out of project directories and version control.
* **Reusability** — A single file for all dbt projects on the machine.
* **Separation** — Connection details don't travel with project code.

###### When should I use project root?[​](#when-should-i-use-project-root "Direct link to When should I use project root?")

Place your `profiles.yml` file in the project root directory for:

* Self-contained demo or tutorial projects.
* Docker containers with baked-in credentials.
* CI/CD pipelines with environment-specific configs.

#### Create and configure the `profiles.yml` file[​](#create-and-configure-the-profilesyml-file "Direct link to create-and-configure-the-profilesyml-file")

The easiest way to create and configure a `profiles.yml` file is to execute `dbt init` after you've installed dbt on your machine. This takes you through the process of configuring an adapter and places the file into the recommended `~/.dbt/` location. If your project has an existing `profiles.yml` file, running `dbt init` will prompt you to amend or overwrite it. If you select the existing adapter for configuration, dbt will automatically populate the existing values.

You can also manually create the file and add it to the proper location. To configure an adapter manually, copy and paste the fields from the adapter setup instructions for [dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/about-dbt-connections.md) or [Fusion](https://docs.getdbt.com/docs/local/profiles.yml.md) along with the appropriate values for each.

##### Example configuration[​](#example-configuration "Direct link to Example configuration")

To set up your profile, copy the correct sample profile for your warehouse into your `profiles.yml` file and update the details as follows:

* Profile name: Replace the name of the profile with a sensible name – it’s often a good idea to use the name of your organization. Make sure that this is the same name as the `profile` indicated in your `dbt_project.yml` file.
* `target`: This is the default target your dbt project will use. It must be one of the targets you define in your profile.
Commonly it is set to `dev`.
* Populating your `outputs`:
  * `type`: The type of data warehouse you are connecting to.
  * Warehouse credentials: Get these from your database administrator if you don’t already have them. Remember that user credentials are very sensitive information that should not be shared. May include fields like `account`, `username`, and `password`.
  * `schema`: The default schema that dbt will build objects in.
  * `threads`: The number of threads the dbt project will run on.

The following example highlights the format of the `profiles.yml` file. Note that many of the configs are adapter-specific and their syntax varies.

\~/.dbt/profiles.yml

```yml
my_project_profile:        # Profile name (matches dbt_project.yml)
  target: dev              # Default target to use
  outputs:
    dev:                   # Development environment
      type: adapter_type   # Required: snowflake, bigquery, databricks, redshift, postgres, etc.

      # Connection identifiers (placeholder examples; see adapter-specific pages for supported configs)
      account: abc123
      database: docs_team
      schema: dev_schema

      # Authentication (adapter-specific)
      auth_method: username_password
      username: username
      password_credentials: password

      # Execution settings (common across adapters)
      threads: 4           # Number of parallel threads

# Multiple profiles (for multiple projects)
my_second_project_profile:
  target: dev
  outputs:
    dev:
      type: snowflake      # Example adapter
      account: account
      user: user
      password: password
      database: database
      schema: schema
      warehouse: warehouse
      threads: 4
```

##### Environment variables[​](#environment-variables "Direct link to Environment variables")

Use environment variables to keep sensitive credentials out of your `profiles.yml` file. Check out the [env\_var](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md) reference for more information.
Example:

\~/.dbt/profiles.yml

```yml
my_profile:
  target: dev
  outputs:
    dev:
      type: ADAPTER_NAME
      account: "{{ env_var('ADAPTER_ACCOUNT') }}"
      user: "{{ env_var('ADAPTER_USER') }}"
      password: "{{ env_var('ADAPTER_PASSWORD') }}"
      database: "{{ env_var('ADAPTER_DATABASE') }}"
      schema: "{{ env_var('ADAPTER_SCHEMA') }}"
      warehouse: "{{ env_var('ADAPTER_WAREHOUSE') }}"
      role: "{{ env_var('ADAPTER_ROLE') }}"
      threads: 4
```

#### User config[​](#user-config "Direct link to User config")

You can set default values of global configs for all projects that you run using your local machine. Refer to [About global configs](https://docs.getdbt.com/reference/global-configs/about-global-configs.md) for details.

#### Understanding targets in profiles[​](#understanding-targets-in-profiles "Direct link to Understanding targets in profiles")

dbt supports multiple targets within one profile to encourage the use of separate development and production environments, as discussed in [dbt environments](https://docs.getdbt.com/docs/local/dbt-core-environments.md). A typical profile for an analyst using dbt locally will have a target named `dev`, and have this set as the default.

You may also have a `prod` target within your profile, which creates the objects in your production schema. However, since it's often desirable to perform production runs on a schedule, we recommend deploying your dbt project to a separate machine other than your local machine. Most dbt users only have a `dev` target in their profile on their local machine.

If you do have multiple targets in your profile, and want to use a target other than the default, you can do this using the `--target` flag when running a dbt command.
For example, to run against your `prod` target instead of the default `dev` target:

```bash
dbt run --target prod
```

You can use the `--target` flag with any dbt command, such as:

```bash
dbt build --target prod
dbt test --target dev
dbt compile --target qa
```

##### Overriding profiles and targets[​](#overriding-profiles-and-targets "Direct link to Overriding profiles and targets")

When running dbt commands, you can specify which profile and target to use from the CLI using the `--profile` and `--target` [flags](https://docs.getdbt.com/reference/global-configs/about-global-configs.md#available-flags). These flags override what’s defined in your `dbt_project.yml` as long as the specified profile and target are already defined in your `profiles.yml` file.

To run your dbt project with a different profile or target than the default, use the following CLI flags:

* `--profile` flag — Overrides the profile set in `dbt_project.yml` by pointing to another profile defined in `profiles.yml`.
* `--target` flag — Specifies the target within that profile to use (as defined in `profiles.yml`).

These flags help when you're working with multiple profiles and targets and want to override defaults without changing your files.

```bash
dbt run --profile my-profile-name --target dev
```

In this example, the `dbt run` command will use the `my-profile-name` profile and the `dev` target.

#### Understanding warehouse credentials[​](#understanding-warehouse-credentials "Direct link to Understanding warehouse credentials")

We recommend that each dbt user has their own set of database credentials, including a separate user for production runs of dbt – this helps debug rogue queries, simplifies ownership of schemas, and improves security. To ensure the user credentials you use in your target allow dbt to run, you will need to ensure the user has appropriate privileges.
While the exact privileges needed vary between data warehouses, at a minimum your user must be able to:

* Read source data
* Create schemas
* Read system tables

Running dbt without create schema privileges

If your user can't be granted the privilege to create schemas, your dbt runs should instead target an existing schema that your user has permission to create relations within.

#### Understanding target schemas[​](#understanding-target-schemas "Direct link to Understanding target schemas")

The target schema represents the default schema that dbt will build objects into, and is often used as the differentiator between separate environments within a warehouse.

Schemas in BigQuery

dbt uses the term "schema" in a target across all supported warehouses for consistency. Note that in the case of BigQuery, a schema is actually a dataset.

The schema used for production should be named in a way that makes it clear that it is ready for end users to use for analysis – we often name this `analytics`. In development, a pattern we’ve found to work well is to name the schema in your `dev` target `dbt_<username>`. Suffixing your name to the schema enables multiple users to develop in dbt: each user has a separate schema for development, so users won't build over the top of each other, and object ownership and permissions stay consistent across an entire schema.

Note that there’s no need to create your target schema beforehand – dbt will check if the schema already exists when it runs, and create it if it doesn’t.

While the target schema represents the default schema that dbt will use, it may make sense to split your models into separate schemas, which can be done by using [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md).

#### Understanding threads[​](#understanding-threads "Direct link to Understanding threads")

When dbt runs, it creates a directed acyclic graph (DAG) of links between models.
The number of threads represents the maximum number of paths through the graph dbt may work on at once – increasing the number of threads can minimize the run time of your project. The default value for threads in user profiles is 4.

For more information, check out [using threads](https://docs.getdbt.com/docs/running-a-dbt-project/using-threads.md).

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Install dbt](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Connection profiles](https://docs.getdbt.com/docs/local/profiles.yml.md)

---

### About state-aware orchestration

[Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Every time a job runs, state-aware orchestration automatically determines which models to build by detecting changes in code or data.
important

The dbt Fusion engine is currently available for installation in:

* [Local command line interface (CLI) tools](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")
* [VS Code and Cursor with the dbt extension](https://docs.getdbt.com/docs/install-dbt-extension.md) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")
* [dbt platform environments](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine) [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

Join the conversation in our Community Slack channel [`#dbt-fusion-engine`](https://getdbt.slack.com/archives/C088YCAB6GH). Read the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements) for the latest updates.

State-aware orchestration saves you compute costs and reduces runtime: when a job runs, it checks for new records and only builds the models that will change.

[![Fusion powered state-aware orchestration](/img/docs/deploy/sao.gif?v=2 "Fusion powered state-aware orchestration")](#)Fusion powered state-aware orchestration

We built dbt's state-aware orchestration on these four core principles:

* **Real-time shared state:** All jobs write to a real-time shared model-level state, allowing dbt to rebuild only changed models regardless of which jobs the model is built in.
* **Model-level queueing:** Jobs queue up at the model level so you can avoid any 'collisions' and prevent rebuilding models that were just updated by another job.
* **State-aware and state-agnostic support:** You can build jobs dynamically (state-aware) or explicitly (state-agnostic).
Both approaches update shared state, so everything is kept in sync.
* **Sensible defaults:** State-aware orchestration works out of the box (natively), with an optional configuration setting for more advanced controls. For more information, refer to [state-aware advanced configurations](https://docs.getdbt.com/docs/deploy/state-aware-setup.md#advanced-configurations).

note

State-aware orchestration does not depend on [static analysis](https://docs.getdbt.com/docs/fusion/new-concepts.md#principles-of-static-analysis) and works even when `static_analysis` is disabled.

#### Optimizing builds with state-aware orchestration[​](#optimizing-builds-with-state-aware-orchestration "Direct link to Optimizing builds with state-aware orchestration")

State-aware orchestration uses shared state tracking to determine which models need to be built by detecting changes in code or data every time a job runs. It also supports custom refresh intervals and custom source freshness configurations, so dbt only rebuilds models when they're actually needed. For example, you can configure your project so that dbt skips rebuilding the `dim_wizards` model (and its parents) if they’ve already been refreshed within the last 4 hours, even if the job itself runs more frequently.

Without configuring anything, dbt's state-aware orchestration automatically knows to build your models either when the code has changed or if there’s any new data in a source (or upstream model in the case of [dbt Mesh](https://docs.getdbt.com/docs/mesh/about-mesh.md)).

**Note:** When a model fails a [data test](https://docs.getdbt.com/docs/build/data-tests.md), state-aware orchestration rebuilds it on subsequent runs instead of reusing it from prior state. This ensures dbt reevaluates models with unresolved data quality issues.
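As a sketch of the `dim_wizards` example above, a rebuild interval might be configured with the `build_after` config. The exact field names here are assumptions; confirm them against the model freshness reference:

```yaml
models:
  - name: dim_wizards
    config:
      freshness:
        # Assumed syntax: rebuild at most once every 4 hours,
        # even if the job runs more frequently
        build_after:
          count: 4
          period: hour
```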
##### Handling concurrent jobs[​](#handling-concurrent-jobs "Direct link to Handling concurrent jobs")

If two separate jobs both depend on the same downstream model (for example, `model_ab`) and both detect upstream changes (`updates_on = any`), `model_ab` could run twice — once for each job. However, if `model_ab` was already built and nothing has changed since that build, neither job will rebuild it. Instead, both jobs will reuse the existing version.

Under state-aware orchestration, all jobs read and write from the same shared state and build a model only when either the code or data state has changed. This means that each job individually evaluates whether a model needs rebuilding based on the model’s compiled code and upstream data state.

What happens when jobs overlap:

* If both jobs reach the same model at exactly the same time, one job waits until the other finishes. This prevents collisions in the data warehouse when two jobs try to build the same model at the same time.
* After the first job finishes building the model, the second job still checks whether a rebuild is needed. If there are new data or code changes to incorporate, the second job builds the model again. If there are no changes and building the model would produce the same result, the second job reuses the model.

To prevent a model from being rebuilt too frequently, even when the code or data state has changed, you can reduce build frequency by using the `build_after` config. For information on how to use `build_after`, refer to [Model freshness](https://docs.getdbt.com/reference/resource-configs/freshness.md) and [Advanced configurations](https://docs.getdbt.com/docs/deploy/state-aware-setup.md#advanced-configurations).

##### Handling deleted tables[​](#handling-deleted-tables "Direct link to Handling deleted tables")

State-aware orchestration detects and rebuilds models when their tables are deleted in the warehouse, even if there are no code or data changes.
When a table is deleted in the warehouse:

* dbt raises a warning that the expected table is missing.
* The affected model is queued for rebuild during the current run, even if there are no code or data changes.

This behavior ensures consistency between the dbt state and the actual warehouse state. It also reduces the need to manually clear the cache or disable state-aware orchestration when models are modified outside of dbt.

#### Efficient testing in state-aware orchestration [Private beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#efficient-testing-in-state-aware-orchestration- "Direct link to efficient-testing-in-state-aware-orchestration-")

Private beta feature

State-aware orchestration features in the dbt platform are only available in Fusion, which is in private preview. Contact your account manager to enable Fusion in your account.

Data quality can degrade in two ways:

* New code changes definitions or introduces edge cases.
* New data, like duplicates or unexpected values, invalidates downstream metrics.

Running dbt’s out-of-the-box [data tests](https://docs.getdbt.com/docs/build/data-tests.md) (`unique`, `not_null`, `accepted_values`, `relationships`) on every build helps catch data errors before they impact business decisions. Catching these errors often requires having multiple tests on every model and running tests even when not necessary. If nothing relevant has changed, repeated test executions don’t improve coverage and only increase cost.

With Fusion, dbt gains an understanding of the SQL code based on the logical plan for the compiled code. dbt can then determine when a test must run again, or when a prior upstream test result can be reused. Efficient testing in state-aware orchestration reduces warehouse costs by avoiding redundant data tests and combining multiple tests into one run.
This feature includes two optimizations:

* **Test reuse** — Tests are reused in cases where no logic in the code and no new data could have changed the test's outcome.
* **Test aggregation** — When there are multiple tests on a model, dbt combines them into a single query against the warehouse, rather than running a separate query for each test.

Currently, Efficient testing is only available in deploy jobs, not in continuous integration (CI) or merge jobs.

##### Supported data tests[​](#supported-data-tests "Direct link to Supported data tests")

The following tests can be reused when Efficient testing is enabled:

* [`unique`](https://docs.getdbt.com/reference/resource-properties/data-tests.md#unique)
* [`not_null`](https://docs.getdbt.com/reference/resource-properties/data-tests.md#not_null)
* [`accepted_values`](https://docs.getdbt.com/reference/resource-properties/data-tests.md#accepted_values)

##### Enabling Efficient testing[​](#enabling-efficient-testing "Direct link to Enabling Efficient testing")

Before enabling Efficient testing, make sure you have configured [`static_analysis`](https://docs.getdbt.com/docs/fusion/new-concepts.md#configuring-static_analysis).

To enable Efficient testing:

1. From the main menu, go to **Orchestration** > **Jobs**.
2. Select your deploy job. Go to your job settings and click **Edit**.
3. Under **Enable Fusion cost optimization features**, expand **More options**.
4. Select **Efficient testing**. This feature is disabled by default.
5. Click **Save**.
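For reference, the supported tests listed above are declared like any other dbt data tests in a model's properties file. A minimal sketch (the model, column names, and accepted values are hypothetical):

```yaml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: status
        data_tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed']
```

With Efficient testing enabled, tests configured this way are candidates for reuse or aggregation on subsequent runs, provided they don't use the custom config options listed under Limitations.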
##### Example[​](#example "Direct link to Example")

In the following query, you're joining an `orders` and a `customers` table:

```sql
with

orders as (
    select * from {{ ref('orders') }}
),

customers as (
    select * from {{ ref('customers') }}
),

joined as (
    select
        customers.customer_id as customer_id,
        orders.order_id as order_id
    from customers
    left join orders
        on orders.customer_id = customers.customer_id
)

select * from joined
```

* `not_null` test: A `left join` can introduce null values for customers without orders. Even if upstream tests verified `not_null(order_id)` in `orders`, the join can create null values downstream. dbt must always run a `not_null` test on `order_id` in this joined result.
* `unique` test: If `orders.order_id` and `customers.customer_id` are unique upstream, uniqueness of `order_id` is preserved and the upstream result can be reused.

##### Limitations[​](#limitations "Direct link to Limitations")

The following are some considerations when using Efficient testing in state-aware orchestration:

* **Aggregated tests do not support custom configs.** Tests that include the following [custom config options](https://docs.getdbt.com/reference/data-test-configs.md) will run individually rather than as part of the aggregated batch:

  ```yaml
  config:
    fail_calc:
    limit:
    severity: error | warn
    error_if:
    warn_if:
    store_failures: true | false
    where:
  ```

* **Efficient testing is available only in deploy jobs.** CI and merge jobs currently do not have the option to enable this feature.

#### Related FAQs[​](#related-faqs "Direct link to Related FAQs")

How is state-aware orchestration different from using selectors in dbt Core?
In dbt Core, running with the selectors `state:modified+` and `source_status:fresher+` builds models that either:

* Have changed since the prior run (`state:modified+`)
* Have upstream sources that are fresher than in the prior run (`source_status:fresher+`)

Instead of relying only on these selectors and prior-run artifacts, state-aware orchestration decides whether to rebuild a model based on:

* Compiled SQL diffs that ignore non-meaningful changes like whitespace and comments
* Upstream data changes at runtime and model-level freshness settings
* Shared state across jobs

In other words, dbt Core's selectors decide what to build *only for a single run in a single job*, while state-aware orchestration with Fusion maintains a *shared, real-time model state across every job in the environment* and uses that state to determine whether a model's code or upstream data have actually changed before rebuilding. This ensures dbt only rebuilds models when something has changed, no matter which job runs them.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [State-aware orchestration configuration](https://docs.getdbt.com/docs/deploy/state-aware-setup.md)
* [Artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md)
* [Continuous integration (CI) jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md)
* [`freshness`](https://docs.getdbt.com/reference/resource-configs/freshness.md)

---

### About the `--empty` flag

note

The `--empty` flag is not currently available for Python models. If the flag is used with a Python model, it will be ignored.
During dbt development, you might want to validate that your models are semantically correct without the time and cost of building the entire model in the data warehouse. The [`run`](https://docs.getdbt.com/reference/commands/run.md) and [`build`](https://docs.getdbt.com/reference/commands/build.md) commands support the `--empty` flag for schema-only dry runs. The `--empty` flag limits the refs and sources to zero rows. dbt still executes the model SQL against the target data warehouse but avoids expensive reads of input data. This validates dependencies and ensures your models will build properly.

##### Examples[​](#examples "Direct link to Examples")

Run all models in a project while building only the schemas in your development environment:

```text
dbt run --empty
```

Run a specific model:

```text
dbt run --select path/to/your_model --empty
```

dbt will build and execute the SQL, resulting in an empty schema in the data warehouse.

---

### About the `--sample` flag

note

The `--sample` flag is not currently available for Python models. If the flag is used with a Python model, it will be ignored. Seeds will be created normally, but are sampled when referenced by downstream nodes.

Large data sets can drastically increase build times and reduce how quickly dbt developers can build and test new code. The dbt `--sample` flag can help reduce build times and warehouse spend by running dbt in sample mode. Sample mode addresses cases where you don't need to build the entire model during the development or CI cycle but want enough data to validate the outputs.
Sample mode takes the [`--empty` flag's](https://docs.getdbt.com/docs/build/empty-flag.md) validation of semantic results a step further by including a sample of data from the model(s) in your development schema. It won't solve every scenario; for example, there are cases where not all joins will be populated. However, it's a viable approach for faster building, testing, and validation in many workflows. The `--sample` flag will become more robust over time, but for now it only supports time-based sampling.

#### Using the `--sample` flag[​](#using-the---sample-flag "Direct link to using-the---sample-flag")

The `--sample` flag is available for the [`run`](https://docs.getdbt.com/reference/commands/run.md) and [`build`](https://docs.getdbt.com/reference/commands/build.md) commands. When used, sample mode generates filtered refs and sources. Because sampling is time-based, if you have refs like `{{ ref('some_model') }}` being sampled, you need to set [`event_time`](https://docs.getdbt.com/reference/resource-configs/event-time.md) for `some_model` to the field that will be used as the timestamp.

There are two time-based sample specifications supported for sample mode:

* **Relative time specs:** Filter sampled data from the time the command is run back by a specified integer and granularity. Supported granularities are:
  * Hours
  * Days
  * Months
  * Years
* **Static time specs:** Filter your data between a defined start and end period using a date and/or timestamp.

##### Examples[​](#examples "Direct link to Examples")

Let's say you want to run your `stg_customers` model and build the table in your development schema with a relative time spec sample size of three days.
Your command in the IDE would look something like this:

```text
dbt run --select path/to/stg_customers --sample="3 days"
```

If you have an even larger model, for example `stg_orders`, you can set sample mode to hours:

```text
dbt run --select path/to/stg_orders --sample="6 hours"
```

Next, let's say you want to validate data for your entire business from a sample further in the past: your busiest week in July, from the first until closing time on the eighth. You can run the following:

```text
dbt run --sample="{'start': '2024-07-01', 'end': '2024-07-08 18:00:00'}"
```

To prevent a `ref` from being sampled, append `.render()` to it:

```sql
with source as (
    select * from {{ ref('stg_customers').render() }}
),
...
```

dbt will then execute the model SQL against the target data warehouse and build the tables with data from the sample sizes.

---

### About the dbt Snowflake Native App [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

The dbt Snowflake Native App — powered by the Snowflake Native App Framework and Snowpark Container Services — extends your dbt experience into the Snowflake user interface.
You'll be able to access these three experiences with your Snowflake login:

* **Catalog** — An embedded version of [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md)
* **Copilot** — A dbt-assisted chatbot, powered by [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), OpenAI, and Snowflake Cortex
* **Orchestration observability** — A view into the [job run history](https://docs.getdbt.com/docs/deploy/run-visibility.md) and sample code to create Snowflake tasks that trigger [deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md)

These experiences extend what's been built with dbt to users who have traditionally worked downstream from the dbt project, such as BI analysts and technical stakeholders. For installation instructions, refer to [Set up the dbt Snowflake Native App](https://docs.getdbt.com/docs/cloud-integrations/set-up-snowflake-native-app.md).

#### Architecture[​](#architecture "Direct link to Architecture")

There are three tools connected to the operation of the dbt Snowflake Native App:

| Tool | Description |
| --- | --- |
| Consumer's Snowflake account | Where the Native App is installed, powered by Snowpark Container Services. The Native App makes calls to the dbt APIs and Datadog APIs (for logging) using [Snowflake's external network access](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview). To power the **Copilot** chatbot, the Semantic Layer accesses the Cortex LLM to execute queries and generate text based on the prompt. This is configured when the user sets up the Semantic Layer environment. |
| dbt product Snowflake account | Where the Native App application package is hosted and then distributed into the consumer account. The consumer's event table is shared to this account for application monitoring and logging. |
| Consumer's dbt account | The Native App interacts with the dbt APIs for metadata and for processing Semantic Layer queries to power the Native App experiences. The dbt account also calls the consumer Snowflake account to use the warehouse to execute dbt queries for orchestration and the Cortex LLM Arctic to power the **Copilot** chatbot. |

The following diagram illustrates the architecture:

[![Architecture of dbt and Snowflake integration](/img/docs/cloud-integrations/architecture-dbt-snowflake-native-app.png?v=2 "Architecture of dbt and Snowflake integration")](#)Architecture of dbt and Snowflake integration

#### Access[​](#access "Direct link to Access")

Log in to the dbt Snowflake Native App using your regular Snowflake login authentication method. The Snowflake user must have a corresponding dbt user with a *[developer license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md)*. Previously, this wasn't a requirement during the feature [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md#dbt-cloud). If your Snowflake Native App is already configured, you will be prompted to [link credentials](#link-credentials) the next time you access dbt from the app. This is a one-time process.

#### Procurement[​](#procurement "Direct link to Procurement")

The dbt Snowflake Native App is available on the [Snowflake Marketplace](https://app.snowflake.com/marketplace/listing/GZTYZSRT2UA/dbt-labs-dbt). Purchasing it includes access to the Native App and a dbt account on the Enterprise-tier plan. Existing dbt Enterprise customers can also access it; contact your Enterprise account manager. For more information, please [contact us](mailto:sales_snowflake_marketplace@dbtlabs.com).

#### Support[​](#support "Direct link to Support")

If you have any questions about the dbt Snowflake Native App, you may [contact our Support team](mailto:dbt-snowflake-marketplace@dbtlabs.com) for help.
Please provide information about your installation of the Native App, including your dbt account ID and Snowflake account identifier.

#### Limitations[​](#limitations "Direct link to Limitations")

* The Native App does not support dbt accounts with [IP Restrictions](https://docs.getdbt.com/docs/cloud/secure/ip-restrictions.md) enabled.

#### Link credentials[​](#link-credentials "Direct link to Link credentials")

Every Snowflake user accessing the Native App must also have dbt account access with a [developer or read-only license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md). Feature access depends on their dbt license type.

For existing accounts with the Snowflake Native App configured, users will be prompted to authenticate with dbt the next time they log in. This is a one-time process if they have a user in dbt. If they don't have a dbt user, they will be denied access, and an admin will need to [create one](https://docs.getdbt.com/docs/cloud/manage-access/invite-users.md).

1. When you attempt to access the dbt platform from the Snowflake Native App, you will be prompted to link your account.
2. Click **Link account** and you will be prompted for your dbt credentials.

---

### About the dbt VS Code extension [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

The dbt VS Code extension brings a hyper-fast, intelligent, and cost-efficient dbt development experience to VS Code.
This is the only way to enjoy the full power of the dbt Fusion engine while developing locally.

* *Save time and resources* with near-instant parsing, live error detection, powerful IntelliSense capabilities, and more.
* *Stay in flow* with a seamless, end-to-end dbt development experience designed from scratch for local dbt development.

The dbt VS Code extension is available in the [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=dbtLabsInc.dbt).

*Note: this is a public preview release. Behavior may change ahead of the broader generally available (GA) release.*

The dbt VS Code extension is only compatible with the dbt Fusion engine; it is not compatible with dbt Core.

Try out the Fusion quickstart guide

Check out the [Fusion quickstart guide](https://docs.getdbt.com/guides/fusion.md?step=1) to see the dbt VS Code extension in action.

#### Navigating the dbt extension[​](#navigating-the-dbt-extension "Direct link to Navigating the dbt extension")

Once the dbt VS Code extension is installed, several visual enhancements are added to your IDE to help you navigate its features and functionality.

Check out the following video to see the features and functionality of the dbt VS Code extension: [dbt Fusion + VS Code extension walkthrough](https://app.storylane.io/share/a1rkqx0mbd7a)

##### The dbt extension menu[​](#the-dbt-extension-menu "Direct link to The dbt extension menu")

The dbt logo on the sidebar (or the **dbt Extension** text on the bottom tray) launches the main menu for the extension. This menu contains helpful information and actions you can take:

* **Get started button:** Launches the [Fusion upgrade](https://docs.getdbt.com/docs/install-dbt-extension.md#upgrade-to-fusion) workflow.
* **Extension info:** Information about the extension, Fusion, and your dbt project. Includes configuration options and actions.
* **Help:** Quick links to support, bug submissions, and documentation.
[![dbt VS Code extension welcome screen.](/img/docs/extension/sidebar-menu.png?v=2 "dbt VS Code extension welcome screen.")](#)dbt VS Code extension welcome screen.

##### Caching[​](#caching "Direct link to Caching")

The dbt extension caches important schema information from your data warehouse to improve speed and performance. The cache updates automatically over time, but if recent changes aren't reflected in your project, you can manually refresh the schema information:

1. Click the **dbt logo** on the sidebar to open the menu.
2. Expand the **Extension info** section and locate the **Actions** subsection.
3. Click **Clear Cache** to update.

##### Productivity features[​](#productivity-features "Direct link to Productivity features")

This section has moved

We've moved productivity features to their own page! Check out their [new location](https://docs.getdbt.com/docs/dbt-extension-features.md).

#### Using the extension[​](#using-the-extension "Direct link to Using the extension")

Your dbt environment must be using the dbt Fusion engine in order to use this extension. See [the Fusion documentation](https://docs.getdbt.com/docs/fusion.md) for more on eligibility and upgrading.

Once installed, the dbt extension automatically activates when you open any `.sql` or `.yml` file inside a dbt project directory.

#### Configuration[​](#configuration "Direct link to Configuration")

After installation, you may want to configure the extension to better fit your development workflow:

1. Open the VS Code settings by pressing `Ctrl+,` (Windows/Linux) or `Cmd+,` (Mac).
2. Search for `dbt`. On this page, you can adjust the extension's configuration options to fit your needs.

[![dbt extension settings within the VS Code settings.](/img/docs/extension/dbt-extension-settings.png?v=2 "dbt extension settings within the VS Code settings.")](#)dbt extension settings within the VS Code settings.
#### Known limitations[​](#known-limitations "Direct link to Known limitations")

The following are currently known limitations of the dbt extension:

* **Remote development:** The dbt extension does not yet support remote development sessions over SSH. Support will be added in a future release. For more information on remote development, refer to [Supporting Remote Development and GitHub Codespaces](https://code.visualstudio.com/api/advanced-topics/remote-extensions) and [Visual Studio Code Server](https://code.visualstudio.com/docs/remote/vscode-server).
* **Working with YAML files:** Today, the dbt extension has the following limitations when operating on YAML files; future releases will address them:
  * Go-to-definition is not supported for nodes defined in YAML files (like snapshots).
  * Renaming models and columns will not update references in YAML files.
* **Renaming models:** When you rename a model file, the dbt extension applies edits to update all `ref()` calls that reference the renamed model. Due to limitations of VS Code's Language Server Client, the extension can't auto-save these edited files. As a result, renaming a model file may cause compiler errors in your project. To fix these errors, either manually save each file that the dbt extension edited, or click **File** > **Save All** to save all edited files.
* **Using Cursor's Agent mode:** When using the dbt extension in Cursor, lineage visualization works best in Editor mode and doesn't render in Agent mode. If you're working in Agent mode and need to view lineage, switch to Editor mode to access the full lineage tab functionality.

##### Extension conflicts[​](#extension-conflicts "Direct link to Extension conflicts")

The extension may occasionally conflict with other VS Code extensions that provide similar services (such as code validation). You may need to disable these third-party extensions while working with the dbt extension.
**YAML by Red Hat:** The YAML extension by Red Hat may erroneously flag some keys (such as `static_analysis`) in dbt YAML files as invalid in the IDE.

[![Static analysis erroneously tagged as invalid](/img/docs/extension/false-yaml-error.png?v=2 "Static analysis erroneously tagged as invalid")](#)Static analysis erroneously tagged as invalid

To solve this issue, do one of the following:

* (Recommended) Disable the Red Hat YAML extension while working with the dbt extension.
* Add the following configuration to your VS Code `settings.json` file. Note that this could disable *all* use of the schema store, resulting in unintended consequences:

  ```json
  "yaml.schemas": {
    "Core/dbtschema.json": "data/dbt/models/**/schema.yml",
    "": "data/dbt/dbt_project.yml"
  },
  ```

#### Support[​](#support "Direct link to Support")

dbt platform customers can contact dbt Labs support. You can also get in touch with us by reaching out to your Account Manager directly. For organizations that are not customers of the dbt platform, the best place for questions and discussion is the [dbt Community Slack](https://www.getdbt.com/community/join-the-community). We welcome feedback as we work to continuously improve the extension, and would love to hear from you!

For more information regarding support and acceptable use of the dbt VS Code extension, refer to our [Acceptable Use Policy](https://www.getdbt.com/dbt-assets/vscode-plugin-aup).

#### More information about Fusion[​](#more-information-about-fusion "Direct link to More information about Fusion")

Fusion marks a significant update to dbt. While many of the workflows you've grown accustomed to remain unchanged, there are a lot of new ideas, and a lot of old ones going away.
The following is the full scope of our current release of the Fusion engine, including implementation, installation, deprecations, and limitations:

* [About the dbt Fusion engine](https://docs.getdbt.com/docs/fusion/about-fusion.md)
* [About the dbt extension](https://docs.getdbt.com/docs/about-dbt-extension.md)
* [New concepts in Fusion](https://docs.getdbt.com/docs/fusion/new-concepts.md)
* [Supported features matrix](https://docs.getdbt.com/docs/fusion/supported-features.md)
* [Installing Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)
* [Installing VS Code extension](https://docs.getdbt.com/docs/install-dbt-extension.md)
* [Fusion release track](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine)
* [Quickstart for Fusion](https://docs.getdbt.com/guides/fusion.md?step=1)
* [Upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md)
* [Fusion licensing](http://www.getdbt.com/licenses-faq)

---

### Access Catalog from dbt platform features

Access Catalog from other features and products inside dbt, ensuring a seamless experience navigating between resources and lineage in your project. This page explains how to access Catalog from various dbt features, including the Studio IDE and jobs.

While the primary way to navigate to Catalog is by clicking **Catalog** in the navigation, you can also access it from other dbt features.
##### Studio IDE[​](#studio-ide "Direct link to Studio IDE")

You can enhance your project navigation and editing experience by jumping directly from the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) to Catalog for model, seed, or snapshot files. This workflow offers a seamless transition between the Studio IDE and Catalog, allowing you to quickly move between viewing project metadata and making updates to your models or other resources without switching contexts.

###### Access Catalog from the IDE[​](#access-catalog-from-the-ide "Direct link to Access Catalog from the IDE")

* In your model, seed, or snapshot file, click the **View in Catalog** icon to the right of your file breadcrumb (under the file name tab).
* This opens the model, seed, or snapshot file in a new tab, allowing you to view resources and lineage directly in Catalog.

[![Access dbt Catalog from the IDE by clicking on the 'View in Explorer' icon next to the file breadcrumbs. ](/img/docs/collaborate/dbt-explorer/explorer-from-ide.jpg?v=2 "Access dbt Catalog from the IDE by clicking on the 'View in Explorer' icon next to the file breadcrumbs. ")](#)Access dbt Catalog from the IDE by clicking on the 'View in Explorer' icon next to the file breadcrumbs.

##### Canvas[​](#canvas "Direct link to Canvas")

Seamlessly access Catalog via Canvas to bring your workflow to life with visual editing.

##### Lineage tab in jobs[​](#lineage-tab-in-jobs "Direct link to Lineage tab in jobs")

The **Lineage tab** in dbt jobs displays the lineage associated with the [job run](https://docs.getdbt.com/docs/deploy/jobs.md). Access Catalog directly from this tab to understand the dependencies and relationships of resources in your project.
###### Access Catalog from the lineage tab[​](#access-catalog-from-the-lineage-tab "Direct link to Access Catalog from the lineage tab")

* From a job, select the **Lineage tab**.
* Double-click a node in the lineage to open a new tab and view its metadata directly in Catalog.

[![Access dbt Catalog from the lineage tab by double-clicking on the lineage node.](/img/docs/collaborate/dbt-explorer/explorer-from-lineage.gif?v=2 "Access dbt Catalog from the lineage tab by double-clicking on the lineage node.")](#)Access dbt Catalog from the lineage tab by double-clicking on the lineage node.

##### Model timing tab in jobs [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#model-timing-tab-in-jobs- "Direct link to model-timing-tab-in-jobs-")

The [model timing tab](https://docs.getdbt.com/docs/deploy/run-visibility.md#model-timing) in dbt jobs displays the composition, order, and time taken by each model in a job run. Access Catalog directly from the **model timing tab** to investigate resources, diagnose performance bottlenecks, understand the dependencies and relationships of slow-running models, and potentially make changes to improve their performance.

###### Access Catalog from the model timing tab[​](#access-catalog-from-the-model-timing-tab "Direct link to Access Catalog from the model timing tab")

* From a job, select the **model timing tab**.
* Hover over a resource and click **View in Catalog** to view the resource metadata directly in Catalog.
[![Access dbt Catalog from the model timing tab by hovering over the resource and clicking 'View in Explorer'.](/img/docs/collaborate/dbt-explorer/explorer-from-model-timing.jpg?v=2 "Access dbt Catalog from the model timing tab by hovering over the resource and clicking 'View in Explorer'.")](#)Access dbt Catalog from the model timing tab by hovering over the resource and clicking 'View in Explorer'.

##### dbt Insights [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#dbt-insights- "Direct link to dbt-insights-")

Access Catalog directly from [Insights](https://docs.getdbt.com/docs/explore/access-dbt-insights.md) to view the project lineage and project resources, with access to tables, columns, metrics, dimensions, and more.

To access Catalog from Insights, click the **Catalog** icon in the Query console sidebar menu and search for the resource you're interested in.

[![dbt Insights integrated with dbt Catalog](/img/docs/dbt-insights/insights-explorer.png?v=2 "dbt Insights integrated with dbt Catalog")](#)dbt Insights integrated with dbt Catalog

---

### Access the dbt Insights interface [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Learn how to access Insights, run queries, and view results. Insights provides a rich console experience with editor navigation.
You can expect Insights to let you:

* Write SQL queries, with the option to open multiple tabs
* Get SQL and dbt autocomplete suggestions and syntax highlighting
* Save SQL queries
* View the results of a query and its details using the **Data** or **Details** tabs
* Create a visualization of your query results using the **Chart** tab
* View the history of queries and their statuses (like Success, Error, Pending) using the **Query history** tab
* Use Copilot to generate or edit SQL queries using natural language prompts
* Integrate with [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md), [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), and [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md) for a seamless experience across data exploration, AI-assisted writing, and collaboration

#### Access the dbt Insights interface[​](#access-the-dbt-insights-interface "Direct link to Access the dbt Insights interface")

Before accessing Insights, ensure that the [prerequisites](https://docs.getdbt.com/docs/explore/dbt-insights.md#prerequisites) are met.

1. To access Insights, select the **Insights** option in the navigation sidebar.
2. If your [developer credentials](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#get-started-with-the-cloud-ide) aren't set up, Insights will prompt you to set them up. The ability to query data is subject to warehouse provider permissions according to your developer credentials.
3. Once your credentials are set up, you can write, run, and edit SQL queries in the Insights editor for existing models in your project.
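Once your credentials are set up, a minimal query like the following is a quick way to confirm that your connection and warehouse permissions work (the model name is hypothetical; substitute any model that exists in your project):

```sql
-- Quick smoke test: preview a few rows from an existing model
select * from {{ ref('orders') }} limit 10
```

If this returns rows, you're ready to write more substantial queries against your project.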
#### Run queries[​](#run-queries "Direct link to Run queries") To run queries in Insights, you can use: * Standard SQL * Jinja ([`ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md), [`source`](https://docs.getdbt.com/reference/dbt-jinja-functions/source.md) functions, and other Jinja functions) * Links from `ref` calls in SQL code to the corresponding Catalog page * CTEs and subqueries * Basic aggregations and joins * Semantic Layer queries using Semantic Layer Jinja functions #### Example[​](#example "Direct link to Example") Let's use an example to illustrate how to run queries in Insights: * A [Jaffle Shop](https://github.com/dbt-labs/jaffle-shop) location wants to count unique orders and unique customers to understand whether they can expand their awesome Jaffle Shop business to other parts of the world. * As an analyst assigned to this project, you want to understand yearly trends to help guide expansion decisions. Write the following SQL query to calculate the number of unique customers, cities, and total order revenue:

```sql
with orders as (
    select * from {{ ref('orders') }}
),

customers as (
    select * from {{ ref('customers') }}
)

select
    date_trunc('year', ordered_at) as order_year,
    count(distinct orders.customer_id) as unique_customers,
    count(distinct orders.location_id) as unique_cities,
    to_char(sum(orders.order_total), '999,999,999.00') as total_order_revenue
from orders
join customers on orders.customer_id = customers.customer_id
group by 1
order by 1
```

##### Use dbt Copilot[​](#use-dbt-copilot "Direct link to Use dbt Copilot") To make things easier, [use Copilot](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md#build-queries) to save time and explore other ways to analyze the data. Copilot can help you quickly update the query or generate a new one based on your prompt. 1. Click the **Copilot** icon in the Query console sidebar. 2. In the dropdown menu above the Copilot prompt box, select **Generate SQL**. 3. Enter your prompt in natural language and ask for a yearly breakdown of unique customers and total revenue. 4. Click **↑** to submit your prompt. 5. Copilot responds with: * A summary of the query * An explanation of the logic * The SQL it generated * Options to **Add** or **Replace** the existing query with the generated SQL 6. Review the output and click **Replace** to use the Copilot-generated SQL in your editor. 7. Click **Run** to preview the results. [![dbt Insights with dbt Copilot](/img/docs/dbt-insights/insights-copilot.png?v=2 "dbt Insights with dbt Copilot")](#)dbt Insights with dbt Copilot From here, you can: * Continue building or modifying the query using Copilot. * Explore the [results](#view-results) in the **Data** tab. * [View metadata and query details](#view-details) in the **Details** tab. * [Visualize results](#chart-results) in the **Chart** tab. * Check the [**Query history**](#query-history) for status and past runs. * Use [**Catalog**](#use-dbt-catalog) to explore model lineage and context. 
* To save the query for future reference, click **Save Insight** in the [query console menu](https://docs.getdbt.com/docs/explore/navigate-dbt-insights.md#query-console-menu). Want to turn a query into a model? You can access the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) or [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md) from the [Query console menu](https://docs.getdbt.com/docs/explore/navigate-dbt-insights.md#query-console-menu) to promote your SQL into a reusable dbt model — all within dbt! ##### View results[​](#view-results "Direct link to View results") Using the same example, you can perform some exploratory data analysis by running the query and: * Viewing results in the **Data** tab — View the paginated results of the query. * Sorting results — Click on a column header to sort the results by that column. * Exporting to CSV — On the top right of the table, click the download button to export the dataset. [![dbt Insights Export to CSV](/img/docs/dbt-insights/insights-export-csv.png?v=2 "dbt Insights Export to CSV")](#)dbt Insights Export to CSV ##### View details[​](#view-details "Direct link to View details") View the details of the query by clicking the **Details** tab: * **Query metadata** — Copilot-generated title and description, the supplied SQL, and corresponding compiled SQL. * **Connection details** — Relevant data platform connection information. * **Query details** — Query duration, status, column count, row count. [![dbt Insights Details tab](/img/docs/dbt-insights/insights-details.png?v=2 "dbt Insights Details tab")](#)dbt Insights Details tab ##### Chart results[​](#chart-results "Direct link to Chart results") Visualize the results of the query by clicking the **Chart** tab to: * Select the chart type using the chart icon. * Choose from **line chart, bar chart, or scatterplot**. * Select the axis and columns to visualize using the **Chart settings** icon. 
[![dbt Insights Chart tab](/img/docs/dbt-insights/insights-chart.png?v=2 "dbt Insights Chart tab")](#)dbt Insights Chart tab ##### Query history[​](#query-history "Direct link to Query history") View the history of queries and their statuses (All, Success, Error, or Pending) using the **Query history** icon: * Select a query to re-run it and view the results. * Search for past queries and filter by status. * Hover over a query to view its SQL code or copy it. The query history is stored indefinitely. [![dbt Insights Query history icon](/img/docs/dbt-insights/insights-query-history.png?v=2 "dbt Insights Query history icon")](#)dbt Insights Query history icon ##### Use dbt Catalog[​](#use-dbt-catalog "Direct link to Use dbt Catalog") Access [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) directly in Insights to view project resources such as models, columns, metrics, and dimensions — all integrated in the Insights interface. This integrated view allows you and your users to maintain your query workflow while getting more context on models, semantic models, metrics, macros, and more. The integrated Catalog view comes with: * The same search capabilities as Catalog * Filters to narrow down displayed objects by type * Hyperlinks from `ref` calls in SQL code to the corresponding Catalog page * The option to open assets in the full Catalog experience, or in Copilot, to view them in more detail To access Catalog, click the **Catalog** icon in the [Query console sidebar menu](https://docs.getdbt.com/docs/explore/navigate-dbt-insights.md#query-console-sidebar-menu). [![dbt Insights integrated with dbt Catalog](/img/docs/dbt-insights/insights-explorer.png?v=2 "dbt Insights integrated with dbt Catalog")](#)dbt Insights integrated with dbt Catalog ##### Set Jinja environment[​](#set-jinja-environment "Direct link to Set Jinja environment") Set the compilation environment to control how Jinja functions are rendered. 
This feature: * Supports "typed" environments marked as `Production`, `Staging`, and/or `Development`. * Enables you to run Semantic Layer queries against staging environments (development environments aren't supported). * Still uses individual user credentials, so users must have appropriate access to query `PROD` and `STG`. * Changing the environment changes the context for the Catalog view in Insights, as well as the environment context during the handoff to Catalog and Canvas. For example, switching to `Staging` in Insights and selecting **View in Catalog** will open the `Staging` view in Catalog. [![Set the environment for your Jinja context](/img/docs/dbt-insights/insights-jinja-environment.png?v=2 "Set the environment for your Jinja context")](#)Set the environment for your Jinja context #### Save your Insights[​](#save-your-insights "Direct link to Save your Insights") Insights offers a robust save feature for quickly finding the queries you use most. There's also an option to share saved Insights with other dbt users (and have them share with you). Click the **bookmark icon** in a query to add it to your list! * Click the **bookmark icon** on the right menu to manage your saved Insights. You can view your personal and shared queries. [![Manage your saved Insights](/img/docs/dbt-insights/saved-insights.png?v=2 "Manage your saved Insights")](#)Manage your saved Insights * View saved Insight details, including description and creation date, in the **Overview** tab. * View the Insight history in the **Version history** tab. Click a version to compare it with the current version and view changes. #### Considerations[​](#considerations "Direct link to Considerations") * Insights uses your development credentials to query. You can query any object in your data warehouse that is accessible with your development credentials. * Every Jinja function uses [`defer --favor-state`](https://docs.getdbt.com/reference/node-selection/defer.md) to resolve Jinja. 
#### FAQs[​](#faqs "Direct link to FAQs") * What’s the difference between Insights and Catalog? * Catalog helps you understand your dbt project's structure, resources, lineage, and metrics, offering context for your data. * Insights builds on that context, allowing you to write, run, and iterate on SQL queries directly in dbt. It’s designed for ad-hoc or exploratory analysis and empowers business users and analysts to explore data, ask questions, and collaborate seamlessly. * Catalog provides the context, while Insights enables action. --- ### Add data tests to your DAG Tip Use [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md), available for dbt Enterprise and Enterprise+ accounts, to generate data tests (Studio IDE only). #### Related reference docs[​](#related-reference-docs "Direct link to Related reference docs") * [Test command](https://docs.getdbt.com/reference/commands/test.md) * [Data test properties](https://docs.getdbt.com/reference/resource-properties/data-tests.md) * [Data test configurations](https://docs.getdbt.com/reference/data-test-configs.md) * [Test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) important Tests are now called data tests to disambiguate from [unit tests](https://docs.getdbt.com/docs/build/unit-tests.md). The YAML key `tests:` is still supported as an alias for `data_tests:`. Refer to [New `data_tests:` syntax](#new-data_tests-syntax) for more information. 
#### Overview[​](#overview "Direct link to Overview") Data tests are assertions you make about your models and other resources in your dbt project (for example, sources, seeds, and snapshots). When you run `dbt test`, dbt will tell you if each test in your project passes or fails. You can use data tests to improve the integrity of the SQL in each model by making assertions about the results generated. Out of the box, you can test whether a specified column in a model contains only non-null values, unique values, values that have a corresponding value in another model (for example, a `customer_id` for an `order` corresponds to an `id` in the `customers` model), or values from a specified list. You can extend data tests to suit business logic specific to your organization – any assertion that you can make about your model in the form of a select query can be turned into a data test. Data tests return a set of failing records. Generic data tests (also known as schema tests) are defined using `test` blocks. Like almost everything in dbt, data tests are SQL queries. In particular, they are `select` statements that seek to grab "failing" records, ones that disprove your assertion. If you assert that a column is unique in a model, the test query selects for duplicates; if you assert that a column is never null, the test looks for nulls. If the data test returns zero failing rows, it passes, and your assertion has been validated. There are two ways of defining data tests in dbt: * A **singular** data test is testing in its simplest form: If you can write a SQL query that returns failing rows, you can save that query in a `.sql` file within your [test directory](https://docs.getdbt.com/reference/project-configs/test-paths.md). It's now a data test, and it will be executed by the `dbt test` command. * A **generic** data test is a parameterized query that accepts arguments. 
The test query is defined in a special `test` block (like a [macro](https://docs.getdbt.com/docs/build/jinja-macros.md)). Once defined, you can reference the generic test by name throughout your `.yml` files—define it on models, columns, sources, snapshots, and seeds. dbt ships with four generic data tests built in, and we think you should use them! Defining data tests is a great way to confirm that your outputs and inputs are as expected, and helps prevent regressions when your code changes. Because you can use them over and over again, making similar assertions with minor variations, generic data tests tend to be much more common—they should make up the bulk of your dbt data testing suite. That said, both ways of defining data tests have their time and place. Creating your first data tests If you're new to dbt, we recommend that you check out our [online dbt Fundamentals course](https://learn.getdbt.com/learn/course/dbt-fundamentals/data-tests-30min/building-tests?page=1) or [quickstart guide](https://docs.getdbt.com/guides.md) to build your first dbt project with models and tests. #### Singular data tests[​](#singular-data-tests "Direct link to Singular data tests") The simplest way to define a data test is by writing the exact SQL that will return failing records. We call these "singular" data tests, because they're one-off assertions usable for a single purpose. These tests are defined in `.sql` files, typically in your `tests` directory (as defined by your [`test-paths` config](https://docs.getdbt.com/reference/project-configs/test-paths.md)). You can use Jinja (including `ref` and `source`) in the test definition, just like you can when creating models. Each `.sql` file contains one `select` statement, and it defines one data test: tests/assert\_total\_payment\_amount\_is\_positive.sql ```sql -- Refunds have a negative amount, so the total amount should always be >= 0. -- Therefore return records where total_amount < 0 to make the test fail. 
select order_id, sum(amount) as total_amount from {{ ref('fct_payments') }} group by 1 having total_amount < 0 ``` The name of this test is the name of the file: `assert_total_payment_amount_is_positive`. Note: * Omit semicolons (;) at the end of the SQL statement in your singular test files, as they can cause your data test to fail. * Singular data tests placed in the tests directory are automatically executed when running `dbt test`. Don't reference singular tests in `model_name.yml`, as they are not treated as generic tests or macros, and doing so will result in an error. To add a description to a singular data test in your project, add a `.yml` file to your `tests` directory, for example, `tests/schema.yml` with the following content: tests/schema.yml ```yaml data_tests: - name: assert_total_payment_amount_is_positive description: > Refunds have a negative amount, so the total amount should always be >= 0. Therefore return records where total amount < 0 to make the test fail. ``` Singular data tests are so easy that you may find yourself writing the same basic structure repeatedly, only changing the name of a column or model. By that point, the test isn't so singular! In that case, we recommend generic data tests. #### Generic data tests[​](#generic-data-tests "Direct link to Generic data tests") Certain data tests are generic: they can be reused over and over again. A generic data test is defined in a `test` block, which contains a parametrized query and accepts arguments. It might look like: ```sql {% test not_null(model, column_name) %} select * from {{ model }} where {{ column_name }} is null {% endtest %} ``` You'll notice that there are two arguments, `model` and `column_name`, which are then templated into the query. This is what makes the data test "generic": it can be defined on as many columns as you like, across as many models as you like, and dbt will pass the values of `model` and `column_name` accordingly. 
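Following the same pattern, you could define your own generic test. As a minimal sketch, here's a hypothetical `is_positive` test (the name and logic are illustrative; this test is not built into dbt):

```sql
-- Hypothetical custom generic test: returns (fails on) any row
-- where the given column holds a negative value.
{% test is_positive(model, column_name) %}

select *
from {{ model }}
where {{ column_name }} < 0

{% endtest %}
```

Once saved in `tests/generic` or `macros`, you could apply it in YAML just like the built-in tests, for example listing `is_positive` under `data_tests:` for a numeric column.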
Once that generic test has been defined, it can be added as a *property* on any existing model (or source, seed, or snapshot). These properties are added in `.yml` files in the same directory as your resource. info If this is your first time working with adding properties to a resource, check out the docs on [declaring properties](https://docs.getdbt.com/reference/configs-and-properties.md). Out of the box, dbt ships with four generic data tests already defined: `unique`, `not_null`, `accepted_values`, and `relationships`. Here's a full example using those tests on an `orders` model:

```yml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: status
        data_tests:
          - accepted_values:
              arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties.
                values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        data_tests:
          - relationships:
              arguments:
                to: ref('customers')
                field: id
```

In plain English, these data tests translate to: * `unique`: the `order_id` column in the `orders` model should be unique * `not_null`: the `order_id` column in the `orders` model should not contain null values * `accepted_values`: the `status` column in the `orders` model should be one of `'placed'`, `'shipped'`, `'completed'`, or `'returned'` * `relationships`: each `customer_id` in the `orders` model exists as an `id` in the `customers` table (also known as referential integrity) Behind the scenes, dbt constructs a `select` query for each data test, using the parametrized query from the generic test block. These queries return the rows where your assertion is *not* true; if the test returns zero rows, your assertion passes. 
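For instance, here's a sketch of the kind of query dbt might construct for the `accepted_values` test above (assuming the model builds to a table `analytics.orders`; the SQL dbt actually generates differs slightly in structure):

```sql
-- Rows whose status falls outside the accepted list are returned as failures.
select
    status as value_field,
    count(*) as n_records
from analytics.orders
group by status
having status not in ('placed', 'shipped', 'completed', 'returned')
```

If this query returns zero rows, every `status` value is in the accepted list and the test passes.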
You can find more information about these data tests, and additional configurations (including [`severity`](https://docs.getdbt.com/reference/resource-configs/severity.md) and [`tags`](https://docs.getdbt.com/reference/resource-configs/tags.md)) in the [reference section](https://docs.getdbt.com/reference/resource-properties/data-tests.md). You can also add descriptions to the Jinja macro that provides the core logic of a generic data test. Refer to the [Add description to generic data test logic](https://docs.getdbt.com/best-practices/writing-custom-generic-tests.md#add-description-to-generic-data-test-logic) for more information. ##### More generic data tests[​](#more-generic-data-tests "Direct link to More generic data tests") Those four tests are enough to get you started. You'll quickly find you want to use a wider variety of data tests — a good thing! You can also install generic data tests from a package, or write your own, to use (and reuse) across your dbt project. Check out the [guide on custom generic data tests](https://docs.getdbt.com/best-practices/writing-custom-generic-tests.md) for more information. info There are generic data tests defined in some open-source packages, such as [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) and [dbt-expectations](https://hub.getdbt.com/calogica/dbt_expectations/latest/) — skip ahead to the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn more! ##### Example[​](#example "Direct link to Example") To add a generic (or "schema") data test to your project: 1. Add a `.yml` file to your `models` directory, for example, `models/schema.yml`, with the following content (you may need to adjust the `name:` values for an existing model) models/schema.yml ```yaml models: - name: orders columns: - name: order_id data_tests: - unique - not_null ``` 2. 
Run the [`dbt test` command](https://docs.getdbt.com/reference/commands/test.md): ```text $ dbt test Found 3 models, 2 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 0 seed files, 0 sources 17:31:05 | Concurrency: 1 threads (target='learn') 17:31:05 | 17:31:05 | 1 of 2 START test not_null_order_order_id..................... [RUN] 17:31:06 | 1 of 2 PASS not_null_order_order_id........................... [PASS in 0.99s] 17:31:06 | 2 of 2 START test unique_order_order_id....................... [RUN] 17:31:07 | 2 of 2 PASS unique_order_order_id............................. [PASS in 0.79s] 17:31:07 | 17:31:07 | Finished running 2 tests in 7.17s. Completed successfully Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2 ``` 3. Check out the SQL dbt is running by either: * **dbt:** checking the Details tab. * **dbt Core:** checking the `target/compiled` directory **Unique test** * Compiled SQL * Templated SQL ```sql select * from ( select order_id from analytics.orders where order_id is not null group by order_id having count(*) > 1 ) validation_errors ``` ```sql select * from ( select {{ column_name }} from {{ model }} where {{ column_name }} is not null group by {{ column_name }} having count(*) > 1 ) validation_errors ``` **Not null test** * Compiled SQL * Templated SQL ```sql select * from analytics.orders where order_id is null ``` ```sql select * from {{ model }} where {{ column_name }} is null ``` #### Storing data test failures[​](#storing-data-test-failures "Direct link to Storing data test failures") Normally, a data test query will calculate failures as part of its execution. 
If you set the optional `--store-failures` flag or the [`store_failures`](https://docs.getdbt.com/reference/resource-configs/store_failures.md) or [`store_failures_as`](https://docs.getdbt.com/reference/resource-configs/store_failures_as.md) configs, dbt will first save the results of a test query to a table in the database, and then query that table to calculate the number of failures. This workflow allows you to query and examine failing records much more quickly in development: [![Store test failures in the database for faster development-time debugging.](/img/docs/building-a-dbt-project/test-store-failures.gif?v=2 "Store test failures in the database for faster development-time debugging.")](#)Store test failures in the database for faster development-time debugging. Note that, if you select to store data test failures: * Test result tables are created in a schema suffixed or named `dbt_test__audit`, by default. It is possible to change this value by setting a `schema` config. (For more details on schema naming, see [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md).) * A test's results will always **replace** previous failures for the same test. #### New `data_tests:` syntax[​](#new-data_tests-syntax "Direct link to new-data_tests-syntax") Data tests were historically called "tests" in dbt as the only form of testing available. With the introduction of unit tests, the key was renamed from `tests:` to `data_tests:`. dbt still supports `tests:` in your YML configuration files for backwards-compatibility purposes, and you might see it used throughout our documentation. However, you can't have both a `tests` and a `data_tests` key associated with the same resource (for example, a single model) at the same time. 
models/schema.yml ```yml models: - name: orders columns: - name: order_id data_tests: - unique - not_null ``` dbt\_project.yml ```yml data_tests: +store_failures: true ``` #### FAQs[​](#faqs "Direct link to FAQs") What data tests are available for me to use in dbt? Out of the box, dbt ships with the following data tests: * `unique` * `not_null` * `accepted_values` * `relationships` (for example, referential integrity) You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project. How do I test one model at a time? Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell dbt test --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular. One of my tests failed, how can I debug it? To debug a failing test, find the SQL that dbt ran by: * dbt: * Within the test output, click on the failed test, and then select "Details". * dbt Core: * Open the file path returned as part of the error message. * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries. Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed. What data tests should I add to my project? We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`. We also recommend that you test any assumptions on your source data. 
For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL. In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models. When should I run my data tests? You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid). Can I store my data tests in a directory other than the `tests` directory in my project? By default, dbt expects your singular data test files to be located in the `tests` subdirectory of your project, and generic data test definitions to be located in `tests/generic` or `macros`. To change this, update the [test-paths](https://docs.getdbt.com/reference/project-configs/test-paths.md) configuration in your `dbt_project.yml` file, like so: dbt\_project.yml

```yml
test-paths: ["my_cool_tests"]
```

Then, you can define generic data tests in `my_cool_tests/generic/`, and singular data tests everywhere else in `my_cool_tests/`. How do I run data tests on just my sources? To run data tests on all sources, use the following command:

```shell
dbt test --select "source:*"
```

(You can also use the `-s` shorthand here instead of `--select`.) To run data tests on one source (and all of its tables):

```shell
dbt test --select source:jaffle_shop
```

And, to run data tests on one source table only:

```shell
dbt test --select source:jaffle_shop.orders
```

Can I set test failure thresholds? You can use the `error_if` and `warn_if` configs to set custom failure thresholds in your tests. For more details, see the [severity reference](https://docs.getdbt.com/reference/resource-configs/severity.md). 
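As a minimal sketch, these threshold configs go in the test's `config` block (the model and column names here are illustrative):

```yml
models:
  - name: orders
    columns:
      - name: status
        data_tests:
          - not_null:
              config:
                severity: error
                error_if: ">100"  # error only if more than 100 rows fail
                warn_if: ">10"    # otherwise warn if more than 10 rows fail
```

With `severity: error`, dbt evaluates the `error_if` condition first and falls back to `warn_if` when it isn't met.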
You can also try the following solutions: * Setting the [severity](https://docs.getdbt.com/reference/resource-configs/severity.md) to `warn` or `error` * Writing a [custom generic test](https://docs.getdbt.com/best-practices/writing-custom-generic-tests.md) that accepts a threshold argument ([example](https://discourse.getdbt.com/t/creating-an-error-threshold-for-schema-tests/966)) Can I test the uniqueness of two columns? Yes, there are a few different options for testing the uniqueness of two columns. Consider an orders table that contains records from multiple countries, and the combination of ID and country code is unique:

| order\_id | country\_code |
| --------- | ------------- |
| 1 | AU |
| 2 | AU |
| ... | ... |
| 1 | US |
| 2 | US |
| ... | ... |

Here are some approaches: ###### 1. Create a unique key in the model and test that[​](#1-create-a-unique-key-in-the-model-and-test-that "Direct link to 1. Create a unique key in the model and test that") models/orders.sql

```sql
select
  country_code || '-' || order_id as surrogate_key,
  ...
```

models/orders.yml

```yml
models:
  - name: orders
    columns:
      - name: surrogate_key
        data_tests:
          - unique
```

###### 2. Test an expression[​](#2-test-an-expression "Direct link to 2. Test an expression") models/orders.yml

```yml
models:
  - name: orders
    data_tests:
      - unique:
          arguments: # available in v1.10.5 and higher. Older versions can set this as a top-level property.
            column_name: "(country_code || '-' || order_id)"
```

###### 3. Use the `dbt_utils.unique_combination_of_columns` test[​](#3-use-the-dbt_utilsunique_combination_of_columns-test "Direct link to 3-use-the-dbt_utilsunique_combination_of_columns-test") This is especially useful for large datasets since it is more performant. Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) for more information. 
models/orders.yml

```yml
models:
  - name: orders
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          arguments: # available in v1.10.5 and higher. Older versions can set this as a top-level property.
            combination_of_columns:
              - country_code
              - order_id
```

--- ### Add Exposures to your DAG Exposures make it possible to define and describe a downstream use of your dbt project, such as in a dashboard, application, or data science pipeline. By defining exposures, you can then: * run, test, and list resources that feed into your exposure * populate a dedicated page in the auto-generated [documentation](https://docs.getdbt.com/docs/build/documentation.md) site with context relevant to data consumers Exposures can be defined in two ways: * Manual — Declared [explicitly](https://docs.getdbt.com/docs/build/exposures.md#declaring-an-exposure) in your project’s YAML files. * Automatic — dbt [creates and visualizes downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md) automatically for supported integrations, removing the need for manual YAML definitions. These downstream exposures are stored in dbt’s metadata system, appear in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), and behave like manual exposures. However, they don’t exist in YAML files. ##### Declaring an exposure[​](#declaring-an-exposure "Direct link to Declaring an exposure") Exposures are defined in `.yml` files nested under an `exposures:` key. 
The following example shows an exposure definition in a `.yml` file in your `models` directory:

```yaml
exposures:
  - name: weekly_jaffle_metrics
    label: Jaffles by the Week
    type: dashboard
    maturity: high
    url: https://bi.tool/dashboards/1
    description: >
      Did someone say "exponential growth"?
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - source('gsheets', 'goals')
      - metric('count_orders')
    owner:
      name: Callum McData
      email: data@jaffleshop.com
```

##### Available properties[​](#available-properties "Direct link to Available properties") *Required:* * **name**: a unique exposure name written in [snake case](https://en.wikipedia.org/wiki/Snake_case) * **type**: one of `dashboard`, `notebook`, `analysis`, `ml`, `application` (used to organize in the docs site) * **owner**: `name` or `email` required; additional properties allowed *Expected:* * **depends\_on**: list of refable nodes, including `metric`, `ref`, and `source`. While possible, it is highly unlikely you will ever need an `exposure` to depend on a `source` directly. *Optional:* * **label**: May contain spaces, capital letters, or special characters. * **url**: Activates and populates the link to **View this exposure** in the upper right corner of the generated documentation site * **maturity**: Indicates the level of confidence or stability in the exposure. One of `high`, `medium`, or `low`. For example, you could use `high` maturity for a well-established dashboard, widely used and trusted within your organization. Use `low` maturity for a new or experimental analysis. 
*General properties (optional):*

* [**description**](https://docs.getdbt.com/reference/resource-properties/description.md)
* [**tags**](https://docs.getdbt.com/reference/resource-configs/tags.md)
* [**meta**](https://docs.getdbt.com/reference/resource-configs/meta.md)
* [**enabled**](https://docs.getdbt.com/reference/resource-configs/enabled.md) — You can set this property at the exposure level or at the project level in the [`dbt_project.yml`](https://docs.getdbt.com/reference/dbt_project.yml.md) file.

##### Referencing exposures[​](#referencing-exposures "Direct link to Referencing exposures")

Once an exposure is defined, you can run commands that reference it:

```text
dbt run -s +exposure:weekly_jaffle_metrics
dbt test -s +exposure:weekly_jaffle_metrics
```

When you generate the [Catalog site](https://docs.getdbt.com/docs/explore/explore-projects.md), the exposure will appear:

![Exposures have a dedicated section, under the project name in dbt Catalog, which lists each exposure in your project.](/img/docs/building-a-dbt-project/dbt-explorer-exposures.png?v=2)

![Exposures appear as nodes in the dbt Catalog DAG, displaying an orange 'EXP' indicator within the node.](/img/docs/building-a-dbt-project/dag-exposures.png?v=2)
#### Related docs[​](#related-docs "Direct link to Related docs")

* [Exposure properties](https://docs.getdbt.com/reference/exposure-properties.md)
* [`exposure:` selection method](https://docs.getdbt.com/reference/node-selection/methods.md#exposure)
* [Data health tiles](https://docs.getdbt.com/docs/explore/data-tile.md)

---

### Add groups to your DAG

A group is a collection of nodes within a dbt DAG. Groups are named, and every group has an `owner`. They enable intentional collaboration within and across teams by restricting [access to private](https://docs.getdbt.com/reference/resource-configs/access.md) models.

Group members may include models, tests, seeds, snapshots, analyses, and metrics. (Not included: sources and exposures.) Each node may belong to only one group.

##### Declaring a group[​](#declaring-a-group "Direct link to Declaring a group")

Groups are defined in `.yml` files, nested under a `groups:` key.

###### Centrally defining a group[​](#centrally-defining-a-group "Direct link to Centrally defining a group")

To centrally define a group in your project, there are two options:

* Create one `_groups.yml` file in the root of the `models` directory.
* Create one `_groups.yml` file in the root of a `groups` directory. For this option, you also need to configure [`model-paths`](https://docs.getdbt.com/reference/project-configs/model-paths.md) in the `dbt_project.yml` file:

```yml
model-paths: ["models", "groups"]
```

##### Adding a model to a group[​](#adding-a-model-to-a-group "Direct link to Adding a model to a group")

Use the `group` configuration to add one or more models to a group.
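The group being assigned must itself be declared with a name and an `owner`. A minimal sketch of a centrally defined `_groups.yml` file (the `finance` group name and owner details here are illustrative):

```yml
groups:
  - name: finance
    owner:
      # `name` or `email` is required; additional properties are allowed
      name: Firstname Lastname
      email: finance@jaffleshop.com
```

With the group declared, the `group` configuration assigns models to it.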
You can set the configuration at the project level, at the model level, or in the model file itself:

dbt_project.yml

```yml
models:
  marts:
    finance:
      +group: finance
```

models/schema.yml

```yml
models:
  - name: model_name
    config:
      group: finance
```

models/model_name.sql

```sql
{{ config(group = 'finance') }}

select ...
```

##### Referencing a model in a group[​](#referencing-a-model-in-a-group "Direct link to Referencing a model in a group")

By default, all models within a group have the `protected` [access modifier](https://docs.getdbt.com/reference/resource-configs/access.md). This means they can be referenced by downstream resources in *any* group in the same project, using the [`ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function. If a grouped model's `access` property is set to `private`, only resources within its group can reference it.

models/schema.yml

```yml
models:
  - name: finance_private_model
    config:
      access: private # changed to config in v1.10
      group: finance
  - name: marketing_model
    config:
      group: marketing # in a different group!
```

models/marketing_model.sql

```sql
select * from {{ ref('finance_private_model') }}
```

```shell
$ dbt run -s marketing_model
...
dbt.exceptions.DbtReferenceError: Parsing Error
  Node model.jaffle_shop.marketing_model attempted to reference node
  model.jaffle_shop.finance_private_model, which is not allowed because
  the referenced node is private to the finance group.
```

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Model Access](https://docs.getdbt.com/docs/mesh/govern/model-access.md#groups)
* [Group configuration](https://docs.getdbt.com/reference/resource-configs/group.md)
* [Group selection](https://docs.getdbt.com/reference/node-selection/methods.md#group)
---

### Add Seeds to your DAG

#### Related reference docs[​](#related-reference-docs "Direct link to Related reference docs")

* [Seed configurations](https://docs.getdbt.com/reference/seed-configs.md)
* [Seed properties](https://docs.getdbt.com/reference/seed-properties.md)
* [`seed` command](https://docs.getdbt.com/reference/commands/seed.md)

#### Overview[​](#overview "Direct link to Overview")

Seeds are CSV files in your dbt project (typically in your `seeds` directory) that dbt can load into your data warehouse using the `dbt seed` command.

Seeds can be referenced in downstream models the same way as models — by using the [`ref` function](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md).

Because these CSV files are located in your dbt repository, they are version controlled and code reviewable. Seeds are best suited to static data that changes infrequently.

Good use cases for seeds:

* A list of mappings of country codes to country names
* A list of test emails to exclude from analysis
* A list of employee account IDs

Poor use cases for seeds:

* Loading raw data that has been exported to CSVs
* Any kind of production data containing sensitive information, for example, personally identifiable information (PII) and passwords

#### Example[​](#example "Direct link to Example")

To load a seed file in your dbt project:

1. Add the file to your `seeds` directory, with a `.csv` file extension, for example, `seeds/country_codes.csv`

   seeds/country_codes.csv

   ```text
   country_code,country_name
   US,United States
   CA,Canada
   GB,United Kingdom
   ...
   ```

2. Run the `dbt seed` [command](https://docs.getdbt.com/reference/commands/seed.md) — a new table will be created in your warehouse in your target schema, named `country_codes`

   ```text
   $ dbt seed

   Found 2 models, 3 tests, 0 archives, 0 analyses, 53 macros, 0 operations, 1 seed file

   14:46:15 | Concurrency: 1 threads (target='dev')
   14:46:15 |
   14:46:15 | 1 of 1 START seed file analytics.country_codes........................... [RUN]
   14:46:15 | 1 of 1 OK loaded seed file analytics.country_codes....................... [INSERT 3 in 0.01s]
   14:46:16 |
   14:46:16 | Finished running 1 seed in 0.14s.

   Completed successfully

   Done. PASS=1 ERROR=0 SKIP=0 TOTAL=1
   ```

3. Refer to seeds in downstream models using the `ref` function.

   models/orders.sql

   ```sql
   -- This refers to the table created from seeds/country_codes.csv
   select * from {{ ref('country_codes') }}
   ```

#### Configuring seeds[​](#configuring-seeds "Direct link to Configuring seeds")

Seeds are configured in your `dbt_project.yml`; check out the [seed configurations](https://docs.getdbt.com/reference/seed-configs.md) docs for a full list of available configurations.

#### Documenting and testing seeds[​](#documenting-and-testing-seeds "Direct link to Documenting and testing seeds")

You can document and test seeds in YAML by declaring properties — check out the docs on [seed properties](https://docs.getdbt.com/reference/seed-properties.md) for more information.

#### FAQs[​](#faqs "Direct link to FAQs")

Can I use seeds to load raw data?

Seeds should **not** be used to load raw data (for example, large CSV exports from a production database).

Since seeds are version controlled, they are best suited to files that contain business-specific logic, for example a list of country codes or user IDs of employees.

Loading CSVs using dbt's seed functionality is not performant for large files. Consider using a different tool to load these CSVs into your data warehouse.
Can I store my seeds in a directory other than the `seeds` directory in my project?

By default, dbt expects your seed files to be located in the `seeds` subdirectory of your project. To change this, update the [seed-paths](https://docs.getdbt.com/reference/project-configs/seed-paths.md) configuration in your `dbt_project.yml` file, like so:

dbt_project.yml

```yml
seed-paths: ["custom_seeds"]
```

The columns of my seed changed, and now I get an error when running the `seed` command. What should I do?

If you changed the columns of your seed, you may get a `Database Error`. For example, on Snowflake:

```shell
$ dbt seed
Running with dbt=1.6.0-rc2
Found 0 models, 0 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 1 seed file, 0 sources

12:12:27 | Concurrency: 8 threads (target='dev_snowflake')
12:12:27 |
12:12:27 | 1 of 1 START seed file dbt_claire.country_codes...................... [RUN]
12:12:30 | 1 of 1 ERROR loading seed file dbt_claire.country_codes.............. [ERROR in 2.78s]
12:12:31 |
12:12:31 | Finished running 1 seed in 10.05s.

Completed with 1 error and 0 warnings:

Database Error in seed country_codes (seeds/country_codes.csv)
  000904 (42000): SQL compilation error: error line 1 at position 62
  invalid identifier 'COUNTRY_NAME'

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

Or on Redshift:

```shell
$ dbt seed
Running with dbt=1.6.0-rc2
Found 0 models, 0 tests, 0 snapshots, 0 analyses, 149 macros, 0 operations, 1 seed file, 0 sources

12:14:46 | Concurrency: 1 threads (target='dev_redshift')
12:14:46 |
12:14:46 | 1 of 1 START seed file dbt_claire.country_codes...................... [RUN]
12:14:46 | 1 of 1 ERROR loading seed file dbt_claire.country_codes.............. [ERROR in 0.23s]
12:14:46 |
12:14:46 | Finished running 1 seed in 1.75s.

Completed with 1 error and 0 warnings:

Database Error in seed country_codes (seeds/country_codes.csv)
  column "country_name" of relation "country_codes" does not exist

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

In this case, you should rerun the command with a `--full-refresh` flag, like so:

```text
dbt seed --full-refresh
```

**Why is this the case?**

Typically, when you run `dbt seed`, dbt truncates the existing table and reinserts the data. This pattern avoids a `drop cascade` command, which may cause downstream objects (that your BI users might be querying!) to get dropped. However, when column names are changed, or new columns are added, these statements will fail as the table structure has changed. The `--full-refresh` flag will force dbt to `drop cascade` the existing table before rebuilding it.

How do I test and document seeds?

To test and document seeds, use a [properties file](https://docs.getdbt.com/reference/configs-and-properties.md) and nest the configurations under a `seeds:` key.

#### Example[​](#example "Direct link to Example")

seeds/properties.yml

```yml
seeds:
  - name: country_codes
    description: A mapping of two letter country codes to country names
    columns:
      - name: country_code
        data_tests:
          - unique
          - not_null
      - name: country_name
        data_tests:
          - unique
          - not_null
```

How do I set a datatype for a column in my seed?

dbt will infer the datatype for each column based on the data in your CSV. You can also explicitly set a datatype using the `column_types` [configuration](https://docs.getdbt.com/reference/resource-configs/column_types.md) like so:

dbt_project.yml

```yml
seeds:
  jaffle_shop: # you must include the project name
    warehouse_locations:
      +column_types:
        zipcode: varchar(5)
```

How do I run models downstream of a seed?

You can run models downstream of a seed using the [model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md), and treating the seed like a model. For example, the following would run all models downstream of a seed named `country_codes`:

```shell
$ dbt run --select country_codes+
```

How do I preserve leading zeros in a seed?
If you need to preserve leading zeros (for example in a zipcode or mobile number), include the leading zeros in your seed file, and use the `column_types` [configuration](https://docs.getdbt.com/reference/resource-configs/column_types.md) with a varchar datatype of the correct length.

How do I build one seed at a time?

You can use a `--select` option with the `dbt seed` command, like so:

```shell
$ dbt seed --select country_codes
```

There is also an `--exclude` option. Check out more in the [model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md) documentation.

Do hooks run with seeds?

Yes! The following hooks are available:

* [pre-hooks & post-hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md)
* [on-run-start & on-run-end hooks](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md)

Configure these in your `dbt_project.yml` file.

---

### Add snapshots to your DAG

#### Related documentation[​](#related-documentation "Direct link to Related documentation")

* [Snapshot configurations](https://docs.getdbt.com/reference/snapshot-configs.md)
* [Snapshot properties](https://docs.getdbt.com/reference/snapshot-properties.md)
* [`snapshot` command](https://docs.getdbt.com/reference/commands/snapshot.md)

Learn by video! For video tutorials on snapshots, go to dbt Learn and check out the [Snapshots course](https://learn.getdbt.com/courses/snapshots).
#### What are snapshots?[​](#what-are-snapshots "Direct link to What are snapshots?")

Analysts often need to "look back in time" at previous data states in their mutable tables. While some source data systems are built in a way that makes accessing historical data possible, this is not always the case. dbt provides a mechanism, **snapshots**, which records changes to a mutable table over time.

Snapshots implement [type-2 Slowly Changing Dimensions](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row) over mutable source tables. These Slowly Changing Dimensions (or SCDs) identify how a row in a table changes over time. Imagine you have an `orders` table where the `status` field can be overwritten as the order is processed.

| id | status  | updated_at |
| -- | ------- | ---------- |
| 1  | pending | 2024-01-01 |

Now, imagine that the order goes from "pending" to "shipped". That same record will now look like:

| id | status  | updated_at |
| -- | ------- | ---------- |
| 1  | shipped | 2024-01-02 |

This order is now in the "shipped" state, but we've lost the information about when the order was last in the "pending" state. This makes it difficult (or impossible) to analyze how long it took for an order to ship. dbt can "snapshot" these changes to help you understand how values in a row change over time. Here's an example of a snapshot table for the previous example:

| id | status  | updated_at | dbt_valid_from | dbt_valid_to |
| -- | ------- | ---------- | -------------- | ------------ |
| 1  | pending | 2024-01-01 | 2024-01-01     | 2024-01-02   |
| 1  | shipped | 2024-01-02 | 2024-01-02     | `null`       |

#### Configuring snapshots[​](#configuring-snapshots "Direct link to Configuring snapshots")

##### Configuration best practices[​](#configuration-best-practices "Direct link to Configuration best practices")

**Use the timestamp strategy where possible**

The timestamp strategy is recommended because it handles column additions and deletions more efficiently than the `check` strategy, and it's more robust to schema changes, especially when columns are added or removed over time. The timestamp strategy relies on a single `updated_at` field, which means it avoids the need to constantly update your snapshot configuration as your source table evolves.

Why timestamp is the preferred strategy:

* Requires tracking only one column (`updated_at`)
* Automatically handles new or removed columns in the source table
* Less prone to errors when the table schema evolves over time (for example, if using the `check` strategy, you might need to update the `check_cols` configuration)

**Use `dbt_valid_to_current` for easier date range queries**

By default, `dbt_valid_to` is `NULL` for current records. However, if you set the [`dbt_valid_to_current` configuration](https://docs.getdbt.com/reference/resource-configs/dbt_valid_to_current.md) (available in dbt Core v1.9+), `dbt_valid_to` will be set to your specified value (such as `9999-12-31`) for current records. This allows for straightforward date range filtering.

**Ensure your unique key is really unique**

The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, we recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)).
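As a sketch of that best practice, a snapshot YAML using `dbt_valid_to_current` might look like the following (the snapshot name, relation, and the exact date expression are illustrative and may vary by data platform):

```yaml
snapshots:
  - name: orders_snapshot
    relation: ref('stg_orders')
    config:
      unique_key: order_id
      strategy: timestamp
      updated_at: updated_at
      # current records get this value in dbt_valid_to instead of NULL
      dbt_valid_to_current: "to_date('9999-12-31')"
```

With this set, point-in-time queries can use a simple range predicate such as `where '2024-06-01' between dbt_valid_from and dbt_valid_to`, without special handling for `NULL`.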
##### How snapshots work[​](#how-snapshots-work "Direct link to How snapshots work")

When you run the [`dbt snapshot` command](https://docs.getdbt.com/reference/commands/snapshot.md):

* **On the first run:** dbt will create the initial snapshot table — this will be the result set of your `select` statement, with additional columns including `dbt_valid_from` and `dbt_valid_to`. All records will have a `dbt_valid_to = null` or the value specified in [`dbt_valid_to_current`](https://docs.getdbt.com/reference/resource-configs/dbt_valid_to_current.md) (available in dbt Core v1.9+) if configured.
* **On subsequent runs:** dbt will check which records have changed or if any new records have been created:
  * The `dbt_valid_to` column will be updated for any existing records that have changed.
  * The updated record and any new records will be inserted into the snapshot table. These records will now have `dbt_valid_to = null` or the value configured in `dbt_valid_to_current` (available in dbt Core v1.9+).

Snapshots can be referenced in downstream models the same way as models — by using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function.

#### Detecting row changes[​](#detecting-row-changes "Direct link to Detecting row changes")

Snapshot "strategies" define how dbt knows if a row has changed. There are two built-in strategies in dbt:

* [Timestamp](#timestamp-strategy-recommended) — Uses an `updated_at` column to determine if a row has changed.
* [Check](#check-strategy) — Compares a list of columns between their current and historical values to determine if a row has changed.

##### Timestamp strategy (recommended)[​](#timestamp-strategy-recommended "Direct link to Timestamp strategy (recommended)")

The `timestamp` strategy uses an `updated_at` field to determine if a row has changed.
If the configured `updated_at` column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action.

The `timestamp` strategy requires the following configuration:

| Config     | Description | Example |
| ---------- | ----------- | ------- |
| updated_at | A column which represents when the source row was last updated. May support ISO date strings and unix epoch integers, depending on the data platform you use. | `updated_at` |

##### Check strategy[​](#check-strategy "Direct link to Check strategy")

The `check` strategy is useful for tables which do not have a reliable `updated_at` column. This strategy works by comparing a list of columns between their current and historical values. If any of these columns have changed, then dbt will invalidate the old record and record the new one. If the column values are identical, then dbt will not take any action.

The `check` strategy requires the following configuration:

| Config     | Description | Example |
| ---------- | ----------- | ------- |
| check_cols | A list of columns to check for changes, or `all` to check all columns | `["name", "email"]` |

**`check_cols = 'all'`**

The `check` snapshot strategy can be configured to track changes to *all* columns by supplying `check_cols = 'all'`. It is better to explicitly enumerate the columns that you want to check. Consider using a surrogate key to condense many columns into a single column.

###### Example usage with `updated_at`[​](#example-usage-with-updated_at "Direct link to example-usage-with-updated_at")

When using the `check` strategy, dbt tracks changes by comparing values in `check_cols`. By default, dbt uses the current timestamp to update the `dbt_updated_at`, `dbt_valid_from`, and `dbt_valid_to` fields. Optionally, you can set an `updated_at` column:

* If `updated_at` is configured, the `check` strategy uses this column instead, as with the timestamp strategy.
* If the `updated_at` value is null, dbt defaults to using the current timestamp.

Check out the following example, which shows how to use the `check` strategy with `updated_at`:

```yaml
snapshots:
  - name: orders_snapshot
    relation: ref('stg_orders')
    config:
      schema: snapshots
      unique_key: order_id
      strategy: check
      check_cols:
        - status
        - is_cancelled
      updated_at: updated_at
```

In this example:

* If at least one of the specified `check_cols` changes, the snapshot creates a new row. If the `updated_at` column has a value (is not null), the snapshot uses it; otherwise, it defaults to the current timestamp.
* If `updated_at` isn’t set, then dbt automatically falls back to [using the current timestamp](#sample-results-for-the-check-strategy) to track changes.
* Use this approach when your `updated_at` column isn't reliable for tracking record updates, but you still want to use it — rather than the snapshot's execution time — whenever row changes are detected.
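For comparison, here is a minimal sketch of the recommended `timestamp` strategy; the `orders_snapshot` name and `stg_orders` relation mirror the illustrative example above:

```yaml
snapshots:
  - name: orders_snapshot
    relation: ref('stg_orders')
    config:
      schema: snapshots
      unique_key: order_id
      strategy: timestamp
      updated_at: updated_at
```

With this configuration, dbt compares each row's `updated_at` value against the previous snapshot run to decide whether the row has changed, instead of comparing a list of columns.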
##### Hard deletes (opt-in)[​](#hard-deletes-opt-in "Direct link to Hard deletes (opt-in)")

Use the [`hard_deletes`](https://docs.getdbt.com/reference/resource-configs/hard-deletes.md) config to specify how dbt handles rows that are deleted from the source: `ignore` (default), `invalidate`, or `new_record`.

#### Snapshot meta-fields[​](#snapshot-meta-fields "Direct link to Snapshot meta-fields")

Snapshot tables will be created as a clone of your source dataset, plus some additional meta-fields\*.

In dbt Core v1.9+ (or available sooner in [the **Latest** release track in dbt](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md)):

* These column names can be customized to your team or organizational conventions using the [`snapshot_meta_column_names`](https://docs.getdbt.com/reference/resource-configs/snapshot_meta_column_names.md) config.
* Use the [`dbt_valid_to_current` config](https://docs.getdbt.com/reference/resource-configs/dbt_valid_to_current.md) to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date such as `9999-12-31`). By default, this value is `NULL`. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.
* Use the [`hard_deletes`](https://docs.getdbt.com/reference/resource-configs/hard-deletes.md) config to track deleted records as new rows with the `dbt_is_deleted` meta field when using `hard_deletes='new_record'`.

| Field | Meaning | Notes | Example |
| ----- | ------- | ----- | ------- |
| `dbt_valid_from` | The timestamp when this snapshot row was first inserted and became valid. | This column can be used to order the different "versions" of a record. | `snapshot_meta_column_names: {dbt_valid_from: start_date}` |
| `dbt_valid_to` | The timestamp when this row became invalidated. For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`. | The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value. | `snapshot_meta_column_names: {dbt_valid_to: end_date}` |
| `dbt_scd_id` | A unique key generated for each snapshot row. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_scd_id: scd_id}` |
| `dbt_updated_at` | The `updated_at` timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt. | `snapshot_meta_column_names: {dbt_updated_at: modified_date}` |
| `dbt_is_deleted` | A string value indicating if the record has been deleted (`True` if deleted, `False` if not). | Added when `hard_deletes='new_record'` is configured. | `snapshot_meta_column_names: {dbt_is_deleted: is_deleted}` |

All of these column names can be customized using the `snapshot_meta_column_names` config. Refer to this [example](https://docs.getdbt.com/reference/resource-configs/snapshot_meta_column_names.md#example) for more details.

\*The timestamps used for each column are subtly different depending on the strategy you use:

* For the `timestamp` strategy, the configured `updated_at` column is used to populate the `dbt_valid_from`, `dbt_valid_to` and `dbt_updated_at` columns.

**Sample results for the timestamp strategy**

Snapshot query results at `2024-01-01 11:00`:

| id | status  | updated_at       |
| -- | ------- | ---------------- |
| 1  | pending | 2024-01-01 10:47 |

Snapshot results (note that `11:00` is not used anywhere):

| id | status  | updated_at       | dbt_valid_from   | dbt_valid_to | dbt_updated_at   |
| -- | ------- | ---------------- | ---------------- | ------------ | ---------------- |
| 1  | pending | 2024-01-01 10:47 | 2024-01-01 10:47 |              | 2024-01-01 10:47 |

Query results at `2024-01-01 11:30`:

| id | status  | updated_at       |
| -- | ------- | ---------------- |
| 1  | shipped | 2024-01-01 11:05 |

Snapshot results (note that `11:30` is not used anywhere):

| id | status  | updated_at       | dbt_valid_from   | dbt_valid_to     | dbt_updated_at   |
| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- |
| 1  | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 |
| 1  | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 |                  | 2024-01-01 11:05 |

Snapshot results with `hard_deletes='new_record'`:

| id | status  | updated_at       | dbt_valid_from   | dbt_valid_to     | dbt_updated_at   | dbt_is_deleted |
| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | -------------- |
| 1  | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | False          |
| 1  | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | 2024-01-01 11:05 | False          |
| 1  | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 |                  | 2024-01-01 11:20 | True           |

* For the `check` strategy, the current timestamp is used to populate each column. If configured, the `check` strategy uses the `updated_at` column instead, as with the timestamp strategy.
**Sample results for the check strategy**

Snapshot query results at `2024-01-01 11:00`:

| id | status  |
| -- | ------- |
| 1  | pending |

Snapshot results:

| id | status  | dbt_valid_from   | dbt_valid_to | dbt_updated_at   |
| -- | ------- | ---------------- | ------------ | ---------------- |
| 1  | pending | 2024-01-01 11:00 |              | 2024-01-01 11:00 |

Query results at `2024-01-01 11:30`:

| id | status  |
| -- | ------- |
| 1  | shipped |

Snapshot results:

| id | status  | dbt_valid_from   | dbt_valid_to     | dbt_updated_at   |
| -- | ------- | ---------------- | ---------------- | ---------------- |
| 1  | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 |
| 1  | shipped | 2024-01-01 11:30 |                  | 2024-01-01 11:30 |

Snapshot results with `hard_deletes='new_record'`:

| id | status  | dbt_valid_from   | dbt_valid_to     | dbt_updated_at   | dbt_is_deleted |
| -- | ------- | ---------------- | ---------------- | ---------------- | -------------- |
| 1  | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | False          |
| 1  | shipped | 2024-01-01 11:30 | 2024-01-01 11:40 | 2024-01-01 11:30 | False          |
| 1  | deleted | 2024-01-01 11:40 |                  | 2024-01-01 11:40 | True           |

#### FAQs[​](#faqs "Direct link to FAQs")

How do I run one snapshot at a time?
To run one snapshot, use the `--select` flag, followed by the name of the snapshot:

```shell
$ dbt snapshot --select order_snapshot
```

Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples.

How often should I run the snapshot command?

Snapshots are a batch-based approach to [change data capture](https://en.wikipedia.org/wiki/Change_data_capture). The `dbt snapshot` command must be run on a schedule to ensure that changes to tables are actually recorded! While individual use cases may vary, snapshots are intended to be run between hourly and daily. If you find yourself snapshotting more frequently than that, consider whether there is a more appropriate way to capture changes in your source data tables.

What happens if I add new columns to my snapshot query?

When the columns of your source query change, dbt will attempt to reconcile this change in the destination snapshot table. dbt does this by:

1. Creating new columns from the source query in the destination table
2. Expanding the size of string types where necessary (e.g. `varchar`s on Redshift)

dbt *will not* delete columns in the destination snapshot table if they are removed from the source query. It will also not change the type of a column beyond expanding the size of varchar columns. That is, if a `string` column is changed to a `date` column in the snapshot source query, dbt will not attempt to change the type of the column in the destination table.

Do hooks run with snapshots?

Yes!
The following hooks are available for snapshots:

* [pre-hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md)
* [post-hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md)
* [on-run-start](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md)
* [on-run-end](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md)

Can I store my snapshots in a directory other than the `snapshots` directory in my project?

By default, dbt expects your snapshot files to be located in the `snapshots` subdirectory of your project. To change this, update the [snapshot-paths](https://docs.getdbt.com/reference/project-configs/snapshot-paths.md) configuration in your `dbt_project.yml` file, like so:

dbt_project.yml

```yml
snapshot-paths: ["snapshots"]
```

Note that you cannot co-locate snapshots and models in the same directory.

Debug "Snapshot target is not a snapshot table" errors

If you see the following error when you try executing the snapshot command:

> Snapshot target is not a snapshot table (missing `dbt_scd_id`, `dbt_valid_from`, `dbt_valid_to`)

Double check that you haven't inadvertently caused your snapshot to behave like a table materialization by setting its `materialized` config to `table`. Prior to dbt version 1.4, it was possible to have a snapshot like this:

```sql
{% snapshot snappy %}
    {{ config(materialized = 'table', ...) }}
    ...
{% endsnapshot %}
```

In that case, dbt was **silently** treating the snapshot like a table (issuing `create or replace table ...` statements) instead of actually snapshotting data (SCD2 via `insert` / `merge` statements). In dbt versions 1.4 and higher, dbt instead raises a Parsing Error that reads:

```text
A snapshot must have a materialized value of 'snapshot'
```

This tells you to change your `materialized` config to `snapshot`.
But when you make that change, you might encounter an error message saying that certain fields like `dbt_scd_id` are missing. This error happens because, previously, when dbt treated snapshots as tables, it didn't include the necessary [snapshot meta-fields](https://docs.getdbt.com/docs/build/snapshots.md#snapshot-meta-fields) in your target table. Since those meta-fields don't exist, dbt correctly identifies that you're trying to create a snapshot in a table that isn't actually a snapshot.

When this happens, you have to start from scratch: drop your "snapshot" (which isn't a real snapshot table) and re-snapshot your source data as if it were the first time. `dbt snapshot` will then create a new snapshot and insert the snapshot meta-fields as expected.

---

### Add sources to your DAG

#### Related reference docs[​](#related-reference-docs "Direct link to Related reference docs")

* [Source properties](https://docs.getdbt.com/reference/source-properties.md)
* [Source configurations](https://docs.getdbt.com/reference/source-configs.md)
* [`{{ source() }}` Jinja function](https://docs.getdbt.com/reference/dbt-jinja-functions/source.md)
* [`source freshness` command](https://docs.getdbt.com/reference/commands/source.md)

#### Using sources[​](#using-sources "Direct link to Using sources")

Sources make it possible to name and describe the data loaded into your warehouse by your Extract and Load tools.
By declaring these tables as sources in dbt, you can then:

* select from source tables in your models using the [`{{ source() }}` function](https://docs.getdbt.com/reference/dbt-jinja-functions/source.md), helping define the lineage of your data
* test your assumptions about your source data
* calculate the freshness of your source data

##### Declaring a source[​](#declaring-a-source "Direct link to Declaring a source")

Sources are defined in `.yml` files nested under a `sources:` key.

models/<filename>.yml

```yaml
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    tables:
      - name: orders
      - name: customers

  - name: stripe
    tables:
      - name: payments
```

By default, `schema` will be the same as `name`. Add `schema` only if you want to use a source name that differs from the existing schema.

If you're not already familiar with these files, be sure to check out [the documentation on properties.yml files](https://docs.getdbt.com/reference/configs-and-properties.md) before proceeding.

##### Selecting from a source[​](#selecting-from-a-source "Direct link to Selecting from a source")

Once a source has been defined, it can be referenced from a model using the [`{{ source() }}` function](https://docs.getdbt.com/reference/dbt-jinja-functions/source.md).

models/orders.sql

```sql
select
  ...
from {{ source('jaffle_shop', 'orders') }}
left join {{ source('jaffle_shop', 'customers') }} using (customer_id)
```

dbt will compile this to the full table name:

target/compiled/jaffle_shop/models/my_model.sql

```sql
select
  ...
from raw.jaffle_shop.orders
left join raw.jaffle_shop.customers using (customer_id)
```

Using the `{{ source() }}` function also creates a dependency between the model and the source table.
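The name resolution described above (the compiled relation is `database.schema.identifier`, with `schema` defaulting to the source `name` and, as covered in the FAQs below, `identifier` defaulting to the table `name`) can be sketched in a few lines. The `resolve_source` helper and dict layout here are purely illustrative, not dbt internals:

```python
# Illustrative sketch of how a source reference resolves to a relation name.
# The resolve_source() helper and dict layout are hypothetical, not dbt internals.

def resolve_source(sources, source_name, table_name):
    src = next(s for s in sources if s["name"] == source_name)
    database = src["database"]
    # schema defaults to the source name when not set explicitly
    schema = src.get("schema", src["name"])
    tbl = next(t for t in src["tables"] if t["name"] == table_name)
    # identifier defaults to the table name when not set explicitly
    identifier = tbl.get("identifier", tbl["name"])
    return f"{database}.{schema}.{identifier}"

sources = [
    {"name": "jaffle_shop", "database": "raw", "schema": "jaffle_shop",
     "tables": [{"name": "orders"}, {"name": "customers"}]},
    {"name": "stripe", "database": "raw",  # no schema: defaults to "stripe"
     "tables": [{"name": "payments"}]},
]

print(resolve_source(sources, "jaffle_shop", "orders"))  # raw.jaffle_shop.orders
print(resolve_source(sources, "stripe", "payments"))     # raw.stripe.payments
```

This also makes clear why the `schema` and `identifier` overrides in the FAQs below work: they simply replace the defaults in the compiled name.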
[![The source function tells dbt a model is dependent on a source](/img/docs/building-a-dbt-project/sources-dag.png?v=2 "The source function tells dbt a model is dependent on a source")](#)The source function tells dbt a model is dependent on a source

##### Testing and documenting sources[​](#testing-and-documenting-sources "Direct link to Testing and documenting sources")

You can also:

* Add data tests to sources
* Add descriptions to sources that get rendered as part of your documentation site

These should be familiar concepts if you've already added data tests and descriptions to your models (if not, check out the guides on [testing](https://docs.getdbt.com/docs/build/data-tests.md) and [documentation](https://docs.getdbt.com/docs/build/documentation.md)).

models/<filename>.yml

```yaml
sources:
  - name: jaffle_shop
    description: This is a replica of the Postgres database used by our app
    database: raw
    tables:
      - name: orders
        description: >
          One record per order. Includes cancelled and deleted orders.
        columns:
          - name: id
            description: Primary key of the orders table
            data_tests:
              - unique
              - not_null
          - name: status
            description: Note that the status can change over time

      - name: ...

      - name: ...
```

You can find more details on the available properties for sources in the [reference section](https://docs.getdbt.com/reference/source-properties.md).

##### FAQs[​](#faqs "Direct link to FAQs")

What if my source is in a poorly named schema or table?

By default, dbt will use the `name:` parameters to construct the source reference. If these names are a little less-than-perfect, use the [schema](https://docs.getdbt.com/reference/resource-properties/schema.md) and [identifier](https://docs.getdbt.com/reference/resource-properties/identifier.md) properties to define the names as per the database, and use your `name:` property for the name that makes sense!
models/<filename>.yml

```yml
sources:
  - name: jaffle_shop
    database: raw
    schema: postgres_backend_public_schema
    tables:
      - name: orders
        identifier: api_orders
```

In a downstream model:

```sql
select * from {{ source('jaffle_shop', 'orders') }}
```

Will get compiled to:

```sql
select * from raw.postgres_backend_public_schema.api_orders
```

What if my source is in a different database to my target database?

Use the [`database` property](https://docs.getdbt.com/reference/resource-properties/database.md) to define the database that the source is in.

models/<filename>.yml

```yml
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    tables:
      - name: orders
      - name: customers
```

I need to use quotes to select from my source, what should I do?

This is reasonably common on Snowflake in particular. By default, dbt will not quote the database, schema, or identifier for the source tables that you've specified. To force dbt to quote one of these values, use the [`quoting` property](https://docs.getdbt.com/reference/resource-properties/quoting.md):

models/<filename>.yml

```yaml
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    quoting:
      database: true
      schema: true
      identifier: true
    tables:
      - name: order_items
      - name: orders
        # This overrides the `jaffle_shop` quoting config
        quoting:
          identifier: false
```

How do I run data tests on just my sources?

To run data tests on all sources, use the following command:

```shell
$ dbt test --select "source:*"
```

(You can also use the `-s` shorthand here instead of `--select`.)

To run data tests on one source (and all of its tables):

```shell
$ dbt test --select source:jaffle_shop
```

And, to run data tests on one source table only:

```shell
$ dbt test --select source:jaffle_shop.orders
```

How do I run models downstream of one source?
To run models downstream of a source, use the `source:` selector:

```shell
$ dbt run --select source:jaffle_shop+
```

(You can also use the `-s` shorthand here instead of `--select`.)

To run models downstream of one source table:

```shell
$ dbt run --select source:jaffle_shop.orders+
```

Check out the [model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md) for more examples!

#### Source data freshness[​](#source-data-freshness "Direct link to Source data freshness")

With a couple of extra configs, dbt can optionally capture the "freshness" of the data in your source tables. This is useful for understanding if your data pipelines are in a healthy state, and is a critical component of defining Service Level Agreements (SLAs) for your warehouse.

##### Fusion and state-aware orchestration[​](#fusion-and-state-aware-orchestration "Direct link to Fusion and state-aware orchestration")

If you're using the dbt Fusion engine with [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md), dbt automatically tracks source freshness using warehouse metadata. You don't need to configure `warn_after` or `error_after` for dbt to detect when source data changes. However, you should still configure source freshness if you want to:

* Receive SLA alerts when sources don't update within expected timeframes.
* Define custom freshness logic using the [advanced configurations](https://docs.getdbt.com/docs/deploy/state-aware-setup.md#advanced-configurations) `loaded_at_field` or `loaded_at_query` (for example, for streaming data or partial loads).
* Track freshness for source views. Fusion treats views as "always fresh" since it can't determine freshness from view metadata.
##### Declaring source freshness[​](#declaring-source-freshness "Direct link to Declaring source freshness")

To configure source freshness information, add a `freshness` block to your source and a `loaded_at_field` to your table declaration:

models/<filename>.yml

```yaml
sources:
  - name: jaffle_shop
    database: raw
    config:
      freshness: # default freshness (changed to config in v1.9)
        warn_after: {count: 12, period: hour}
        error_after: {count: 24, period: hour}
      loaded_at_field: _etl_loaded_at # changed to config in v1.10

    tables:
      - name: orders
        config:
          freshness: # make this a little more strict
            warn_after: {count: 6, period: hour}
            error_after: {count: 12, period: hour}

      - name: customers # inherits the default freshness defined in the jaffle_shop source block

      - name: product_skus
        config:
          freshness: null # do not check freshness for this table
```

In the `freshness` block, one or both of `warn_after` and `error_after` can be provided. If neither is provided, dbt will not calculate freshness for the tables in this source. Additionally, the `loaded_at_field` is required to calculate freshness for a table (except for cases where dbt can leverage warehouse metadata to calculate freshness). If a `loaded_at_field`, or a viable alternative, is not provided, dbt will not calculate freshness for the table.

These configs are applied hierarchically, so `freshness` and `loaded_at_field` values specified for a `source` will flow through to all of the `tables` defined in that source. This is useful when all of the tables in a source have the same `loaded_at_field`, as the config can be specified once in the top-level source definition.
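The hierarchical behavior described above (table-level config overrides the source-level default, and an explicit `freshness: null` opts a table out) can be sketched as a simple merge. The `effective_freshness` helper below is a hypothetical illustration of the inheritance rule, not dbt's implementation:

```python
def effective_freshness(source_config, table_config):
    """Table-level config wins; an explicit freshness of None opts the table out."""
    merged = dict(source_config)
    merged.update(table_config)
    return merged.get("freshness")

# Source-level defaults, mirroring the YAML example above
source_cfg = {
    "freshness": {"warn_after": {"count": 12, "period": "hour"},
                  "error_after": {"count": 24, "period": "hour"}},
    "loaded_at_field": "_etl_loaded_at",
}

orders_cfg = {"freshness": {"warn_after": {"count": 6, "period": "hour"},
                            "error_after": {"count": 12, "period": "hour"}}}
customers_cfg = {}                      # inherits the source-level default
product_skus_cfg = {"freshness": None}  # opts out of freshness checks

print(effective_freshness(source_cfg, orders_cfg)["warn_after"]["count"])    # 6
print(effective_freshness(source_cfg, customers_cfg)["warn_after"]["count"]) # 12
print(effective_freshness(source_cfg, product_skus_cfg))                     # None
```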
##### Checking source freshness[​](#checking-source-freshness "Direct link to Checking source freshness")

To obtain freshness information for your sources, use the `dbt source freshness` command ([reference docs](https://docs.getdbt.com/reference/commands/source.md)):

```shell
$ dbt source freshness
```

Behind the scenes, dbt uses the freshness properties to construct a `select` query, shown below. You can find this query in the [query logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md).

```sql
select
  max(_etl_loaded_at) as max_loaded_at,
  convert_timezone('UTC', current_timestamp()) as calculated_at
from raw.jaffle_shop.orders
```

The results of this query are used to determine whether the source is fresh or not:

[![Uh oh! Not everything is as fresh as we'd like!](/img/docs/building-a-dbt-project/snapshot-freshness.png?v=2 "Uh oh! Not everything is as fresh as we'd like!")](#)Uh oh! Not everything is as fresh as we'd like!

##### Build models based on source freshness[​](#build-models-based-on-source-freshness "Direct link to Build models based on source freshness")

Our best practice recommendation is to use [data source freshness](https://docs.getdbt.com/docs/build/sources.md#declaring-source-freshness). This lets you define freshness settings in a `.yml` file, with source freshness configured at the [model level](https://docs.getdbt.com/reference/resource-properties/freshness.md).

To build models based on source freshness in dbt:

1. Run `dbt source freshness` to check the freshness of your sources.
2. Use the `dbt build --select source_status:fresher+` command to build and test models downstream of fresher sources.

Using these commands in order makes sure models update with the latest data. This eliminates wasted compute cycles on unchanged data and builds models *only* when necessary.
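As a rough illustration of how the `max_loaded_at` / `calculated_at` pair from the freshness query maps to a pass/warn/error status against the `warn_after` and `error_after` thresholds, here is a minimal sketch. The `freshness_status` helper is hypothetical, not dbt's implementation:

```python
from datetime import datetime

# Seconds per period unit (illustrative; dbt also accepts other periods)
PERIODS = {"minute": 60, "hour": 3600, "day": 86400}

def freshness_status(max_loaded_at, calculated_at, warn_after=None, error_after=None):
    """Return 'pass', 'warn', or 'error' for the results of a freshness query."""
    age_s = (calculated_at - max_loaded_at).total_seconds()

    def threshold(cfg):
        return cfg["count"] * PERIODS[cfg["period"]]

    if error_after and age_s > threshold(error_after):
        return "error"
    if warn_after and age_s > threshold(warn_after):
        return "warn"
    return "pass"

now = datetime(2024, 1, 2, 0, 0)
status = freshness_status(
    max_loaded_at=datetime(2024, 1, 1, 4, 0),  # loaded 20 hours ago
    calculated_at=now,
    warn_after={"count": 12, "period": "hour"},
    error_after={"count": 24, "period": "hour"},
)
print(status)  # warn: 20h exceeds the 12h warn threshold but not the 24h error one
```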
Set [source freshness snapshots](https://docs.getdbt.com/docs/deploy/source-freshness.md#enabling-source-freshness-snapshots) to check source freshness every 30 minutes, then run a job every hour to rebuild models. This setup rebuilds, in one attempt, all models whose source freshness has expired. For more information, refer to [Source freshness snapshot frequency](https://docs.getdbt.com/docs/deploy/source-freshness.md#source-freshness-snapshot-frequency).

##### Filter[​](#filter "Direct link to Filter")

Some databases have tables where a filter over certain columns is required to prevent a costly full scan. To run a freshness check on such tables, add a `filter` argument to the configuration, for example, `filter: _etl_loaded_at >= date_sub(current_date(), interval 1 day)`. For the example above, the resulting query would look like:

```sql
select
  max(_etl_loaded_at) as max_loaded_at,
  convert_timezone('UTC', current_timestamp()) as calculated_at
from raw.jaffle_shop.orders
where _etl_loaded_at >= date_sub(current_date(), interval 1 day)
```

##### FAQs[​](#faqs-1 "Direct link to FAQs")

How do I exclude a table from a freshness snapshot?

Some tables in a data source may be updated infrequently. If you've set a `freshness` property at the source level, such a table is likely to fail checks. To work around this, set the table's freshness to null (`freshness: null`) to "unset" the freshness for that particular table:

models/<filename>.yml

```yaml
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    config:
      freshness:
        warn_after: {count: 12, period: hour}
        error_after: {count: 24, period: hour}
      loaded_at_field: _etl_loaded_at

    tables:
      - name: orders

      - name: product_skus
        config:
          freshness: null # do not check freshness for this table
```

How do I snapshot freshness for one source only?

Use the `--select` flag to snapshot freshness for specific sources.
For example:

```shell
# Snapshot freshness for all Jaffle Shop tables:
$ dbt source freshness --select source:jaffle_shop

# Snapshot freshness for a particular source table:
$ dbt source freshness --select source:jaffle_shop.orders

# Snapshot freshness for multiple particular source tables:
$ dbt source freshness --select source:jaffle_shop.orders source:jaffle_shop.customers
```

See the [`source freshness` command reference](https://docs.getdbt.com/reference/commands/source.md) for more information.

Are the results of freshness stored anywhere?

Yes! The `dbt source freshness` command will output a pass/warning/error status for each table selected in the freshness snapshot.

Additionally, dbt will write the freshness results to a file in the `target/` directory called `sources.json` by default. To override this destination, use the `-o` flag with the `dbt source freshness` command.

After enabling source freshness within a job, configure [Artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) in your **Project Details** page, which you can find by selecting your account name in the left side menu in dbt and clicking **Account settings**. You can see the current status for source freshness by clicking **View Sources** in the job page.
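If you want to post-process those stored freshness results programmatically, you can read `sources.json` from the `target/` directory. A minimal sketch follows; the sample payload is hand-written, and the `results` / `unique_id` / `status` field names are assumptions about the artifact's shape, so verify them against your dbt version's artifact schema:

```python
import json

# Hand-written sample standing in for target/sources.json.
# The "results", "unique_id", and "status" fields are assumed; check your
# dbt version's artifact schema before relying on them.
sample = """
{"results": [
  {"unique_id": "source.jaffle_shop.jaffle_shop.orders", "status": "pass"},
  {"unique_id": "source.jaffle_shop.jaffle_shop.customers", "status": "warn"}
]}
"""

artifact = json.loads(sample)
stale = [r["unique_id"] for r in artifact["results"] if r["status"] != "pass"]
print(stale)  # the customers source missed its warn_after threshold
```

A script like this could feed an alerting system, flagging sources that missed their SLA.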
---

### Administer the Semantic Layer

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

With the dbt Semantic Layer, you can centrally define business metrics, reduce code duplication and inconsistency, create self-service in downstream tools, and more. This topic shows you how to set up credentials and tokens so that other tools can query the Semantic Layer.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* Have a dbt Starter, Enterprise, or Enterprise+ account. Available on all [tenant configurations](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md).
* Ensure your production and development environments are on a [supported dbt version](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md).
* Use Snowflake, BigQuery, Databricks, Redshift, Postgres, or Trino.
* Create a successful run in the environment where you configure the Semantic Layer.
  * **Note:** The Semantic Layer supports querying in deployment environments; development querying is coming soon.
* Understand [MetricFlow's](https://docs.getdbt.com/docs/build/about-metricflow.md) key concepts powering the Semantic Layer.
* Note that the Semantic Layer doesn't support using [Single sign-on (SSO)](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md) for [production credentials](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md#permissions-for-service-account-tokens), though SSO is supported for development user accounts.

📹 Learn about the dbt Semantic Layer with on-demand video courses!

Explore our [dbt Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer) to learn how to define and query metrics in your dbt project.
Additionally, dive into mini-courses for querying the dbt Semantic Layer in your favorite tools: [Tableau](https://courses.getdbt.com/courses/tableau-querying-the-semantic-layer), [Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel), [Hex](https://courses.getdbt.com/courses/hex-querying-the-semantic-layer), and [Mode](https://courses.getdbt.com/courses/mode-querying-the-semantic-layer).

#### Administer the Semantic Layer[​](#administer-the-semantic-layer "Direct link to Administer the Semantic Layer")

You must be part of the Owner group and have the correct [license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) and [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to administer the Semantic Layer at the environment and project level:

* Enterprise+ and Enterprise plans:
  * Developer license with Account Admin permissions, or
  * Owner with a Developer license, assigned Project Creator, Database Admin, or Admin permissions.
* Starter plan: Owner with a Developer license.
* Free trial: You are on a free trial of the Starter plan as an Owner, which means you have access to the dbt Semantic Layer.

##### 1. Select environment[​](#1-select-environment "Direct link to 1. Select environment")

Select the environment where you want to enable the Semantic Layer:

1. Navigate to **Account settings** in the navigation menu.
2. Under **Settings**, click **Projects** and select the specific project you want to enable the Semantic Layer for.
3. In the **Project details** page, navigate to the **Semantic Layer** section. Select **Configure Semantic Layer**.

   [![Semantic Layer section in the 'Project details' page](/img/docs/dbt-cloud/semantic-layer/new-sl-configure.png?v=2 "Semantic Layer section in the 'Project details' page")](#)Semantic Layer section in the 'Project details' page

4.
In the **Set Up Semantic Layer Configuration** page, select the deployment environment you want for the Semantic Layer and click **Save**. This gives administrators the flexibility to choose the environment where the Semantic Layer will be enabled.

[![Select the deployment environment to run your Semantic Layer against.](/img/docs/dbt-cloud/semantic-layer/sl-select-env.png?v=2 "Select the deployment environment to run your Semantic Layer against.")](#)Select the deployment environment to run your Semantic Layer against.

##### 2. Configure credentials and create tokens[​](#2-configure-credentials-and-create-tokens "Direct link to 2. Configure credentials and create tokens")

There are two options for setting up the Semantic Layer using API tokens:

* [Add a credential and create service tokens](#add-a-credential-and-create-service-tokens)
* [Configure development credentials and create personal tokens](#configure-development-credentials-and-create-a-personal-token)

###### Add a credential and create service tokens[​](#add-a-credential-and-create-service-tokens "Direct link to Add a credential and create service tokens")

The first option is to use [service tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) for authentication, which are tied to an underlying data platform credential that you configure. The configured credential is used to execute queries that the Semantic Layer issues against your data platform. This credential controls the physical access to the underlying data accessed by the Semantic Layer, and all access policies set in the data platform for this credential will be respected.
| Feature | Starter plan | Enterprise+ and Enterprise plans |
| ------- | ------------ | -------------------------------- |
| Service tokens | Can create multiple service tokens linked to one credential. | Can use multiple credentials and link multiple service tokens to each credential. Note that you cannot link a single service token to more than one credential. |
| Credentials per project | One credential per project. | Can [add multiple](#4-add-more-credentials) credentials per project. |
| Link multiple service tokens to a single credential | ✅ | ✅ |

*If you're on a Starter plan and need to add more credentials, consider upgrading to our [Enterprise+ or Enterprise plan](https://www.getdbt.com/contact). All Enterprise users can refer to [Add more credentials](#4-add-more-credentials) for detailed steps on adding multiple credentials.*

###### 1. Select deployment environment[​](#1--select-deployment-environment "Direct link to 1. Select deployment environment")

* After selecting the deployment environment, you should see the **Credentials & service tokens** page.
* Click the **Add Semantic Layer credential** button.

###### 2. Configure credential[​](#2-configure-credential "Direct link to 2. Configure credential")

* In the **1. Add credentials** section, enter the credentials specific to your data platform that you want the Semantic Layer to use.
* Use credentials with minimal privileges.
The Semantic Layer requires read access to the schema(s) containing the dbt models used in your semantic models for downstream applications.

* Use [Extended Attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) and [Environment Variables](https://docs.getdbt.com/docs/build/environment-variables.md) when connecting to the Semantic Layer. If you set a value directly in the Semantic Layer credentials, it takes priority over Extended Attributes. When using environment variables, the default value for the environment will be used. For example, set the warehouse by using `{{env_var('DBT_WAREHOUSE')}}` in your Semantic Layer credentials. Similarly, if you set the account value using `{{env_var('DBT_ACCOUNT')}}` in Extended Attributes, dbt will check both the Extended Attributes and the environment variable.

[![Add credentials and map them to a service token.](/img/docs/dbt-cloud/semantic-layer/sl-add-credential.png?v=2 "Add credentials and map them to a service token.")](#)Add credentials and map them to a service token.

###### 3. Create or link service tokens[​](#3-create-or-link-service-tokens "Direct link to 3. Create or link service tokens")

* If you have permission to create service tokens, you'll see the [**Map new service token** option](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#map-service-tokens-to-credentials) after adding the credential. Name the token, set permissions to 'Semantic Layer Only' and 'Metadata Only', and click **Save**.
* Once the token is generated, you won't be able to view it again, so make sure to record it somewhere safe.
* If you don't have access to create service tokens, you'll see a message prompting you to contact your admin to create one for you. Admins can create and link tokens as needed.
[![If you don’t have access to create service tokens, you can create a credential and contact your admin to create one for you.](/img/docs/dbt-cloud/semantic-layer/sl-credential-no-service-token.png?v=2 "If you don’t have access to create service tokens, you can create a credential and contact your admin to create one for you.")](#)If you don’t have access to create service tokens, you can create a credential and contact your admin to create one for you.

info

* Starter plans can create multiple service tokens that link to a single underlying credential, but each project can only have one credential.
* All Enterprise plans can [add multiple credentials](#4-add-more-credentials) and map those to service tokens for tailored access. [Book a free live demo](https://www.getdbt.com/contact) to discover the full potential of dbt Enterprise and higher plans.

###### Configure development credentials and create a personal token[​](#configure-development-credentials-and-create-a-personal-token "Direct link to Configure development credentials and create a personal token")

Using [personal access tokens (PATs)](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) is also a supported authentication method for the dbt Semantic Layer. This enables user-level authentication, reducing the need to share tokens between users. When you authenticate using PATs, queries run using your personal development credentials.

To use PATs in the Semantic Layer:

1. Configure your development credentials.
   1. Click your account name in the bottom left-hand menu and go to **Account settings** > **Credentials**.
   2. Select your project.
   3. Click **Edit**.
   4. Go to **Development credentials** and enter your details.
   5. Click **Save**.
2. [Create a personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md). Make sure to copy the token.
You can use the generated PAT as the authentication method for Semantic Layer [APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) and [integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md).

##### 3. View connection detail[​](#3-view-connection-detail "Direct link to 3. View connection detail")

1. Go back to the **Project details** page for the connection details needed to connect to downstream tools.
2. Copy and share the Environment ID, service or personal token, Host, as well as the service or personal token name with the relevant teams for BI connection setup. If your tool uses the GraphQL API, save the GraphQL API host information instead of the JDBC URL.

For info on how to connect to other integrations, refer to [Available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md).

[![After configuring, you'll be provided with the connection details to connect to your downstream tools.](/img/docs/dbt-cloud/semantic-layer/sl-configure-example.png?v=2 "After configuring, you'll be provided with the connection details to connect to your downstream tools.")](#)After configuring, you'll be provided with the connection details to connect to your downstream tools.

##### 4. Add more credentials [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#4-add-more-credentials- "Direct link to 4-add-more-credentials-")

All dbt Enterprise plans can optionally add multiple credentials and map them to service tokens, offering more granular control and tailored access for different teams, which can then be shared with the relevant teams for BI connection setup. These credentials control the physical access to the underlying data accessed by the Semantic Layer.

We recommend configuring credentials and service tokens to reflect your teams and their roles.
For example, create tokens or credentials that align with your team's needs, such as providing access to finance-related schemas to the Finance team.

Considerations for linking credentials:

* Admins can link multiple service tokens to a single credential within a project, but each service token can only be linked to one credential per project.
* When you send a request through the APIs, the service token of the linked credential will follow the access policies of the underlying views and tables used to build your Semantic Layer requests.
* Use [Extended Attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) and [Environment Variables](https://docs.getdbt.com/docs/build/environment-variables.md) when connecting to the Semantic Layer. If you set a value directly in the Semantic Layer credentials, it takes priority over Extended Attributes. When using environment variables, the default value for the environment will be used. For example, set the warehouse by using `{{env_var('DBT_WAREHOUSE')}}` in your Semantic Layer credentials. Similarly, if you set the account value using `{{env_var('DBT_ACCOUNT')}}` in Extended Attributes, dbt will check both the Extended Attributes and the environment variable.

###### 1. Add more credentials[​](#1-add-more-credentials "Direct link to 1. Add more credentials")

* After configuring your environment, on the **Credentials & service tokens** page, click the **Add Semantic Layer credential** button to create multiple credentials and map them to a service token.
* In the **1. Add credentials** section, fill in the data platform's credential fields. We recommend using "read-only" credentials.

[![Add credentials and map them to a service token.](/img/docs/dbt-cloud/semantic-layer/sl-add-credential.png?v=2 "Add credentials and map them to a service token.")](#)Add credentials and map them to a service token.

###### 2. Map service tokens to credentials

* In the **2. Map new service token** section, [map a service token to the credential](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#map-service-tokens-to-credentials) you configured in the previous step. dbt automatically selects the service token permission set you need (Semantic Layer Only and Metadata Only).
* To add another service token during configuration, click **Add Service Token**.
* You can link more service tokens to the same credential later on the **Semantic Layer Configuration Details** page. To add another service token to an existing Semantic Layer configuration, click **Add service token** under the **Linked service tokens** section.
* Click **Save** to link the service token to the credential. Remember to copy and save the service token securely, as it won't be viewable again after generation.

[![Use the configuration page to manage multiple credentials or link or unlink service tokens for more granular control.](/img/docs/dbt-cloud/semantic-layer/sl-credentials-service-token.png?v=2 "Use the configuration page to manage multiple credentials or link or unlink service tokens for more granular control.")](#)Use the configuration page to manage multiple credentials or link or unlink service tokens for more granular control.

###### 3. Delete credentials

* To delete a credential, go back to the **Credentials & service tokens** page.
* Under **Linked Service Tokens**, click **Edit**, then select **Delete Credential** to remove a credential. When you delete a credential, any service tokens mapped to it in the project will no longer work for any end users.

##### Delete configuration

You can delete the entire Semantic Layer configuration for a project. Note that deleting the Semantic Layer configuration removes all credentials and unlinks all service tokens from the project. It also causes all queries to the Semantic Layer to fail.

Follow these steps to delete the Semantic Layer configuration for a project:

1. Navigate to the **Project details** page.
2. In the **Semantic Layer** section, select **Delete Semantic Layer**.
3. Confirm the deletion by clicking **Yes, delete semantic layer** in the confirmation pop-up.

To re-enable the dbt Semantic Layer setup in the future, you will need to recreate your setup configurations by following the [previous steps](#set-up-dbt-semantic-layer). If your semantic models and metrics are still in your project, no changes are needed. If you've removed them, you'll need to set up the YAML configs again.

[![Delete the Semantic Layer configuration for a project.](/img/docs/dbt-cloud/semantic-layer/sl-delete-config.png?v=2 "Delete the Semantic Layer configuration for a project.")](#)Delete the Semantic Layer configuration for a project.

#### Additional configuration

The following are additional flexible configurations for Semantic Layer credentials.

##### Map service tokens to credentials

* After configuring your environment, you can map additional service tokens to the same credential if you have the required [permissions](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#permission-sets).
* Go to the **Credentials & service tokens** page and click the **+Add Service Token** button in the **Linked Service Tokens** section.
* Type the service token name and select the permission set you need (Semantic Layer Only and Metadata Only).
* Click **Save** to link the service token to the credential.
* Remember to copy and save the service token securely, as it won't be viewable again after generation.

[![Map additional service tokens to a credential.](/img/docs/dbt-cloud/semantic-layer/sl-add-service-token.gif?v=2 "Map additional service tokens to a credential.")](#)Map additional service tokens to a credential.

##### Unlink service tokens

* Unlink a service token from the credential by clicking **Unlink** under the **Linked service tokens** section. If you try to query the Semantic Layer with an unlinked credential, you'll get an error in your BI tool because no valid token is mapped.

##### Manage from service token page

**View credential from service token**

* View your Semantic Layer credential directly by navigating to **API tokens** and then the **Service tokens** page.
* Select the service token to view the credential it's linked to. This is useful if you want to know which service tokens are mapped to credentials in your project.

###### Create a new service token

* From the **Service tokens** page, create a new service token and map it to the credential(s), assuming the Semantic Layer permission exists. This is useful if you want to create a new service token and directly map it to a credential in your project.
* Make sure to select the correct permission set for the service token (Semantic Layer Only and Metadata Only).
[![Create a new service token and map credentials directly on the separate 'Service tokens' page.](/img/docs/dbt-cloud/semantic-layer/sl-create-service-token-page.png?v=2 "Create a new service token and map credentials directly on the separate 'Service tokens' page.")](#)Create a new service token and map credentials directly on the separate 'Service tokens' page.

#### Next steps

* Now that you've set up your credentials and tokens, start querying your metrics with the [available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md).
* [Optimize querying performance](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md) using declarative caching.
* [Validate semantic nodes in CI](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci) to ensure code changes made to dbt models don't break these metrics.
* If you haven't already, learn how to [build your metrics and semantic models](https://docs.getdbt.com/docs/build/build-metrics-intro.md) in your development tool of choice.
* Learn about commonly asked [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md).

#### FAQs

**How does caching interact with access controls?**

Cached data is stored separately from the underlying models. If metrics are pulled from the cache, the security context applied to those tables isn't enforced at query time. In the future, we plan to clone credentials, identify the minimum access level needed, and apply those permissions to cached tables.
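The Extended Attributes pattern noted earlier under the credential considerations can be sketched as a short environment override. This is a minimal, hypothetical example (the keys shown assume a warehouse whose connection uses `warehouse` and `account` fields, and the environment-variable names are placeholders):

```yaml
# Hypothetical Extended Attributes for a deployment environment.
# Values resolve from environment variables at run time; a value set
# directly in the Semantic Layer credential takes priority over these.
warehouse: "{{ env_var('DBT_WAREHOUSE') }}"
account: "{{ env_var('DBT_ACCOUNT') }}"
```

With this in place, the same project configuration can point at different warehouses per environment without editing the Semantic Layer credential itself.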
---

### Advanced CI

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

[Continuous integration workflows](https://docs.getdbt.com/docs/deploy/continuous-integration.md) help increase governance and improve the quality of your data. For these CI jobs, you can additionally use Advanced CI features, such as [compare changes](#compare-changes), which provide details about the differences between what's currently in your production environment and the pull request's latest commit, giving you observability into how your code changes affect the data. By analyzing the data changes that code changes produce, you can ensure you're always shipping trustworthy data products as you develop.

**How to enable this feature**

You can opt into Advanced CI in dbt. Refer to [Account access to Advanced CI features](https://docs.getdbt.com/docs/cloud/account-settings.md#account-access-to-advanced-ci-features) to learn how to enable it in your dbt account.

#### Prerequisites

* You have a dbt Enterprise or Enterprise+ account.
* You have [Advanced CI features](https://docs.getdbt.com/docs/cloud/account-settings.md#account-access-to-advanced-features) enabled.
* You use a supported data platform: BigQuery, Databricks, Postgres, Redshift, or Snowflake. Support for additional data platforms is coming soon.
#### Compare changes feature

For [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) that have the [**dbt compare** option enabled](https://docs.getdbt.com/docs/deploy/ci-jobs.md#set-up-ci-jobs), dbt compares the changes between the last applied state of the production environment (defaulting to deferral for lower compute costs) and the latest changes from the pull request, whenever a pull request is opened or new commits are pushed. dbt reports the comparison differences in:

* **dbt** — Shows the changes (if any) to the data's primary keys, rows, and columns in the [Compare tab](https://docs.getdbt.com/docs/deploy/run-visibility.md#compare-tab) from the [Job run details](https://docs.getdbt.com/docs/deploy/run-visibility.md#job-run-details) page.
* **The pull request from your Git provider** — Shows a summary of the changes as a Git comment.

[![Example of the Compare tab](/img/docs/dbt-cloud/example-ci-compare-changes-tab.png?v=2 "Example of the Compare tab")](#)Example of the Compare tab

##### Optimizing comparisons

When an [`event_time`](https://docs.getdbt.com/reference/resource-configs/event-time.md) column is specified on your model, compare changes can optimize comparisons by using only the overlapping timeframe (the timeframe that exists in both the CI and production environments), helping you avoid incorrect row-count changes and return results faster. This is useful in scenarios like:

* **Subset of data in CI** — When CI builds only a [subset of data](https://docs.getdbt.com/best-practices/best-practice-workflows.md#limit-the-data-processed-when-in-development) (like the most recent 7 days), compare changes would interpret the excluded data as "deleted rows."
Configuring `event_time` allows you to avoid this issue by limiting comparisons to the overlapping timeframe, preventing false alerts about data deletions that are just filtered out in CI.

* **Fresher data in CI than in production** — When your CI job includes fresher data than production (because it has run more recently), compare changes would flag the additional rows as "new" data, even though they're just fresher data in CI. With `event_time` configured, the comparison only includes the shared timeframe and correctly reflects actual changes in the data.

[![event\_time ensures the same time-slice of data is accurately compared between your CI and production environments.](/img/docs/deploy/apples_to_apples.png?v=2 "event_time ensures the same time-slice of data is accurately compared between your CI and production environments.")](#)event\_time ensures the same time-slice of data is accurately compared between your CI and production environments.

#### About the cached data

After [comparing changes](#compare-changes), dbt stores a cache of no more than 100 records for each modified model for preview purposes. By caching this data, you can view examples of changed data without rerunning the comparison against the data warehouse every time (optimizing for lower compute costs).

To display the changes, dbt uses a cached version of a sample of the data records. These data records are queried from the database using the connection configuration (such as user, role, service account, and so on) that's set in the CI job's environment.

You control what data to use. This may include synthetic data if pre-production or development data is heavily regulated or sensitive.

* The selected data is cached on dbt Labs' systems for up to 30 days. No data is retained on dbt Labs' systems beyond this period.
* The cache is encrypted and stored in Amazon S3 or Azure Blob Storage in your account's region.
* dbt Labs will not access cached data from Advanced CI for its own benefit; the data is only used to provide services as directed by you.
* Third-party subcontractors, other than storage subcontractors, will not have access to the cached data.

If you access a CI job run that's more than 30 days old, you will not be able to see the comparison results. Instead, a message will appear indicating that the data has expired.

[![Example of message about expired data in the Compare tab](/img/docs/deploy/compare-expired.png?v=2 "Example of message about expired data in the Compare tab")](#)Example of message about expired data in the Compare tab

#### Connection permissions

The compare changes feature uses the same credentials as the CI job, as defined in the CI job's environment. The dbt administrator must ensure that CI credentials are appropriately restricted, since all of the account's users will be able to view the comparison results and the cached data.

If you use dynamic data masking in the data warehouse, the cached data will no longer be dynamically masked in the Advanced CI output, depending on the permissions of the users who view it. dbt Labs recommends limiting user access to unmasked data or considering synthetic data for the Advanced CI testing functionality.

[![Example of credentials in the user settings](/img/docs/deploy/compare-credentials.png?v=2 "Example of credentials in the user settings")](#)Example of credentials in the user settings

#### Troubleshooting

**Compare changes CI models need to be on the same database host/connection**

Compare changes only works if both the CI and production models live on the same database host/connection. Compare changes runs SQL queries in the current CI job's environment to compare the CI model (like `ci.dbt_cloud_123.foo`) to the production model (`prod.analytics.foo`).
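The same-host requirement exists because the comparison itself is SQL issued across both environments' relations over a single connection. As a purely illustrative sketch (the relation names reuse the examples above; the actual queries dbt issues are not shown here), a row-level comparison might look like:

```sql
-- Illustrative sketch only: rows present in production but missing from the CI build.
-- Both relations must be resolvable over the CI job's single connection,
-- which is why a production model on a different host breaks the comparison.
select * from prod.analytics.foo
except
select * from ci.dbt_cloud_123.foo;
```

If `prod.analytics.foo` lived on another host, this query could not run from the CI environment at all.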
If the CI job defers to a production job that's on a different database connection or host, then the compare changes feature will not work as expected. This is because the CI environment can't access or query production objects on another host. In the following example, the CI job can't access the production model to compare them because they're on different database hosts:

* The dbt CI job in environment `ci.dbt_cloud_123.foo` connects to host `abc123.rds.amazonaws.com`
* The dbt production job in environment `prod.analytics.foo` connects to host `def456.rds.amazonaws.com`

---

### Advanced data modeling

This section covers advanced topics for the Semantic Layer and MetricFlow, such as data modeling workflows.

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md)

###### [Fill null values for metrics](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md)

[Use fill\_nulls\_with to set null metric values to zero, ensuring numeric values for every data row.](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/ref-metrics-in-filters.md)

###### [Metrics as dimensions with metric filters](https://docs.getdbt.com/docs/build/ref-metrics-in-filters.md)

[Add metrics as dimensions to your metric filters to create more complex metrics and gain more insights.](https://docs.getdbt.com/docs/build/ref-metrics-in-filters.md)
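As a minimal sketch of the `fill_nulls_with` behavior described in the first tile (the metric and measure names below are hypothetical placeholders), a metric definition can return zero instead of null for rows with no matching data:

```yaml
# Hypothetical metric definition; `order_count` and the `orders` measure
# are placeholder names for illustration only.
metrics:
  - name: order_count
    label: Order count
    type: simple
    type_params:
      measure:
        name: orders
        fill_nulls_with: 0
```

This keeps every row of query output numeric, which downstream tools often require for charting and arithmetic.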
---

### Analyses

#### Overview

dbt's notion of `models` makes it easy for data teams to version control and collaborate on data transformations. Sometimes, though, a certain SQL statement doesn't quite fit into the mold of a dbt model. These more "analytical" SQL files can be versioned inside of your dbt project using the `analysis` functionality of dbt.

Any `.sql` files found in the `analyses/` directory of a dbt project will be compiled, but not executed. This means that analysts can use dbt functionality like `{{ ref(...) }}` to select from models in an environment-agnostic way.

In practice, an analysis file might look like this (via the [open source Quickbooks models](https://github.com/dbt-labs/quickbooks)):

```sql
-- analyses/running_total_by_account.sql

with journal_entries as (

    select * from {{ ref('quickbooks_adjusted_journal_entries') }}

), accounts as (

    select * from {{ ref('quickbooks_accounts_transformed') }}

)

select
    txn_date,
    account_id,
    adjusted_amount,
    description,
    account_name,
    sum(adjusted_amount) over (partition by account_id order by id rows unbounded preceding)
from journal_entries
order by account_id, id
```

To compile this analysis into runnable SQL, run:

```text
dbt compile
```

Then, look for the compiled SQL file in `target/compiled/{project name}/analyses/running_total_by_account.sql`. This SQL can then be pasted into a data visualization tool, for instance. Note that no `running_total_by_account` relation will be materialized in the database, as this is an `analysis`, not a `model`.
---

### Analyst agent

[Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

The Analyst agent lets you chat with your data and get accurate answers powered by the [dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md). Unlike generic AI chat interfaces, the Analyst agent provides consistent, explainable results with transparent SQL, lineage, and data policies.

#### Prerequisites

* Enable beta features under **Account settings** > **Personal profile** > **Experimental features**. See [Preview new dbt platform features](https://docs.getdbt.com/docs/dbt-versions/experimental-features.md) for steps.
* Have access to [dbt Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md) and meet its prerequisites.
* Be on a dbt platform [Enterprise-tier](https://www.getdbt.com/pricing) plan — [book a demo](https://www.getdbt.com/contact) to learn more about Insights.
* The Analyst agent is available on all [tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md) configurations.
* Have a dbt [developer license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) with access to Insights.
* Have configured [developer credentials](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#get-started-with-the-cloud-ide).
#### Using the Analyst agent

Use dbt Copilot to analyze your data and get contextualized results in real time by asking natural language questions of the [Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md) Analyst agent.

1. Click the **Copilot** icon in the Query console sidebar menu.
2. In the dropdown menu above the Copilot prompt box, select **Agent**.
3. In the dbt Copilot prompt box, enter your question.
4. Click **↑** to submit your question. The agent translates your natural language question into structured queries, executes them against governed dbt models and metrics, and returns results with references, assumptions, and possible next steps. The agent can loop through these steps multiple times if it hasn't reached a complete answer, allowing for complex, multi-step analysis. dbt Insights automatically executes the SQL query suggested by the Analyst agent, and you can preview the SQL results in the **Data** tab.
5. Confirm the results or continue asking the agent for more insights about your data.

Your conversation with the agent remains even if you switch tabs within dbt Insights. However, it disappears when you navigate out of Insights or when you close your browser.

[![Using the Analyst agent in Insights](/img/docs/dbt-insights/insights-copilot-agent.png?v=2 "Using the Analyst agent in Insights")](#)Using the Analyst agent in Insights

---

### Apache Iceberg Support

Apache Iceberg is an open standard table format that brings greater portability and interoperability to the data ecosystem.
By standardizing how data is stored and accessed, Iceberg enables teams to work across different engines and platforms with confidence. Iceberg has many components, but the main ones dbt interacts with are:

* **Iceberg Table Format** - an open-source table format. Tables materialized in the Iceberg table format are stored on your infrastructure, such as an S3 bucket.
* **Iceberg Data Catalog** - an open-source metadata management system that tracks the schema, partitions, and versions of Iceberg tables.
* **Iceberg REST Protocol** (also referred to as the Iceberg REST API) - how engines can support and speak to other Iceberg-compatible catalogs.

dbt abstracts the complexity of table formats so teams can focus on delivering reliable, well-modeled data. Our initial integration with Iceberg supports table materializations and catalog integrations, allowing users to define and manage Iceberg tables directly in their dbt projects.

To learn more, click on one of the following tiles:

[![](/img/icons/dbt-icon.svg)](https://docs.getdbt.com/docs/mesh/iceberg/about-catalogs.md)

###### [Using dbt + Iceberg Catalog overview](https://docs.getdbt.com/docs/mesh/iceberg/about-catalogs.md)

[dbt support for Apache Iceberg](https://docs.getdbt.com/docs/mesh/iceberg/about-catalogs.md)

[![](/img/icons/snowflake.svg)](https://docs.getdbt.com/docs/mesh/iceberg/snowflake-iceberg-support.md)

###### [Snowflake](https://docs.getdbt.com/docs/mesh/iceberg/snowflake-iceberg-support.md)

[Snowflake Iceberg Configurations](https://docs.getdbt.com/docs/mesh/iceberg/snowflake-iceberg-support.md)

[![](/img/icons/bigquery.svg)](https://docs.getdbt.com/docs/mesh/iceberg/bigquery-iceberg-support.md)

###### [BigQuery](https://docs.getdbt.com/docs/mesh/iceberg/bigquery-iceberg-support.md)

[BigQuery Iceberg Configurations](https://docs.getdbt.com/docs/mesh/iceberg/bigquery-iceberg-support.md)

[![](/img/icons/databricks.svg)](https://docs.getdbt.com/docs/mesh/iceberg/databricks-iceberg-support.md)
###### [Databricks](https://docs.getdbt.com/docs/mesh/iceberg/databricks-iceberg-support.md)

[Databricks Iceberg Configurations](https://docs.getdbt.com/docs/mesh/iceberg/databricks-iceberg-support.md)

---

### Artifacts

When running dbt jobs, dbt generates and saves *artifacts*. You can use these artifacts, like `manifest.json`, `catalog.json`, and `sources.json`, to power different aspects of the dbt platform, namely: [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), and [source freshness reporting](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness).

#### Create dbt Artifacts

[Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata) uses the metadata provided by the [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) to display details about [the state of your project](https://docs.getdbt.com/docs/dbt-cloud-apis/project-state.md). It uses metadata from your staging and production [deployment environments](https://docs.getdbt.com/docs/deploy/deploy-environments.md). Catalog automatically retrieves metadata updates after each job run in the production or staging deployment environment, so it always has the latest results for your project. To view a resource, its metadata, and what commands are needed, refer to [generate metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata) for more details.
**For dbt Docs**

The following steps are for legacy dbt Docs only. For the current documentation experience, see [dbt Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md).

While running any job can produce artifacts, you should only associate one production job with a given project to produce the project's artifacts. You can designate this connection on the **Project details** page. To access this page:

1. From the dbt platform, click on your account name in the left side menu and select **Account settings**.
2. Select your project, and click **Edit** in the lower right.
3. Under **Artifacts**, select the jobs you want to produce documentation and source freshness artifacts for.

[![Configuring Artifacts](/img/docs/dbt-cloud/using-dbt-cloud/project-level-artifact-updated.png?v=2 "Configuring Artifacts")](#)Configuring Artifacts

If you don't see your job listed, you might need to edit the job and select **Run source freshness** and **Generate docs on run**.

[![Editing the job to generate artifacts](/img/docs/dbt-cloud/using-dbt-cloud/edit-job-generate-artifacts.png?v=2 "Editing the job to generate artifacts")](#)Editing the job to generate artifacts

When you add a production job to a project, dbt updates the content and provides links to the production documentation and source freshness artifacts it generated for that project. You can see these links by clicking **Deploy** in the upper left, selecting **Jobs**, and then selecting the production job. From the job page, you can select a specific run to see how artifacts were updated for that run only.

##### Documentation

Navigate to [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) through the **Explore** link to view your project's resources and lineage and gain a better understanding of its latest production state.
To view a resource, its metadata, and what commands are needed, refer to [generate metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata) for more details. Both the job's commands and the docs generate step (triggered by the **Generate docs on run** checkbox) must succeed during the job invocation to update the documentation.

**For dbt Docs**

When set up, dbt updates the Documentation link in the header tab so it links to the documentation for this job. This link always directs you to the latest version of the documentation for your project.

##### Source Freshness

To view the latest source freshness results, refer to [generate metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata) for more details. Then navigate to Catalog through the **Explore** link.

**For dbt Docs**

Configuring a job for the Source Freshness artifact setting also updates the data source link under **Orchestration** > **Data sources**. The link points to the latest Source Freshness report for the selected job.

---

### Available dbt versions

Whether you're using the CLI or working within the dbt platform, your environments are aligned with a versioned release of dbt.
[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/dbt-versions/core.md) ###### [About dbt Core versions](https://docs.getdbt.com/docs/dbt-versions/core.md) [Information about how dbt Core is versioned and how to target those versions.](https://docs.getdbt.com/docs/dbt-versions/core.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) ###### [About release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) [Learn about how versions of dbt align with the release tracks available on the dbt platform.](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) ###### [Upgrade versions in dbt platform](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) [Instructions for upgrading your dbt platform projects to the latest version of dbt, including the Fusion Engine.](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md) ###### [Product lifecycles](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md) [Learn about the dbt product lifecycles from beta through end of life.](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/dbt-versions/experimental-features.md) ###### [Preview new dbt platform features](https://docs.getdbt.com/docs/dbt-versions/experimental-features.md) [Learn how to enable self-service beta and preview features for your dbt platform account.](https://docs.getdbt.com/docs/dbt-versions/experimental-features.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/dbt-versions/core-upgrade.md) ###### [dbt version upgrade guides](https://docs.getdbt.com/docs/dbt-versions/core-upgrade.md) [All the information you need to prepare your projects 
for the next version of dbt, including Fusion. Includes guidance on new features, behavior changes, deprecations, and much more.](https://docs.getdbt.com/docs/dbt-versions/core-upgrade.md)
---

### Available integrations

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

There are a number of data applications that seamlessly integrate with the Semantic Layer, powered by MetricFlow, from business intelligence tools to notebooks, spreadsheets, data catalogs, and more. These integrations allow you to query and unlock valuable insights from your data ecosystem. Use the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) to simplify metric queries, optimize your development workflow, and reduce coding. This approach also ensures data governance and consistency for data consumers.
The following tools integrate with the dbt Semantic Layer: [![](/img/icons/pbi.svg)](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/power-bi.md) ###### [Power BI](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/power-bi.md) [Use reports to query the dbt Semantic Layer with Power BI and produce dashboards with trusted data.](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/power-bi.md) [![](/img/icons/tableau-software.svg)](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md) ###### [Tableau](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md) [Learn how to connect to Tableau for querying metrics and collaborating with your team.](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md) [![](/img/icons/google-sheets-logo-icon.svg)](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md) ###### [Google Sheets](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md) [Discover how to connect to Google Sheets for querying metrics and collaborating with your team.](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md) [![](/img/icons/excel.svg)](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md) ###### [Microsoft Excel](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md) [Connect to Microsoft Excel to query metrics and collaborate with your team. 
Available for Excel Desktop or Excel Online.](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md) [![](/img/icons/omni.svg)](https://docs.omni.co/integrations/dbt/semantic-layer) ###### [Omni](https://docs.omni.co/integrations/dbt/semantic-layer) [Connect Omni to the dbt Semantic Layer to query trusted metrics directly within your Omni data model.](https://docs.omni.co/integrations/dbt/semantic-layer) [![](/img/icons/dot-ai.svg)](https://docs.getdot.ai/dot/integrations/dbt-semantic-layer) ###### [Dot](https://docs.getdot.ai/dot/integrations/dbt-semantic-layer) [Enable everyone to analyze data with AI in Slack or Teams.](https://docs.getdot.ai/dot/integrations/dbt-semantic-layer) [![](/img/icons/hex.svg)](https://learn.hex.tech/docs/connect-to-data/data-connections/dbt-integration#dbt-semantic-layer-integration) ###### [Hex](https://learn.hex.tech/docs/connect-to-data/data-connections/dbt-integration#dbt-semantic-layer-integration) [Check out how to connect, analyze metrics, collaborate, and discover more data possibilities.](https://learn.hex.tech/docs/connect-to-data/data-connections/dbt-integration#dbt-semantic-layer-integration) [![](/img/icons/klipfolio.svg)](https://support.klipfolio.com/hc/en-us/articles/18164546900759-PowerMetrics-Adding-dbt-Semantic-Layer-metrics) ###### [Klipfolio PowerMetrics](https://support.klipfolio.com/hc/en-us/articles/18164546900759-PowerMetrics-Adding-dbt-Semantic-Layer-metrics) [Learn how to connect to a streamlined metrics catalog and deliver metric-centric analytics to business users.](https://support.klipfolio.com/hc/en-us/articles/18164546900759-PowerMetrics-Adding-dbt-Semantic-Layer-metrics) [![](/img/icons/mode.svg)](https://mode.com/help/articles/supported-databases#dbt-semantic-layer) ###### [Mode](https://mode.com/help/articles/supported-databases#dbt-semantic-layer) [Discover how to connect, access, and get trustworthy metrics and insights.](https://mode.com/help/articles/supported-databases#dbt-semantic-layer) [![](/img/icons/push.svg)](https://docs.push.ai/data-sources/semantic-layers/dbt) ###### [Push.ai](https://docs.push.ai/data-sources/semantic-layers/dbt) [Explore how to connect and use metrics to power reports and insights that drive change.](https://docs.push.ai/data-sources/semantic-layers/dbt) [![](/img/icons/sigma.svg)](https://help.sigmacomputing.com/docs/configure-a-dbt-semantic-layer-integration) ###### [Sigma (Preview)](https://help.sigmacomputing.com/docs/configure-a-dbt-semantic-layer-integration) [Connect Sigma to the dbt Semantic Layer to leverage your predefined dbt metrics in Sigma workbooks.](https://help.sigmacomputing.com/docs/configure-a-dbt-semantic-layer-integration) [![](/img/icons/steep.svg)](https://help.steep.app/integrations/dbt-cloud) ###### [Steep](https://help.steep.app/integrations/dbt-cloud) [Connect Steep to the dbt Semantic Layer for centralized, scalable analytics.](https://help.steep.app/integrations/dbt-cloud)
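Many of the integrations above, as well as custom tools, query metrics through the Semantic Layer's JDBC interface using the `semantic_layer.query` syntax. As a sketch, a query for a metric grouped by month might look like the following (the `order_total` metric name is illustrative; substitute metrics defined in your own project):

```sql
-- Query a metric grouped by month through the Semantic Layer JDBC interface.
-- 'order_total' is a hypothetical metric name used for illustration.
select *
from {{
  semantic_layer.query(
    metrics=['order_total'],
    group_by=[Dimension('metric_time').grain('month')]
  )
}}
```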
Before you connect to these tools, you'll first need to [set up the dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) and [generate a service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) with **Semantic Layer Only** and **Metadata Only** permissions.

##### Custom integration[​](#custom-integration "Direct link to Custom integration")

* All BI tools can use [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) with the Semantic Layer, even if they don’t have a native integration.
* [Consume metrics](https://docs.getdbt.com/docs/use-dbt-semantic-layer/consume-metrics.md) and develop custom integrations in different languages and tools, supported through the [JDBC](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md), ADBC, and [GraphQL](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md) APIs, and the [Python SDK library](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python.md). For more info, check out [our examples on GitHub](https://github.com/dbt-labs/example-semantic-layer-clients/).
* Connect to any tool that supports SQL queries. The tool must meet one of two criteria:
  * Offers a generic JDBC driver option (such as DataGrip), or
  * Is compatible with the Arrow Flight SQL JDBC driver, version 12.0.0 or higher.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [dbt Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview) to learn how to integrate and query your metrics in downstream tools.
* [Semantic Layer API query syntax](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md#querying-the-api-for-metric-metadata)
* [Hex Semantic Layer cells](https://learn.hex.tech/docs/explore-data/cells/data-cells/dbt-metrics-cells) to set up SQL cells in Hex.
* [Resolve 'Failed ALPN'](https://docs.getdbt.com/faqs/Troubleshooting/sl-alpn-error.md) error when connecting to the Semantic Layer.
* [Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer)
* [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md)

---

### BigQuery and Apache Iceberg

dbt supports materializing Iceberg tables on BigQuery via the catalog integration, starting with the dbt-bigquery 1.10 release.

#### Creating Iceberg Tables[​](#creating-iceberg-tables "Direct link to Creating Iceberg Tables")

dbt supports creating Iceberg tables for two of the BigQuery materializations:

* [Table](https://docs.getdbt.com/docs/build/materializations.md#table)
* [Incremental](https://docs.getdbt.com/docs/build/materializations.md#incremental)

#### BigQuery Iceberg catalogs[​](#bigquery-iceberg-catalogs "Direct link to BigQuery Iceberg catalogs")

BigQuery currently supports Iceberg tables via its built-in catalog, [BigLake Metastore](https://cloud.google.com/bigquery/docs/iceberg-tables#architecture). No setup is needed to access the BigLake Metastore; however, you will need a [storage bucket](https://docs.cloud.google.com/storage/docs/buckets#buckets) and [the required BigQuery roles](https://cloud.google.com/bigquery/docs/iceberg-tables#required-roles) configured before creating an Iceberg table.

##### dbt Catalog integration configurations for BigQuery[​](#dbt-catalog-integration-configurations-for-bigquery "Direct link to dbt Catalog integration configurations for BigQuery")

The following table outlines the configuration fields required to set up a catalog integration for [BigLake Iceberg tables in BigQuery](https://docs.cloud.google.com/bigquery/docs/iceberg-tables).
##### Configure catalog integration for managed Iceberg tables[​](#configure-catalog-integration-for-managed-iceberg-tables "Direct link to Configure catalog integration for managed Iceberg tables") 1. Create a `catalogs.yml` at the top level of your dbt project.

An example:

```yaml
catalogs:
  - name: my_bigquery_iceberg_catalog
    active_write_integration: biglake_metastore
    write_integrations:
      - name: biglake_metastore
        external_volume: 'gs://mydbtbucket'
        table_format: iceberg
        file_format: parquet
        catalog_type: biglake_metastore
```

2. Apply the catalog configuration at the model, folder, or project level:

iceberg\_model.sql

```sql
{{
  config(
    materialized='table',
    catalog_name='my_bigquery_iceberg_catalog'
  )
}}

select * from {{ ref('jaffle_shop_customers') }}
```

3. Execute the dbt model with `dbt run -s iceberg_model`.

##### Limitations[​](#limitations "Direct link to Limitations")

BigQuery does not currently support connecting to external Iceberg catalogs. For details on supported SQL operations and table management features, refer to the [BigQuery docs](https://cloud.google.com/bigquery/docs/iceberg-tables#limitations).

---

### Build and view your docs with dbt

dbt enables you to generate documentation for your project and data platform. The documentation is automatically updated with new information after a fully successful job run, ensuring accuracy and relevance. The default documentation experience in dbt is [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), available on [Starter, Enterprise, or Enterprise+ plans](https://www.getdbt.com/pricing/). Use [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) to view your project's resources (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state.
Refer to [documentation](https://docs.getdbt.com/docs/build/documentation.md) for more configuration details.

This shift makes [dbt Docs](#dbt-docs) a legacy documentation feature in dbt. dbt Docs is still accessible and offers basic documentation, but it doesn't offer the same speed, metadata, or visibility as Catalog. dbt Docs is available on dbt developer plans and to dbt Core users.

#### Set up a documentation job[​](#set-up-a-documentation-job "Direct link to Set up a documentation job")

Catalog uses the [metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata) generated after each job run in the production or staging environment, ensuring it always has the latest project results. To view richer metadata, you can set up documentation for a job in dbt when you edit your job settings or create a new job. Configure the job to [generate metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata) when it runs. This step is necessary if you want to view columns and statistics for models, sources, and snapshots in Catalog.

To set up a job to generate docs:

1. In the top left, click **Deploy** and select **Jobs**.
2. Create a new job or select an existing job and click **Settings**.
3. Under **Execution Settings**, select **Generate docs on run** and click **Save**.

[![Setting up a job to generate documentation](/img/docs/dbt-cloud/using-dbt-cloud/documentation-job-execution-settings.png?v=2 "Setting up a job to generate documentation")](#)Setting up a job to generate documentation

*Note: for dbt Docs users, you need to configure the job to generate docs when it runs, then manually link that job to your project. Proceed to [configure project documentation](#configure-project-documentation) so your project generates the documentation when this job runs.*

You can also add the [`dbt docs generate` command](https://docs.getdbt.com/reference/commands/cmd-docs.md) to the list of commands in the job run steps.
However, you can expect different outcomes when adding the command to the run steps compared to selecting the **Generate docs on run** checkbox in the job settings. Review the following options and outcomes:

| Options | Outcomes |
| --- | --- |
| **Select checkbox** | Select the **Generate docs on run** checkbox to automatically generate updated project docs each time your job runs. If that particular step in your job fails, the job can still be successful if all subsequent steps are successful. |
| **Add as a run step** | Add `dbt docs generate` to the list of commands in the job run steps, in whatever order you prefer. If that particular step in your job fails, the job will fail and all subsequent steps will be skipped. |

Tip — Documentation-only jobs

To create and schedule documentation-only jobs at the end of your production jobs, add the `dbt compile` command in the **Commands** section.

#### dbt Docs[​](#dbt-docs "Direct link to dbt Docs")

dbt Docs, available on developer plans and to dbt Core users, generates a website from your dbt project using the `dbt docs generate` command. It provides a central location to view your project's resources, such as models, tests, and lineage — and helps you understand the data in your warehouse.

##### Configure project documentation[​](#configure-project-documentation "Direct link to Configure project documentation")

You configure project documentation to generate documentation when the job you set up in the previous section runs. In the project settings, specify the job that generates documentation artifacts for that project.
Once you configure this setting, subsequent runs of the job will automatically include a step to generate documentation.

1. From dbt, click on your account name in the left side menu and select **Account settings**.
2. Navigate to **Projects** and select the project that needs documentation.
3. Click **Edit**.
4. Under **Artifacts**, select the job that should generate docs when it runs and click **Save**.

[![Configuring project documentation](/img/docs/dbt-cloud/using-dbt-cloud/documentation-project-details.png?v=2 "Configuring project documentation")](#)Configuring project documentation

Use Catalog for a richer documentation experience

For a richer and more interactive experience, try out [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), available on [Starter, Enterprise, or Enterprise+ plans](https://www.getdbt.com/pricing/). It includes map layers of your DAG, keyword search, integration with the Studio IDE, model performance insights, project recommendations, and more.

##### Generating documentation[​](#generating-documentation "Direct link to Generating documentation")

To generate documentation in the Studio IDE, run the `dbt docs generate` command in the **Command Bar**. This command generates the documentation for your dbt project as it exists in development in your IDE session.

After running `dbt docs generate` in the Studio IDE, click the icon above the file tree to see the latest version of your documentation rendered in a new browser window.

##### View documentation[​](#view-documentation "Direct link to View documentation")

Once you set up a job to generate documentation for your project, you can click **Catalog** in the navigation and then click **dbt Docs**. Your project's documentation should open. This link will always help you find the most recent version of your project's documentation in dbt.
These generated docs always reflect the last fully successful run: if any task fails, including tests, the documentation won't be updated by that run.

The Studio IDE makes it possible to view [documentation](https://docs.getdbt.com/docs/build/documentation.md) for your dbt project while your code is still in development. With this workflow, you can inspect and verify what your project's generated documentation will look like before your changes are released to production.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Documentation](https://docs.getdbt.com/docs/build/documentation.md)
* [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md)

---

### Build your metrics

Use MetricFlow in dbt to centrally define your metrics. As a key component of the [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), MetricFlow is responsible for SQL query construction and for defining specifications for dbt semantic models and metrics. It uses familiar constructs like semantic models and metrics to avoid duplicative coding, optimize your development workflow, ensure data governance for company metrics, and guarantee consistency for data consumers.

[![This diagram shows how the dbt Semantic Layer works with your data stack.](/img/docs/dbt-cloud/semantic-layer/sl-concept.png?v=2 "This diagram shows how the dbt Semantic Layer works with your data stack.")](#)This diagram shows how the dbt Semantic Layer works with your data stack.
MetricFlow allows you to: * Intuitively define metrics in your dbt project * Develop from your preferred environment, whether that's the [dbt platform CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md), [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), or [dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md) * Use [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md) to query and test those metrics in your development environment * Harness the true magic of the universal Semantic Layer and dynamically query these metrics in downstream tools (Available for dbt [Starter, Enterprise, or Enterprise+](https://www.getdbt.com/pricing/) accounts only). [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/latest-metrics-spec.md) ###### [Migrate to the latest YAML spec](https://docs.getdbt.com/docs/build/latest-metrics-spec.md) [Learn how to migrate from the legacy metrics YAML spec to the latest metrics YAML spec.](https://docs.getdbt.com/docs/build/latest-metrics-spec.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/guides/sl-snowflake-qs.md) ###### [Quickstart for the dbt Semantic Layer](https://docs.getdbt.com/guides/sl-snowflake-qs.md) [Use this guide to build and define metrics, set up the dbt Semantic Layer, and query them using downstream tools.](https://docs.getdbt.com/guides/sl-snowflake-qs.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/about-metricflow.md) ###### [About MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md) [Understand MetricFlow's core concepts, how to use joins, how to save commonly used queries, and what commands are available.](https://docs.getdbt.com/docs/build/about-metricflow.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/semantic-models.md) ###### [Semantic model](https://docs.getdbt.com/docs/build/semantic-models.md) [Use semantic models as the basis for defining data. 
They act as nodes in the semantic graph, with entities connecting them.](https://docs.getdbt.com/docs/build/semantic-models.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/metrics-overview.md) ###### [Metrics](https://docs.getdbt.com/docs/build/metrics-overview.md) [Define metrics in your dbt project using different metric types in YAML files.](https://docs.getdbt.com/docs/build/metrics-overview.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/advanced-topics.md) ###### [Advanced topics](https://docs.getdbt.com/docs/build/advanced-topics.md) [Learn about advanced topics for dbt Semantic Layer and MetricFlow, such as data modeling workflows, and more.](https://docs.getdbt.com/docs/build/advanced-topics.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) ###### [About the dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) [Introducing the dbt Semantic Layer, the universal process that allows data teams to centrally define and query metrics](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) ###### [Available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) [Discover the diverse range of partners that seamlessly integrate with the powerful dbt Semantic Layer, allowing you to query and unlock valuable insights from your data ecosystem.](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md)
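To make these constructs concrete, here's a minimal sketch of a semantic model and a simple metric as MetricFlow defines them. The `orders` model, its columns, and the `order_total` metric are illustrative placeholders, not taken from a specific project:

```yaml
semantic_models:
  - name: orders
    description: Order fact table at the order grain (illustrative example).
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: ordered_at
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_total
        agg: sum

metrics:
  - name: order_total
    label: Order total
    type: simple
    type_params:
      measure: order_total
```

With a definition like this in place, you could query the metric in development with a MetricFlow command such as `dbt sl query --metrics order_total --group-by metric_time`.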
#### Related docs[​](#related-docs "Direct link to Related docs")

* [Quickstart guide with the Semantic Layer](https://docs.getdbt.com/guides/sl-snowflake-qs.md)
* [The Semantic Layer: what's next](https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/) blog
* [Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer)
* [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md)

---

### Cache common queries

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

The Semantic Layer allows you to cache common queries to speed up performance and reduce compute on expensive queries. There are two types of caching:

* [Result caching](#result-caching) leverages your data platform's built-in caching layer.
* [Declarative caching](#declarative-caching) allows you to pre-warm the cache using your saved queries configuration.

Both approaches speed up your queries and reduce compute time; choosing between them depends on your use case:

* Result caching happens automatically by leveraging your data platform's cache.
* Declarative caching allows you to 'declare' the queries you specifically want to cache, which means you need to anticipate which queries to cache.
* Declarative caching also allows you to dynamically filter your dashboards without losing the performance benefits of caching.
This works because filters on dimensions (that are already in a saved query config) will use the cache.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* dbt [Enterprise or Enterprise+](https://www.getdbt.com/) plans.
* dbt environments must be on [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) and not legacy dbt Core versions.
* A successful job run and [production environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#set-as-production-environment).
* For declarative caching, you need [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) defined in your [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) YAML configuration file.

#### Result caching[​](#result-caching "Direct link to Result caching")

Result caching leverages your data platform’s built-in caching layer and features. [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md) generates the same SQL for repeated query requests, which means those requests can take advantage of your data platform’s cache. Double-check your data platform's specifications.

Here's how caching works, using Snowflake as an example (the behavior should be similar across other data platforms):

1. **Run from cold cache** — When you run a semantic layer query from your BI tool that hasn't been executed in the past 24 hours, the query scans the entire dataset and doesn't use the cache.
2. **Run from warm cache** — If you rerun the same query after 1 hour, the SQL generated and executed on Snowflake remains the same. On Snowflake, the result cache is set per user for 24 hours, which allows the repeated query to use the cache and return results faster.

Different data platforms might have different caching layers and cache invalidation rules.
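The warm-cache behavior above depends on MetricFlow emitting identical SQL text for identical requests, since platform result caches match on the query text. The following toy sketch (not MetricFlow's actual implementation) illustrates the idea of canonicalizing a request so that equivalent requests serialize identically:

```python
import hashlib
import json

def canonical_request(metrics, group_by):
    """Serialize a metric request deterministically.

    Toy illustration only -- not MetricFlow's actual implementation.
    Platform result caches match on the generated SQL text, so equivalent
    requests must serialize identically to get a cache hit.
    """
    return json.dumps(
        {"metrics": sorted(metrics), "group_by": sorted(group_by)},
        sort_keys=True,
    )

# The same request with metrics listed in a different order serializes
# identically, so a platform result cache can serve the stored result.
a = canonical_request(["revenue", "orders"], ["metric_time__day"])
b = canonical_request(["orders", "revenue"], ["metric_time__day"])
assert a == b
assert hashlib.sha256(a.encode()).hexdigest() == hashlib.sha256(b.encode()).hexdigest()
```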
Here's a list of resources on how caching works on some common data platforms:

* [BigQuery](https://cloud.google.com/bigquery/docs/cached-results)
* [Databricks](https://docs.databricks.com/en/optimizations/disk-cache.html)
* [Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/data-warehouse/caching)
* [Redshift](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html#result-caching)
* [Snowflake](https://community.snowflake.com/s/article/Caching-in-the-Snowflake-Cloud-Data-Platform)
* [Starburst Galaxy](https://docs.starburst.io/starburst-galaxy/data-engineering/optimization-performance-and-quality/workload-optimization/warp-speed-enabled.html)

#### Declarative caching[​](#declarative-caching "Direct link to Declarative caching")

Declarative caching enables you to pre-warm the cache using [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) by setting the cache config to `true` in your `saved_queries` settings. This is useful for optimizing performance for key dashboards or common ad-hoc query requests.

tip

Declarative caching also allows you to dynamically filter your dashboards without losing the performance benefits of caching. This works because filters on dimensions (that are already in a saved query config) will use the cache. For example, if you filter a metric by geographical region on a dashboard, the query will hit the cache, ensuring faster results. This also removes the need to create separate saved queries with static filters. For configuration details, refer to [Declarative caching setup](#declarative-caching-setup).

How declarative caching works:

* Make sure your saved queries YAML configuration file has [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) defined.
* Running a saved query triggers the Semantic Layer to:
  * Build a cached table from a saved query, with exports defined, into your data platform.
  * Ensure any query requests that match the saved query's inputs use the cache, returning data more quickly.
  * Automatically invalidate the cache when it detects new, fresh data in any upstream models related to the metrics in your cached table.
  * Refresh (or rebuild) the cache the next time you run the saved query.

📹 Check out this video demo to see how declarative caching works! This video demonstrates the concept of declarative caching, how to run it using the dbt scheduler, and how quickly your dashboards load as a result.

Refer to the following diagram, which illustrates what happens when the Semantic Layer receives a query request:

[![Overview of the declarative cache query flow](/img/docs/dbt-cloud/semantic-layer/declarative-cache-query-flow.jpg?v=2 "Overview of the declarative cache query flow")](#)Overview of the declarative cache query flow

##### Declarative caching setup[​](#declarative-caching-setup "Direct link to Declarative caching setup")

To populate the cache, you need to configure an export in your saved query YAML configuration *and* set the cache config to `true`. You can't cache a saved query without an export defined.

semantic\_model.yml

```yaml
saved_queries:
  - name: my_saved_query
    ... # Rest of the saved queries configuration.
    config:
      cache:
        enabled: true # Set to true to enable; defaults to false.
    exports:
      - name: order_data_key_metrics
        config:
          export_as: table
```

To enable saved queries at the project level, you can set the `saved-queries` configuration in the [`dbt_project.yml` file](https://docs.getdbt.com/reference/dbt_project.yml.md).
This saves you from configuring saved queries in each file:

dbt\_project.yml

```yaml
saved-queries:
  my_saved_query:
    config:
      +cache:
        enabled: true
```

##### Run your declarative cache[​](#run-your-declarative-cache "Direct link to Run your declarative cache")

After setting up declarative caching in your YAML configuration, you can run [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) with the dbt job scheduler to build a cached table from a saved query into your data platform.

* Use [exports to set up a job](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) to run a saved query in dbt.
* The dbt Semantic Layer builds a cache table in your data platform in a dedicated `dbt_sl_cache` schema.
* The cache schema and tables are created using your deployment credentials. You need to grant read access to this schema for your Semantic Layer user.
* The cache refreshes (or rebuilds) on the same schedule as the saved query job.

[![Overview of the cache creation flow.](/img/docs/dbt-cloud/semantic-layer/cache-creation-flow.jpg?v=2 "Overview of the cache creation flow.")](#)Overview of the cache creation flow.

After a successful job run, you can go back to your dashboard to experience the speed and benefits of declarative caching.

#### Cache management[​](#cache-management "Direct link to Cache management")

dbt uses the metadata from your dbt model runs to intelligently manage cache invalidation. When you start a dbt job, it keeps track of the last model runtime and checks the freshness of the metrics upstream of your cache. If an upstream model has data that was created after the cache was created, dbt invalidates the cache. This means queries won't use outdated caches and will instead query directly from the source data. Stale, outdated cache tables are periodically dropped, and dbt will write a new cache the next time your saved query runs.
You can manually invalidate the cache through the [dbt Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) using the `InvalidateCacheResult` field.

#### FAQs[​](#faqs "Direct link to FAQs")

How does caching interact with access controls?

Cached data is stored separately from the underlying models. If metrics are pulled from the cache, we don’t have the security context applied to those tables at query time. In the future, we plan to clone credentials, identify the minimum access level needed, and apply those permissions to cached tables.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Validate semantic nodes in CI](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci)
* [Saved queries](https://docs.getdbt.com/docs/build/saved-queries.md)
* [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md)

---

### Changelog 2019 and 2020

note

This changelog references dbt versions that are no longer supported and have been removed from the docs. For more information about upgrading to a supported version of dbt in your dbt Cloud environment, read [Upgrade dbt version in Cloud](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md).

Welcome to the 2019 and 2020 changelog for the dbt application! You can use this changelog to see the highlights of what was new, fixed, and enhanced during this time period.
#### dbt Cloud v1.1.16 (December 23, 2020)[​](#dbt-cloud-v1116-december-23-2020 "Direct link to dbt Cloud v1.1.16 (December 23, 2020)") This release adds preview support for Databricks Spark in dbt and adds two new permission sets for Enterprise accounts. ###### Enhancements[​](#enhancements "Direct link to Enhancements") * Added preview support for Databricks Spark * Added two new Enterprise permission sets: Account Viewer and Project Creator ###### Fixed[​](#fixed "Direct link to Fixed") * Improve logging infrastructure for dbt run logs * Fix for SSH tunnel logging errors #### dbt Cloud v1.1.15 (December 10, 2020)[​](#dbt-cloud-v1115-december-10-2020 "Direct link to dbt Cloud v1.1.15 (December 10, 2020)") Lots of great stuff to confer about this go-round: things really coalesced this week! Lots of excitement around adding Spark to the connection family, as well as knocking out some longstanding bugs. ###### Enhancements[​](#enhancements-1 "Direct link to Enhancements") * Add Spark as an option for database setup ###### Fixed[​](#fixed-1 "Direct link to Fixed") * Fix this one hairy bug where one email could have multiple user accounts * Fix setup-connection react-page routing * Break out group selection logic from license types and group names * Handle JSON errors in v1/v2 body parsing * Handle AuthForbidden and AuthCancelled gracefully (i.e., don't throw 500s) * Fix regression with Studio IDE loading spinner #### dbt Cloud v1.1.14 (November 25, 2020)[​](#dbt-cloud-v1114-november-25-2020 "Direct link to dbt Cloud v1.1.14 (November 25, 2020)") This release adds a few new pieces of connective tissue, notably OAuth for BigQuery and SparkAdapter work. There are also some quality-of-life improvements and investments for the future, focused on our beloved Studio IDE users, and some improved piping for observability into log management and API usage. 
###### Enhancements[​](#enhancements-2 "Direct link to Enhancements") * Update IP allowlist * User can OAuth for BigQuery in profile credentials * Adding SparkAdapter backend models, mappers, and services * Added BigQuery OAuth integration * Adding db index for owner\_thread\_id ###### Fixed[​](#fixed-2 "Direct link to Fixed") * Fix post /run error rate * Fix bug where bad argument was passed to dbt runs * Log out unhandled error in environment variable context manager * Remove account settings permissions for user integrations #### dbt Cloud v1.1.13 (November 12, 2020)[​](#dbt-cloud-v1113-november-12-2020 "Direct link to dbt Cloud v1.1.13 (November 12, 2020)") This release adds support for triggering runs with overridden attributes via the [triggerRun](https://docs.getdbt.com/dbt-cloud/api-v2) API endpoint. Additionally, a number of bugs have been squashed and performance improvements have been made. ###### Enhancements[​](#enhancements-3 "Direct link to Enhancements") * Improve error handling for long-running queries in the Studio IDE * Use S3 client caching to improve log download speed for scheduled runs * Support triggering jobs [with overridden attributes from the API](https://docs.getdbt.com/dbt-cloud/api-v2) * Clarify "upgrade" copy on the billing page ###### Fixed[​](#fixed-3 "Direct link to Fixed") * GitLab groups endpoint now returns all groups and subgroups * Support BigQuery retry configs with value 0 * Prevent web IDE from crashing after running an invalid dbt command * Apply additional log scrubbing to filter short-lived git credentials * Fix older migration to make auth\_url field nullable * Support paths in GitLab instance URL * Fix for auth token request url in GitLab oauth flow #### dbt Cloud v1.1.12 (October 30, 2020)[​](#dbt-cloud-v1112-october-30-2020 "Direct link to dbt Cloud v1.1.12 (October 30, 2020)") This release adds dbt v0.18.1 and v0.19.0b1 to dbt Cloud. Additionally, a number of bugs have been fixed. 
###### Enhancements[​](#enhancements-4 "Direct link to Enhancements") * Update copy on billing page for picking a plan at the end of a trial * Improved authorization for metadata API * Add dbt 0.19.0b1 * Add dbt 0.18.1 ###### Fixed[​](#fixed-4 "Direct link to Fixed") * Fixed an issue where groups from other logged-in accounts appeared in the RBAC UI * Fixed requested GitLab scopes and an issue when encrypting deploy tokens for GitLab auth * Fixed an issue where null characters in logs threw errors in scheduled runs #### dbt Cloud v1.1.11 (October 15, 2020)[​](#dbt-cloud-v1111-october-15-2020 "Direct link to dbt Cloud v1.1.11 (October 15, 2020)") Release v1.1.11 includes some quality-of-life enhancements, copy tweaks, and error resolutions. It also marks the last time we'll have the same digit four times in a row in a release until v2.2.22. ###### Enhancements[​](#enhancements-5 "Direct link to Enhancements") * Add InterfaceError exception handling for commands * Rename My Account --> Profile * Add project and connection to admin backend ###### Fixed[​](#fixed-5 "Direct link to Fixed") * Resolve errors from presence of null-characters in logs * Email verifications backend * Undo run.serialize * Fix error while serialized run * Fix logic error in connection setup * Fix a bug with GitLab auth flow for unauthenticated users * Fix bug where Native Okta SSO uses the wrong port #### dbt Cloud v1.1.10 (October 8, 2020)[​](#dbt-cloud-v1110-october-8-2020 "Direct link to dbt Cloud v1.1.10 (October 8, 2020)") This release adds support for repositories imported via GitLab (Enterprise) and contains a number of bugfixes and improvements in the Studio IDE. 
###### Enhancements[​](#enhancements-6 "Direct link to Enhancements") * Add GitLab integration (Enterprise) * Add GitLab repository setup to project setup flow (Enterprise) * Add GitLab automated Deploy Token installation (Enterprise) * Add dbt 0.18.1rc1 ###### Fixed[​](#fixed-6 "Direct link to Fixed") * Fix bug where Studio IDE gets stuck after changing project repository * Fix race condition where connections can be added to the wrong project * Fix revoking email invites * Fix a bug in slim CI deferring run search where a missing previous run caused the scheduler to raise an error * Fix a source of Studio IDE instability * Gracefully clean up Studio IDE backend on shutdown * Always show SSO mappings on Group Details page #### dbt Cloud v1.1.9 (October 1, 2020)[​](#dbt-cloud-v119-october-1-2020 "Direct link to dbt Cloud v1.1.9 (October 1, 2020)") This release adds the ability for admins on the Enterprise plan to configure the Role Based Access Control permissions applied to Projects in their account. Additionally, job execution deferral is now available behind a feature flag, and a number of fixes and improvements were released as well. 
###### Enhancements[​](#enhancements-7 "Direct link to Enhancements") * Add dbt version in the navigation sidebar * Add RBAC Group Permission view, create, and modify UIs * Add personal git auth for Studio IDE error handling modals * Add Develop Requests to backend views * Implemented job execution deferral * Add support for dbt v0.18.1b2 ###### Fixed[​](#fixed-7 "Direct link to Fixed") * Fixed the scenario where interacting with the Refresh Studio IDE button causes an index.lock file to remain in the Studio IDE file system * Validate PR URL for XSS attempts * Address RBAC inconsistencies * Fixed users not being able to update their dbt password in-app * Fix for applying user permissions across multiple accounts after SSO auth * Google API: default to common api endpoint but allow override * Fix for missing email variable in GSuite debug logging * Destroy Studio IDE session when switching projects #### dbt Cloud v1.1.8 (September 17, 2020)[​](#dbt-cloud-v118-september-17-2020 "Direct link to dbt Cloud v1.1.8 (September 17, 2020)") This release adds native support for Okta SSO and dbt v0.18.0. It also adds initial support for a GitLab integration and self-service RBAC configuration. ###### Enhancements[​](#enhancements-8 "Direct link to Enhancements") * Add dbt 0.18.0 * Add native Okta SSO support * Add additional logging for Gsuite and Azure SSO * Add git cloning support via GitLab deploy tokens for scheduled runs (coming soon) * add RBAC Groups Detail Page and Groups List UIs ###### Fixed[​](#fixed-8 "Direct link to Fixed") * Allow `*_proxy` env vars in scheduled runs #### dbt Cloud v1.1.7 \[September 3, 2020][​](#dbt-cloud-v117-september-3-2020 "Direct link to dbt Cloud v1.1.7 [September 3, 2020]") This release adds a Release Candidate for [dbt v0.18.0](https://docs.getdbt.com/docs/dbt-versions/core-upgrade.md) and includes bugfixes and improvements to the Cloud IDE and job scheduler. 
###### Enhancements[​](#enhancements-9 "Direct link to Enhancements") * Improve scheduler backoff behavior * Add dbt 0.18.0rc1 * Add support for non-standard ssh ports in connection tunnels * Add support for closing the Studio IDE filesystem context menu by clicking outside the menu ###### Fixed[​](#fixed-9 "Direct link to Fixed") * Fix for joining threads in run triggers * Fix thread caching for s3 uploads #### dbt Cloud v1.1.6 (August 20, 2020)[​](#dbt-cloud-v116-august-20-2020 "Direct link to dbt Cloud v1.1.6 (August 20, 2020)") This release includes security enhancements and improvements across the entire dbt application. ###### Enhancements[​](#enhancements-10 "Direct link to Enhancements") * Support for viewing development docs inside of the Studio IDE ([docs](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md)) * Change CI temporary schema names to be prefixed with `dbt_cloud` instead of `sinter` * Change coloring and iconography to improve accessibility and UX across the application * \[Enterprise] Support the specification of multiple authorized domains in SSO configuration * \[On-premises] Upgrade boto3 to support KIAM authentication ###### Fixed[​](#fixed-10 "Direct link to Fixed") * \[Enterprise] Fix for missing IdP group membership mappings when users belong to >100 Azure AD groups * Disallow the creation of symlinks in the Studio IDE * Improve reliability of background cleanup processes * Improve performance and reliability of artifact management and PR webhook processing #### dbt Cloud v1.1.5 (August 4, 2020)[​](#dbt-cloud-v115-august-4-2020 "Direct link to dbt Cloud v1.1.5 (August 4, 2020)") This release adds a major new feature to the Studio IDE: merge conflict resolution! It also includes changes to the job scheduler that significantly cut time and resource utilization. 
###### Enhancements[​](#enhancements-11 "Direct link to Enhancements") * Add dbt 0.17.2 * Add dbt 0.18.0 beta 2 * Add merge conflict resolution, a merge commit workflow, and a merge abort workflow to the IDE * Deprecate dbt versions prior to 0.13.0 * Refactor to cut job scheduler loop time * Reduce extra database calls to the account table in the job scheduler loop * \[On-premises] Allow clients to disable authentication for SMTP * \[On-premises] Allow disabling of TLS for SMTP * \[On-premises] Make k8s access mode for Studio IDE pods an environment variable * \[Security] Force session cookie to be secure * Make api and admin modules flake8 compliant ###### Fixed[​](#fixed-11 "Direct link to Fixed") * Fix incorrect usage of `region_name` in KMS client * Fix a call to a deprecated Github API * Remove extraneous billing API calls during job scheduler loop * Fix error where refreshing the IDE would leave running dbt processes in a bad state #### dbt Cloud v1.1.4 (July 21, 2020)[​](#dbt-cloud-v114-july-21-2020 "Direct link to dbt Cloud v1.1.4 (July 21, 2020)") This release dramatically speeds up the job scheduler. It adds a new stable dbt version (0.17.1) and a new prerelease (0.17.2b1), and it includes a number of bugfixes. 
###### Enhancements[​](#enhancements-12 "Direct link to Enhancements") * Add dbt 0.17.2b1 * Add dbt 0.17.1 and set as default version * Speed up job scheduler by 50% * Added generate docs to rpc service and new view docs route * Queue limiting by account for scheduled jobs ###### Fixed[​](#fixed-12 "Direct link to Fixed") * Fix enterprise SSO configuration when old Auth0 Azure AD is configured * Do not schedule jobs for deleted job definitions or environments * Fix permissions issues * Fix a bug with metadata set in azure storage provider * Fixed error when switching to developer plan from trial * Fix authentication bug where we setup all accounts with same domain * \[Security] Add security check to prevent potentially malicious html files in dbt docs #### dbt Cloud v1.1.3 (July 7, 2020)[​](#dbt-cloud-v113-july-7-2020 "Direct link to dbt Cloud v1.1.3 (July 7, 2020)") This release contains a number of IDE features and bugfixes, a new release candidate of dbt, and a brand new Enterprise Single-Sign On method: Azure Active Directory! 
###### Enhancements[​](#enhancements-13 "Direct link to Enhancements") * Add dbt 0.17.1rc3 * Snowflake: Add support for `client_session_keep_alive` config * Enterprise: Native Azure Oauth2 for Enterprise accounts * Studio IDE: Add custom command palette for finding files ###### Fixed[​](#fixed-13 "Direct link to Fixed") * Do not run CI builds for draft PRs in GitHub * Remove race condition when syncing account with stripe billing events * Enterprise: Fixed JIT provisioning bug impacting accounts with shared IdP domains * Studio IDE: Fix a regression with Github git clone method * Studio IDE: Fix a race condition where git clone didn't complete before user entered Studio IDE * Studio IDE: Fix bug with checking out an environment custom branch on Studio IDE refresh * Bigquery: Fix PR schema dropping #### dbt Cloud v1.1.2 (June 23, 2020)[​](#dbt-cloud-v112-june-23-2020 "Direct link to dbt Cloud v1.1.2 (June 23, 2020)") This branch includes an important security fix, two new versions of dbt, and some miscellaneous fixes. ###### Enhancements[​](#enhancements-14 "Direct link to Enhancements") * Add project names to the account settings notifications section * Add dbt 0.17.1 release candidate * Update development dbt version to Marian Anderson * Add remember me to login page and expire user sessions at browser close * Adding Auth Provider and enabling Gsuite SSO for enterprise customers ###### Fixed[​](#fixed-14 "Direct link to Fixed") * \[Security] Fix intra-account API key leakage * Support queries containing unicode characters in the Studio IDE #### dbt Cloud v1.1.1 (June 9, 2020)[​](#dbt-cloud-v111-june-9-2020 "Direct link to dbt Cloud v1.1.1 (June 9, 2020)") This release includes dbt 0.17.0 and a number of IDE quality of life improvements. 
###### Enhancements[​](#enhancements-15 "Direct link to Enhancements") * Added dbt 0.17.0 * Added the ability to create a new folder in the IDE * Added gitignore status to file system and display dbt artifacts, including directories dbt\_modules, logs, and target * (Cloud only) Added rollbar and update some various error handling clean up * (On-premises only) Admin site: allow Repository's Pull Request Template field to be blank * (On-premises only) Added AWS KMS support ###### Fixed[​](#fixed-15 "Direct link to Fixed") * Expires old pending password reset codes when a new password reset is requested #### dbt Cloud v1.1.0 (June 2, 2020)[​](#dbt-cloud-v110-june-2-2020 "Direct link to dbt Cloud v1.1.0 (June 2, 2020)") This release adds some new admin backend functionality, as well as automatic seat usage reporting. ##### On-Premises Only[​](#on-premises-only "Direct link to On-Premises Only") ###### Added[​](#added "Direct link to Added") * Added automatic reporting of seat usage. ###### Changed[​](#changed "Direct link to Changed") * Admins can now edit remote URLs for repository in the admin backend. * Admins can now edit credentials in the admin backend. *** #### dbt Cloud v1.0.12 (May 27, 2020)[​](#dbt-cloud-v1012-may-27-2020 "Direct link to dbt Cloud v1.0.12 (May 27, 2020)") This release contains a few bugfixes for the Studio IDE and email notifications, as well as the latest release candidate of 0.17.0. ##### All versions[​](#all-versions "Direct link to All versions") ###### Added[​](#added-1 "Direct link to Added") * Use the correct starter project tag, based on dbt version, when initializing a new project in the IDE * Added branch filtering to IDE git checkout UI. * Added dbt 0.17.0-rc3. 
###### Fixed[​](#fixed-16 "Direct link to Fixed") * Fixed source freshness report for dbt version v0.17.0 * Fixed issue with checking-out git branches * Fixed issue of logs being omitted on long running queries in the Studio IDE * Fixed slack notifications failing to send if email notifications fail ##### On-Premises Only[​](#on-premises-only-1 "Direct link to On-Premises Only") ###### Added[​](#added-2 "Direct link to Added") * Added an Admin page for deleting credentials. *** #### dbt Cloud v1.0.11 (May 19, 2020)[​](#dbt-cloud-v1011-may-19-2020 "Direct link to dbt Cloud v1.0.11 (May 19, 2020)") This version adds some new permission sets, and a new release candidate of dbt. ##### All versions[​](#all-versions-1 "Direct link to All versions") ###### Added[​](#added-3 "Direct link to Added") * Added permission sets for Job Viewer, Job Admin and Analyst. * Added dbt 0.17.0-rc1 *** #### dbt Cloud v1.0.10 (May 11, 2020)[​](#dbt-cloud-v1010-may-11-2020 "Direct link to dbt Cloud v1.0.10 (May 11, 2020)") ##### All versions[​](#all-versions-2 "Direct link to All versions") ###### Added[​](#added-4 "Direct link to Added") * Added dbt 0.17.0-b1. * PR Url is now self serve configurable. * Added more granular permissions around creating and deleting permissions. (Account Admin can create new projects by default while both Account Admin and Project Admin can delete the projects they have permissions for by default) * Added an error message to display to users that do not have permissions set up for any projects on an account. 
###### Fixed[​](#fixed-17 "Direct link to Fixed") * Removed .sql from CSV download filename * Fixed breaking JobDefinition API with new param custom\_branch\_only * Fixed Studio IDE query table column heading casing *** #### dbt Cloud v1.0.9 (May 5, 2020)[​](#dbt-cloud-v109-may-5-2020 "Direct link to dbt Cloud v1.0.9 (May 5, 2020)") This release includes bugfixes around how permissions are applied to runs and run steps, fixes a bug where the scheduler would hang up, and improves performance of the Studio IDE. ##### All versions[​](#all-versions-3 "Direct link to All versions") ###### Fixed[​](#fixed-18 "Direct link to Fixed") * Fixed permission checks around Runs and Run Steps, this should only affect Enterprise accounts with per-project permissions. * Fixed receiving arbitrary remote\_url when creating a git url repository. * Fixed issue when handling non-resource specific errors from RPC server in Studio IDE. * Fixed a bug where the scheduler would stop if the database went away. * Fixed IDE query results table not supporting horizontal scrolling. ###### Changed[​](#changed-1 "Direct link to Changed") * Improve Studio IDE query results performance. * Allow configuration on jobs to only run builds when environment target branch is env's custom branch. * Allow configuration of GitHub installation IDs in the admin backend. ##### On-Premises Only[​](#on-premises-only-2 "Direct link to On-Premises Only") ###### Fixed[​](#fixed-19 "Direct link to Fixed") * Fixed logic error for installations with user/password auth enabled in an on-premises context *** #### dbt Cloud v1.0.8 (April 28, 2020)[​](#dbt-cloud-v108-april-28-2020 "Direct link to dbt Cloud v1.0.8 (April 28, 2020)") This release adds a new version of dbt (0.16.1), fixes a number of IDE bugs, and fixes some dbt Cloud on-premises bugs. 
##### All versions[​](#all-versions-4 "Direct link to All versions") ###### Added[​](#added-5 "Direct link to Added") * Add dbt 0.16.1 ###### Fixed[​](#fixed-20 "Direct link to Fixed") * Fixed Studio IDE filesystem loading to check for directories to ensure that load and write methods are only performed on files. * Fixed a bug with generating private keys for connection SSH tunnels. * Fixed issue preventing temporary PR schemas from being dropped when PR is closed. * Fix issues with Studio IDE tabs not updating query compile and run results. * Fix issues with query runtime timer in Studio IDE for compile and run query functions. * Fixed what settings are displayed on the account settings page to align with the user's permissions. * Fixed bug with checking user's permissions in frontend when user belonged to more than one project. * Fixed bug with access control around environments and file system/git interactions that occurred when using Studio IDE. * Fixed a bug with Environments too generously matching repository. ###### Changed[​](#changed-2 "Direct link to Changed") * Make the configured base branch in the Studio IDE read-only. * Support configuring groups using an account ID in the admin backend. * Use gunicorn webserver in Studio IDE. * Allow any repository with a Github installation ID to use build-on-PR. * Member and Owner Groups are now editable from admin UI. ##### On-Premises Only[​](#on-premises-only-3 "Direct link to On-Premises Only") ###### Fixed[​](#fixed-21 "Direct link to Fixed") * Fixed an issue where account license counts were not set correctly from onprem license file. * Fixed an issue where docs would sometimes fail to load due to a server error. *** #### dbt Cloud v1.0.7 (April 13, 2020)[​](#dbt-cloud-v107-april-13-2020 "Direct link to dbt Cloud v1.0.7 (April 13, 2020)") This release rolls out a major change to how permissions are applied in dbt's API. It also adds some minor bugfixes, and some tooling for improved future QA. 
##### All versions[​](#all-versions-5 "Direct link to All versions") ###### Added[​](#added-6 "Direct link to Added") * Added support to permission connections on a per project basis. * Added support to permission credentials on a per project basis. * Added support to permission repositories on a per project basis. * Smoke tests for account signup, user login and basic project setup * Add dbt 0.16.1rc1 * Non-enterprise users can now add new accounts from the Accounts dropdown. ###### Fixed[​](#fixed-22 "Direct link to Fixed") * Fix missing migration for credentials. * Fixed issue with testing connections with a non-default target name specified in the credentials. * Fix issue where Bigquery connections could be created with invalid values for `location`. *** #### dbt Cloud v1.0.6 (March 30, 2020)[​](#dbt-cloud-v106-march-30-2020 "Direct link to dbt Cloud v1.0.6 (March 30, 2020)") This release adds UIs to select group permissions in the project settings UI. It also contains bugfixes for the Studio IDE, PR build schema dropping, and adds support for dissociating Github and Slack integrations via the Admin backend. ##### All versions[​](#all-versions-6 "Direct link to All versions") ###### Added[​](#added-7 "Direct link to Added") * (Enterprise only) Added ability to create group permissions for specific projects in the project settings UI. ###### Fixed[​](#fixed-23 "Direct link to Fixed") * Fix empty state for selecting github repositories * Fixed an issue with the IDE failing to report an invalid project subdirectory for a dbt project * Fix blank loading screen displayed when switching accounts while on account/profile settings page * Fix issue preventing schemas from dropping during PR builds * Fix issue where whitespace in user's name breaks default schema name * Added webhook processing for when a user disassociates github access to their account. * Added slack disassociation capability on user integrations page and on backend admin panel (for notifications). 
###### Changed[​](#changed-3 "Direct link to Changed") * Declare application store using configureStore from redux-toolkit *** #### dbt Cloud v1.0.5 (March 23, 2020)[​](#dbt-cloud-v105-march-23-2020 "Direct link to dbt Cloud v1.0.5 (March 23, 2020)") ##### All versions[​](#all-versions-7 "Direct link to All versions") ###### Added[​](#added-8 "Direct link to Added") * Add support for authenticating Development and Deployment Snowflake credentials using keypair auth * Add support for checking out tags, render git output in "clone" run step * Add dbt 0.15.3 * Add dbt 0.16.0 ###### Fixed[​](#fixed-24 "Direct link to Fixed") * Git provider urls now built with correct github account and repository directories. * Invalid DateTime Start time in Studio IDE Results Panel KPIs. * Fix a race condition causing the Invite User UI to not work properly. * Incorrect model build times in Studio IDE. ###### Changed[​](#changed-4 "Direct link to Changed") * Git: ignore `logs/` and `target/` directories in the IDE. *** #### 1.0.4 (March 16, 2020)[​](#104-march-16-2020 "Direct link to 1.0.4 (March 16, 2020)") This release adds two new versions of dbt, adds Snowflake SSO support for Enterprise accounts, and fixes a number of bugs. ##### All versions[​](#all-versions-8 "Direct link to All versions") ###### Added[​](#added-9 "Direct link to Added") * Added dbt 0.15.3rc1 * Added dbt 0.16.0rc2 * Add support for cloning private deps in the IDE when using deploy key auth. * Log user that kicked off manual runs. * Enterprise support for authenticating user Snowflake connections using Snowflake single sign-on ###### Fixed[​](#fixed-25 "Direct link to Fixed") * Fixed issue loading accounts for a user if they lack permissions for any subset of accounts they have a user license for. * Fixed issue with showing blank page for user who is not associated with any accounts. * Fixed issue where runs would continue to kick off on a deleted project. 
* Fixed issue where accounts connected to GitHub integrations with SAML protection could not import repositories * Improved error messages shown to the user if repos are unauthorized in a GitHub integration when importing a repo * Fix colors of buttons in generated emails ##### On-Premises[​](#on-premises "Direct link to On-Premises") ###### Added[​](#added-10 "Direct link to Added") * Added Admin backend UIs for managing user permissions. *** #### 1.0.3 (March 1, 2020)[​](#103-march-1-2020 "Direct link to 1.0.3 (March 1, 2020)") This release contains the building blocks for RBAC, and a number of bugfixes and upgrades. ##### All versions[​](#all-versions-9 "Direct link to All versions") ###### Added[​](#added-11 "Direct link to Added") * Add support for a read replica for reading runs from the API. * Added groups, group permissions, and user groups. * Add email address to email verification screen. * Add Enterprise Permissions. * Allow account-level access to resources for groups with a permission statement of "all resources" for api backwards compatibility. * Add dbt 0.16.0b3 ###### Fixed[​](#fixed-26 "Direct link to Fixed") * Fix issue with loading projects after switching accounts. * Fix broken links to connections from deployment environment settings. * Fix a bug with inviting readonly users. * Fix a bug where permissions were removed from Enterprise users upon login. 
###### Changed[​](#changed-5 "Direct link to Changed") * Update Django version: 2.2.10 * Update Django admin panel version * Update Social Auth version and the related Django component * Update jobs from using account-based resource permissions to project-based resource permissions * Update modal that shows when trials are expired; fix copy for past-due accounts in modal * Replace formatted string logging with structured logging * Move connection and repository settings from account settings to project settings * Update project setup flow to be used for creating projects * Update develop requests to have a foreign key on projects ##### On-Premises[​](#on-premises-1 "Direct link to On-Premises") ###### Added[​](#added-12 "Direct link to Added") * Accounts created from admin backend will come with a default set of groups ###### Changed[​](#changed-6 "Direct link to Changed") * Rename "Fishtown Analytics User" to "Superuser" *** #### dbt Cloud v1.0.2 (February 20, 2020)[​](#dbt-cloud-v102-february-20-2020 "Direct link to dbt Cloud v1.0.2 (February 20, 2020)") This release contains a number of package upgrades, and a number of bugfixes. 
##### All versions[​](#all-versions-10 "Direct link to All versions") ###### Added[​](#added-13 "Direct link to Added") * Add request context data to logs * Comprehensive logging for git subprocesses ###### Fixed[​](#fixed-27 "Direct link to Fixed") * Fix an issue where the "Cancel Run" button does not work * Fix warnings regarding mutable resource model defaults for jobs and job notifications * Fix bug where users can create multiple connection user credentials through the project setup workflow * Update auth for requests against Github's api from using query parameters to using an Authorization header * Remove unused threads input from deployment environments * Fix issue that prevented user from viewing documentation and data sources * Fix issue rendering code editor panel in the IDE when using Safari * Fix issue with log levels that caused dbt logs to be too chatty ###### Changed[​](#changed-7 "Direct link to Changed") * Update Django version: 2.2.10 * Update Django admin panel version * Update Social Auth version and the related Django component * Update jobs from using account-based resource permissions to project-based resource permissions * Update modal that shows when trials are expired; fix copy for past-due accounts in modal * Replace formatted string logging with structured logging * Move connection and repository settings from account settings to project settings * Update project setup flow to be used for creating projects ###### Removed[​](#removed "Direct link to Removed") None. *** #### dbt Cloud v1.0.1 (February 4, 2020)[​](#dbt-cloud-v101-february-4-2020 "Direct link to dbt Cloud v1.0.1 (February 4, 2020)") This release makes the IDE generally available, and adds two new versions of dbt (0.15.1, 0.15.2). For on-premises customers, there is a new set of configurations in the configuration console: SMTP: You can now configure dbt to send email notifications through your own SMTP server. 
RSA Encryption: You can now provide your own RSA keypair for dbt to use for encryption. These fields need to be specified for your instance of dbt to function properly. ##### All versions[​](#all-versions-11 "Direct link to All versions") ###### Added[​](#added-14 "Direct link to Added") * New Team List page * New Team User Detail page * New Invite User page * New dashboard for Read Only users * New dbt version: 0.15.1 * New dbt version: 0.15.2 * Ability to rename files in Studio IDE * New backend service for project-based resource permissions ###### Fixed[​](#fixed-28 "Direct link to Fixed") * Fix an issue where the user has to repeat steps in the onboarding flow * Fix issue where user can get stuck in the onboarding flow * Fix bug where email notifications could be sent to deleted users * Fix UI bug not allowing user to check "Build on pull request?" when creating a job * Fix UI bug in header of the Edit User page * Fix issue that did not take into account pending invites and license seats when re-sending a user invite. * Fix an issue when processing Github webhooks with unconfigured environments * Fix console warning presented when updating React state from unmounted component * Fix issue where closed tabs would continue to be shown, though the content was removed correctly * Fix issue that prevented opening an adjacent tab when a tab was closed * Fix issue creating BigQuery connections causing the account connections list to not load correctly. 
* Fix for locked accounts that have downgraded to the developer plan at trial end * Fix for not properly showing server error messages on the user invite page ###### Changed[​](#changed-8 "Direct link to Changed") * Deployed a number of Studio IDE visual improvements * Batch logs up every 5 seconds instead of every second to improve database performance * Make `retries` profile configuration for BigQuery connections optional * Support `retries` profile configuration for BigQuery connections (new in dbt v0.15.1) * Replace Gravatar images with generic person icons in the top navbar * Remove deprecated account subscription models * Remove external JS dependencies ###### Removed[​](#removed-1 "Direct link to Removed") * Remove the "read only" role (this is now a "read only" license type) * Remove the "standard" license type * Remove "beta" tag from Studio IDE * Remove unused frontend code (team page/create repository page and related services) ##### Self-Service[​](#self-service "Direct link to Self-Service") ###### Fixed[​](#fixed-29 "Direct link to Fixed") * Fix for locked accounts that have downgraded to the developer plan at trial end ###### Added[​](#added-15 "Direct link to Added") * New Plans page * Add a 14 day free trial * Add the ability to provision a new repository via dbt * New Invite Team step for project setup process for trial accounts ###### Changed[​](#changed-9 "Direct link to Changed") * The "Basic" and "Pro" plans are no longer available. The new "Developer" and "Team" plans are available. * Prorations are now charged immediately, instead of applied to the next billing cycle. * It is no longer possible to downgrade to a plan that does not support the current number of allocated seats. 
* A "Team" plan that has been cancelled will be locked (closed) at the end of the subscription's period ##### On-Premises[​](#on-premises-2 "Direct link to On-Premises") ###### Added[​](#added-16 "Direct link to Added") * Support custom SMTP settings * Support Azure Blob Storage for run logs + artifacts * Optionally disable anonymous usage tracking *** #### dbt Cloud v0.5.0 (December 19, 2019)[​](#dbt-cloud-v050-december-19-2019 "Direct link to dbt Cloud v0.5.0 (December 19, 2019)") This release preps dbt for the general Studio IDE release in January. Beta Studio IDE functionality can be turned on by checking "Develop file system" in the Accounts page in the dbt backend. ##### All versions[​](#all-versions-12 "Direct link to All versions") ###### Added[​](#added-17 "Direct link to Added") * New dbt version: 0.14.2 * New dbt version: 0.14.3 * New dbt version: 0.14.4 * New dbt version: 0.15.0 * New API endpoint: v3/projects * New API endpoint: v3/credentials * New API endpoint: v3/environments * New API endpoint: v3/events * Studio IDE: Add git workflow UI * Studio IDE: Add filesystem management * Studio IDE: Hup the server when files change * Studio IDE: Display server status and task history * Added development and deployment environments and credentials * Support `--warn-error` flag in dbt runs ###### Fixed[​](#fixed-30 "Direct link to Fixed") * Fixed an issue where the run scheduler would hang up when deleting PR schemas * Fixed an issue where the webhook processor would mark a webhook as processed without queuing a run * Fix a bug where SSH tunnels were not created for the Develop Studio IDE * Fix Develop Studio IDE scrolling in Firefox * Fix a bug where requests were timed out too aggressively * Require company name at signup * Fix security issue where IP blacklist could be bypassed using shorthand * Do a better job of handling git errors * Allow users to delete projects ###### Changed[​](#changed-10 "Direct link to Changed") * Move account picker to sidebar * 
Increase require.js timeout from 7s to 30s * Migrate environments to projects * Move some UIs into Account Settings * Make cron scheduling available on the free tier * Apply new styles to Studio IDE * Speed up develop --- ### Changelog 2021 Note: This changelog references dbt versions that are no longer supported and have been removed from the docs. For more information about upgrading to a supported version of dbt in your dbt environment, read [Upgrade dbt version in Cloud](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md). Welcome to the 2021 changelog for the dbt application! You can use this changelog to see highlights of what was new, fixed, and enhanced. #### dbt Cloud v1.1.41 (December 8, 2021)[​](#dbt-cloud-v1141-december-8-2021 "Direct link to dbt Cloud v1.1.41 (December 8, 2021)") It's one of the best weeks of the year - it's [Coalesce](https://coalesce.getdbt.com/)! We'll have some exciting product announcements to share! Did somebody say [metrics](https://coalesce.getdbt.com/talks/keynote-metric-system/) and [dbt Core v1.0](https://coalesce.getdbt.com/talks/dbt-v10-reveal/)?! ###### New products and features[​](#new-products-and-features "Direct link to New products and features") * dbt v1.0 is now available in dbt Cloud... nbd. ###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements "Direct link to Performance improvements and enhancements") * Now whenever you log back into dbt Cloud, you'll return to the account and project that you most recently were working in! 
#### dbt Cloud v1.1.39 (November 10, 2021)[​](#dbt-cloud-v1139-november-10-2021 "Direct link to dbt Cloud v1.1.39 (November 10, 2021)") We shipped environment variables in dbt Cloud. Environment variables create a way to separate code from configuration, allowing you to set config based on context and keep secrets like git tokens securely stored. ###### New products and features[​](#new-products-and-features-1 "Direct link to New products and features") * You can now add environment variables to your dbt project. Why does this matter? Environment variables are a fundamental building block of a dbt project, which, until now, we only enabled in dbt Core. They power many use cases such as cloning private packages, limiting the amount of data that is processed in development environments, changing your data sources depending on the environment, and more. Read about environment variables in our [blog post](https://blog.getdbt.com/introducing-environment-variables-in-dbt-cloud/) or [docs](https://docs.getdbt.com/docs/build/environment-variables.md). #### dbt Cloud v1.1.38 (October 27, 2021)[​](#dbt-cloud-v1138-october-27-2021 "Direct link to dbt Cloud v1.1.38 (October 27, 2021)") Have you used the [Metadata API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) yet? The Metadata API is available to customers on the Team and Enterprise plans, and with it, you can learn tons about your dbt project, if it's running dbt v0.19.0 or later. You can now query information about *any* run, not just the last run of a job. Mo' data, mo' fun! #### dbt Cloud v1.1.37 (October 13, 2021)[​](#dbt-cloud-v1137-october-13-2021 "Direct link to dbt Cloud v1.1.37 (October 13, 2021)") dbt v0.21 is now available in dbt Cloud. The big change with this release is that it introduces the `dbt build` command. `dbt build` logically does everything you'd want to do in your DAG. It runs your models, tests your tests, snapshots your snapshots, and seeds your seeds. 
It does this, resource by resource, from left to right across your DAG. `dbt build` is an opinionated task. It’s the culmination of all we’ve built: running models with resilient materializations, prioritizing data quality with tests, updating fixtures with seeds, capturing slowly changing dimensions with snapshots. Give it a try! ###### New products and features[​](#new-products-and-features-2 "Direct link to New products and features") * We have a new beta feature, which we're calling Model Bottlenecks. It allows you to visually see how long it takes to build models in each run, so you can see clearly which models are taking the longest. If you're interested in learning more, check out #beta-feedback-model-bottlenecks in the dbt community Slack, and we can add you to the beta. #### dbt Cloud v1.1.36 (September 29, 2021)[​](#dbt-cloud-v1136-september-29-2021 "Direct link to dbt Cloud v1.1.36 (September 29, 2021)") Check out the release candidate for `dbt v0.21.0`! Also tab switching in the dbt Cloud IDE now keeps track of your scroll position - at last! ###### Bug fixes[​](#bug-fixes "Direct link to Bug fixes") * Some Redshift customers were experiencing timeouts on runs. We've since fixed this bug by keeping the session alive longer. ###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements-1 "Direct link to Performance improvements and enhancements") * You won't lose track of the code snippets you were looking at when you switch back and forth between tabs in the dbt Cloud IDE, as we now keep track of your scroll position. #### dbt Cloud v1.1.35 (September 15, 2021)[​](#dbt-cloud-v1135-september-15-2021 "Direct link to dbt Cloud v1.1.35 (September 15, 2021)") Have you ever been working in the Studio IDE, taken a several-hour break from developing, and when you returned to your work, the Studio IDE started behaving in unexpected ways? Your develop session became inactive, without any notification. 
Well, that silent failure won’t happen anymore! dbt will now let you know when you have to refresh your Studio IDE so you can continue to pick up work where you last left off. ###### New products and features[​](#new-products-and-features-3 "Direct link to New products and features") * dbt v0.20.2 is released in dbt Cloud. ###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements-2 "Direct link to Performance improvements and enhancements") * Set default threads to 4 for new jobs and in development credentials. ###### Bug fixes[​](#bug-fixes-1 "Direct link to Bug fixes") * The user is now prompted to refresh the page when in a disconnected Studio IDE state. * dbt tasks that fail or error are now correctly ordered in the run drawer history. #### dbt Cloud v1.1.34 (September 1, 2021)[​](#dbt-cloud-v1134-september-1-2021 "Direct link to dbt Cloud v1.1.34 (September 1, 2021)") We just launched our beta for supporting environment variables in dbt Cloud. Environment variables are exciting because they allow you to clone private packages. If you’re interested in joining the beta, check out the #beta-feedback-for-env-vars channel in dbt Slack for more information. ###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements-3 "Direct link to Performance improvements and enhancements") Our Studio IDE SQL drawer got a fresh new look, and it now has improved accessibility. #### dbt Cloud v1.1.33 (August 18, 2021)[​](#dbt-cloud-v1133-august-18-2021 "Direct link to dbt Cloud v1.1.33 (August 18, 2021)") We added a DAG in the Studio IDE, so that you can see your model dependencies as you develop! If you haven’t seen the DAG visualization yet, take a moment to spin up the Studio IDE, navigate to the Lineage tab, and click-click-click around in there — it is legitimately a brand new modality for developing dbt projects, and it’s something worth being excited about! 
###### New products and features[​](#new-products-and-features-4 "Direct link to New products and features") * [Dashboard Status Tiles](https://docs.getdbt.com/docs/explore/data-tile.md) can now be embedded on dashboards (or anywhere you can embed an iFrame) to give immediate insight into data freshness and quality. This helps dbt project maintainers build trust internally about the data that end users are seeing. * We shipped the DAG in the Studio IDE to GA! * Support for `dbt v0.20.1` in Cloud. ###### Bug fixes[​](#bug-fixes-2 "Direct link to Bug fixes") * Databricks users will now be able to see and update the token/schema for deployment environments. * Some GitHub users were experiencing a broken profile image in dbt. This should be fixed if users disconnect and reconnect their GitHub accounts. #### dbt Cloud v1.1.32 (August 4, 2021)[​](#dbt-cloud-v1132-august-4-2021 "Direct link to dbt Cloud v1.1.32 (August 4, 2021)") The Metadata API is now in GA! When dbt invokes certain commands like run, test, seed, etc, dbt generates metadata in the form of [artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md). These artifacts give you tons of information about project setup, run times, test details, compiled SQL, and so much more. Now dbt serves a GraphQL API which supports arbitrary queries over these artifacts, so you can retrieve the metadata you want almost instantaneously. ###### New products and features[​](#new-products-and-features-5 "Direct link to New products and features") * The Metadata API is the start of our metadata product suite. Learn more about how to use the Metadata API [here](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md). * dbt Enterprise customers using GitHub now get better fine-grained access control in their dbt projects. dbt will enforce git permissions for every developer to ensure that read/write policies in GitHub carry through to the IDE. 
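The GraphQL API described above is queried with a single HTTP POST. Here is a minimal sketch; the job ID, the selected fields, and the `DBT_CLOUD_SERVICE_TOKEN` variable are illustrative assumptions, so check the Discovery API schema reference for the current endpoint and field names:

```shell
# Build the GraphQL payload first so it can be inspected before sending.
# Field names (uniqueId, status, executionTime) are assumptions for illustration.
GRAPHQL_QUERY='{ models(jobId: 12345) { uniqueId status executionTime } }'
PAYLOAD=$(printf '{"query": "%s"}' "$GRAPHQL_QUERY")
echo "$PAYLOAD"

# Send it (requires a real service token with metadata access):
# curl -s "https://metadata.cloud.getdbt.com/graphql" \
#   -H "Authorization: Bearer $DBT_CLOUD_SERVICE_TOKEN" \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```

Because the API accepts arbitrary queries over the run artifacts, the same request shape works for sources, tests, and other resources; only the query body changes.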
#### dbt Cloud v1.1.31 (July 21, 2021)[​](#dbt-cloud-v1131-july-21-2021 "Direct link to dbt Cloud v1.1.31 (July 21, 2021)") We’ve improved the tabbing experience in the Studio IDE. Tabs now work much more intuitively, and you don’t have to worry about losing your work anymore! ###### New products and features[​](#new-products-and-features-6 "Direct link to New products and features") * We are working to release a DAG directly in the IDE, so that when you’re developing, you have a clear idea of where the model you’re working on sits in the dependency graph. If you’re interested in testing out the feature early, head over to the `#beta-feedback-for-ide-dag` channel in the dbt Slack, and we’ll get the new product feature-flagged on your account! * Added dbt 0.20.0 to Cloud ###### Bug fixes[​](#bug-fixes-3 "Direct link to Bug fixes") * Users will now be able to initialize any project that doesn't contain a `dbt_project.yml` file, regardless of whether or not there are pre-existing files and/or commits to that repo. ###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements-4 "Direct link to Performance improvements and enhancements") * We've been working on some nice improvements to tabs in our Studio IDE. We’ve fixed deficiencies with tabs that caused users to lose work if they didn’t hit save regularly enough. Additionally, opening, closing, and the order of the tabs work much more smoothly. * You may have noticed that there is now a source freshness checkbox in your execution settings when you configure a job on dbt Cloud. Selecting this checkbox will run `dbt source freshness` as the first step in your job, but it will not break subsequent steps if it fails. Updated source freshness documentation available [here](https://docs.getdbt.com/docs/deploy/source-freshness.md). 
* Added a new endpoint to allow API key rotation via `POST https://cloud.getdbt.com/api/v2/users/{user-id}/apikey` #### dbt Cloud v1.1.30 (July 7, 2021)[​](#dbt-cloud-v1130-july-7-2021 "Direct link to dbt Cloud v1.1.30 (July 7, 2021)") We shipped a resizable folder pane in the Studio IDE, and we're hearing great things! "My quality of life has greatly increased with this little update!" Hope this helps everyone else enjoy the Studio IDE a little more too. ###### New products and features[​](#new-products-and-features-7 "Direct link to New products and features") * Resizable folder pane in the Studio IDE: Have you ever developed in the Studio IDE and not been able to see the full name of your model because you couldn’t adjust the width of the file pane? Yeah, us too. Now you’ll be able to adjust your project’s file tree width to be as wide or as narrow as you’d like. It’s these small things that make developing in the Studio IDE so much easier. ###### Bug fixes[​](#bug-fixes-4 "Direct link to Bug fixes") * Made some changes to GitLab webhooks so that the status of the dbt run gets properly updated in GitLab. * Resolved an issue where users saw a blank screen rather than the SSO reauthentication page. ###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements-5 "Direct link to Performance improvements and enhancements") * Refreshed the design of the repository import page. #### dbt Cloud v1.1.29 (June 23, 2021)[​](#dbt-cloud-v1129-june-23-2021 "Direct link to dbt Cloud v1.1.29 (June 23, 2021)") We're heads down working on a handful of new features that we're going to share at the end of this month. The finish line is in sight. In the meantime, check out our latest release candidates for dbt Core. The biggest changes are better tests, providing consistency, configurability, and persistence. Additionally, we've refactored partial parsing and introduced an experimental parser; both are set to off by default. 
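Both of the opt-in behaviors mentioned above were enabled via CLI flags at the time. A sketch, assuming the flag spellings as documented for the dbt v0.19.x/v0.20.x release candidates (verify against your version's reference before relying on them):

```shell
# Opt in to partial parsing: reuse the previous parse of unchanged project files.
dbt --partial-parse run

# Opt in to the experimental parser for faster static parsing of simple models
# (flag name as introduced alongside the v0.20.0 release candidate).
dbt --use-experimental-parser run
```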
###### New products and features[​](#new-products-and-features-8 "Direct link to New products and features") * Add support for latest Core release candidates to dbt: v0.19.2-rc2 and v0.20.0-rc1 ###### Bug fixes[​](#bug-fixes-5 "Direct link to Bug fixes") * Add a safeguard for the SSO reauth page to avoid 401 interceptors ###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements-6 "Direct link to Performance improvements and enhancements") * Ensure navigation bar is in dark mode when Studio IDE is set to dark mode #### dbt Cloud v1.1.28 (June 9, 2021)[​](#dbt-cloud-v1128-june-9-2021 "Direct link to dbt Cloud v1.1.28 (June 9, 2021)") We shipped a far better experience for GitLab users. Be sure to check out new CI features that are now available for customers using GitLab. Additionally, all developers should test out Slim CI which will speed up their model builds. ###### New products and features[​](#new-products-and-features-9 "Direct link to New products and features") * `Slim CI`: We’ve made Slim CI available for all our cloud customers! With Slim CI, you don't have to rebuild and test all your models; you can instruct dbt Cloud to run jobs on only modified or new resources. If you are a GitHub or GitLab user, try creating a new job that runs on pull requests and you can signal to dbt to run only on these modified resources by including the `state:modified+` argument. Read more about Slim CI [here](https://docs.getdbt.com/docs/deploy/continuous-integration.md). * Native GitLab authentication for dbt Developer and Team Tiers: We’ve shipped native GitLab auth into GA. You can now import new GitLab repos with a couple clicks, trigger CI builds when Merge Requests are opened in GitLab, and carry GitLab permissions through to Studio IDE's git actions. Read how to set up native GitLab auth [here](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md). 
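The `state:modified+` selector mentioned above compares your current project against a manifest from a previous production run. A sketch of what a Slim CI job step looked like in this era; the artifact path is a placeholder assumption, since dbt Cloud supplies the comparison state automatically when a job is configured to run on pull requests:

```shell
# Build and test only new or changed models, plus everything downstream of
# them ("+"), deferring unchanged upstream references to production objects.
# "path/to/prod-artifacts" stands in for a directory holding a previous
# run's manifest.json.
dbt run --models state:modified+ --defer --state path/to/prod-artifacts
dbt test --models state:modified+ --defer --state path/to/prod-artifacts
```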
###### Bug fixes[​](#bug-fixes-6 "Direct link to Bug fixes") * Allow users to select artifacts from a job that runs source freshness on jobs with the source freshness execution settings set to `ON`. * Resolve `RUN ONLY ON CUSTOM BRANCH?` button to toggle on and off properly. * Retain information in a `Statement` tab when the page is refreshed. * Unsaved changes in the Studio IDE are now saved when committing work. * Drop temporary schemas in the data warehouse for closed or merged GitLab merge requests. ###### Performance improvements and enhancements[​](#performance-improvements-and-enhancements-7 "Direct link to Performance improvements and enhancements") * Behind the scenes, we’ve been moving off of Angular and onto React. We’ve started the process of migrating the central pieces of our UI over - the first of which is the main navigation. We think this will have a big impact on our ability to reduce UI bugs and improve user experience. * Added support for dbt 0.19.2rc2 + 0.20.0rc1 in dbt. #### dbt Cloud v1.1.27 (May 26, 2021)[​](#dbt-cloud-v1127-may-26-2021 "Direct link to dbt Cloud v1.1.27 (May 26, 2021)") A lot of improvements coming for GitLab webhooks and native auth. We also fixed a number of bugs in the Studio IDE. Our goal is for you to never see an infinite spinner again! 
###### Enhancements[​](#enhancements "Direct link to Enhancements") * Add dbt v0.19.2rc1 and v0.20.0b1 * Add an open/closable overlay for the DAG * Disable department dropdown * Add DAG flags, button, and tab context * Add run source freshness option to jobs * Implement conditional redirecting after GitLab app integration * Add Develop Pod Support for Rook and Ceph file storage * Show all common actions for valid top level commands ###### Fixed[​](#fixed "Direct link to Fixed") * Fix link to documentation * Disable the "Restart Studio IDE" button while the Studio IDE is loading * Continue canceling runs when we run into deleted accounts * Fix SSO re-auth page * Fix blank verify email page * Resolve git refresh regression * Fix missing "Run on Merge" button in Job creation/edit form * Warn users they have unsaved changes * Update test command suggestions and regex for common action suggestions * Update order of stylesheet import to fix missing border bug * Fix GitLab PR link for Run Page * Fix infinite spinner for missing environment or development credentials * Fix infinite spinner when user is missing dev credentials * Do not try to push if awaiting a merge * Fix deleting schemas * Fix favicon reference #### dbt Cloud v1.1.26 (May 12, 2021)[​](#dbt-cloud-v1126-may-12-2021 "Direct link to dbt Cloud v1.1.26 (May 12, 2021)") If you haven't seen it yet, spin up the Studio IDE: the command bar now has recent actions (you can up-arrow like on the command line) as well as some hardcoded suggestions that will auto-populate your active model, if there is one. Check it out! Other fixes and adjustments as well, as we all get ready for Staging this Thursday - exciting week for the Product org over at ol' Fishtown! 
###### Enhancements[​](#enhancements-1 "Direct link to Enhancements") * Made dbt v0.19.1 the default version for environments * Rolled out new command line experience to all customers * Post webhook-triggered run status back to GitLab * Temporary tabs can also populate the model from manifest * Check command line content is minimally valid * Allow user to restart server when develop pod crashes * Prevent overflow of menu items ###### Fixed[​](#fixed-1 "Direct link to Fixed") * Handle validation error for improper remote URLs in the Scheduler * Refactor exception logging out of GitRepo and into exception handlers * Required tags returning null from core no longer causing infinite spinner * Removed deleted repos while fetching repository for sending commit statuses * Refactor git provider service * Resolve files with special characters becoming forever dirty * Disable input when RPC command running & add button when command bar is empty * Updated the Cancel/Enter button on the command line * Fix connection setup to always use the project referenced in the route * Fix "View data sources" URL in environment page * Add support for clicking on previously run commands and updating the text inside of the command line * Fix sources URL in environments page * Fix metadata token not allowed API response #### dbt Cloud v1.1.25 (April 28, 2021)[​](#dbt-cloud-v1125-april-28-2021 "Direct link to dbt Cloud v1.1.25 (April 28, 2021)") Exciting things coming down the pipe - ongoing enhancements to the command bar experience in the Studio IDE, doing some work to ensure that more git providers are presented with a first class experience in Cloud, as well as assorted bug fixes - "I must have bug fixes, always and always" - that was Monet, I think ###### Enhancements[​](#enhancements-2 "Direct link to Enhancements") * Made a grip of visual updates to the new command bar work * Moved to using the active model name instead of a placeholder in command bar work * Added user ability to delete 
connections, remove association from a given project. * Added verification of dbt version for command bar beta feature flag ###### Fixed[​](#fixed-2 "Direct link to Fixed") * Removed testing prop that keeps drawer open * Added double encoding to handle Snowflake roles with spaces * Fixed account switching in user notifications * Handled invalid Azure SSO group responses * Fixed error which only showed common actions when run drawer was closed * Allowed unencrypted adapter fields to be edited * Fixed bugs with file and folder renaming, alongside associated tab state #### dbt Cloud v1.1.24 (April 14, 2021)[​](#dbt-cloud-v1124-april-14-2021 "Direct link to dbt Cloud v1.1.24 (April 14, 2021)") Phew! As our company grows, so too does our changelog! Look at all these! The big chunks you'll see here are related to some ongoing in-Studio IDE work, focused on the command bar experience, as well as some partner & connection work (see the Gits, Databricks, and so forth), and of course ongoing longer-term bets around metadata! ###### Enhancements[​](#enhancements-3 "Direct link to Enhancements") * Added onFocus and onBlur properties to populate and remove "dbt" in command bar * Enabled executing command on Enter if user's cursor is in the command bar * Added Metadata API access button to account settings * Added feature flag for displaying only recent actions * Added dbt 0.19.1 * Added regex validation to the Databricks hostname web-form field * Updated Connection Edit to allow adapter editing * Enabled self-service GitHub and GitLab integration disconnection * Added link to docs for license map & handle duplicate error gracefully * Moved deferred job execution to execution settings. 
* Recorded user command history * Enabled new file creation flow ###### Fixed[​](#fixed-3 "Direct link to Fixed") * Added styling class to popup to ensure text is readable * Fixed sourcemaps syntax for dev commands * Added timeout and retry to dbt deps * Updated Databricks schema field type and added error handling to ConnectionSetup * Fixed BigQuery private keys & converted text to textarea * Fixed last used datetime in the service token UI * Added missing token URI to BigQuery connection edit * Prevent multiple develop sessions for one user * Fixed SchemaForm validating non-displayed fields * Fixed required fields for BigQuery connection JSON uploads * Fixed self selection as deferred job * Always create a Monaco model on tab open if no matching model exists * Fixed tab dirty indicator on open tab * Fixed password reset flow * Fixed docs and sources links in dashboard page for read only users * Fixed truncating `first_name` to 30 characters #### dbt Cloud v1.1.23 (March 31, 2021)[​](#dbt-cloud-v1123-march-31-2021 "Direct link to dbt Cloud v1.1.23 (March 31, 2021)") Some backend work, some frontend work, some bug fixes: a nice mix for this release. A few user facing changes you may have noticed already are the persistence of dark/light mode settings across refresh (no more blinding Studio IDE!), branches in the Studio IDE being categorized by Active vs. Removed from Remote, and a tidier new file creation flow, with the file tree expanding to show the new file and opening a new tab to populate said file! 
###### Enhancements[​](#enhancements-4 "Direct link to Enhancements") * Splitting Local-only and Remote branches into different sections of the dropdown selector * Update Profile Integrations to include SSO info * Upgrade to Tailwind 2.0 and FUI 0.0.5 * Allow users to create metadata tokens from the UI * Support manually-managed group memberships * SSO: resolve bug w/ first & last names acting up * Integrate Delighted for NPS surveys * Add dbt 0.19.1rc1 to Cloud * Add an account-level setting to require users to re-authenticate via SSO * Read-only metadata ServiceToken for Cloud * Persist Studio IDE light mode / dark mode across refresh * Categorize & order git branches * Improve new file creation flow ###### Fixed[​](#fixed-4 "Direct link to Fixed") * Check for an empty repository before checking matching remote * Increase wait if run was finished recently * Support default branches through git when a custom branch is not specified * Don't download logs for skipped steps * API Gateway is no longer flooded with errors due to Studio IDE blindly polling dead Develop pod * Fix user license creation via admin interface * Adjusted addition of global .gitignore #### dbt Cloud v1.1.22 (March 17, 2021)[​](#dbt-cloud-v1122-march-17-2021 "Direct link to dbt Cloud v1.1.22 (March 17, 2021)") Rolling out a few long-term bets to ensure that our beloved dbt does not fall over for want of memory, as well as a grip of bug fixes and error messaging improvements (error messages should be helpful, not scolding or baffling, after all!) 
###### Enhancements[​](#enhancements-5 "Direct link to Enhancements") * Release Scribe to 100% of multi-tenant accounts * Update language for SQL drawer empty state * Reduce Scribe memory usage ###### Fixed[​](#fixed-5 "Direct link to Fixed") * Fix NoSuchKey error * Guarantee unique notification settings per account, user, and type * Fix for account notification settings * Don't show deleted projects on notifications page * Fix unicode error while decoding last\_chunk * Show more relevant errors to customers * Groups are now editable by non-sudo requests * Normalize domain names across inputs/outputs * Redirect auth failed errors back to appropriate page with error description #### dbt Cloud v1.1.21 (March 3, 2021)[​](#dbt-cloud-v1121-march-3-2021 "Direct link to dbt Cloud v1.1.21 (March 3, 2021)") This changelog wraps up work on what we've been calling the SQL Drawer in the Studio IDE - some design nudges, some interface adjustments, overall a cleaner and snappier experience. If you haven't dipped into the Studio IDE in a while it's worth taking a look! Some back-end work as well, making SSO and role based admin easier and more broadly available for Enterprise level folks, along with your usual assortment of bug squashes and iterations. 
###### Enhancements[​](#enhancements-6 "Direct link to Enhancements") * Styling and copy adjustments in the Cloud Studio IDE * Open self-service role based access control to all Enterprise customers * Update AuthProvider UI to enable SAML and Okta * Add a SAML auth redirect URL ###### Fixed[​](#fixed-6 "Direct link to Fixed") * Add param to admin project mapper to include soft-deleted projects * Fix delaying logs when we are waiting for a model to finish executing * Saving GSuite auth provider form triggers an authorize * Scribe populates truncated debug logs when runs are executing * Delay attempts for non-200 status codes * Add logic to support select fields in adapter UI * Undo clobbering groups #### dbt Cloud v1.1.20 (February 17, 2021)[​](#dbt-cloud-v1120-february-17-2021 "Direct link to dbt Cloud v1.1.20 (February 17, 2021)") Continued stability and quality of life improvements for folks with multiple accounts and projects - no longer will you have to remember the chronological order of birth of your accounts and projects, as they'll be ordered by the much easier to parse (for human brains anyway) alphabetical order. We're also shipping some experience improvements in the SQL Drawer at the bottom half of the Studio IDE. ###### Enhancements[​](#enhancements-7 "Direct link to Enhancements") * Deleted Info and Logs Studio IDE tabs; logs will now be displayed in the Results tab * Removed service token feature flag * List Jobs dropdown in alphabetical order * List Account and Project dropdowns in alphabetical order * Pre-join Job Definition results to speed up scheduler * Combine scheduler queries to speed up runtime by about 30% ###### Fixed[​](#fixed-7 "Direct link to Fixed") * Fix issue with source freshness for 0.19.0 #### dbt Cloud v1.1.19 (February 3, 2021)[​](#dbt-cloud-v1119-february-3-2021 "Direct link to dbt Cloud v1.1.19 (February 3, 2021)") The latest release of dbt (Oh Nineteen Oh) is now available for your enjoyment on dbt Cloud! 
We're also releasing some service token pieces here, though they're not quite ready for wide release yet. Moving forward, Oh Nineteen Oh will probably end up being the minimum version required to run the Metadata API & Metadata Toolkit, so this is a big release! ###### Enhancements[​](#enhancements-8 "Direct link to Enhancements") * Added dbt 0.19.0 😻 * Allowed account-wide service tokens to create connections * Added integration for service token UI and API * Authorized requests that supply a service token ###### Fixed[​](#fixed-8 "Direct link to Fixed") * Added logic to show the entered service token name prior to the request completing * Fixed endlessly running rpc queries with non-working cancel button on Studio IDE refresh #### dbt Cloud v1.1.18 (January 20, 2021)[​](#dbt-cloud-v1118-january-20-2021 "Direct link to dbt Cloud v1.1.18 (January 20, 2021)") Most notable things here are around foundational work toward future feature releases, as well as strong assurances of future stability for dbt, and ensuring future sales tax compliance (which we understand turns out to be quite important!) - quite a future-looking release! 
###### Enhancements[​](#enhancements-9 "Direct link to Enhancements") * Add service tokens UI (stubbed) behind a feature flag * Fixing and Upgrading social-auth * Add dbt Spark 0.19.0rc1 * Adds the reconciliation of persisted file content and tab state when navigating into the Studio IDE * Adds the reconciliation of persisted file content and tab state between Studio IDE sessions * Read logs from scribe and stop logging to db * Upgrade social auth 3.3.3 * Add warning logs for social auth failures * Add dbt 0.19.0rc1 ###### Fixed[​](#fixed-9 "Direct link to Fixed") * Prevent social-auth from updating first or last name * Page through Stripe results when listing subscriptions * Prevent enqueueing runs in deleted projects * Fix Studio IDE git actions causing open tab contents to be lost on Studio IDE re-entry * Add DBT\_CLOUD\_CONTEXT environment variable * Add logic to hide IP whitelist message for on-prem customers * fix 0.19.0rc1 run image dependencies --- ### Column-level lineage [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Catalog now offers column-level lineage (CLL) for the resources in your dbt project. Analytics engineers can quickly and easily gain insight into the provenance of their data products at a more granular level. For each column in a resource (model, source, or snapshot) in a dbt project, Catalog provides end-to-end lineage for the data in that column given how it's used. CLL is available to all dbt Enterprise plans that can use Catalog. 
[![Overview of column level lineage](/img/docs/collaborate/dbt-explorer/example-overview-cll.png?v=2 "Overview of column level lineage")](#)Overview of column level lineage On-demand learning If you enjoy video courses, check out our [dbt Catalog on-demand course](https://learn.getdbt.com/courses/dbt-catalog) and learn how to best explore your dbt project(s)! #### Access the column-level lineage[​](#access-the-column-level-lineage "Direct link to Access the column-level lineage") There is no additional setup required for CLL if your account is on an Enterprise plan that can use Catalog. You can access the CLL by expanding the column card in the **Columns** tab of a Catalog [resource details page](https://docs.getdbt.com/docs/explore/explore-projects.md#view-resource-details) for a model, source, or snapshot. dbt updates the lineage in Catalog after each run that's executed in the production or staging environment. At least one job in the production or staging environment must run `dbt docs generate`. Refer to [Generating metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata) for more details. [![Example of the Columns tab and where to expand for the CLL](/img/docs/collaborate/dbt-explorer/example-cll.png?v=2 "Example of the Columns tab and where to expand for the CLL")](#)Example of the Columns tab and where to expand for the CLL #### Column evolution lens[​](#column-lens "Direct link to Column evolution lens") You can use the column evolution lens to determine when a column is transformed vs. reused (passthrough or rename). The lens helps you distinguish when and how a column is actually changed as it flows through your dbt lineage, informing debugging workflows in particular. 
[![Example of the Column evolution lens](/img/docs/collaborate/dbt-explorer/example-evolution-lens.png?v=2 "Example of the Column evolution lens")](#)Example of the Column evolution lens ##### Inherited column descriptions[​](#inherited-column-descriptions "Direct link to Inherited column descriptions") A reused column, labeled as **Passthrough** or **Rename** in the lineage, automatically inherits its description from the source and upstream model columns. The inheritance goes as far back as possible. As long as the column isn't transformed, you don't need to manually define the description; it'll automatically propagate downstream. Passthrough and rename columns are clearly labeled and color-coded in the lineage. In the following `dim_salesforce_accounts` model example (located at the end of the lineage), the description for a column inherited from the `stg_salesforce__accounts` model (located second from the left) indicates its origin. This helps developers quickly identify the original source of the column, making it easier to know where to make documentation changes. [![Example of lineage with propagated and inherited column descriptions.](/img/docs/collaborate/dbt-explorer/example-prop-inherit.jpg?v=2 "Example of lineage with propagated and inherited column descriptions.")](#)Example of lineage with propagated and inherited column descriptions. #### Column-level lineage use cases[​](#use-cases "Direct link to Column-level lineage use cases") Learn more about why and how you can use CLL in the following sections. ##### Root cause analysis[​](#root-cause-analysis "Direct link to Root cause analysis") When there is an unexpected breakage in a data pipeline, column-level lineage can be a valuable tool to understand the exact point where the error occurred in the pipeline. For example, a failing data test on a particular column in your dbt model might've stemmed from an untested column upstream. Using CLL can help quickly identify and fix breakages when they happen. 
##### Impact analysis[​](#impact-analysis "Direct link to Impact analysis") During development, analytics engineers can use column-level lineage to understand the full scope of the impact of their proposed changes. This knowledge empowers them to create higher-quality pull requests that require fewer edits, as they can anticipate and preempt issues that would've gone unchecked without column-level insights. ##### Collaboration and efficiency[​](#collaboration-and-efficiency "Direct link to Collaboration and efficiency") When exploring your data products, column-level lineage allows analytics engineers and data analysts to more easily trace and understand the origin and usage of their data, enabling them to make better decisions with higher confidence. #### Caveats[​](#caveats "Direct link to Caveats") Refer to the following CLL caveats or limitations as you navigate Catalog. ##### Column usage[​](#column-usage "Direct link to Column usage") Column-level lineage reflects the lineage from `select` statements in your models' SQL code. It doesn't reflect other usage like joins and filters. ##### SQL parsing[​](#sql-parsing "Direct link to SQL parsing") Column-level lineage relies on SQL parsing. Errors can occur when parsing fails or a column's origin is unknown (like with JSON unpacking, lateral joins, and so on). In these cases, lineage may be incomplete and dbt will provide a warning about it in the column lineage. [![Example of warning in the full lineage graph](/img/docs/collaborate/dbt-explorer/example-parsing-error-pill.png?v=2 "Example of warning in the full lineage graph")](#)Example of warning in the full lineage graph To review the error details: 1. Click the **Expand** icon in the upper right corner to open the column's lineage graph 2. Select the node to open the column’s details panel Possible error cases are: * **Parsing error** — Error occurs when the SQL is ambiguous or too complex for parsing. 
An example of an ambiguous parsing scenario is a *complex* lateral join. * **Python error** — Error occurs when a Python model is used within the lineage. Due to the nature of Python models, it's not possible to parse and determine the lineage. * **Unknown error** — Error occurs when the lineage can't be determined for an unknown reason. An example of this would be if a dbt best practice is not being followed, like using hardcoded table names instead of `ref` statements. --- ### Community adapters Community adapters are adapter plugins contributed and maintained by members of the community. We welcome and encourage [adapter plugin contributions](https://docs.getdbt.com/docs/contribute-core-adapters.md#contribute-to-a-pre-existing-adapter) from the dbt community. Please be mindful that these [community maintainers](https://docs.getdbt.com/docs/connect-adapters.md#maintainers) are intrepid volunteers who donate their time and effort — so be kind, understanding, and help out where you can! 
Refer to the following table for the available community adapters and their respective adapter setup guide: | Data platforms | | | | ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- | | [CrateDB](https://docs.getdbt.com/docs/local/connect-data-platform/cratedb-setup.md) | [Databend Cloud](https://docs.getdbt.com/docs/local/connect-data-platform/databend-setup.md) | [DeltaStream](https://docs.getdbt.com/docs/local/connect-data-platform/deltastream-setup.md) | | [Doris & SelectDB](https://docs.getdbt.com/docs/local/connect-data-platform/doris-setup.md) | [DuckDB](https://docs.getdbt.com/docs/local/connect-data-platform/duckdb-setup.md) | [Extrica](https://docs.getdbt.com/docs/local/connect-data-platform/extrica-setup.md) | | [Hive](https://docs.getdbt.com/docs/local/connect-data-platform/hive-setup.md) | [Hologres](https://docs.getdbt.com/docs/local/connect-data-platform/hologres-setup.md) | [IBM DB2](https://docs.getdbt.com/docs/local/connect-data-platform/ibmdb2-setup.md) | | [IBM watsonx.data - Spark](https://docs.getdbt.com/docs/local/connect-data-platform/watsonx-spark-setup.md) | [Impala](https://docs.getdbt.com/docs/local/connect-data-platform/impala-setup.md) | [Infer](https://docs.getdbt.com/docs/local/connect-data-platform/infer-setup.md) | | [iomete](https://docs.getdbt.com/docs/local/connect-data-platform/iomete-setup.md) | [MaxCompute](https://docs.getdbt.com/docs/local/connect-data-platform/maxcompute-setup.md) | [MindsDB](https://docs.getdbt.com/docs/local/connect-data-platform/mindsdb-setup.md) | | [MySQL](https://docs.getdbt.com/docs/local/connect-data-platform/mysql-setup.md) | [RisingWave](https://docs.getdbt.com/docs/local/connect-data-platform/risingwave-setup.md) | 
[Rockset](https://docs.getdbt.com/docs/local/connect-data-platform/rockset-setup.md) | | [SingleStore](https://docs.getdbt.com/docs/local/connect-data-platform/singlestore-setup.md) | [SQL Server & Azure SQL](https://docs.getdbt.com/docs/local/connect-data-platform/mssql-setup.md) | [SQLite](https://docs.getdbt.com/docs/local/connect-data-platform/sqlite-setup.md) | | [Starrocks](https://docs.getdbt.com/docs/local/connect-data-platform/starrocks-setup.md) | [TiDB](https://docs.getdbt.com/docs/local/connect-data-platform/tidb-setup.md) | [TimescaleDB](https://dbt-timescaledb.debruyn.dev/) | | [Upsolver](https://docs.getdbt.com/docs/local/connect-data-platform/upsolver-setup.md) | [Vertica](https://docs.getdbt.com/docs/local/connect-data-platform/vertica-setup.md) | [Watsonx-Presto](https://docs.getdbt.com/docs/local/connect-data-platform/watsonx-presto-setup.md) | | [Yellowbrick](https://docs.getdbt.com/docs/local/connect-data-platform/yellowbrick-setup.md) | | | --- ### Configure incremental models Learn how to configure and optimize incremental models when developing in dbt. Snowflake column size change [Snowflake plans to increase](https://docs.snowflake.com/en/release-notes/bcr-bundles/un-bundled/bcr-2118) the default column size for string and binary data types in May 2026. `dbt-snowflake` versions below v1.10.6 may fail to build certain incremental models when this change is deployed.  
Assess impact and required actions If you're using a `dbt-snowflake` version below v1.10.6 or have not yet migrated to a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) in the dbt platform, your adapter version is incompatible with this change and may fail to build incremental models that meet *both* of the following conditions: * Contain string columns with collation defined * Use the `on_schema_change='sync_all_columns'` config To check whether this change affects your project, run the following [list](https://docs.getdbt.com/reference/commands/list.md) command: ```bash dbt ls -s config.materialized:incremental,config.on_schema_change:sync_all_columns --resource-type model ``` * If the command returns `No nodes selected!`, no action is required. * If the command returns one or more models (for example, `Found 1000 models, 644 macros`), you may be impacted if those models have string columns that don't specify a width. In that case, upgrade to a version that includes the fix: * **dbt Core**: `dbt-snowflake` v1.10.6 or later. For upgrade instructions, see [Upgrade adapters](https://docs.getdbt.com/docs/local/install-dbt.md#upgrade-adapters). * **dbt platform**: Any release track (Latest, Compatible, Extended, or Fallback). * **dbt Fusion engine**: v2.0.0-preview.147 or higher. This ensures your incremental models can safely handle schema changes while maintaining required collation settings. Incremental models are built as tables in your data warehouse. The first time a model is run, the table is built by transforming *all* rows of source data. On subsequent runs, dbt transforms *only* the rows in your source data that you tell dbt to filter for, inserting them into the target table, which is the table that has already been built. Often, the rows you filter for on an incremental run will be the rows in your source data that have been created or updated since the last time dbt ran. 
As such, on each dbt run, your model gets built incrementally. Using an incremental model limits the amount of data that needs to be transformed, vastly reducing the runtime of your transformations. This improves warehouse performance and reduces compute costs. #### Configure incremental materializations[​](#configure-incremental-materializations "Direct link to Configure incremental materializations") Like the other materializations built into dbt, incremental models are defined with `select` statements, with the materialization defined in a config block. ```sql {{ config( materialized='incremental' ) }} select ... ``` To use incremental models, you also need to tell dbt: * How to filter the rows on an incremental run * The unique key of the model (if any) ##### Understand the is\_incremental() macro[​](#understand-the-is_incremental-macro "Direct link to Understand the is_incremental() macro") The `is_incremental()` macro powers incremental materializations. It will return `True` if *all* of the following conditions are met: * The model must already exist as a table in the database * The `--full-refresh` flag *is not* passed * The running model is configured with `materialized='incremental'` Note that the SQL in your model needs to be valid whether `is_incremental()` evaluates to `True` or `False`. ##### Filtering rows on an incremental run[​](#filtering-rows-on-an-incremental-run "Direct link to Filtering rows on an incremental run") To tell dbt which rows it should transform on an incremental run, wrap valid SQL that filters for these rows in the `is_incremental()` macro. Often, you'll want to filter for "new" rows, as in, rows that have been created since the last time dbt ran this model. The best way to find the timestamp of the most recent run of this model is by checking the most recent timestamp in your target table. 
dbt makes it easy to query your target table by using the "[{{ this }}](https://docs.getdbt.com/reference/dbt-jinja-functions/this.md)" variable. Also common is wanting to capture both new and updated records. For updated records, you'll need to [define a unique key](#defining-a-unique-key-optional) to ensure you don't bring in modified records as duplicates. Your `is_incremental()` code will check for rows created *or modified* since the last time dbt ran this model. For example, a model that includes a computationally slow transformation on a column can be built incrementally, as follows: models/stg\_events.sql ```sql {{ config( materialized='incremental' ) }} select *, my_slow_function(my_column) from {{ ref('app_data_events') }} {% if is_incremental() %} -- this filter will only be applied on an incremental run -- (uses >= to include records whose timestamp occurred since the last run of this model) -- (If event_time is NULL or the table is truncated, the condition will always be true and load all records) where event_time >= (select coalesce(max(event_time),'1900-01-01') from {{ this }} ) {% endif %} ``` Optimizing your incremental model For more complex incremental models that make use of Common Table Expressions (CTEs), you should consider the impact of the position of the `is_incremental()` macro on query performance. In some warehouses, filtering your records early can vastly improve the run time of your query! ##### About incremental\_predicates[​](#about-incremental_predicates "Direct link to About incremental_predicates") `incremental_predicates` is an advanced use of incremental models, where data volume is large enough to justify additional investments in performance. This config accepts a list of any valid SQL expression(s). dbt does not check the syntax of the SQL statements. 
This is an example of a model configuration in a `yml` file you might expect to see on Snowflake: ```yml models: - name: my_incremental_model config: materialized: incremental unique_key: id # this will affect how the data is stored on disk, and indexed to limit scans cluster_by: ['session_start'] incremental_strategy: merge # this limits the scan of the existing table to the last 7 days of data incremental_predicates: ["DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)"] # `incremental_predicates` accepts a list of SQL statements. # `DBT_INTERNAL_DEST` and `DBT_INTERNAL_SOURCE` are the standard aliases for the target table and temporary table, respectively, during an incremental run using the merge strategy. ``` Alternatively, here are the same configurations configured within a model file: ```sql -- in models/my_incremental_model.sql {{ config( materialized = 'incremental', unique_key = 'id', cluster_by = ['session_start'], incremental_strategy = 'merge', incremental_predicates = [ "DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)" ] ) }} ... ``` This will template (in the `dbt.log` file) a `merge` statement like: ```sql merge into DBT_INTERNAL_DEST using DBT_INTERNAL_SOURCE on -- unique key DBT_INTERNAL_DEST.id = DBT_INTERNAL_SOURCE.id and -- custom predicate: limits data scan in the "old" data / existing table DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date) when matched then update ... when not matched then insert ... ``` Limit the data scan of *upstream* tables within the body of your incremental model SQL, which will limit the amount of "new" data processed/transformed. ```sql with large_source_table as ( select * from {{ ref('large_source_table') }} {% if is_incremental() %} where session_start >= dateadd(day, -3, current_date) {% endif %} ), ... 
``` ##### Defining a unique key[​](#defining-a-unique-key "Direct link to Defining a unique key") Defining the optional [`unique_key` parameter](https://docs.getdbt.com/reference/resource-configs/unique_key.md) enables updating existing rows instead of just appending new rows. If new information arrives for an existing `unique_key`, that new information can replace the current information instead of being appended to the table. If a duplicate row arrives, it can be ignored. Refer to [strategy specific configs](https://docs.getdbt.com/docs/build/incremental-strategy.md#strategy-specific-configs) for more options on managing this update behavior, like choosing only specific columns to update. If you don't specify a `unique_key`, most adapters will result in `append`-only behavior, which means dbt inserts all rows returned by the model's SQL into the preexisting target table without regard for whether the rows represent duplicates. The optional `unique_key` parameter specifies a field (or combination of fields) that defines the grain of your model. That is, the field(s) identify a single unique row. You can define `unique_key` in a configuration block at the top of your model, and it can be a single column name or a list of column names. The `unique_key` should be supplied in your model definition as a string representing a single column or a list of single-quoted column names that can be used together, for example, `['col1', 'col2', …]`. Columns used in this way should not contain any nulls, or the incremental model may fail to match rows and generate duplicate rows. Either ensure that each column has no nulls (for example with `coalesce(COLUMN_NAME, 'VALUE_IF_NULL')`) or define a single-column [surrogate key](https://www.getdbt.com/blog/guide-to-surrogate-key) (for example with [`dbt_utils.generate_surrogate_key`](https://github.com/dbt-labs/dbt-utils#generate_surrogate_key-source)). 
tip In cases where you need multiple columns in combination to uniquely identify each row, we recommend you pass these columns as a list (`unique_key = ['user_id', 'session_number']`), rather than a string expression (`unique_key = 'concat(user_id, session_number)'`). By using the first syntax, which is more universal, dbt can ensure that the columns will be templated into your incremental model materialization in a way that's appropriate to your database. When you pass a list in this way, please ensure that each column does not contain any nulls, or the incremental model run may fail. Alternatively, you can define a single-column [surrogate key](https://www.getdbt.com/blog/guide-to-surrogate-key), for example with [`dbt_utils.generate_surrogate_key`](https://github.com/dbt-labs/dbt-utils#generate_surrogate_key-source). When you define a `unique_key`, you'll see this behavior for each row of "new" data returned by your dbt model: * If the same `unique_key` is present in the "new" and "old" model data, dbt will update/replace the old row with the new row of data. The exact mechanics of how that update/replace takes place will vary depending on your database, [incremental strategy](https://docs.getdbt.com/docs/build/incremental-strategy.md), and [strategy specific configs](https://docs.getdbt.com/docs/build/incremental-strategy.md#strategy-specific-configs). * If the `unique_key` is *not* present in the "old" data, dbt will insert the entire row into the table. Please note that if there's a unique\_key with more than one row in either the existing target table or the new incremental rows, the incremental model may fail depending on your database and [incremental strategy](https://docs.getdbt.com/docs/build/incremental-strategy.md). If you're having issues running an incremental model, it's a good idea to double check that the unique key is truly unique in both your existing database table and your new incremental rows. 
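The list syntax recommended above can be sketched in a model config block as follows. The model name, column names, and upstream `stg_sessions` ref are hypothetical; the pattern combines a multi-column `unique_key` with a null guard on the key columns: ```sql -- Hypothetical model (models/fct_sessions.sql) illustrating a multi-column unique_key. -- The columns are passed as a list, not a concat() expression, so dbt can template -- them appropriately for your database. {{ config( materialized='incremental', unique_key=['user_id', 'session_number'] ) }} select -- guard a nullable key column so incremental matching doesn't produce duplicates coalesce(user_id, -1) as user_id, session_number, session_start from {{ ref('stg_sessions') }} {% if is_incremental() %} where session_start >= (select coalesce(max(session_start), '1900-01-01') from {{ this }}) {% endif %} ``` If `user_id` and `session_number` together aren't reliably non-null, a single-column surrogate key built with `dbt_utils.generate_surrogate_key` is the alternative described above. 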
You can [learn more about surrogate keys here](https://www.getdbt.com/blog/guide-to-surrogate-key). info While common incremental strategies, such as `delete+insert` and `merge`, might use `unique_key`, others don't. For example, the `insert_overwrite` strategy does not use `unique_key`, because it operates on partitions of data rather than individual rows. For more information, see [About incremental\_strategy](https://docs.getdbt.com/docs/build/incremental-strategy.md). ###### `unique_key` example[​](#unique_key-example "Direct link to unique_key-example") Consider a model that calculates the number of daily active users (DAUs), based on an event stream. As source data arrives, you will want to recalculate the number of DAUs for both the day that dbt last ran, and any days since then. The model would look as follows: models/staging/fct\_daily\_active\_users.sql ```sql {{ config( materialized='incremental', unique_key='date_day' ) }} select date_trunc('day', event_at) as date_day, count(distinct user_id) as daily_active_users from {{ ref('app_data_events') }} {% if is_incremental() %} -- this filter will only be applied on an incremental run -- (uses >= to include records arriving later on the same day as the last run of this model) where date_day >= (select coalesce(max(date_day), '1900-01-01') from {{ this }}) {% endif %} group by 1 ``` Building this model incrementally without the `unique_key` parameter would result in multiple rows in the target table for a single day – one row for each time dbt runs on that day. Instead, the inclusion of the `unique_key` parameter ensures the existing row is updated instead. #### How do I rebuild an incremental model?[​](#how-do-i-rebuild-an-incremental-model "Direct link to How do I rebuild an incremental model?") If your incremental model logic has changed, the transformations on your new rows of data may diverge from the historical transformations, which are stored in your target table. 
In this case, you should rebuild your incremental model. To force dbt to rebuild the entire incremental model from scratch, use the `--full-refresh` flag on the command line. This flag will cause dbt to drop the existing target table in the database before rebuilding it for all time. ```bash $ dbt run --full-refresh --select my_incremental_model+ ``` The trailing `+` in the command above will also run all downstream models that depend on `my_incremental_model`. If any of those downstream dependencies are also incremental models, they will be fully refreshed as well. You can optionally use the [`full_refresh config`](https://docs.getdbt.com/reference/resource-configs/full_refresh.md) to set a resource to always or never full-refresh at the project or resource level. If specified as true or false, the `full_refresh` config will take precedence over the presence or absence of the `--full-refresh` flag. For detailed usage instructions, check out the [dbt run](https://docs.getdbt.com/reference/commands/run.md) documentation. #### What if the columns of my incremental model change?[​](#what-if-the-columns-of-my-incremental-model-change "Direct link to What if the columns of my incremental model change?") Incremental models can be configured to include an optional `on_schema_change` parameter to enable additional control when incremental model columns change. These options enable dbt to continue running incremental models in the presence of schema changes, resulting in fewer `--full-refresh` scenarios and saving query costs. You can configure the `on_schema_change` setting as follows. dbt\_project.yml ```yaml models: +on_schema_change: "sync_all_columns" ``` models/staging/fct\_daily\_active\_users.sql ```sql {{ config( materialized='incremental', unique_key='date_day', on_schema_change='fail' ) }} ``` The possible values for `on_schema_change` are: * `ignore`: Default behavior (see below). 
* `fail`: Triggers an error message when the source and target schemas diverge * `append_new_columns`: Append new columns to the existing table. Note that this setting does *not* remove columns from the existing table that are not present in the new data. * `sync_all_columns`: Adds any new columns to the existing table, and removes any columns that are now missing. Note that this is *inclusive* of data type changes. On BigQuery, changing column types requires a full table scan; be mindful of the trade-offs when implementing. **Note**: None of the `on_schema_change` behaviors backfill values in old records for newly added columns. If you need to populate those values, we recommend running manual updates, or triggering a `--full-refresh`. `on_schema_change` tracks top-level changes Currently, `on_schema_change` only tracks top-level column changes. It does not track nested column changes. For example, on BigQuery, adding, removing, or modifying a nested column will not trigger a schema change, even if `on_schema_change` is set appropriately. ##### Default behavior[​](#default-behavior "Direct link to Default behavior") This is the behavior of `on_schema_change: ignore`, which is set by default. If you add a column to your incremental model, and execute a `dbt run`, this column will *not* appear in your target table. If you remove a column from your incremental model and execute a `dbt run`, `dbt run` will fail. Instead, whenever the logic of your incremental model changes, execute a full-refresh run of both your incremental model and any downstream models. ##### Questions from the Community[​](#questions-from-the-community "Direct link to Questions from the Community") [Ask the Community](https://discourse.getdbt.com/new-topic?category=help\&tags=incremental "Ask the Community") 
--- ### Configure your local environment Whether you currently use dbt platform or self-host with Fusion, or you’re a dbt Core user upgrading to Fusion, follow the instructions on this page to: * [Prepare your local setup](#prepare-your-local-setup) * [Set environment variables locally](#set-environment-variables-locally) * [Configure the dbt extension](#configure-the-dbt-extension) If you're new to dbt or getting started with a new project, you can skip this page and check out our [Quickstart for the dbt Fusion engine](https://docs.getdbt.com/guides/fusion.md?step=1) to get started with the dbt extension. The steps differ slightly depending on whether you use dbt platform or self-host with Fusion. * dbt platform — You’ll mirror your dbt platform environment locally to unlock Fusion-powered features like Mesh, deferral, and so on. If your project has environment variables, you'll also set them locally to leverage the VS Code extension's features. * Self-hosted — When you self-host with Fusion or are upgrading from dbt Core to Fusion, you’ll most likely already have a local setup and environment variables. Use this page to confirm that your existing local setup and environment variables work seamlessly with the dbt Fusion engine and VS Code extension. 
#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* dbt Fusion engine installed
* Downloaded and installed the dbt VS Code extension
* Basic understanding of [Git workflows](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md) and [dbt project structure](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md)
* [Developer or analyst license](https://www.getdbt.com/pricing) if you're using dbt platform

#### Prepare your local setup[​](#prepare-your-local-setup "Direct link to Prepare your local setup")

In this section, we'll walk you through the steps to prepare your local setup for the dbt VS Code extension. If you're a dbt platform user who installed the VS Code extension, follow these steps. If you're a self-hosted user, you most likely already have a local setup and environment variables, but you can confirm it using these steps.

1. [Clone](https://code.visualstudio.com/docs/sourcecontrol/overview#_cloning-a-repository) your dbt project repository from your Git provider to your local machine. If you use dbt platform, clone the same repo connected to your project.
2. Ensure you have a dbt [`profiles.yml` file](https://docs.getdbt.com/docs/local/profiles.yml.md). This file defines your data warehouse connection. If you don't have one, run `dbt init` in the terminal to configure your adapter.
3. Validate your `profiles.yml` and project configuration by running `dbt debug`.
4. Add a `dbt_cloud.yml` file from the dbt platform Account settings:
   * Navigate to **Your profile** -> **VS Code Extension** -> **Download credentials**.
   * Download the `dbt_cloud.yml` file with your [**Personal access token (PAT)**](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) included and place it in the `~/.dbt/` directory. This registers and connects the extension to dbt platform and enables platform features such as Mesh and deferral.
   * Check that the `project_id` in your `dbt_project.yml` file matches the project you're working on.
5. Confirm the connection from your workstation (for example, by running `dbt debug` in the terminal). Your local computer connects directly to your data warehouse and Git.
   * dbt platform users: Ensure your laptop/VPN is allowed; dbt platform IPs no longer apply. Check with your admin if you have any issues.
   * dbt Core users: This has likely already been configured.
6. (Optional) If your project uses environment variables, [find them](https://docs.getdbt.com/docs/build/environment-variables.md#setting-and-overriding-environment-variables) in the dbt platform and [set them](#set-environment-variables-locally) in VS Code or Cursor.
   * dbt platform users: Copy any environment variables from the **Deploy → Environments → Environment variables** tab in dbt platform. Masked secrets are hidden; work with your admin to get those values.

#### Set environment variables locally[​](#set-environment-variables-locally "Direct link to Set environment variables locally")

Environment variables are used for authentication and configuration. This section is most relevant for [dbt VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md) and dbt platform users who have environment variables configured as part of their workspace setup. If you’re using Fusion locally, you can also install the VS Code extension and use its features and actions — you just may not need to configure these variables unless your setup specifically requires them.
The following table shows the different options and when to use them:

| Location | Affects | Session state | When to use |
| --- | --- | --- | --- |
| [**Shell profile**](#configure-at-the-os-or-shell-level) | Terminal | ✅ Permanent | Variables remain active globally and available across terminal sessions. |
| [**VS Code/Cursor settings**](#configure-in-the-vs-code-extension-settings) | Extension menus + LSP | ✅ Per VS Code/Cursor profile | Editor-only workflows using the extension menu actions. |
| [**Terminal session**](#configure-in-the-terminal-session) | Current terminal only | ❌ Temporary | One-off testing. |

tip

If you want to use both the VS Code extension menus and the terminal to run dbt commands, define your variables in the shell profile and in the VS Code/Cursor settings so they remain active in the terminal globally and in VS Code/Cursor.

##### Configure at the OS or shell level[​](#configure-at-the-os-or-shell-level "Direct link to Configure at the OS or shell level")

Define variables once at the OS or shell level to ensure they're available to all terminal sessions. Even if you close a terminal window, the variables remain available to you.

* Mac / Linux
* Windows

1. Open your shell configuration file in a text editor using the following commands (if the file does not exist, create it with a text editor using `vi ~/.zshrc` or `vi ~/.bashrc`):

   ```bash
   open -e ~/.zshrc   ## for zsh (macOS)
   nano ~/.bashrc     ## for bash (Linux or older macOS)
   ```

2. The file opens, and you can add your environment variables to it.
   For example:

   * For zsh (macOS):

     ```bash
     ## ~/.zshrc
     export DBT_ENV_VAR1="my_value"
     export DBT_ENV_VAR2="another_value"
     ```

   * For bash (Linux or older macOS):

     ```bash
     ## ~/.bashrc or ~/.bash_profile
     export DBT_ENV_VAR1="my_value"
     export DBT_ENV_VAR2="another_value"
     ```

3. Save the file.
4. Start a new shell session by closing and reopening the terminal, or by running `source ~/.zshrc` or `source ~/.bashrc` in the terminal.
5. Verify the variables by running `echo $DBT_ENV_VAR1` and `echo $DBT_ENV_VAR2` in the terminal.
If you see the value printed back in the terminal, you're all set! These variables will now be available:

* In all future terminal sessions
* For all dbt commands run in the terminal

There are two ways to create persistent environment variables on Windows: through PowerShell or through the System Properties. The following steps explain how to configure environment variables using PowerShell.

**PowerShell**

1. Run the following commands in PowerShell:

   ```powershell
   [Environment]::SetEnvironmentVariable("DBT_ENV_VAR1","my_value","User")
   [Environment]::SetEnvironmentVariable("DBT_ENV_VAR2","another_value","User")
   ```

   This saves the variables permanently for your user account. To make them available system-wide for all users, replace "User" with "Machine" (requires admin rights).
2. Restart VS Code or select **Developer: Reload Window** for the changes to take effect.
3. Verify the changes by running `echo $env:DBT_ENV_VAR1` and `echo $env:DBT_ENV_VAR2` in a PowerShell terminal.

**System properties (Environment Variables)**

1. Press **Start** → search for **Environment Variables** → open **Edit the system environment variables**.
2. From the **Advanced** tab of the System Properties, click **Environment Variables…**.
3. Under **User variables**, click **New…**.
4. Add the variables and values. For example:
   * Variable name: `DBT_ENV_VAR1`
   * Variable value: `my_value`
5. Repeat for any others, then click **OK**.
6. Restart VS Code or Cursor.
7. Verify the changes by running `echo $env:DBT_ENV_VAR1` and `echo $env:DBT_ENV_VAR2` in a PowerShell terminal.

###### About `.env` file support[​](#about-env-file-support "Direct link to about-env-file-support")

The [Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) and the dbt VS Code extension can automatically read environment variables from a `.env` file in your current working directory (the folder you `cd` into and run dbt commands from in your terminal), if one exists.
The environment variables you define in the `.env` file are available both when running dbt commands in the terminal and when using the extension's menu actions. Here are some considerations when defining environment variables in the `.env` file:

* The `.env` file provides a convenient way to set environment variables that work across both the CLI and the extension.
* We recommend placing your `.env` file in the project root and running dbt commands from that location, because the file is loaded *only* from your current working directory. It doesn't support the `--project-dir` flag or the `DBT_PROJECT_DIR` environment variable, and dbt won't search your project root if you're running commands from a different directory.
* Add `.env` to your `.gitignore` file to prevent sensitive credentials from being committed to your repo.
* Order of precedence: Environment variables set directly in your shell (such as `export DBT_ENV_VAR=value`) take precedence over values defined in the `.env` file.

##### Configure in the VS Code extension settings[​](#configure-in-the-vs-code-extension-settings "Direct link to Configure in the VS Code extension settings")

To use the dbt extension menu actions/buttons, you can configure environment variables directly in the [VS Code User Settings](vscode://settings/dbt.environmentVariables) interface or in a `.env` file in your current working directory. This includes both your custom variables and any automatic [dbt platform variables](https://docs.getdbt.com/docs/build/environment-variables.md) (like `DBT_CLOUD_ENVIRONMENT_NAME`) that your project depends on.

* Configure variables in the VS Code **User Settings** or in a `.env` file to have them recognized by the extension (for example, when using LSP-powered features, "Show build menu," and more).
* VS Code does not inherit variables set by the VS Code terminal or external shells.
* The terminal uses system environment variables and does not inherit variables set in the dbt VS Code extension config. For example, running a dbt command in the terminal won't fetch or use the dbt VS Code extension variables.

To configure environment variables in VS Code/Cursor:

* Open User Settings
* Open .env file

1. Open the [Command Palette](https://code.visualstudio.com/docs/configure/settings#_user-settings) (Cmd + Shift + P for Mac, Ctrl + Shift + P for Windows/Linux).
2. Select **Preferences: Open User Settings** from the dropdown menu.
3. Open the [VS Code user settings page](vscode://settings/dbt.environmentVariables).
4. Search for `dbt.environmentVariables`.
5. In the **dbt: Environment Variables** section, add your item and value for the environment variables.
6. Click **OK** to save the changes.
7. Reload the VS Code extension to apply the changes: open the Command Palette and select **Developer: Reload Window**.
8. Verify the changes by running a dbt command and checking the output.

1. Create a `.env` file in your current working directory (typically at the root level of your dbt project, at the same level as your `dbt_project.yml` file).
2. Add your environment variables to the file. For example:

   ```env
   DBT_ENV_VAR1=my_value
   DBT_ENV_VAR2=another_value
   ```

3. Save the file.
4. Reload the VS Code extension to apply the changes.
5. Verify the changes by running a dbt command using the extension menu button in the top right corner and checking the output. For example, running `dbtf debug` will show your connection using the values from `.env`:

```shell
dbtf debug
...
Debugging connection:
  "authenticator": "my_authenticator",
  "account": "my_account",
  "user": "my_user",
  "database": "my_database",  # Loaded from DBT_MY_DATABASE in .env
  "schema": "my_schema",      # Loaded from DBT_MY_SCHEMA in .env
```

##### Configure in the terminal session[​](#configure-in-the-terminal-session "Direct link to Configure in the terminal session")

Configure environment variables in the terminal session using the `export` command. Keep the following in mind:

* Doing so makes variables visible only to commands that run in that terminal session.
* The variables last only for the current session; opening a new terminal loses the values.
* The built-in dbt VS Code extension buttons and menus will not pick these up.

To configure environment variables in the terminal session:

1. Run the following command in the terminal, replacing `DBT_ENV_VAR1` and `test1` with your own variable and value.

   * Mac / Linux

     ```bash
     export DBT_ENV_VAR1=test1
     ```

   * Windows Cmd

     Refer to [Microsoft's documentation](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/set_1) for more information on the `set` command.

     ```bash
     set DBT_ENV_VAR1=test1
     ```

   * Windows PowerShell

     Refer to [Microsoft's documentation](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_environment_variables?view=powershell-7.5#use-the-variable-syntax) for more information on the `$env:` syntax.

     ```powershell
     $env:DBT_ENV_VAR1 = "test1"
     ```

2. Verify the changes by running a dbt command and checking the output.

#### dbt extension settings[​](#dbt-extension-settings "Direct link to dbt extension settings")

After installing the dbt extension and configuring your local setup, you may want to configure it to better fit your development workflow:

1. Open the VS Code settings by pressing `Ctrl+,` (Windows/Linux) or `Cmd+,` (Mac).
2. Search for `dbt`. On this page, you can adjust the extension’s configuration options to fit your needs.
[![dbt extension settings within the VS Code settings.](/img/docs/extension/dbt-extension-settings.png?v=2 "dbt extension settings within the VS Code settings.")](#)dbt extension settings within the VS Code settings.

#### Next steps[​](#next-steps "Direct link to Next steps")

Now that you've configured your local environment, you can start using the dbt extension to streamline your dbt development workflows. Check out the following resources to get started:

* [About the dbt extension](https://docs.getdbt.com/docs/about-dbt-extension.md)
* [dbt extension features](https://docs.getdbt.com/docs/dbt-extension-features.md)
* [Register the extension](https://docs.getdbt.com/docs/install-dbt-extension.md#register-the-extension)

---

### Connect to adapters

Adapters are an essential component of dbt. At their most basic level, they are how dbt connects with the various supported data platforms. At a higher level, adapters strive to give analytics engineers more transferable skills and standardize how analytics projects are structured. Gone are the days when you have to learn a new language or flavor of SQL when you move to a new job with a different data platform. That is the power of adapters in dbt — for more detail, refer to the [Build, test, document, and promote adapters](https://docs.getdbt.com/guides/adapter-creation.md) guide.

This section provides more details on the different ways you can connect dbt to an adapter, and explains what a maintainer is.
##### Set up in dbt[​](#set-up-in-dbt "Direct link to Set up in dbt")

Explore the fastest and most reliable way to deploy dbt with the dbt platform, a hosted architecture that runs dbt Core across your organization. The dbt platform lets you seamlessly [connect](https://docs.getdbt.com/docs/cloud/about-cloud-setup.md) with a variety of [trusted](https://docs.getdbt.com/docs/supported-data-platforms.md) data platform providers directly in the dbt UI.

##### Install with dbt Core[​](#install-with-dbt-core "Direct link to Install with dbt Core")

Install dbt Core, an open-source tool, locally using the command line. dbt communicates with a number of different data platforms by using a dedicated adapter plugin for each. When you install dbt Core, you'll also need to install the specific adapter for your database, [connect the dbt Fusion engine to dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md), and set up a `profiles.yml` file.

With a few exceptions [1](#user-content-fn-1), you can install all [adapters](https://docs.getdbt.com/docs/supported-data-platforms.md) from PyPI using `python -m pip install adapter-name`. For example, to install the Snowflake adapter, use the command `python -m pip install dbt-snowflake`. The installation includes `dbt-core` and any other required dependencies, which may include other libraries and even other adapter plugins. Read more about [installing dbt](https://docs.getdbt.com/docs/local/install-dbt.md).

#### Footnotes[​](#footnote-label "Direct link to Footnotes")

1. Use the PyPI package name when installing with `pip`:

   | Adapter repo name | PyPI package name |
   | --- | --- |
   | `dbt-layer` | `dbt-layer-bigquery` |

   [↩](#user-content-fnref-1)
---

### Consume metrics from your Semantic Layer

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

After [deploying](https://docs.getdbt.com/docs/use-dbt-semantic-layer/deploy-sl.md) your Semantic Layer, the next important (and fun!) step is querying and consuming the metrics you’ve defined. This page links to key resources that guide you through the process of consuming metrics across different integrations, APIs, and tools, using various [query syntaxes](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md#querying-the-api-for-metric-metadata).

Once your Semantic Layer is deployed, you can start querying your metrics using a variety of tools and APIs. Here are the main resources to get you started:

##### Available integrations[​](#available-integrations "Direct link to Available integrations")

Integrate the Semantic Layer with a variety of business intelligence (BI) tools and data platforms, enabling seamless metric queries within your existing workflows. Explore the following integrations:

* [Available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) — Review a wide range of partners such as Tableau, Google Sheets, Microsoft Excel, and more, where you can query your metrics directly from the Semantic Layer.
##### Query with APIs[​](#query-with-apis "Direct link to Query with APIs") To leverage the full power of the Semantic Layer, you can use the Semantic Layer APIs for querying metrics programmatically: * [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) — Learn how to use the Semantic Layer APIs to query metrics in downstream tools, ensuring consistent and reliable data metrics. * [JDBC API query syntax](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md#querying-the-api-for-metric-metadata) — Dive into the syntax for querying metrics with the JDBC API, with examples and detailed instructions. * [GraphQL API query syntax](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md#querying) — Learn the syntax for querying metrics via the GraphQL API, including examples and detailed instructions. * [Python SDK](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-python.md#usage-examples) — Use the Python SDK library to query metrics programmatically with Python. ##### Query during development[​](#query-during-development "Direct link to Query during development") For developers working within the dbt ecosystem, it’s essential to understand how to query metrics during the development phase using MetricFlow commands: * [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md) — Learn how to use MetricFlow commands to query metrics directly during the development process, ensuring your metrics are correctly defined and working as expected. #### Next steps[​](#next-steps "Direct link to Next steps") After understanding the basics of querying metrics, consider optimizing your setup and ensuring the integrity of your metric definitions: * [Optimize querying performance](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md) — Improve query speed and efficiency by using declarative caching techniques. 
* [Validate semantic nodes in CI](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci) — Ensure that any changes to dbt models don’t break your metrics by validating semantic nodes in Continuous Integration (CI) jobs.
* [Build your metrics and semantic models](https://docs.getdbt.com/docs/build/build-metrics-intro.md) — If you haven’t already, learn how to define and build your metrics and semantic models using your preferred development tool.

---

### Continuous deployment in dbt

To help you improve data transformations and ship data products faster, you can run [merge jobs](https://docs.getdbt.com/docs/deploy/merge-jobs.md) to implement a continuous deployment (CD) workflow in dbt. Merge jobs can automatically build modified models whenever a pull request (PR) merges, making sure the latest code changes are in production. You don't have to wait for the next scheduled job to run to get the latest updates.

[![Workflow of continuous deployment in dbt](/img/docs/dbt-cloud/using-dbt-cloud/cd-workflow.png?v=2 "Workflow of continuous deployment in dbt")](#)Workflow of continuous deployment in dbt

You can also implement continuous integration (CI) in dbt, which can further reduce the time it takes to push changes to production and improve code quality. To learn more, refer to [Continuous integration in dbt](https://docs.getdbt.com/docs/deploy/continuous-integration.md).
#### How merge jobs work[​](#how-merge-jobs-work "Direct link to How merge jobs work")

When you set up merge jobs, dbt listens for notifications from your [Git provider](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md) indicating that a PR has been merged. When dbt receives one of these notifications, it enqueues a new run of the merge job.

You can set up merge jobs to perform one of the following when a PR merges:

| Command to run | Usage description |
| --- | --- |
| `dbt build --select state:modified+` | (Default) Build the modified data with every merge. dbt builds only the changed data models and anything downstream of them, similar to CI jobs. This helps reduce computing costs and ensures that the latest code changes are always pushed to production. |
| `dbt compile` | Refresh the applied state for performant (the slimmest) CI job runs. dbt generates the executable SQL (from the source model, test, and analysis files) but does not run it. This ensures the changes are reflected in the manifest for the next time a CI job is run and keeps track of only the relevant changes. |

---

### Continuous integration in dbt

To implement a continuous integration (CI) workflow in dbt, you can set up automation that tests code changes by running [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) before merging to production. dbt tracks the state of what’s running in your production environment so, when you run a CI job, only the modified data assets in your pull request (PR) and their downstream dependencies are built and tested in a staging schema. You can also view the status of the CI checks (tests) directly from within the PR; this information is posted to your Git provider as soon as a CI job completes. Additionally, you can enable settings in your Git provider that allow only PRs with successful CI checks to be approved for merging.

[![Workflow of continuous integration in dbt](/img/docs/dbt-cloud/using-dbt-cloud/ci-workflow.png?v=2 "Workflow of continuous integration in dbt")](#)Workflow of continuous integration in dbt

Using CI helps:

* Provide increased confidence and assurances that project changes will work as expected in production.
* Reduce the time it takes to push code changes to production, through build and test automation, leading to better business outcomes.
* Allow organizations to make code changes in a standardized and governed way that ensures code quality without sacrificing speed.

#### How CI works[​](#how-ci-works "Direct link to How CI works")

When you [set up CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md#set-up-ci-jobs), dbt listens for a notification from your Git provider indicating that a new PR has been opened or updated with new commits. When dbt receives one of these notifications, it enqueues a new run of the CI job.

dbt builds and tests the models, semantic models, metrics, and saved queries affected by the code change in a temporary schema, unique to the PR. This process ensures that the code builds without error and that it matches the expectations as defined by the project's dbt tests. The unique schema name follows the naming convention `dbt_cloud_pr_<job_id>_<pr_id>` (for example, `dbt_cloud_pr_1862_1704`) and can be found in the run details for the given run, as shown in the following image:

[![Viewing the temporary schema name for a run triggered by a PR](/img/docs/dbt-cloud/using-dbt-cloud/using_ci_dbt_cloud.png?v=2 "Viewing the temporary schema name for a run triggered by a PR")](#)Viewing the temporary schema name for a run triggered by a PR

When the CI run completes, you can view the run status directly from within the pull request. dbt updates the pull request in GitHub, GitLab, or Azure DevOps with a status message indicating the results of the run. The status message states whether the models and tests ran successfully or not.

dbt deletes the temporary schema from your data warehouse when you close or merge the pull request. If your project has schema customization using the [generate\_schema\_name](https://docs.getdbt.com/docs/build/custom-schemas.md#how-does-dbt-generate-a-models-schema-name) macro, dbt might not drop the temporary schema from your data warehouse. For more information, refer to [Troubleshooting](https://docs.getdbt.com/docs/deploy/ci-jobs.md#troubleshooting).
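To make the naming convention concrete, the temporary schema name is just the job and PR identifiers joined onto a fixed prefix. A quick illustration in shell, using the example IDs from the docs:

```shell
# Illustrative only: compose the temporary CI schema name from a job ID and PR ID
job_id=1862
pr_id=1704
schema="dbt_cloud_pr_${job_id}_${pr_id}"
echo "$schema"   # prints: dbt_cloud_pr_1862_1704
```

Because the schema name is tied to the PR (not to a commit), two commits on the same PR build into the same schema, which is why such runs execute serially rather than concurrently.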
#### Availability of features by Git provider[​](#availability-of-features-by-git-provider "Direct link to Availability of features by Git provider")

* If your Git provider has a [native dbt integration](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md), you can seamlessly set up [continuous integration (CI)](https://docs.getdbt.com/docs/deploy/ci-jobs.md) jobs directly within dbt.
* For providers without a native integration, you can still use the [Git clone method](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) to import your Git URL and leverage the [dbt Administrative API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) to trigger a CI job to run.

The following table outlines the available integration options and their corresponding capabilities.

| **Git provider** | **Native dbt integration** | **Automated CI job** | **Git clone** | **Information** | **Supported plans** |
| --- | --- | --- | --- | --- | --- |
| [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) | ✅ | ✅ | ✅ | Organizations on the Starter and Developer plans can connect to Azure DevOps using a deploy key. Note, you won’t be able to configure automated CI jobs but you can still develop. | Enterprise, Enterprise+ |
| [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md) | ✅ | ✅ | ✅ | | All dbt plans |
| [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md) | ✅ | ✅ | ✅ | | All dbt plans |
| All other Git providers using [Git clone](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) ([BitBucket](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md#bitbucket), [AWS CodeCommit](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md#aws-codecommit), and others) | ❌ | ❌ | ✅ | Refer to the [Customizing CI/CD with custom pipelines](https://docs.getdbt.com/guides/custom-cicd-pipelines.md?step=1) guide to set up continuous integration and continuous deployment (CI/CD). | |

#### Differences between CI jobs and other deployment jobs[​](#differences-between-ci-jobs-and-other-deployment-jobs "Direct link to Differences between CI jobs and other deployment jobs")

The [dbt scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md) executes CI jobs differently from other deployment jobs in these important ways:

* [**Concurrent CI checks**](#concurrent-ci-checks) — CI runs triggered by the same dbt CI job execute concurrently (in parallel), when appropriate.
* [**Smart cancellation of stale builds**](#smart-cancellation-of-stale-builds) — Automatically cancels stale, in-flight CI runs when there are new commits to the PR.
* [**Run slot treatment**](#run-slot-treatment) — CI runs don't consume a run slot.
* [**SQL linting**](#sql-linting) — When enabled, automatically lints all SQL files in your project as a run step before your CI job builds.
##### Concurrent CI checks [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#concurrent-ci-checks- "Direct link to concurrent-ci-checks-")

When teammates collaborate on the same dbt project and create pull requests on the same dbt repository, the same CI job gets triggered. Since each run builds into a dedicated, temporary schema that’s tied to the pull request, dbt can safely execute CI runs *concurrently* instead of *sequentially* (differing from how deployment jobs are handled). Because no one needs to wait for one CI run to finish before another one can start, concurrent CI checks let your whole team test and integrate dbt code faster.

The following describes the conditions under which CI checks run concurrently and when they don't:

* CI runs with different PR numbers execute concurrently.
* CI runs with the *same* PR number and *different* commit SHAs execute serially because they’re building into the same schema. dbt runs the latest commit and cancels any older, stale commits. For details, refer to [Smart cancellation of stale builds](#smart-cancellation-of-stale-builds-).
* CI runs with the same PR number and same commit SHA that originate from different dbt projects execute concurrently. This can happen when two CI jobs are set up in different dbt projects that share the same dbt repository.
##### Smart cancellation of stale builds [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

When you push a new commit to a PR, dbt enqueues a new CI run for the latest commit and cancels any in-flight CI run that is now stale. This can happen when you push new commits while a CI build is still in progress. By cancelling runs in a safe and deliberate way, dbt helps improve productivity and reduce data platform spend on wasteful CI runs.

[![Example of an automatically canceled run](/img/docs/dbt-cloud/using-dbt-cloud/example-smart-cancel-job.png?v=2 "Example of an automatically canceled run")](#)Example of an automatically canceled run

##### Run slot treatment [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

CI runs don't consume run slots. This guarantees a CI check will never block a production run.

##### SQL linting [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Available on [dbt release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) and dbt Starter or Enterprise-tier accounts.
When [enabled for your CI job](https://docs.getdbt.com/docs/deploy/ci-jobs.md#set-up-ci-jobs), dbt invokes [SQLFluff](https://sqlfluff.com/), a modular and configurable SQL linter that warns you about complex functions as well as syntax, formatting, and compilation errors. SQLFluff linting is not yet supported for dbt platform jobs that run on the dbt Fusion engine. For more information, see [Fusion limitations](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations).

By default, SQL linting lints all the changed SQL files in your project (compared to the last deferred production state).

Note that [snapshots](https://docs.getdbt.com/docs/build/snapshots.md) can be defined in YAML *and* `.sql` files, but snapshot SQL isn't lintable and can cause errors during linting. To prevent SQLFluff from linting snapshot files, add the snapshots directory to your `.sqlfluffignore` file (for example, `snapshots/`). Refer to [snapshot linting](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md#snapshot-linting) for more information.

If the linter finds errors, you can specify whether dbt should stop or continue running the job. Failing the job helps reduce compute costs by avoiding builds for pull requests that don't meet your SQL code quality CI check.

###### To configure SQLFluff linting:

You can optionally configure SQLFluff linting rules to override default linting behavior.

* Use [SQLFluff Configuration Files](https://docs.sqlfluff.com/en/stable/configuration/setting_configuration.html#configuration-files) to override the default linting behavior in dbt.
* Create a `.sqlfluff` configuration file in your project, add your linting rules to it, and dbt will use them when linting.
* When configuring, you can use `dbt` as the templater (for example, `templater = dbt`).
* If you’re using the Studio IDE, dbt CLI, or any other editor, refer to [Customize linting](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md#customize-linting) for guidance on how to add the dbt-specific (or dbtonic) linting rules we use for our own projects.
* For complete details, refer to [Custom Usage](https://docs.sqlfluff.com/en/stable/gettingstarted.html#custom-usage) in the SQLFluff documentation.

---

### Continuous integration jobs in dbt

You can set up [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md) (CI) jobs to run when someone opens a new pull request (PR) in your Git repository. By running and testing only *modified* models, dbt keeps these jobs as efficient and resource-conscious as possible on your data platform.

Triggering CI jobs in monorepos: If you have a monorepo with several dbt projects, opening a single pull request in one of your projects triggers jobs for all projects connected to the monorepo. To address this, you can use separate target branches per project (for example, `main-project-a`, `main-project-b`) to separate CI triggers.

#### Prerequisites

* You have a dbt account.
* CI features:
  * For both the [concurrent CI checks](https://docs.getdbt.com/docs/deploy/continuous-integration.md#concurrent-ci-checks) and [smart cancellation of stale builds](https://docs.getdbt.com/docs/deploy/continuous-integration.md#smart-cancellation) features, your dbt account must be on the [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/).
  * [SQL linting](https://docs.getdbt.com/docs/deploy/continuous-integration.md#sql-linting) is available on [dbt release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) and to dbt [Starter, Enterprise, or Enterprise+](https://www.getdbt.com/pricing/) accounts. You should have [SQLFluff configured](https://docs.getdbt.com/docs/deploy/continuous-integration.md#to-configure-sqlfluff-linting) in your project. SQLFluff linting is not yet supported for dbt platform jobs that run on the dbt Fusion engine. For more information, see [Fusion limitations](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations).
* [Advanced CI](https://docs.getdbt.com/docs/deploy/advanced-ci.md) features:
  * For the [compare changes](https://docs.getdbt.com/docs/deploy/advanced-ci.md#compare-changes) feature, your dbt account must be on an [Enterprise-tier plan](https://www.getdbt.com/pricing/) and have enabled Advanced CI features. Please ask your [dbt administrator to enable](https://docs.getdbt.com/docs/cloud/account-settings.md#account-access-to-advanced-ci-features) this feature for you. After enablement, the **dbt compare** option becomes available in the CI job settings.
* Set up a [connection with your Git provider](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md). This integration lets dbt trigger jobs to run on your behalf.
* If you're using a native [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md) integration, you need a paid or self-hosted account that includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). If you're using GitLab Free, merge requests will trigger CI jobs but CI job status updates (success or failure of the job) will not be reported back to GitLab. #### Availability of features by Git provider[​](#availability-of-features-by-git-provider "Direct link to Availability of features by Git provider") * If your git provider has a [native dbt integration](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md), you can seamlessly set up [continuous integration (CI)](https://docs.getdbt.com/docs/deploy/ci-jobs.md) jobs directly within dbt. * For providers without native integration, you can still use the [Git clone method](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) to import your git URL and leverage the [dbt Administrative API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) to trigger a CI job to run. The following table outlines the available integration options and their corresponding capabilities. 
| **Git provider** | **Native dbt integration** | **Automated CI job** | **Git clone** | **Information** | **Supported plans** |
| --- | --- | --- | --- | --- | --- |
| [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) | ✅ | ✅ | ✅ | Organizations on the Starter and Developer plans can connect to Azure DevOps using a deploy key. Note, you won’t be able to configure automated CI jobs but you can still develop. | Enterprise, Enterprise+ |
| [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md) | ✅ | ✅ | ✅ | | All dbt plans |
| [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md) | ✅ | ✅ | ✅ | | All dbt plans |
| All other git providers using [Git clone](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) ([BitBucket](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md#bitbucket), [AWS CodeCommit](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md#aws-codecommit), and others) | ❌ | ❌ | ✅ | Refer to the [Customizing CI/CD with custom pipelines](https://docs.getdbt.com/guides/custom-cicd-pipelines.md?step=1) guide to set up continuous integration and continuous deployment (CI/CD). | |

#### Set up CI jobs

dbt Labs recommends that you create your CI job in a dedicated dbt [deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#create-a-deployment-environment) that's connected to a staging database. A separate environment dedicated to CI provides better isolation between your temporary CI schema builds and your production data builds. Additionally, teams sometimes need their CI jobs to be triggered when a PR is made to a branch other than main. If your team maintains a staging branch as part of your release process, a separate environment lets you set a [custom branch](https://docs.getdbt.com/faqs/Environments/custom-branch-settings.md); the CI job in that dedicated environment will then be triggered only when PRs are made to the specified custom branch. To learn more, refer to [Get started with CI tests](https://docs.getdbt.com/guides/set-up-ci.md).

To make CI job creation easier, many options on the **CI job** page are set to default values that dbt Labs recommends. If you don't want to use the defaults, you can change them.

1. On your deployment environment page, click **Create job** > **Continuous integration job** to create a new CI job.
2.
Options in the **Job settings** section:
   * **Job name** — Specify the name for this CI job.
   * **Description** — Provide a description of the CI job.
   * **Environment** — By default, this is set to the environment you created the CI job from. Use the dropdown to change the default setting.
3. Options in the **Git trigger** section:
   * **Triggered by pull requests** — By default, it’s enabled. Every time a developer opens a pull request or pushes a commit to an existing pull request, this job is triggered to run.
   * **Run on draft pull request** — Enable this option if you also want to trigger the job every time a developer opens a draft pull request or pushes a commit to that draft pull request.
4. Options in the **Execution settings** section:
   * **Commands** — By default, this includes the `dbt build --select state:modified+` command. This tells dbt to build only new or changed models and their downstream dependents. Importantly, state comparison can only happen when a deferred environment is selected to compare state to. Click **Add command** to add more [commands](https://docs.getdbt.com/docs/deploy/job-commands.md) that you want to be invoked when this job runs.
   * **Linting** — Enable this option for dbt to [lint the SQL files](https://docs.getdbt.com/docs/deploy/continuous-integration.md#sql-linting) in your project as the first step of the job run. If this check runs into an error, dbt can either **Stop running on error** or **Continue running on error**.
   * **dbt compare** [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") — Enable this option to compare the last applied state of the production environment (if one exists) with the latest changes from the pull request, and identify what those differences are.
To enable record-level comparison and primary key analysis, you must add a [primary key constraint](https://docs.getdbt.com/reference/resource-properties/constraints.md) or [uniqueness test](https://docs.getdbt.com/reference/resource-properties/data-tests.md#unique). Otherwise, you'll receive a "Primary key missing" error message in dbt.

To review the comparison report, navigate to the [Compare tab](https://docs.getdbt.com/docs/deploy/run-visibility.md#compare-tab) in the job run's details. A summary of the report is also available from the pull request in your Git provider (see the [CI report example](#example-ci-report)).

Optimization tip: When you enable the **dbt compare** checkbox, you can customize the comparison command to optimize your CI job. For example, if you have large models that take a long time to compare, you can exclude them to speed up the process using the [`--exclude` flag](https://docs.getdbt.com/reference/node-selection/exclude.md). Refer to [compare changes custom commands](https://docs.getdbt.com/docs/deploy/job-commands.md#compare-changes-custom-commands) for more details. Additionally, if you set [`event_time`](https://docs.getdbt.com/reference/resource-configs/event-time.md) in your models, seeds, snapshots, or sources, dbt can compare matching date ranges between tables by filtering to the overlapping ranges. This is useful for faster CI workflows or custom sampling setups.

* **Compare changes against an environment (Deferral)** — By default, it’s set to the **Production** environment if you created one. This option allows dbt to check the state of the code in the PR against the code running in the deferred environment, so that it checks only the modified code instead of building the entire DAG.

info: Older versions of dbt only allow you to defer to a specific job instead of an environment. Deferral to a job compares state against the project code that was run in the deferred job's last successful run.
Deferral to an environment is more efficient because dbt compares against the project representation (stored in `manifest.json`) of the last successful deploy job run in the deferred environment. By considering *all* [deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) that run in the deferred environment, dbt gets a more accurate, up-to-date representation of the project state.

* **Run timeout** — Cancel the CI job if the run time exceeds the timeout value. You can use this option to help ensure that a CI check doesn't consume too much of your warehouse resources. If you enable the **dbt compare** option, the timeout value defaults to `3600` (one hour) to prevent long-running comparisons.

5. (optional) Options in the **Advanced settings** section:
   * **Environment variables** — Define [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) to customize the behavior of your project when this CI job runs. You can specify that a CI job is running in a *Staging* or *CI* environment by setting an environment variable and modifying your project code to behave differently, depending on the context. It's common for teams to process only a subset of data for CI runs, using environment variables to branch logic in their dbt project code.
   * **Target name** — Define the [target name](https://docs.getdbt.com/docs/build/custom-target-names.md). Similar to **Environment variables**, this option lets you customize the behavior of the project. You can use it to specify that a CI job is running in a *Staging* or *CI* environment by setting the target name and modifying your project code to behave differently, depending on the context.
   * **dbt version** — By default, it’s set to inherit the [dbt version](https://docs.getdbt.com/docs/dbt-versions/core.md) from the environment. dbt Labs strongly recommends that you don't change the default setting.
This option to change the version at the job level is useful only when you upgrade a project to the next dbt version; otherwise, mismatched versions between the environment and job can lead to confusing behavior. * **Threads** — By default, it’s set to 4 [threads](https://docs.getdbt.com/docs/local/profiles.yml.md#understanding-threads). Increase the thread count to increase model execution concurrency. * **Generate docs on run** — Enable this if you want to [generate project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) when this job runs. This is disabled by default since testing doc generation on every CI check is not a recommended practice. * **Run source freshness** — Enable this option to invoke the `dbt source freshness` command before running this CI job. Refer to [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) for more details. [![Example of CI Job page in the dbt UI](/img/docs/dbt-cloud/using-dbt-cloud/create-ci-job.png?v=2 "Example of CI Job page in the dbt UI")](#)Example of CI Job page in the dbt UI ##### Example of CI check in pull request[​](#example-ci-check "Direct link to Example of CI check in pull request") The following is an example of a CI check in a GitHub pull request. The green checkmark means the dbt build and tests were successful. Clicking on the dbt section takes you to the relevant CI run in dbt. [![Example of CI check in GitHub pull request](/img/docs/dbt-cloud/using-dbt-cloud/example-github-pr.png?v=2 "Example of CI check in GitHub pull request")](#)Example of CI check in GitHub pull request ##### Example of CI report in pull request [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#example-ci-report "Direct link to example-ci-report") The following is an example of a CI report in a GitHub pull request, which is shown when the **dbt compare** option is enabled for the CI job. 
It displays a high-level summary of the models that changed from the pull request.

[![Example of CI report comment in GitHub pull request](/img/docs/dbt-cloud/using-dbt-cloud/example-github-ci-report.png?v=2 "Example of CI report comment in GitHub pull request")](#)Example of CI report comment in GitHub pull request

#### Trigger a CI job with the API [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

If you're not using dbt’s native Git integration with [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md), [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md), or [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md), you can use the [Administrative API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) to trigger a CI job to run. However, dbt will not automatically delete the temporary schema for you, because automatic deletion relies on incoming webhooks from Git providers, which are only available through the native integrations.

##### Prerequisites

* You have a dbt account.
* You have a dbt [Enterprise or Enterprise+ plan](https://www.getdbt.com/pricing/). Legacy Team plans also retain access.
* For the [Concurrent CI checks](https://docs.getdbt.com/docs/deploy/continuous-integration.md#concurrent-ci-checks) and [Smart cancellation of stale builds](https://docs.getdbt.com/docs/deploy/continuous-integration.md#smart-cancellation) features, your dbt account must be on the [Enterprise or Enterprise+ plan](https://www.getdbt.com/pricing/) or a legacy Team plan. Starter plans do not have access to these features when triggering a CI job with the API.

1.
Set up a CI job with the [Create Job](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/Create%20Job) API endpoint using `"job_type": "ci"` or from the [dbt UI](#set-up-ci-jobs).
2. Call the [Trigger Job Run](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/Trigger%20Job%20Run) API endpoint to trigger the CI job. You must include both of these fields in the payload:
   * Provide the pull request (PR) ID using one of these fields:
     * `github_pull_request_id`
     * `gitlab_merge_request_id`
     * `azure_devops_pull_request_id`
     * `non_native_pull_request_id` (for example, BitBucket)
   * Provide the `git_sha` or `git_branch` to target the correct commit or branch to run the job against.

#### Semantic validations in CI [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Automatically test your semantic nodes (metrics, semantic models, and saved queries) during code reviews by adding warehouse validation checks in your CI job, ensuring that code changes made to dbt models don't break these metrics. To do this, add the command `dbt sl validate --select state:modified+` in the CI job. This validates modified semantic nodes and their downstream dependencies.

[![Semantic validations in CI workflow](/img/docs/dbt-cloud/deployment/sl-ci-job.png?v=2 "Semantic validations in CI workflow")](#)Semantic validations in CI workflow

###### Benefits

* Testing semantic nodes in a CI job supports deferral and selection of semantic nodes.
* It allows you to catch issues early in the development process and deliver high-quality data to your end users.
* Semantic validation executes an explain query in the data warehouse for semantic nodes to ensure the generated SQL will execute.
* For semantic nodes and models that aren't downstream of modified models, dbt defers to the production models.

##### Set up semantic validations in your CI job

To set this up, follow these steps:

1. Navigate to the **Job settings** page and click **Edit**.
2. Add the `dbt sl validate --select state:modified+` command under **Commands** in the **Execution settings** section. The command uses state selection and deferral to run validation on any semantic nodes downstream of model changes. To reduce job times, we recommend only running CI on modified semantic models.
3. Click **Save** to save your changes.

There are additional commands and use cases described in the [next section](#use-cases), such as validating all semantic nodes, validating specific semantic nodes, and so on.

[![Validate semantic nodes downstream of model changes in your CI job.](/img/docs/dbt-cloud/deployment/ci-dbt-sl-validate-downstream.png?v=2 "Validate semantic nodes downstream of model changes in your CI job.")](#)Validate semantic nodes downstream of model changes in your CI job.

##### Use cases

Use or combine different selectors or commands to validate semantic nodes in your CI job. Semantic validation in CI supports the following use cases:

Semantic nodes downstream of model changes (recommended)

To validate semantic nodes that are downstream of a model change, add these two commands in your job's **Execution settings** section:

```bash
dbt build --select state:modified+
dbt sl validate --select state:modified+
```

* The first command builds the modified models.
* The second command validates the semantic nodes downstream of the modified models.
Before running semantic validations, dbt must build the modified models. This ensures that downstream semantic nodes are validated using the CI schema through the dbt Semantic Layer API. For semantic nodes and models that aren't downstream of modified models, dbt defers to the production models.

Semantic nodes that are modified or affected by downstream modified nodes

To validate only modified semantic nodes, use the following command (with [state selection](https://docs.getdbt.com/reference/node-selection/state-selection.md)):

```bash
dbt sl validate --select state:modified+
```

[![Use state selection to validate modified metric definition models in your CI job.](/img/docs/dbt-cloud/deployment/ci-dbt-sl-validate-modified.png?v=2 "Use state selection to validate modified metric definition models in your CI job.")](#)Use state selection to validate modified metric definition models in your CI job.

This will only validate semantic nodes. It uses the deferral state configured in your orchestration job, deferring to your production models.

Select specific semantic nodes

Use the selector syntax to select the *specific* semantic node(s) you want to validate:

```bash
dbt sl validate --select metric:revenue
```

[![Use state selection to validate modified metric definition models in your CI job.](/img/docs/dbt-cloud/deployment/ci-dbt-sl-validate-select.png?v=2 "Use state selection to validate modified metric definition models in your CI job.")](#)Use state selection to validate modified metric definition models in your CI job.

In this example, the CI job will validate the selected `metric:revenue` semantic node.
To select multiple semantic nodes, use the selector syntax: `dbt sl validate --select metric:revenue metric:customers`. If you don't specify a selector, dbt validates all semantic nodes in your project.

Select all semantic nodes

To validate *all* semantic nodes in your project, add the following command; dbt defers to your production schema when generating the warehouse validation queries:

```bash
dbt sl validate
```

[![Validate all semantic nodes in your CI job by adding the command: 'dbt sl validate' in your job execution settings.](/img/docs/dbt-cloud/deployment/ci-dbt-sl-validate-all.png?v=2 "Validate all semantic nodes in your CI job by adding the command: 'dbt sl validate' in your job execution settings.")](#)Validate all semantic nodes in your CI job by adding the command: 'dbt sl validate' in your job execution settings.

#### Troubleshooting

Unable to trigger a CI job with GitLab

When you connect dbt to a GitLab repository, GitLab automatically registers a webhook in the background, viewable under the repository settings. This webhook is also used to trigger [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) when you push to the repository. If you're unable to trigger a CI job, this usually indicates that the webhook registration is missing or incorrect. To resolve this issue, view the webhook registrations in GitLab under **Settings** --> **Webhooks**. Some things to check:

* The webhook registration is enabled in GitLab.
* The webhook registration is configured with the correct URL and secret.

If you're still experiencing this issue, reach out to the Support team and we'll be happy to help!

CI jobs occasionally aren't triggered when opening a PR using the Azure DevOps (ADO) integration

dbt won't trigger a CI job run if the latest commit in a pull or merge request has already triggered a run for that job.
However, some providers (like GitHub) will enforce the result of the existing run on multiple pull/merge requests. Scenarios where dbt does not trigger a CI job with Azure DevOps:

1. Reusing a branch in a new PR
   * If you abandon a previous PR (PR 1) that triggered a CI job for the same branch (`feature-123`) merging into `main`, and then open a new PR (PR 2) with the same branch merging into `main` — dbt won't trigger a new CI job for PR 2.
2. Reusing the same commit
   * If you create a new PR (PR 2) on the same commit (`#4818ceb`) as a previous PR (PR 1) that triggered a CI job — dbt won't trigger a new CI job for PR 2.

Temporary schemas aren't dropping

If your temporary schemas aren't dropping after a PR merges or closes, this typically indicates one of these issues:

* You have overridden the `generate_schema_name` macro and it isn't using `dbt_cloud_pr_` as the prefix. To resolve this, change your macro so that the temporary PR schema name contains the required prefix. For example:
  * ✅ Temporary PR schema name contains the prefix `dbt_cloud_pr_` (like `dbt_cloud_pr_123_456_marketing`).
  * ❌ Temporary PR schema name doesn't contain the prefix `dbt_cloud_pr_` (like `marketing`).
* A macro is creating a schema, but no dbt models are writing to that schema. dbt doesn't drop temporary schemas that weren't written to as a result of running a dbt model.

Error messages that refer to schemas from previous PRs

If you receive a schema-related error message referencing a *previous* PR, this usually indicates that you are not using a production job for your deferral and are instead deferring to *self*. If the prior PR has already been merged, its schema may have been dropped by the time the CI job for the current PR kicks off. To fix this issue, select a production job run to defer to instead of self.

Production job runs failing at the 'Clone Git Repository' step

dbt can only check out commits that belong to the original repository.
dbt *cannot* check out commits that belong to a fork of that repository. If you receive the following error message at the **Clone Git Repository** step of your job run:

```text
Error message:
Cloning into '/tmp/jobs/123456/target'...
Successfully cloned repository.
Checking out to e845be54e6dc72342d5a8f814c8b3316ee220312...
Failed to checkout to specified revision.
git checkout e845be54e6dc72342d5a8f814c8b3316ee220312
fatal: reference is not a tree: e845be54e6dc72342d5a8f814c8b3316ee220312
```

Double-check that your PR isn't trying to merge using a commit that belongs to a fork of the repository attached to your dbt project.

CI job not triggering for Virtual Private dbt users

To trigger jobs on dbt using the [API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md), your Git provider needs to connect to your dbt account. If you're on a Virtual Private dbt Enterprise plan using security features like ingress PrivateLink or IP Allowlisting, registering CI hooks may not be available and can cause the job to fail silently.

PR status for CI job stays in 'pending' in Azure DevOps after job run finishes

When you start a CI job, the pull request status should show as `pending` while it waits for an update from dbt. Once the CI job finishes, dbt sends the status to Azure DevOps (ADO), and the status changes to either `succeeded` or `failed`. If the status doesn't get updated after the job runs, check if there are any git branch policies in place blocking ADO from receiving these updates. One potential issue is the **Reset conditions** setting under **Status checks** in the ADO repository branch policy. If you enable the **Reset status whenever there are new changes** checkbox (under **Reset conditions**), it can prevent dbt from updating ADO about your CI job run status.
You can find relevant information here:

* [Azure DevOps Services Status checks](https://learn.microsoft.com/en-us/azure/devops/repos/git/branch-policies?view=azure-devops\&tabs=browser#status-checks)
* [Azure DevOps Services Pull Request Stuck Waiting on Status Update](https://support.hashicorp.com/hc/en-us/articles/18670331556627-Azure-DevOps-Services-Pull-Request-Stuck-Waiting-on-Status-Update-from-Terraform-Cloud-Enterprise-Run)
* [Pull request status](https://learn.microsoft.com/en-us/azure/devops/repos/git/pull-request-status?view=azure-devops#pull-request-status)

---

### Contribute to adapters

The dbt Community exists to let analytics practitioners share their knowledge, help others, and collectively drive forward the discipline of analytics engineering. There are opportunities for everyone to contribute, whether you're at the beginning of your analytics engineering journey or a seasoned data professional.

This section explains how you can contribute to existing adapters or create a new adapter.

##### Contribute to a pre-existing adapter

Community-supported plugins are works in progress, and anyone is welcome to contribute by testing and writing code. If you're interested in contributing:

* Join both the dedicated channel, [#adapter-ecosystem](https://getdbt.slack.com/archives/C030A0UF5LM), in [dbt Slack](https://community.getdbt.com/) and the channel for your adapter's data store. Refer to the **Slack Channel** link in the [dbt Core platform](https://docs.getdbt.com/docs/local/profiles.yml.md) pages.
* Check out the open issues in the plugin's source repository. Use the relevant **GitHub repo** link in the [dbt Core platform](https://docs.getdbt.com/docs/local/profiles.yml.md) pages.

##### Create a new adapter

If you see something missing from the lists above and you're interested in developing an integration, read more about adapters and how they're developed in [Build, test, document, and promote adapters](https://docs.getdbt.com/guides/adapter-creation.md).

If you have a new adapter, please add it to this list using a pull request! You can refer to [Build, test, document, and promote adapters](https://docs.getdbt.com/guides/adapter-creation.md) for more information on documenting your adapter.

---

### Conversion metrics

Conversion metrics let you measure how often one event leads to another for a specific entity within a defined time window. For example, you can track how often a user (entity) who visits your site (base event) makes a purchase (conversion event) within 7 days (time window). To set this up, you specify both the time window and the entity that joins the two events.

Conversion metrics differ from [ratio metrics](https://docs.getdbt.com/docs/build/ratio.md) because you need to include an entity in the pre-aggregated join.

#### Parameters

Refer to [additional settings](#additional-settings) to learn how to customize conversion metrics with settings for null values, calculation type, and constant properties.
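As an illustrative sketch of the shape a conversion metric definition can take (the metric, model, and measure names here are hypothetical, and the exact nesting under `type_params` may vary across dbt versions), a 7-day visit-to-buy metric might look like:

```yaml
metrics:
  - name: visit_to_buy_conversion_rate_7d   # hypothetical metric name
    description: Conversion rate from a site visit to a purchase within 7 days.
    type: conversion
    label: Visit to buy conversion rate (7-day window)
    type_params:
      conversion_type_params:
        entity: user                        # entity that joins the base and conversion events
        calculation: conversion_rate        # or `conversions` for raw conversion counts
        base_measure:
          name: visits                      # base event, from the VISITS semantic model
        conversion_measure:
          name: buys                        # conversion event, from the BUYS semantic model
        window: 7 days                      # conversions must occur within this window
```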
#### Conversion metric example

The following example measures conversions from website visits (the `VISITS` table) to order completions (the `BUYS` table) and walks through calculating the conversion metric step by step.

Suppose you have two semantic models, `VISITS` and `BUYS`:

* The `VISITS` table represents visits to an e-commerce site.
* The `BUYS` table represents someone completing an order on that site.

The underlying tables look like the following:

`VISITS`
Contains user visits with `USER_ID` and `REFERRER_ID`.

| DS         | USER\_ID | REFERRER\_ID |
| ---------- | -------- | ------------ |
| 2020-01-01 | bob      | facebook     |
| 2020-01-04 | bob      | google       |
| 2020-01-07 | bob      | amazon       |

`BUYS`
Records completed orders with `USER_ID` and `REFERRER_ID`.

| DS         | USER\_ID | REFERRER\_ID |
| ---------- | -------- | ------------ |
| 2020-01-02 | bob      | facebook     |
| 2020-01-07 | bob      | amazon       |

Next, define a conversion metric that counts a conversion whenever a buy occurs within 7 days of a visit. To calculate the conversion, link the `BUYS` event to the nearest `VISITS` event (or closest base event). The following steps explain this process in more detail:

##### Step 1: Join `VISITS` and `BUYS`

This step joins the `BUYS` table to the `VISITS` table and gets all combinations of visit-buy events that match the join condition, where buys occur within 7 days of the visit (any rows that have the same user and a buy that happened at most 7 days after the visit). The SQL generated in this step looks like the following:

```sql
select
  v.ds,
  v.user_id,
  v.referrer_id,
  b.ds,
  b.uuid,
  1 as buys
from visits v
inner join (
  select *, uuid_string() as uuid from buys -- Adds a uuid column to uniquely identify the different rows
) b
on
  v.user_id = b.user_id
  and v.ds <= b.ds
  and v.ds > b.ds - interval '7 days'
```

The dataset returns the following (note that there are two potential conversion events for the first visit):

| V.DS       | V.USER\_ID | V.REFERRER\_ID | B.DS       | UUID  | BUYS |
| ---------- | ---------- | -------------- | ---------- | ----- | ---- |
| 2020-01-01 | bob        | facebook       | 2020-01-02 | uuid1 | 1    |
| 2020-01-01 | bob        | facebook       | 2020-01-07 | uuid2 | 1    |
| 2020-01-04 | bob        | google         | 2020-01-07 | uuid2 | 1    |
| 2020-01-07 | bob        | amazon         | 2020-01-07 | uuid2 | 1    |
##### Step 2: Refine with a window function

Instead of returning the raw visit values, use window functions to link each conversion to the closest base event. You can partition by the conversion source and take the `first_value` ordered by `v.ds` descending to get the closest base event from the conversion event:

```sql
select
  first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds,
  first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id,
  first_value(v.referrer_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id,
  b.ds,
  b.uuid,
  1 as buys
from visits v
inner join (
  select *, uuid_string() as uuid from buys
) b
on
  v.user_id = b.user_id
  and v.ds <= b.ds
  and v.ds > b.ds - interval '7 day'
```

The dataset returns the following:

| V.DS       | V.USER\_ID | V.REFERRER\_ID | B.DS       | UUID  | BUYS |
| ---------- | ---------- | -------------- | ---------- | ----- | ---- |
| 2020-01-01 | bob        | facebook       | 2020-01-02 | uuid1 | 1    |
| 2020-01-07 | bob        | amazon         | 2020-01-07 | uuid2 | 1    |
| 2020-01-07 | bob        | amazon         | 2020-01-07 | uuid2 | 1    |
| 2020-01-07 | bob        | amazon         | 2020-01-07 | uuid2 | 1    |

This workflow links the two conversions to the correct visit events. Due to the join, you end up with multiple combinations, leading to fan-out results, so duplicates appear after applying the window function. To eliminate the duplicates, use a distinct select; the UUID also helps identify which conversion is unique. The next steps provide more detail on how to do this.
##### Step 3: Remove duplicates

Instead of the regular select used in [Step 2](#step-2-refine-with-window-function), use a distinct select to remove the duplicates:

```sql
select distinct
  first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds,
  first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id,
  first_value(v.referrer_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id,
  b.ds,
  b.uuid,
  1 as buys
from visits v
inner join (
  select *, uuid_string() as uuid from buys
) b
on
  v.user_id = b.user_id
  and v.ds <= b.ds
  and v.ds > b.ds - interval '7 day';
```

The dataset returns the following:

| V.DS       | V.USER\_ID | V.REFERRER\_ID | B.DS       | UUID  | BUYS |
| ---------- | ---------- | -------------- | ---------- | ----- | ---- |
| 2020-01-01 | bob        | facebook       | 2020-01-02 | uuid1 | 1    |
| 2020-01-07 | bob        | amazon         | 2020-01-07 | uuid2 | 1    |

You now have a dataset where every conversion is connected to a visit event. To proceed:

1. Sum up the total conversions in the "conversions" table.
2. Combine this table with the "opportunities" table, matching them based on group keys.
3. Calculate the conversion rate.

##### Step 4: Aggregate and calculate

Now that you've tied each conversion event to a visit, you can calculate the aggregated conversions and opportunities. Then, you can join them to calculate the actual conversion rate.
The SQL to calculate the conversion rate is as follows:

```sql
select
  coalesce(subq_3.metric_time__day, subq_13.metric_time__day) as metric_time__day,
  cast(max(subq_13.buys) as double) / cast(nullif(max(subq_3.visits), 0) as double) as visit_to_buy_conversion_rate_7d
from ( -- base
  select
    metric_time__day,
    sum(visits) as visits
  from (
    select
      date_trunc('day', first_contact_date) as metric_time__day,
      1 as visits
    from visits
  ) subq_2
  group by metric_time__day
) subq_3
full outer join ( -- conversion
  select
    metric_time__day,
    sum(buys) as buys
  from (
    -- ...
    -- The output of this subquery is the table produced in Step 3. The SQL is hidden for legibility.
    -- To see the full SQL output, add --explain to your conversion metric query.
  ) subq_10
  group by metric_time__day
) subq_13
on subq_3.metric_time__day = subq_13.metric_time__day
group by metric_time__day
```

##### Additional settings

Use the following additional settings to customize your conversion metrics:

* **Null conversion values:** Set null conversions to zero using `fill_nulls_with`. Refer to [Fill null values for metrics](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md) for more info.
* **Calculation type:** Choose between showing raw conversions or the conversion rate.
* **Constant property:** Add conditions for specific scenarios to join conversions on constant properties.

To return zero in the final data set, you can set the value of a null conversion event to zero instead of null.
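As an illustrative sketch (the metric and measure names are hypothetical, and the exact placement of `fill_nulls_with` may vary across dbt versions), this might look like:

```yaml
metrics:
  - name: visit_to_buy_conversion_rate_7d
    type: conversion
    label: Visit to buy conversion rate (7-day window)
    type_params:
      fill_nulls_with: 0               # return 0 instead of null in the final data set
      conversion_type_params:
        entity: user
        base_measure:
          name: visits
        conversion_measure:
          name: buys
        window: 7 days
```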
Adding the `fill_nulls_with` parameter to your conversion metric definition returns the following results:

[![Conversion metric with fill nulls with parameter](/img/docs/dbt-cloud/semantic-layer/conversion-metrics-fill-null.png?v=2 "Conversion metric with fill nulls with parameter")](#)

Refer to [Fill null values for metrics](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md) for more info.

Use the conversion calculation parameter to either show the raw number of conversions or the conversion rate. The default is the conversion rate. You can change the default to display the number of conversions by setting the `calculation: conversions` parameter.

*Refer to [Amplitude's blog posts on constant properties](https://amplitude.com/blog/holding-constant) to learn about this concept.*

You can add a constant property to a conversion metric to count only those conversions where a specific dimension or entity matches in both the base and conversion events.

For example, if you're at an e-commerce company and want to answer the following question:

* *How often did visitors convert from `View Item Details` to `Complete Purchase` with the same product in each step?*
This question is tricky to answer because users could have completed these two conversion milestones across many products. For example, they may have viewed a pair of shoes, then a T-shirt, and eventually checked out with a bow tie. This would still count as a conversion, even though the conversion event only happened for the bow tie.

Returning to the initial question, you want to see how many customers viewed an item detail page and then completed a purchase for the *same* product. In this case, set `product_id` as the constant property in the metric's configuration. This adds an additional condition to the join to make sure the constant property is the same across the base and conversion events:

```sql
select distinct
  first_value(v.ds) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as ds,
  first_value(v.user_id) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as user_id,
  first_value(v.referrer_id) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as referrer_id,
  buy_source.uuid,
  1 as buys
from {{ source_schema }}.fct_view_item_details v
inner join (
  select *, {{ generate_random_uuid() }} as uuid from {{ source_schema }}.fct_purchases
) buy_source
on
  v.user_id = buy_source.user_id
  and v.ds <= buy_source.ds
  and v.ds > buy_source.ds - interval '7 day'
  and buy_source.product_id = v.product_id -- Joining on the constant property product_id
```

#### Related docs

* [Fill null values for metrics](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md)
---

### Cost Insights

[Private beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

**Private beta feature:** Cost Insights is a private beta feature. To request access, contact your account manager.

Cost Insights shows estimated costs and compute time for your dbt projects and models directly in the dbt platform, so you can measure and share the impact of optimizations like [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md).

[State-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) makes your dbt workflows more efficient by reusing models and tests instead of running full rebuilds. When it's enabled, Cost Insights helps you demonstrate the resulting cost reductions and efficiency gains.

These cost and cost-reduction estimates are based on a retroactive analysis of runs after you enable Fusion and state-aware orchestration. They reflect actual historical usage, *not* forecasts of future costs or cost reductions.

With Cost Insights, you can see:

* **How much your dbt models cost to run**: See the compute cost and times for each model and job in your warehouse's native units.
* **The cost reductions from using state-aware orchestration**: Understand the cost reduction when state-aware orchestration reuses unchanged models.
* **Cost trends over time**: Track your warehouse spend and optimization impact across your dbt projects.

The Cost Insights section is available in different dbt platform areas and lets you view your cost data and the impact of state-aware optimizations across various dimensions:

* [Project dashboard](https://docs.getdbt.com/docs/explore/explore-cost-data.md#project-dashboard)
* [Catalog on Model page](https://docs.getdbt.com/docs/explore/explore-cost-data.md#model-performance-in-catalog)
* [Job details page](https://docs.getdbt.com/docs/explore/explore-cost-data.md#job-details)

#### Prerequisites

To view cost data, ensure you have:

* A dbt account with the dbt Fusion engine enabled. Contact your account manager to enable Fusion for your account.
* One of the roles listed in [Assign required permissions](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md#assign-required-permissions).
* A supported data warehouse: Snowflake, BigQuery, or Databricks.

For setup instructions, see [Set up Cost Insights](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md).

#### Understanding cost and reduction estimates

**Note:** Cost estimates are intended for visibility and optimization, not billing reconciliation.

dbt calculates the cost of running your dbt models using your data warehouse's usage metadata and billing context. dbt computes costs daily using up to the *last seven days of available data*.

##### Warehouse-specific logic

The following sections explain how costs are calculated for each supported warehouse.

**Snowflake**

dbt computes Snowflake query costs using Snowflake's query attribution data and your credit price (`price_per_credit`). Query attribution data is always available for Snowflake.
dbt pulls the `per_credit` price directly from Snowflake when available; otherwise, dbt uses the configured or default value in the dbt platform. For more information about configuring or viewing these values, see [Configure Cost Insights settings](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md#configure-cost-insights-settings-optional).

Formula:

```text
credits_per_query * price_per_credit
```

Where:

* `credits_per_query` - Cloud services, compute, and query acceleration credits attributed to the query. dbt sources this value from `QUERY_ATTRIBUTION_HISTORY`. For more information, see the [Snowflake documentation](https://docs.snowflake.com/en/sql-reference/account-usage/query_attribution_history).
* `price_per_credit` - Your Snowflake credit price (from Snowflake system tables when available, otherwise from your configured input or the default rate).

**BigQuery**

BigQuery does not expose per-query cost directly in system tables. Instead, dbt estimates cost by combining *query usage* with a *pricing input* (either from your configuration or the default rate).

* **On-demand pricing**: The cost is determined by how much data each query processes. The usage shows the amount of data billed for the query.

Formula:

```text
data_processed_per_query * price_per_tib
```

Where:

* `data_processed_per_query` - Total data billed for the query (normalized to TiB). dbt sources this value from `information_schema.jobs.total_bytes_billed`. For more information, see the [BigQuery documentation](https://docs.cloud.google.com/bigquery/docs/information-schema-jobs).
* `price_per_tib` - BigQuery on-demand price per TiB (from your configuration or the default rate).

* **Capacity pricing (reservations)**: The cost is determined by how long each query runs on reserved compute. The usage shows the amount of reserved compute time consumed by a query.
Formula:

```text
compute_time_per_query * price_per_slot_hour
```

Where:

* `compute_time_per_query` - Total slot time used by the query (in hours). dbt sources this value from `information_schema.jobs.total_slot_ms`. For more information, see the [BigQuery documentation](https://docs.cloud.google.com/bigquery/docs/information-schema-jobs).
* `price_per_slot_hour` - BigQuery capacity price per slot-hour (from your configuration or the default rate).

* **Cached queries**: Queries served from cache do not consume compute and are counted as $0.

**Databricks**

Databricks does not directly attribute usage to individual queries. Instead, dbt estimates per-query cost by proportionally allocating Databricks Units (DBUs) based on how long each query ran during a billing period.

* Queries that run longer receive a larger share of usage.
* Usage is converted to dollars using your list price.

Formula:

```text
usage_per_query * cost_per_dbu
```

Where:

* `usage_per_query` - DBUs attributed to the query.
* `cost_per_dbu` - Dollar cost per DBU for the relevant stock-keeping unit (SKU). For information about the pricing system table, see the [Databricks documentation](https://docs.databricks.com/aws/en/admin/system-tables/pricing).

Databricks reports usage in billing windows. These windows are periods of time during which a compute resource consumed a known number of DBUs. Queries have their own start and end times. For information about the billing usage system table, see the [Databricks documentation](https://docs.databricks.com/aws/en/admin/system-tables/billing).

To attribute usage to queries:

1. dbt identifies which billing windows each query overlaps.
2. dbt calculates how long the query ran during each window.
3. dbt allocates DBUs *proportionally* based on the query's share of total execution time on that compute resource during the same window.
Conceptually:

```text
DBUs_in_window * (query_runtime / total_query_runtime_in_window)
```

dbt sums this across all overlapping windows to get `usage_per_query`.

##### Cost reduction calculation

dbt calculates cost reductions by comparing actual costs to what costs would have been *without model reuse*. To do this, dbt uses data from the last seven days (where available) and performs the following steps:

1. Calculates the average cost per model build.
2. Counts how many times a model was reused instead of rebuilt.
3. Multiplies the reused model count by the average cost per build to determine the total cost reduction.

Formula:

```text
average_cost_per_build * reuse_count
```

dbt calculates reductions per model and per deployment environment (production and staging), based on recent historical runs.

Additional notes:

* dbt calculates estimated costs and savings daily.
* Pricing inputs come from warehouse system tables (where available), connection-level configuration, or default list prices.

###### Example

The following example shows how dbt calculates cost reductions. Looking back seven days, assume a model runs on two distinct days:

| Day       | Total cost | Total executions |
| --------- | ---------- | ---------------- |
| Day 1     | $5         | 5                |
| Day 2     | $10        | 10               |
| **Total** | **$15**    | **15**           |
The average cost per execution is $15 ÷ 15 runs = $1 per run.

If the model was *reused* eight times instead of rebuilt during this same period, the estimated cost reduction is: $1 average cost per run × 8 reuses = $8.

#### Considerations

Keep the following in mind when using Cost Insights:

**Data collection and refresh**

* Cost Insights uses your platform metadata credentials to access warehouse system tables. No separate credentials are needed beyond the platform metadata setup.
* Cost data refreshes daily and reflects the previous day's usage, which means there is a lag of up to one day between when a job runs and when its cost data appears.
* You need sufficient [permissions](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md#assign-required-permissions) to query warehouse metadata tables.

**Cost accuracy**

* dbt calculates costs using warehouse-reported usage data and applies default credit or compute costs based on standard warehouse pricing.
* If you have custom pricing agreements with your warehouse provider, override the default values in your account settings to ensure accurate cost reporting. For more information, see [Set up Cost Insights](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md#configure-cost-insights-settings-optional).
* Update your cost variables whenever your warehouse pricing contracts change to maintain accurate tracking.
* Changes to cost variables only apply to future calculations — historical cost data remains unchanged.

**Optimization data**

* Optimization and usage reduction data is available once state-aware orchestration is enabled and begins reusing models across runs.
* For accounts already using state-aware orchestration, run at least one full model build within the last 10 days before enabling Cost Insights to establish a baseline for cost reduction calculations. If you don't see cost reduction data, run a full build to establish the baseline.
* Cost Insights currently calculates estimated reductions in warehouse compute usage at the model level and will expand to include tests and seeds in the future.

**Exporting data**

* You can export cost data as a CSV file for further analysis and reporting. For more information, see [Explore cost data](https://docs.getdbt.com/docs/explore/explore-cost-data.md).

#### Related FAQs

**Why might my actual warehouse costs differ from displayed costs?**

Cost Insights shows estimates based on warehouse-reported usage and your configured pricing variables. These estimates are based on a retroactive analysis of historical runs and reflect actual usage, *not* forecasts of future costs. Differences may occur if:

* Your warehouse has custom pricing that differs from the default compute credit unit.
* There are discounts or credits applied at the billing level that aren't reflected in usage tables.
* Costs include other charges beyond compute.

Cost Insights in the dbt platform is designed to be directionally accurate, showing you dbt-specific cost components rather than matching your billing exactly.

**How often is cost data refreshed?**

Cost data refreshes daily and reflects the previous day's usage. This means there is a lag of up to one day between when a job runs and when its cost data appears in Cost Insights.

**How do I troubleshoot if cost data isn't appearing?**

If cost data isn't appearing in Cost Insights, check the following:

* Verify that platform metadata credentials are configured in your account settings and that the credential test is passing. For more information, see [Set up Cost Insights](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md#configure-platform-metadata-credentials).
* Ensure you have one of the required permissions to view cost data. For more information, see [Assign required permissions](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md#assign-required-permissions).
* Confirm that at least one job is running in a production environment. Cost data only appears after jobs have executed.
* Cost data refreshes daily and reflects the previous day's usage, which means there is a lag of up to one day between when a job runs and when its cost data appears. If you just ran a job, wait until the next day to see the data.
* After enabling Cost Insights, dbt looks back 10 days to build baselines for cost reduction calculations. If you don't see cost reduction data, ensure you have sufficient job history within the last 10 days.

**Does the Cost Insights feature incur warehouse costs?**

dbt issues lightweight, read-only queries against your warehouse to retrieve metadata and to power features such as Cost Insights. dbt scopes and filters these queries to minimize impact, and most customers see negligible costs (typically on the order of cents).

**How does increasing job frequency affect cost reduction estimates?**

Cost reduction metrics reflect how dbt optimizes compute costs by reusing existing results instead of running the same model again. When you increase your job run frequency (for example, because performance improvements make it easier to schedule jobs more often), dbt has more opportunities to reuse models. As reuse increases, dbt optimizes more compute, which means your reported cost reductions may also increase.

This metric shows the efficiency impact of reuse within your current workload. It reflects the compute costs that dbt reduces by reusing models instead of rebuilding them, rather than showing your total warehouse spend reduction.
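As an illustration of the Snowflake formula described earlier, a back-of-the-envelope query along these lines can approximate per-query cost from attributed credits. This is a rough sketch only, not dbt's implementation: the column names follow Snowflake's `QUERY_ATTRIBUTION_HISTORY` view, and the `3.00` credit price is a placeholder for your contracted rate.

```sql
-- Hedged sketch: approximate per-query dollar cost over the last seven days.
-- 3.00 is a placeholder price_per_credit; substitute your own rate.
select
    query_id,
    (coalesce(credits_attributed_compute, 0)
     + coalesce(credits_used_query_acceleration, 0)) * 3.00 as estimated_cost_usd
from snowflake.account_usage.query_attribution_history
where start_time >= dateadd('day', -7, current_timestamp())
order by estimated_cost_usd desc;
```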
---

### Creating metrics

After building [semantic models](https://docs.getdbt.com/docs/build/semantic-models.md), it's time to start adding metrics. This page explains the different supported metric types you can add to your dbt project.

#### Parameters

##### Type-specific parameters

Each metric type has additional specific parameters:

* **Simple metrics**: `agg` (required), `expr`, `percentile`, `percentile_type`, `non_additive_dimension`, `agg_time_dimension`, `join_to_timespine`, `fill_nulls_with`
* **Cumulative metrics**: `input_metric` (required), `window`, `grain_to_date`, `period_agg`
* **Derived metrics**: `expr` (required), `input_metrics` (required)
* **Ratio metrics**: `numerator` (required), `denominator` (required)
* **Conversion metrics**: `entity` (required), `calculation` (required), `base_metric` (required), `conversion_metric` (required), `window`, `constant_properties`

Refer to the following sections about each metric type for detailed information on type-specific parameters.

##### Example

📹 Learn about the dbt Semantic Layer with on-demand video courses! Explore our [dbt Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer) to learn how to define and query metrics in your dbt project. Additionally, dive into mini-courses for querying the dbt Semantic Layer in your favorite tools: [Tableau](https://courses.getdbt.com/courses/tableau-querying-the-semantic-layer), [Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel), [Hex](https://courses.getdbt.com/courses/hex-querying-the-semantic-layer), and [Mode](https://courses.getdbt.com/courses/mode-querying-the-semantic-layer).
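As a hedged sketch of the overall shape of a metric definition (the names here are illustrative, and the parameters follow the simple-metric list above; exact nesting may differ across dbt versions):

```yaml
metrics:
  - name: order_total             # illustrative metric name
    description: Sum of total order amounts.
    type: simple
    label: Order total
    type_params:
      agg: sum                    # required for simple metrics
      expr: order_amount          # column or SQL expression to aggregate
```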
#### Default granularity for metrics

#### Conversion metrics

[Conversion metrics](https://docs.getdbt.com/docs/build/conversion.md) help you track when a base event and a subsequent conversion event occur for an entity within a set time period.

#### Cumulative metrics

#### Derived metrics

[Derived metrics](https://docs.getdbt.com/docs/build/derived.md) allow you to perform calculations using other metrics. For example, you can calculate `gross_profit` by subtracting a `cost` metric from a `revenue` metric, or calculate growth by comparing a metric to its value from a previous time period.

#### Ratio metrics

[Ratio metrics](https://docs.getdbt.com/docs/build/ratio.md) involve a numerator metric and a denominator metric. A `filter` string can be applied to both the numerator and denominator, or separately to either one.

#### Simple metrics

#### Filters

Configure a filter using Jinja templating and the following syntax to reference entities, dimensions, time dimensions, or metrics in filters.
Refer to [Metrics as dimensions](https://docs.getdbt.com/docs/build/ref-metrics-in-filters.md) for details on how to use metrics as dimensions with metric filters:

models/metrics/file_name.yml

```yaml
filter: |
  {{ Entity('entity_name') }}

filter: |
  {{ Dimension('primary_entity__dimension_name') }}

filter: |
  {{ TimeDimension('time_dimension', 'granularity') }}

filter: |
  {{ Metric('metric_name', group_by=['entity_name']) }}
```

For example, if you want to filter for the order date dimension grouped by month, use the following syntax:

```yaml
filter: |
  {{ TimeDimension('order_date', 'month') }}
```

#### Related docs

* [Semantic models](https://docs.getdbt.com/docs/build/semantic-models.md)
* [Fill null values for metrics](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md)
* [Metrics as dimensions with metric filters](https://docs.getdbt.com/docs/build/ref-metrics-in-filters.md)

---

### Cumulative metrics

#### Parameters

#### Cumulative metrics example

Cumulative metrics measure data over a given window, and consider the window infinite when no window parameter is passed, accumulating the data over all time.
##### Granularity options

Note the use of the window function to select the `first` value. For `last` and `average`, the generated SQL would use `last_value()` or `avg()` in place of `first_value()`.

```sql
-- re-aggregate metric via the group by
select
  metric_time__week,
  metric_time__quarter,
  revenue_all_time
from (
  -- window function for metric re-aggregation
  select
    metric_time__week,
    metric_time__quarter,
    first_value(revenue_all_time) over (
      partition by metric_time__week, metric_time__quarter
      order by metric_time__day
      rows between unbounded preceding and unbounded following
    ) as revenue_all_time
  from (
    -- join self over time range
    -- pass only elements: ['txn_revenue', 'metric_time__week', 'metric_time__quarter', 'metric_time__day']
    -- aggregate measures
    -- compute metrics via expressions
    select
      subq_11.metric_time__day as metric_time__day,
      subq_11.metric_time__week as metric_time__week,
      subq_11.metric_time__quarter as metric_time__quarter,
      sum(revenue_src_28000.revenue) as revenue_all_time
    from (
      -- time spine
      select
        ds as metric_time__day,
        date_trunc('week', ds) as metric_time__week,
        date_trunc('quarter', ds) as metric_time__quarter
      from mf_time_spine subq_12
      group by
        ds,
        date_trunc('week', ds),
        date_trunc('quarter', ds)
    ) subq_11
    inner join fct_revenue revenue_src_28000
      on date_trunc('day', revenue_src_28000.created_at) <= subq_11.metric_time__day
    group by
      subq_11.metric_time__day,
      subq_11.metric_time__week,
      subq_11.metric_time__quarter
  ) subq_16
) subq_17
group by
  metric_time__week,
  metric_time__quarter,
  revenue_all_time
```

##### Window options

This section details examples of when to specify and not to specify window options.
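For instance, the `revenue_all_time` metric compiled above omits `window`, so it accumulates over all time. Its definition might be sketched as follows (a hedged sketch; the `revenue` input metric name and the flat key layout are assumptions, so check the metric spec for your dbt version):

```yaml
metrics:
  - name: revenue_all_time
    description: All-time cumulative revenue.
    type: cumulative
    label: Revenue (all time)
    input_metric: revenue   # required; assumed simple metric over fct_revenue
    period_agg: first       # how values re-aggregate to coarser grains
    # no `window` key: the accumulation window is infinite
```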
##### Grain to date

You can choose to specify a grain to date in your cumulative metric configuration to accumulate a metric from the start of a grain (such as week, month, or year). When using a window, such as a month, MetricFlow will go back one full calendar month. However, grain to date will always start accumulating from the beginning of the grain, regardless of the latest date of data.

We can compare the difference between a 1-month window and a monthly grain to date:

* The cumulative metric with a window applies a sliding window of 1 month.
* The grain to date by month resets at the beginning of each month.

A cumulative metric with grain to date compiles to SQL like the following:

```sql
with staging as (
  select
    subq_3.date_day as metric_time__day,
    date_trunc('week', subq_3.date_day) as metric_time__week,
    sum(subq_1.order_count) as orders_last_month_to_date
  from dbt_jstein.metricflow_time_spine subq_3
  inner join (
    select
      date_trunc('day', ordered_at) as metric_time__day,
      1 as order_count
    from analytics.dbt_jstein.orders orders_src_10000
  ) subq_1
    on (
      subq_1.metric_time__day <= subq_3.date_day
    ) and (
      subq_1.metric_time__day >= date_trunc('month', subq_3.date_day)
    )
  group by
    subq_3.date_day,
    date_trunc('week', subq_3.date_day)
)

select *
from (
  select
    metric_time__week,
    first_value(orders_last_month_to_date) over (
      partition by date_trunc('week', metric_time__day)
      order by metric_time__day
    ) as cumulative_revenue
  from staging
)
group by
  metric_time__week,
  cumulative_revenue
order by
  metric_time__week
```

#### SQL implementation example

To calculate the cumulative value of the metric over a given window, we do a time range join to a timespine table using the primary time dimension as the join key. We use the accumulation window in the join to decide whether a record should be included on a particular day.
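The reference SQL in the next section counts weekly active users over a seven-day window; the metric behind it might be declared roughly like this (a hedged sketch, not the exact definition — `distinct_users` as the input metric name and the key layout are assumptions):

```yaml
metrics:
  - name: weekly_active_users
    description: Distinct users active in the trailing 7 days.
    type: cumulative
    label: Weekly active users
    input_metric: distinct_users   # assumed input metric
    window: 7 days
```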
The following SQL, produced from an example cumulative metric, is provided for reference:

```sql
select
  count(distinct distinct_users) as weekly_active_users,
  metric_time
from (
  select
    subq_3.distinct_users as distinct_users,
    subq_3.metric_time as metric_time
  from (
    select
      subq_2.distinct_users as distinct_users,
      subq_1.metric_time as metric_time
    from (
      select metric_time
      from transform_prod_schema.mf_time_spine subq_1356
      where (
        metric_time >= cast('2000-01-01' as timestamp)
      ) and (
        metric_time <= cast('2040-12-31' as timestamp)
      )
    ) subq_1
    inner join (
      select
        distinct_users as distinct_users,
        date_trunc('day', ds) as metric_time
      from demo_schema.transactions transactions_src_426
      where (
        (date_trunc('day', ds)) >= cast('1999-12-26' as timestamp)
      ) and (
        (date_trunc('day', ds)) <= cast('2040-12-31' as timestamp)
      )
    ) subq_2
      on (
        subq_2.metric_time <= subq_1.metric_time
      ) and (
        subq_2.metric_time > dateadd(day, -7, subq_1.metric_time)
      )
  ) subq_3
)
group by
  metric_time
limit 100;
```

#### Limitations

If you specify a `window` in your cumulative metric definition, you must include `metric_time` as a dimension in the SQL query. This is because the accumulation window is based on metric time. For example:

```sql
select
  count(distinct subq_3.distinct_users) as weekly_active_users,
  subq_3.metric_time
from (
  select
    subq_2.distinct_users as distinct_users,
    subq_1.metric_time as metric_time
  -- ...
) subq_3
group by
  subq_3.metric_time
```

#### Related docs

* [Fill null values for simple, derived, or ratio metrics](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md)
---

### Custom aliases

#### Overview

When dbt runs a model, it will generally create a relation (either a table or a view) in the database, except in the case of an [ephemeral model](https://docs.getdbt.com/docs/build/materializations.md), when it will create a CTE for use in another model. By default, dbt uses the model's filename as the identifier for the relation or CTE it creates. This identifier can be overridden using the [`alias`](https://docs.getdbt.com/reference/resource-configs/alias.md) model configuration.

##### Why alias model names?

The names of schemas and tables are effectively the "user interface" of your data warehouse. Well-named schemas and tables can help provide clarity and direction for consumers of this data. In combination with [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md), model aliasing is a powerful mechanism for designing your warehouse.

The file naming scheme that you use to organize your models may also interfere with your data platform's requirements for identifiers. For example, you might wish to namespace your files using a period (`.`), but your data platform's SQL dialect may interpret periods to indicate a separation between schema names and table names in identifiers, or it may forbid periods from being used at all in CTE identifiers. In cases like these, model aliasing can allow you to retain flexibility in the way you name your model files without violating your data platform's identifier requirements.
##### Usage

The `alias` config can be used to change the name of a model's identifier in the database. The following table shows examples of database identifiers for models both with and without a supplied `alias`, and with different materializations.

| Model | Config | Relation Type | Database Identifier |
| --- | --- | --- | --- |
| ga_sessions.sql | `{{ config(materialized='view') }}` | view | `"analytics"."ga_sessions"` |
| ga_sessions.sql | `{{ config(materialized='view', alias='sessions') }}` | view | `"analytics"."sessions"` |
| ga_sessions.sql | `{{ config(materialized='ephemeral') }}` | CTE | `"__dbt__cte__ga_sessions"` |
| ga_sessions.sql | `{{ config(materialized='ephemeral', alias='sessions') }}` | CTE | `"__dbt__cte__sessions"` |

To configure an alias for a model, supply a value for the model's `alias` configuration parameter. For example:

models/google_analytics/ga_sessions.sql

```sql
-- This model will be created in the database with the identifier `sessions`
-- Note that in this example, `alias` is used along with a custom schema
{{ config(alias='sessions', schema='google_analytics') }}

select * from ...
```

Or in a `schema.yml` file:

models/google_analytics/schema.yml

```yaml
models:
  - name: ga_sessions
    config:
      alias: sessions
```

When referencing the `ga_sessions` model above from a different model, use the `ref()` function with the model's *filename* as usual.
For example:

models/combined_sessions.sql

```sql
-- Use the model's filename in ref() calls, regardless of any aliasing configs

select * from {{ ref('ga_sessions') }}

union all

select * from {{ ref('snowplow_sessions') }}
```

##### generate_alias_name

The alias generated for a model is controlled by a macro called `generate_alias_name`. This macro can be overridden in a dbt project to change how dbt aliases models. This macro works similarly to the [generate_schema_name](https://docs.getdbt.com/docs/build/custom-schemas.md#advanced-custom-schema-configuration) macro.

To override dbt's alias name generation, create a macro named `generate_alias_name` in your own dbt project. The `generate_alias_name` macro accepts two arguments:

1. The custom alias supplied in the model config
2. The node that a custom alias is being generated for

The default implementation of `generate_alias_name` simply uses the supplied `alias` config (if present) as the model alias, otherwise falling back to the model name. This implementation looks like this:

get_custom_alias.sql

```jinja2
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}

    {%- if custom_alias_name -%}

        {{ custom_alias_name | trim }}

    {%- elif node.version -%}

        {{ return(node.name ~ "_v" ~ (node.version | replace(".", "_"))) }}

    {%- else -%}

        {{ node.name }}

    {%- endif -%}

{%- endmacro %}
```

💡 Use Jinja's whitespace control to tidy your macros! When you're modifying macros in your project, you might notice extra white space in your code in the `target/compiled` folder. You can remove unwanted spaces and lines with Jinja's [whitespace control](https://docs.getdbt.com/faqs/Jinja/jinja-whitespace.md) by using a minus sign. For example, use `{{- ... -}}` or `{%- ... %}` around your macro definitions (such as `{%- macro generate_schema_name(...) -%} ... {%- endmacro -%}`).
##### Dispatch macro - SQL alias management for databases and dbt packages

See docs on macro `dispatch`: ["Managing different global overrides across packages"](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md#managing-different-global-overrides-across-packages)

##### Caveats

###### Ambiguous database identifiers

Using aliases, it's possible to accidentally create models with ambiguous identifiers. Given the following two models, dbt would attempt to create two views with *exactly* the same name in the database (i.e. `sessions`):

models/snowplow_sessions.sql

```sql
{{ config(alias='sessions') }}

select * from ...
```

models/sessions.sql

```sql
select * from ...
```

Whichever of these models runs second would "win", and generally, the output of dbt would not be what you expect. To avoid this failure mode, dbt checks whether your model names and aliases are ambiguous. If they are, you will be presented with an error message like this:

```text
$ dbt compile
Encountered an error:
Compilation Error
  dbt found two resources with the database representation "analytics.sessions".
  dbt cannot create two resources with identical database representations. To fix this,
  change the "schema" or "alias" configuration of one of these resources:
  - model.my_project.snowplow_sessions (models/snowplow_sessions.sql)
  - model.my_project.sessions (models/sessions.sql)
```

If these models should indeed have the same database identifier, you can work around this error by configuring a [custom schema](https://docs.getdbt.com/docs/build/custom-schemas.md) for one of the models.
###### Model versions

**Related documentation:**

* [Model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md)
* [`versions`](https://docs.getdbt.com/reference/resource-properties/versions.md#alias)

By default, dbt will create versioned models with the alias `<model_name>_v<version>`, where `<version>` is that version's unique identifier. You can customize this behavior just like for non-versioned models by configuring a custom `alias` or re-implementing the `generate_alias_name` macro.

#### Related docs

* [Customize dbt models database, schema, and alias](https://docs.getdbt.com/guides/customize-schema-alias.md?step=1) to learn how to customize dbt models database, schema, and alias
* [Custom schema](https://docs.getdbt.com/docs/build/custom-schemas.md) to learn how to customize dbt schema
* [Custom database](https://docs.getdbt.com/docs/build/custom-databases.md) to learn how to customize dbt database

---

### Custom databases

A word on naming: different warehouses have different names for *logical databases*. The information in this document covers "databases" on Snowflake, Redshift, and Postgres; "projects" on BigQuery; and "catalogs" on Databricks Unity Catalog. The values `project` and `database` are interchangeable in BigQuery project configurations.

#### Configuring custom databases

The logical database that dbt models are built into can be configured using the `database` model configuration.
If this configuration is not supplied to a model, then dbt will use the database configured in the active target from your `profiles.yml` file. If the `database` configuration *is* supplied for a model, then dbt will build the model into the configured database.

The `database` configuration can be supplied for groups of models in the `dbt_project.yml` file, or for individual models in model SQL files.

##### Configuring database overrides in `dbt_project.yml`

This config changes all models in the `jaffle_shop` project to be built into a database called `jaffle_shop`.

dbt_project.yml

```yaml
name: jaffle_shop

models:
  jaffle_shop:
    +database: jaffle_shop
    # For BigQuery users:
    # project: jaffle_shop
```

##### Configuring database overrides in a model file

This config changes a specific model to be built into a database called `jaffle_shop`.

models/my_model.sql

```sql
{{ config(database="jaffle_shop") }}

select * from ...
```

##### generate_database_name

The database name generated for a model is controlled by a macro called `generate_database_name`. This macro can be overridden in a dbt project to change how dbt generates model database names. This macro works similarly to the [generate_schema_name](https://docs.getdbt.com/docs/build/custom-schemas.md#advanced-custom-schema-configuration) macro.

To override dbt's database name generation, create a macro named `generate_database_name` in your own dbt project. The `generate_database_name` macro accepts two arguments:

1. The custom database supplied in the model config
2.
The node that a custom database is being generated for

The default implementation of `generate_database_name` simply uses the supplied `database` config if one is present; otherwise, the database configured in the active `target` is used. This implementation looks like this:

get_custom_database.sql

```jinja2
{% macro generate_database_name(custom_database_name=none, node=none) -%}

    {%- set default_database = target.database -%}

    {%- if custom_database_name is none -%}

        {{ default_database }}

    {%- else -%}

        {{ custom_database_name | trim }}

    {%- endif -%}

{%- endmacro %}
```

💡 Use Jinja's whitespace control to tidy your macros! When you're modifying macros in your project, you might notice extra white space in your code in the `target/compiled` folder. You can remove unwanted spaces and lines with Jinja's [whitespace control](https://docs.getdbt.com/faqs/Jinja/jinja-whitespace.md) by using a minus sign. For example, use `{{- ... -}}` or `{%- ... %}` around your macro definitions (such as `{%- macro generate_schema_name(...) -%} ... {%- endmacro -%}`).

##### Managing different behaviors across packages

See docs on macro `dispatch`: ["Managing different global overrides across packages"](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md)

#### Considerations

##### BigQuery

When dbt opens a BigQuery connection, it will do so using the `project_id` defined in your active `profiles.yml` target. This `project_id` will be billed for the queries that are executed in the dbt run, even if some models are configured to be built in other projects.
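As a hypothetical illustration (not a dbt built-in pattern), a project might override this macro so that custom databases are honored only in production, routing all other runs into the target's database; this sketch assumes a production target named `prod`:

```jinja2
{% macro generate_database_name(custom_database_name=none, node=none) -%}

    {#- Hypothetical override: honor custom databases only in production -#}
    {%- if target.name == 'prod' and custom_database_name is not none -%}

        {{ custom_database_name | trim }}

    {%- else -%}

        {{ target.database }}

    {%- endif -%}

{%- endmacro %}
```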
#### Related docs

* [Customize dbt models database, schema, and alias](https://docs.getdbt.com/guides/customize-schema-alias.md?step=1) to learn how to customize dbt models database, schema, and alias
* [Custom schema](https://docs.getdbt.com/docs/build/custom-schemas.md) to learn how to customize dbt model schema
* [Custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) to learn how to customize dbt model alias name

---

### Custom schemas

By default, all dbt models are built in the schema specified in your [environment](https://docs.getdbt.com/docs/dbt-cloud-environments.md) (dbt platform) or [profile's target](https://docs.getdbt.com/docs/local/dbt-core-environments.md) (dbt Core). This default schema is called your *target schema*.

For projects with many models, it's common to organize them across multiple schemas. For example, you might want to:

* Group models based on the business unit using the model, creating schemas such as `core`, `marketing`, `finance` and `support`.
* Hide intermediate models in a `staging` schema, and only present models that should be queried by an end user in an `analytics` schema.

To do this, specify a custom schema. dbt generates the schema name for a model by appending the custom schema to the target schema: `<target_schema>_<custom_schema>`.
| Target schema | Custom schema | Resulting schema |
| --- | --- | --- |
| `analytics_prod` | None | `analytics_prod` |
| `alice_dev` | None | `alice_dev` |
| `dbt_cloud_pr_123_456` | None | `dbt_cloud_pr_123_456` |
| `analytics_prod` | marketing | `analytics_prod_marketing` |
| `alice_dev` | marketing | `alice_dev_marketing` |
| `dbt_cloud_pr_123_456` | marketing | `dbt_cloud_pr_123_456_marketing` |

#### How do I use custom schemas?

To specify a custom schema for a model, use the `schema` configuration key. As with any configuration, you can do one of the following:

* apply this configuration to a specific model by using a config block within a model
* apply it to a subdirectory of models by specifying it in your `dbt_project.yml` file

orders.sql

```sql
{{ config(schema='marketing') }}

select ...
```

dbt_project.yml

```yaml
# models in `models/marketing/` will be built in the "*_marketing" schema
models:
  my_project:
    marketing:
      +schema: marketing
```

#### Understanding custom schemas

When first using custom schemas, it's a common misunderstanding to assume that a model *only* uses the new `schema` configuration; for example, a model that has the configuration `schema: marketing` would be built in the `marketing` schema. However, dbt puts it in a schema like `<target_schema>_marketing`.

There's a good reason for this deviation. Each dbt user has their own target schema for development (refer to [Managing Environments](#managing-environments)). If dbt ignored the target schema and only used the model's custom schema, every dbt user would create models in the same schema and would overwrite each other's work.
By combining the target schema and the custom schema, dbt ensures that objects it creates in your data warehouse don't collide with one another. If you prefer different logic, you can change the way dbt generates a schema name (see below).

##### How does dbt generate a model's schema name?

dbt uses a default macro called `generate_schema_name` to determine the name of the schema that a model should be built in. The following code represents the default macro's logic:

```jinja2
{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}

    {%- if custom_schema_name is none -%}

        {{ default_schema }}

    {%- else -%}

        {{ default_schema }}_{{ custom_schema_name | trim }}

    {%- endif -%}

{%- endmacro %}
```
💡 Use Jinja's whitespace control to tidy your macros! When you're modifying macros in your project, you might notice extra white space in your code in the `target/compiled` folder. You can remove unwanted spaces and lines with Jinja's [whitespace control](https://docs.getdbt.com/faqs/Jinja/jinja-whitespace.md) by using a minus sign. For example, use `{{- ... -}}` or `{%- ... %}` around your macro definitions (such as `{%- macro generate_schema_name(...) -%} ... {%- endmacro -%}`).

#### Changing the way dbt generates a schema name

If your dbt project has a custom macro called `generate_schema_name`, dbt will use it instead of the default macro. This allows you to customize the name generation according to your needs.

To customize this macro, copy the example code in the section [How does dbt generate a model's schema name](#how-does-dbt-generate-a-models-schema-name) into a file named `macros/generate_schema_name.sql` and make changes as necessary. Be careful: dbt will ignore any custom `generate_schema_name` macros included in installed packages.

⚠️ Warning: don't replace `default_schema` in the macro. If you're modifying how dbt generates schema names, don't just replace `{{ default_schema }}_{{ custom_schema_name | trim }}` with `{{ custom_schema_name | trim }}` in the `generate_schema_name` macro. If you remove `{{ default_schema }}`, developers will override each other's models when they create their own custom schemas. This can also cause issues during development and continuous integration (CI).
❌ The following code block is an example of what your code *should not* look like:

```jinja2
{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}

    {%- if custom_schema_name is none -%}

        {{ default_schema }}

    {%- else -%}

        {# Incorrect: this omits the default schema before the custom schema name. #}
        {{ custom_schema_name | trim }}

    {%- endif -%}

{%- endmacro %}
```

##### generate_schema_name arguments

| Argument | Description | Example |
| --- | --- | --- |
| custom_schema_name | The configured value of `schema` in the specified node, or `none` if a value is not supplied | `marketing` |
| node | The `node` that is currently being processed by dbt | `{"name": "my_model", "resource_type": "model",...}` |

##### Jinja context available in generate_schema_name

If you choose to write custom logic to generate a schema name, it's worth noting that not all variables and methods are available to you when defining this logic. In other words: the `generate_schema_name` macro is compiled with a limited Jinja context.
The following context methods *are* available in the `generate_schema_name` macro:

| Jinja context | Type | Available |
| --- | --- | --- |
| [target](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md) | Variable | ✅ |
| [env_var](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md) | Variable | ✅ |
| [var](https://docs.getdbt.com/reference/dbt-jinja-functions/var.md) | Variable | Limited, see below |
| [exceptions](https://docs.getdbt.com/reference/dbt-jinja-functions/exceptions.md) | Macro | ✅ |
| [log](https://docs.getdbt.com/reference/dbt-jinja-functions/log.md) | Macro | ✅ |
| Other macros in your project | Macro | ✅ |
| Other macros in your packages | Macro | ✅ |

##### Which vars are available in generate_schema_name?

Globally-scoped variables and variables defined on the command line with [--vars](https://docs.getdbt.com/docs/build/project-variables.md) are accessible in the `generate_schema_name` context.

##### Managing different behaviors across packages

See docs on macro `dispatch`: ["Managing different global overrides across packages"](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md)

#### A built-in alternative pattern for generating schema names

A common customization is to use the custom schema in production when provided, with the target schema serving only as a fallback if no custom schema is specified.
In other environments, such as development and CI, custom schema configurations are ignored, defaulting to the target schema instead.

Production environment (`target.name == 'prod'`):

| Target schema | Custom schema | Resulting schema |
| --- | --- | --- |
| `analytics_prod` | None | `analytics_prod` |
| `analytics_prod` | marketing | `marketing` |

Development/CI environment (`target.name != 'prod'`):

| Target schema | Custom schema | Resulting schema |
| --- | --- | --- |
| `alice_dev` | None | `alice_dev` |
| `alice_dev` | marketing | `alice_dev` |
| `dbt_cloud_pr_123_456` | None | `dbt_cloud_pr_123_456` |
| `dbt_cloud_pr_123_456` | marketing | `dbt_cloud_pr_123_456` |

Similar to the default macro, this approach guarantees that schemas from different environments will not collide.

dbt ships with a macro for this use case — called `generate_schema_name_for_env` — which is disabled by default. To enable it, add a custom `generate_schema_name` macro to your project that contains the following code:

macros/generate_schema_name.sql

```jinja2
-- put this in macros/generate_schema_name.sql

{% macro generate_schema_name(custom_schema_name, node) -%}

    {{ generate_schema_name_for_env(custom_schema_name, node) }}

{%- endmacro %}
```

When using this macro, you'll need to set the target name in your production job to `prod`.

#### Managing environments

In the `generate_schema_name` macro examples shown in the [built-in alternative pattern](#a-built-in-alternative-pattern-for-generating-schema-names) section, the `target.name` context variable is used to change the schema name that dbt generates for models.
If the `generate_schema_name` macro in your project uses the `target.name` context variable, you must ensure that your different dbt environments are configured accordingly. While you can use any naming scheme you'd like, we typically recommend:

* **dev** — Your local development environment; configured in a `profiles.yml` file on your computer.
* **ci** — A [continuous integration](https://docs.getdbt.com/docs/cloud/git/connect-github.md) environment running on pull requests in GitHub, GitLab, and so on.
* **prod** — The production deployment of your dbt project, like in dbt, Airflow, or [similar](https://docs.getdbt.com/docs/deploy/deployments.md).

If your schema names are being generated incorrectly, double-check the target name in the relevant environment. For more information, consult the [managing environments in dbt Core](https://docs.getdbt.com/docs/local/dbt-core-environments.md) guide.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Customize dbt models database, schema, and alias](https://docs.getdbt.com/guides/customize-schema-alias.md?step=1)
* [Custom database](https://docs.getdbt.com/docs/build/custom-databases.md) to learn how to customize the dbt model database
* [Custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) to learn how to customize a dbt model's alias name

---

### Custom target names

#### dbt Scheduler[​](#dbt-scheduler "Direct link to dbt Scheduler")

You can define a custom target name for any dbt job to correspond to settings in your dbt project.
This is helpful if you have logic in your dbt project that behaves differently depending on the specified target, for example:

```sql
select *
from a_big_table

-- limit the amount of data queried in dev
{% if target.name != 'prod' %}
where created_at > date_trunc('month', current_date)
{% endif %}
```

To set a custom target name for a job in dbt, configure the **Target Name** field for your job in the Job Settings page.

[![Overriding the target name to 'prod'](/img/docs/dbt-cloud/using-dbt-cloud/jobs-settings-target-name.png?v=2 "Overriding the target name to 'prod'")](#)Overriding the target name to 'prod'

#### dbt Studio IDE[​](#dbt-studio-ide "Direct link to dbt Studio IDE")

When developing in dbt, you can set a custom target name in your development credentials. Click your account name above the profile icon in the left panel, select **Account settings**, then go to **Credentials**. Choose the project to update the target name.

[![Overriding the target name to 'dev'](/img/docs/dbt-cloud/using-dbt-cloud/development-credentials.png?v=2 "Overriding the target name to 'dev'")](#)Overriding the target name to 'dev'

---

### Data health signals

[Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

Data health signals offer a quick, at-a-glance view of data health when browsing your resources in Catalog. They keep you informed on the status of your resource's health using the indicators **Healthy**, **Caution**, **Degraded**, or **Unknown**. Note, we don’t calculate data health for non-dbt resources.
* Supported resources are [models](https://docs.getdbt.com/docs/build/models.md), [sources](https://docs.getdbt.com/docs/build/sources.md), and [exposures](https://docs.getdbt.com/docs/build/exposures.md).
* For accurate health data, ensure the resource is up-to-date and has had a recent job run.
* Each data health signal reflects key data health components, such as test success status, missing resource descriptions, missing tests, absence of builds in 30-day windows, [and more](#data-health-signal-criteria).

[![View data health signals for your models.](/img/docs/collaborate/dbt-explorer/data-health-signal.jpg?v=2 "View data health signals for your models.")](#)View data health signals for your models.

#### Access data health signals[​](#access-data-health-signals "Direct link to Access data health signals")

Access data health signals in the following places:

* In the [search function](https://docs.getdbt.com/docs/explore/explore-projects.md#search-resources) or under **Models**, **Sources**, or **Exposures** in the **Resource** tab.
  * For sources, the data health signal also indicates the [source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) status.
* In the **Health** column on [each resource's details page](https://docs.getdbt.com/docs/explore/explore-projects.md#view-resource-details). Hover over or click the signal to view detailed information.
* In the **Health** column of public models tables.
* In the [DAG lineage graph](https://docs.getdbt.com/docs/explore/explore-projects.md#project-lineage). Click any node to open the node details panel where you can view the node and its details.
* In [data health tiles](https://docs.getdbt.com/docs/explore/data-tile.md), embedded through an iFrame and visible in your BI dashboard.
[![Access data health signals in multiple places in dbt Catalog.](/img/docs/collaborate/dbt-explorer/data-health-signal.gif?v=2 "Access data health signals in multiple places in dbt Catalog.")](#)Access data health signals in multiple places in dbt Catalog.

#### Data health signal criteria[​](#data-health-signal-criteria "Direct link to Data health signal criteria")

Each resource has a health state that is determined by a specific set of criteria. Select the following tabs to view the criteria for that resource type.

* Models
* Sources
* Exposures

**Models**

The health state of a model is determined by the following criteria:

* ✅ **Healthy**: All of the following must be true:
  * Built successfully in the last run
  * Built in the last 30 days
  * Model has tests configured
  * All tests passed
  * All upstream [sources are fresh](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness) or freshness is not applicable (set to `null`)
  * Has a description
* 🟡 **Caution**: One of the following must be true:
  * Not built in the last 30 days
  * Tests are not configured
  * Tests return warnings
  * One or more upstream sources are stale:
    * Has a freshness check configured
    * Freshness check ran in the past 30 days
    * Freshness check returned a warning
  * Missing a description
* 🔴 **Degraded**: One of the following must be true:
  * Model failed to build
  * Model has failing tests
  * One or more upstream sources are stale:
    * Freshness check hasn’t run in the past 30 days
    * Freshness check returned an error
* ⚪ **Unknown**: Unable to determine the health of the resource; no job runs have processed the resource.

**Sources**

The health state of a source is determined by the following criteria:

* ✅ **Healthy**: All of the following must be true:
  * Freshness check configured
  * Freshness check passed
  * Freshness check ran in the past 30 days
  * Has a description
* 🟡 **Caution**: One of the following must be true:
  * Freshness check returned a warning
  * Freshness check not configured
  * Freshness check not run in the past 30 days
  * Missing a description
* 🔴 **Degraded**: Freshness check returned an error.
* ⚪ **Unknown**: Unable to determine the health of the resource; no job runs have processed the resource.

**Exposures**

The health state of an exposure is determined by the following criteria:

* ✅ **Healthy**: All of the following must be true:
  * Underlying sources are fresh
  * Underlying models built successfully
  * Underlying models’ tests passing
* 🟡 **Caution**: One of the following must be true:
  * At least one underlying source’s freshness checks returned a warning
  * At least one underlying model was skipped
  * At least one underlying model’s tests returned a warning
* 🔴 **Degraded**: One of the following must be true:
  * At least one underlying source’s freshness checks returned an error
  * At least one underlying model did not build successfully
  * At least one model’s tests returned an error

---

### Data health tile

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

With data health tiles, stakeholders will get an at-a-glance confirmation on whether the data they’re looking at is stale or degraded. It allows teams to immediately go back into Catalog to see more details and investigate issues. The data health tile:

* Distills [data health signals](https://docs.getdbt.com/docs/explore/data-health-signals.md) for data consumers.
* Deep links you into Catalog where you can further dive into upstream data issues.
* Provides richer information and makes it easier to debug.
* Revamps the existing, [job-based tiles](#job-based-data-health).

Data health tiles rely on [exposures](https://docs.getdbt.com/docs/build/exposures.md) to surface data health signals in your dashboards. An exposure defines how specific outputs — like dashboards or reports — depend on your data models. Exposures in dbt can be configured in two ways:

* Manual — Defined [manually](https://docs.getdbt.com/docs/build/exposures.md#declaring-an-exposure) and explicitly in your project’s YAML files.
* Automatic — Pulled automatically for supported dbt integrations. dbt automatically [creates and visualizes downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md), removing the need for manual YAML definitions.
These downstream exposures are stored in dbt’s metadata system, appear in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), and behave like manual exposures; however, they don’t exist in YAML files.

[![Example of passing Data health tile in your dashboard.](/img/docs/collaborate/dbt-explorer/data-tile-pass.jpg?v=2 "Example of passing Data health tile in your dashboard.")](#)Example of passing Data health tile in your dashboard.

[![Embed data health tiles in your dashboards to distill data health signals for data consumers.](/img/docs/collaborate/dbt-explorer/data-tiles.png?v=2 "Embed data health tiles in your dashboards to distill data health signals for data consumers.")](#)Embed data health tiles in your dashboards to distill data health signals for data consumers.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You must have a dbt account on an [Enterprise-tier plan](https://www.getdbt.com/pricing/).
* You must be an account admin to set up [service tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md#permissions-for-service-account-tokens).
* You must have [develop permissions](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md).
* You have [exposures](https://docs.getdbt.com/docs/build/exposures.md) defined in your project:
  * If using manual exposures, they must be explicitly defined in your YAML files.
  * If using automatic downstream exposures, ensure your BI tool is [configured](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md) with dbt.
* You have [source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) enabled in the job that generates this exposure.
* The exposure used for the data health tile must have the [`type` property](https://docs.getdbt.com/docs/build/exposures.md#available-properties) set to `dashboard`. Otherwise, you won't be able to view the **Embed data health tile in your dashboard** dropdown in Catalog.
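For reference, a manually declared dashboard exposure might look like the following. The resource names, URL, and owner here are illustrative, not from a real project; the key detail for data health tiles is `type: dashboard`:

```yaml
# models/exposures.yml (illustrative example)
exposures:
  - name: weekly_sales_dashboard        # hypothetical exposure name
    label: Weekly sales dashboard
    type: dashboard                     # required for the data health tile dropdown
    maturity: high
    url: https://bi.example.com/dashboards/42   # placeholder URL
    description: Weekly revenue overview for the sales team.
    depends_on:
      - ref('fct_orders')               # hypothetical upstream model
    owner:
      name: Data Team
      email: data@example.com
```

The `depends_on` entries are what let dbt roll the upstream models' build, test, and source-freshness status into the exposure's health signal.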
#### View exposure in dbt Catalog[​](#view-exposure-in-dbt-catalog "Direct link to View exposure in dbt Catalog")

First, be sure to enable [source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) in the job that generates this exposure.

1. Navigate to Catalog by clicking the **Catalog** link in the navigation.
2. In the main **Overview** page, go to the left navigation.
3. Under the **Resources** tab, click **Exposures** to view the [exposures](https://docs.getdbt.com/docs/build/exposures.md) list.
4. Select a dashboard exposure and go to the **General** tab to view the data health information.
5. In this tab, you’ll see:
   * Name of the exposure.
   * Data health status: Data freshness passed, Data quality passed, Data may be stale, Data quality degraded.
   * Resource type (model, source, and so on).
   * Dashboard status: Failure, Pass, Stale.
   * The last check completed, the last check time, and the last check duration.
6. Click the **Open Dashboard** button on the upper right to immediately view this in your analytics tool.

[![View an exposure in dbt Catalog.](/img/docs/collaborate/dbt-explorer/data-tile-exposures.jpg?v=2 "View an exposure in dbt Catalog.")](#)View an exposure in dbt Catalog.

#### Embed in your dashboard[​](#embed-in-your-dashboard "Direct link to Embed in your dashboard")

Once you’ve navigated to the exposure in Catalog, you’ll need to set up your data health tile and [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md). You can embed the data health tile in any analytics tool that supports URL or iFrame embedding.

Follow these steps to set up your data health tile:

1. Go to **Account settings** in dbt.
2. Select **API tokens** in the left sidebar and then **Service tokens**.
3. Click **Create service token** and give it a name.
4. Select the [**Metadata Only**](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) permission.
This token will be used to embed the tile in your dashboard in the later steps.

[![Set up your dashboard status tile and service token to embed a data health tile](/img/docs/collaborate/dbt-explorer/data-tile-setup.jpg?v=2 "Set up your dashboard status tile and service token to embed a data health tile")](#)Set up your dashboard status tile and service token to embed a data health tile

5. Copy the **Metadata Only** token and save it in a secure location. You'll need this token in the next steps.
6. Navigate back to Catalog and select an exposure.

tip

The exposure used for the data health tile must have the [`type` property](https://docs.getdbt.com/docs/build/exposures.md#available-properties) set to `dashboard`. Otherwise, you won't be able to view the **Embed data health tile in your dashboard** dropdown in Catalog.

7. Below the **Data health** section, expand the toggle for instructions on how to embed the exposure tile (if you're an account admin with develop permissions).
8. In the expanded toggle, you'll see a text field where you can paste your **Metadata Only token**.

[![Expand the toggle to embed data health tile into your dashboard.](/img/docs/collaborate/dbt-explorer/data-tile-example.jpg?v=2 "Expand the toggle to embed data health tile into your dashboard.")](#)Expand the toggle to embed data health tile into your dashboard.

9. Once you’ve pasted your token, you can select either **URL** or **iFrame** depending on which you need to add to your dashboard. If your analytics tool supports iFrames, you can embed the dashboard tile within it.

#### Examples[​](#examples "Direct link to Examples")

The following examples show how to embed the data health tile in Omni, PowerBI, Tableau, and Sigma.
* Omni example
* PowerBI example
* Tableau example
* Sigma example

Follow these steps to embed the data health tile in [Omni](https://omni.co/):

[![Embed data health tile in Omni](/img/docs/collaborate/dbt-explorer/omni-example.png?v=2 "Embed data health tile in Omni")](#)Embed data health tile in Omni

1. Create a dashboard in Omni.
2. Copy the iFrame snippet available in Catalog's **Data health** section, under the **Embed data health into your dashboard** toggle.
3. Add a new Text or Markdown [element](https://docs.omni.co/visualize-present/dashboards/text-markdown) in your dashboard with the code from step 2; it should follow this general format (replace the placeholders with your actual values):

```html
<iframe src="https://metadata.ACCESS_URL/exposure-tile?uniqueId=exposure.EXPOSURE_NAME&environmentType=production&environmentId=ENV_ID_NUMBER&token=METADATA_ONLY_TOKEN" title="Data health tile" frameborder="0"></iframe>
```

4. Save the tile; your Omni dashboard should now have a dbt platform hosted data health tile that is automatically updated based on the state of your dbt environment.

You can embed the data health tile iFrame in PowerBI using PowerBI Pro Online, Fabric PowerBI, or PowerBI Desktop.

[![Embed data health tile iFrame in PowerBI](/img/docs/collaborate/dbt-explorer/power-bi.png?v=2 "Embed data health tile iFrame in PowerBI")](#)Embed data health tile iFrame in PowerBI

Follow these steps to embed the data health tile in PowerBI:

1. Create a dashboard in PowerBI and connect to your database to pull in the data.
2. Create a new PowerBI measure by right-clicking on your **Data**, **More options**, and then **New measure**.

[![Create a new PowerBI measure.](/img/docs/collaborate/dbt-explorer/power-bi-measure.png?v=2 "Create a new PowerBI measure.")](#)Create a new PowerBI measure.

3. Navigate to Catalog, select the exposure, and expand the [**Embed data health into your dashboard**](https://docs.getdbt.com/docs/explore/data-tile.md#embed-in-your-dashboard) toggle.
4. Go to the **iFrame** tab and copy the iFrame code. Make sure the Metadata Only token is already set up.
5. In PowerBI, paste the iFrame code you copied into your measure calculation window.
The iFrame code should follow this general format (replace the placeholders with your actual values):

```html
<iframe src="https://metadata.ACCESS_URL/exposure-tile?uniqueId=exposure.EXPOSURE_NAME&environmentType=production&environmentId=ENV_ID_NUMBER&token=METADATA_ONLY_TOKEN" title="Data health tile" frameborder="0"></iframe>
```

[![In the 'Measure tools' tab, replace your values with the iFrame code.](/img/docs/collaborate/dbt-explorer/power-bi-measure-tools.png?v=2 "In the 'Measure tools' tab, replace your values with the iFrame code.")](#)In the 'Measure tools' tab, replace your values with the iFrame code.

6. PowerBI Desktop doesn't support HTML rendering by default, so you need to install an HTML component from the PowerBI Visuals Store.
7. To do this, go to **Build visuals** and then **Get more visuals**.
8. Log in with your PowerBI account.
9. There are several third-party HTML visuals. The one tested for this guide is [HTML content](https://appsource.microsoft.com/en-us/product/power-bi-visuals/WA200001930?tab=Overview). Install it, but keep in mind it's a third-party plugin not created or supported by dbt Labs.
10. Drag the metric with the iFrame code into the HTML content widget in PowerBI. This should now display your data health tile.

[![Drag the metric with the iFrame code into the HTML content widget in PowerBI. This should now display your data health tile.](/img/docs/collaborate/dbt-explorer/power-bi-final.png?v=2 "Drag the metric with the iFrame code into the HTML content widget in PowerBI. This should now display your data health tile.")](#)Drag the metric with the iFrame code into the HTML content widget in PowerBI. This should now display your data health tile.

*Refer to [this tutorial](https://www.youtube.com/watch?v=SUm9Hnq8Th8) for additional information on embedding a website into your Power BI report.*

Follow these steps to embed the data health tile in Tableau:

[![Embed data health tile iFrame in Tableau](/img/docs/collaborate/dbt-explorer/tableau-example.png?v=2 "Embed data health tile iFrame in Tableau")](#)Embed data health tile iFrame in Tableau

1. Create a dashboard in Tableau and connect to your database to pull in the data. 2.
Ensure you've copied the URL or iFrame snippet available in Catalog's **Data health** section, under the **Embed data health into your dashboard** toggle.
3. Insert a **Web Page** object.
4. Insert the URL and click **Ok**.

```text
https://metadata.ACCESS_URL/exposure-tile?uniqueId=exposure.EXPOSURE_NAME&environmentType=production&environmentId=ENV_ID_NUMBER&token=
```

*Note, replace the placeholders with your actual values.*

5. You should now see the data health tile embedded in your Tableau dashboard.

Follow these steps to embed the data health tile in Sigma:

[![Embed data health tile in Sigma](/img/docs/collaborate/dbt-explorer/sigma-example.jpg?v=2 "Embed data health tile in Sigma")](#)Embed data health tile in Sigma

1. Create a dashboard in Sigma and connect to your database to pull in the data.
2. Ensure you've copied the URL or iFrame snippet available in Catalog's **Data health** section, under the **Embed data health into your dashboard** toggle.
3. Add a new embedded UI element in your Sigma Workbook in the following format:

```text
https://metadata.ACCESS_URL/exposure-tile?uniqueId=exposure.EXPOSURE_NAME&environmentType=production&environmentId=ENV_ID_NUMBER&token=
```

*Note, replace the placeholders with your actual values.*

4. You should now see the data health tile embedded in your Sigma dashboard.

#### Job-based data health Legacy[​](#job-based-data-health- "Direct link to job-based-data-health-")

The default experience is the [environment-based data health tile](#view-exposure-in-dbt-catalog) with Catalog. This section is for legacy job-based data health tiles. If you're using the revamped environment-based exposure tile, refer to the previous section.

Expand the following to learn more about the legacy job-based data health tile.

In dbt, the [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) can power dashboard status tiles, which are job-based.
A dashboard status tile is placed on a dashboard (specifically: anywhere you can embed an iFrame) to give insight into the quality and freshness of the data feeding into that dashboard. This is done with dbt [exposures](https://docs.getdbt.com/docs/build/exposures.md).

###### Functionality[​](#functionality "Direct link to Functionality")

The dashboard status tile looks like this:

[![](/img/docs/dbt-cloud/using-dbt-cloud/dashboard-status-tiles/passing-tile.jpeg?v=2)](#)

The data freshness check fails if any sources feeding into the exposure are stale. The data quality check fails if any dbt tests fail. A failure state could look like this:

[![](/img/docs/dbt-cloud/using-dbt-cloud/dashboard-status-tiles/failing-tile.jpeg?v=2)](#)

Clicking into **see details** from the Dashboard Status Tile takes you to a landing page where you can learn more about the specific sources, models, and tests feeding into this exposure.

###### Setup[​](#setup "Direct link to Setup")

First, be sure to enable [source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) in the job that generates this exposure.

In order to set up your dashboard status tile, here is what you need:

1. **Metadata Only token.** You can learn how to set up a Metadata-Only token [here](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md).
2. **Exposure name.** You can learn more about how to set up exposures [here](https://docs.getdbt.com/docs/build/exposures.md).
3. **Job ID.** Remember that you can select your job ID directly from the URL when looking at the relevant job in dbt.

You can insert these three fields into the following iFrame, and then embed it **anywhere that you can embed an iFrame** (replace the placeholders with your values):

```html
<iframe src="https://metadata.YOUR_ACCESS_URL/exposure-tile?name=EXPOSURE_NAME&jobId=JOB_ID&token=METADATA_ONLY_TOKEN" title="Dashboard status tile" frameborder="0"></iframe>
```

Replace `YOUR_ACCESS_URL` with your region and plan's Access URL. dbt is hosted in multiple regions in the world and each region has a different access URL.
Replace `YOUR_ACCESS_URL` with the appropriate [Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan. For example, if your account is hosted in the EMEA region, you would use an iFrame of the same form with the EMEA metadata host (replace the remaining placeholders with your values):

```html
<iframe src="https://metadata.emea.dbt.com/exposure-tile?name=EXPOSURE_NAME&jobId=JOB_ID&token=METADATA_ONLY_TOKEN" title="Dashboard status tile" frameborder="0"></iframe>
```

###### Embedding with BI tools[​](#embedding-with-bi-tools "Direct link to Embedding with BI tools")

The dashboard status tile should work anywhere you can embed an iFrame. But below are some tactical tips on how to integrate with common BI tools.

* Mode
* Looker
* Tableau
* Sigma

###### Mode[​](#mode "Direct link to Mode")

Mode allows you to directly [edit the HTML](https://mode.com/help/articles/report-layout-and-presentation/#html-editor) of any given report, where you can embed the iFrame. Note that Mode has also built its own [integration](https://mode.com/get-dbt/) with the dbt Discovery API!

###### Looker[​](#looker "Direct link to Looker")

Looker does not allow you to directly embed HTML and instead requires creating a [custom visualization](https://docs.looker.com/admin-options/platform/visualizations). One way to do this for admins is to:

* Add a [new visualization](https://fishtown.looker.com/admin/visualizations) on the visualization page for Looker admins. You can use [this URL](https://metadata.cloud.getdbt.com/static/looker-viz.js) to configure a Looker visualization powered by the iFrame. It will look like this:

[![Configure a Looker visualization powered by the iFrame](/img/docs/dbt-cloud/using-dbt-cloud/dashboard-status-tiles/looker-visualization.jpeg?v=2 "Configure a Looker visualization powered by the iFrame")](#)Configure a Looker visualization powered by the iFrame

* Once you have set up your custom visualization, you can use it on any dashboard! You can configure it with the exposure name, job ID, and token relevant to that dashboard.

###### Tableau[​](#tableau "Direct link to Tableau")

Tableau does not require you to embed an iFrame.
You only need to use a Web Page object on your Tableau dashboard and a URL in the following format:

```text
https://metadata.YOUR_ACCESS_URL/exposure-tile?name=&jobId=&token=
```

Replace `YOUR_ACCESS_URL` with the appropriate [Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan; dbt is hosted in multiple regions and each region has a different access URL. For example, if your account is hosted in the North American region, you would use the following code:

```text
https://metadata.cloud.getdbt.com/exposure-tile?name=&jobId=&token=
```

[![Configure Tableau by using a Web page object.](/img/docs/dbt-cloud/using-dbt-cloud/dashboard-status-tiles/tableau-object.png?v=2 "Configure Tableau by using a Web page object.")](#)Configure Tableau by using a Web page object.

###### Sigma[​](#sigma "Direct link to Sigma")

Sigma does not require you to embed an iFrame. Add a new embedded UI element in your Sigma Workbook in the following format:

```text
https://metadata.YOUR_ACCESS_URL/exposure-tile?name=&jobId=&token=
```

Replace `YOUR_ACCESS_URL` with the appropriate [Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan. For example, if your account is hosted in the APAC region, you would use the following code:

```text
https://metadata.au.dbt.com/exposure-tile?name=&jobId=&token=
```

[![Configure Sigma by using an embedded UI element.](/img/docs/dbt-cloud/using-dbt-cloud/dashboard-status-tiles/sigma-embed.gif?v=2 "Configure Sigma by using an embedded UI element.")](#)Configure Sigma by using an embedded UI element.
---

### Databricks and Apache Iceberg

Starting with the dbt-databricks 1.9.0 release, dbt supports materializing Iceberg tables in Unity Catalog using the catalog integration for two Databricks materializations:

* [Table](https://docs.getdbt.com/docs/build/materializations.md#table)
* [Incremental](https://docs.getdbt.com/docs/build/materializations.md#incremental)

#### Databricks Iceberg tables[​](#databricks-iceberg-tables "Direct link to Databricks Iceberg tables")

Databricks is built on [Delta Lake](https://docs.databricks.com/aws/en/delta/) and stores data in the [Delta table](https://docs.databricks.com/aws/en/introduction/delta-comparison#delta-tables-default-data-table-architecture) format. Databricks supports two methods for creating Iceberg tables in its data catalog, [Unity Catalog](https://docs.databricks.com/aws/en/data-governance/unity-catalog/):

* Creating [Unity Catalog managed Iceberg tables](https://docs.databricks.com/aws/en/tables/managed). Databricks Runtime 16.4 LTS and later support this feature.
* Enabling [Iceberg reads](https://docs.databricks.com/aws/en/delta/uniform) on Delta tables. These tables still use the Delta file format, but generate both Delta and Iceberg-compatible metadata. Databricks Runtime 14.3 LTS and later support this feature.

External Iceberg compute engines can read from and write to these Iceberg tables using Unity Catalog's [Iceberg REST API endpoint](https://docs.databricks.com/aws/en/external-access/iceberg). However, Databricks only supports reading from external Iceberg catalogs.
To set up Databricks for reading and querying external tables, configure [Lakehouse Federation](https://docs.databricks.com/aws/en/query-federation/) and establish the catalog as a foreign catalog. Configure this outside of dbt. Once completed, it becomes another database you can query.

dbt does not yet support enabling [Iceberg v3](https://docs.databricks.com/aws/en/iceberg/iceberg-v3) on managed Iceberg tables.

#### Creating Iceberg tables[​](#creating-iceberg-tables "Direct link to Creating Iceberg tables")

To configure dbt models to materialize as Iceberg tables, you can use a catalog integration with `table_format: iceberg` (see [dbt Catalog integration configurations for Databricks](#dbt-catalog-integration-configurations-for-databricks)).

##### External tables[​](#external-tables "Direct link to External tables")

dbt also supports creating externally-managed Iceberg tables using the model configuration [`location_root`](https://docs.getdbt.com/reference/resource-configs/databricks-configs.md#configuring-tables). Databricks' DDL for creating tables requires a fully qualified `location`. dbt defines this parameter on the user's behalf to streamline usage and enforce basic isolation of table data:

* When you set a `location_root` string, dbt generates a `location` string of the form `{{ location_root }}/{{ model_name }}`. If you set the configuration option `include_full_name_in_path` to true, dbt generates a `location` string of the form `{{ location_root }}/{{ database_name }}/{{ schema_name }}/{{ model_name }}`.

##### dbt Catalog integration configurations for Databricks[​](#dbt-catalog-integration-configurations-for-databricks "Direct link to dbt Catalog integration configurations for Databricks")

###### Note[​](#note "Direct link to Note")

On Databricks, if a model has `catalog_name=<>` in its model config, the catalog name becomes the catalog part of the model's FQN.
For example, if the catalog is named `my_database`, a model with `catalog_name='my_database'` is materialized as `my_database..`. #### Configure catalog integration for Iceberg tables[​](#configure-catalog-integration-for-iceberg-tables "Direct link to Configure catalog integration for Iceberg tables") 1. Create a `catalogs.yml` at the top level of your dbt project (at the same level as dbt\_project.yml)

An example of Unity Catalog as the catalog:

```yaml
catalogs:
  - name: unity_catalog
    active_write_integration: unity_catalog_integration
    write_integrations:
      - name: unity_catalog_integration
        table_format: iceberg
        catalog_type: unity
        file_format: delta
        adapter_properties:
          location_root: s3://cloud-storage-uri
```

2. Add the `catalog_name` config parameter in either a config block (inside the `.sql` model file), a properties YAML file (model folder), or your project YAML file (`dbt_project.yml`).
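As a sketch of the project-file option, following dbt's standard `+config` convention in `dbt_project.yml` (the project name `my_project` is illustrative):

```yaml
# dbt_project.yml -- `my_project` is a hypothetical project name
models:
  my_project:
    +catalog_name: unity_catalog
```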

An example of `iceberg_model.sql`:

```sql
{{ config(
    materialized = 'table',
    catalog_name = 'unity_catalog'
) }}

select * from {{ ref('jaffle_shop_customers') }}
```

3. Execute the dbt model with `dbt run -s iceberg_model`.

---

### dbt Agents overview

[Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles)[Enterprise](https://www.getdbt.com/pricing)[Enterprise +](https://www.getdbt.com/pricing)

[dbt Agents](https://www.getdbt.com/product/dbt-agents), available on [dbt Enterprise-tier plans](https://www.getdbt.com/pricing), are a suite of native AI agents that turn structured dbt context into auditable actions. These agents help you build, manage, and consume governed data at scale by bringing intelligence to every step of the analytics development lifecycle.

> **Info:** Some dbt Agents are in beta; others are coming soon. Contact your account manager for early access. See [available agents](#available-agents) to find out what's available.

dbt Agents are built on top of dbt's structured context to provide accurate, auditable, and governed results:

* Semantic Layer — Metrics, dimensions, and business logic
* Metadata — Lineage, tests, documentation, and ownership
* Governance — Access policies, data quality rules, and contracts

With dbt as the standard context layer for agentic analytics, agents produce accurate results rather than hallucinated or inconsistent answers.
[YouTube video player](https://www.youtube.com/embed/VMkRXWkEcKk?si=vPNG0T8w8q3g3ugT)

#### Key benefits

* Faster development — Engineers and analysts ship data products faster with AI assistance.
* Better decisions — Business users get accurate answers grounded in governed data.
* Auditability — Every agent action includes transparent SQL, lineage, and policies.
* Scalability — Routine tasks are automated so teams can focus on high-value work.

#### Available agents

dbt offers several specialized agents, each designed for specific workflows in the analytics lifecycle to help you scale your data teams across the dbt platform. The following agents are available. Contact your account manager for early access to agents that are in beta or coming soon.

###### Analyst agent [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles)

Use Copilot to analyze your data and get contextualized results in real time by asking the [Analyst agent](https://docs.getdbt.com/docs/dbt-ai/analyst-agent.md) natural language questions in [Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md). Chat with your data and get accurate answers powered by the [dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md). That means consistent, explainable results with transparent SQL, lineage, and policies.

The Analyst agent is a beta feature. Enable beta features under **Account settings** > **Personal profile** > **Experimental features**. For more information, see [Preview new dbt platform features](https://docs.getdbt.com/docs/dbt-versions/experimental-features.md).
###### Developer agent [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles)[Enterprise](https://www.getdbt.com/pricing)[Enterprise +](https://www.getdbt.com/pricing)

The Developer agent is the next evolution of Copilot in the Studio IDE, purpose-built to streamline the developer experience. Describe the data product or change you want; the agent writes or refactors models, validates them with the dbt Fusion engine, and runs them against your warehouse with full context. Stay in flow in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md).

The agent always has access to the latest dbt-recommended guidance through [dbt Agent Skills](https://github.com/dbt-labs/dbt-agent-skills) — curated instructions and scripts managed by dbt Labs, available out of the box with no configuration required.

For setup instructions and available use cases, see [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md).

###### Discovery agent [Private beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles)

Find the right, approved dataset fast in Catalog. The Discovery agent surfaces definitions, freshness, tests, owners, and lineage right where you work. To request access to the Discovery agent, contact your account manager.

###### Observability agent (coming soon)

The Observability agent autonomously and continuously monitors pipelines, flags likely root causes in context, and guides fixes, resulting in faster mean time to resolution, higher reliability, and streamlined ticket queues. No more digging through logs.
###### dbt MCP server

Build your own custom agents and copilots with the local or remote dbt MCP server. The [Model Context Protocol (MCP)](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md) makes dbt's structured context available to any AI system.

#### Related docs

* [About dbt AI and intelligence](https://docs.getdbt.com/docs/dbt-ai/about-dbt-ai.md)
* [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md)
* [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md)
* [dbt MCP server](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md)
* [dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md)
* [dbt Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md)

---

### dbt Catalog FAQs

[Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) is dbt's new knowledge base and lineage visualization experience. It offers an interactive, high-level view of your company's entire data estate, where you can dive into the context you need to understand and improve lineage so your teams can trust the data they're using to make decisions.

#### Overview

**How does dbt Catalog help with data quality?**

Catalog makes it easy and intuitive to understand your entire lineage — from data source to the reporting layer — so you can troubleshoot, improve, and optimize your pipelines.
With built-in features like project recommendations and model performance analysis, you can ensure appropriate test and documentation coverage across your estate and quickly spot and remediate slow-running models. With column-level lineage, you can quickly identify the potential downstream impacts of table changes or work backwards to understand the root cause of an incident. Catalog gives teams the insights they need to improve data quality proactively, ensuring pipelines stay performant and data trust remains solid.

**How is dbt Catalog priced?**

Catalog is generally available in all regions and deployment types on all dbt [Enterprise-tier and Starter plans](https://www.getdbt.com/). Certain features within Catalog, such as project recommendations, multi-project lineage, and column-level lineage, are only available on the Enterprise and Enterprise+ plans. Catalog can be accessed by users with developer and read-only seats.

**What happened to dbt Docs?**

Catalog is the default documentation experience for dbt customers. dbt Docs is still available but doesn't offer the same speed, metadata, or visibility as Catalog, and will become a legacy feature.

#### How dbt Catalog works

**Can I use dbt Catalog on-premises or with my self-hosted dbt Core deployment?**

No. Catalog and all of its features are only available as a dbt platform experience. Catalog reflects the metadata from your dbt project(s) and their runs.

**How does dbt Catalog support dbt environments?**

Catalog supports a production or staging [deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md) for each project you want to explore. It defaults to the latest production or staging state of a project. Users can only assign one production and one staging environment per dbt project. Support for development (dbt CLI and Studio IDE) environments is coming soon.

**How do I get started in Catalog? How does it update?**

Select **Catalog** from the dbt top navigation bar. Catalog automatically updates after each dbt run in the given project's environment (production, by default). The dbt commands you run within the environment generate and update the metadata in Catalog, so make sure to run the correct combination of commands within the environment's jobs; for more details, refer to [Generate metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata).

**Is it possible to export dbt lineage to an external system or catalog?**

Yes. The lineage that powers Catalog is also available through the Discovery API.

**How does dbt Catalog integrate with third-party tools to show end-to-end lineage?**

Catalog reflects all the lineage defined within the dbt project. Our vision for Catalog is to incorporate additional metadata from external tools like data loaders (sources) and BI/analytics tools (exposures) integrated with dbt, all seamlessly incorporated into the lineage of the dbt project.

**Why did previously visible data in dbt Catalog disappear?**

Catalog automatically deletes stale metadata after 3 months if no jobs were run to refresh it. To avoid this, schedule jobs with the necessary commands to run more frequently than every 3 months.

#### Key features

**Does dbt Catalog support multi-project discovery (dbt Mesh)?**

Yes. Refer to [Explore multiple projects](https://docs.getdbt.com/docs/explore/explore-multiple-projects.md) to learn more.

**What kind of search capabilities does dbt Catalog support?**

Resource search capabilities include keywords, partial strings (fuzzy search), and set operators like `OR`. Lineage search supports dbt selectors. For details, refer to [Keyword search](https://docs.getdbt.com/docs/explore/explore-projects.md#search-resources).

**Can I view model execution information for a job that is currently being run?**
dbt updates the performance and metrics after a job run. However, **Model performance** charts only display data for *completed* UTC days, so runs from the current UTC day won't appear in the charts until the UTC day rolls over (midnight UTC). For example, if you're in US Pacific time, you won't see the current day's runs reflected until 4:00 PM PT.

**Can I analyze the number of successful model runs within a month?**

A chart of models built by month is available in the dbt dashboard.

**Can model or column descriptions be edited within dbt?**

Yes. Today, you can edit descriptions in the Studio IDE or dbt CLI by changing the YAML files within the dbt project. In the future, Catalog will support more ways of editing descriptions.

**Where do recommendations come from? Can they be customized?**

Recommendations largely mirror the best practice rules from the `dbt_project_evaluator` package. At this time, recommendations can't be customized. In the future, Catalog will likely support recommendation customization capabilities (for example, in project code).

#### Column-level lineage

**What are the best use cases for column-level lineage in dbt Catalog?**

Column-level lineage in Catalog can improve many data development workflows, including:

* **Audit** — Visualize how data moves through and is used in your dbt project
* **Root cause** — Improve time to detect and resolve data quality issues, tracking back to the source
* **Impact analysis** — Trace transformations and usage to avoid introducing issues for consumers
* **Efficiency** — Prune unnecessary columns to reduce costs and data team overhead

**Does column-level lineage remain functional even if column names vary between models?**

Yes. Column-level lineage can handle name changes across instances of the column in the dbt project.

**Can multiple projects leverage the same column definition?**

No. Cross-project lineage supports viewing how a public model is used across projects, but not at the column level.

**Can column descriptions be propagated downstream automatically?**

Yes. A reused column, labeled as passthrough or rename, inherits its description from source and upstream model columns. In other words, source and upstream model columns propagate their descriptions downstream whenever they are not transformed, so you don't need to manually define the description. Refer to [Inherited column descriptions](https://docs.getdbt.com/docs/explore/column-level-lineage.md#inherited-column-descriptions) for more info.

**Is column-level lineage also available in the development tab?**

Not currently, but we plan to incorporate column-level awareness across features in dbt in the future.

#### Availability, access, and permissions

**How can non-developers interact with dbt Catalog?**

Read-only users can consume metadata in Catalog. More bespoke experiences and exploration avenues for analysts and less-technical contributors will be provided in the future.

**Does dbt Catalog require a specific dbt plan?**

Catalog is available on dbt Starter and all Enterprise plans. Certain features within Catalog, like project recommendations, multi-project lineage, and column-level lineage, are only available on the Enterprise and Enterprise+ plans.

**Will dbt Core users be able to leverage any of these new dbt Catalog features?**

No. Catalog is a dbt-only product experience.

**Is it possible to access dbt Catalog using a read-only license?**

Yes, users with read-only access can use Catalog. Specific feature availability within Catalog will depend on your dbt plan.

**Is there an easy way to share useful dbt Catalog content with people outside of dbt?**

The ability to embed and share views is being evaluated as a potential future capability.
**Is dbt Catalog accessible from other areas inside dbt?**

Yes, you can [access Catalog from various dbt features](https://docs.getdbt.com/docs/explore/access-from-dbt-cloud.md), ensuring a seamless experience navigating between resources and lineage in your project. While the primary way to access Catalog is through the **Catalog** link in the navigation, you can also access it from the [Studio IDE](https://docs.getdbt.com/docs/explore/access-from-dbt-cloud.md#dbt-cloud-ide), the [lineage tab in jobs](https://docs.getdbt.com/docs/explore/access-from-dbt-cloud.md#lineage-tab-in-jobs), and the [model timing tab in jobs](https://docs.getdbt.com/docs/explore/access-from-dbt-cloud.md#model-timing-tab-in-jobs).

---

### dbt platform compatible track - changelog

Select the **Compatible** and **Extended** release tracks if you need a less-frequent release cadence, the ability to test new dbt releases before they go live in production, and/or ongoing compatibility with the latest open source releases of dbt Core. Each monthly **Compatible** release includes functionality matching up-to-date open source versions of dbt Core and adapters at the time of release. For more information, see [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).

#### March 2026

The compatible release scheduled for March 2026 will be skipped in order to further stabilize the minor upgrade of `dbt-core==1.11.6` across the dbt platform. Compatible releases will resume in April 2026.
#### February 2026[​](#february-2026 "Direct link to February 2026") Release date: February 27, 2026 ##### dbt cloud-based platform[​](#dbt-cloud-based-platform "Direct link to dbt cloud-based platform") ##### Features[​](#features "Direct link to Features") * Support partial success result status for Advanced CI ##### Dependencies[​](#dependencies "Direct link to Dependencies") * Update dbt-databricks upper bound to 1.12 This compatible release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.11.6 # shared interfaces dbt-adapters==1.22.6 dbt-common==1.37.2 dbt-extractor==0.6.0 dbt-semantic-interfaces==0.9.0 dbt-sl-sdk[sync]==0.13.1 # adapters dbt-athena==1.10.0 dbt-bigquery==1.11.0 dbt-databricks==1.11.5 dbt-fabric==1.9.4 dbt-postgres==1.10.0 dbt-redshift==1.10.1 dbt-snowflake==1.11.2 dbt-spark==1.10.1 dbt-synapse==1.8.4 dbt-teradata==1.10.1 dbt-trino==1.10.1 ``` Changelogs: * [dbt-core 1.11.6](https://github.com/dbt-labs/dbt-core/blob/1.11.latest/CHANGELOG.md#dbt-core-1116---february-17-2026) * [dbt-adapters 1.22.6](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1226---february-17-2026) * [dbt-common 1.37.2](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1372---december-15-2025) * [dbt-athena 1.10.0](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-athena/CHANGELOG.md#dbt-athena-1100---december-22-2025) * [dbt-bigquery 1.11.0](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-bigquery/CHANGELOG.md#dbt-bigquery-1110---december-22-2025) * [dbt-databricks 1.11.5](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-1115-feb-19-2026) * [dbt-fabric 1.9.4](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.4) * [dbt-postgres 1.10.0](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-postgres/CHANGELOG.md#dbt-postgres-1100---december-22-2025) * [dbt-redshift 
1.10.1](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-redshift/CHANGELOG.md#dbt-redshift-1101---february-11-2026) * [dbt-snowflake 1.11.2](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-snowflake/CHANGELOG.md#dbt-snowflake-1112---february-11-2026) * [dbt-spark 1.10.1](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-spark/CHANGELOG.md#dbt-spark-1101---february-11-2026) * [dbt-synapse 1.8.4](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.10.1](https://github.com/Teradata/dbt-teradata/releases/tag/v1.10.1) * [dbt-trino 1.10.1](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-1101---january-16-2026) #### January 2026[​](#january-2026 "Direct link to January 2026") Release date: January 22, 2026 ##### dbt cloud-based platform[​](#dbt-cloud-based-platform-1 "Direct link to dbt cloud-based platform") ##### Under the Hood[​](#under-the-hood "Direct link to Under the Hood") * Add debug log for local md5 hash for fusion conformance * Resolve Click CLI UserWarning regarding --target and --profile usage in Advanced CI This compatible release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.10.19 # shared interfaces dbt-adapters==1.22.5 dbt-common==1.37.2 dbt-extractor==0.6.0 dbt-semantic-interfaces==0.9.0 dbt-sl-sdk[sync]==0.13.1 # adapters dbt-athena==1.10.0 dbt-bigquery==1.11.0 dbt-databricks==1.10.19 dbt-fabric==1.9.4 dbt-postgres==1.10.0 dbt-redshift==1.10.0 dbt-snowflake==1.11.1 dbt-spark==1.9.3 dbt-synapse==1.8.4 dbt-teradata==1.10.1 dbt-trino==1.10.1 ``` Changelogs: * [dbt-core 1.10.19](https://github.com/dbt-labs/dbt-core/blob/1.10.latest/CHANGELOG.md#dbt-core-11019---january-20-2026) * [dbt-adapters 1.22.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1225---january-14-2026) * [dbt-common 
1.37.2](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1372---december-15-2025) * [dbt-athena 1.10.0](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-athena/CHANGELOG.md#dbt-athena-1100---december-22-2025) * [dbt-bigquery 1.11.0](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-bigquery/CHANGELOG.md#dbt-bigquery-1110---december-22-2025) * [dbt-databricks 1.10.19](https://github.com/databricks/dbt-databricks/blob/1.10.latest/CHANGELOG.md#dbt-databricks-11019-jan-21-2026) * [dbt-fabric 1.9.4](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.4) * [dbt-postgres 1.10.0](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-postgres/CHANGELOG.md#dbt-postgres-1100---december-22-2025) * [dbt-redshift 1.10.0](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-redshift/CHANGELOG.md#dbt-redshift-1100---december-22-2025) * [dbt-snowflake 1.11.1](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-snowflake/CHANGELOG.md#dbt-snowflake-1111---january-08-2026) * [dbt-spark 1.9.3](https://github.com/dbt-labs/dbt-adapters/blob/stable/dbt-spark/CHANGELOG.md#dbt-spark-193---july-16-2025) * [dbt-synapse 1.8.4](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.10.1](https://github.com/Teradata/dbt-teradata/releases/tag/v1.10.1) * [dbt-trino 1.10.1](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-1101---january-16-2026) #### December 2025[​](#december-2025 "Direct link to December 2025") Release date: December 9, 2025 ##### dbt cloud-based platform[​](#dbt-cloud-based-platform-2 "Direct link to dbt cloud-based platform") This compatible release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.10.15 # shared interfaces dbt-adapters==1.16.7 dbt-common==1.33.0 dbt-semantic-interfaces==0.9.0 # adapters dbt-athena==1.9.5 dbt-bigquery==1.10.3 dbt-databricks==1.10.15 dbt-extractor==0.6.0 dbt-fabric==1.9.4 
dbt-postgres==1.9.1 dbt-redshift==1.9.5 dbt-sl-sdk[sync]==0.13.0 dbt-snowflake==1.10.3 dbt-spark==1.9.3 dbt-synapse==1.8.4 dbt-teradata==1.10.0 dbt-trino==1.9.3 ``` Changelogs: * [dbt-core 1.10.15](https://github.com/dbt-labs/dbt-core/blob/1.10.latest/CHANGELOG.md#dbt-core-11015---november-12-2025) * [dbt-adapters 1.16.7](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1166---september-03-2025) * [dbt-common 1.33.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1330---october-20-2025) * [dbt-athena 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-athena/CHANGELOG.md#dbt-athena-194---april-28-2025) * [dbt-bigquery 1.10.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-bigquery/CHANGELOG.md#dbt-bigquery-1101---july-29-2025) * [dbt-databricks 1.10.15](https://github.com/databricks/dbt-databricks/blob/1.10.latest/CHANGELOG.md#dbt-databricks-11015-nov-17-2025) * [dbt-fabric 1.9.4](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.4) * [dbt-postgres 1.9.1](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-postgres/CHANGELOG.md#changelog) * [dbt-redshift 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-redshift/CHANGELOG.md#dbt-redshift-195---may-13-2025) * [dbt-snowflake 1.10.3](http://github.com/dbt-labs/dbt-adapters/blob/main/dbt-snowflake/CHANGELOG.md) * [dbt-spark 1.9.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-spark/CHANGELOG.md#dbt-spark-193---july-16-2025) * [dbt-synapse 1.8.4](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.10.0](https://github.com/Teradata/dbt-teradata/releases/tag/v1.10.0) * [dbt-trino 1.9.3](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-193---july-22-2025) #### November 2025[​](#november-2025 "Direct link to November 2025") Release date: November 11, 2025 ##### dbt cloud-based platform[​](#dbt-cloud-based-platform-3 "Direct link to dbt 
cloud-based platform") ##### Under the Hood[​](#under-the-hood-1 "Direct link to Under the Hood") * Record source column schemas when `DBT_RECORDER_MODE` is set * Issue additional column schema retrieval for hardcoded relation references in SQL * Make source schema recording cache thread-safe * Record column schemas for deferred relations and unselected dependencies This compatible release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.10.15 # shared interfaces dbt-adapters==1.16.7 dbt-common==1.33.0 dbt-semantic-interfaces==0.9.0 # adapters dbt-athena==1.9.5 dbt-bigquery==1.10.3 dbt-databricks==1.10.15 dbt-extractor==0.6.0 dbt-fabric==1.9.4 dbt-postgres==1.9.1 dbt-redshift==1.9.5 dbt-sl-sdk[sync]==0.13.0 dbt-snowflake==1.10.3 dbt-spark==1.9.3 dbt-synapse==1.8.4 dbt-teradata==1.10.0 dbt-trino==1.9.3 ``` Changelogs: * [dbt-core 1.10.15](https://github.com/dbt-labs/dbt-core/blob/1.10.latest/CHANGELOG.md#dbt-core-11015---november-12-2025) * [dbt-adapters 1.16.7](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1166---september-03-2025) * [dbt-common 1.33.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1330---october-20-2025) * [dbt-athena 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-athena/CHANGELOG.md#dbt-athena-194---april-28-2025) * [dbt-bigquery 1.10.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-bigquery/CHANGELOG.md#dbt-bigquery-1101---july-29-2025) * [dbt-databricks 1.10.15](https://github.com/databricks/dbt-databricks/blob/1.10.latest/CHANGELOG.md#dbt-databricks-11015-nov-17-2025) * [dbt-fabric 1.9.4](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.4) * [dbt-postgres 1.9.1](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-postgres/CHANGELOG.md#changelog) * [dbt-redshift 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-redshift/CHANGELOG.md#dbt-redshift-195---may-13-2025) * [dbt-snowflake 
1.10.3](http://github.com/dbt-labs/dbt-adapters/blob/main/dbt-snowflake/CHANGELOG.md) * [dbt-spark 1.9.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-spark/CHANGELOG.md#dbt-spark-193---july-16-2025) * [dbt-synapse 1.8.4](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.10.0](https://github.com/Teradata/dbt-teradata/releases/tag/v1.10.0) * [dbt-trino 1.9.3](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-193---july-22-2025) #### October 2025[​](#october-2025 "Direct link to October 2025") Release date: October 23, 2025 ##### dbt cloud-based platform[​](#dbt-cloud-based-platform-4 "Direct link to dbt cloud-based platform") ##### Under the Hood[​](#under-the-hood-2 "Direct link to Under the Hood") * Add instrumentation to adapter methods for reliable debugging traces at the adapter boundary This compatible release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.10.13 # shared interfaces dbt-adapters==1.16.7 dbt-common==1.33.0 dbt-semantic-interfaces==0.9.0 # adapters dbt-athena==1.9.5 dbt-bigquery==1.10.2 dbt-databricks==1.10.14 dbt-extractor==0.6.0 dbt-fabric==1.9.4 dbt-postgres==1.9.1 dbt-redshift==1.9.5 dbt-sl-sdk[sync]==0.13.0 dbt-snowflake==1.10.2 dbt-spark==1.9.3 dbt-synapse==1.8.4 dbt-teradata==1.10.0 dbt-trino==1.9.3 ``` Changelogs: * [dbt-core 1.10.13](https://github.com/dbt-labs/dbt-core/blob/1.10.latest/CHANGELOG.md#dbt-core-11013---september-25-2025) * [dbt-adapters 1.16.7](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1166---september-03-2025) * [dbt-common 1.33.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1330---october-20-2025) * [dbt-athena 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-athena/CHANGELOG.md#dbt-athena-194---april-28-2025) * [dbt-bigquery 
1.10.2](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-bigquery/CHANGELOG.md#dbt-bigquery-1101---july-29-2025) * [dbt-databricks 1.10.14](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-11014-october-22-2025) * [dbt-fabric 1.9.4](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.4) * [dbt-postgres 1.9.1](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-postgres/CHANGELOG.md#changelog) * [dbt-redshift 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-redshift/CHANGELOG.md#dbt-redshift-195---may-13-2025) * [dbt-snowflake 1.10.2](http://github.com/dbt-labs/dbt-adapters/blob/main/dbt-snowflake/CHANGELOG.md) * [dbt-spark 1.9.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-spark/CHANGELOG.md#dbt-spark-193---july-16-2025) * [dbt-synapse 1.8.4](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.10.0](https://github.com/Teradata/dbt-teradata/releases/tag/v1.10.0) * [dbt-trino 1.9.3](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-193---july-22-2025) #### September 2025[​](#september-2025 "Direct link to September 2025") Release date: September 10, 2025 This compatible release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.10.11 # shared interfaces dbt-adapters==1.16.6 dbt-common==1.29.0 dbt-semantic-interfaces==0.9.0 # adapters dbt-athena==1.9.5 dbt-bigquery==1.10.2 dbt-databricks==1.10.12 dbt-extractor==0.6.0 dbt-fabric==1.9.4 dbt-postgres==1.9.1 dbt-protos==1.0.348 dbt-redshift==1.9.5 dbt-sl-sdk[sync]==0.13.0 dbt-snowflake==1.10.2 dbt-spark==1.9.3 dbt-synapse==1.8.4 dbt-teradata==1.10.0 dbt-trino==1.9.3 ``` Changelogs: * [dbt-core 1.10.11](https://github.com/dbt-labs/dbt-core/blob/1.10.latest/CHANGELOG.md#dbt-core-11011---september-04-2025) * [dbt-adapters 1.16.6](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1166---september-03-2025) * 
[dbt-common 1.29.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1290---september-04-2025) * [dbt-athena 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-athena/CHANGELOG.md#dbt-athena-194---april-28-2025) * [dbt-bigquery 1.10.2](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-bigquery/CHANGELOG.md#dbt-bigquery-1101---july-29-2025) * [dbt-databricks 1.10.12](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-11012-september-8-2025) * [dbt-fabric 1.9.4](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.4) * [dbt-postgres 1.9.1](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-postgres/CHANGELOG.md#changelog) * [dbt-redshift 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-redshift/CHANGELOG.md#dbt-redshift-195---may-13-2025) * [dbt-snowflake 1.10.2](http://github.com/dbt-labs/dbt-adapters/blob/main/dbt-snowflake/CHANGELOG.md) * [dbt-spark 1.9.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-spark/CHANGELOG.md#dbt-spark-193---july-16-2025) * [dbt-synapse 1.8.4](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.10.0](https://github.com/Teradata/dbt-teradata/releases/tag/v1.10.0) * [dbt-trino 1.9.3](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-193---july-22-2025) #### August 2025[​](#august-2025 "Direct link to August 2025") Release date: August 12, 2025 ##### Notable dbt Core OSS changes[​](#notable-dbt-core-oss-changes "Direct link to Notable dbt Core OSS changes") This compatible upgrade brings in a minor update to `dbt-core`, from `dbt-core==1.9.8` to `dbt-core==1.10.8`. Some noteworthy changes from this minor version include: * Introduction of several new [deprecations](https://docs.getdbt.com/reference/deprecations.md) that warn about project incompatibilities between dbt Core and Fusion engines.
* Support for defining `meta` and `tags` within `config` of columns and exposures, as well as defining `freshness` within `config` of sources. These changes lead to minor manifest.json schema evolutions, which may cause an intermittent increase in false positives during `state:modified` comparisons. ##### dbt cloud-based platform[​](#dbt-cloud-based-platform-5 "Direct link to dbt cloud-based platform") ##### Fixes[​](#fixes "Direct link to Fixes") * Update generate publications script to add project and env id in generated publication file * Use JSON stream for publication artifact generation script * Get environment variables correctly from environment for publication artifacts * Adding `--resource-type` and `--exclude-resource-type` flags to Semantic Layer commands * Azure DevOps Private Packages are now properly matched with Private Package Definition in packages.yml ##### Under the Hood[​](#under-the-hood-3 "Direct link to Under the Hood") * Prepare support for Private Package's URLs with multiple levels * Disable telemetry client logger * Update semantic layer SDK to 0.11 This release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.10.8 # shared interfaces dbt-adapters==1.16.3 dbt-common==1.27.1 dbt-semantic-interfaces==0.9.0 dbt-extractor==0.6.0 dbt-protos==1.0.348 # dbt-adapters dbt-athena==1.9.4 dbt-bigquery==1.10.1 dbt-databricks==1.10.10 dbt-fabric==1.9.4 dbt-postgres==1.9.0 dbt-redshift==1.9.5 dbt-snowflake==1.10.0 dbt-spark==1.9.3 dbt-synapse==1.8.2 dbt-teradata==1.9.3 dbt-trino==1.9.3 ``` Changelogs: * [dbt-core 1.10.8](https://github.com/dbt-labs/dbt-core/blob/1.10.latest/CHANGELOG.md#dbt-core-1108---august-12-2025) * [dbt-adapters 1.16.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1163---july-21-2025) * [dbt-common 1.27.1](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1271---july-21-2025) * [dbt-athena
1.9.4](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-athena/CHANGELOG.md#dbt-athena-194---april-28-2025) * [dbt-bigquery 1.10.1](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-bigquery/CHANGELOG.md#dbt-bigquery-1101---july-29-2025) * [dbt-databricks 1.10.10](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-1109-august-7-2025) * [dbt-fabric 1.9.4](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.4) * [dbt-postgres 1.9.0](https://github.com/dbt-labs/dbt-postgres/blob/main/CHANGELOG.md#dbt-postgres-190---december-09-2024) * [dbt-redshift 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-redshift/CHANGELOG.md#dbt-redshift-195---may-13-2025) * [dbt-snowflake 1.10.0](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-snowflake/CHANGELOG.md#dbt-snowflake-1100-rc3---june-24-2025) * [dbt-spark 1.9.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-spark/CHANGELOG.md#dbt-spark-193---july-16-2025) * [dbt-synapse 1.8.2](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.9.3](https://github.com/Teradata/dbt-teradata/releases/tag/v1.9.3) * [dbt-trino 1.9.3](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-193---july-22-2025) #### July 2025[​](#july-2025 "Direct link to July 2025") The compatible release slated for July 2025 will be skipped to further stabilize the minor upgrade of `dbt-core==1.10.0` ([released June 16, 2025](https://pypi.org/project/dbt-core/1.10.0/)) across the dbt platform. Compatible releases will resume in August 2025.
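The column- and source-level `config` support called out in the August 2025 notes above can be sketched in a properties file. This is an illustrative sketch only; the model, column, and source names are invented:

```yaml
# Illustrative schema.yml sketch (hypothetical names) showing `meta` and `tags`
# under a column's `config`, and `freshness` under a source's `config`,
# per the dbt Core 1.10 changes described above.
models:
  - name: orders
    columns:
      - name: customer_email
        config:
          meta:
            contains_pii: true
          tags: [pii]

sources:
  - name: raw_shop
    config:
      freshness:
        warn_after: {count: 12, period: hour}
    tables:
      - name: orders
```

Because these keys now live under `config`, they participate in the manifest's config schema, which is why `state:modified` comparisons may see intermittent false positives across the upgrade boundary.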
#### June 2025[​](#june-2025 "Direct link to June 2025") Release date: June 12, 2025 This release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.9.8 # shared interfaces dbt-adapters==1.15.3 dbt-common==1.25.0 dbt-semantic-interfaces==0.7.4 # adapters dbt-athena==1.9.4 dbt-bigquery==1.9.1 dbt-databricks==1.9.7 dbt-extractor==0.6.0 dbt-fabric==1.9.4 dbt-postgres==1.9.0 dbt-protos==1.0.317 dbt-redshift==1.9.5 dbt-sl-sdk-internal[sync]==0.7.0 dbt-sl-sdk[sync]==0.7.0 dbt-snowflake==1.9.4 dbt-spark==1.9.2 dbt-synapse==1.8.2 dbt-teradata==1.9.2 dbt-trino==1.9.2 ``` Changelogs: * [dbt-core 1.9.8](https://github.com/dbt-labs/dbt-core/blob/1.9.latest/CHANGELOG.md#dbt-core-198---june-10-2025) * [dbt-adapters 1.15.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1153---may-20-2025) * [dbt-common 1.25.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1250---may-20-2025) * [dbt-athena 1.9.4](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-athena/CHANGELOG.md#dbt-athena-194---april-28-2025) * [dbt-bigquery 1.9.1](https://github.com/dbt-labs/dbt-bigquery/blob/1.9.latest/CHANGELOG.md#dbt-bigquery-191---january-10-2025) * [dbt-databricks 1.9.7](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-197-feb-25-2025) * [dbt-fabric 1.9.4](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.4) * [dbt-postgres 1.9.0](https://github.com/dbt-labs/dbt-postgres/blob/main/CHANGELOG.md#dbt-postgres-190---december-09-2024) * [dbt-redshift 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-redshift/CHANGELOG.md#dbt-redshift-195---may-13-2025) * [dbt-snowflake 1.9.4](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-snowflake/CHANGELOG.md#dbt-snowflake-194---may-02-2025) * [dbt-spark 1.9.2](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-spark/CHANGELOG.md#dbt-spark-192---march-07-2025) * [dbt-synapse 
1.8.2](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.9.2](https://github.com/Teradata/dbt-teradata/releases/tag/v1.9.2) * [dbt-trino 1.9.2](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-192---june-03-2025) #### May 2025[​](#may-2025 "Direct link to May 2025") Release date: May 19, 2025 ##### dbt cloud-based platform[​](#dbt-cloud-based-platform-6 "Direct link to dbt cloud-based platform") These changes reflect capabilities that are only available in the dbt platform. ##### Fixes[​](#fixes-1 "Direct link to Fixes") * Get environment variables correctly from the environment for publication artifacts ##### Under the hood[​](#under-the-hood-4 "Direct link to Under the hood") * Create JSON schemas for PublicationArtifact and ResolvedProjectsArtifact This release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.9.4 # shared interfaces dbt-adapters==1.14.8 dbt-common==1.24.0 dbt-semantic-interfaces==0.7.4 # adapters dbt-athena==1.9.4 dbt-bigquery==1.9.1 dbt-databricks==1.9.7 dbt-fabric==1.9.4 dbt-postgres==1.9.0 dbt-redshift==1.9.5 dbt-snowflake==1.9.4 dbt-spark==1.9.2 dbt-synapse==1.8.2 dbt-teradata==1.9.2 dbt-trino==1.9.1 ``` Changelogs: * [dbt-core 1.9.4](https://github.com/dbt-labs/dbt-core/blob/1.9.latest/CHANGELOG.md#dbt-core-194---april-02-2025) * [dbt-adapters 1.14.8](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1148---april-25-2025) * [dbt-common 1.24.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1240---may-09-2025) * [dbt-athena 1.9.4](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-athena/CHANGELOG.md#dbt-athena-194---april-28-2025) * [dbt-bigquery 1.9.1](https://github.com/dbt-labs/dbt-bigquery/blob/1.9.latest/CHANGELOG.md#dbt-bigquery-191---january-10-2025) * [dbt-databricks
1.9.7](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-197-feb-25-2025) * [dbt-fabric 1.9.4](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.4) * [dbt-postgres 1.9.0](https://github.com/dbt-labs/dbt-postgres/blob/main/CHANGELOG.md#dbt-postgres-190---december-09-2024) * [dbt-redshift 1.9.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-redshift/CHANGELOG.md#dbt-redshift-195---may-13-2025) * [dbt-snowflake 1.9.4](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-snowflake/CHANGELOG.md#dbt-snowflake-194---may-02-2025) * [dbt-spark 1.9.2](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-spark/CHANGELOG.md#dbt-spark-192---march-07-2025) * [dbt-synapse 1.8.2](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.9.2](https://github.com/Teradata/dbt-teradata/releases/tag/v1.9.2) * [dbt-trino 1.9.1](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-191---march-26-2025) #### April 2025[​](#april-2025 "Direct link to April 2025") Release date: April 9, 2025 ##### dbt Cloud[​](#dbt-cloud "Direct link to dbt Cloud") These changes reflect capabilities that are only available in dbt Cloud. 
##### Under the Hood[​](#under-the-hood-5 "Direct link to Under the Hood") * Add secondary profiles to profile.py This release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.9.4 # shared interfaces dbt-adapters==1.14.5 dbt-common==1.17.0 dbt-semantic-interfaces==0.7.4 # adapters dbt-athena==1.9.3 dbt-bigquery==1.9.1 dbt-databricks==1.9.7 dbt-fabric==1.9.4 dbt-postgres==1.9.0 dbt-redshift==1.9.3 dbt-snowflake==1.9.2 dbt-spark==1.9.2 dbt-synapse==1.8.2 dbt-teradata==1.9.2 dbt-trino==1.9.1 ``` Changelogs: * [dbt-core 1.9.4](https://github.com/dbt-labs/dbt-core/blob/1.9.latest/CHANGELOG.md#dbt-core-194---april-02-2025) * [dbt-adapters 1.14.5](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1145---april-07-2025) * [dbt-common 1.17.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1170---march-31-2025) * [dbt-athena 1.9.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-athena/CHANGELOG.md#dbt-athena-193---april-07-2025) * [dbt-bigquery 1.9.1](https://github.com/dbt-labs/dbt-bigquery/blob/1.9.latest/CHANGELOG.md#dbt-bigquery-191---january-10-2025) * [dbt-databricks 1.9.7](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-197-feb-25-2025) * [dbt-fabric 1.9.4](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.4) * [dbt-postgres 1.9.0](https://github.com/dbt-labs/dbt-postgres/blob/main/CHANGELOG.md#dbt-postgres-190---december-09-2024) * [dbt-redshift 1.9.3](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-redshift/CHANGELOG.md#dbt-redshift-193---april-01-2025) * [dbt-snowflake 1.9.2](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-snowflake/CHANGELOG.md#dbt-snowflake-192---march-07-2025) * [dbt-spark 1.9.2](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-spark/CHANGELOG.md#dbt-spark-192---march-07-2025) * [dbt-synapse 1.8.2](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * 
[dbt-teradata 1.9.2](https://github.com/Teradata/dbt-teradata/releases/tag/v1.9.2) * [dbt-trino 1.9.1](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-191---march-26-2025) #### March 2025[​](#march-2025 "Direct link to March 2025") Release date: March 11, 2025 This release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.9.3 # shared interfaces dbt-adapters==1.14.1 dbt-common==1.15.0 dbt-semantic-interfaces==0.7.4 # adapters dbt-athena==1.9.2 dbt-bigquery==1.9.1 dbt-databricks==1.9.7 dbt-fabric==1.9.2 dbt-postgres==1.9.0 dbt-redshift==1.9.1 dbt-snowflake==1.9.2 dbt-spark==1.9.2 dbt-synapse==1.8.2 dbt-teradata==1.9.1 dbt-trino==1.9.0 ``` Changelogs: * [dbt Core 1.9.3](https://github.com/dbt-labs/dbt-core/blob/1.9.latest/CHANGELOG.md#dbt-core-193---march-07-2025) * [dbt-adapters 1.14.1](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1141---march-04-2025) * [dbt-common 1.15.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md#dbt-common-1150---february-14-2025) * [dbt-bigquery 1.9.1](https://github.com/dbt-labs/dbt-bigquery/blob/1.9.latest/CHANGELOG.md#dbt-bigquery-191---january-10-2025) * [dbt-databricks 1.9.7](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-197-feb-25-2025) * [dbt-fabric 1.9.2](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.2) * [dbt-postgres 1.9.0](https://github.com/dbt-labs/dbt-postgres/blob/main/CHANGELOG.md#dbt-postgres-190---december-09-2024) * [dbt-redshift 1.9.1](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-redshift/CHANGELOG.md#dbt-redshift-191---march-07-2025) * [dbt-snowflake 1.9.2](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-snowflake/CHANGELOG.md#dbt-snowflake-192---march-07-2025) * [dbt-spark 1.9.2](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-spark/CHANGELOG.md#dbt-spark-192---march-07-2025) * [dbt-synapse 
1.8.2](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.9.1](https://github.com/Teradata/dbt-teradata/releases/tag/v1.9.1) * [dbt-trino 1.9.0](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-190---december-20-2024) #### February 2025[​](#february-2025 "Direct link to February 2025") Release date: February 12, 2025 ##### dbt Cloud[​](#dbt-cloud-1 "Direct link to dbt Cloud") These changes reflect capabilities that are only available in dbt. ##### Features[​](#features-1 "Direct link to Features") * Add [`event_time`](https://docs.getdbt.com/reference/resource-configs/event-time.md) to cross-project ref artifact. * Include debug exception message in ObservabilityMetric. ##### Fixes[​](#fixes-2 "Direct link to Fixes") * Adding support for deferral against the new time spine definition. * Fix error messages for SL query. * Semantic Layer commands now respect `--favor-state` when running with `--defer`. This release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.9.2 # shared interfaces dbt-adapters==1.14.0 dbt-common==1.14.0 dbt-semantic-interfaces==0.7.4 # adapters dbt-athena==1.9.1 dbt-bigquery==1.9.1 dbt-databricks==1.9.4 dbt-fabric==1.9.0 dbt-postgres==1.9.0 dbt-redshift==1.9.0 dbt-snowflake==1.9.1 dbt-spark==1.9.1 dbt-synapse==1.8.2 dbt-teradata==1.9.1 dbt-trino==1.9.0 ``` Changelogs: * [dbt Core 1.9.2](https://github.com/dbt-labs/dbt-core/blob/1.9.latest/CHANGELOG.md#dbt-core-192---january-29-2025) * [dbt-adapters 1.14.0](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1140---february-07-2025) * [dbt-common 1.14.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md) * [dbt-bigquery 1.9.1](https://github.com/dbt-labs/dbt-bigquery/blob/1.9.latest/CHANGELOG.md#dbt-bigquery-191---january-10-2025) * [dbt-databricks 
1.9.4](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-194-jan-30-2024) * [dbt-fabric 1.9.0](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.0) * [dbt-postgres 1.9.0](https://github.com/dbt-labs/dbt-postgres/blob/main/CHANGELOG.md#dbt-postgres-190---december-09-2024) * [dbt-redshift 1.9.0](https://github.com/dbt-labs/dbt-redshift/blob/1.9.latest/CHANGELOG.md#dbt-redshift-190---december-09-2024) * [dbt-snowflake 1.9.1](https://github.com/dbt-labs/dbt-snowflake/blob/1.9.latest/CHANGELOG.md#dbt-snowflake-191---february-07-2025) * [dbt-spark 1.9.1](https://github.com/dbt-labs/dbt-spark/blob/1.9.latest/CHANGELOG.md#dbt-spark-191---february-07-2025) * [dbt-synapse 1.8.2](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.9.1](https://github.com/Teradata/dbt-teradata/releases/tag/v1.9.1) * [dbt-trino 1.9.0](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-190---december-20-2024) #### January 2025[​](#january-2025 "Direct link to January 2025") Release date: January 14, 2025 ##### dbt Cloud[​](#dbt-cloud-2 "Direct link to dbt Cloud") These changes reflect capabilities that are only available in dbt Cloud. ##### Features[​](#features-2 "Direct link to Features") * Filter out external exposures in dbt compare. ##### Fixes[​](#fixes-3 "Direct link to Fixes") * Use `meta.dbt_cloud_id` to build the `unique_id` for a manually defined exposure when merging against a duplicated exposure.
This release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.9.1 # shared interfaces dbt-adapters==1.13.1 dbt-common==1.14.0 dbt-semantic-interfaces==0.7.4 # adapters dbt-athena==1.9.0 dbt-bigquery==1.9.1 dbt-databricks==1.9.1 dbt-fabric==1.9.0 dbt-postgres==1.9.0 dbt-redshift==1.9.0 dbt-snowflake==1.9.0 dbt-spark==1.9.0 dbt-synapse==1.8.2 dbt-teradata==1.9.0 dbt-trino==1.9.0 ``` Changelogs: * [dbt Core 1.9.1](https://github.com/dbt-labs/dbt-core/blob/1.9.latest/CHANGELOG.md#dbt-core-191---december-16-2024) * [dbt-adapters 1.13.1](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1131---january-10-2025) * [dbt-common 1.14.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md) * [dbt-bigquery 1.9.1](https://github.com/dbt-labs/dbt-bigquery/blob/1.9.latest/CHANGELOG.md#dbt-bigquery-191---january-10-2025) * [dbt-databricks 1.9.1](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-191-december-16-2024) * [dbt-fabric 1.9.0](https://github.com/microsoft/dbt-fabric/releases/tag/v1.9.0) * [dbt-postgres 1.9.0](https://github.com/dbt-labs/dbt-postgres/blob/main/CHANGELOG.md#dbt-postgres-190---december-09-2024) * [dbt-redshift 1.9.0](https://github.com/dbt-labs/dbt-redshift/blob/1.9.latest/CHANGELOG.md#dbt-redshift-190---december-09-2024) * [dbt-snowflake 1.9.0](https://github.com/dbt-labs/dbt-snowflake/blob/1.9.latest/CHANGELOG.md#dbt-snowflake-190---december-09-2024) * [dbt-spark 1.9.0](https://github.com/dbt-labs/dbt-spark/blob/1.9.latest/CHANGELOG.md#dbt-spark-190---december-10-2024) * [dbt-synapse 1.8.2](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.9.0](https://github.com/Teradata/dbt-teradata/releases/tag/v1.9.0) * [dbt-trino 1.9.0](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-190---december-20-2024) #### December 2024[​](#december-2024 "Direct link to December 2024") 
Release date: December 12, 2024 This release includes functionality from the following versions of dbt Core OSS: ```text dbt-core==1.9.0 # shared interfaces dbt-adapters==1.10.4 dbt-common==1.14.0 dbt-semantic-interfaces==0.7.4 # adapters dbt-athena==1.9.0 dbt-bigquery==1.9.0 dbt-databricks==1.9.0 dbt-fabric==1.8.8 dbt-postgres==1.9.0 dbt-redshift==1.9.0 dbt-snowflake==1.9.0 dbt-spark==1.9.0 dbt-synapse==1.8.2 dbt-teradata==1.8.2 dbt-trino==1.8.5 ``` Changelogs: * [dbt Core 1.9.0](https://github.com/dbt-labs/dbt-core/blob/1.9.latest/CHANGELOG.md#dbt-core-190---december-09-2024) * [dbt-adapters 1.10.4](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt-adapters/CHANGELOG.md#dbt-adapters-1104---november-11-2024) * [dbt-common 1.14.0](https://github.com/dbt-labs/dbt-common/blob/main/CHANGELOG.md) * [dbt-bigquery 1.9.0](https://github.com/dbt-labs/dbt-bigquery/blob/1.9.latest/CHANGELOG.md#dbt-bigquery-190---december-09-2024) * [dbt-databricks 1.9.0](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md#dbt-databricks-190-december-9-2024) * [dbt-fabric 1.8.8](https://github.com/microsoft/dbt-fabric/blob/v1.8.latest/CHANGELOG.md) * [dbt-postgres 1.9.0](https://github.com/dbt-labs/dbt-postgres/blob/main/CHANGELOG.md#dbt-postgres-190---december-09-2024) * [dbt-redshift 1.9.0](https://github.com/dbt-labs/dbt-redshift/blob/1.9.latest/CHANGELOG.md#dbt-redshift-190---december-09-2024) * [dbt-snowflake 1.9.0](https://github.com/dbt-labs/dbt-snowflake/blob/1.9.latest/CHANGELOG.md#dbt-snowflake-190---december-09-2024) * [dbt-spark 1.9.0](https://github.com/dbt-labs/dbt-spark/blob/1.9.latest/CHANGELOG.md#dbt-spark-190---december-10-2024) * [dbt-synapse 1.8.2](https://github.com/microsoft/dbt-synapse/blob/v1.8.latest/CHANGELOG.md) * [dbt-teradata 1.8.2](https://github.com/Teradata/dbt-teradata/releases/tag/v1.8.2) * [dbt-trino 1.8.5](https://github.com/starburstdata/dbt-trino/blob/master/CHANGELOG.md#dbt-trino-185---december-11-2024)
--- ### dbt release notes dbt release notes for recent and historical changes. Release notes fall into one of the following categories: * **New:** New products and features * **Enhancement:** Performance improvements and feature enhancements * **Fix:** Bug and security fixes * **Behavior change:** A change to existing behavior that doesn't fit into the other categories, such as feature deprecations or changes to default settings Release notes are grouped by month for both multi-tenant and virtual private cloud (VPC) environments. #### March 2026[​](#march-2026 "Direct link to March 2026") * **Enhancement:** [Deferral](https://docs.getdbt.com/reference/node-selection/defer.md) now supports [user-defined functions (UDFs)](https://docs.getdbt.com/docs/build/udfs.md). When you run a dbt command with `--defer` and `--state`, dbt resolves `function()` calls from the state manifest. This lets you run models that depend on UDFs without first building those UDFs in your current target. * **Fix**: Status messages that exceed the 1024 character limit are now automatically truncated to prevent validation errors and run timeouts. Previously, long status messages could cause runs to fail with unhandled exceptions or result in lost status information. The system now logs when truncation occurs to help identify and optimize verbose status messages. * **Fix:** Resolved an issue where [retrying failed runs](https://docs.getdbt.com/docs/deploy/retry-jobs.md) that were triggered from Git tags would use the wrong commit.
Previously, when runs were triggered from Git tags instead of branches, the system would enter a detached HEAD state, causing retries to use the latest commit on HEAD rather than the original tagged commit. The fix now correctly preserves and uses the original Git tag reference when retrying runs, ensuring consistency between the initial run and any retries. * **New**: The [dbt MCP server](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md?version=2.0#product-docs) now includes product docs tools (`search_product_docs` and `get_product_doc_pages`) that let your AI assistant search and fetch pages from docs.getdbt.com in real time. Get responses grounded in the latest official dbt documentation rather than relying on training data or web searches, so you can stay in your development flow and trust the answers. These tools are enabled by default with no additional configuration. Restart your MCP server if you don't see the product docs tools in your MCP config. For more information, see [the dbt MCP repo](https://github.com/dbt-labs/dbt-mcp?tab=readme-ov-file#product-docs). * **Enhancement**: The Model Timing tab displays an informative banner for dbt Fusion engine runs instead of the timing chart. The banner explains "Model timing is not yet available for Fusion runs" and provides context about threading differences. Non-Fusion runs continue to show the timing chart normally. * **Behavior change**: [Snowflake plans to increase](https://docs.snowflake.com/en/release-notes/bcr-bundles/un-bundled/bcr-2118) the default column size for string and binary data types in May 2026. `dbt-snowflake` versions below v1.10.6 may fail to build certain incremental models when this change is deployed. [Assess impact and take any required actions](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#assess-impact-and-required-actions).
* **New**: The new Semantic Layer YAML specification is now available on the dbt platform **Latest** release track. For an overview of the changes and steps for migrating to the latest YAML spec, see [Migrate to the latest YAML spec](https://docs.getdbt.com/docs/build/latest-metrics-spec.md). #### February 2026[​](#february-2026 "Direct link to February 2026") * **New**: Advanced CI (dbt compare in orchestration) is now supported in the dbt Fusion engine. For more information, see [Advanced CI](https://docs.getdbt.com/docs/deploy/advanced-ci.md). * **Beta**: The `dbt-salesforce` adapter available in the dbt Fusion engine CLI is now in beta. For more information, see [Salesforce Data 360 setup](https://docs.getdbt.com/docs/fusion/connect-data-platform-fusion/salesforce-data-cloud-setup.md). * **Enhancement:** The Analyst permission now has project-level access to read repositories. See [Project access for project permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#project-access-for-project-permissions) for more information. * **Enhancement:** After a user accepts an email [invite](https://docs.getdbt.com/docs/cloud/manage-access/invite-users.md) to access an [SSO-protected](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md) dbt platform account, the UI now prompts them to log in with SSO to complete the process. This replaces the previous "Joined successfully" message, helping avoid confusion when users accept an invite but do not complete the SSO login flow. * **New:** [Profiles](https://docs.getdbt.com/docs/cloud/about-profiles.md) let you define and manage connections, credentials, and attributes for deployment environments at the project level. dbt automatically creates profiles for existing projects and environments based on the current configurations, so you don't need to take any action. This is being rolled out in phases during the coming weeks.
* **New**: [Python UDFs](https://docs.getdbt.com/docs/build/udfs.md) are now supported and available in the dbt Fusion engine when using Snowflake or BigQuery. * **Enhancement:** Minor enhancements and UI updates to the Studio IDE file explorer, replicating the VS Code IDE experience. * **Enhancement:** Profile creation now displays specific validation error messages (such as "Profile keys cannot contain spaces or special characters") instead of generic error text, making it easier to identify and fix configuration issues. * **Private beta**: [Cost Insights](https://docs.getdbt.com/docs/explore/cost-insights.md) shows estimated warehouse compute costs and run times for your dbt projects and models, directly in the dbt platform. It highlights cost reductions and efficiency gains from optimizations like [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) across your project dashboard, model pages, and job details. See [Set up Cost Insights](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md) and [Explore cost data](https://docs.getdbt.com/docs/explore/explore-cost-data.md) to learn more. * **New**: The [dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) now supports [Omni](https://docs.omni.co/integrations/dbt/semantic-layer) as a partner integration. For more info, see [Available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md). * **Enhancement**: We clarified documentation for cumulative log size limits on run endpoints, originally introduced in [October 2025](https://docs.getdbt.com/docs/dbt-versions/2025-release-notes.md#october-2025). When logs exceed the cumulative size limit, dbt omits them and displays a banner. No functional changes were made in February 2026. For more information, see [Run visibility](https://docs.getdbt.com/docs/deploy/run-visibility.md#log-size-limits).
* **New**: The `immutable_where` configuration is now supported for Snowflake dynamic tables. For more information, see [Snowflake configurations](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#immutable-where). * **Fix**: User invite details now show more information about invite status, giving admins visibility into users who accepted an invite to an SSO-protected account but haven't yet logged in via SSO. Previously, these invites were hidden, making it appear as if the user hadn't been invited. The Invites endpoints of the dbt platform Admin v2 API now include these additional statuses: * `4` (PENDING\_EMAIL\_VERIFICATION) * `5` (EMAIL\_VERIFIED\_SSO). * **Enhancement**: Improved performance of the Runs endpoint in the Admin v2 API and of run details in the dbt platform when connecting with GCP. #### January 2026[​](#january-2026 "Direct link to January 2026") * **Enhancement:** The `defer-env-id` setting for choosing which deployment environment to defer to is [now available](https://docs.getdbt.com/docs/cloud/about-cloud-develop-defer.md#defer-environment) in the Studio IDE. Previously, this configuration only worked for the dbt CLI. * **Beta:** The [Analyst agent](https://docs.getdbt.com/docs/explore/navigate-dbt-insights.md#dbt-copilot) in dbt Insights is now in beta. * dbt Copilot's AI assistant in Insights now uses a dropdown menu to select between **Agent** and **Generate SQL**, replacing the previous tab interface. * **Enhancement:** The [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md#search-your-project) now includes search and replace functionality and a command palette, enabling you to quickly find and replace text across your project, navigate files, jump to symbols, and run IDE configuration commands. This feature is being rolled out in phases and will become available to all dbt platform accounts by mid-February.
* **Enhancement:** [State-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) improvements:
  * When a model fails a data test, state-aware orchestration rebuilds it on subsequent runs instead of reusing it from prior state, ensuring dbt reevaluates data quality issues.
  * State-aware orchestration now detects and rebuilds models whose tables are deleted from the warehouse, even when there are no code or data changes. Previously, tables deleted externally were not detected, and therefore not rebuilt, unless code or data had changed. For more information, see [Handling deleted tables](https://docs.getdbt.com/docs/deploy/state-aware-about.md#handling-deleted-tables).

  State-aware orchestration is in private preview. See the [prerequisites for using the feature](https://docs.getdbt.com/docs/deploy/state-aware-setup.md#prerequisites).
* **Enhancement:** [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) correctly detects column names across various `schema.yml` files, adds only missing descriptions, and preserves existing ones.
* **Enhancement**: The Fusion CLI now automatically reads environment variables from a `.env` file in your current working directory (the folder you `cd` into and run dbt commands from in your terminal), if one exists. This provides a simple way to manage credentials and configuration without hardcoding them in your `profiles.yml`. The [dbt VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md) also supports `.env` files, along with LSP-powered features. For more information, refer to [Install Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started#environment-variables).
* **New**: The new Semantic Layer YAML specification creates an open standard for defining metrics and dimensions that works across multiple platforms. The new spec is now live in the dbt Fusion engine. Key changes:
  * Semantic models are now embedded within model YAML entries. This removes the need to manage YAML entries across multiple files.
  * Measures are now simple metrics.
  * Frequently used options are now top-level keys, reducing YAML nesting depth.

  For an overview of the changes and the steps to migrate to the latest YAML spec, see [Migrate to the latest YAML spec](https://docs.getdbt.com/docs/build/latest-metrics-spec.md).
* **Fix:** Debug logs in the **Run summary** tab are now properly truncated to improve performance and user interface responsiveness. Previously, debug logs were not truncated correctly, causing slower page loads. You can access the full debug logs by clicking **Download > Download all debug logs**. For more information, see [Run visibility](https://docs.getdbt.com/docs/deploy/run-visibility.md#run-summary-tab).
* **New:** [Semantic Layer querying](https://docs.getdbt.com/docs/explore/navigate-dbt-insights.md#semantic-layer-querying) within dbt Insights is now generally available (GA), enabling you to build queries against the Semantic Layer without writing SQL code.
* **Enhancement**: Eligible dbt platform accounts in the Fusion private preview can now use [Exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md).

---

### dbt Semantic Layer architecture

The Semantic Layer allows you to define metrics and use various interfaces to query them. The Semantic Layer does the heavy lifting to find where the queried data exists in your data platform and generates the SQL to make the request (including performing joins).
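For example, once metrics are defined, a consumer can ask for a metric sliced by a dimension and let the Semantic Layer generate the SQL. The following is a sketch using the documented MetricFlow commands in the dbt Cloud CLI; the metric name (`revenue`) and dimension names are hypothetical:

```shell
# Query the `revenue` metric by month; the Semantic Layer resolves which
# tables hold the underlying data and generates the SQL, including any
# joins needed to reach the `customer__region` dimension.
dbt sl query --metrics revenue --group-by metric_time__month,customer__region

# Add --compile to inspect the generated SQL without running it
# against the warehouse.
dbt sl query --metrics revenue --group-by metric_time__month --compile
```

The caller never specifies tables or join conditions; those are derived from the entities declared in the semantic models.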
![This diagram shows how the dbt Semantic Layer works with your data stack.](/img/docs/dbt-cloud/semantic-layer/sl-concept.png)

![The diagram displays how your data flows using the dbt Semantic Layer and the variety of integration tools it supports.](/img/docs/dbt-cloud/semantic-layer/sl-architecture.jpg)

#### Components

The Semantic Layer includes the following components:

| Components | Information | dbt Core users | Developer plans | Starter plans | Enterprise-tier plans | License |
| --- | --- | --- | --- | --- | --- | --- |
| **[MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md)** | MetricFlow in dbt allows users to centrally define their semantic models and metrics with YAML specifications. | ✅ | ✅ | ✅ | ✅ | [Apache 2.0 license](https://github.com/dbt-labs/metricflow/blob/main/LICENSE) |
| **dbt Semantic interfaces** | A configuration spec for defining metrics, dimensions, and how they link to each other. [dbt-semantic-interfaces](https://github.com/dbt-labs/dbt-semantic-interfaces) is available under Apache 2.0. | ✅ | ✅ | ✅ | ✅ | [Apache 2.0 license](https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/LICENSE) |
| **Service layer** | Coordinates query requests and dispatches the relevant metric query to the target query engine. This is provided through dbt and is available to all users on dbt version 1.6 or later. The service layer includes a Gateway service for executing SQL against the data platform. | ❌ | ❌ | ✅ | ✅ | Proprietary, Cloud (Starter, Enterprise, Enterprise+) |
| **[Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md)** | Interfaces that allow users to submit metric queries using GraphQL and JDBC APIs. They also serve as the foundation for building first-class integrations with various tools. | ❌ | ❌ | ✅ | ✅ | Proprietary, Cloud (Starter, Enterprise, Enterprise+) |

#### Feature comparison

The following table compares the features available in the dbt platform with those in source-available MetricFlow:

| Feature | MetricFlow (source available) | Semantic Layer with dbt |
| --- | --- | --- |
| Define metrics and semantic models in dbt using the MetricFlow spec | ✅ | ✅ |
| Generate SQL from a set of config files | ✅ | ✅ |
| Query metrics and dimensions through the command line interface (CLI) | ✅ | ✅ |
| Query dimension, entity, and metric metadata through the CLI | ✅ | ✅ |
| Query metrics and dimensions through semantic APIs (ADBC, GQL) | ❌ | ✅ |
| Connect to downstream integrations (Tableau, Hex, Mode, Google Sheets, and so on) | ❌ | ✅ |
| Create and run exports to save metrics queries as tables in your data platform | ❌ | ✅ |

#### Related docs

* [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md)

---

### dbt Semantic Layer FAQs

The [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) is a dbt offering that allows users to centrally define their metrics within their dbt project using [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md). The Semantic Layer offers:

* Dynamic SQL generation to compute metrics
* APIs to query metrics and dimensions
* First-class [integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) to query those centralized metrics in downstream tools

The Semantic Layer is powered by MetricFlow, which is a source-available component.

#### Overview of the dbt Semantic Layer

**What are the main benefits of using the dbt Semantic Layer?**

The primary value of the dbt Semantic Layer is to centralize and bring consistency to your metrics across your organization. Additionally, it allows you to:

* **Meet your users where they are** by being agnostic to where your end users consume data, through support for different APIs and integrations.
* **Optimize costs** by spending less time preparing data for consumption.
* **Simplify your code** by not duplicating metric logic and allowing MetricFlow to perform complex calculations for you.
* **Empower stakeholders** with rich context and flexible, yet governed experiences.
![This diagram shows how the dbt Semantic Layer works with your data stack.](/img/docs/dbt-cloud/semantic-layer/sl-concept.png)

**What's the main difference between the dbt Semantic Layer and dbt Metrics?**

dbt Metrics is the now-deprecated dbt package that was used to define metrics within dbt. dbt Metrics has been replaced with [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md), a more flexible and powerful engine, which powers the foundation of the dbt Semantic Layer today. MetricFlow introduces SQL generation to the dbt Semantic Layer and offers more advanced capabilities than dbt Metrics, for example:

* **Query construction** — MetricFlow iteratively constructs queries using a dataflow plan, our internal DAG for generating SQL. By comparison, dbt Metrics relied on templated Jinja to construct SQL.
* **Joins** — MetricFlow also has a sophisticated way of handling joins, which dbt Metrics did not support. With MetricFlow you can effortlessly access all valid dimensions for your metrics on the fly, even when they are defined in different semantic models.

**Is there a dbt Semantic Layer discussion hub?**

Yes, absolutely! Join the [dbt Slack community](https://app.slack.com/client/T0VLPD22H) and the [#dbt-cloud-semantic-layer](https://getdbt.slack.com/archives/C046L0VTVR6) Slack channel for all things related to the dbt Semantic Layer.

**How does the dbt Semantic Layer fit with different modeling approaches (Medallion, Data Vault, dimensional modeling)?**

The dbt Semantic Layer is flexible enough to work with many common modeling approaches. It references dbt models, which means how you configure your Semantic Layer will mirror the modeling approach you've taken with the underlying data. The primary consideration is the flexibility and performance of the underlying queries. For example:

* A star schema data model offers more flexibility in the dimensions available for a given metric, but requires more joins.
* A fully denormalized data model is simpler and will be materialized to a specific grain, but won't be able to join to other tables.

While the dbt Semantic Layer will work for both cases, it's best to let MetricFlow handle some level of denormalization for you in order to provide more flexibility to metric consumers.

**How is the dbt Semantic Layer priced?**

The dbt Semantic Layer measures usage in distinct 'Queried Metrics'. Refer to the [Billing](https://docs.getdbt.com/docs/cloud/billing.md#what-counts-as-a-queried-metric) page to learn more about pricing.

#### Availability

**What data platforms are supported by the dbt Semantic Layer?**

The dbt Semantic Layer supports the following data platforms:

* Snowflake
* BigQuery
* Databricks
* Redshift
* Postgres
* Trino

Support for other data platforms, such as Fabric, isn't available at this time. If you're interested in using the dbt Semantic Layer with a data platform not on the list, please [contact us](https://www.getdbt.com/get-started).

**Do I need to be on a specific version of dbt to use the dbt Semantic Layer?**

Yes, the dbt Semantic Layer is compatible with [dbt v1.6 or higher](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md).

**Does the dbt Semantic Layer require a specific dbt plan?**

Yes, dbt [Starter, Enterprise, or Enterprise+](https://www.getdbt.com/pricing) plan customers can access the dbt Semantic Layer. Certain features, like caching and using multiple credentials, are available for Enterprise and Enterprise+ plans.

**Is there a way to leverage dbt Semantic Layer capabilities in dbt Core?**

The dbt Semantic Layer is proprietary to dbt; however, some components of it are open source. dbt Core users can use MetricFlow features, like defining metrics in their projects, without a dbt plan. dbt Core users can also query their semantic layer locally using the command line. However, they won't be able to use the [APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) or [available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) to access metrics dynamically.

**Is there a solution or licensing path for an organization that doesn't use dbt for pipelining but might like to implement the dbt Semantic Layer?**

If you're interested in this type of implementation, please reach out to us [here](https://www.getdbt.com/get-started).

#### How does the dbt Semantic Layer work?

**Why is the dbt Semantic Layer better than using tables or dbt models to calculate metrics?**

You can use tables and dbt models to calculate metrics, but it's a static approach that is rigid and cumbersome to maintain. That's because metrics are seldom useful on their own: they usually need dimensions, grains, and attributes for business users to analyze (or slice and dice) data effectively. If you create a table with a metric, you'll need to create numerous other tables derived from that table to show the desired metric cut by the desired dimension or time grain. Mature data models have thousands of dimensions, so you can see how this will quickly result in unnecessary duplication, maintenance, and costs. It's also incredibly hard to predict all the slices of data that a user is going to need ahead of time. With the dbt Semantic Layer, you don't need to pre-join or build any tables; rather, you can simply add a few lines of code to your semantic model, and that data will only be computed upon request.
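As an illustrative sketch, the "few lines of code" might look like the following. The model, entity, and metric names here are hypothetical, but the shape follows dbt's documented semantic model spec:

```yaml
semantic_models:
  - name: orders
    model: ref('fct_orders')        # an existing dbt model
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign               # enables joins to other semantic models
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_total
        agg: sum

metrics:
  - name: revenue
    label: Revenue
    type: simple
    type_params:
      measure: order_total
```

Any dimension reachable through the entity graph (for example, a customer attribute from a `customers` semantic model joined via `customer_id`) can then be requested at query time, with no pre-built rollup tables.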
![This diagram shows how the dbt Semantic Layer works with your data stack.](/img/docs/dbt-cloud/semantic-layer/sl-concept.png)

**Do I materialize anything when I define a semantic model?**

No, you don't. When querying the dbt Semantic Layer through the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md), you're not materializing any data by default. The dbt Semantic Layer dynamically computes the metric using the underlying data tables, then returns the output to the end user.

**Is the dbt Semantic Layer a physical copy of your data stored in your data warehouse?**

The dbt Semantic Layer does not store a physical copy of your data. It uses underlying tables to construct or compute the requested output.

**How does the Semantic Layer handle data?**

The dbt Semantic Layer is part of the dbt platform. It allows data teams to define metrics once, centrally, and access them from any integrated analytics tool, ensuring consistent answers across diverse datasets. In providing this service, dbt Labs permits clients to access Semantic Layer metrics. Client data passes through the Semantic Layer on the way back from the data warehouse. dbt Labs handles this in a secure way using encryption and authentication from the client's data warehouse. In certain cases, such data may be cached ephemerally on dbt Labs' systems (data is not persistently stored). dbt Labs employees cannot access cached data during normal business operations and must have a business need and/or direct manager approval for access to the underlying infrastructure. Access would only occur when necessary for providing client services and never for the purpose of enriching dbt Labs. No client warehouse data is retained on dbt Labs' systems.

We offer a caching solution to optimize query performance. The caching feature uses client data warehouse storage rather than dbt Labs' systems, and it is activated only through a client opt-in. Therefore, caching is always in client hands and at client discretion.

**Does our agreement, the Terms of Service (ToS) for dbt, apply to the Semantic Layer?**

Yes, it does.

**Where is MetricFlow hosted? How do queries pass through MetricFlow and dbt and back to the end user?**

MetricFlow is hosted in dbt. Requests from the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) are routed from our API gateway to MetricFlow, which generates the SQL to compute what's requested by the user. MetricFlow hands the SQL back to our gateway, which then executes it against the data platform.

**How do I configure the dbt Semantic Layer?**

1. You define [semantic models](https://docs.getdbt.com/docs/build/semantic-models.md) in YAML files that describe your data, including entities (for joins), measures (with aggregation types as a building block for your metrics), and dimensions (to slice and dice your metrics).
2. Then you build your metrics on top of these semantic models. This is all done in `.yml` configurations alongside your dbt models in your projects.
3. Once you've defined your metrics and semantic models, you can [configure the dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) in dbt.

Read our [dbt Semantic Layer quickstart](https://docs.getdbt.com/guides/sl-snowflake-qs.md) guide for more information.

**How does caching work in the dbt Semantic Layer?**

Beginning in March 2024, the dbt Semantic Layer offers two layers of caching:

* The result cache, which caches query results in the data platform so that subsequent runs of the same query are faster.
* A declarative cache, which also lives in your data platform.

**Does the dbt Semantic Layer expect all models to be in normalized format?**

No, the dbt Semantic Layer is flexible enough to work with many data modeling approaches, including snowflake schemas, star schemas, data vaults, or other normalized tables.

**How are queries optimized to not scan more data than they should?**

MetricFlow always tries to generate SQL in the most performant way, while ensuring the metric value is correct. It generates SQL in a way that allows us to add optimizations, like predicate pushdown, to ensure we don't perform full table scans.

**What are the latency considerations of using the dbt Semantic Layer?**

The latency of query runtimes is low, on the order of milliseconds.

**What if different teams have different definitions?**

If the underlying metric aggregation is different, then these would be different metrics. However, if teams have different definitions because they're using specific filters or dimensions, it's still the same metric; they're just using it in different ways. This can be managed by adjusting how the metric is viewed in downstream tools or by setting up [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) to handle the various permutations of it.

#### Build metrics and semantic models

**Can I define my own aggregations?**

MetricFlow does not currently support custom aggregations on measures. You can find the supported aggregation types [here](https://docs.getdbt.com/docs/build/measures.md#aggregation).

**How are joins identified in the semantic model?**

[Joins](https://docs.getdbt.com/docs/build/join-logic.md) are identified through [entities](https://docs.getdbt.com/docs/build/entities.md) defined in a [semantic model](https://docs.getdbt.com/docs/build/semantic-models.md). These are the keys in your dataset. You can specify `foreign`, `unique`, `primary`, or `natural` joins. With multiple semantic models and the entities within them, MetricFlow creates a graph using the semantic models as nodes and the join paths as edges to perform joins automatically. MetricFlow chooses the appropriate join type and avoids fan-out or chasm joins with other tables based on the entity types. You can find the supported join types [here](https://docs.getdbt.com/docs/build/join-logic.md#types-of-joins).

**What is the benefit of `expr` used in semantic models and metric configurations?**

`expr` (short for "expression") allows you to put any arbitrary SQL supported by your data platform in the definition of a measure, entity, or dimension. This is useful if you want the object name in the semantic model to be different from what it's called in the database, or if you want to include logic in the definition of the component you're creating. The MetricFlow spec is deliberately opinionated, and we offer `expr` as an escape hatch to allow developers to be more expressive.

**Do you support semi-additive metrics?**

Yes, we approach this by specifying a [dimension](https://docs.getdbt.com/docs/build/dimensions.md) that a metric cannot be aggregated across (such as `time`). You can learn how to configure semi-additive dimensions [here](https://docs.getdbt.com/docs/build/measures.md#non-additive-dimensions).

**Can I use an entity as a dimension?**

Yes, while [entities](https://docs.getdbt.com/docs/build/entities.md) must be defined under `entities`, they can be queried like dimensions in downstream tools. Additionally, if the entity isn't used to perform joins across your semantic models, you may optionally define it as a dimension.

**Can I test my semantic models and metrics?**

Yes! You can validate your semantic nodes (semantic models, metrics, saved queries) in a few ways:

* [Query and validate your metrics](https://docs.getdbt.com/docs/build/metricflow-commands.md) in your development tool before submitting your code changes.
* [Validate semantic nodes in CI](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci) to ensure code changes made to dbt models don't break these metrics.

#### Available integrations

**What integrations are supported today?**

There are a number of data applications that have integrations with the dbt Semantic Layer, including Tableau, Google Sheets, Hex, and Mode, among others. Refer to [Available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) for more information.

**How can I benefit from using the dbt Semantic Layer if my visualization tool is not currently supported?**

You can use [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) to materialize your metrics into a table or view in your data platform. From there, you can connect your visualization tool to your data platform. Although this approach doesn't provide the dynamic benefits of the dbt Semantic Layer, you still benefit from centralized metrics and from using MetricFlow configurations to define, generate, and compute SQL for your metrics.

**Why should I use exports as opposed to defining a view within my data platform?**

Creating an [export](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) allows you to bring your governed metric definitions into your data platform as a table or view. This means your metric logic is managed centrally in dbt, instead of as a view in your data platform, and it ensures that metric values remain consistent across all interfaces.

**Can metric descriptions be viewed from third-party tools?**

Yes, all of our interfaces and APIs expose metric descriptions, which you can surface in downstream tools.

#### Permissions and access

**How do fine-grained access controls work with the dbt Semantic Layer?**

The dbt Semantic Layer uses service or personal tokens for authentication. [Service tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) are mapped to underlying data platform credentials. These credentials control physical access to the raw data. The credential configuration allows admins to create a credential and map it to service tokens, which can then be shared with the relevant teams for BI connection setup. You can configure credentials and service tokens to reflect your teams and their roles.

Personal access tokens ([PATs](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md)) enable user-level authentication. When you use PATs to authenticate, your personal development credentials are used when running queries against the Semantic Layer.

Currently, the credentials you configure when setting up the dbt Semantic Layer are used for every request. Any physical access policies you have tied to your credentials will be respected.

#### Implementation

**How can I implement dbt Mesh with the dbt Semantic Layer?**

When using the dbt Semantic Layer in a [dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) setting, we recommend the following:

* Have one standalone project that contains your semantic models and metrics.
* Then, as you build your Semantic Layer, you can [cross-reference dbt models](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md) across your various projects or packages to create your semantic models using the [two-argument `ref` function](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) (`ref('project_name', 'model_name')`).
* Your dbt Semantic Layer project serves as a global source of truth across the rest of your projects.

###### Usage example

For example, let's say you have a public model (`fct_orders`) that lives in the `jaffle_finance` project.
As you build your semantic model, reference the model with the two-argument `ref` function: in the `model` parameter, use `ref('jaffle_finance', 'fct_orders')` to reference the public model `fct_orders` defined in the `jaffle_finance` project.
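A minimal sketch of such a semantic model, assuming the documented two-argument `ref` syntax; fields other than `model` are illustrative:

```yaml
semantic_models:
  - name: orders
    # Two-argument ref: project (or package) name first, then the model name.
    model: ref('jaffle_finance', 'fct_orders')
    entities:
      - name: order_id
        type: primary
    measures:
      - name: order_count
        agg: count
        expr: order_id
```

Aside from the second `ref` argument, the semantic model is configured exactly as it would be for a model in the same project.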
**Which 'staging layer' should the dbt Semantic Layer talk to? Raw, staging, or marts?**

We recommend building your semantic layer on top of the [marts layer](https://docs.getdbt.com/best-practices/how-we-structure/4-marts.md), which represents the clean and transformed data from your dbt models.

**Should semantic layer credentials mirror those for production environments? Or should they be different?**

Semantic layer credentials are different from the credentials you use to run dbt models. Specifically, we recommend a less privileged set of credentials, since consumers are only reading data.

**How does the dbt Semantic Layer support a dbt Mesh architecture design?**

Currently, semantic models can be created from dbt models that live across projects ([dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md)). In the future, users will also be able to use mesh concepts on semantic objects and define metrics across dbt projects.

---

### dbt Semantic Layer

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

The dbt Semantic Layer eliminates duplicate coding by allowing data teams to define metrics on top of existing models and automatically handling data joins. The dbt Semantic Layer, powered by [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md), simplifies the process of defining and using critical business metrics, like `revenue`, in the modeling layer (your dbt project). By centralizing metric definitions, data teams can ensure consistent self-service access to these metrics in downstream data tools and applications. Moving metric definitions out of the BI layer and into the modeling layer allows data teams to feel confident that different business units are working from the same metric definitions, regardless of their tool of choice. If a metric definition changes in dbt, it's refreshed everywhere it's invoked, creating consistency across all applications.

To ensure secure access control, the Semantic Layer implements robust [access permissions](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#set-up-dbt-semantic-layer) mechanisms. Refer to the [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md) or the [Why we need a universal semantic layer](https://www.getdbt.com/blog/universal-semantic-layer/) blog post to learn more.

[YouTube video player](https://www.youtube.com/embed/DS7Ub_CmBR0?si=m92hLmxw1VuE6KKO)

#### Get started with the dbt Semantic Layer

To define and query metrics with the dbt Semantic Layer, you must be on a [dbt Starter or Enterprise-tier](https://www.getdbt.com/pricing/) account. The Semantic Layer is suitable for both multi-tenant and single-tenant accounts. Note: Single-tenant accounts should contact their account representative for necessary setup and enablement.

This page points to the resources available to help you understand, configure, deploy, and integrate the Semantic Layer. The following sections link to pages that explain each aspect in detail, whether you're setting up the Semantic Layer for the first time, deploying metrics, or integrating with downstream tools.

Refer to the following resources to get started with the Semantic Layer:

* [Quickstart with the Semantic Layer](https://docs.getdbt.com/guides/sl-snowflake-qs.md) — Build and define metrics, set up the Semantic Layer, and query them using our first-class integrations.
* [Build your metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md) — Use MetricFlow in dbt to centrally define your metrics.
* [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md) — Discover answers to frequently asked questions about the Semantic Layer, such as availability, integrations, and more.

#### Configure the dbt Semantic Layer

The following resources provide information on how to configure the Semantic Layer:

* [Administer the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) — Seamlessly set up the credentials and tokens to start querying the Semantic Layer.
* [Architecture](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-architecture.md) — Explore the components that make up the Semantic Layer.

#### Deploy metrics

This section provides information on how to deploy the Semantic Layer and materialize your metrics:

* [Deploy your Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/deploy-sl.md) — Run a dbt job to deploy the Semantic Layer and materialize your metrics.
* [Write queries with exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) — Use exports to write commonly used queries directly within your data platform, on a schedule.
* [Cache common queries](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md) — Leverage result caching and declarative caching for common queries to speed up performance and reduce query computation.

#### Consume metrics and integrate

Consume metrics and integrate the Semantic Layer with downstream tools and applications:

* [Consume metrics](https://docs.getdbt.com/docs/use-dbt-semantic-layer/consume-metrics.md) — Query and consume metrics in downstream tools and applications using the Semantic Layer.
* [Available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) — Review the wide range of partners you can integrate and query with the Semantic Layer.
* [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) — Use the Semantic Layer APIs to query metrics in downstream tools for consistent, reliable data metrics.

---

### dbt support

Support for dbt is available to all users through the following channels:

* Dedicated dbt Support team (dbt users).
* [The Community Forum](https://discourse.getdbt.com/).
* [dbt Community Slack](https://www.getdbt.com/community/join-the-community/).
#### dbt Core support[​](#dbt-core-support "Direct link to dbt Core support") If you're developing on the command line (CLI) and have questions or need some help — reach out to the helpful dbt community through [the Community Forum](https://discourse.getdbt.com/) or [dbt Community slack](https://www.getdbt.com/community/join-the-community/). #### dbt platform support[​](#dbt-platform-support "Direct link to dbt platform support") The global dbt Support team is available to dbt customers by [email](mailto:support@getdbt.com) or by clicking **Create a support ticket** through the dbt navigation. ##### Create a support ticket[​](#create-a-support-ticket "Direct link to Create a support ticket") To create a support ticket in dbt: 1. In the dbt navigation, click on **Help & Guides**. 2. Click **Create a support ticket**. 3. Fill out the form and click **Create Ticket**. 4. A dbt Support team member will respond to your ticket through email. [![Create a support ticket in dbt](/img/create-support-ticket.gif?v=2 "Create a support ticket in dbt")](#)Create a support ticket in dbt ##### Ask dbt Support Assistant[​](#ask-dbt-support-assistant "Direct link to Ask dbt Support Assistant") dbt Support Assistant is an AI widget that provides instant, AI-generated responses to common questions. This feature is available to dbt users and can help answer troubleshooting questions, give a synopsis of features and functionality, or link to relevant documentation. The dbt Support Assistant AI widget is separate from [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md), a powerful AI engine that helps with code generation to accelerate your analytics workflows. The dbt Support Assistant focuses on answering documentation and troubleshooting-related questions. Enabling or disabling AI features in dbt won't affect the dbt Support Assistant's availability. info We recommend validating information received in AI responses for any scenario using our documentation. 
Please [contact support](mailto:support@getdbt.com) to report incorrect information provided by the Support Assistant.

##### Support plans and resources[​](#support-plans-and-resources "Direct link to Support plans and resources")

We want to help you work through implementing and utilizing the dbt platform at your organization. Have a question you can't find an answer to in [our docs](https://docs.getdbt.com/) or [the Community Forum](https://discourse.getdbt.com/)? Our Support team is here to `dbt help` you!

* **Enterprise and Enterprise+ plans** — Priority [support](#severity-level-for-enterprise-support), optional premium plans, enhanced SLAs, implementation assistance, dedicated management, and dbt Labs security reviews, depending on price point.
* **Developer and Starter plans** — 24x5 support with no service-level agreement (SLA); [contact Sales](https://www.getdbt.com/pricing/) for Enterprise plan inquiries.
* **Support team help** — Assistance with [common dbt questions](https://docs.getdbt.com/category/troubleshooting.md), like setting up a project, resolving login issues, understanding errors, setting up private packages, linking to a new GitHub account, [generating a HAR file](https://docs.getdbt.com/faqs/Troubleshooting/generate-har-file.md), and so on.
* **Resource guide** — Check the [guide](https://docs.getdbt.com/community/resources/getting-help.md) for effective help-seeking strategies.

**Examples of common support questions** — types of dbt platform questions our Support team can assist you with, regardless of your dbt plan:

**How do I...**
* set up a dbt project?
* set up a private package in dbt?
* configure custom branches on git repos?
* link dbt to a new GitHub account?

**Help! I can't...**
* log in.
* access logs.
* update user groups.

**I need help understanding...**
* why this run failed.
* why I am getting this error message in dbt.
* why my CI jobs are not kicking off as expected.
#### dbt Enterprise accounts[​](#dbt-enterprise-accounts "Direct link to dbt Enterprise accounts")

For customers on a dbt Enterprise-tier plan, we **also** offer basic assistance in troubleshooting issues with your dbt project, including errors and issues in macros, models, and dbt Labs' packages. For strategic advice, best practices, or expansion conversations, consult your Account team. For example:

* **Something isn't working the way I would expect it to...**
  * in a macro I created...
  * in an incremental model I'm building...
  * in one of dbt Labs' packages like dbt\_utils or audit\_helper...
* **I need help understanding and troubleshooting this error...**
  * `Server error: Compilation Error in rpc request (from remote system) 'dbt_utils' is undefined`
  * `SQL compilation error: syntax error line 1 at position 38 unexpected ''.`
  * `Compilation Error Error reading name_of_folder/name_of_file.yml - Runtime Error Syntax error near line 9`

Types of questions you should ask your Account team:

* How should we think about setting up our dbt projects, environments, and jobs based on our company structure and needs?
* I want to expand my account! How do I add more people and train them?
* Here is our data roadmap for the next year. Can we talk through how dbt fits into it and what features we may not be utilizing that can help us achieve our goals?
* It is time for our contract renewal. What options do I have?

##### Severity level for Enterprise support[​](#severity-level-for-enterprise-support "Direct link to Severity level for Enterprise support")

Support tickets are assigned a severity level based on the impact of the issue on your business. dbt Labs assigns the severity level, which determines the priority of support you will receive. For specific ticket response times or other questions that relate to your Enterprise or Enterprise+ account’s SLA, please refer to your Enterprise contract.
| Severity Level | Description |
| ---------------- | ----------- |
| Severity Level 1 | Any Error which makes the use or continued use of the Subscription or material features impossible; Subscription is not operational, with no alternative available. |
| Severity Level 2 | Feature failure, without a workaround, but Subscription is operational. |
| Severity Level 3 | Feature failure, but a workaround exists. |
| Severity Level 4 | Error with low-to-no impact on Client’s access to or use of the Subscription, or Client has a general question or feature enhancement request. |

#### Leave feedback[​](#leave-feedback "Direct link to Leave feedback")

Leave feedback or submit a feature request for dbt or dbt Core.

###### Share feedback or feature request for the dbt platform[​](#share-feedback-or-feature-request-for-the-dbt-platform "Direct link to Share feedback or feature request for the dbt platform")

1. In the dbt navigation, click **Leave feedback**.
2. In the **Leave feedback** pop up, fill out the form.
3. Upload any relevant files to the feedback form (optional).
4. Confirm if you'd like dbt Labs to contact you about the feedback (optional).
5. Click **Send Feedback**.

[![Leave feedback in dbt](/img/docs/leave-feedback.gif?v=2 "Leave feedback in dbt")](#)Leave feedback in dbt

###### Share feedback or feature request for dbt Core[​](#share-feedback-or-feature-request-for-dbt-core "Direct link to Share feedback or feature request for dbt Core")

* [Create a GitHub issue here](https://github.com/dbt-labs/dbt-core/issues).

#### External help[​](#external-help "Direct link to External help")
For help writing SQL, reviewing the overall performance of your project, or building your dbt project, refer to the following pages:

* List of [dbt Consulting Partners](https://www.getdbt.com/partner-directory).
* dbt Labs' [Services](https://www.getdbt.com/dbt-labs/services/).

---

### dbt tips and tricks

Use this page for valuable insights and practical advice to enhance your dbt experience. Whether you're new to dbt or an experienced user, these tips are designed to help you work more efficiently and effectively. The tips are organized into the following categories:

* [YAML tips](#yaml-tips) to clarify where you can use Jinja, `vars`, and `env_var` in your YAML files.
* [Package tips](#package-tips) to help you streamline your workflow.
* [Advanced tips and techniques](#advanced-tips-and-techniques) to help you get the most out of dbt.

If you're developing with the Studio IDE, you can refer to the [keyboard shortcuts](https://docs.getdbt.com/docs/cloud/studio-ide/keyboard-shortcuts.md) page to help make development more productive and easier for everyone.

#### YAML tips[​](#yaml-tips "Direct link to YAML tips")

This section clarifies where you can use [Jinja](https://docs.getdbt.com/docs/build/jinja-macros.md) and nest [vars](https://docs.getdbt.com/reference/dbt-jinja-functions/var.md) and [`env_var`](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md) in your YAML files.

* You can use Jinja in almost every YAML file in dbt *except* the [`dependencies.yml` file](https://docs.getdbt.com/docs/build/packages.md#use-cases), which doesn't support Jinja.
* Use `vars` in any YAML file that supports Jinja (like `schema.yml` or `snapshots.yml`).
However, note that:

* In the `dbt_project.yml`, `packages.yml`, and `profiles.yml` files, you must pass `vars` through the CLI using `--vars` rather than defining them inside a `vars:` block in the YAML file. This is because these files are parsed before Jinja is rendered.
* You can use `env_var()` in all YAML files that support Jinja. Only `profiles.yml` and `packages.yml` support environment variables for secure values (using the `DBT_ENV_SECRET_` prefix). These are masked in logs and intended for credentials or secrets.

For additional information, check out [dbt Core's context docs](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/context/README.md).

#### Package tips[​](#package-tips "Direct link to Package tips")

Leverage these dbt packages to streamline your workflow:

| Package | Description |
| --- | --- |
| [`dbt_codegen`](https://hub.getdbt.com/dbt-labs/codegen/latest/) | Use the package to help you generate YML files for your models and sources and SQL files for your staging models. |
| [`dbt_utils`](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) | The package contains macros useful for daily development. For example, `date_spine` generates a table with all dates between the ones provided as parameters. |
| [`dbt_project_evaluator`](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest) | The package compares your dbt project against a list of our best practices and provides suggestions and guidelines on how to update your models. |
| [`dbt_expectations`](https://hub.getdbt.com/metaplane/dbt_expectations/latest/) | The package contains many tests beyond those built into dbt. |
| [`dbt_audit_helper`](https://hub.getdbt.com/#:~:text=adwords-,audit_helper,-codegen) | The package lets you compare the output of two queries. Use it when refactoring existing logic to ensure that the new results are identical. |
| [`dbt_artifacts`](https://hub.getdbt.com/brooklyn-data/dbt_artifacts/latest) | The package saves information about your dbt runs directly to your data platform so that you can track the performance of models over time. |
| [`dbt_meta_testing`](https://hub.getdbt.com/tnightengale/dbt_meta_testing/latest) | This package checks that your dbt project is sufficiently tested and documented. |

#### Advanced tips and techniques[​](#advanced-tips-and-techniques "Direct link to Advanced tips and techniques")

* Use your folder structure as your primary selector method. `dbt build --select marts.marketing` is simpler and more resilient than relying on tagging every model.
* Think about jobs in terms of build cadences and SLAs. Run models that have hourly, daily, or weekly build cadences together.
* Use the [where config](https://docs.getdbt.com/reference/resource-configs/where.md) for tests to test an assertion on a subset of records.
* [store\_failures](https://docs.getdbt.com/reference/resource-configs/store_failures.md) lets you examine records that cause tests to fail, so you can either repair the data or change the test as needed.
* Use [severity](https://docs.getdbt.com/reference/resource-configs/severity.md) thresholds to set an acceptable number of failures for a test.
* Use [incremental\_strategy](https://docs.getdbt.com/docs/build/incremental-strategy.md) in your incremental model config to implement the most effective behavior depending on the volume of your data and the reliability of your unique keys.
* Set `vars` in your `dbt_project.yml` to define global defaults for certain conditions, which you can then override using the `--vars` flag in your commands.
* Use [for loops](https://docs.getdbt.com/guides/using-jinja.md?step=3) in Jinja to DRY up repetitive logic, such as selecting a series of columns that all require the same transformations and naming patterns to be applied.
* Instead of relying on post-hooks, use the [grants config](https://docs.getdbt.com/reference/resource-configs/grants.md) to apply permission grants in the warehouse resiliently.
* Define [source-freshness](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness) thresholds on your sources to avoid running transformations on data that has already been processed.
* Use the `+` operator on the left of a model (`dbt build --select +model_name`) to run a model and all of its upstream dependencies. Use the `+` operator on the right of the model (`dbt build --select model_name+`) to run a model and everything downstream that depends on it.
* Use `dir_name` to run all models in a package or directory.
* Use the `@` operator on the left of a model in a non-state-aware CI setup to test it. This operator runs all of a selection’s parents and children, and also runs the parents of its children, which in a fresh CI schema will likely not exist yet.
* Use the [--exclude flag](https://docs.getdbt.com/reference/node-selection/exclude.md) to remove a subset of models from a selection.
* Use the [--full-refresh](https://docs.getdbt.com/reference/commands/run.md#refresh-incremental-models) flag to rebuild an incremental model from scratch.
* Use [seeds](https://docs.getdbt.com/docs/build/seeds.md) to create manual lookup tables, like zip codes to states or marketing UTMs to campaigns. `dbt seed` will build these from CSVs into your warehouse and make them `ref`able in your models.
* Use [target.name](https://docs.getdbt.com/docs/build/custom-schemas.md#an-alternative-pattern-for-generating-schema-names) to pivot logic based on what environment you’re using.
For example, to build into a single development schema while developing, but use multiple schemas in production.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Quickstart guide](https://docs.getdbt.com/guides.md)
* [About dbt](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md)
* [Develop in the Cloud](https://docs.getdbt.com/docs/cloud/about-develop-dbt.md)

---

### dbt VS Code extension features [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

The dbt VS Code extension is backed by the speed and power of the dbt Fusion engine and a dynamic Language Server Protocol (LSP) that enables enhanced workflows, faster development, and easy navigation. The following extension features help you get more done, fast:

* **[Live error detection](#live-error-detection):** Automatically validate your SQL code to detect errors and surface warnings, without hitting the warehouse. This includes both dbt errors (like an invalid `ref`) and SQL errors (like an invalid column name or SQL syntax).
* **[Lightning-fast parse times](#lightning-fast-parse-times):** Parse even the largest projects up to 30x faster than dbt Core.
* **[Powerful IntelliSense](#powerful-intellisense):** Autocomplete SQL functions, model names, macros, and more.
* **[Instant refactoring](#instant-refactoring):** Rename models or columns and see references update project-wide.
* **[Go-to-definition](#go-to-definition-and-reference):** Jump to the definition of any `ref`, macro, model, or column with a single click.
Particularly useful in large projects with many models and macros. Excludes definitions from installed packages.
* **[Hover insights](#hover-insights):** See context on tables, columns, and functions without leaving your code. Simply hover over any SQL element to see details like column names and data types.
* **[Live CTE previews](#live-preview-for-models-and-ctes):** Preview a CTE’s output directly from inside your dbt model for faster validation and debugging.
* **[Rich lineage in context](#rich-lineage-in-context):** See lineage at the column or table level as you develop, with no context switching or breaking the flow. If you use Cursor, the lineage tab works best in Editor mode and doesn't render in Agent mode; switch to Editor mode to view your project's table and column lineage.
* **[View compiled code](#view-compiled-code):** Get a live view of the SQL code your models will build alongside your dbt code.
* **[Build flexibly](#build-flexibly):** Use the command palette to build models with complex selectors.

##### Live error detection[​](#live-error-detection "Direct link to Live error detection")

Automatically validate your SQL code to detect errors and surface warnings without hitting the warehouse.

* Displays diagnostics (red squiggles) for:
  * Syntax errors (missing commas, misspelled keywords, and so on).
  * Invalid or missing column names (for example, `select not_a_column from {{ ref('real_model') }}`).
  * Missing `group by` clauses, or columns that are neither grouped nor aggregated.
  * Invalid function names or arguments.
* Hover over red squiggles to display errors.
* Full diagnostic information is available in the **Problems** panel.

[](/img/docs/extension/live-error-detection.mp4)

##### Lightning-fast parse times[​](#lightning-fast-parse-times "Direct link to Lightning-fast parse times")

Parse even the largest projects up to 30x faster than with dbt Core.
[](/img/docs/extension/zoomzoom.mp4)

##### Powerful IntelliSense[​](#powerful-intellisense "Direct link to Powerful IntelliSense")

Autocomplete SQL functions, model names, macros, and more.

Usage:

* Autocomplete `ref` and `source` calls. For example, type `{{ ref(` or `{{ source(` and you'll see a list of available resources and their type to complete the function call. Autocomplete doesn’t trigger when replacing existing model names inside parentheses.
* Autocomplete dialect-specific function names.

[![Example of the VS Code extension IntelliSense](/img/docs/extension/vsce-intellisense.gif?v=2 "Example of the VS Code extension IntelliSense")](#)Example of the VS Code extension IntelliSense

##### Instant refactoring[​](#instant-refactoring "Direct link to Instant refactoring")

Renaming models:

* Right-click on a file in the file tree and select **Rename**.
* After renaming the file, you'll get a prompt asking if you want to make refactoring changes.
* Select **OK** to apply the changes, or **Show Preview** to display a preview of the refactorings.
* After applying your changes, `ref`s should be updated to use the new model name.

Renaming columns:

* Right-click on a column alias and select **Rename Symbol**.
* After renaming the column, you'll get a prompt asking if you want to make refactoring changes.
* Select **OK** to apply the changes, or **Show Preview** to show a preview of the refactorings.
* After applying your changes, downstream references to the column should be updated to use the new column name.

Note: Renaming models and columns is not yet supported for snapshots, or any resources defined in a `.yml` file.

[](/img/docs/extension/refactor.mp4)

##### Go-to-definition and reference[​](#go-to-definition-and-reference "Direct link to Go-to-definition and reference")

Jump to the definition of any `ref`, macro, model, or column with a single click. Particularly useful in large projects with many models and macros. Excludes definitions from installed packages.
Usage:

* Command- or Ctrl-click to go to the definition for an identifier.
* You can also right-click an identifier and select **Go to Definition** or **Go to References**.
* Supports CTE names, column names, `*`, macro names, and dbt `ref()` and `source()` calls.

[](/img/docs/extension/go-to-definition.mp4)

##### Hover insights[​](#hover-insights "Direct link to Hover insights")

See context on tables, columns, and functions without leaving your code. Simply hover over any SQL element to see details like column names and data types.

Usage:

* Hover over `*` to see an expanded list of columns and their types.
* Hover over a column name or alias to see its type.

[](/img/docs/extension/hover-insights.mp4)

##### Live preview for models and CTEs[​](#live-preview-for-models-and-ctes "Direct link to Live preview for models and CTEs")

Preview a CTE’s output, or an entire model, directly from inside your editor for faster validation and debugging.

Usage:

* Click the **table icon** or use the keyboard shortcut `cmd+enter` (macOS) / `ctrl+enter` (Windows/Linux) to preview query results.
* Click the **Preview CTE** codelens to preview CTE results.
* Results will be displayed in the **Query Results** tab in the bottom panel.
* The preview table is sortable, and results are stored until the tab is closed.
* You can also select a range of SQL to preview the results of a specific SQL snippet.

[](/img/docs/extension/preview-cte.mp4)

##### Rich lineage in context[​](#rich-lineage-in-context "Direct link to Rich lineage in context")

See lineage at the column or table level as you develop — no context switching or breaking flow.

**Using the lineage tab in Cursor:** If you're using the dbt VS Code extension in Cursor, the lineage tab works best in Editor mode and doesn't render in Agent mode. If you're in Agent mode and the lineage tab isn't rendering, switch to Editor mode to view your project's table and column lineage.

View table lineage:

* Open the **Lineage** tab in your editor.
It will reflect table lineage focused on the currently-open file.
* Double-click nodes to open the files in your editor.
* The lineage pane updates as you navigate the files in your dbt project.
* Right-click on a node to update the DAG, or view column lineage for a node.

View column lineage:

* Right-click on a filename, or in the SQL contents of a model file.
* Select **dbt: View Lineage** --> **Show column lineage**.
* Select the column to view lineage for.
* Double-click on a node to update the DAG selector.
* You can also use column selectors in the lineage window by adding the `column:` prefix and appending the column name. For example, if you want the lineage for the `AMOUNT` column of your `stg_payments` model, edit `+model.jaffle_shop.stg_payments+` to `+column:model.jaffle_shop.stg_payments.AMOUNT+`.

[](/img/docs/extension/lineage.mp4)

##### View compiled code[​](#view-compiled-code "Direct link to View compiled code")

Get a live view of the SQL code your models will build — right alongside your dbt code.

Usage:

* Click the **code icon** to view compiled code side-by-side with source code.
* Compiled code will update as you save your source code.
* Clicking on a dbt macro will focus the corresponding compiled code.
* Clicking on a compiled code block will focus the corresponding source code.

[](/img/docs/extension/compiled-code.mp4)

##### Build flexibly[​](#build-flexibly "Direct link to Build flexibly")

Use the command palette to quickly build models using complex selectors.

Usage:

* Click the **dbt icon** or use the keyboard shortcut `cmd+shift+enter` (macOS) / `ctrl+shift+enter` (Windows/Linux) to launch a quickpick menu.
* Select a command to run.

[](/img/docs/extension/build-flexibly.mp4)

---

### Deploy dbt

Use dbt's capabilities to seamlessly run a dbt job in production or staging environments. Rather than run dbt commands manually from the command line, you can leverage [dbt's in-app scheduling](https://docs.getdbt.com/docs/deploy/job-scheduler.md) to automate how and when you execute dbt. The dbt platform offers the easiest and most reliable way to run your dbt project in production. Effortlessly promote high-quality code from development to production and build fresh data assets that your business intelligence tools and end users query to make business decisions.

Deploying with dbt lets you:

* Keep production data fresh on a timely basis
* Ensure CI and production pipelines are efficient
* Identify the root cause of failures in deployment environments
* Maintain high-quality code and data in production
* Gain visibility into the [health](https://docs.getdbt.com/docs/explore/data-tile.md) of deployment jobs, models, and tests
* Use [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) to write [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) in your data platform for reliable and fast metric reporting
* [Visualize](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md) and [orchestrate](https://docs.getdbt.com/docs/cloud-integrations/orchestrate-exposures.md) downstream exposures to understand how models are used in downstream tools and proactively refresh the underlying data sources during scheduled dbt jobs.
[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") * Use [dbt's Git repository caching](https://docs.getdbt.com/docs/cloud/account-settings.md#git-repository-caching) to protect against third-party outages and improve job run reliability. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") * Use [Hybrid projects](https://docs.getdbt.com/docs/deploy/hybrid-projects.md) to upload dbt artifacts into the dbt platform for central visibility, cross-project referencing, and easier collaboration. [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Preview Before continuing, make sure you understand dbt's approach to [deployment environments](https://docs.getdbt.com/docs/deploy/deploy-environments.md). Learn how to use dbt's features to help your team ship timely and quality production data more easily. #### Deploy with dbt[​](#deploy-with-dbt "Direct link to Deploy with dbt") [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/job-scheduler.md) ###### [Job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md) [The job scheduler is the backbone of running jobs in the dbt platform, bringing power and simplicity to building data pipelines in both continuous integration and production environments.](https://docs.getdbt.com/docs/deploy/job-scheduler.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) ###### [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) [Create and schedule jobs for the job scheduler to run.](https://docs.getdbt.com/docs/deploy/deploy-jobs.md)

[Runs on a schedule, by API, or after another job completes.](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/state-aware-about.md) ###### [State-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) [Intelligently determines which models to build by detecting changes in code or data at each job run.](https://docs.getdbt.com/docs/deploy/state-aware-about.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/continuous-integration.md) ###### [Continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md) [Set up CI checks so you can build and test any modified code in a staging environment when you open PRs and push new commits to your dbt repository.](https://docs.getdbt.com/docs/deploy/continuous-integration.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/continuous-deployment.md) ###### [Continuous deployment](https://docs.getdbt.com/docs/deploy/continuous-deployment.md) [Set up merge jobs to ensure the latest code changes are always in production when pull requests are merged to your Git repository.](https://docs.getdbt.com/docs/deploy/continuous-deployment.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/job-commands.md) ###### [Job commands](https://docs.getdbt.com/docs/deploy/job-commands.md) [Configure which dbt commands to execute when running a dbt job.](https://docs.getdbt.com/docs/deploy/job-commands.md)
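Deploy jobs, as noted above, can run on a schedule or be triggered by API. Here's a minimal sketch of building such an API call in Python. It assumes the v2 "trigger job run" endpoint shape and `Token` auth scheme; the host, account ID, job ID, token, and cause below are placeholders, so verify the details against the dbt API reference for your plan before use:

```python
import json
import urllib.request


def build_job_trigger_request(host, account_id, job_id, token, cause):
    """Build (but don't send) a request to trigger a deploy job run.

    The endpoint path and payload shape are assumptions based on the
    v2 Administrative API -- confirm them in the dbt API reference.
    """
    url = f"https://{host}/api/v2/accounts/{account_id}/jobs/{job_id}/run/"
    body = json.dumps({"cause": cause}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",  # service token (placeholder)
            "Content-Type": "application/json",
        },
    )


# Placeholder values -- nothing is sent until urlopen() is called.
req = build_job_trigger_request("cloud.getdbt.com", 1234, 5678, "abc123", "Triggered by script")
print(req.full_url)  # https://cloud.getdbt.com/api/v2/accounts/1234/jobs/5678/run/
```

Calling `urllib.request.urlopen(req)` would actually send the request; keeping the request-building step separate makes the sketch easy to inspect and test without touching a live account.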
#### Monitor jobs and alerts[​](#monitor-jobs-and-alerts "Direct link to Monitor jobs and alerts") [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/orchestrate-exposures.md) ###### [Visualize and orchestrate exposures](https://docs.getdbt.com/docs/deploy/orchestrate-exposures.md) [Learn how to use dbt to automatically generate downstream exposures from dashboards and proactively refresh the underlying data sources during scheduled dbt jobs.](https://docs.getdbt.com/docs/deploy/orchestrate-exposures.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/artifacts.md) ###### [Artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) [dbt generates and saves artifacts for your project, which it uses to power features like creating docs for your project and reporting the freshness of your sources.](https://docs.getdbt.com/docs/deploy/artifacts.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/job-notifications.md) ###### [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) [Receive email or Slack channel notifications when a job run succeeds, fails, or is canceled so you can respond quickly and begin remediation if necessary.](https://docs.getdbt.com/docs/deploy/job-notifications.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/model-notifications.md) ###### [Model notifications](https://docs.getdbt.com/docs/deploy/model-notifications.md) [Receive email notifications in real time about issues encountered by your models and tests while a job is running.](https://docs.getdbt.com/docs/deploy/model-notifications.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/run-visibility.md) ###### [Run visibility](https://docs.getdbt.com/docs/deploy/run-visibility.md) [View the history of your runs and the model timing dashboard to help identify where improvements can be made to the scheduled jobs.](https://docs.getdbt.com/docs/deploy/run-visibility.md) 
[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/retry-jobs.md) ###### [Retry jobs](https://docs.getdbt.com/docs/deploy/retry-jobs.md) [Rerun your errored jobs from start or the failure point.](https://docs.getdbt.com/docs/deploy/retry-jobs.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/source-freshness.md) ###### [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) [Enable snapshots to capture the freshness of your data sources and configure how frequent these snapshots should be taken. This can help you determine whether your source data freshness is meeting your SLAs.](https://docs.getdbt.com/docs/deploy/source-freshness.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/webhooks.md) ###### [Webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md) [Create outbound webhooks to send events about your dbt jobs' statuses to other systems in your organization.](https://docs.getdbt.com/docs/deploy/webhooks.md)
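The **Source freshness** card above describes capturing snapshots of source data freshness. As a brief, hedged sketch (the source, table, and column names here are illustrative, not from this page), freshness thresholds are declared on sources in YAML:

```yaml
# models/sources.yml (illustrative names): warn if the source data is more than
# 12 hours old, error after 24 hours, using _loaded_at as the timestamp column.
sources:
  - name: raw_jaffle_shop
    database: raw
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
```

Running `dbt source freshness` then warns or errors depending on how stale the `_loaded_at` timestamps are relative to these thresholds.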
#### Hybrid projects [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Preview[​](#hybrid-projects-- "Direct link to hybrid-projects--") [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/hybrid-projects.md) ###### [Hybrid projects](https://docs.getdbt.com/docs/deploy/hybrid-projects.md) [Use Hybrid projects to upload dbt Core artifacts into the dbt platform for central visibility, cross-project referencing, and easier collaboration.](https://docs.getdbt.com/docs/deploy/hybrid-projects.md)
#### Related docs[​](#related-docs "Direct link to Related docs") * [Use exports to materialize saved queries](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) * [Integrate with other orchestration tools](https://docs.getdbt.com/docs/deploy/deployment-tools.md)

---

### Deploy jobs You can use deploy jobs to build production data assets. Deploy jobs make it easy to run dbt commands against a project in your cloud data platform, triggered either on a schedule or by events. Each job run in dbt has an entry in the job's run history and a detailed run overview, which provides you with: * Job trigger type * Commit SHA * Environment name * Sources and documentation info, if applicable * Job run details, including run timing, [model timing data](https://docs.getdbt.com/docs/deploy/run-visibility.md#model-timing), and [artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) * Detailed run steps with logs and their run step statuses You can create a deploy job and configure it to run on [scheduled days and times](#schedule-days), enter a [custom cron schedule](#cron-schedule), or [trigger the job after another job completes](#trigger-on-job-completion). #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * You must have a [dbt account](https://www.getdbt.com/signup/) and [Developer seat license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md). * For the [Trigger on job completion](#trigger-on-job-completion) feature, your dbt account must be on a [Starter or Enterprise-tier](https://www.getdbt.com/pricing/) plan. 
* You must have a dbt project connected to a [data platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md). * You must have [access permission](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) to view, create, modify, or run jobs. * You must set up a [deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md). #### Create and schedule jobs[​](#create-and-schedule-jobs "Direct link to Create and schedule jobs") info dbt uses [Coordinated Universal Time](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) (UTC) for all jobs, including those configured with cron. It does not adjust for your local timezone or daylight saving time. For example: * 0 means 12am (midnight) UTC * 12 means 12pm (afternoon) UTC * 23 means 11pm UTC 1. On your deployment environment page, click **Create job** > **Deploy job** to create a new deploy job. 2. Options in the **Job settings** section: * **Job name** — Specify the name for the deploy job. For example, `Daily build`. * (Optional) **Description** — Provide a description of what the job does (for example, what the job consumes and what the job produces). * **Environment** — By default, it’s set to the deployment environment you created the deploy job from. 3. Options in the **Execution settings** section: * [**Commands**](https://docs.getdbt.com/docs/deploy/job-commands.md#built-in-commands) — By default, it includes the `dbt build` command. Click **Add command** to add more [commands](https://docs.getdbt.com/docs/deploy/job-commands.md) that you want to be invoked when the job runs. During a job run, [built-in commands](https://docs.getdbt.com/docs/deploy/job-commands.md#built-in-commands) are "chained" together and if one run step fails, the entire job fails with an "Error" status. 
* [**Generate docs on run**](https://docs.getdbt.com/docs/deploy/job-commands.md#checkbox-commands) — Enable this option if you want to [generate project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) when this deploy job runs. If the step fails, the job can succeed if subsequent steps pass. * [**Run source freshness**](https://docs.getdbt.com/docs/deploy/job-commands.md#checkbox-commands) — Enable this option to invoke the `dbt source freshness` command before running the deploy job. If the step fails, the job can succeed if subsequent steps pass. Refer to [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) for more details. 4. Options in the **Triggers** section: * **Run on schedule** — Run the deploy job on a set schedule. * **Timing** — Specify whether to [schedule](#schedule-days) the deploy job using **Intervals** that run the job every specified number of hours, **Specific hours** that run the job at specific times of day, or a **Cron schedule** that runs the job at times specified using [cron syntax](#cron-schedule). * **Days of the week** — By default, it’s set to every day when **Intervals** or **Specific hours** is chosen for **Timing**. * **Run when another job finishes** — Run the deploy job when another *upstream* deploy [job completes](#trigger-on-job-completion). * **Project** — Specify the parent project that has that upstream deploy job. * **Job** — Specify the upstream deploy job. * **Completes on** — Select the job run status(es) that will [enqueue](https://docs.getdbt.com/docs/deploy/job-scheduler.md#scheduler-queue) the deploy job. [![Example of Triggers on the Deploy Job page](/img/docs/dbt-cloud/using-dbt-cloud/example-triggers-section.png?v=2 "Example of Triggers on the Deploy Job page")](#)Example of Triggers on the Deploy Job page 5. 
(Optional) Options in the **Advanced settings** section: * **Environment variables** — Define [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) to customize the behavior of your project when the deploy job runs. * **Target name** — Define the [target name](https://docs.getdbt.com/docs/build/custom-target-names.md) to customize the behavior of your project when the deploy job runs. Environment variables and target names are often used interchangeably. * **Run timeout** — Cancel the deploy job if the run time exceeds the timeout value. * **Compare changes against** — By default, it’s set to **No deferral**. Select either **Environment** or **This Job** to let dbt know what it should compare the changes against. info Older versions of dbt only allow you to defer to a specific job instead of an environment. Deferral to a job compares state against the project code that was run in the deferred job's last successful run. Deferral to an environment is more efficient: dbt compares against the project representation (stored in `manifest.json`) from the last successful deploy job run in the deferred environment. Because it considers *all* deploy jobs that run in the deferred environment, dbt gets a more accurate, more recent representation of the project's state. * **dbt version** — By default, it’s set to inherit the [dbt version](https://docs.getdbt.com/docs/dbt-versions/core.md) from the environment. dbt Labs strongly recommends that you don't change the default setting. This option to change the version at the job level is useful only when you upgrade a project to the next dbt version; otherwise, mismatched versions between the environment and job can lead to confusing behavior. * **Threads** — By default, it’s set to 4 [threads](https://docs.getdbt.com/docs/local/profiles.yml.md#understanding-threads). Increase the thread count to increase model execution concurrency. 
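To illustrate the **Environment variables** option above, here is a minimal, hypothetical sketch: a `dbt_project.yml` fragment that reads a variable you could set per environment or override on the deploy job (the `DBT_SCHEMA_SUFFIX` variable and `my_project` name are invented for this example; `env_var` falls back to the second argument when the variable is unset):

```yaml
# dbt_project.yml (illustrative): the custom schema for mart models is driven
# by an environment variable, with 'dev' as the fallback default.
models:
  my_project:
    marts:
      +schema: "analytics_{{ env_var('DBT_SCHEMA_SUFFIX', 'dev') }}"
```

Setting `DBT_SCHEMA_SUFFIX=prod` on the production environment and leaving it unset in development would then route the same models to different schemas per environment.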
[![Example of Advanced Settings on the Deploy Job page](/img/docs/dbt-cloud/using-dbt-cloud/deploy-job-adv-settings.png?v=2 "Example of Advanced Settings on the Deploy Job page")](#)Example of Advanced Settings on the Deploy Job page ##### Schedule days[​](#schedule-days "Direct link to Schedule days") To set your job's schedule, use the **Run on schedule** option to choose specific days of the week, and select customized hours or intervals. Under **Timing**, you can either use regular intervals for jobs that need to run frequently throughout the day or customizable hours for jobs that need to run at specific times: * **Intervals** — Use this option to set how often your job runs, in hours. For example, if you choose **Every 2 hours**, the job will run every 2 hours from midnight UTC. This doesn't mean that it will run at exactly midnight UTC. However, subsequent runs will always be run with the same amount of time between them. For example, if the previous scheduled pipeline ran at 00:04 UTC, the next run will be at 02:04 UTC. This option is useful if you need to run jobs multiple times per day at regular intervals. * **Specific hours** — Use this option to set specific times when your job should run. You can enter a comma-separated list of hours (in UTC) when you want the job to run. For example, if you set it to `0,12,23` the job will run at midnight, noon, and 11 PM UTC. Job runs will always be consistent between both hours and days, so if your job runs at 00:05, 12:05, and 23:05 UTC, it will run at these same hours each day. This option is useful if you want your jobs to run at specific times of day and don't need them to run more frequently than once a day. ##### Cron schedule[​](#cron-schedule "Direct link to Cron schedule") To fully customize the scheduling of your job, choose the **Cron schedule** option and use cron syntax. 
With this syntax, you can specify the minute, hour, day of the month, month, and day of the week, allowing you to set up complex schedules like running a job on the first Monday of each month. **Note:** Cron schedules in dbt use UTC and don't convert to your local timezone or adjust for daylight saving time. **Cron frequency** To enhance performance, job scheduling frequencies vary by dbt plan: * Developer plans: dbt enforces a minimum interval of 10 minutes between scheduled job runs. Scheduling jobs to run more frequently than every 10 minutes is not supported. * Starter, Enterprise, and Enterprise+ plans: No restrictions on job execution frequency. **Examples** Use tools such as [crontab.guru](https://crontab.guru/) to generate the correct cron syntax. This tool lets you input cron snippets and see their plain-English translations. The dbt job scheduler supports using `L` to schedule jobs on the last day of the month. Examples of cron job schedules: * `0 * * * *`: Every hour, at minute 0. * `*/5 * * * *`: Every 5 minutes. (Not available on Developer plans) * `5 4 * * *`: At exactly 4:05 AM UTC. * `30 */4 * * *`: At minute 30 past every 4th hour (such as 4:30 AM, 8:30 AM, 12:30 PM, and so on, all UTC). * `0 0 */2 * *`: At 12:00 AM (midnight) UTC every other day. * `0 0 * * 1`: At midnight UTC every Monday. * `0 0 L * *`: At 12:00 AM (midnight), on the last day of the month. * `0 0 L 1,2,3,4,5,6,8,9,10,11,12 *`: At 12:00 AM, on the last day of the month, only in January, February, March, April, May, June, August, September, October, November, and December. * `0 0 L 7 *`: At 12:00 AM, on the last day of the month, only in July. * `0 0 L * FRI,SAT`: At 12:00 AM, on the last day of the month, and on Friday and Saturday. * `0 12 L * *`: At 12:00 PM (afternoon), on the last day of the month. * `0 7 L * 5`: At 07:00 AM, on the last day of the month, and on Friday. * `30 14 L * *`: At 02:30 PM, on the last day of the month. 
* `0 4 * * MON#1`: At 4:00 AM on the first Monday of every month. ##### Trigger on job completion [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#trigger-on-job-completion-- "Direct link to trigger-on-job-completion--") To *chain* deploy jobs together: 1. In the **Triggers** section, enable the **Run when another job finishes** option. 2. Select the project that has the deploy job you want to run after completion. 3. Specify the upstream (parent) job that, when completed, will trigger your job. * You can also use the [Create Job API](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/Create%20Job) to do this. 4. In the **Completes on** option, select the job run status(es) that will [enqueue](https://docs.getdbt.com/docs/deploy/job-scheduler.md#scheduler-queue) the deploy job. [![Example of Trigger on job completion on the Deploy job page](/img/docs/deploy/deploy-job-completion.jpg?v=2 "Example of Trigger on job completion on the Deploy job page")](#)Example of Trigger on job completion on the Deploy job page 5. You can set up a configuration where an upstream job triggers multiple downstream (child) jobs and jobs in other projects. You must have proper [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#project-role-permissions) to the project and job to configure the trigger. If another job triggers your job to run, you can find a link to the upstream job in the [run details section](https://docs.getdbt.com/docs/deploy/run-visibility.md#job-run-details). #### Delete a job[​](#delete-a-job "Direct link to Delete a job") To delete a job or multiple jobs in dbt: 1. Click **Deploy** on the navigation header. 2. Click **Jobs** and select the job you want to delete. 3. 
Click **Settings** on the top right of the page and then click **Edit**. 4. Scroll to the bottom of the page and click **Delete job** to delete the job.
[![Delete a job](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/delete-job.png?v=2 "Delete a job")](#)Delete a job 5. Confirm your action in the pop-up by clicking **Confirm delete** in the bottom right to delete the job immediately. This action cannot be undone. However, you can create a new job with the same information if the deletion was made in error. 6. Refresh the page, and the deleted job should now be gone. If you want to delete multiple jobs, you'll need to perform these steps for each job. If you're having any issues, feel free to [contact us](mailto:support@getdbt.com) for additional help. #### Job monitoring[​](#job-monitoring "Direct link to Job monitoring") On the **Environments** page, there are two sections that provide an overview of the jobs for that environment: * **In progress** — Lists the jobs currently in progress, with information on when each run started * **Top jobs by models built** — Ranks jobs by the number of models built over a specific time period [![In progress jobs and Top jobs by models built](/img/docs/deploy/in-progress-top-jobs.png?v=2 "In progress jobs and Top jobs by models built")](#)In progress jobs and Top jobs by models built #### Job settings history[​](#job-settings-history "Direct link to Job settings history") You can view historical job settings changes over the last 90 days. To view the change history: 1. Navigate to **Orchestration** from the main menu and click **Jobs**. 2. Click a **job name**. 3. Click **Settings**. 4. Click **History**. [![Example of the job settings history.](/img/docs/deploy/job-history.png?v=2 "Example of the job settings history.")](#)Example of the job settings history. #### Related docs[​](#related-docs "Direct link to Related docs") * [Artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) * [Continuous integration (CI) jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) * [Webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md) 
---

### Deploy your metrics [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") This section explains how you can perform a job run in your deployment environment in dbt to materialize and deploy your metrics. Currently, only the deployment environment is supported. 1. Once you’ve [defined your semantic models and metrics](https://docs.getdbt.com/guides/sl-snowflake-qs.md?step=10), commit and merge your metric changes in your dbt project. 2. In dbt, create a new [deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#create-a-deployment-environment) or use an existing environment on dbt 1.6 or higher. * Note — Only the deployment environment is currently supported (*development experience coming soon*). 3. To create a new environment, navigate to **Deploy** in the navigation menu, select **Environments**, and then select **Create new environment**. 4. Fill in your deployment credentials with your Snowflake username and password. You can name the schema anything you want. Click **Save** to create your new production environment. 5. [Create a new deploy job](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#create-and-schedule-jobs) that runs in the environment you just created. Go back to the **Deploy** menu, select **Jobs**, select **Create job**, and click **Deploy job**. 6. 
Set the job to run `dbt parse` to parse your project and generate a [`semantic_manifest.json` artifact](https://docs.getdbt.com/reference/artifacts/sl-manifest.md) file. Although running `dbt build` isn't required, you can choose to do so if needed. note If you are on the dbt Fusion engine, add the `dbt docs generate` command to your job to successfully deploy your metrics. 7. Run the job by clicking the **Run now** button. Monitor the job's progress in real time through the **Run summary** tab. Once the job completes successfully, your dbt project, including the generated documentation, will be fully deployed and available for use in your production environment. If any issues arise, review the logs to diagnose and address any errors. What’s happening internally? * Merging the code into your main branch allows dbt to pull those changes and build the definition in the manifest produced by the run.
* Re-running the job in the deployment environment materializes the models that the metrics depend on in the data platform. It also ensures that the manifest is up to date.
* The Semantic Layer APIs pull in the most recent manifest and enable your integration to extract metadata from it. #### Next steps[​](#next-steps "Direct link to Next steps") After you've executed a job and deployed your Semantic Layer: * [Set up your Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) in dbt. * Discover the [available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md), such as Tableau, Google Sheets, Microsoft Excel, and more. * Start querying your metrics with the [API query syntax](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md#querying-the-api-for-metric-metadata). #### Related docs[​](#related-docs "Direct link to Related docs") * [Optimize querying performance](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md) using declarative caching. * [Validate semantic nodes in CI](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci) to ensure code changes made to dbt models don't break these metrics. * If you haven't already, learn how to [build your metrics and semantic models](https://docs.getdbt.com/docs/build/build-metrics-intro.md) in your development tool of choice.

---

### Deployment environments Deployment environments in dbt are crucial for deploying dbt jobs in production and using features or integrations that depend on dbt metadata or results. 
To execute dbt, environments determine the settings used during job runs, including: * The version of dbt Core that will be used to run your project * The warehouse connection information (including the target database/schema settings) * The version of your code to execute A dbt project can have multiple deployment environments, providing you the flexibility and customization to tailor the execution of dbt jobs. You can use deployment environments to [create and schedule jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#create-and-schedule-jobs), [enable continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md), and more, based on your specific needs or requirements. Learn how to manage dbt environments To learn different approaches to managing dbt environments and recommendations for your organization's unique needs, read [dbt environment best practices](https://docs.getdbt.com/guides/set-up-ci.md). Learn more about development vs. deployment environments in [dbt Environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md). There are three types of deployment environments: * **Production**: Environment for transforming data and building pipelines for production use. * **Staging**: Environment for working with production tools while limiting access to production data. * **General**: General use environment for deployment development. We highly recommend using the `Production` environment type for your final, source-of-truth deployment data. There can be only one environment marked for final production workflows, and we don't recommend using a `General` environment for this purpose. #### Create a deployment environment[​](#create-a-deployment-environment "Direct link to Create a deployment environment") To create a new dbt deployment environment, navigate to **Deploy** -> **Environments** and then click **Create Environment**. Select **Deployment** as the environment type. 
The option will be greyed out if you already have a development environment. [![Navigate to Deploy -> Environments to create a deployment environment](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/create-deploy-env.png?v=2 "Navigate to Deploy -> Environments to create a deployment environment")](#)Navigate to Deploy -> Environments to create a deployment environment ##### Set as production environment[​](#set-as-production-environment "Direct link to Set as production environment") In dbt, each project can have one designated deployment environment, which serves as its production environment. This production environment is *essential* for using features like Catalog and cross-project references. It acts as the source of truth for the project's production state in dbt. [![Set your production environment as the default environment in your Environment Settings](/img/docs/dbt-cloud/using-dbt-cloud/prod-settings-1.png?v=2 "Set your production environment as the default environment in your Environment Settings")](#)Set your production environment as the default environment in your Environment Settings ##### Semantic Layer[​](#semantic-layer "Direct link to Semantic Layer") For customers using the Semantic Layer, the next section of environment settings is the Semantic Layer configurations. [The Semantic Layer setup guide](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) has the most up-to-date setup instructions. You can also leverage the dbt Job scheduler to [validate your semantic nodes in a CI job](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci) to ensure code changes made to dbt models don't break these metrics. #### Staging environment[​](#staging-environment "Direct link to Staging environment") Use a staging environment to grant developers access to deployment workflows and tools while controlling access to production data. 
Staging environments enable you to achieve more granular control over permissions, data warehouse connections, and data isolation — within the purview of a single project in dbt. ##### Git workflow[​](#git-workflow "Direct link to Git workflow") You can approach this in a couple of ways, but the most straightforward is configuring staging with a long-lived branch (for example, `staging`) similar to, but separate from, the primary branch (for example, `main`). In this scenario, the workflows would ideally move upstream from the Development environment -> Staging environment -> Production environment, with developer branches feeding into the `staging` branch, then ultimately merging into `main`. In many cases, the `main` and `staging` branches will be identical after a merge and remain so until the next batch of changes from the `development` branches is ready to be promoted. We recommend setting branch protection rules on `staging` similar to `main`. Some customers prefer to connect Development and Staging to their `main` branch and then cut release branches on a regular cadence (daily or weekly), which feeds into Production. ##### Why use a staging environment[​](#why-use-a-staging-environment "Direct link to Why use a staging environment") These are the primary motivations for using a staging environment: 1. An additional validation layer before changes are deployed into production. You can deploy, test, and explore your dbt models in staging. 2. Clear isolation between development workflows and production data. It enables developers to work in metadata-powered ways, using features like deferral and cross-project references, without accessing data in production deployments. 3. Provide developers with the ability to create, edit, and trigger ad hoc jobs in the staging environment, while keeping the production environment locked down using [environment-level permissions](https://docs.getdbt.com/docs/cloud/manage-access/environment-permissions.md). 
**Conditional configuration of sources** enables you to point to "prod" or "non-prod" source data, depending on the environment you're running in. For example, this source will point to `<database>.sensitive_source.table_with_pii`, where `<database>` is dynamically resolved based on an environment variable.

models/sources.yml

```yaml
sources:
  - name: sensitive_source
    database: "{{ env_var('SENSITIVE_SOURCE_DATABASE') }}"
    tables:
      - name: table_with_pii
```

There is exactly one source (`sensitive_source`), and all downstream dbt models select from it as `{{ source('sensitive_source', 'table_with_pii') }}`. The code in your project and the shape of the DAG remain consistent across environments. By setting it up in this way, rather than duplicating sources, you get some important benefits. **Cross-project references in dbt Mesh:** Let's say you have `Project B` downstream of `Project A` with cross-project refs configured in the models. When developers work in the IDE for `Project B`, cross-project refs will resolve to the staging environment of `Project A`, rather than production. You'll get the same results with those refs when jobs are run in the staging environment. Only the production environment will reference the production data, keeping the data and access isolated without needing separate projects. **Faster development enabled by deferral:** If `Project B` also has a staging deployment, then references to unbuilt upstream models within `Project B` will resolve to that environment using [deferral](https://docs.getdbt.com/docs/cloud/about-cloud-develop-defer.md), rather than resolving to the models in production. This saves developers time and warehouse spend, while preserving clear separation of environments. Finally, the staging environment has its own view in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), giving you a full view of your prod and pre-prod data. 
[![Explore in a staging environment](/img/docs/collaborate/dbt-explorer/explore-staging-env.png?v=2 "Explore in a staging environment")](#)Explore in a staging environment ##### Create a Staging environment[​](#create-a-staging-environment "Direct link to Create a Staging environment") In dbt, navigate to **Deploy** -> **Environments** and then click **Create Environment**. Select **Deployment** as the environment type. The option will be greyed out if you already have a development environment. [![Create a staging environment](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/create-staging-environment.png?v=2 "Create a staging environment")](#)Create a staging environment Follow the steps outlined in [deployment credentials](#deployment-connection) to complete the remainder of the environment setup. We recommend that the data warehouse credentials be for a dedicated user or service principal. #### Deployment connection[​](#deployment-connection "Direct link to Deployment connection") Warehouse Connections Warehouse connections are created and managed at the account level for dbt accounts and assigned to an environment. To change warehouse type, we recommend creating a new environment. Each project can have multiple connections (Snowflake account, Redshift host, Bigquery project, Databricks host, and so on) of the same warehouse type. Some details of that connection (databases, schemas, and so on) can be overridden within this section of the dbt environment settings. This section determines the exact location in your warehouse dbt should target when building warehouse objects! This section will look a bit different depending on your warehouse provider. For all warehouses, use [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) to override missing or inactive (grayed-out) settings. 
* Postgres * Redshift * Snowflake * Bigquery * Spark * Databricks This section will not appear if you are using Postgres, as all values are inferred from the project's connection. Use [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) to override these values. This section will not appear if you are using Redshift, as all values are inferred from the project's connection. Use [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) to override these values. [![Snowflake Deployment Connection Settings](/img/docs/collaborate/snowflake-deploy-env-deploy-connection.png?v=2 "Snowflake Deployment Connection Settings")](#)Snowflake Deployment Connection Settings ###### Editable fields[​](#editable-fields "Direct link to Editable fields") * **Role**: Snowflake role * **Database**: Target database * **Warehouse**: Snowflake warehouse This section will not appear if you are using Bigquery, as all values are inferred from the project's connection. Use [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) to override these values. This section will not appear if you are using Spark, as all values are inferred from the project's connection. Use [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) to override these values. 
[![Databricks Deployment Connection Settings](/img/docs/collaborate/databricks-deploy-env-deploy-connection.png?v=2 "Databricks Deployment Connection Settings")](#)Databricks Deployment Connection Settings ###### Editable fields[​](#editable-fields-1 "Direct link to Editable fields") * **Catalog** (optional): [Unity Catalog namespace](https://docs.getdbt.com/docs/local/connect-data-platform/databricks-setup.md) ##### Deployment credentials[​](#deployment-credentials "Direct link to Deployment credentials") This section allows you to determine the credentials that should be used when connecting to your warehouse. The authentication methods may differ depending on the warehouse and dbt tier you are on. For all warehouses, use [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) to override missing or inactive (grayed-out) settings. For credentials, we recommend wrapping extended attributes in [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) (`password: '{{ env_var(''DBT_ENV_SECRET_PASSWORD'') }}'`) to avoid displaying the secret value in the text box and the logs. 
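The guidance above can be sketched as a minimal extended attributes snippet, assuming a password-based connection (the `password` key applies to adapters such as Postgres or Redshift; adjust the key to match your adapter's profile fields):

```yaml
# Extended attributes entered in the environment settings (YAML).
# The secret is resolved from an environment variable at runtime,
# so it never appears in the settings text box or in the logs.
password: '{{ env_var(''DBT_ENV_SECRET_PASSWORD'') }}'
```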
* Postgres * Redshift * Snowflake * Bigquery * Spark * Databricks [![Postgres Deployment Credentials Settings](/img/docs/collaborate/postgres-deploy-env-deploy-credentials.png?v=2 "Postgres Deployment Credentials Settings")](#)Postgres Deployment Credentials Settings ###### Editable fields[​](#editable-fields-2 "Direct link to Editable fields") * **Username**: Postgres username to use (most likely a service account) * **Password**: Postgres password for the listed user * **Schema**: Target schema [![Redshift Deployment Credentials Settings](/img/docs/collaborate/postgres-deploy-env-deploy-credentials.png?v=2 "Redshift Deployment Credentials Settings")](#)Redshift Deployment Credentials Settings ###### Editable fields[​](#editable-fields-3 "Direct link to Editable fields") * **Username**: Redshift username to use (most likely a service account) * **Password**: Redshift password for the listed user * **Schema**: Target schema [![Snowflake Deployment Credentials Settings](/img/docs/collaborate/snowflake-deploy-env-deploy-credentials.png?v=2 "Snowflake Deployment Credentials Settings")](#)Snowflake Deployment Credentials Settings ###### Editable fields[​](#editable-fields-4 "Direct link to Editable fields") * **Auth Method**: This determines the way dbt connects to your warehouse * One of: \[**Username & Password**, **Key Pair**] * If **Username & Password**: * **Username**: username to use (most likely a service account) * **Password**: password for the listed user * If **Key Pair**: * **Username**: username to use (most likely a service account) * **Private Key**: value of the Private SSH Key (optional in the user interface, but required for key pair authentication when dbt runs) * **Private Key Passphrase**: value of the Private SSH Key Passphrase (optional, only if required) * **Schema**: Target Schema for this environment [![Bigquery Deployment Credentials Settings](/img/docs/collaborate/bigquery-deploy-env-deploy-credentials.png?v=2 "Bigquery Deployment 
Credentials Settings")](#)Bigquery Deployment Credentials Settings ###### Editable fields[​](#editable-fields-5 "Direct link to Editable fields") * **Dataset**: Target dataset Use [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) to override missing or inactive (grayed-out) settings. For credentials, we recommend wrapping extended attributes in [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) (`password: '{{ env_var(''DBT_ENV_SECRET_PASSWORD'') }}'`) to avoid displaying the secret value in the text box and the logs. [![Spark Deployment Credentials Settings](/img/docs/collaborate/spark-deploy-env-deploy-credentials.png?v=2 "Spark Deployment Credentials Settings")](#)Spark Deployment Credentials Settings ###### Editable fields[​](#editable-fields-6 "Direct link to Editable fields") * **Token**: Access token * **Schema**: Target schema [![Databricks Deployment Credentials Settings](/img/docs/collaborate/spark-deploy-env-deploy-credentials.png?v=2 "Databricks Deployment Credentials Settings")](#)Databricks Deployment Credentials Settings ###### Editable fields[​](#editable-fields-7 "Direct link to Editable fields") * **Token**: Access token * **Schema**: Target schema #### Delete an environment[​](#delete-an-environment "Direct link to Delete an environment") Deleting an environment automatically deletes its associated job(s). If you want to keep those jobs, move them to a different environment first. Follow these steps to delete an environment in dbt: 1. Click **Deploy** on the navigation header and then click **Environments** 2. Select the environment you want to delete. 3. Click **Settings** on the top right of the page and then click **Edit**. 4. Scroll to the bottom of the page and click **Delete** to delete the environment. [![Delete an environment](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/delete-environment.png?v=2 "Delete an environment")](#)Delete an environment 5. 
Confirm your action in the pop-up by clicking **Confirm delete** in the bottom right to delete the environment immediately. This action cannot be undone. However, you can create a new environment with the same information if the deletion was made in error. 6. Refresh your page and the deleted environment should now be gone. To delete multiple environments, you'll need to perform these steps to delete each one. If you're having any issues, feel free to [contact us](mailto:support@getdbt.com) for additional help. #### Related docs[​](#related-docs "Direct link to Related docs") * [dbt environment best practices](https://docs.getdbt.com/guides/set-up-ci.md) * [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) * [CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md) * [Delete a job or environment in dbt](https://docs.getdbt.com/faqs/Environments/delete-environment-job.md) #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### Derived metrics In MetricFlow, derived metrics are metrics created by defining an expression using other metrics. They enable you to perform calculations with existing metrics. This is helpful for combining metrics and doing math functions on aggregated columns, like creating a profit metric. The parameters, description, and type for derived metrics are: The following displays the complete specification for derived metrics, along with an example. For advanced data modeling, you can use `fill_nulls_with` and `join_to_timespine` to [set null metric values to zero](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md), ensuring numeric values for every data row. 
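As a hedged sketch of the derived metric spec (using MetricFlow's `type: derived` with `type_params.expr` and a `metrics` list; the metric names here are illustrative), the profit metric mentioned above could be defined like:

```yaml
metrics:
  - name: profit
    description: Revenue minus cost, combining two existing metrics.
    label: Profit
    type: derived
    type_params:
      expr: revenue - cost
      metrics:
        - name: revenue
        - name: cost
```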
#### Derived metrics example[​](#derived-metrics-example "Direct link to Derived metrics example")

#### Derived metric offset[​](#derived-metric-offset "Direct link to Derived metric offset")

To perform calculations using a metric's value from a previous time period, you can add an offset parameter to a derived metric. For example, if you want to calculate period-over-period growth or track user retention, you can use this metric offset.

**Note:** You must include the [`metric_time` dimension](https://docs.getdbt.com/docs/build/dimensions.md#time) when querying a derived metric with an offset window.

The following example displays how you can calculate monthly revenue growth using a 1-month offset window:

##### Offset windows and granularity[​](#offset-windows-and-granularity "Direct link to Offset windows and granularity")

You can query any granularity and offset window combination. The following example queries a metric with a 7-day offset and a monthly grain. When you run the query `dbt sl query --metrics d7_booking_change --group-by metric_time__month` for the metric, here's how it's calculated. For dbt Core, you can use the `mf query` prefix.

1. Retrieve the raw, unaggregated dataset with the specified measures and dimensions at the smallest level of detail, which is currently 'day'.

2. Then, perform an offset join on the daily dataset, followed by a date trunc and aggregation to the requested granularity. For example, to calculate `d7_booking_change` for July 2017:

   * First, sum up all the booking values for each day in July to calculate the bookings metric.
   * The following table displays the range of days that make up this monthly aggregation.

   | | Orders | Metric\_time |
   | ----- | ------ | ------------------------ |
   | | 330 | 2017-07-31 |
   | | 7030 | 2017-07-30 to 2017-07-02 |
   | | 78 | 2017-07-01 |
   | Total | 7438 | 2017-07-01 |

3. Calculate July's bookings with a 7-day offset. The following table displays the range of days that make up this monthly aggregation. Note that the month begins 7 days later (offset by 7 days) on 2017-07-24.

   | | Orders | Metric\_time |
   | ----- | ------ | ------------------------ |
   | | 329 | 2017-07-24 |
   | | 6840 | 2017-07-23 to 2017-06-30 |
   | | 83 | 2017-06-24 |
   | Total | 7252 | 2017-07-01 |

4. Lastly, calculate the derived metric and return the final result set:

   ```text
   bookings - bookings_7_days_ago compiles as 7438 - 7252 = 186
   ```

   | d7\_booking\_change | metric\_time\_\_month |
   | ------------------- | --------------------- |
   | 186 | 2017-07-01 |

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Fill null values for simple, derived, or ratio metrics](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md)

---

### Developer agent

[Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

The Developer agent is the next evolution of Copilot and provides agentic capabilities to streamline the developer experience in the Studio IDE.
Generate or refactor models, tests, and documentation from natural language — grounded in your project's lineage, metadata, governance, and Semantic Layer — while keeping every change auditable. Use the Developer agent to: * **Generate semantic models, tests, and docs**: Scaffold YAML definitions from existing models and save time on manual setup. * **Build or modify models**: Create or update dbt models from natural language descriptions of the transformation or logic you need. * **Light refactors**: Rename columns, change materializations, or adjust logic. The agent keeps associated YAML files in sync. The agent always has access to the latest dbt-recommended guidance through [dbt Agent Skills](https://github.com/dbt-labs/dbt-agent-skills), a curated collection of instructions managed by dbt Labs. These skills are available out of the box — no configuration needed! 🎉 #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * An Enterprise-tier plan — Contact your account manager for access. * A [dbt account](https://www.getdbt.com/signup) and [Developer seat license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md). * A [development environment](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#get-started-with-the-studio-ide) and credentials set up in the Studio IDE. * [Account access to Copilot features](https://docs.getdbt.com/docs/cloud/enable-dbt-copilot.md). ###### Availability and considerations[​](#availability-and-considerations "Direct link to Availability and considerations") * The Developer agent is available in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) only. It's not available in VS Code or the dbt CLI. * It works across all engines (dbt Fusion engine and dbt Core). * It does not retain conversation context between sessions. If you close the Studio IDE, the conversation resets. 
If you saved any file changes the agent made, those changes will remain in your branch. Unsaved changes are lost. * Currently, **Plan** mode isn't supported. The Developer agent drafts changes directly without showing a plan first. Use **Ask** mode if you want to approve each file change before it is persisted. * You cannot edit a prompt after submitting it. To refine your request, click the **Start over** button located at the top right corner of the Copilot panel. This resets the session and you can submit a new prompt. #### Using the Developer agent[​](#using-the-developer-agent "Direct link to Using the Developer agent") To use the Developer agent, follow these steps: 1. Open your dbt project in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), then click **Copilot** in the command palette. 2. Start a prompt in several ways in the [Copilot panel](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md#panel-controls): * **Quick actions**: The Studio IDE surfaces quick actions at the top of the panel to help you get started with common tasks. * **Plain text**: Type directly into the text field to describe what you want to build or change. * **Model context**: Type `@` to select a model as context. This scopes the agent's changes to that resource. 3. Select the [**Agent mode** button](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md#agent-modes) to specify the mode for the Developer agent. Available modes are **Ask** (default) and **Code**. 4. [Review the agent's suggestions](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md#reviewing-agent-suggestions) and approve or reject the changes. You can also use the **Start over** button to reset the current session. 5. [Approve dbt commands](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md#granting-command-permissions) when the Developer agent requests to run commands like `dbt compile` or `dbt build`. 6. Repeat the process to build or change more models. 7. 
Commit the changes to your dbt project and open a pull request.

Example of using the Developer agent to refactor a model in the Studio IDE.

For more details on the Developer agent and how it works, see the following sections.

###### Panel controls[​](#panel-controls "Direct link to Panel controls")

The Copilot panel contains:

1. **Quick actions** (center): The Studio IDE surfaces quick actions at the top of the panel to help you get started with common tasks, like generating documentation, semantic models, tests, and metrics. When selected, the text field is pre-filled with a prompt for the selected action. These quick actions may evolve over time as new capabilities are added.
2. **Agent mode button** (bottom left): Switch between **Ask** and **Code** mode. Click the button to change modes.
3. **Model context** (bottom left, next to mode): Shows the currently open file. Use `@` in the text field to reference a different model. Click **x** to remove the model context.
4. **Text input field** (bottom right): Type your prompt in the text field to describe what you want to build or change. Type `@` to select a model as context. This scopes the agent's changes to that resource.
5. **Start over** (top right): Resets the current session. When you click this button, a confirmation prompt appears. Click **Start over** to confirm, or **Cancel** to return to your current conversation. You cannot undo this action.
6. **Stop** or **Enter** (bottom right): Press **Enter** to submit your prompt. Press **Stop** to stop the current session and agent processing. You cannot undo this action.
[![The Copilot panel in the Studio IDE showing quick-action buttons, text input field, and agent mode controls.](/img/docs/dbt-cloud/dev-agent-copilot-panel.png?v=2 "The Copilot panel in the Studio IDE showing quick-action buttons, text input field, and agent mode controls.")](#)The Copilot panel in the Studio IDE showing quick-action buttons, text input field, and agent mode controls.

###### Agent modes[​](#agent-modes "Direct link to Agent modes")

The Developer agent operates in two modes:

| Mode | Behavior |
| ----------------- | --- |
| **Ask** (default) | The agent drafts edits to files. You must approve each file change before it is persisted. Best when you want tight control over what gets saved to your branch. |
| **Code** | The agent drafts and automatically edits files without per-file approval. Best for faster iteration when you're confident in the prompt. |

You can switch between modes at any time by clicking the **Agent mode** button in the Copilot panel.

[![The Developer agent in Ask mode, requesting approval before making file edits.](/img/docs/dbt-cloud/dev-agent-ask-mode.png?v=2 "The Developer agent in Ask mode, requesting approval before making file edits.")](#)The Developer agent in Ask mode, requesting approval before making file edits.

###### Reviewing agent suggestions[​](#reviewing-agent-suggestions "Direct link to Reviewing agent suggestions")

When the Developer agent proposes code changes, you can review them before they are committed to your project:

* **View the diff**: The agent displays a diff of the proposed changes. Click **Show all X lines** to expand and view the full suggestion.
* **Line indicators**: Added and removed lines are highlighted with line number indicators so you can see exactly what changed.
* **Copy or open in editor**: Use the options in the top-right corner of the diff view to copy the suggestion or open it directly in the editor.

[![The Developer agent displaying a diff of proposed YAML changes with line indicators and copy/open options.](/img/docs/dbt-cloud/dev-agent-code-suggestion.png?v=2 "The Developer agent displaying a diff of proposed YAML changes with line indicators and copy/open options.")](#)The Developer agent displaying a diff of proposed YAML changes with line indicators and copy/open options.

###### Granting command permissions[​](#granting-command-permissions "Direct link to Granting command permissions")

To validate or run models during a session, the agent may request to run dbt commands such as `dbt compile` or `dbt build`. You'll be prompted to approve each request before it executes. For example, the agent might request to run:

```text
dbt compile --select model_name
```

[![The Developer agent requesting permission to run a dbt command.](/img/docs/dbt-cloud/dev-agent-invoke-dbt.png?v=2 "The Developer agent requesting permission to run a dbt command.")](#)The Developer agent requesting permission to run a dbt command.

You can select one of the following options:

| Option | Behavior |
| --- | --- |
| **Yes, run once** | Grants permission to run this specific command one time. |
| **Yes, and allow `dbt_command_name` for the session** | Grants permission to run dbt commands for the remainder of your session without prompting again. |
| **No** | Denies the request. The agent will not run the command. |

#### Writing effective prompts[​](#writing-effective-prompts "Direct link to Writing effective prompts")

Good prompts include the *scope* (which models or area of the project), the *intent* (the transformation or business logic you want), and any *constraints* (naming conventions, materialization, tests). Here are a few examples:

| Task | Example prompt |
| --- | --- |
| Build a new model | "Create a model called `fct_daily_revenue` that joins `stg_orders` and `stg_payments`, aggregates revenue by day, and materializes as a table." |
| Refactor an existing model | "Refactor `fct_orders` to use incremental materialization. Keep existing tests and follow our naming conventions." |
| Generate tests and docs | "Add `not_null` and `unique` tests to the primary key of `dim_customers`, and generate documentation for all columns." |

For detailed guidance, patterns, and more examples across SQL, documentation, tests, and semantic models, see the [Prompt cookbook](https://docs.getdbt.com/guides/prompt-cookbook.md).

#### Related docs[​](#related-docs "Direct link to Related docs")

* [dbt Agents overview](https://docs.getdbt.com/docs/dbt-ai/dbt-agents.md)
* [Develop with dbt Copilot](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md)
* [Prompt cookbook](https://docs.getdbt.com/guides/prompt-cookbook.md)
* [Semantic models](https://docs.getdbt.com/docs/build/semantic-models.md)
* [About dbt AI and intelligence](https://docs.getdbt.com/docs/dbt-ai/about-dbt-ai.md)

---

### Dimensions

All dimensions require a `name` and `type`, and can optionally include an `expr` parameter. The `name` of your dimension must be unique within the same semantic model. Refer to the following example to see how dimensions are used in a semantic model:

#### Dimensions types[​](#dimensions-types "Direct link to Dimensions types")

This section further explains the dimension definitions, along with examples. Dimensions have the following types:

* [Categorical](#categorical)
* [Time](#time)
* [SCD Type II](#scd-type-ii)
  * [Basic structure](#basic-structure)
  * [Semantic model parameters and keys](#semantic-model-parameters-and-keys)
  * [Implementation](#implementation)
  * [SCD examples](#scd-examples)

#### Categorical[​](#categorical "Direct link to Categorical")

Categorical dimensions are used to group metrics by different attributes, features, or characteristics such as product type. They can refer to existing columns in your dbt model or be calculated using a SQL expression with the `expr` parameter. An example of a categorical dimension is `is_bulk_transaction`, which is a group created by applying a case statement to the underlying column `quantity`. This allows users to group or filter the data based on bulk transactions.
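A minimal sketch of the `is_bulk_transaction` dimension described above (the threshold of 10 units is an assumed value for illustration):

```yaml
dimensions:
  - name: is_bulk_transaction
    type: categorical
    # Assumption: more than 10 units in an order counts as a bulk transaction
    expr: case when quantity > 10 then true else false end
```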
#### Time[​](#time "Direct link to Time")

```bash
# dbt users
dbt sl query --metrics users_created,users_deleted --group-by metric_time__year --order-by metric_time__year

# dbt Core users
mf query --metrics users_created,users_deleted --group-by metric_time__year --order-by metric_time__year
```

You can set `is_partition` for time to define specific time spans.

* is\_partition
* time\_granularity

Use `is_partition: True` to show that a dimension exists over a specific time window. For example, a date-partitioned dimensional table. When you query metrics from different tables, the Semantic Layer uses this parameter to ensure that the correct dimensional values are joined to measures.

##### SCD Type II[​](#scd-type-ii "Direct link to SCD Type II")

MetricFlow supports joins against dimension values in a semantic model built on top of a slowly changing dimension (SCD) Type II table. This is useful when you need a particular metric sliced by a group that changes over time, such as the historical trends of sales by a customer's country.

###### Basic structure[​](#basic-structure "Direct link to Basic structure")

SCD Type II dimensions are groups that change values at a coarser time granularity. SCD Type II tables typically have two time columns that indicate the validity period of a dimension: `valid_from` (or `tier_start`) and `valid_to` (or `tier_end`). This creates a range of valid rows with different dimension values for a metric. MetricFlow associates the metric with the earliest available dimension value within a coarser time window, such as a month. By default, it uses the group valid at the start of this time granularity.

MetricFlow supports the following basic structure of an SCD Type II data platform table:

| entity\_key | dimensions\_1 | dimensions\_2 | ... | dimensions\_x | valid\_from | valid\_to |
| ----------- | ------------- | ------------- | --- | ------------- | ----------- | ---------- |
| 123 | value\_a | value\_x | ... | value\_n | 2024-01-01 | 2024-06-30 |
| 123 | value\_b | value\_y | ... | value\_m | 2024-07-01 | 2024-12-31 |

* `entity_key` (required): A unique identifier for each row in the table, such as a primary key or another unique identifier specific to the entity.
* `valid_from` (required): Start date timestamp for when the dimension is valid. Use `validity_params: is_start: True` in the semantic model to specify this.
* `valid_to` (required): End date timestamp for when the dimension is valid. Use `validity_params: is_end: True` in the semantic model to specify this.

###### Semantic model parameters and keys[​](#semantic-model-parameters-and-keys "Direct link to Semantic model parameters and keys")

When configuring an SCD Type II table in a semantic model, use `validity_params` to specify the start (`valid_from`) and end (`valid_to`) of the validity window for each dimension.

* `validity_params`: Parameters that define the validity window.
  * `is_start: True`: Indicates the start of the validity period. Displayed as `valid_from` in the SCD table.
  * `is_end: True`: Indicates the end of the validity period. Displayed as `valid_to` in the SCD table.

Here’s an example configuration:

SCD Type II tables have a specific dimension with a start and end date. To join tables:

* Set the additional [entity `type`](https://docs.getdbt.com/docs/build/entities.md#entity-types) parameter to the `natural` key.
* Use a `natural` key as an [entity `type`](https://docs.getdbt.com/docs/build/entities.md#entity-types), which means you don't need a `primary` key.
* In most instances, SCD tables don't have a logically usable `primary` key because `natural` keys map to multiple rows.
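A hedged sketch of the `validity_params` configuration described above (column names and the day granularity are illustrative):

```yaml
dimensions:
  - name: valid_from
    type: time
    type_params:
      time_granularity: day
      validity_params:
        is_start: True
  - name: valid_to
    type: time
    type_params:
      time_granularity: day
      validity_params:
        is_end: True
```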
###### Implementation[​](#implementation "Direct link to Implementation")

Here are some guidelines to follow when implementing SCD Type II tables:

* The SCD table must have `valid_to` and `valid_from` time dimensions, which are logical constructs.
* The `valid_from` and `valid_to` properties must be specified exactly once per SCD table configuration.
* The `valid_from` and `valid_to` properties shouldn't be used or specified on the same time dimension.
* The `valid_from` and `valid_to` time dimensions must cover a non-overlapping period where one row matches each natural key value (meaning they must not overlap and should be distinct).
* We recommend defining the underlying dbt model with [dbt snapshots](https://docs.getdbt.com/docs/build/snapshots.md). This supports the SCD Type II table layout and ensures that the table is updated with the latest data.

This is an example of SQL code that shows how a sample metric called `num_events` is joined with versioned dimensions data (stored in a table called `scd_dimensions`) using a primary key made up of the `entity_key` and `timestamp` columns.

```sql
select
    metric_time,
    dimensions_1,
    sum(1) as num_events
from events a
left outer join scd_dimensions b
    on a.entity_key = b.entity_key
    and a.metric_time >= b.valid_from
    and (a.metric_time < b.valid_to or b.valid_to is null)
group by 1, 2
```

###### SCD examples[​](#scd-examples "Direct link to SCD examples")

The following are examples of how to use SCD Type II tables in a semantic model:

SCD dimensions for sales tiers and the time length of that tier.

This example shows how to create slowly changing dimensions (SCD) using a semantic model. The SCD table contains information about salespersons' tier and the time length of that tier.
Suppose you have the underlying SCD table:

| sales\_person\_id | tier | start\_date | end\_date |
| ----------------- | ---- | ----------- | ---------- |
| 111 | 1 | 2019-02-03 | 2020-01-05 |
| 111 | 2 | 2020-01-05 | 2048-01-01 |
| 222 | 2 | 2020-03-05 | 2048-01-01 |
| 333 | 2 | 2020-08-19 | 2021-10-22 |
| 333 | 3 | 2021-10-22 | 2048-01-01 |

As mentioned earlier, the `validity_params` include two important arguments that specify the columns in the SCD table that mark the start and end dates (or timestamps) for each tier or dimension:

* `is_start`
* `is_end`

Additionally, the entity is tagged as `natural` to differentiate it from a `primary` entity. In a `primary` entity, each entity value has one row. In contrast, a `natural` entity has one row for each combination of entity value and its validity period.

The following code represents a separate semantic model that holds a fact table for `transactions`:

You can now access the metrics in the `transactions` semantic model organized by the slowly changing dimension of `tier`. In the sales tier example, if a salesperson was Tier 1 from 2022-03-01 to 2022-03-12, and gets promoted to Tier 2 from 2022-03-12 onwards, all transactions from March would be categorized under Tier 1 since the dimension value of Tier 1 comes earlier (and is the default starting point), even though the salesperson was promoted to Tier 2 on 2022-03-12.

SCD dimensions with sales tiers and group transactions by month when tiers are missing

This example shows how to create slowly changing dimensions (SCD) using a semantic model. The SCD table contains information about salespersons' tier and the time length of that tier.
Suppose you have the underlying SCD table:

| sales\_person\_id | tier | start\_date | end\_date |
| ----------------- | ---- | ----------- | ---------- |
| 111 | 1 | 2019-02-03 | 2020-01-05 |
| 111 | 2 | 2020-01-05 | 2048-01-01 |
| 222 | 2 | 2020-03-05 | 2048-01-01 |
| 333 | 2 | 2020-08-19 | 2021-10-22 |
| 333 | 3 | 2021-10-22 | 2048-01-01 |

In the sales tier example, if sales\_person\_id 456 is Tier 2 from 2022-03-08 onwards, but there is no associated tier level dimension for this person from 2022-03-01 to 2022-03-08, then all transactions associated with sales\_person\_id 456 for the month of March will be grouped under 'NA' since no tier is present prior to Tier 2.

The following command or code represents how to return the count of transactions generated by each sales tier per month:

```bash
# dbt platform users
dbt sl query --metrics transactions --group-by metric_time__month,sales_person__tier --order-by metric_time__month,sales_person__tier

# dbt Core users
mf query --metrics transactions --group-by metric_time__month,sales_person__tier --order-by metric_time__month,sales_person__tier
```

---

### Discover data with Catalog

With Catalog, you can view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics), their lineage, and [model consumption](https://docs.getdbt.com/docs/explore/view-downstream-exposures.md) to gain a better understanding of its latest production state.
Use Catalog to navigate and manage your projects within dbt to help you and other data developers, analysts, and consumers discover and leverage your dbt resources. Catalog integrates with the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), [dbt Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md), [Orchestrator](https://docs.getdbt.com/docs/deploy/deployments.md), and [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md) to help you develop or view your dbt resources. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * You have a dbt account on the [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/). * You have set up a [production](https://docs.getdbt.com/docs/deploy/deploy-environments.md#set-as-production-environment) or [staging](https://docs.getdbt.com/docs/deploy/deploy-environments.md#create-a-staging-environment) deployment environment for each project you want to explore. * You have at least one successful job run in the deployment environment. Note that [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) do not update Catalog. * You are on the Catalog page. To do this, select **Catalog** from the top-level navigation in dbt. #### Generate metadata[​](#generate-metadata "Direct link to Generate metadata") Catalog uses the metadata provided by the [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) to display the details about [the state of your dbt project](https://docs.getdbt.com/docs/dbt-cloud-apis/project-state.md). The metadata that's available depends on the [deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md) you've designated as *production* or *staging* in your dbt project. Catalog also allows you to ingest metadata from your data warehouse, giving you visibility into external resources in Catalog. 
For information on supported warehouses, refer to [External metadata ingestion](https://docs.getdbt.com/docs/explore/external-metadata-ingestion.md#prerequisites).

#### dbt metadata[​](#dbt-metadata "Direct link to dbt metadata")

If you're using a [hybrid project setup](https://docs.getdbt.com/docs/deploy/hybrid-setup.md) and uploading artifacts from dbt Core, make sure to follow the [setup instructions](https://docs.getdbt.com/docs/deploy/hybrid-setup.md#connect-project-in-dbt-cloud) to connect your project in dbt. This enables Catalog to access and display your metadata correctly.

* To ensure all relevant metadata (like lineage, test results, documentation, and more) is available in Catalog, run `dbt build` and `dbt docs generate` as part of your job in your production or staging environment.
* Catalog automatically retrieves the metadata updates after each job run in the production or staging deployment environment, so it always has the latest results for your project. This includes deploy and merge jobs.
* Note that CI jobs don't update Catalog. This is because they don't reflect the production state and don't provide the necessary metadata updates.
* To view a resource and its metadata, you must define the resource in your project and run a job in the production or staging environment.
* The resulting metadata depends on the [commands](https://docs.getdbt.com/docs/deploy/job-commands.md) executed by the jobs.

##### When dbt creates model metadata[​](#when-dbt-creates-model-metadata "Direct link to When dbt creates model metadata")

dbt populates a model's metadata in Catalog when both of the following conditions are met:

* The model is defined in your dbt project (it exists in the manifest).
* The model appears in the `run_results` of a [`dbt build`](https://docs.getdbt.com/reference/commands/build.md), [`dbt run`](https://docs.getdbt.com/reference/commands/run.md), or [`dbt clone`](https://docs.getdbt.com/reference/commands/clone.md) command, regardless of the run's success or failure status. Note that `dbt docs generate` alone does not create model entries in Catalog. It provides supplementary metadata like column details and descriptions for models that already exist. ##### When dbt removes model metadata[​](#when-dbt-removes-model-metadata "Direct link to When dbt removes model metadata") dbt removes a model's metadata from Catalog in these two cases: * **Model removed from project**: If a model is deleted from your dbt project (and therefore no longer exists in the manifest), its metadata is removed after a subsequent job run in which the model is no longer included. * **Environment inactivity**: If an environment has had no job runs in the past 3 months, all metadata for that environment is purged. To prevent this, schedule jobs to run at least once every 3 months. 
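Putting these rules together, a production or staging job intended to fully populate Catalog might run steps like the following (a sketch; adjust command order and selection to your project):

```bash
# One job, scheduled in the production or staging environment:
dbt source freshness   # source freshness results
dbt build              # models, seeds, snapshots, and tests, plus run results
dbt docs generate      # column details, statistics, and descriptions
```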
| To view in Catalog | You must successfully run |
| --- | --- |
| All metadata | [dbt build](https://docs.getdbt.com/reference/commands/build.md), [dbt docs generate](https://docs.getdbt.com/reference/commands/cmd-docs.md), and [dbt source freshness](https://docs.getdbt.com/reference/commands/source.md#dbt-source-freshness) together as part of the same job in the environment |
| Model lineage, details, or results | [dbt run](https://docs.getdbt.com/reference/commands/run.md) or [dbt build](https://docs.getdbt.com/reference/commands/build.md) on a given model within a job in the environment |
| Columns and statistics for models, sources, and snapshots | [dbt docs generate](https://docs.getdbt.com/reference/commands/cmd-docs.md) within [a job](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) in the environment |
| Data test results | [dbt test](https://docs.getdbt.com/reference/commands/test.md) or [dbt build](https://docs.getdbt.com/reference/commands/build.md) within a job in the environment |
| Unit test results | [dbt test](https://docs.getdbt.com/reference/commands/test.md) or [dbt build](https://docs.getdbt.com/reference/commands/build.md) within a job in the environment. Unit tests are typically run in development or CI environments, so their results rarely appear in production Catalog. |
| Source freshness results | [dbt source freshness](https://docs.getdbt.com/reference/commands/source.md#dbt-source-freshness) within a job in the environment |
| Snapshot details | [dbt snapshot](https://docs.getdbt.com/reference/commands/snapshot.md) or [dbt build](https://docs.getdbt.com/reference/commands/build.md) within a job in the environment |
| Seed details | [dbt seed](https://docs.getdbt.com/reference/commands/seed.md) or [dbt build](https://docs.getdbt.com/reference/commands/build.md) within a job in the environment |

tip

If your organization works in both dbt Core and Cloud, you can unify these workflows by automatically uploading dbt Core artifacts into dbt Cloud and viewing them in Catalog for a more connected dbt experience. To learn more, visit [hybrid projects](https://docs.getdbt.com/docs/deploy/hybrid-projects.md).

##### External metadata ingestion [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#external-metadata-ingestion- "Direct link to external-metadata-ingestion-")

Connect directly to your data warehouse with [external metadata ingestion](https://docs.getdbt.com/docs/explore/external-metadata-ingestion.md), giving you visibility in Catalog into tables, views, and other resources that aren't defined in dbt. dbt creates dbt metadata and pulls in external metadata.
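Because Catalog is fed by the Discovery API, you can also pull the same applied-state metadata programmatically. The following Python sketch builds such a request; the endpoint shown is the multi-tenant North America host, the token placeholder is hypothetical, and the GraphQL field names follow the `environment` → `applied` schema, so verify them against the schema reference before relying on them:

```python
import json

# Multi-tenant North America endpoint; other regions and single-tenant
# instances use different hosts.
DISCOVERY_API_URL = "https://metadata.cloud.getdbt.com/graphql"

# Query sketch against the environment -> applied schema; field names are
# assumptions to verify against the Discovery API schema reference.
QUERY = """
query AppliedModels($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first) {
        edges {
          node {
            name
            uniqueId
            executionInfo { lastRunStatus }
          }
        }
      }
    }
  }
}
"""

def build_discovery_request(environment_id: int, first: int = 10,
                            token: str = "YOUR_SERVICE_TOKEN"):
    """Build the headers and JSON body for a Discovery API POST request."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "query": QUERY,
        "variables": {"environmentId": environment_id, "first": first},
    })
    return headers, body
```

Send the result with any HTTP client, for example `requests.post(DISCOVERY_API_URL, headers=headers, data=body)`.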
#### Catalog overview[​](#catalog-overview "Direct link to Catalog overview")

[Global navigation](https://docs.getdbt.com/docs/explore/global-navigation.md) [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Catalog introduces the ability to widen your search by including dbt resources (models, seeds, snapshots, sources, exposures, and more) across your entire account. This broadens the results returned and gives you greater insight into all the assets across your dbt projects. Learn more in [Global navigation](https://docs.getdbt.com/docs/explore/global-navigation.md) or in our [video overview](https://www.loom.com/share/ae93b3d241cd439fbe5f98f5e6872113?).

Navigate the Catalog overview page to access your project's resources and metadata. The page includes the following sections:

* **Search bar** — [Search](#search-resources) for resources in your project by keyword. You can also use filters to refine your search results.
* **Sidebar** — Use the left sidebar to access model [performance](https://docs.getdbt.com/docs/explore/model-performance.md) and [project recommendations](https://docs.getdbt.com/docs/explore/project-recommendations.md) in the **Project details** section. Browse your project's [resources, file tree, and database](#browse-with-the-sidebar) in the lower section of the sidebar. You can also find your project recommendations on your project's landing page.
* **Lineage graph** — Explore your project's or account's [lineage graph](#project-lineage) to visualize the relationships between resources.
* **Latest updates** — View the latest changes or issues related to your project's resources, including the most recent job runs, changed properties, lineage, and issues.
* **Marts and public models** — View the [marts](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md#guide-structure-overview) and [public models](https://docs.getdbt.com/docs/mesh/govern/model-access.md#access-modifiers) in your project. You can also navigate to all public models in your account through this view.
* **Model query history** — Use [model query history](https://docs.getdbt.com/docs/explore/model-query-history.md) to track consumption queries on your models for deeper insights.
* **Visualize downstream exposures** — [Set up](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md) and [visualize downstream exposures](https://docs.getdbt.com/docs/explore/view-downstream-exposures.md) to automatically expose relevant data models from Tableau to enhance visibility.
* **Data health signals** — View the [data health signals](https://docs.getdbt.com/docs/explore/data-health-signals.md) for each resource to understand its health and performance.

##### Catalog permissions[​](#catalog-permissions "Direct link to Catalog permissions")

When using global navigation and searching across your projects, the following permissions apply:

* Your project access permissions determine which dbt projects appear in the left-hand menu of the global navigation.
* Catalog searches use soft access controls: you'll see all matching resources in search results, with clear indicators for items you don't have access to.
* For external metadata, the global platform credential controls which resources' metadata users can discover. See [External metadata ingestion](https://docs.getdbt.com/docs/explore/external-metadata-ingestion.md) for more details.

On-demand learning

If you enjoy video courses, check out our [dbt Catalog on-demand course](https://learn.getdbt.com/courses/dbt-catalog) and learn how to best explore your dbt project(s)!
#### Explore your project's lineage graph[​](#project-lineage "Direct link to Explore your project's lineage graph")

Catalog provides a visualization of your project's DAG that you can interact with. To access the project's full lineage graph, select **Overview** in the left sidebar and click the **Explore Lineage** button on the main (center) section of the page.

If you don't see the project lineage graph immediately, click **Render Lineage**. It can take some time for the graph to render depending on the size of your project and your computer's available memory. The graph of very large projects might not render at all, so you can select a subset of nodes by using selectors instead.

The nodes in the lineage graph represent the project's resources and the edges represent the relationships between the nodes. Nodes are color-coded and include iconography according to their resource type.

By default, Catalog shows the project's [applied state](https://docs.getdbt.com/docs/dbt-cloud-apis/project-state.md#definition-logical-vs-applied-state-of-dbt-nodes) lineage. That is, it shows models that have been successfully built and are available to query, not just the models defined in the project.

To explore the lineage graphs of tests and macros, view [their resource details pages](#view-resource-details). By default, Catalog excludes these resources from the full lineage graph unless a search query returns them as results.

 How can I interact with the full lineage graph?

* Hover over any item in the graph to display the resource's name and type.
* Zoom in and out on the graph by mouse-scrolling.
* Grab and move the graph and the nodes.
* Right-click on a node (context menu) to:
  * Refocus on the node, including its upstream and downstream nodes
  * Refocus on the node and its downstream nodes only
  * Refocus on the node and its upstream nodes only
  * View the node's [resource details](#view-resource-details) page
* Select a resource to highlight its relationship with other resources in your project. A panel opens on the graph's right-hand side that displays a high-level summary of the resource's details. The side panel includes a **General** tab for information like description, materialized type, and other details. In the side panel's upper right corner:
  * Click the View Resource icon to [view the resource details](#view-resource-details).
  * Click the [Open in IDE](#open-in-ide) icon to examine the resource using the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md).
  * Click the Copy Link to Page icon to copy the page's link to your clipboard.
* Use [selectors](https://docs.getdbt.com/reference/node-selection/methods.md) (in the search bar) to select specific resources or a subset of the DAG. This can help narrow the focus on the resources that interest you. All selectors are available for use, except those requiring a state comparison (result, source status, and state). You can also use the `--exclude` flag and the optional `--select` flag. Examples:
  * `resource_type:model [RESOURCE_NAME]` — Returns all models matching the name search
  * `resource_type:metric,tag:nightly` — Returns metrics with the tag `nightly`
* Use [graph operators](https://docs.getdbt.com/reference/node-selection/graph-operators.md) (in the search bar) to select specific resources or a subset of the DAG. This can help narrow the focus on the resources that interest you.
  Examples:
  * `+orders` — Returns all the upstream nodes of `orders`
  * `+dim_customers,resource_type:source` — Returns all sources that are upstream of `dim_customers`
* Use [set operators](https://docs.getdbt.com/reference/node-selection/set-operators.md) (in the search bar) to select specific resources or a subset of the DAG. This can help narrow the focus on the resources that interest you. For example:
  * `+snowplow_sessions +fct_orders` — Use space-delimited arguments for a union operation. Returns resources that are upstream nodes of either `snowplow_sessions` or `fct_orders`.
* [View resource details](#view-resource-details) by selecting a node (double-clicking) in the graph.
* Click **Lenses** (lower right corner of the graph) to use the Catalog [lenses](#lenses) feature.

##### Example of full lineage graph[​](#example-of-full-lineage-graph "Direct link to Example of full lineage graph")

Example of exploring a model in the project's lineage graph:

[![Example of full lineage graph](/img/docs/collaborate/dbt-explorer/example-project-lineage-graph.png?v=2 "Example of full lineage graph")](#)Example of full lineage graph

#### Lenses[​](#lenses "Direct link to Lenses")

The **Lenses** feature is available from your [project's lineage graph](#project-lineage) (lower right corner). Lenses are like map layers for your DAG. Lenses make it easier to understand your project's contextual metadata at scale, especially to distinguish a particular model or a subset of models.

When you apply a lens, tags become visible on the nodes in the lineage graph, indicating the layer value along with coloration based on that value. If you're significantly zoomed out, only the tags and their colors are visible in the graph. Lenses are helpful to analyze a subset of the DAG if you're zoomed in, or to find models/issues from a larger vantage point.

List of available lenses

A resource in your project is characterized by resource type, materialization type, or model layer, as well as its latest run or latest test status. Lenses are available for the following metadata:

* **Resource type**: Organizes resources by resource type, such as models, tests, seeds, saved queries, and [more](https://docs.getdbt.com/docs/build/projects.md). Resource type uses the `resource_type` selector.
* **Materialization type**: Identifies the strategy for building the dbt models in your data platform.
* **Latest status**: The status from the latest execution of the resource in the current environment. For example, use it to diagnose a failed DAG region.
* **Model layer**: The modeling layer that the model belongs to according to the [best practices guide](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md#guide-structure-overview). For example, use it to discover marts models to analyze.
  * **Marts** — A model with the prefix `fct_` or `dim_`, or a model that lives in the `/marts/` subdirectory.
  * **Intermediate** — A model with the prefix `int_`, or a model that lives in the `/int/` or `/intermediate/` subdirectory.
  * **Staging** — A model with the prefix `stg_`, or a model that lives in the `/staging/` subdirectory.
* **Test status**: The status from the latest execution of the tests that ran against this resource. If a model has multiple tests with different results, the lens reflects the 'worst case' status.
* **Consumption query history**: The number of queries against this resource over a given time period.

##### Example of lenses[​](#example-of-lenses "Direct link to Example of lenses")

Example of applying the **Materialization type** *lens* with the lineage graph zoomed out. In this view, each model name is colored according to the materialization type legend at the bottom. This color-coding helps you quickly identify the materialization types of different models.
[![Example of the Materialization type lens](/img/docs/collaborate/dbt-explorer/example-materialization-type.jpg?v=2 "Example of the Materialization type lens")](#)Example of the Materialization type lens

Example of applying the **Test status** *lens*, where each model name displays the test status according to the legend at the bottom.

[![Example of the Test Status lens](/img/docs/collaborate/dbt-explorer/example-test-status.jpg?v=2 "Example of the Test Status lens")](#)Example of the Test Status lens

#### Keyword search[​](#search-resources "Direct link to Keyword search")

With Catalog, global navigation provides a search experience allowing you to find dbt resources across all your projects, as well as non-dbt resources in Snowflake.

You can locate resources in your project by performing a keyword search in the search bar. All resource names, column names, resource descriptions, warehouse relations, and code matching your search criteria will be displayed as a list on the main (center) section of the page. When searching for an exact column name, the results show all relational nodes containing that column in their schemas. If there's a match, a notice in the search result indicates the resource contains the specified column. Also, you can apply filters to further refine your search results.

 Search features

* **Partial keyword search** — Also referred to as fuzzy search. Catalog uses a "contains" logic to improve your search results. This means you can search for partial terms without knowing the exact root word of your search term.
* **Exclude keywords** — Prepend a minus sign (-) to the keyword you want to exclude from search results. For example, `-user` will exclude all matches of that keyword from search results.
* **Boolean operators** — Use Boolean operators to enhance your keyword search. For example, the search results for `users OR github` will include matches for either keyword.
* **Phrase search** — Surround a string of keywords with double quotation marks to search for that exact phrase (for example, `"stg users"`). To learn more, refer to [Phrase search](https://en.wikipedia.org/wiki/Phrase_search) on Wikipedia.
* **SQL keyword search** — Use SQL keywords in your search. For example, the search results for `int github users joined` will include matches that contain that specific string of keywords (similar to phrase searching).

 Filters side panel

The **Filters** side panel becomes available after you perform a keyword search. Use this panel to further refine the results from your keyword search. By default, Catalog searches across all resources in the project. You can filter on:

* [Resource type](https://docs.getdbt.com/docs/build/projects.md) (like models, sources, and so on)
* [Model access](https://docs.getdbt.com/docs/mesh/govern/model-access.md) (like public, private)
* [Model layer](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md) (like marts, staging)
* [Model materialization](https://docs.getdbt.com/docs/build/materializations.md) (like view, table)
* [Tags](https://docs.getdbt.com/reference/resource-configs/tags.md) (supports multi-select)

Under the **Models** option, you can filter on model properties (access or materialization type). Also available are **Advanced** options, where you can limit the search results to column name, model code, and more.

 Global navigation

Catalog builds on the functionality of the old navigation and introduces new capabilities to enhance your experience. For more information, refer to [Global navigation](https://docs.getdbt.com/docs/explore/global-navigation.md).

##### Example of keyword search[​](#example-of-keyword-search "Direct link to Example of keyword search")

Example of results from searching on the keyword `customers` and applying the filters models, description, and code.
[Data health signals](https://docs.getdbt.com/docs/explore/data-health-signals.md) are visible to the right of the model name in the search results. #### Browse with the sidebar[​](#browse-with-the-sidebar "Direct link to Browse with the sidebar") From the sidebar, you can browse your project's resources, its file tree, and the database. * **Resources** tab — All resources in the project organized by type. Select any resource type in the list and all those resources in the project will display as a table in the main section of the page. For a description on the different resource types (like models, metrics, and so on), refer to [About dbt projects](https://docs.getdbt.com/docs/build/projects.md). * [Data health signals](https://docs.getdbt.com/docs/explore/data-health-signals.md) are visible to the right of the resource name under the **Health** column. * **File Tree** tab — All resources in the project organized by the file in which they are defined. This mirrors the file tree in your dbt project repository. * **Database** tab — All resources in the project organized by the database and schema in which they are built. This mirrors your data platform's structure that represents the [applied state](https://docs.getdbt.com/docs/dbt-cloud-apis/project-state.md) of your project. #### Integrated tool access[​](#integrated-tool-access "Direct link to Integrated tool access") Users with a [developer license](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#license-based-access-control) or an analyst seat can open a resource directly from the Catalog in the Studio IDE to view its model files, in Insights to query it, or in Canvas for visual editing. 
#### View model versions[​](#view-model-versions "Direct link to View model versions")

If models in the project are versioned, you can see which [version of the model](https://docs.getdbt.com/docs/mesh/govern/model-versions.md) is being applied — `prerelease`, `latest`, and `old` — in the title of the model's details page and in the model list from the sidebar.

#### View resource details[​](#view-resource-details "Direct link to View resource details")

You can view the definition and latest run results of any resource in your project. To find a resource and view its details, you can interact with the lineage graph, use search, or browse the Catalog. The details (metadata) available to you depend on the resource's type, its definition, and the [commands](https://docs.getdbt.com/docs/deploy/job-commands.md) that run within jobs in the production environment.

In the upper right corner of the resource details page, you can:

* Click the [Open in Studio IDE](#open-in-ide) icon to examine the resource using the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md).
* Click the Share icon to copy the page's link to your clipboard.

 What details are available for a model?

* **Data health signals** — [Data health signals](https://docs.getdbt.com/docs/explore/data-health-signals.md) offer a quick, at-a-glance view of data health. These icons indicate whether a model is Healthy, Caution, Degraded, or Unknown. Hover over an icon to view detailed information about the model's health.
* **Status bar** (below the page title) — Information on the last time the model ran, whether the run was successful, how the data is materialized, number of rows, and the size of the model.
* **General** tab includes:
  * **Lineage** graph — The model's lineage graph that you can interact with. The graph includes one upstream node and one downstream node from the model. Click the Expand icon in the graph's upper right corner to view the model in full lineage graph mode.
  * **Description** section — A [description of the model](https://docs.getdbt.com/docs/build/documentation.md#adding-descriptions-to-your-project).
  * **Recent** section — Information on the last time the model ran, how long it ran for, whether the run was successful, the job ID, and the run ID.
  * **Tests** section — [Data tests](https://docs.getdbt.com/docs/build/data-tests.md) for the model, including a status indicator for the latest test status. A checkmark icon denotes a passing test.
  * **Details** section — Key properties like the model's relation name (for example, how it's represented and how you can query it in the data platform: `database.schema.identifier`); model governance attributes like access, group, and if contracted; and more.
  * **Relationships** section — The nodes the model **Depends On**, is **Referenced by**, and (if applicable) is **Used by** for projects that have declared the models' project as a dependency.
* **Code** tab — The source code and compiled code for the model.
* **Columns** tab — The available columns in the model. This tab also shows test results (if any) that you can select to view the test's details page. A checkmark icon denotes a passing test. To filter the columns in the resource, you can use the search bar that's located at the top of the columns view.

 What details are available for an exposure?

* **Status bar** (below the page title) — Information on the last time the exposure was updated.
* **Data health signals** — [Data health signals](https://docs.getdbt.com/docs/explore/data-health-signals.md) offer a quick, at-a-glance view of data health. These icons indicate whether a resource is Healthy, Caution, or Degraded. Hover over an icon to view detailed information about the exposure's health.
* **General** tab includes:
  * **Status** section — The status on data freshness and data quality.
  * **Lineage** graph — The exposure's lineage graph.
Click the **Expand** icon in the graph's upper right corner to view the exposure in full lineage graph mode. Integrates natively with Tableau and auto-generates downstream lineage. * **Description** section — A description of the exposure. * **Details** section — Details like exposure type, maturity, owner information, and more. * **Relationships** section — The nodes the exposure **Depends On**.  What details are available for a test? * **Status bar** (below the page title) — Information on the last time the test ran, whether the test passed, test name, test target, and column name. Defaults to all if not specified. * **Test Type** (next to the Status bar) — Information on the different test types available: Unit test or Data test. Defaults to all if not specified. When you select a test, the following details are available: * **General** tab includes: * **Lineage** graph — The test's lineage graph that you can interact with. The graph includes one upstream node and one downstream node from the test resource. Click the Expand icon in the graph's upper right corner to view the test in full lineage graph mode. * **Description** section — A description of the test. * **Recent** section — Information on the last time the test ran, how long it ran for, whether the test passed, the job ID, and the run ID. * **Details** section — Details like schema, severity, package, and more. * **Relationships** section — The nodes the test **Depends On**. * **Code** tab — The source code and compiled code for the test. Example of the Tests view:  What details are available for each source table within a source collection? * **Status bar** (below the page title) — Information on the last time the source was updated and the number of tables the source uses. * **Data health signals** — [Data health signals](https://docs.getdbt.com/docs/explore/data-health-signals.md) offer a quick, at-a-glance view of data health. These icons indicate whether a resource is Healthy, Caution, or Degraded. 
Hover over an icon to view detailed information about the source's health.

* **General** tab includes:
  * **Lineage** graph — The source's lineage graph that you can interact with. The graph includes one upstream node and one downstream node from the source. Click the Expand icon in the graph's upper right corner to view the source in full lineage graph mode.
  * **Description** section — A description of the source.
  * **Source freshness** section — Information on whether refreshing the data was successful, the last time the source was loaded, the timestamp of when a run generated data, and the run ID.
  * **Details** section — Details like database, schema, and more.
  * **Relationships** section — A table that lists all the sources used with their freshness status, the timestamp of when freshness was last checked, and the timestamp of when the source was last loaded.
* **Columns** tab — The available columns in the source. This tab also shows test results (if any) that you can select to view the test's details page. A checkmark icon denotes a passing test.

##### Example of model details[​](#example-of-model-details "Direct link to Example of model details")

Example of the details view for the model `customers`:
![Example of resource details](/img/docs/collaborate/dbt-explorer/example-model-details.png "Example of resource details")

![Example of downstream exposure details for Tableau](/img/docs/cloud-integrations/auto-exposures/explorer-lineage2.jpg "Example of downstream exposure details for Tableau")

#### Staging environment[​](#staging-environment "Direct link to Staging environment")

Catalog supports views for [staging deployment environments](https://docs.getdbt.com/docs/deploy/deploy-environments.md#staging-environment), in addition to the production environment. This gives you a unique view into your pre-production data workflows, with the same tools available in production, while providing an extra layer of scrutiny. You can explore the metadata from your production or staging environment to inform your data development lifecycle. Just [set a single environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md) per dbt project as "production" or "staging" and ensure the proper metadata has been generated; you'll then be able to view it in Catalog. Refer to [Generating metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata) for more details.

#### Related content[​](#related-content "Direct link to Related content")

* [Enterprise permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md)
* [About model governance](https://docs.getdbt.com/docs/mesh/govern/about-model-governance.md)
* Blog on [What is data mesh?](https://www.getdbt.com/blog/what-is-data-mesh-the-definition-and-importance-of-data-mesh)
---

### Enhance your code

* [Environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) — Learn how you can use environment variables to customize the behavior of a dbt project.
* [Hooks and operations](https://docs.getdbt.com/docs/build/hooks-operations.md) — Learn how to use hooks to trigger actions and operations to invoke macros.
* [Packages](https://docs.getdbt.com/docs/build/packages.md) — Learn how you can leverage code reuse through packages (libraries).
* [Project variables](https://docs.getdbt.com/docs/build/project-variables.md) — Learn how to use project variables to provide data to models for compilation.
---

### Enhance your models

* [Materializations](https://docs.getdbt.com/docs/build/materializations.md) — Learn how to use materializations to make dbt models persist in a data platform.
* [Incremental models](https://docs.getdbt.com/docs/build/incremental-models.md) — Learn how to use incremental models so you can limit the amount of data that needs to be transformed.
---

### Entities

Entities are real-world concepts in a business, such as customers, transactions, and ad campaigns. We often focus our analyses on specific entities, such as customer churn or annual recurring revenue modeling. In Semantic Layer models, these entities serve as join keys across semantic models. Entities can be specified with a single column or multiple columns.

Entities (join keys) in a semantic model are identified by their name. Each entity name must be unique within a semantic model, but it doesn't have to be unique across different semantic models. There are four entity types:

* [Primary](#primary) — Has only one record for each row in the table and includes every record in the data platform. This key uniquely identifies each record in the table.
* [Unique](#unique) — Contains only one record per row in the table and allows for null values. May have a subset of records in the data warehouse.
* [Foreign](#foreign) — A field (or a set of fields) in one table that uniquely identifies a row in another table. This key establishes a link between tables.
* [Natural](#natural) — Columns or combinations of columns in a table that uniquely identify a record based on real-world data. This key is derived from actual data attributes.

**Use entities as dimensions** — You can also use entities as dimensions, which allows you to aggregate a metric to the granularity of that entity.

#### Entity types[​](#entity-types "Direct link to Entity types")

MetricFlow's join logic depends on the entity `type` you use and determines how to join semantic models.
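The four entity types above might be declared in a semantic model along these lines — a sketch with hypothetical model and column names:

```yaml
semantic_models:
  - name: orders                  # hypothetical semantic model
    model: ref('orders')
    entities:
      - name: order_id            # primary: unique, non-null, one per record
        type: primary
      - name: email               # unique: distinct when present, nulls allowed
        type: unique
      - name: customer_id         # foreign: links to another semantic model
        type: foreign
      - name: sales_person_id     # natural: real-world identifier (SCD type II dimensions only)
        type: natural
```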
Refer to [Joins](https://docs.getdbt.com/docs/build/join-logic.md) for more info on how to construct joins.

##### Primary[​](#primary "Direct link to Primary")

A primary key has *only one* record for each row in the table and includes every record in the data platform. It must contain unique values and can't contain null values. Use the primary key to ensure that each record in the table is distinct and identifiable.

Primary key example: consider a table of employees with the following columns:

```sql
employee_id (primary key)
first_name
last_name
```

In this case, `employee_id` is the primary key. Each `employee_id` is unique and represents one specific employee. There can be no duplicate `employee_id` values, and the column can't contain nulls.

##### Unique[​](#unique "Direct link to Unique")

A unique key contains *only one* record per row in the table but may cover a subset of records in the data warehouse. Unlike the primary key, a unique key allows for null values. The unique key ensures that the column's values are distinct, except for null values.

Unique key example: consider a table of students with the following columns:

```sql
student_id (primary key)
email (unique key)
first_name
last_name
```

In this example, `email` is defined as a unique key. Each email address must be unique; however, multiple students can have null email addresses. This is because the unique key constraint allows for one or more null values, but non-null values must be unique. This creates a set of records with unique (non-null) emails that could be a subset of the entire table, which includes all students.

##### Foreign[​](#foreign "Direct link to Foreign")

A foreign key is a field (or a set of fields) in one table that uniquely identifies a row in another table. The foreign key establishes a link between the data in two tables. It can include zero, one, or multiple instances of the same record. It can also contain null values.
Foreign key example: consider two tables, `customers` and `orders`:

customers table:

```sql
customer_id (primary key)
customer_name
```

orders table:

```sql
order_id (primary key)
order_date
customer_id (foreign key)
```

In this example, the `customer_id` in the `orders` table is a foreign key that references the `customer_id` in the `customers` table. This link means each order is associated with a specific customer. However, not every order must have a customer; the `customer_id` in the `orders` table can be null, and multiple orders can share the same `customer_id`.

##### Natural[​](#natural "Direct link to Natural")

Natural keys are columns or combinations of columns in a table that uniquely identify a record based on real-world data. For instance, if you have a `sales_person_department` dimension table, the `sales_person_id` can serve as a natural key. You can only use natural keys for [SCD type II dimensions](https://docs.getdbt.com/docs/build/dimensions.md#scd-type-ii).

#### Entities configuration[​](#entities-configuration "Direct link to Entities configuration")

Entities are defined in a semantic model's YAML, where each entity has a name, a type, and an optional expression.

#### Combine columns with a key[​](#combine-columns-with-a-key "Direct link to Combine columns with a key")

If a table doesn't have any key (like a primary key), use a *surrogate combination* of two columns to form a key that helps you identify a record. This applies to any [entity type](https://docs.getdbt.com/docs/build/entities.md#entity-types). For example, you can combine `date_key` and `brand_code` from the `raw_brand_target_weekly` table to form a *surrogate key*, joining the two columns with a pipe (`|`) as a separator in the entity's expression.

#### Examples[​](#examples "Direct link to Examples")

As mentioned, entities serve as our join keys, using the unique entity name.
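To illustrate the configuration and surrogate-key approach described above, an entity definition might look like the following — a sketch assuming the `raw_brand_target_weekly` columns mentioned earlier; the entity name is hypothetical:

```yaml
entities:
  - name: brand_week              # hypothetical entity name
    type: primary
    # Surrogate key: combine two columns with a pipe (|) separator
    expr: date_key || '|' || brand_code
```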
Therefore, we can join a single `unique` key to multiple `foreign` keys. Consider a `date_categories` table with the following columns:

```sql
date_id (primary key)
date_day (unique key)
fiscal_year_name
```

And an `orders` table with the following columns:

```sql
order_id (primary key)
ordered_at
delivered_at
order_total
```

We can define our Semantic Layer YAML so that we can query `order_total` by `ordered_at` `fiscal_year_name` and by `delivered_at` `fiscal_year_name`. With this configuration, our semantic models can join on `ordered_at = date_day` via the `ordered_at_entity`, and on `delivered_at = date_day` via the `delivered_at_entity`. To validate our output, we can run:

* `dbt sl query --metrics order_total --group-by ordered_at_entity__fiscal_year_name` or
* `dbt sl query --metrics order_total --group-by delivered_at_entity__fiscal_year_name`

---

### Environment variables

Environment variables can be used to customize the behavior of a dbt project depending on where the project is running. See the docs on [env\_var](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md) for more information on how to call the Jinja function `{{env_var('DBT_KEY','OPTIONAL_DEFAULT')}}` in your project code.

**Environment variable naming and prefixing** — Environment variables in dbt must be prefixed with `DBT_`, `DBT_ENV_SECRET_`, or `DBT_ENV_CUSTOM_ENV_`. Environment variable keys are uppercased and case sensitive. When referencing `{{env_var('DBT_KEY')}}` in your project's code, the key must exactly match the variable defined in dbt's UI.
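As an illustration of the naming rules above, a model can read a correctly prefixed variable and fall back to a default when it isn't set — a sketch with a hypothetical variable name and default:

```sql
-- models/region_example.sql (hypothetical model)
-- DBT_ENV_CUSTOM_ENV_REGION and the "us-east-1" default are illustrative
select '{{ env_var("DBT_ENV_CUSTOM_ENV_REGION", "us-east-1") }}' as deployment_region
```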
##### Setting and overriding environment variables[​](#setting-and-overriding-environment-variables "Direct link to Setting and overriding environment variables")

This section explains how to set and override environment variables in dbt.

* [Order of precedence](#order-of-precedence)
* [Setting environment variables](#setting-environment-variables)
* [Overriding environment variables at the job level](#overriding-environment-variables-at-the-job-level)
* [Overriding environment variables at the personal level](#overriding-environment-variables-at-the-personal-level)
* [Local environment variables](#local-environment-variables)

###### Order of precedence[​](#order-of-precedence "Direct link to Order of precedence")

Environment variable values can be set in multiple places within dbt. As a result, dbt interprets environment variables according to the following order of precedence (lowest to highest). There are four levels of environment variables:

1. The optional default argument supplied to the `env_var` Jinja function in code (*lowest precedence*), which can be overridden at
2. the project-wide level by its default value, which can be overridden at
3. the environment level, which can in turn be overridden again at
4. the job level (job override) or in the Studio IDE for an individual developer (personal override) (*highest precedence*).

###### Setting environment variables[​](#setting-environment-variables "Direct link to Setting environment variables")

To set environment variables at the project and environment level, click **Orchestration** in the left-side menu, then select **Environments**. Click **Environment variables** to add and update your environment variables.

You'll notice there is a **Project default** column.
This is a great place to set a value that will persist across your whole project, independent of where the code is run. We recommend setting this value when you want to supply a catch-all default or add a project-wide token or secret.

To the right of the **Project default** column are all your environments. Values set at the environment level take priority over the project-level default value. This is where you can tell dbt to interpret an environment value differently in your Staging vs. Production environment, for example.

###### Overriding environment variables at the job level[​](#overriding-environment-variables-at-the-job-level "Direct link to Overriding environment variables at the job level")

You may have multiple jobs that run in the same environment, and you'd like the environment variable to be interpreted differently depending on the job. When setting up or editing a job, you will see a section where you can override environment variable values defined at the environment or project level.

Every job runs in a specific deployment environment, and by default, a job will inherit the values set at the environment level (or the highest precedence level set) for the environment in which it runs. If you'd like to set a different value at the job level, edit the value to override it.
###### Overriding environment variables at the personal level[​](#overriding-environment-variables-at-the-personal-level "Direct link to Overriding environment variables at the personal level")

You can also set a personal override for an environment variable when you develop in the dbt-integrated developer environment (Studio IDE). By default, dbt uses the environment variable values set in the project's development environment. To see and override these values, from dbt:

* Click on your account name in the left-side menu and select **Account settings**.
* Under the **Your profile** section, click **Credentials** and then select your project.
* Scroll to the **Environment variables** section and click **Edit** to make the necessary changes.

To supply an override, developers can edit and specify a different value to use. These values will be respected in the Studio IDE, in both the Results and Compiled SQL tabs.

**Appropriate coverage** — If you have not set a project-level default value for every environment variable, dbt may not know how to interpret the value of an environment variable in all contexts. In such cases, dbt will throw a compilation error: "Env var required but not provided".

**Changing environment variables mid-session in the Studio IDE** — If you change the value of an environment variable mid-session while using the Studio IDE, you may have to refresh the Studio IDE for the change to take effect.
To refresh the Studio IDE mid-development, click either the green 'ready' signal or the red 'compilation error' message at the bottom right corner of the Studio IDE. A modal will pop up; select the **Restart IDE** button. This will load your environment variable values into your development environment.

There are some known issues with partial parsing of a project and changing environment variables mid-session in the IDE. If you find that your dbt project is not compiling to the values you've set, try deleting the `target/partial_parse.msgpack` file in your dbt project, which will force dbt to re-compile your whole project.

###### Local environment variables[​](#local-environment-variables "Direct link to Local environment variables")

If you are using the dbt VS Code extension, you can set environment variables locally in your shell profile (`~/.zshrc` or `~/.bashrc`) or in a `.env` file at the root level of your dbt project. See the [Configure the dbt VS Code extension](https://docs.getdbt.com/docs/configure-dbt-extension.md#set-environment-variables-locally) page for more information.

##### Handling secrets[​](#handling-secrets "Direct link to Handling secrets")

While all environment variables are encrypted at rest in dbt, dbt has additional capabilities for managing environment variables with secret or otherwise sensitive values. If you want a particular environment variable to be scrubbed from all logs and error messages, in addition to obfuscating the value in dbt, you can prefix the key with `DBT_ENV_SECRET_`. Environment variables prefixed with `DBT_ENV_SECRET_` are protected with additional security controls. They are encrypted at rest using an encryption key (for example, AWS KMS when your deployment is hosted on AWS) and can only be accessed by decrypting them with that key.
Decryption is restricted to specific flows where the value is required, such as when a job runs. Secret keys are never written to logs or error messages and are obfuscated in dbt, so they are not exposed in the UI or artifacts, and are only available to dbt at runtime as needed.

**Note**: An environment variable can be used to store a [git token for repo cloning](https://docs.getdbt.com/docs/build/environment-variables.md#clone-private-packages). We recommend you make the git token's permissions read-only and consider using a machine account or service user's PAT with limited repo access in order to practice good security hygiene.

##### Special environment variables[​](#special-environment-variables "Direct link to Special environment variables")

dbt has a number of pre-defined environment variables built in. These variables are set automatically and cannot be changed, so the order of precedence for overriding environment variables at the project, environment, or job level doesn't apply to them.

###### Studio IDE details[​](#studio-ide-details "Direct link to Studio IDE details")

The following environment variable is set automatically for the Studio IDE:

* `DBT_CLOUD_GIT_BRANCH` — Provides the development Git branch name in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md).
  * The variable changes when the branch is changed.
  * Doesn't require restarting the Studio IDE after a branch change.
  * Currently not available in the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md).

Use case — This is useful when you want to dynamically use the Git branch name as a prefix for a [development schema](https://docs.getdbt.com/docs/build/custom-schemas.md) (`{{ env_var('DBT_CLOUD_GIT_BRANCH') }}`).
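One way to apply this is in a custom `generate_schema_name` macro — a sketch assuming you want branch-prefixed development schemas; the macro name and signature are dbt's standard override point, while the prefixing logic is illustrative:

```sql
-- macros/generate_schema_name.sql (illustrative sketch)
{% macro generate_schema_name(custom_schema_name, node) -%}
    {#- Fall back to an empty string outside the Studio IDE -#}
    {%- set branch = env_var('DBT_CLOUD_GIT_BRANCH', '') | replace('-', '_') -%}
    {%- if target.name == 'dev' and branch != '' -%}
        {#- Prefix development schemas with the current Git branch name -#}
        {{ branch }}_{{ target.schema }}
    {%- elif custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ target.schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```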
###### dbt platform context[​](#dbt-platform-context "Direct link to dbt platform context")

The following environment variables are set automatically:

* `DBT_ENV` — This key is reserved for the dbt application and will always resolve to 'prod'. For deployment runs only.
* `DBT_CLOUD_ENVIRONMENT_NAME` — The name of the dbt environment in which `dbt` is running.
* `DBT_CLOUD_ENVIRONMENT_TYPE` — The type of dbt environment in which `dbt` is running. The valid values are `dev`, `staging`, or `prod`. The value will be empty for [General deployment environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md#types-of-environments), so use a default like `{{ env_var('DBT_CLOUD_ENVIRONMENT_TYPE', '') }}`.
* `DBT_CLOUD_INVOCATION_CONTEXT` — The context type in which `dbt` is invoked. The values are `dev`, `staging`, `prod`, or `ci`. Additionally, use `DBT_CLOUD_INVOCATION_CONTEXT` in the `generate_schema_name()` macro to define explicit guidelines to use the default schema only (with the `dbt_cloud_pr_` prefix) in CI job runs, even if those CI jobs run in the same environment as production jobs.

###### Run details[​](#run-details "Direct link to Run details")

* `DBT_CLOUD_PROJECT_ID` — The ID of the dbt project for this run
* `DBT_CLOUD_JOB_ID` — The ID of the dbt job for this run
* `DBT_CLOUD_RUN_ID` — The ID of this particular run
* `DBT_CLOUD_RUN_REASON_CATEGORY` — The category of the trigger for this run (one of: `scheduled`, `github_pull_request`, `gitlab_merge_request`, `azure_pull_request`, `other`)
* `DBT_CLOUD_RUN_REASON` — The specific trigger for this run (e.g.
`Scheduled`, `Kicked off by `, or custom via `API`)
* `DBT_CLOUD_ENVIRONMENT_ID` — The ID of the environment for this run
* `DBT_CLOUD_ACCOUNT_ID` — The ID of the dbt account for this run

###### Git details[​](#git-details "Direct link to Git details")

*The following variables are currently only available for GitHub, GitLab, and Azure DevOps PR builds triggered via a webhook.*

* `DBT_CLOUD_PR_ID` — The pull request ID in the connected version control system
* `DBT_CLOUD_GIT_SHA` — The git commit SHA being run for this pull request build

##### Example usage[​](#example-usage "Direct link to Example usage")

Environment variables can be used in many ways, and they give you the power and flexibility to do what you want more easily in dbt.

**Clone private packages** — Now that you can set secrets as environment variables, you can pass git tokens into your package HTTPS URLs to allow for on-the-fly cloning of private repositories. Read more about enabling [private package cloning](https://docs.getdbt.com/docs/build/packages.md#private-packages).

**Dynamically set your warehouse in your Snowflake connection** — Environment variables make it possible to dynamically change the Snowflake virtual warehouse size depending on the job. Instead of hardcoding the warehouse name in your project connection, you can reference an environment variable that gets set to a specific virtual warehouse at runtime.

For example, suppose you'd like to run a full-refresh job in an XL warehouse, but your incremental job only needs to run in a medium-sized warehouse. Both jobs are configured in the same dbt environment. In your connection configuration, you can use an environment variable to set the warehouse name to `{{env_var('DBT_WAREHOUSE')}}`. Then in the job settings, you can set a different value for the `DBT_WAREHOUSE` environment variable depending on the job's workload.

Currently, it's not possible to dynamically set environment variables across models within a single run.
This is because each `env_var` can only have a single set value for the entire duration of the run. **Note** — You can also use this method with Databricks SQL warehouses.

**Environment variables and Snowflake OAuth limitations** — Environment variables work fine with username/password and keypair authentication, including scheduled jobs, because dbt Core consumes the Jinja inserted into the autogenerated `profiles.yml` and resolves it to do an `env_var` lookup. However, there are some limitations when using environment variables with Snowflake OAuth connection settings: you can't use them in the account/host field, but you can use them for database, warehouse, and role. For the account/host field, [use extended attributes](https://docs.getdbt.com/docs/deploy/deploy-environments.md#deployment-connection) instead. If you supply an environment variable in the account/host field, the Snowflake OAuth connection will **fail** to connect. This happens because the field doesn't pass through Jinja rendering, so dbt passes the literal `env_var` code into a URL string like `{{ env_var("DBT_ACCOUNT_HOST_NAME") }}.snowflakecomputing.com`, which is an invalid hostname.

**Audit your run metadata** — Here's another motivating example that uses the dbt run ID, which is set automatically for each run.
This additional data field can be used for auditing and debugging:

```sql
{{ config(materialized='incremental', unique_key='user_id') }}

with users_aggregated as (

    select
        user_id,
        min(event_time) as first_event_time,
        max(event_time) as last_event_time,
        count(*) as count_total_events
    from {{ ref('users') }}
    group by 1

)

select
    *,
    -- Inject the run ID if present, otherwise use "manual"
    '{{ env_var("DBT_CLOUD_RUN_ID", "manual") }}' as _audit_run_id
from users_aggregated
```

**Configure Semantic Layer credentials** — Use [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) and [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) when connecting to the Semantic Layer. If you set a value directly in the Semantic Layer credentials, it has a higher priority than extended attributes. When using environment variables, the default value for the environment is used. For example, set the warehouse with `{{env_var('DBT_WAREHOUSE')}}` in your Semantic Layer credentials. Similarly, if you set the account value using `{{env_var('DBT_ACCOUNT')}}` in extended attributes, dbt will check both the extended attributes and the environment variable.
---

### Explore cost data

[Private beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

You can access Cost Insights in these different dbt platform areas:

* [Project dashboard](#project-dashboard)
* [Catalog on the Model page](#model-performance-in-catalog)
* [Job details page](#job-details)

Each view provides a different level of detail to help you understand your warehouse spending and optimization impact. Cost and cost-reduction estimates are based on historical runs and reflect actual usage, *not* forecasts of future costs.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

To view cost data, ensure you have:

* A dbt account with the dbt Fusion engine enabled. Contact your account manager to enable Fusion for your account.
* One of the roles listed in [Assign required permissions](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md#assign-required-permissions).
* A supported data warehouse: Snowflake, BigQuery, or Databricks.

For more information, see [Set up Cost Insights](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md).

**Note** — For accounts already using state-aware orchestration before Cost Insights is enabled, at least one full model build must occur within the last 10 days to establish a baseline for cost-reduction calculations. If you don't see cost-reduction data, try running a full build to establish the baseline.

#### Project dashboard[​](#project-dashboard "Direct link to Project dashboard")

The Cost Insights section in your project dashboard gives you a high-level view of warehouse costs and the impact of optimization through state-aware orchestration.
##### Access[​](#access "Direct link to Access")

To go to your project dashboard, select your project in the main menu and click **Dashboard**.

##### Key metrics[​](#key-metrics "Direct link to Key metrics")

The project dashboard displays the following metrics that summarize the overall cost and optimization impact for your project:

* **Total cost reduction**
* **Total % reduction**
* **Total query run time reduction**
* **Reused assets**

##### Filters[​](#filters "Direct link to Filters")

You can customize the cost data you want to view by:

* **Deployment type**: Production or Staging
* **Last**: 30 days, 60 days, 90 days, 6 months, or 1 year
* **View**: Daily, Weekly, or Monthly

##### Visualization tabs[​](#visualization-tabs "Direct link to Visualization tabs")

The project dashboard includes the following tabs that help you analyze cost and optimization trends over time:

* **Cost**: Shows the estimated build cost reduction when using state-aware orchestration.
* **Query run time**: Shows the estimated reduction in build time when using state-aware orchestration.
* **Model builds**: Shows the number of models built versus models reused by state-aware orchestration.
* **Usage**: Shows the estimated warehouse usage consumed and the reduction in usage from state-aware orchestration over the selected timeframe.

The **Usage** tab represents generic usage for your warehouse. The specific unit depends on your data warehouse:

* Snowflake: Credits
* BigQuery: Slot hours or bytes scanned (currently combined into one generic usage number)
* Databricks: Databricks Units (DBUs)

##### Table view[​](#table-view "Direct link to Table view")

Access the table view by clicking **Show table**, which provides detailed optimization data such as models reused, usage reduction, and cost reduction. When viewing the table, you can export the data as a CSV file using the **Download** button.
#### Model performance in Catalog[​](#model-performance-in-catalog "Direct link to Model performance in Catalog")

The **Model performance** section in Catalog displays historical trends to help you identify optimization opportunities and understand model resource consumption.

##### Access[​](#access-1 "Direct link to Access")

To access model performance data:

1. From the main menu, go to **Catalog**.
2. Click your project from the file tree.
3. Navigate to the model whose cost data you want to view. You can search for it or click **Models** under **Project assets** in the sidebar to view all available models in the project.
4. Go to the **Performance** tab on the model's details page.

##### Key metrics[​](#key-metrics "Direct link to Key metrics")

The **Model performance** section displays the following metrics that summarize the overall cost and optimization impact for the model:

* **Total cost reduction**
* **Total % reduction**
* **Total query run time reduction**
* **Reused assets** (when state-aware orchestration is enabled)

##### Filters[​](#filters "Direct link to Filters")

Use the time period filter to customize the data you want to view, from the last week up to the last 3 months. For the **Cost insights**, **Usage**, and **Query run time** tabs, you can set the view granularity to **Daily**, **Weekly**, or **Monthly**.

##### Visualization tabs[​](#visualization-tabs "Direct link to Visualization tabs")

* **Cost insights**: Shows the estimated warehouse costs incurred by this model and the cost reduction from state-aware orchestration.
* **Usage**: Shows the estimated warehouse usage consumed by this model over time. The **Usage** tab represents generic usage for your warehouse.
The specific unit depends on your data warehouse: * Snowflake: Credits * BigQuery: Slot hours or bytes scanned (currently combined into one generic usage number) * Databricks: Databricks Units (DBUs) * **Query run time**: Shows the estimated query execution time and the reduction in run duration from state-aware orchestration. * **Build time**: Shows average execution time for the model and how it trends over the selected period. * **Build count**: Tracks how many times the model was built or reused, including any failures or errors. * **Test results**: Displays test execution outcomes and pass/fail rates for tests on this model. * **Consumption queries**: Shows queries running against this model, helping you understand downstream usage patterns. ##### Table view[​](#table-view "Direct link to Table view") For **Cost insights**, **Usage**, and **Query run time** tabs, you can access the table view by clicking **Show table**, which provides detailed optimization data such as models reused, usage reduction, and cost reduction. When viewing the table, you can export the data as a CSV file using the **Download** button. ##### Chart interactions[​](#chart-interactions "Direct link to Chart interactions") For **Build time** and **Build count** tabs: * Click on any data point in the charts to see a detailed table listing all job runs for that day. * Each row in the table provides a direct link to the run details if you want to investigate further. #### Job details[​](#job-details "Direct link to Job details") The **Insights** section on the Job details page provides cost and performance data for individual jobs. ##### Access[​](#access-2 "Direct link to Access") To access job details, select your project in the main menu and go to **Orchestration** > **Jobs**. Select the job whose cost data you want to view. 
##### Filters[​](#filters-1 "Direct link to Filters") For **Run duration**, **Cost**, **Model builds**, and **Usage** tabs, you can customize the cost data you want to view by: * **Last**: 30 days, 60 days, 90 days, 6 months, or 1 year * **View**: Daily, Weekly, Monthly ##### Visualization tabs[​](#visualization-tabs-1 "Direct link to Visualization tabs") * **Runs**: Displays the success rate and run duration in minutes for recent runs. You can select a time period with options for **Last week**, **Last 14 days**, and **Last 30 days**. * **Query run time**: Shows the estimated query execution time and the reduction in run duration from state-aware orchestration. * **Cost**: Shows the estimated build cost reduction when using state-aware orchestration. * **Model builds**: Shows the number of models built versus models reused by state-aware orchestration. * **Usage**: Shows the estimated warehouse usage consumed and the reduction in usage from state-aware orchestration over the selected timeframe. The **Usage** tab represents generic usage for your warehouse. The specific unit depends on your data warehouse: * Snowflake: Credits * BigQuery: Slot hours or bytes scanned (currently combined into one generic usage number) * Databricks: Databricks Units (DBUs) ##### Table view[​](#table-view-1 "Direct link to Table view") For **Run duration**, **Cost**, **Model builds**, and **Usage** tabs, you can access the table view by clicking **Show table**, which provides detailed optimization data such as models reused, usage reduction, and cost reduction. When viewing the table, you can export the data as a CSV file using the **Download** button.
--- ### Explore multiple projects View all the projects and public models in your account (where public models are defined) and gain a better understanding of your cross-project resources and how they're used. On-demand learning If you enjoy video courses, check out our [dbt Catalog on-demand course](https://learn.getdbt.com/courses/dbt-catalog) and learn how to best explore your dbt project(s)! The resource-level lineage graph for a project displays the cross-project relationships in the DAG, with a **PRJ** icon indicating whether or not it's a project resource. That icon is located to the left side of the node name. To view the project-level lineage graph, click the **View lineage** icon in the upper right corner from the main overview page: * This view displays all the projects in your account and their relationships. * Viewing an upstream (parent) project displays the downstream (child) projects that depend on it. * Selecting a model reveals its dependent projects in the lineage. * Click on an upstream (parent) project to view the other projects that reference it in the **Relationships** tab, showing the number of downstream (child) projects that depend on them. * This includes all projects listing the upstream one as a dependency in its `dependencies.yml` file, even without a direct `{{ ref() }}`. * Selecting a project node from a public model opens its detailed lineage graph if you have the [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to do so.
Indirect dependencies

When viewing a project's lineage, Catalog shows only *directly* [referenced](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md) public models. It doesn't show [indirect dependencies](https://docs.getdbt.com/faqs/Project_ref/indirectly-reference-upstream-model.md). If a referenced model in your project depends on another upstream public model, the second-level model won't appear in Catalog; however, it will appear in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) lineage view.

[![View your cross-project lineage in a parent project and the other projects that reference it by clicking the 'Relationships' tab.](/img/docs/collaborate/dbt-explorer/cross-project-lineage-parent.png?v=2 "View your cross-project lineage in a parent project and the other projects that reference it by clicking the 'Relationships' tab.")](#)View your cross-project lineage in a parent project and the other projects that reference it by clicking the 'Relationships' tab.

When viewing a downstream (child) project that imports and refs public models from upstream (parent) projects:

* Public models will show up in the lineage graph and you can click on them to view the model details.
* Clicking on a model opens a side panel containing general information about the model, such as the specific dbt project that produces that model, description, package, and more.
* Double-clicking on a model from another project opens the resource-level lineage graph of the parent project, if you have the permissions to do so.

[![View a downstream (child) project that imports and refs public models from the upstream (parent) project.](/img/docs/collaborate/dbt-explorer/cross-project-child.png?v=2 "View a downstream (child) project that imports and refs public models from the upstream (parent) project.")](#)View a downstream (child) project that imports and refs public models from the upstream (parent) project.
#### Explore the project-level lineage graph[​](#explore-the-project-level-lineage-graph "Direct link to Explore the project-level lineage graph")

For cross-project collaboration, you can interact with the DAG in all the same ways as described in [Explore your project's lineage](https://docs.getdbt.com/docs/explore/explore-projects.md#project-lineage), but you can also interact with it at the project level and view the details.

If you have permissions for a project in the account, you can view all public models used across the entire account. However, you can only view full public model details and private models if you have permissions for the specific project where those models are defined.

To view all the projects in your account (displayed as a lineage graph or list view):

* Navigate to the left section of the **Catalog** page, near the navigation.
* Hover over the project name and select the account name. This takes you to an account-level lineage graph page, where you can view all the projects in the account, including dependencies and relationships between different projects.
* Click the **List view** icon in the page's upper right corner to see a list view of all the projects in the account.
* The list view page displays a public model list, project list, and a search bar for project searches.
* Click the **Lineage view** icon in the page's upper right corner to view the account-level lineage graph.

[![View a downstream (child) project, which imports and refs public models from upstream (parent) projects.](/img/docs/collaborate/dbt-explorer/account-level-lineage.gif?v=2 "View a downstream (child) project, which imports and refs public models from upstream (parent) projects.")](#)View a downstream (child) project, which imports and refs public models from upstream (parent) projects.

From the account-level lineage graph, you can:

* Click the **Lineage view** icon (in the graph’s upper right corner) to view the cross-project lineage graph.
* Click the **List view** icon (in the graph’s upper right corner) to view the project list.
* Select a project from the **Projects** tab to switch to that project’s main **Explore** page.
* Select a model from the **Public Models** tab to view the [model’s details page](https://docs.getdbt.com/docs/explore/explore-projects.md#view-resource-details).
* Perform searches on your projects with the search bar.
* Select a project node in the graph (double-clicking) to switch to that particular project’s lineage graph.

When you select a project node in the graph, a project details panel opens on the graph’s right-hand side where you can:

* View counts of the resources defined in the project.
* View a list of its public models, if any.
* View a list of other projects that use the project, if any.
* Click **Open Project Lineage** to switch to the project’s lineage graph.
* Click the **Share** icon to copy the project panel link to your clipboard so you can share the graph with someone.

[![Select a downstream (child) project to open the project details panel for resource counts, public models associated, and more. ](/img/docs/collaborate/dbt-explorer/multi-project-overview.gif?v=2 "Select a downstream (child) project to open the project details panel for resource counts, public models associated, and more. ")](#)Select a downstream (child) project to open the project details panel for resource counts, public models associated, and more.

---

### Explore your data

dbt provides a variety of tools for you to explore your data, models, and other resources.
Many of the features you'd traditionally use your data warehouse services to explore are at your fingertips in your dbt account. [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/explore/cost-insights.md) ###### [Cost Insights](https://docs.getdbt.com/docs/explore/cost-insights.md) [Track warehouse compute costs and see realized savings from state-aware orchestration across your dbt projects and models.](https://docs.getdbt.com/docs/explore/cost-insights.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/explore/explore-projects.md) ###### [dbt Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) [Interact with dbt Catalog to understand, improve, and leverage your dbt projects.](https://docs.getdbt.com/docs/explore/explore-projects.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/explore/dbt-insights.md) ###### [dbt Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md) [Query data and perform exploratory data analysis using dbt Insights.](https://docs.getdbt.com/docs/explore/dbt-insights.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) ###### [Documentation](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) [Document your dbt projects so stakeholders, engineers, and analysts can understand your resources and lineage from start to finish.](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md)
Some features are only available on [selected plans](https://www.getdbt.com/pricing/).

#### Related docs[​](#related-docs "Direct link to Related docs")

* [dbt plans and pricing](https://www.getdbt.com/pricing/)
* [Quickstart guides](https://docs.getdbt.com/docs/get-started-dbt.md)
* [Reference material](https://docs.getdbt.com/reference/references-overview.md)

---

### External metadata ingestion

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

With external metadata ingestion, you can connect directly to your data warehouse, giving you visibility in Catalog into tables, views, and other resources that aren't defined in dbt.

External metadata ingestion support

Currently, external metadata ingestion is supported for Snowflake only.

External metadata credentials enable ingestion of metadata that exists *outside* your dbt runs, such as tables, views, or cost information, typically at a higher level than what dbt environments access. This is useful for enriching Catalog with warehouse-native insights (for example, Snowflake views or access patterns) and creating a unified discovery experience. These credentials are configured separately from dbt environment credentials and are scoped at the account level, not the project level.
#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* Have a dbt account on the [Enterprise or Enterprise+](https://www.getdbt.com/pricing) plan.
* You must be an [account admin with permission](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#account-admin) to edit connections.
* The credentials must have [sufficient read-level access to fetch metadata](https://docs.getdbt.com/docs/explore/external-metadata-ingestion.md#configuration-instructions).
* Have [**global navigation**](https://docs.getdbt.com/docs/explore/explore-projects.md#catalog-overview) enabled.
* Use Snowflake as your data platform. Stay tuned: support for other adapters is coming soon.

#### Configuration instructions[​](#configuration-instructions "Direct link to Configuration instructions")

##### Enable external metadata ingestion[​](#enable-external-metadata-ingestion "Direct link to Enable external metadata ingestion")

1. Click your account name at the bottom of the left-side menu and click **[Account settings](https://docs.getdbt.com/docs/cloud/account-settings.md)**.
2. Under Account information, go to **Settings** and click **Edit** at the top right corner of the page.
3. Select the **Ingest external metadata in dbt Catalog (formerly dbt Explorer)** option (if not already enabled).

##### Configure the warehouse connection[​](#configure-the-warehouse-connection "Direct link to Configure the warehouse connection")

1. Go to **Account settings**.
2. Click **Connections** from the left-hand side panel.
3. Select an existing connection or create a [**New connection**](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-snowflake.md) where you want to ingest metadata from.
4. Scroll to the bottom of the page and click **Add credentials** in **Platform metadata credentials**.
   * Enter the necessary credentials. These should have warehouse-level visibility across relevant databases and schemas.
   * If you have multiple connections that reference the same account identifier, you will only be prompted to add platform metadata credentials to one of them. Other connections using the same account identifier will display a message indicating that platform metadata credentials are already configured.
5. Select the **External metadata ingestion** option.
   * This allows metadata from this connection to populate the Catalog.
   * *Optional*: Enable additional features such as **cost optimization** in the **Features** section under **Platform metadata credentials**.
6. Under **Catalog filters**, apply filters to restrict which metadata is ingested:
   * You can filter by **database**, **schema**, **table**, or **view**.
   * **Note:** To include all databases or schemas, enter `.*` in the **Allow** field.
   * It is strongly recommended to filter by certain schemas. See [Important considerations](https://docs.getdbt.com/docs/explore/external-metadata-ingestion.md#important-considerations) for more information.
   * These fields accept CSV-formatted regular expressions:
     * Example: `DIM` matches `DIM_ORDERS` and `DIMENSION_TABLE` (basic "contains" match).
     * Wildcards are supported. For example: `DIM*` matches `DIM_ORDERS` and `DIM_PRODUCTS`.

#### Required credentials[​](#required-credentials "Direct link to Required credentials")

This section sets up the foundational access for dbt in Snowflake. It creates a role (`dbt_metadata_role`) with minimal permissions and a user (`dbt_metadata_user`) dedicated to dbt’s metadata access. This ensures a clear, controlled separation of access: dbt can read metadata for profiling, documentation, and lineage, without the ability to modify data or manage resources.

1. Create the role:

   ```sql
   CREATE OR REPLACE ROLE dbt_metadata_role;
   ```

2.
Grant access to a warehouse to run queries to view metadata:

   ```sql
   GRANT USAGE ON WAREHOUSE "<warehouse_name>" TO ROLE dbt_metadata_role;
   ```

   If your warehouse needs to be restarted for metadata ingestion (it doesn't have auto-resume enabled), you may need to grant `OPERATE` permissions to the role as well.

   If you do not already have a user, create a dbt-specific user for metadata access. Replace `<your-password>` with a strong password and `<warehouse_name>` with the warehouse name used above:

   ```sql
   CREATE USER dbt_metadata_user
     DISPLAY_NAME = 'dbt Metadata Integration'
     PASSWORD = '<your-password>'
     DEFAULT_ROLE = dbt_metadata_role
     TYPE = 'LEGACY_SERVICE'
     DEFAULT_WAREHOUSE = '<warehouse_name>';
   ```

3. Grant the role to the user:

   ```sql
   GRANT ROLE dbt_metadata_role TO USER dbt_metadata_user;
   ```

Note: Use read-only service accounts for least privilege and better auditing.

#### Assign metadata access privileges[​](#assign-metadata-access-privileges "Direct link to Assign metadata access privileges")

This section outlines the minimum necessary privileges to read metadata from each required Snowflake database. It provides access to schemas, tables, views, and lineage information, ensuring dbt can profile and document your data while preventing any modifications.

Replace `your-database` with the name of a Snowflake database to grant metadata access.
Repeat this block for each relevant database:

```sql
SET db_var = '"your-database"';

-- Grant access to view the database and its schemas
GRANT USAGE ON DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;

-- Grant REFERENCES to enable lineage and dependency analysis
GRANT REFERENCES ON ALL TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT REFERENCES ON FUTURE TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT REFERENCES ON ALL EXTERNAL TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT REFERENCES ON FUTURE EXTERNAL TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT REFERENCES ON ALL VIEWS IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT REFERENCES ON FUTURE VIEWS IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;

-- Recommended: grant SELECT privileges to enable metadata introspection and profiling
GRANT SELECT ON ALL TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT SELECT ON FUTURE TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT SELECT ON ALL EXTERNAL TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT SELECT ON FUTURE EXTERNAL TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT SELECT ON ALL VIEWS IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT SELECT ON FUTURE VIEWS IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT SELECT ON ALL DYNAMIC TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT SELECT ON FUTURE DYNAMIC TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;

-- Grant MONITOR on dynamic tables (e.g., for freshness or status checks)
GRANT MONITOR ON ALL DYNAMIC TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
GRANT MONITOR ON FUTURE DYNAMIC TABLES IN DATABASE IDENTIFIER($db_var) TO ROLE dbt_metadata_role;
```

#### Grant access to Snowflake metadata[​](#grant-access-to-snowflake-metadata "Direct link to Grant access to Snowflake metadata")

This step grants the dbt role (`dbt_metadata_role`) access to Snowflake’s system-level database, enabling it to read usage statistics, query histories, and lineage information required for comprehensive metadata insights:

```sql
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE dbt_metadata_role;
```

#### Important considerations[​](#important-considerations "Direct link to Important considerations")

The following are best practices for external metadata ingestion, designed to ensure consistent, reliable, and scalable integration of metadata from third-party systems.

* Catalog unifies the shared resources between dbt and Snowflake. For example, if there’s a Snowflake table that represents a dbt model, these are represented as a single resource in Catalog. For proper unification to occur, the same connection must be used by both the [production environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#set-as-production-environment) and the external metadata ingestion credential.
* Avoid duplicates: Use one metadata connection per platform if possible (for example, one for Snowflake, one for BigQuery). Having multiple connections pointing to the same warehouse can cause duplicate metadata.
* Align with dbt environment: To unify asset lineage and metadata, ensure the same warehouse connection is used by both the dbt environment and the external metadata ingestion.
* Use filters to limit ingestion to relevant assets: for example, restrict to production schemas only, or ignore transient/temp schemas.

External metadata ingestion runs daily at 5 PM UTC, and also runs immediately each time you update and save credentials.
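The **Catalog filters** described earlier accept CSV-formatted regular expressions with "contains" semantics: `DIM` matches both `DIM_ORDERS` and `DIMENSION_TABLE`, and `.*` allows everything. The sketch below illustrates that matching behavior in plain Python. `apply_catalog_filters` is a hypothetical helper for illustration only, not a dbt API, and the exact matching rules dbt applies (such as case sensitivity) may differ.

```python
import re

def apply_catalog_filters(names, allow_patterns):
    """Keep only names matched by at least one allow pattern.

    Uses re.search for "contains" semantics, mirroring the documented
    behavior where the pattern DIM matches both DIM_ORDERS and
    DIMENSION_TABLE, and .* allows everything.
    """
    compiled = [re.compile(p) for p in allow_patterns]
    return [n for n in names if any(rx.search(n) for rx in compiled)]

schemas = ["ANALYTICS", "DIM_ORDERS", "DIMENSION_TABLE", "TMP_SCRATCH"]

# A CSV-formatted allow list such as "DIM,ANALYTICS" splits into two patterns.
allow = "DIM,ANALYTICS".split(",")
print(apply_catalog_filters(schemas, allow))  # ['ANALYTICS', 'DIM_ORDERS', 'DIMENSION_TABLE']
```

Note how `TMP_SCRATCH` is excluded, which is the effect the "Important considerations" section recommends: restrict ingestion to production schemas and ignore transient/temp schemas.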

---

### Fill null values for metrics

Understanding and implementing strategies to fill null values in metrics is key for accurate analytics. This guide explains `fill_nulls_with` and `join_to_timespine` to ensure data completeness, helping end users make more informed decisions and enhancing your dbt workflows.

##### About null values[​](#about-null-values "Direct link to About null values")

You can use `fill_nulls_with` to replace null values in metrics with a value like zero (or your chosen integer). This ensures every data row shows a numeric value.

This guide explains how to ensure there are no null values in your metrics:

* Use `fill_nulls_with` for `simple`, `cumulative`, and `conversion` metrics.
* Use `join_to_timespine` and `fill_nulls_with` together for derived and ratio metrics to avoid null values appearing.

##### Fill null values for simple metrics[​](#fill-null-values-for-simple-metrics "Direct link to Fill null values for simple metrics")

For example, if you'd like to handle days with site visits but no leads, you can use `fill_nulls_with` to set the value for leads to zero on days when there are no conversions.

Let's say you have three metrics:

* `website_visits` and `leads`
* and a derived metric called `leads_to_website_visit` that calculates the ratio of leads to site visits.

The `website_visits` and `leads` metrics have the following data:

| metric\_time | website\_visits |
| ------------ | --------------- |
| 2024-01-01   | 50              |
| 2024-01-02   | 37              |
| 2024-01-03   | 79              |
| metric\_time | leads |
| ------------ | ----- |
| 2024-01-01   | 5     |
| 2024-01-03   | 8     |

* Note that there is no data for `2024-01-02` in the `leads` metric. Although there are no days without visits, there are days without leads.

After applying `fill_nulls_with: 0` to the `leads` metric, querying these metrics together shows zero for leads on days with no conversions:

| metric\_time | website\_visits | leads |
| ------------ | --------------- | ----- |
| 2024-01-01   | 50              | 5     |
| 2024-01-02   | 37              | 0     |
| 2024-01-03   | 79              | 8     |

##### Use join\_to\_timespine for derived and ratio metrics[​](#use-join_to_timespine-for-derived-and-ratio-metrics "Direct link to Use join_to_timespine for derived and ratio metrics")

##### Fill null values for derived and ratio metrics[​](#fill-null-values-for-derived-and-ratio-metrics "Direct link to Fill null values for derived and ratio metrics")

To fill null values for derived and ratio metrics, you can link them with a time spine to ensure daily data coverage. As mentioned in [the previous section](#use-join_to_timespine-for-derived-and-ratio-metrics), this is because `derived` and `ratio` metrics take *metrics* as inputs.

For example, the following structure leaves nulls in the final results (`leads_to_website_visit` column) because `COALESCE` isn't applied at the third outer rendering layer for the final metric calculation in `derived` metrics:

| metric\_time | website\_visits | leads | leads\_to\_website\_visit |
| ------------ | --------------- | ----- | ------------------------- |
| 2024-01-01   | 50              | 5     | .1                        |
| 2024-01-02   | 37              | 0     | null                      |
| 2024-01-03   | 79              | 8     | .1                        |
To display a zero value for `leads_to_website_visit` for `2024-01-02`, you would join the `leads` metric to a time spine model to ensure a value for each day. You can do this by adding `join_to_timespine` to the `leads` metric configuration.

Once you do this, if you query the `leads` metric after the timespine join, there will be a record for each day and any null values will get filled with zero:

| metric\_time | leads | leads\_to\_website\_visit |
| ------------ | ----- | ------------------------- |
| 2024-01-01   | 5     | .1                        |
| 2024-01-02   | 0     | 0                         |
| 2024-01-03   | 8     | .1                        |

Now, if you combine the metrics in a `derived` metric, there will be a zero value for `leads_to_website_visit` on `2024-01-02` and the final result set will not have any null values.

#### FAQs[​](#faqs "Direct link to FAQs")

How to handle null values in derived metrics defined on top of multiple tables

For additional examples and discussion on how to handle null values in derived metrics that use data from multiple tables, check out [MetricFlow issue #1031](https://github.com/dbt-labs/metricflow/issues/1031).
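The time-spine behavior described above can be sketched in plain Python: join sparse daily metric values to a complete date spine, then coalesce missing days to the configured `fill_nulls_with` value (zero here). The `join_to_timespine` function below is an illustrative stand-in, not MetricFlow's implementation.

```python
from datetime import date, timedelta

def join_to_timespine(values, start, end, fill_nulls_with=0):
    """Return one (day, value) row per day in [start, end], filling days
    that have no input row (for example, days with no leads) with fill_nulls_with."""
    rows = []
    day = start
    while day <= end:
        rows.append((day, values.get(day, fill_nulls_with)))
        day += timedelta(days=1)
    return rows

# Sparse `leads` data: no row for 2024-01-02, matching the example tables above.
leads = {date(2024, 1, 1): 5, date(2024, 1, 3): 8}
for day, n in join_to_timespine(leads, date(2024, 1, 1), date(2024, 1, 3)):
    print(day.isoformat(), n)
# 2024-01-01 5
# 2024-01-02 0
# 2024-01-03 8
```

With `leads` filled to zero on every spine day, a downstream ratio such as leads divided by website visits yields 0 rather than null on 2024-01-02, which is the behavior the tables above describe.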

---

### Frequently asked questions

#### [🗃️ Accounts](https://docs.getdbt.com/category/accounts.md)

[11 items](https://docs.getdbt.com/category/accounts.md)

---

### Global navigation

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

Search, explore, and analyze data assets across all your dbt projects and connected metadata sources. Discover cross-project lineage, data discovery, and unified analytics governance.

**Plan availability**

Global navigation search varies depending on your [dbt platform](https://www.getdbt.com/pricing) plan:

* Enterprise plans — Catalog lets you search across all [dbt resources](https://docs.getdbt.com/docs/build/projects.md) (models, seeds, snapshots, sources, exposures, and more) in your account, plus discover external metadata.
* Starter plans (single project) — Use global navigation to search and navigate resources within your project.

#### About Global navigation[​](#about-global-navigation "Direct link to About Global navigation")

Global navigation in Catalog lets you search, explore, and analyze data assets across all your dbt projects and connected metadata sources—giving you a unified, account-wide view of your analytics ecosystem.

With global navigation, you can:

* Search data assets — expand your search by including dbt resources (models, seeds, snapshots, sources, exposures, and more) across your entire account. This broadens the results returned and gives you greater insight into all the assets across your dbt projects.
* External metadata ingestion — connect directly to your data warehouse, giving you visibility in Catalog into tables, views, and other resources that aren't defined in dbt.
* Explore lineage — explore an interactive map of data relationships across all your dbt projects. It lets you:
  * View upstream/downstream dependencies for models, sources, and more.
  * Drill into project and column-level lineage, including multi-project (Mesh) links.
  * Filter with "lineage lenses" by resource type, materialization, layer, or run status.
  * Troubleshoot data issues by tracing root causes and downstream impacts.
  * Optimize pipelines by spotting slow, failing, or unused parts of your DAG.
* See recommendations — global navigation offers a project-wide snapshot of dbt health, highlighting actionable tips to enhance your analytics engineering. These insights are automatically generated using dbt metadata and best practices from the project evaluator ruleset.
* View model query history — see how often each dbt model is queried in your warehouse, helping you:
  * Track real usage via successful `SELECT`s (excluding builds/tests)
  * Identify most/least used models for optimization or deprecation
  * Guide investment and maintenance with data-driven insights
* Track downstream exposures — monitor how your dbt models and sources are used by BI tools, apps, ML models, and reports across all connected projects
---

### Google Sheets

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

The Semantic Layer offers a seamless integration with Google Sheets through a custom menu. This add-on allows you to build Semantic Layer queries and return data on your metrics directly within Google Sheets.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You have [configured the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) and are using dbt v1.6 or higher.
* You need a Google account with access to Google Sheets and the ability to install Google add-ons.
* You have a [dbt Environment ID](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#set-up-dbt-semantic-layer).
* You have a [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) or a [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) from a dbt account to authenticate with.
* You must have a dbt Starter or Enterprise-tier [account](https://www.getdbt.com/pricing). Both Multi-tenant and Single-tenant deployments are supported.

If you're using [IP restrictions](https://docs.getdbt.com/docs/cloud/secure/ip-restrictions.md), ensure you've added [Google’s IP addresses](https://www.gstatic.com/ipranges/goog.txt) to your IP allowlist. Otherwise, the Google Sheets connection will fail.

📹 Learn about the dbt Semantic Layer with on-demand video courses! Explore our [dbt Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer) to learn how to define and query metrics in your dbt project.
Additionally, dive into mini-courses for querying the dbt Semantic Layer in your favorite tools: [Tableau](https://courses.getdbt.com/courses/tableau-querying-the-semantic-layer), [Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel), [Hex](https://courses.getdbt.com/courses/hex-querying-the-semantic-layer), and [Mode](https://courses.getdbt.com/courses/mode-querying-the-semantic-layer). #### Installing the add-on[​](#installing-the-add-on "Direct link to Installing the add-on") 1. Navigate to the [Semantic Layer for Sheets App](https://gsuite.google.com/marketplace/app/foo/392263010968) to install the add-on. You can also find it in Google Sheets by going to [**Extensions -> Add-on -> Get add-ons**](https://support.google.com/docs/answer/2942256?hl=en\&co=GENIE.Platform%3DDesktop\&oco=0#zippy=%2Cinstall-add-ons%2Cinstall-an-add-on) and searching for it there. 2. After installing, open the **Extensions** menu and select **Semantic Layer for Sheets**. This will open a custom menu on the right-hand side of your screen. 3. [Find your](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#set-up-dbt-semantic-layer) **Host** and **Environment ID** in dbt. * Navigate to **Account Settings** and select **Projects** on the left sidebar. * Select your project and then navigate to the **Semantic Layer** settings. You'll need this to authenticate in Google Sheets in the following step. * You can generate your service token by clicking **Generate service token** within the Semantic Layer configuration page or navigating to **API tokens** in dbt. Alternatively, you can also create a personal access token by going to **API tokens** > **Personal tokens**. [![Access your Environment ID, Host, and URLs in your dbt Semantic Layer settings. 
Generate a service token in the Semantic Layer settings](/img/docs/dbt-cloud/semantic-layer/sl-and-gsheets.png?v=2 "Access your Environment ID, Host, and URLs in your dbt Semantic Layer settings. Generate a service token in the Semantic Layer settings or API tokens settings")](#)Access your Environment ID, Host, and URLs in your dbt Semantic Layer settings. Generate a service token in the Semantic Layer settings or API tokens settings

4. In Google Sheets, authenticate with your Host, dbt Environment ID, and service or personal token.

5. Start querying your metrics using the **Query Builder**. For more info on the menu functions, refer to [Query Builder functions](#query-builder-functions). To cancel a running query, press the **Cancel** button.

When querying your data with Google Sheets:

* The data is returned to the cell you clicked on.
* The custom menu operation has a timeout limit of six (6) minutes.
* If you're using this extension, make sure you're signed into Chrome with the same Google profile you used to set up the add-on. Log in with one Google profile at a time; using multiple Google profiles at once might cause issues.
* Note that only standard granularities are currently available; custom time granularities aren't supported for this integration.

#### Query Builder functions[​](#query-builder-functions "Direct link to Query Builder functions")

The Google Sheets **Query Builder** custom menu has the following capabilities:

| Menu items | Description |
| ---------- | ----------- |
| Metrics | Search and select metrics. |
| Group By | Search and select dimensions or entities to group by. Dimensions are grouped by the entity of the semantic model they come from. You may choose dimensions on their own without metrics. |
| Time Range | Quickly select time ranges to look at the data, which applies to the main time series for the metrics (metric time), or do more advanced filtering using the "Custom" selection. |
| Where | Filter your data. This includes categorical and time filters. |
| Order By | Set the order in which your data is returned. |
| Limit | Set a limit for the rows of your output. |

Note: Click the **info** button next to any metric or dimension to see its defined description from your dbt project.

###### Modifying time granularity[​](#modifying-time-granularity "Direct link to Modifying time granularity")

When you select time dimensions in the **Group By** menu, you'll see a list of available time granularities. The lowest granularity is selected by default. Metric time is the default time dimension for grouping your metrics.

info

Note: [Custom time granularities](https://docs.getdbt.com/docs/build/metricflow-time-spine.md#add-custom-granularities) (like fiscal year) aren't currently supported or accessible in this integration. Only [standard granularities](https://docs.getdbt.com/docs/build/dimensions.md?dimension=time_gran#time) (like day, week, month, and so on) are available. If you'd like to access custom granularities, consider using the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md).

###### Filtering data[​](#filtering-data "Direct link to Filtering data")

To use the filter functionality, choose the [dimension](https://docs.getdbt.com/docs/build/dimensions.md) you want to filter by and select the operation you want to filter on.

* For categorical dimensions, you can type a value into search or select from a populated list.
* For entities, you must type the value you are looking for because, given the large number of values, they aren't all preloaded.
* Continue adding additional filters as needed with AND and OR.
* For time dimensions, you can use the time range selector to filter on presets or custom options.
The time range selector applies only to the primary time dimension (`metric_time`). For all other time dimensions, use the "Where" option to apply filters.

###### Other settings[​](#other-settings "Direct link to Other settings")

To query just the data values without the headers, optionally select the **Exclude column names** box. To return your results and keep any previously selected data below them intact, unselect the **Clear trailing rows** box. By default, all trailing rows are cleared if there's stale data.

[![Run a query in the Query Builder. Use the arrow next to the Query button to select additional settings.](/img/docs/dbt-cloud/semantic-layer/query-builder.png?v=2 "Run a query in the Query Builder. Use the arrow next to the Query button to select additional settings.")](#)Run a query in the Query Builder. Use the arrow next to the Query button to select additional settings.

#### Using saved selections[​](#using-saved-selections "Direct link to Using saved selections")

Saved selections let you save the inputs you've created in the Google Sheets **Query Builder** and easily access them again, so you don't have to build common queries from scratch each time. To create a saved selection:

1. Run a query in the **Query Builder**.
2. Save the selection by selecting the arrow next to the **Query** button, then select **Query & Save Selection**.
3. The application saves these selections, allowing you to view and edit them from the hamburger menu under **Saved Selections**.

You can also make these selections private or public. Public selections mean your inputs are available in the menu to everyone on the sheet. Private selections mean your inputs are only visible to you. Note that anyone added to the sheet can still see the data from these private selections, but they won't be able to interact with the selection in the menu or benefit from the automatic refresh.
##### Refreshing selections[​](#refreshing-selections "Direct link to Refreshing selections")

Set your saved selections to automatically refresh every time you load the add-on. You can do this by selecting **Refresh on Load** when creating the saved selection. When you access the add-on and have saved selections that should refresh, you'll see "Loading..." in the cells that are refreshing. Public saved selections will refresh for anyone who edits the sheet.

What's the difference between saved selections and saved queries?

* Saved selections are saved components that you can create only when using the application.
* Saved queries, explained in the next section, are code-defined sets of data you create in your dbt project that you can easily access and use for building selections. You can also use the results from a saved query to create a saved selection.

#### Using saved queries[​](#using-saved-queries "Direct link to Using saved queries")

Access [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md), powered by MetricFlow, in Google Sheets to quickly get results from pre-defined sets of data. To access the saved queries in Google Sheets:

1. Open the hamburger menu in Google Sheets.
2. Navigate to **Saved Queries** to access the ones available to you.
3. You can also select **Build Selection**, which allows you to explore the existing query. This won't change the original query defined in the code.
   * If you use a `WHERE` filter in a saved query, Google Sheets displays the advanced syntax for this filter.

**Limited use policy disclosure**

The Semantic Layer for Sheets add-on's use and transfer to any other app of information received from Google APIs will adhere to the [Google API Services User Data Policy](https://developers.google.com/terms/api-services-user-data-policy), including the Limited Use requirements.

#### FAQs[​](#faqs "Direct link to FAQs")

I'm receiving a `Failed ALPN` error when trying to connect to the dbt Semantic Layer.
If you're receiving a `Failed ALPN` error when trying to connect the dbt Semantic Layer with the various [data integration tools](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) (such as Tableau, DBeaver, Datagrip, ADBC, or JDBC), it typically happens when connecting from a computer behind a corporate VPN or proxy (like Zscaler or Check Point). The root cause is typically the proxy interfering with the TLS handshake, as the Semantic Layer uses gRPC/HTTP2 for connectivity.

To resolve this:

* If your proxy supports gRPC/HTTP2 but isn't configured to allow ALPN, adjust its settings to allow ALPN, or create an exception for the dbt domain.
* If your proxy does not support gRPC/HTTP2, add an SSL interception exception for the dbt domain in your proxy settings.

This should help in successfully establishing the connection without the `Failed ALPN` error.
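If you want to confirm whether ALPN survives your VPN or proxy before changing its settings, you can probe a TLS endpoint from your own machine. The following is a rough sketch using only the Python standard library; the host you pass in is whatever endpoint your connection uses (it is not a value from this page), and the helper names are illustrative:

```python
import socket
import ssl


def make_alpn_context(protocols=("h2", "http/1.1")):
    """Build a client TLS context that advertises ALPN protocols."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(list(protocols))
    return ctx


def negotiated_alpn(host, port=443, timeout=10):
    """Return the ALPN protocol the server agreed to, or None.

    Behind a proxy that strips ALPN, this returns None (or the
    handshake fails outright), which matches the 'Failed ALPN' symptom.
    """
    ctx = make_alpn_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

A result of `"h2"` means HTTP/2 is reachable end to end. The equivalent command-line check is `openssl s_client -alpn h2 -connect <host>:443`, which prints an `ALPN protocol:` line when negotiation succeeds.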
---

### Hooks and operations

#### Related documentation[​](#related-documentation "Direct link to Related documentation")

* [pre-hook & post-hook](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md)
* [on-run-start & on-run-end](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md)
* [`run-operation` command](https://docs.getdbt.com/reference/commands/run-operation.md)

##### Assumed knowledge[​](#assumed-knowledge "Direct link to Assumed knowledge")

* [Project configurations](https://docs.getdbt.com/reference/dbt_project.yml.md)
* [Model configurations](https://docs.getdbt.com/reference/model-configs.md)
* [Macros](https://docs.getdbt.com/docs/build/jinja-macros.md#macros)

#### Getting started with hooks and operations[​](#getting-started-with-hooks-and-operations "Direct link to Getting started with hooks and operations")

Effective database administration sometimes requires running additional SQL statements, for example:

* Creating UDFs
* Managing row- or column-level permissions
* Vacuuming tables on Redshift
* Creating partitions in Redshift Spectrum external tables
* Resuming/pausing/resizing warehouses in Snowflake
* Refreshing a pipe in Snowflake
* Creating a share on Snowflake
* Cloning a database on Snowflake

dbt provides hooks and operations so you can version control and execute these statements as part of your dbt project.

#### About hooks[​](#about-hooks "Direct link to About hooks")

Hooks are snippets of SQL that are executed at different times:

* `pre-hook`: executed *before* a model, seed, or snapshot is built.
* `post-hook`: executed *after* a model, seed, or snapshot is built.
* `on-run-start`: executed at the *start* of `dbt build`, `dbt compile`, `dbt docs generate`, `dbt run`, `dbt seed`, `dbt snapshot`, or `dbt test`.
* `on-run-end`: executed at the *end* of `dbt build`, `dbt compile`, `dbt docs generate`, `dbt run`, `dbt seed`, `dbt snapshot`, or `dbt test`.
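As a quick sketch of where each hook type is declared, here is a hedged `dbt_project.yml` example. The project name `my_project`, the role `reporter`, and the SQL statements themselves are hypothetical, warehouse-specific illustrations, not dbt built-ins:

```yaml
# dbt_project.yml (sketch: SQL statements shown are illustrative only)
on-run-start:
  - "create schema if not exists audit"                            # runs once, before anything builds

on-run-end:
  - "grant usage on schema {{ target.schema }} to role reporter"   # runs once, at the end

models:
  my_project:
    +pre-hook:
      - "alter session set query_tag = 'dbt'"                      # runs before each model builds
    +post-hook:
      - "analyze table {{ this }}"                                 # runs after each model builds
```

`pre-hook` and `post-hook` can also be set on an individual model in its `config` block or properties file rather than project-wide.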
Hooks are a more advanced capability that enables you to run custom SQL and leverage database-specific actions beyond what dbt makes available out of the box with standard materializations and configurations. If (and only if) you can't leverage the [`grants` resource-config](https://docs.getdbt.com/reference/resource-configs/grants.md), you can use `post-hook` to perform more advanced workflows:

* Applying `grants` in a more complex way, which the dbt Core `grants` config doesn't (yet) support.
* Performing post-processing that dbt doesn't support out of the box. For example, `analyze table`, `alter table set property`, `alter table ... add row access policy`, and so on.

##### Examples using hooks[​](#examples-using-hooks "Direct link to Examples using hooks")

You can use hooks to trigger actions at certain times when running an operation or building a model, seed, or snapshot. For more information about when hooks can be triggered, see the reference sections for [`on-run-start` and `on-run-end` hooks](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md) and [`pre-hook`s and `post-hook`s](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md).

You can use hooks to provide database-specific functionality not available out of the box with dbt. For example, you can use a `config` block to run an `ALTER TABLE` statement right after building an individual model using a `post-hook`:

models/\<model\_name\>.sql

```sql
{{ config(
    post_hook=[
      "alter table {{ this }} ..."
    ]
) }}
```

##### Calling a macro in a hook[​](#calling-a-macro-in-a-hook "Direct link to Calling a macro in a hook")

You can also use a [macro](https://docs.getdbt.com/docs/build/jinja-macros.md#macros) to bundle up hook logic.
Check out some of the examples in the reference sections for [on-run-start and on-run-end hooks](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md) and [pre- and post-hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md).

models/\<model\_name\>.sql

```sql
{{ config(
    pre_hook=[
      "{{ some_macro() }}"
    ]
) }}
```

models/properties.yml

```yaml
models:
  - name: <model_name>
    config:
      pre_hook:
        - "{{ some_macro() }}"
```

dbt\_project.yml

```yaml
models:
  <project_name>:
    +pre-hook:
      - "{{ some_macro() }}"
```

#### About operations[​](#about-operations "Direct link to About operations")

Operations are [macros](https://docs.getdbt.com/docs/build/jinja-macros.md#macros) that you can run using the [`run-operation`](https://docs.getdbt.com/reference/commands/run-operation.md) command. As such, operations aren't actually a separate resource in your dbt project — they are just a convenient way to invoke a macro without needing to run a model.

Explicitly execute the SQL in an operation

Unlike hooks, you need to explicitly execute a query within a macro, by using either a [statement block](https://docs.getdbt.com/reference/dbt-jinja-functions/statement-blocks.md) or a helper macro like the [run\_query](https://docs.getdbt.com/reference/dbt-jinja-functions/run_query.md) macro. Otherwise, dbt will return the query as a string without executing it.

This macro performs a similar action to the above hooks:

macros/grant\_select.sql

```sql
{% macro grant_select(role) %}
  {% set sql %}
    grant usage on schema {{ target.schema }} to role {{ role }};
    grant select on all tables in schema {{ target.schema }} to role {{ role }};
    grant select on all views in schema {{ target.schema }} to role {{ role }};
  {% endset %}

  {% do run_query(sql) %}
  {% do log("Privileges granted", info=True) %}
{% endmacro %}
```

To invoke this macro as an operation, execute `dbt run-operation grant_select --args '{role: reporter}'`.
```text
$ dbt run-operation grant_select --args '{role: reporter}'
Running with dbt=1.6.0
Privileges granted
```

Full usage docs for the `run-operation` command can be found [here](https://docs.getdbt.com/reference/commands/run-operation.md).

#### Additional examples[​](#additional-examples "Direct link to Additional examples")

These examples from the community highlight some of the use cases for hooks and operations:

* [In-depth discussion of granting privileges using hooks and operations, for dbt Core versions prior to 1.2](https://discourse.getdbt.com/t/the-exact-grant-statements-we-use-in-a-dbt-project/430)
* [Staging external tables](https://github.com/dbt-labs/dbt-external-tables)
* [Performing a zero copy clone on Snowflake to reset a dev environment](https://discourse.getdbt.com/t/creating-a-dev-environment-quickly-on-snowflake/1151/2)
* [Running `vacuum` and `analyze` on a Redshift warehouse](https://github.com/dbt-labs/redshift/tree/0.2.3/#redshift_maintenance_operation-source)
* [Creating a Snowflake share](https://discourse.getdbt.com/t/how-drizly-is-improving-collaboration-with-external-partners-using-dbt-snowflake-shares/1110)
* [Unloading files to S3 on Redshift](https://github.com/dbt-labs/redshift/tree/0.2.3/#unload_table-source)
* [Creating audit events for model timing](https://github.com/dbt-labs/dbt-event-logging)
* [Creating UDFs](https://discourse.getdbt.com/t/using-dbt-to-manage-user-defined-functions/18)
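The "Creating UDFs" use case follows the same pattern as the `grant_select` operation shown earlier. Here is a sketch assuming a Postgres-style warehouse; the macro name `create_udfs`, the function, and its body are hypothetical and not from the linked post:

```sql
-- macros/create_udfs.sql (hypothetical macro; adapt the DDL to your warehouse)
{% macro create_udfs() %}
  {% set sql %}
    create or replace function {{ target.schema }}.cents_to_dollars(cents int)
    returns numeric
    language sql
    as 'select cents / 100.0';
  {% endset %}

  {% do run_query(sql) %}
  {% do log("UDFs created", info=True) %}
{% endmacro %}
```

Invoke it with `dbt run-operation create_udfs` whenever the functions need to be (re)created.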
---

### Hybrid setup

[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Set up Hybrid projects to upload dbt Core artifacts into dbt for better collaboration and visibility.

Available in public preview

Hybrid projects are available in public preview to [dbt Enterprise accounts](https://www.getdbt.com/pricing).

#### Set up Hybrid projects[​](#set-up-hybrid-projects "Direct link to Set up Hybrid projects")

In a hybrid project, you use dbt Core locally and can upload artifacts of that dbt Core project to dbt for central visibility, cross-project referencing, and easier collaboration. This setup requires connecting your dbt Core project to a dbt project and configuring a few environment variables and access settings.

Follow these steps to set up a dbt Hybrid project and upload dbt Core artifacts into dbt:

* [Make dbt Core models public](#make-dbt-core-models-public) (optional)
* [Create hybrid project](#create-hybrid-project)
* [Generate service token and artifact upload values](#generate-service-token-and-artifact-upload-values)
* [Configure dbt Core project and upload artifacts](#configure-dbt-core-project-and-upload-artifacts)
* [Review artifacts in dbt](#review-artifacts-in-dbt-cloud)

Make sure to enable the hybrid projects toggle in dbt’s **Account settings** page.

##### Make dbt Core models public (optional)[​](#make-dbt-core-models-public "Direct link to Make dbt Core models public (optional)")

This step is optional and only needed if you want to share your dbt Core models with other dbt projects using the [cross-project referencing](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref) feature. Before connecting your dbt Core project to a dbt project, make sure models that you want to share have `access: public` in their model configuration.
This setting makes those models visible to other dbt projects for better collaboration, such as [cross-project referencing](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref).

1. The easiest way to set this is in your `dbt_project.yml` file; however, you can also set it in the following places:
   * `dbt_project.yml` (project-level)
   * `properties.yml` (for individual models)
   * A model's `.sql` file using a `config` block

   Here's an example using a `dbt_project.yml` file where the marts directory is set as public so those models can be consumed by downstream tools:

   dbt\_project.yml

   ```yaml
   models:
     define_public_models: # This is my project name, remember it must be specified
       marts:
         +access: public
   ```

2. After defining `access: public`, rerun a dbt execution in the dbt Core command line interface (CLI) (like `dbt run`) to apply the change.

3. For more details on how to set this up, see [access modifier](https://docs.getdbt.com/docs/mesh/govern/model-access.md#access-modifiers) and [`access` config](https://docs.getdbt.com/reference/resource-configs/access.md).

##### Create hybrid project[​](#create-hybrid-project "Direct link to Create hybrid project")

Create a hybrid project in dbt to allow you to upload your dbt Core artifacts to dbt. A [dbt account admin](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#permission-sets) should perform the following steps and share the artifacts information with a dbt Core user:

1. To create a new project in dbt, navigate to **Account home**.
2. Click on **+New project**.
3. Fill out the **Project name**. Name the project something that allows you to recognize it's a dbt Core project.
   * You don't need to set up a [data warehouse](https://docs.getdbt.com/docs/supported-data-platforms.md) or [Git connection](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md); however, to upgrade the hybrid project to a full dbt project, you'd need to set up a data warehouse and a Git connection.

4. Select the **Advanced settings** toggle and then select the **Hybrid development** checkbox. Click **Continue**.
   * The hybrid project will have a visible **Hybrid** indicator in the project list to help you identify it.

[![Hybrid project new project](/img/docs/deploy/hp-new-project.jpg?v=2 "Hybrid project new project")](#)Hybrid project new project

5. After creating a project, create a corresponding [production environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#create-a-deployment-environment) and click **Save**. You will need to create a placeholder [profile](https://docs.getdbt.com/docs/cloud/about-profiles.md) and assign it to the environment to save.

6. (Optional) To update an existing dbt project to a hybrid project, navigate to **Account settings** and then select the **Project**. Click **Edit** and then check the **Hybrid development** checkbox.

[![Hybrid project for an existing project](/img/docs/deploy/hp-existing-project.jpg?v=2 "Hybrid project for an existing project")](#)Hybrid project for an existing project

##### Generate service token and artifact upload values[​](#generate-service-token-and-artifact-upload-values "Direct link to Generate service token and artifact upload values")

A dbt admin should perform these steps to generate a [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md#enterprise-plans-using-service-account-tokens) (with both **Job Runner** *and* **Job Viewer** permissions) and copy the values needed to configure a dbt Core project so it's ready to upload generated artifacts to dbt. The dbt admin should share the values with a dbt Core user.

1. 
Go to the Hybrid project environment you created in the previous step by navigating to **Deploy** > **Environments** and selecting the environment.

2. Select the **Artifact upload** button and copy the following values, which the dbt Core user will need to reference in their dbt Core project's `dbt_project.yml` configuration:
   * **[Tenant URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md)**
   * **Account ID**
   * **Environment ID**
   * **Create a service token**
     * dbt creates a service token with both **Job Runner** *and* **Job Viewer** permissions.
     * If you don't see the **Create service token** button, you likely don't have the necessary permissions to create a service token. Contact your dbt admin to either get the necessary permissions or have them create the service token for you.

[![Generate hybrid project service token](/img/docs/deploy/hp-artifact-upload.png?v=2 "Generate hybrid project service token")](#)Generate hybrid project service token

3. Make sure to copy and save the values, as they're needed to configure your dbt Core project in the next step. Once the service token is created, you can't access it again.

##### Configure dbt Core project and upload artifacts[​](#configure-dbt-core-project-and-upload-artifacts "Direct link to Configure dbt Core project and upload artifacts")

Once you have the values from the previous step, you can prepare your dbt Core project for artifact upload by following these steps:

1. Check your dbt version by running `dbt --version`. You should see something like the following:

   ```bash
   Core:
     - installed: 1.10.0-b1
     - latest:    1.9.3
     - Ahead of latest version!
   ```

2. If you don't have the required version (1.10 or later), [upgrade](https://docs.getdbt.com/docs/local/install-dbt.md?version=1#change-dbt-core-versions) your dbt Core project by running `python -m pip install --upgrade dbt-core`.

3. Set the following environment variables in your dbt Core project by running the following commands in the CLI.
Replace `your_account_id`, `your_environment_id`, and `your_token` with the actual values from the [previous step](#generate-service-token-and-artifact-upload-values).

   ```bash
   export DBT_CLOUD_ACCOUNT_ID=your_account_id
   export DBT_CLOUD_ENVIRONMENT_ID=your_environment_id
   export DBT_CLOUD_TOKEN=your_token
   export DBT_UPLOAD_TO_ARTIFACTS_INGEST_API=True
   ```

   * Set the environment variables in whatever way you use them in your project.
   * To unset an environment variable, run `unset environment_variable_name`, replacing `environment_variable_name` with the actual name of the environment variable.

4. In your local dbt Core project, add the following items you copied in the [previous section](https://docs.getdbt.com/docs/deploy/hybrid-setup.md#enable-artifact-upload) to the dbt Core project's `dbt_project.yml` file:
   * `tenant_hostname`

   ```yaml
   name: "jaffle_shop"
   version: "3.0.0"
   require-dbt-version: ">=1.5.0"

   ....rest of dbt_project.yml configuration...

   dbt-cloud:
     tenant_hostname: cloud.getdbt.com # Replace with your Tenant URL
   ```

5. Once you set the environment variables using the `export` command in the same dbt Core CLI session, you can execute a `dbt run` in the CLI.

   ```bash
   dbt run
   ```

   To override the environment variables you set, execute a `dbt run` with the environment variable prefix. For example, to use a different account ID and environment ID:

   ```bash
   DBT_CLOUD_ACCOUNT_ID=1 DBT_CLOUD_ENVIRONMENT_ID=123 dbt run
   ```

6. After the run completes, you should see an `Artifacts uploaded successfully to artifact ingestion API: command run completed successfully` message and a run in dbt under your production environment.

##### Review artifacts in the dbt platform[​](#review-artifacts-in-the-dbt-platform "Direct link to Review artifacts in the dbt platform")

Now that you've uploaded dbt Core artifacts into the dbt platform and executed a `dbt run`, you can view the artifacts job run:

1. Navigate to **Deploy**.
2. Click on **Jobs** and then the **Runs** tab.
3. 
You should see a job run with the status **Success** and an **Artifact ingestion** indicator.

4. Click on the job run and review the logs to confirm a successful artifact upload message. If there are any errors, resolve them by checking the debug logs.

[![Hybrid project job run with artifact ingestion](/img/docs/deploy/hp-artifact-job.jpg?v=2 "Hybrid project job run with artifact ingestion")](#)Hybrid project job run with artifact ingestion

#### Benefits of using Hybrid projects[​](#benefits-of-using-hybrid-projects "Direct link to Benefits of using Hybrid projects")

Now that you've integrated dbt Core artifacts with your dbt project, you can:

* Collaborate with dbt users by enabling them to visualize and perform [cross-project references](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref) to dbt models that live in Core projects.
* (Coming soon) New users interested in the [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md) can build off of dbt models already created by a central data team in dbt Core rather than having to start from scratch.
* dbt Core users can navigate to [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) and view their models and assets. To view Catalog, you must have a [read-only seat](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md).
---

### Install the dbt VS Code extension [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

The dbt extension — available for [VS Code, Cursor](https://marketplace.visualstudio.com/items?itemName=dbtLabsInc.dbt\&ssr=false#overview), and [Windsurf](https://open-vsx.org/extension/dbtLabsInc/dbt) — uses the dbt Fusion engine to make dbt development smoother and more efficient. The dbt VS Code extension is only compatible with the dbt Fusion engine, not with dbt Core.

note This is the only official dbt Labs VS Code extension. Other extensions *can* work alongside the dbt VS Code extension, but they aren't tested or supported by dbt Labs. Read the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements) for the latest updates.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

Before installing, review the [Limitations](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations) page, as some features aren't supported in Fusion just yet. To use the extension, you must meet the following prerequisites:

| Prerequisite | Details |
| --- | --- |
| **dbt Fusion engine** | The [dbt VS Code extension](https://marketplace.visualstudio.com/items?itemName=dbtLabsInc.dbt\&ssr=false#overview) requires the dbt Fusion engine binary (a small executable program). The extension will prompt you to install it, or you can [install it manually](#install-the-dbt-fusion-engine-from-the-command-line-if-you-havent-already) at any time. [Register your email](#register-the-extension) within 14 days of installing the dbt extension. Free for up to 15 users. |
| **Project files** | You need a `profiles.yml` configuration file. You *may* need to [download](#register-with-dbt_cloudyml) a `dbt_cloud.yml` file depending on your [registration path](#choose-your-registration-path). You don't need a dbt platform project to use the extension. |
| **Editor** | [VS Code](https://code.visualstudio.com/), [Cursor](https://www.cursor.com/en), or [Windsurf](https://windsurf.com/editor) code editor. |
| **Operating systems** | macOS, Windows, or Linux-based computer. |
| **Configure your local setup** (Optional) | [Configure the extension](https://docs.getdbt.com/docs/configure-dbt-extension.md) to mirror your dbt environment locally and set any environment variables locally to use the VS Code extension features. |
| **Run dbt-autofix** (Optional) | [Run dbt-autofix](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-autofix) to fix any errors and deprecations in your dbt project. |

#### Install the extension[​](#install-the-extension "Direct link to Install the extension")

To install the dbt VS Code extension, follow these steps in your editor of choice:

1. Navigate to the **Extensions** tab of your editor and search for `dbt`. Locate the extension from the publisher `dbtLabsInc` or `dbt Labs Inc`. Click **Install**. [![Search for the extension](/img/docs/extension/extension-marketplace.png?v=2 "Search for the extension")](#)Search for the extension
2. Open a dbt project in your VS Code environment if you haven't already, and make sure it is added to your current workspace. If you see a **dbt Extension** label in your editor's status bar, the extension has installed successfully. You can hover over this label to see diagnostic information about the extension. [![If you see the 'dbt Extension' label, the extension is activated](/img/docs/extension/dbt-extension-statusbar.png?v=2 "If you see the 'dbt Extension' label, the extension is activated")](#)If you see the 'dbt Extension' label, the extension is activated
3.
Once the dbt extension is activated, it will automatically begin downloading the correct dbt Language Server (LSP) for your operating system. [![The dbt Language Server will be installed automatically](/img/docs/extension/extension-lsp-download.png?v=2 "The dbt Language Server will be installed automatically")](#)The dbt Language Server will be installed automatically
4. If the dbt Fusion engine is not already installed on your machine, the extension will prompt you to download and install it. Follow the steps shown in the notification to complete the installation, or [install it manually from the command line](#install-the-dbt-fusion-engine-from-the-command-line-if-you-havent-already). [![Follow the prompt to install the dbt Fusion engine](/img/docs/extension/install-dbt-fusion-engine.png?v=2 "Follow the prompt to install the dbt Fusion engine")](#)Follow the prompt to install the dbt Fusion engine
5. Run the VS Code extension [upgrade tool](#upgrade-to-fusion) to ensure your dbt project is Fusion ready and to fix any errors and deprecations.
6. (Optional) If you're new to the extension or VS Code/Cursor, you can [set your local environment](https://docs.getdbt.com/docs/configure-dbt-extension.md) to mirror your dbt platform environment and [set any environment variables](https://docs.getdbt.com/docs/configure-dbt-extension.md#configure-environment-variables) locally to use the VS Code extension features.

You're all set up with the dbt extension! The next steps are:

* Follow the [getting started](#getting-started) section to begin the terminal onboarding workflow and configure your setup. If you encounter any parsing errors, you can also run the [`dbt-autofix` tool](https://github.com/dbt-labs/dbt-autofix?tab=readme-ov-file#installation) to resolve them.
* Install the dbt Fusion engine from the command line, if you haven't already. If you already have the dbt Fusion engine installed, you can skip this step.
If you don't have it installed, follow these steps:

1. Open a new command-line window and run the following command to install the dbt Fusion engine:

   **macOS & Linux**

   Run the following command in the terminal:

   ```shell
   curl -fsSL https://public.cdn.getdbt.com/fs/install/install.sh | sh -s -- --update
   ```

   To use `dbtf` immediately after installation, reload your shell so that the new `$PATH` is recognized:

   ```shell
   exec $SHELL
   ```

   Or, close and reopen your Terminal window to load the updated environment settings into the new session.

   **Windows (PowerShell)**

   Run the following command in PowerShell:

   ```powershell
   irm https://public.cdn.getdbt.com/fs/install/install.ps1 | iex
   ```

   To use `dbtf` immediately after installation, reload your shell so that the new `Path` is recognized:

   ```powershell
   Start-Process powershell
   ```

   Or, close and reopen PowerShell to load the updated environment settings into the new session.

2. Run the following command to verify you've installed Fusion:

   ```bash
   dbtf --version
   ```

   You can use `dbt` or its Fusion alias `dbtf` (handy if you already have another dbt CLI installed). Default install path:

   * macOS/Linux: `$HOME/.local/bin/dbt`
   * Windows: `C:\Users\\.local\bin\dbt.exe`

   The installer adds this path automatically, but you may need to reload your shell for the `dbtf` command to work.

3. Follow the [getting started](https://docs.getdbt.com/docs/install-dbt-extension.md#getting-started) guide to get started with the extension. You can get started using one of these methods:

   * Running `dbtf init` to use terminal onboarding.
   * Running **dbt: Register dbt extension** in the command palette.
   * Using the **Get started** button in the extension menu.

* [Register the extension](#register-the-extension) with your email address or dbt platform account to continue using it beyond the trial period.
* Review the [limitations and unsupported features](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations) if you haven't already.

#### Getting started[​](#getting-started "Direct link to Getting started")

Once the dbt Fusion engine and dbt VS Code extension are installed in your environment, the dbt logo appears on the sidebar. From here, you can access workflows that help you get started, find information about the extension and your dbt project, and follow helpful links. For more information, see the [dbt extension menu](https://docs.getdbt.com/docs/about-dbt-extension.md#the-dbt-extension-menu) documentation.

You can get started with the extension in a few ways:

* Running `dbtf init` to use the terminal onboarding.
* Opening **dbt: Register dbt extension** in the command palette.
* Using the **Get started** button in the extension menu.

The following steps explain how to get started using the **Get started** button in the extension menu:

1. From the sidebar menu, click the dbt logo to open the menu and expand the **Get started** section.
2. Click the **dbt Walkthrough** status bar to view the welcome screen. [![dbt VS Code extension welcome screen.](/img/docs/extension/welcome-screen.png?v=2 "dbt VS Code extension welcome screen.")](#)dbt VS Code extension welcome screen.
3. Click through the items to get started with the extension:

* **Open your dbt project:** Launches the file explorer so you can select the dbt project you want to open with Fusion.
* **Check Fusion compatibility:** Runs the [Fusion upgrade](#upgrade-to-fusion) workflows to bring your project up to date. If you encounter any parsing errors, you can also run the [`dbt-autofix` tool](https://github.com/dbt-labs/dbt-autofix?tab=readme-ov-file#installation) to resolve them.
* **Explore features:** Opens the [documentation](https://docs.getdbt.com/docs/about-dbt-extension.md) so you can learn more about all the extension has to offer.
* [**Register:**](#register-the-extension) Launches the registration workflow so you can continue to use the extension beyond the trial period.

#### Upgrade to Fusion[​](#upgrade-to-fusion "Direct link to Upgrade to Fusion")

note If you are already running the dbt Fusion engine, you must be on version `2.0.0-beta.66` or higher to use the upgrade tool.

The dbt extension provides a built-in upgrade tool that walks you through configuring Fusion, updating your dbt project to support all of its features, and fixing any deprecated code. To start the process:

1. From the VS Code sidebar menu, click the **dbt logo**.
2. In the resulting pane, open the **Get started** section and click the **Get started** button. [![The dbt extension help pane and upgrade assistant.](/img/docs/extension/fusion-onboarding-experience.png?v=2 "The dbt extension help pane and upgrade assistant.")](#)The dbt extension help pane and upgrade assistant.

You can also start this process manually by opening a CLI window and running:

```text
dbt init --fusion-upgrade
```

This starts the upgrade tool and guides you through the Fusion upgrade with a series of prompts:

* **Do you have an existing dbt platform account?**: If you answer `Y`, you will be given instructions for downloading your dbt platform profile to register the extension. An `N` answer skips to the next step.
* **Ready to run a dbtf init?** (If there is no `profiles.yml` file present): You will go through the dbt configuration process, including connecting to your data warehouse.
* **Ready to run a dbtf debug?** (If there is an existing `profiles.yml` file): Validates that your project is configured correctly and can connect to your data warehouse.
* **Ready to run a dbtf parse?**: Your dbt project will be parsed to check for compatibility with Fusion.
* If any issues are encountered during parsing, you'll be given the option to run the [dbt-autofix](https://github.com/dbt-labs/dbt-autofix?tab=readme-ov-file#installation) tool to resolve the errors. If you opt not to run the tool during the upgrade process, you can always run it later or fix the errors manually. However, the upgrade tool cannot continue until the errors are resolved. AI Agents: There are cases where dbt-autofix may not resolve all errors and manual intervention is required. For those cases, the dbt-autofix tool provides an [AGENTS.md](https://github.com/dbt-labs/dbt-autofix/blob/main/AGENTS.md) file to enable AI agents to help with migration work after dbt-autofix has completed its part.
* **Ready to run a `dbtf compile --static-analysis off`?** (Only runs once the parse passes): Compiles your project without any static analysis, mimicking dbt Core. This compile only renders Jinja into SQL, so Fusion's advanced SQL comprehension is temporarily disabled.
* **Ready to run a `dbtf compile`?**: Compiles your project with full Fusion static analysis. It checks that your SQL code is valid in the context of your warehouse's tables and columns. [![The message received when you have completed upgrading your project to the dbt Fusion engine.](/img/docs/extension/fusion-onboarding-complete.png?v=2 "The message received when you have completed upgrading your project to the dbt Fusion engine.")](#)The message received when you have completed upgrading your project to the dbt Fusion engine.

Once the upgrade is completed, you're ready to dive into all the features that the dbt Fusion engine has to offer!

#### Register the extension[​](#register-the-extension "Direct link to Register the extension")

After downloading the extension and installing the dbt Fusion engine, make sure you're running the latest version of the dbt VS Code extension and restart VS Code, then register the extension within 14 days of installing (or re-installing) it.
**Key points:**

* The extension is free for organizations for up to 15 users (see the [acceptable use policy](https://www.getdbt.com/dbt-assets/vscode-plugin-aup)).
* Registration links your editor to a dbt account so you can keep using the extension beyond the grace period.
* This *does not* require a dbt platform project — just a dbt account.
* If a valid `dbt_cloud.yml` file exists on your machine, the extension will automatically use it and skip login.
* If you already have a dbt account (even from years ago), you will be directed into an OAuth sign-in flow.

Understanding regions

Most users can sign in from the extension's browser registration page for the default `US1` region. If that works, you have an account in the default region and don't need to consider other [regions](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md). Use a credential file (`dbt_cloud.yml`) instead of sign-in when:

* You can't sign in.
* Your organization uses a non-default region (`eu1`, `us2`, and so on).
* You prefer file-based credentials.

If you're unsure whether you have a `US1` account from the past, try signing in or using **Forgot password** at [us1.dbt.com](http://us1.dbt.com). If nothing comes up, continue with [Register with `dbt_cloud.yml`](#register-with-dbt_cloudyml).

###### Choose your registration path[​](#choose-your-registration-path "Direct link to Choose your registration path")

Your dbt VS Code extension registration path depends on your situation. Select the one that applies to you:

* **New to dbt and never created a dbt account?** → Use [First-time registration](#first-time-registration).
* **Have an existing dbt account and can sign in?** → Use [Existing dbt account](#existing-dbt-account).
* **Email already exists or can’t sign in?** (locked, forgot password) → Use [Recover your login](#recover-your-login).
* **Can't sign in or your organization uses a non-default region** (`eu1`, `us2`) → Use [Register with `dbt_cloud.yml`](#register-with-dbt_cloudyml).

##### First-time registration[​](#first-time-registration "Direct link to First-time registration")

Use this if you've *never* created a dbt account before.

1. Click the registration prompt or open the command palette (Ctrl + Shift + P (Windows/Linux) or Cmd + Shift + P (macOS)) and type: **dbt: Register dbt extension**. [![The extension registration prompt in VS Code.](/img/docs/extension/registration-prompt.png?v=2 "The extension registration prompt in VS Code.")](#)The extension registration prompt in VS Code.
2. In the browser registration form, enter your name and email, then click **Continue**.
3. Check your inbox for a verification email and click the verification link.
4. After verification, return to the browser flow to complete sign‑in.
5. You'll return to the editor and see **Registered**.
6. Continue with the [Get started](#getting-started) onboarding workflow and get your dbt project up and running.

**Note:** You do not need a dbt platform project to register; this only creates your dbt account.

##### Existing account sign-in[​](#existing-dbt-account "Direct link to Existing account sign-in")

Use this if you have an existing dbt account — including older or inactive accounts. dbt automatically detects your account and `dbt_cloud.yml` file if it exists (no file download needed). Use this to easily work across machines.

1. [Update the VS Code extension](https://code.visualstudio.com/docs/setup/setup-overview#_update-cadence) to the latest version and restart your editor before beginning the registration process.
2. Click the registration prompt or open the command palette and type: **dbt: Register dbt extension.**
3. In the browser registration form, select **Sign in** at the bottom of the form.
4. Enter the email address associated with your dbt account and click **Continue**.
If you don't remember your password, see [Recover your login](#recover-your-login) for help.
5. You'll then have the option to select your existing dbt account.
6. Select the account you want to use and click **Continue**.
7. You should see a page confirming your successful registration. Close the tab and go back to your editor to continue the registration.

**When you might still need a `dbt_cloud.yml`:**

* You want a file-based credential for automations.
* You're on the free Developer plan and your workflow needs a local credential file for defer.
* Your region requires it (for example, regions like `eu1` or `us2`).

###### Recover your login[​](#recover-your-login "Direct link to Recover your login")

Choose this path if the registration form tells you your email already exists but you don't remember your password or your account is locked. To reset your password and sign in through the OAuth flow:

1. On the sign-in screen, click **Forgot password**.
2. Enter the email associated with your dbt account.
3. Check your inbox and reset your password.
4. Return to the sign-in screen in the browser and complete the sign-in process.
5. Once you've signed in, you will have the option to select your existing dbt account.
6. Select the account you want to use and click **Continue**.
7. You should see a page confirming your successful registration. Close the tab and go back to your editor to continue the registration.

**If you still can't sign in:**

* Your account may be locked. Contact [dbt Support](mailto:support@getdbt.com) to unlock it.
* After unlocking, continue with the registration flow as described in [Sign in with your existing dbt account](#existing-dbt-account).

##### Register with `dbt_cloud.yml`[​](#register-with-dbt_cloudyml "Direct link to register-with-dbt_cloudyml")

Use this if you can't sign in to your dbt account, your org uses a non-default region (`eu1`, `us2`), or your workflow requires a credential file.

1.
Log in to the dbt platform and open **Account settings** → **VS Code extension**.
2. In the **Set up your credentials** section, click **Download credentials** to get the `dbt_cloud.yml` file. [![Download the dbt\_cloud.yml file from your dbt platform account.](/img/docs/extension/download-registration-2.png?v=2 "Download the dbt_cloud.yml file from your dbt platform account.")](#)Download the dbt\_cloud.yml file from your dbt platform account.
3. Move the file into your dbt directory:
   * macOS/Linux: `~/.dbt/dbt_cloud.yml`
   * Windows: `C:\Users\[username]\.dbt\`
   For help creating or moving the `.dbt` directory, see [this FAQ](#how-to-create-a-dbt-directory-in-root-and-move-dbt_cloudyml-file).
4. Return to the VS Code editor, open the command palette, and type: **dbt: Register dbt extension**.
5. The extension will detect the credential file and you can continue with the registration flow.

**Behavior details:**

* If the `dbt_cloud.yml` file exists, it takes precedence over any login flow and the extension uses it automatically.
* If the file is missing, you'll be prompted to sign in or add the file.

#### Configure environment variables locally[​](#configure-environment-variables "Direct link to Configure environment variables locally")

*This section is optional. You only need to configure environment variables locally if your dbt project uses environment variables that are already configured in the dbt platform.*

If your dbt project uses environment variables, you can configure them to use the extension's features. See the [Configure environment variables](https://docs.getdbt.com/docs/configure-dbt-extension.md) page for more information.

#### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting")

If you run into any issues, check out the troubleshooting sections below.

How to create a .dbt directory in root and move the dbt\_cloud.yml file

If you've never had a `.dbt` directory, perform the following recommended steps to create one.
If you already have a `.dbt` directory, move the `dbt_cloud.yml` file into it. Some information about the `.dbt` directory:

* A `.dbt` directory is a hidden folder in the root of your filesystem. It's used to store your dbt configuration files. The `.` prefix creates a hidden folder, which means it's not visible in Finder or File Explorer by default.
* To view hidden files and folders, press Command + Shift + G on macOS or Ctrl + Shift + G on Windows. This opens the "Go to Folder" dialog where you can search for the `.dbt` directory.

**Create a .dbt directory**

1. Clone your dbt project repository locally.
2. Use the `mkdir` command followed by the name of the folder you want to create. On macOS, add the `~` prefix to create the `.dbt` folder in the root of your filesystem:

```bash
mkdir ~/.dbt             # macOS
mkdir %USERPROFILE%\.dbt # Windows
```

**Move the dbt\_cloud.yml file**

You can move the `dbt_cloud.yml` file into the `.dbt` directory using the `mv` command, or by dragging and dropping the file: open the Downloads folder using the "Go to Folder" dialog, then drag the file into the `.dbt` directory. To move the file using the terminal, use the `mv` (macOS/Linux) or `move` (Windows) command. This command moves `dbt_cloud.yml` from the `Downloads` folder to the `.dbt` folder. If your `dbt_cloud.yml` file is located elsewhere, adjust the path accordingly.

###### Mac or Linux[​](#mac-or-linux "Direct link to Mac or Linux")

In your command line, use the `mv` command to move your `dbt_cloud.yml` file into the `.dbt` directory. If you've just downloaded the `dbt_cloud.yml` file and it's in your Downloads folder, the command might look something like this:

```bash
mv ~/Downloads/dbt_cloud.yml ~/.dbt/dbt_cloud.yml
```

###### Windows[​](#windows "Direct link to Windows")

In your command line, use the `move` command.
Assuming your file is in the Downloads folder, the command might look like this:

```bash
move %USERPROFILE%\Downloads\dbt_cloud.yml %USERPROFILE%\.dbt\dbt_cloud.yml
```

I can't see the lineage tab in Cursor

If you're using the dbt VS Code extension in Cursor, the lineage tab works best in Editor mode and doesn't render in Agent mode. If you're in Agent mode and the lineage tab isn't rendering, switch to Editor mode to view your project's table and column lineage.

The extension gets stuck in a loading state

If the extension attempts to activate during startup and locks into a permanent loading state, check that:

* Your dbt VS Code extension is on the latest version.
* Your IDE is on the latest version.
* You have a valid `dbt_cloud.yml` file configured and in the [correct location](#register-with-dbt_cloudyml).

If you're still experiencing issues, try these steps before contacting dbt Support:

* Delete and download a new copy of your `dbt_cloud.yml` file.
* Delete and reinstall the dbt VS Code extension.

dbt platform configurations

If you're a cloud-based dbt platform user who has the `dbt-cloud:` config in the `dbt_project.yml` file and are also using dbt Mesh, you must have the project ID configured:

```yaml
dbt-cloud:
  project-id: 12345 # Required
```

If you don’t configure this correctly, cross-platform references will not resolve properly, and you will encounter errors executing dbt commands.

dbt extension not activating

If the dbt extension has activated successfully, you will see the **dbt Extension** label in the status bar at the bottom left of your editor. You can view diagnostic information about the dbt extension by clicking the **dbt Extension** button. If the **dbt Extension** label is not present, it is likely that the dbt extension was not installed successfully. If this happens, try uninstalling the extension, restarting your editor, and then reinstalling the extension.
**Note:** It is possible to "hide" status bar items in VS Code. Double-check whether the dbt Extension status bar label is hidden by right-clicking the status bar in your editor. If you see dbt Extension in the right-click menu, the extension has installed successfully.

Missing dbt LSP features

If you receive a `no active LSP for this workspace` error message or aren't seeing dbt Language Server (LSP) features in your editor (like autocomplete, go-to-definition, or hover text), start by following the general troubleshooting steps mentioned earlier. If you've confirmed the dbt extension is installed correctly but still don't see LSP features, try the following:

1. Check the extension version — ensure that you're using the latest available version of the dbt extension by:
   * Opening the **Extensions** page in your editor, or
   * Going to the **Output** tab and looking for the version number, or
   * Running `dbtf --version` in the terminal.
2. Reinstall the LSP — if the version is correct, reinstall the LSP:
   1. Open the Command Palette: Command + Shift + P (macOS) or Ctrl + Shift + P (Windows/Linux).
   2. Paste `dbt: Reinstall dbt LSP` and press Enter. This command downloads the LSP and re-activates the extension to resolve the error.

Unsupported dbt version

If you see an error message indicating that your version of dbt is unsupported, there is likely a problem with your environment. Check the dbt Path setting in your VS Code settings. If this path is set, ensure that it points to a valid dbt Fusion engine executable. If necessary, you can also install the dbt Fusion engine directly using these instructions: [Install the Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)

Addressing the 'dbt language server is not running in this workspace' error

To resolve the `dbt language server is not running in this workspace` error, you need to add your dbt project folder to a workspace:

1.
In VS Code, click **File** in the toolbar, then select **Add Folder to Workspace**.
2. Select the dbt project folder you want to add to the workspace.
3. To save your workspace, click **File**, then select **Save Workspace As**.
4. Navigate to the location where you want to save your workspace.

This resolves the error by opening your dbt project through the workspace it belongs to. For more information on workspaces, refer to [What is a VS Code workspace?](https://code.visualstudio.com/docs/editing/workspaces/workspaces).

Manifest cannot be downloaded from the dbt platform

If the dbt VS Code extension cannot download the manifest from the dbt platform, or you get `warning: dbt1200: Failed to download manifest` using Fusion locally, you are probably having DNS-related issues. To confirm this, do a DNS lookup for the host Fusion is trying to download from (for example, prodeu2.blob.core.windows.net) by using `dig` on Linux/Mac or `nslookup` on Windows. If this doesn't return an IP address, the likely reason is that your company uses the same cloud provider with private endpoints for cloud resources, and DNS requests for these are forwarded to private DNS zones. This situation can be remedied by setting up an internet fallback, which will then return a public IP for any cloud storage that does not have a private IP registered with the private DNS zone. For Azure, refer to [Fallback to internet for Azure Private DNS zones](https://learn.microsoft.com/en-us/azure/dns/private-dns-fallback).

#### More information about Fusion[​](#more-information-about-fusion "Direct link to More information about Fusion")

Fusion marks a significant update to dbt. While many of the workflows you've grown accustomed to remain unchanged, there are a lot of new ideas, and a lot of old ones going away.
The following is a list of the full scope of our current release of the Fusion engine, including implementation, installation, deprecations, and limitations:

* [About the dbt Fusion engine](https://docs.getdbt.com/docs/fusion/about-fusion.md)
* [About the dbt extension](https://docs.getdbt.com/docs/about-dbt-extension.md)
* [New concepts in Fusion](https://docs.getdbt.com/docs/fusion/new-concepts.md)
* [Supported features matrix](https://docs.getdbt.com/docs/fusion/supported-features.md)
* [Installing Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)
* [Installing VS Code extension](https://docs.getdbt.com/docs/install-dbt-extension.md)
* [Fusion release track](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine)
* [Quickstart for Fusion](https://docs.getdbt.com/guides/fusion.md?step=1)
* [Upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md)
* [Fusion licensing](http://www.getdbt.com/licenses-faq)

---

### Integrate Claude with dbt MCP

Claude is an AI assistant from Anthropic with two primary interfaces:

* [Claude for desktop](https://claude.ai/download): A GUI with MCP support for file access and commands, as well as basic coding features
* [Claude Code](https://www.anthropic.com/claude-code): A terminal/IDE tool for development

#### Claude Desktop[​](#claude-desktop "Direct link to Claude Desktop")

Static subdomains required

Only accounts with static subdomains (for example, `abc123` in `abc123.us1.dbt.com`) can use OAuth with MCP servers.
Follow [these instructions](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) to find your account subdomain. If your account does not have a subdomain, contact support for more information.

To configure Claude Desktop to use the dbt MCP server:

1. Go to the [latest dbt MCP release](https://github.com/dbt-labs/dbt-mcp/releases/latest) and download the `dbt-mcp.mcpb` file.
2. Double-click the downloaded file to open it in Claude Desktop.
3. Configure the **dbt Platform Host**. You can find this in your dbt platform account by navigating to **Account settings** and copying the **Access URL**.
4. Enable the server in Claude Desktop.
5. Ask Claude a data-related question and see dbt MCP in action!

##### Advanced configuration with Claude Desktop[​](#advanced-configuration-with-claude-desktop "Direct link to Advanced configuration with Claude Desktop")

To add advanced configurations:

1. Go to the Claude settings and select **Settings…**.
2. In the Settings window, navigate to the **Developer** tab in the left sidebar. This section contains options for configuring MCP servers and other developer features.
3. Click the **Edit Config** button and open the configuration file with a text editor.
4. Add your server configuration based on your use case.
Choose the [correct JSON structure](https://modelcontextprotocol.io/quickstart/user#installing-the-filesystem-server) from the following options:

Local MCP with OAuth

###### Local MCP with dbt platform authentication [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#local-mcp-with-dbt-platform-authentication- "Direct link to local-mcp-with-dbt-platform-authentication-")

Configuration for users who want seamless OAuth authentication with the dbt platform.

**dbt platform only**

This option is for users who only want dbt platform features (Discovery API, Semantic Layer, job management) without local CLI commands. When you use only the dbt platform, the CLI tools are automatically disabled. You can find the `DBT_HOST` field value in your dbt platform account information under **Access URLs**.

```json
{
  "mcpServers": {
    "dbt": {
      "command": "uvx",
      "args": ["dbt-mcp"],
      "env": {
        "DBT_HOST": "https://<your-host>"
      }
    }
  }
}
```

**Note:** Replace `<your-host>` with your actual host (for example, `abc123.us1.dbt.com`). This enables OAuth authentication without requiring a local dbt installation.

**dbt platform + CLI**

This option is for users who want both dbt CLI commands and dbt platform features (Discovery API, Semantic Layer, job management). The `DBT_PROJECT_DIR` and `DBT_PATH` fields are required for CLI access. You can find the `DBT_HOST` field value in your dbt platform account information under **Access URLs**.

```json
{
  "mcpServers": {
    "dbt": {
      "command": "uvx",
      "args": ["dbt-mcp"],
      "env": {
        "DBT_HOST": "https://<your-host>",
        "DBT_PROJECT_DIR": "/path/to/project",
        "DBT_PATH": "/path/to/dbt/executable"
      }
    }
  }
}
```

**Note:** Replace `<your-host>` with your actual host (for example, `https://abc123.us1.dbt.com`). This enables OAuth authentication.
Local MCP (CLI only) Local configuration for users who only want to use dbt CLI commands with dbt Core or Fusion ```json { "mcpServers": { "dbt": { "command": "uvx", "args": ["dbt-mcp"], "env": { "DBT_PROJECT_DIR": "/path/to/your/dbt/project", "DBT_PATH": "/path/to/your/dbt/executable" } } } } ``` Finding your paths: * **DBT\_PROJECT\_DIR**: Full path to the folder containing your `dbt_project.yml` file * **DBT\_PATH**: Find by running `which dbt` in Terminal (macOS/Linux) or `where dbt` in PowerShell (Windows)  Local MCP with .env Advanced configuration for users who need custom environment variables Using the `env` field (recommended): ```json { "mcpServers": { "dbt": { "command": "uvx", "args": ["dbt-mcp"], "env": { "DBT_HOST": "cloud.getdbt.com", "DBT_TOKEN": "your-token-here", "DBT_PROD_ENV_ID": "12345", "DBT_PROJECT_DIR": "/path/to/project", "DBT_PATH": "/path/to/dbt" } } } } ``` Using an .env file (alternative): ```json { "mcpServers": { "dbt": { "command": "uvx", "args": ["--env-file", "/path/to/.env", "dbt-mcp"] } } } ``` 5. Save the file. Upon a successful restart of Claude Desktop, you'll see an MCP server indicator in the bottom-right corner of the conversation input box. For debugging, you can find the Claude Desktop logs at `~/Library/Logs/Claude` on macOS or `%APPDATA%\Claude\logs` on Windows. #### Claude Code[​](#claude-code "Direct link to Claude Code") You can set up Claude Code with both the local and remote `dbt-mcp` server. We recommend using the local `dbt-mcp` for more developer-focused workloads. See the [About MCP](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md#server-access) page for more information about local and remote server features. ##### Set up with local dbt MCP server[​](#set-up-with-local-dbt-mcp-server "Direct link to Set up with local dbt MCP server") Prerequisites: * Complete the [local MCP setup](https://docs.getdbt.com/docs/dbt-ai/setup-local-mcp.md). 
* Know your configuration method (OAuth or environment variables) In your Claude Code setup, run one of these commands based on your use case. Be sure to update the commands for your specific needs: * CLI only * OAuth with dbt platform For dbt Core or Fusion only (no dbt platform account): ```shell claude mcp add dbt \ -e DBT_PROJECT_DIR=/path/to/your/dbt/project \ -e DBT_PATH=/path/to/your/dbt/executable \ -- uvx dbt-mcp ``` For OAuth authentication (requires a static subdomain). Find your static subdomain [here](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md): ```shell claude mcp add dbt \ -e DBT_HOST=your-host-with-subdomain \ -e DBT_PROJECT_DIR=/path/to/your/dbt/project \ -e DBT_PATH=/path/to/your/dbt/executable \ -- uvx dbt-mcp ``` Replace `your-host-with-subdomain`, `path/to/your/dbt/project`, and `path/to/your/dbt/executable` with your actual static subdomain, project path, and dbt executable path. For example, if your static subdomain is `abc123.us1.dbt.com`, your command would look like this: ```shell claude mcp add dbt \ -e DBT_HOST=abc123.us1.dbt.com \ -e DBT_PROJECT_DIR=/path/to/your/dbt/project \ -e DBT_PATH=/path/to/your/dbt/executable \ -- uvx dbt-mcp ``` ###### Using an `.env` file[​](#using-an-env-file "Direct link to using-an-env-file") If you prefer to manage environment variables in a separate file, you can use the `--env-file` parameter from `uvx`: ```bash claude mcp add dbt -- uvx --env-file <env-file-path> dbt-mcp ``` Replace `<env-file-path>` with the full path to your `.env` file. #### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting") * Claude Desktop may return errors such as `Error: spawn uvx ENOENT` or `Could not connect to MCP server dbt-mcp`. Try replacing the command and environment variables file path with the full path. For `uvx`, find the full path to `uvx` by running `which uvx` on Unix systems and placing this full path in the JSON. 
For instance: `"command": "/the/full/path/to/uvx"`. --- ### Integrate Cursor with dbt MCP [Cursor](https://docs.cursor.com/context/model-context-protocol) is an AI-powered code editor built on Microsoft Visual Studio Code (VS Code). After setting up your MCP server, you connect it to Cursor. Log in to Cursor and follow the steps that align with your use case. #### Set up with local dbt MCP server[​](#set-up-with-local-dbt-mcp-server "Direct link to Set up with local dbt MCP server") Choose your setup based on your workflow: * OAuth for dbt platform connections * CLI only if using dbt Core or the dbt Fusion engine locally. * Configure environment variables if you're using them in your dbt platform account. ##### OAuth or CLI[​](#oauth-or-cli "Direct link to OAuth or CLI") Click one of the following application links with Cursor open to automatically configure your MCP server: * CLI only (dbt Core and Fusion) * OAuth with dbt platform Local configuration for users who only want to use dbt CLI commands with dbt Core or the dbt Fusion engine (no dbt platform features). [Add dbt Core or Fusion to Cursor](cursor://anysphere.cursor-deeplink/mcp/install?name=dbt\&config=eyJlbnYiOnsiREJUX1BST0pFQ1RfRElSIjoiL3BhdGgvdG8veW91ci9kYnQvcHJvamVjdCIsIkRCVF9QQVRIIjoiL3BhdGgvdG8veW91ci9kYnQvZXhlY3V0YWJsZSJ9LCJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYnQtbWNwIl19) After clicking: 1. Update `DBT_PROJECT_DIR` with the full path to your dbt project (the folder containing `dbt_project.yml`). 2. Update `DBT_PATH` with the full path to your dbt executable: * macOS/Linux: Run `which dbt` in Terminal. * Windows: Run `where dbt` in Command Prompt or PowerShell. 3. 
Save the configuration. Configuration settings for users who want OAuth authentication with the dbt platform [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") * [dbt platform only](cursor://anysphere.cursor-deeplink/mcp/install?name=dbt\&config=eyJlbnYiOnsiREJUX0hPU1QiOiJodHRwczovLzx5b3VyLWRidC1ob3N0LXdpdGgtY3VzdG9tLXN1YmRvbWFpbj4iLCJESVNBQkxFX0RCVF9DTEkiOiJ0cnVlIn0sImNvbW1hbmQiOiJ1dngiLCJhcmdzIjpbImRidC1tY3AiXX0%3D) * [dbt platform + CLI](cursor://anysphere.cursor-deeplink/mcp/install?name=dbt\&config=eyJlbnYiOnsiREJUX0hPU1QiOiJodHRwczovLzx5b3VyLWRidC1ob3N0LXdpdGgtY3VzdG9tLXN1YmRvbWFpbj4iLCJEQlRfUFJPSkVDVF9ESVIiOiIvcGF0aC90by9wcm9qZWN0IiwiREJUX1BBVEgiOiJwYXRoL3RvL2RidC9leGVjdXRhYmxlIn0sImNvbW1hbmQiOiJ1dngiLCJhcmdzIjpbImRidC1tY3AiXX0%3D) After clicking: 1. Replace `<your-dbt-host-with-custom-subdomain>` with your actual host (for example, `abc123.us1.dbt.com`). 2. (For dbt platform + CLI) Update `DBT_PROJECT_DIR` and `DBT_PATH` as described above. 3. Save the configuration. ##### Custom environment variables[​](#custom-environment-variables "Direct link to Custom environment variables") If you need custom environment variable configuration or prefer to use service tokens: 1. Click the following link with Cursor open: [Add to Cursor](cursor://anysphere.cursor-deeplink/mcp/install?name=dbt\&config=eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYnQtbWNwIl0sImVudiI6e319) 2. In the template, add your environment variables to the `env` section based on your needs. 3. Save the configuration. ###### Using an `.env` file[​](#using-an-env-file "Direct link to using-an-env-file") If you prefer to manage environment variables in a separate file, click this link: [Add to Cursor (with .env file)](cursor://anysphere.cursor-deeplink/mcp/install?name=dbt-mcp\&config=eyJjb21tYW5kIjoidXZ4IC0tZW52LWZpbGUgPGVudi1maWxlLXBhdGg%252BIGRidC1tY3AifQ%3D%3D) Then update `<env-file-path>` with the full path to your `.env` file. 
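For reference, the `.env` file you point `--env-file` at is a plain list of `KEY=VALUE` lines. A hypothetical example using the same variables shown in the configuration templates above (all values are placeholders):

```shell
# Example .env file: replace every value with your own
DBT_HOST=cloud.getdbt.com
DBT_TOKEN=your-token-here
DBT_PROD_ENV_ID=12345
DBT_PROJECT_DIR=/path/to/project
DBT_PATH=/path/to/dbt
```

Keep this file out of version control, since it contains your service token.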
#### Set up with remote dbt MCP server[​](#set-up-with-remote-dbt-mcp-server "Direct link to Set up with remote dbt MCP server") 1. Click the following application link with Cursor open: [Add to Cursor](cursor://anysphere.cursor-deeplink/mcp/install?name=dbt\&config=eyJ1cmwiOiJodHRwczovLzxob3N0Pi9hcGkvYWkvdjEvbWNwLyIsImhlYWRlcnMiOnsiQXV0aG9yaXphdGlvbiI6InRva2VuIDx0b2tlbj4iLCJ4LWRidC1wcm9kLWVudmlyb25tZW50LWlkIjoiPHByb2QtaWQ%252BIn19) 2. Provide your URL/headers by updating the **host**, **production environment ID**, and **service token** in the template. 3. Save, and now you have access to the dbt MCP server! --- ### Integrate VS Code with MCP [Microsoft Visual Studio Code (VS Code)](https://code.visualstudio.com/mcp) is a powerful and popular integrated development environment (IDE). These instructions are for integrating dbt MCP and VS Code. Before starting, ensure you have: * Completed the [local MCP setup](https://docs.getdbt.com/docs/dbt-ai/setup-local-mcp.md) * Installed VS Code with the latest updates * (For local MCP with CLI) Configured your dbt project paths #### Set up with local dbt MCP server[​](#set-up-with-local-dbt-mcp-server "Direct link to Set up with local dbt MCP server") To get started, in VS Code: 1. Open the **Settings** menu and select the correct tab atop the page for your use case: * **Workspace**: Configures the server in the context of your workspace * **User**: Configures the server in the context of your user
**Note for WSL users**: If you're using VS Code with Windows Subsystem for Linux (WSL), you'll need to configure WSL-specific settings. Run the **Preferences: Open Remote Settings** command from the **Command Palette** (F1) or select the **Remote** tab in the **Settings** editor. Local user settings are reused in WSL but can be overridden with WSL-specific settings. Configuring MCP servers in the local user settings will not work properly in a WSL environment. 2. Select **Features** --> **Chat** 3. Ensure that **MCP** is **Enabled** [![mcp-vscode-settings](/img/mcp/vscode_mcp_enabled_image.png?v=2 "mcp-vscode-settings")](#)mcp-vscode-settings 4. Open the command palette `Control/Command + Shift + P`, and select either: * **MCP: Open Workspace Folder MCP Configuration** — if you want to install the MCP server for this workspace * **MCP: Open User Configuration** — if you want to install the MCP server for the user 5. Add your server configuration (`dbt`) to the provided `mcp.json` file as one of the servers:  Local MCP with dbt platform OAuth Local MCP with OAuth is for users who want to use the dbt platform features. Choose your configuration based on your use case: * dbt platform only * dbt platform + CLI This option is for users who only want dbt platform features (Discovery API, Semantic Layer, job management) without local CLI commands. When you use only the dbt platform, the CLI tools are automatically disabled. You can find the `DBT_HOST` field value in your dbt platform account information under **Access URLs**. ```json { "servers": { "dbt": { "command": "uvx", "args": ["dbt-mcp"], "env": { "DBT_HOST": "https://<your-dbt-host>" } } } } ``` **Note:** Replace `<your-dbt-host>` with your actual host (for example, `abc123.us1.dbt.com`). This enables OAuth authentication without requiring local dbt installation. This option is for users who want both dbt CLI commands and dbt platform features (Discovery API, Semantic Layer, job management). 
The `DBT_PROJECT_DIR` and `DBT_PATH` fields are required for CLI access. You can find the `DBT_HOST` field value in your dbt platform account information under **Access URLs**. ```json { "servers": { "dbt": { "command": "uvx", "args": ["dbt-mcp"], "env": { "DBT_HOST": "https://<your-dbt-host>", "DBT_PROJECT_DIR": "/path/to/project", "DBT_PATH": "/path/to/dbt/executable" } } } } ``` **Note:** Replace `<your-dbt-host>` with your actual host (for example, `abc123.us1.dbt.com`). This enables OAuth authentication.  Local MCP (CLI only) For users who only want to use dbt CLI commands with dbt Core or Fusion ```json { "servers": { "dbt": { "command": "uvx", "args": ["dbt-mcp"], "env": { "DBT_PROJECT_DIR": "/path/to/your/dbt/project", "DBT_PATH": "/path/to/your/dbt/executable" } } } } ``` **Finding your paths:** * **DBT\_PROJECT\_DIR**: Full path to the folder containing your `dbt_project.yml` file * macOS/Linux: Run `pwd` from your project folder. * Windows: Run `cd` from your project folder in Command Prompt. * **DBT\_PATH**: Path to dbt executable * macOS/Linux: Run `which dbt`. * Windows: Run `where dbt`.  Local MCP with .env For advanced users who need custom environment variables or service token authentication Using the `env` field (recommended: single-file configuration): ```json { "servers": { "dbt": { "command": "uvx", "args": ["dbt-mcp"], "env": { "DBT_HOST": "cloud.getdbt.com", "DBT_TOKEN": "your-token-here", "DBT_PROD_ENV_ID": "12345", "DBT_PROJECT_DIR": "/path/to/project", "DBT_PATH": "/path/to/dbt" } } } } ``` Using an `.env` file (alternative: two-file configuration): ```json { "servers": { "dbt": { "command": "uvx", "args": ["--env-file", "/path/to/.env", "dbt-mcp"] } } } ``` 6. You can start, stop, and configure your MCP servers by: * Running the `MCP: List Servers` command from the Command Palette (Control/Command + Shift + P) and selecting the server. * Using the keywords inline within the `mcp.json` file. 
[![VS Code inline management](/img/mcp/vscode_run_server_keywords_inline.png?v=2 "VS Code inline management")](#)VS Code inline management Now, you can access the dbt MCP server in VS Code through interfaces like GitHub Copilot. #### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting") This section contains troubleshooting steps for errors you might encounter when integrating VS Code with MCP.  Cannot find \`uvx\` executable If you see errors like `Could not connect to MCP server dbt` or `spawn uvx ENOENT`, VS Code may be unable to find the `uvx` executable. To resolve, use the full path to `uvx` in your configuration: 1. Find the full path: * macOS/Linux: Run `which uvx` in Terminal. * Windows: Run `where uvx` in Command Prompt or PowerShell. 2. Update your `mcp.json` to use the full path: ```json { "servers": { "dbt": { "command": "/full/path/to/uvx", "args": ["dbt-mcp"], "env": { ... } } } } ``` Example on macOS with Homebrew: `"command": "/opt/homebrew/bin/uvx"`  Configuration not working in WSL If you're using VS Code with Windows Subsystem for Linux (WSL), make sure you've configured the MCP server in the WSL-specific settings, not the local user settings. Use the **Remote** tab in the Settings editor or run **Preferences: Open Remote Settings** from the Command Palette.  Server not starting Check the MCP server status: 1. Run `MCP: List Servers` from the Command Palette (Control/Command + Shift + P). 2. Look for any error messages next to the dbt server. 3. Click on the server to see detailed logs. Common issues: * Missing or incorrect paths for `DBT_PROJECT_DIR` or `DBT_PATH` * Invalid authentication tokens * Missing required environment variables #### Resources[​](#resources "Direct link to Resources") * [Microsoft VS Code MCP documentation](https://code.visualstudio.com/docs/copilot/chat/mcp-servers) 
--- ### Integrate with other orchestration tools Alongside [dbt](https://docs.getdbt.com/docs/deploy/jobs.md), discover other ways to schedule and run your dbt jobs with the help of tools such as the ones described on this page. Build and install these tools to automate your data workflows, trigger dbt jobs (including those hosted on dbt), and enjoy a hassle-free experience, saving time and increasing efficiency. #### Airflow[​](#airflow "Direct link to Airflow") If your organization uses [Airflow](https://airflow.apache.org/), there are a number of ways you can run your dbt jobs, including: * dbt platform * dbt Core Installing the [dbt Provider](https://airflow.apache.org/docs/apache-airflow-providers-dbt-cloud/stable/index.html) to orchestrate dbt jobs. This package contains multiple Hooks, Operators, and Sensors to complete various actions within dbt. [![Airflow DAG using DbtCloudRunJobOperator](/img/docs/running-a-dbt-project/airflow_dbt_connector.png?v=2 "Airflow DAG using DbtCloudRunJobOperator")](#)Airflow DAG using DbtCloudRunJobOperator [![dbt job triggered by Airflow](/img/docs/running-a-dbt-project/dbt_cloud_airflow_trigger.png?v=2 "dbt job triggered by Airflow")](#)dbt job triggered by Airflow Invoking dbt Core jobs through the [BashOperator](https://registry.astronomer.io/providers/apache-airflow/modules/bashoperator). In this case, be sure to install dbt into a virtual environment to avoid issues with conflicting dependencies between Airflow and dbt. For more details on both of these methods, including example implementations, check out [this guide](https://docs.astronomer.io/learn/airflow-dbt-cloud). 
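Several of the tools on this page (Azure Data Factory, Databricks workflows, Orchestra) work by calling the dbt jobs API directly. As a rough sketch of what that call looks like, here's how you might prepare the trigger request with Python's standard library; the host, account ID, job ID, and token values are placeholders, and the `v2` trigger-run endpoint shape (`POST /api/v2/accounts/{account_id}/jobs/{job_id}/run/`) is assumed from the dbt API reference:

```python
import json
import urllib.request

def build_trigger_request(host, account_id, job_id, token, cause):
    """Prepare (but don't send) a POST request to trigger a dbt job run."""
    url = f"{host}/api/v2/accounts/{account_id}/jobs/{job_id}/run/"
    body = json.dumps({"cause": cause}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # Service token authentication, as used in the ADF steps below
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder values: substitute your own account ID, job ID, and token.
req = build_trigger_request(
    "https://YOUR_ACCESS_URL", 88888, 123456, "your-service-token",
    "Triggered by orchestration tool",
)
# Send with: urllib.request.urlopen(req)
```

The returned run ID can then be polled for status, which is what most of the integrations below do under the hood.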
#### Automation servers[​](#automation-servers "Direct link to Automation servers") Automation servers (such as CodeDeploy, GitLab CI/CD ([video](https://youtu.be/-XBIIY2pFpc?t=1301)), Bamboo, and Jenkins) can be used to schedule bash commands for dbt. They also provide a UI for viewing command-line logs, and integrate with your git repository. #### Azure Data Factory[​](#azure-data-factory "Direct link to Azure Data Factory") Integrate dbt and [Azure Data Factory](https://learn.microsoft.com/en-us/azure/data-factory/) (ADF) for a smooth workflow from data ingestion to data transformation. You can seamlessly trigger dbt jobs upon completion of ingestion jobs by using the [dbt API](https://docs.getdbt.com/docs/dbt-cloud-apis/overview.md) in ADF. The following video provides you with a detailed overview of how to trigger a dbt job via the API in Azure Data Factory. To use the dbt API to trigger a job in dbt through ADF: 1. In dbt, go to the job settings of the daily production job and turn off the scheduled run in the **Trigger** section. 2. Create a pipeline in ADF to trigger the dbt job. 3. Securely fetch the dbt service token from a key vault in ADF, using a web call as the first step in the pipeline. 4. Set the parameters in the pipeline, including the dbt account ID and job ID, as well as the name of the key vault and secret that contains the service token. * You can find the dbt account and job IDs in the URL. For example, if your URL is `https://YOUR_ACCESS_URL/deploy/88888/projects/678910/jobs/123456`, the account ID is 88888 and the job ID is 123456. 5. Trigger the pipeline in ADF to start the dbt job and monitor the status of the dbt job in ADF. 6. In dbt, check the status of the job and how it was triggered. #### Cron[​](#cron "Direct link to Cron") Cron is a decent way to schedule bash commands. 
However, while it may seem like an easy route to schedule a job, writing code to take care of all of the additional features associated with a production deployment often makes this route more complex compared to other options listed here. #### Dagster[​](#dagster "Direct link to Dagster") If your organization uses [Dagster](https://dagster.io/), you can use the [dagster\_dbt](https://docs.dagster.io/_apidocs/libraries/dagster-dbt) library to integrate dbt commands into your pipelines. This library supports executing dbt through both the dbt platform and dbt Core. Running dbt from Dagster automatically aggregates metadata about your dbt runs. Refer to the [example pipeline](https://dagster.io/blog/dagster-dbt) for details. #### Databricks workflows[​](#databricks-workflows "Direct link to Databricks workflows") Use Databricks workflows to call the dbt job API, which has several benefits such as integration with other ETL processes, utilizing dbt job features, separation of concerns, and job triggering based on custom conditions or logic. These advantages lead to more modularity, efficient debugging, and flexibility in scheduling dbt jobs. For more info, refer to the guide on [Databricks workflows and dbt jobs](https://docs.getdbt.com/guides/how-to-use-databricks-workflows-to-run-dbt-cloud-jobs.md). #### Kestra[​](#kestra "Direct link to Kestra") If your organization uses [Kestra](http://kestra.io/), you can leverage the [dbt plugin](https://kestra.io/plugins/plugin-dbt) to orchestrate dbt platform and dbt Core jobs. Kestra's user interface (UI) has built-in [Blueprints](https://kestra.io/docs/user-interface-guide/blueprints), providing ready-to-use workflows. Navigate to the Blueprints page in the left navigation menu and [select the dbt tag](https://demo.kestra.io/ui/blueprints/community?selectedTag=36) to find several examples of scheduling dbt Core commands and dbt jobs as part of your data pipelines. 
After each scheduled or ad-hoc workflow execution, the Outputs tab in the Kestra UI allows you to download and preview all dbt build artifacts. The Gantt and Topology views additionally render the metadata to visualize dependencies and runtimes of your dbt models and tests. The dbt task provides convenient links to easily navigate between the Kestra and dbt UIs. #### Orchestra[​](#orchestra "Direct link to Orchestra") If your organization uses [Orchestra](https://getorchestra.io), you can trigger dbt jobs using the dbt API. Create an API token from your dbt account and use this to authenticate Orchestra in the [Orchestra Portal](https://app.getorchestra.io). For details, refer to the [Orchestra docs on dbt](https://orchestra-1.gitbook.io/orchestra-portal/integrations/transformation/dbt-cloud). Orchestra automatically collects metadata from your runs so you can view your dbt jobs in the context of the rest of your data stack. The following is an example of the run details in dbt for a job triggered by Orchestra: [![Example of Orchestra triggering a dbt job](/img/docs/running-a-dbt-project/dbt_cloud_orchestra_trigger.png?v=2 "Example of Orchestra triggering a dbt job")](#)Example of Orchestra triggering a dbt job The following is an example of viewing lineage in Orchestra for dbt jobs: [![Example of a lineage view for dbt jobs in Orchestra](/img/docs/running-a-dbt-project/orchestra_lineage_dbt_cloud.png?v=2 "Example of a lineage view for dbt jobs in Orchestra")](#)Example of a lineage view for dbt jobs in Orchestra #### Prefect[​](#prefect "Direct link to Prefect") If your organization uses [Prefect](https://www.prefect.io/), how you run your jobs depends on the dbt version you're on and whether you're orchestrating dbt platform or dbt Core jobs. 
Refer to the following variety of options: [![Prefect DAG using a dbt job run flow](/img/docs/running-a-dbt-project/prefect_dag_dbt_cloud.jpg?v=2 "Prefect DAG using a dbt job run flow")](#)Prefect DAG using a dbt job run flow ##### Prefect 2[​](#prefect-2 "Direct link to Prefect 2") * dbt platform * dbt Core - Use the [trigger\_dbt\_cloud\_job\_run\_and\_wait\_for\_completion](https://prefecthq.github.io/prefect-dbt/cloud/jobs/#prefect_dbt.cloud.jobs.trigger_dbt_cloud_job_run_and_wait_for_completion) flow. - As jobs are executing, you can poll dbt to see whether or not the job completes without failures, through the [Prefect user interface (UI)](https://docs.prefect.io/ui/overview/). * Use the [trigger\_dbt\_cli\_command](https://prefecthq.github.io/prefect-dbt/cli/commands/#prefect_dbt.cli.commands.trigger_dbt_cli_command) task. * For details on both of these methods, see [prefect-dbt docs](https://prefecthq.github.io/prefect-dbt/). ##### Prefect 1[​](#prefect-1 "Direct link to Prefect 1") * dbt platform * dbt Core - Trigger dbt jobs with the [DbtCloudRunJob](https://docs.prefect.io/api/latest/tasks/dbt.html#dbtcloudrunjob) task. - Running this task will generate a markdown artifact viewable in the Prefect UI. - The artifact will contain links to the dbt artifacts generated as a result of the job run. * Use the [DbtShellTask](https://docs.prefect.io/api/latest/tasks/dbt.html#dbtshelltask) to schedule, execute, and monitor your dbt runs. * Use the supported [ShellTask](https://docs.prefect.io/api/latest/tasks/shell.html#shelltask) to execute dbt commands through the shell. 
#### Related docs[​](#related-docs "Direct link to Related docs") * [dbt plans and pricing](https://www.getdbt.com/pricing/) * [Quickstart guides](https://docs.getdbt.com/guides.md) * [Webhooks for your jobs](https://docs.getdbt.com/docs/deploy/webhooks.md) * [Orchestration guides](https://docs.getdbt.com/guides.md) * [Commands for your production deployment](https://discourse.getdbt.com/t/what-are-the-dbt-commands-you-run-in-your-production-deployment-of-dbt/366) --- ### Jinja and macros #### Related reference docs[​](#related-reference-docs "Direct link to Related reference docs") * [Jinja Template Designer Documentation](https://jinja.palletsprojects.com/page/templates/) (external link) * [dbt Jinja context](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md) * [Macro properties](https://docs.getdbt.com/reference/macro-properties.md) #### Overview[​](#overview "Direct link to Overview") In dbt, you can combine SQL with [Jinja](https://jinja.palletsprojects.com), a templating language. Using Jinja turns your dbt project into a programming environment for SQL, giving you the ability to do things that aren't normally possible in SQL. It's important to note that Jinja itself isn't a programming language; instead, it acts as a tool to enhance and extend the capabilities of SQL within your dbt projects. 
For example, with Jinja, you can: * Use control structures (for example, `if` statements and `for` loops) in SQL * Use [environment variables](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md) in your dbt project for production deployments * Change the way your project builds based on the current target. * Operate on the results of one query to generate another query, for example: * Return a list of payment methods, to create a subtotal column per payment method (pivot) * Return a list of columns in two relations, and select them in the same order to make it easier to union them together * Abstract snippets of SQL into reusable [**macros**](#macros) — these are analogous to functions in most programming languages. If you've used the [`{{ ref() }}` function](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md), you're already using Jinja! Jinja can be used in any SQL in a dbt project, including [models](https://docs.getdbt.com/docs/build/sql-models.md), [analyses](https://docs.getdbt.com/docs/build/analyses.md), [data tests](https://docs.getdbt.com/docs/build/data-tests.md), and even [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md). Ready to get started with Jinja and macros? Check out the [tutorial on using Jinja](https://docs.getdbt.com/guides/using-jinja.md) for a step-by-step example of using Jinja in a model, and turning it into a macro! 
#### Getting started[​](#getting-started "Direct link to Getting started") ##### Jinja[​](#jinja "Direct link to Jinja") Here's an example of a dbt model that leverages Jinja: /models/order\_payment\_method\_amounts.sql ```sql {% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %} select order_id, {% for payment_method in payment_methods %} sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount, {% endfor %} sum(amount) as total_amount from app_data.payments group by 1 ``` This query will get compiled to: /models/order\_payment\_method\_amounts.sql ```sql select order_id, sum(case when payment_method = 'bank_transfer' then amount end) as bank_transfer_amount, sum(case when payment_method = 'credit_card' then amount end) as credit_card_amount, sum(case when payment_method = 'gift_card' then amount end) as gift_card_amount, sum(amount) as total_amount from app_data.payments group by 1 ``` You can recognize Jinja based on the delimiters the language uses, which we refer to as "curlies": * **Expressions `{{ ... }}`**: Expressions are used when you want to output a string. You can use expressions to reference [variables](https://docs.getdbt.com/reference/dbt-jinja-functions/var.md) and call [macros](https://docs.getdbt.com/docs/build/jinja-macros.md#macros). * **Statements `{% ... %}`**: Statements don't output a string. They are used for control flow, for example, to set up `for` loops and `if` statements, to [set](https://jinja.palletsprojects.com/en/3.1.x/templates/#assignments) or [modify](https://jinja.palletsprojects.com/en/3.1.x/templates/#expression-statement) variables, or to define macros. * **Comments `{# ... #}`**: Jinja comments are used to prevent the text within the comment from executing or outputting a string. Don't use `--` to comment out Jinja, since SQL comments don't prevent Jinja from executing. When used in a dbt model, your Jinja needs to compile to a valid query. 
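To see all three delimiter types side by side, here's a toy snippet (the table and variable names are illustrative):

```sql
{# This comment is stripped from the compiled SQL #}
{% set status_filter = 'completed' %}

select *
from app_data.orders
where status = '{{ status_filter }}'
```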
To check what SQL your Jinja compiles to: * **Using the dbt platform:** Click the compile button to see the compiled SQL in the Compiled SQL pane * **Using dbt Core:** Run `dbt compile` from the command line. Then open the compiled SQL file in the `target/compiled/{project name}/` directory. Use a split screen in your code editor to keep both files open at once. ##### Macros[​](#macros "Direct link to Macros") [Macros](https://docs.getdbt.com/docs/build/jinja-macros.md) in Jinja are pieces of code that can be reused multiple times – they are analogous to "functions" in other programming languages, and are extremely useful if you find yourself repeating code across multiple models. Macros are defined in `.sql` files, typically in your `macros` directory ([docs](https://docs.getdbt.com/reference/project-configs/macro-paths.md)). Macro files can contain one or more macros — here's an example: macros/cents\_to\_dollars.sql ```sql {% macro cents_to_dollars(column_name, scale=2) %} ({{ column_name }} / 100)::numeric(16, {{ scale }}) {% endmacro %} ``` A model which uses this macro might look like: models/stg\_payments.sql ```sql select id as payment_id, {{ cents_to_dollars('amount') }} as amount_usd, ... from app_data.payments ``` This would be *compiled* to: target/compiled/models/stg\_payments.sql ```sql select id as payment_id, (amount / 100)::numeric(16, 2) as amount_usd, ... from app_data.payments ``` 💡 Use Jinja's whitespace control to tidy your macros! When you're modifying macros in your project, you might notice extra white space in your code in the `target/compiled` folder. You can remove unwanted spaces and lines with Jinja's [whitespace control](https://docs.getdbt.com/faqs/Jinja/jinja-whitespace.md) by using a minus sign. For example, use `{{- ... -}}` or `{%- ... %}` around your macro definitions (such as `{%- macro generate_schema_name(...) -%} ... {%- endmacro -%}`). 
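Applied to the `cents_to_dollars` macro shown above, whitespace control looks like this (same compiled behavior, tidier output):

```sql
{%- macro cents_to_dollars(column_name, scale=2) -%}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{%- endmacro -%}
```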
##### Using a macro from a package[​](#using-a-macro-from-a-package "Direct link to Using a macro from a package") A number of useful macros have also been grouped together into [packages](https://docs.getdbt.com/docs/build/packages.md) — our most popular package is [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/). After installing a package into your project, you can use any of the macros in your own project — make sure you qualify the macro by prefixing it with the [package name](https://docs.getdbt.com/reference/dbt-jinja-functions/project_name.md): ```sql select field_1, field_2, field_3, field_4, field_5, count(*) from my_table {{ dbt_utils.group_by(5) }} ``` You can also qualify a macro in your own project by prefixing it with your [package name](https://docs.getdbt.com/reference/dbt-jinja-functions/project_name.md) (this is mainly useful for package authors). #### FAQs[​](#faqs "Direct link to FAQs") What parts of Jinja are dbt-specific? There are certain expressions that are specific to dbt — these are documented in the [Jinja function reference](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md) section of these docs. Further, docs blocks, snapshots, and materializations are custom Jinja *blocks* that exist only in dbt. Which docs should I use when writing Jinja or creating a macro? If you are stuck with a Jinja issue, it can be confusing to know where to look for more information. We recommend you check (in order): 1. [Jinja's Template Designer Docs](https://jinja.palletsprojects.com/page/templates/): This is the best reference for most of the Jinja you'll use 2. [Our Jinja function reference](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md): This documents any additional functionality we've added to Jinja in dbt. 3. [Agate's table docs](https://agate.readthedocs.io/page/api/table.html): If you're operating on the result of a query, dbt will pass it back to you as an agate table. 
This means that the methods you call on the table belong to the Agate library rather than Jinja or dbt.

**Why do I need to quote column names in Jinja?**

In the [macro example](https://docs.getdbt.com/docs/build/jinja-macros.md#macros) we passed the column name `amount` in quotes:

```sql
{{ cents_to_dollars('amount') }} as amount_usd
```

We have to use quotes to pass the *string* `'amount'` to the macro. Without the quotes, the Jinja parser will look for a variable named `amount`. Since this doesn't exist, it will compile to nothing.

Quoting in Jinja can take a while to get used to! The rule is that if you're within a Jinja expression or statement (i.e. within `{% ... %}` or `{{ ... }}`), you'll need to use quotes for any arguments that are strings. Single and double quotes are equivalent in Jinja – just make sure you match them appropriately.

And if you do need to pass a variable as an argument, make sure you [don't nest your curlies](https://docs.getdbt.com/best-practices/dont-nest-your-curlies.md).

**My compiled SQL has a lot of spaces and new lines, how can I get rid of it?**

This is known as "whitespace control". Use a minus sign (`-`, e.g. `{{- ... -}}`, `{%- ... %}`, `{#- ... -#}`) at the start or end of a block to strip whitespace before or after the block (more docs [here](https://jinja.palletsprojects.com/page/templates/#whitespace-control)). Check out the [tutorial on using Jinja](https://docs.getdbt.com/guides/using-jinja.md#use-whitespace-control-to-tidy-up-compiled-code) for an example. Take caution: it's easy to fall down a rabbit hole when it comes to whitespace control!

**How do I debug my Jinja?**

You should get familiar with checking the compiled SQL in `target/compiled/{project name}/` and the logs in `logs/dbt.log` to see what dbt is running behind the scenes. You can also use the [log](https://docs.getdbt.com/reference/dbt-jinja-functions/log.md) function to debug Jinja by printing objects to the command line.

**How do I document macros?**
To document macros, use a [properties file](https://docs.getdbt.com/reference/macro-properties.md) and nest the configurations under a `macros:` key.

#### Example[​](#example "Direct link to Example")

macros/properties.yml

```yml
macros:
  - name: cents_to_dollars
    description: A macro to convert cents to dollars
    arguments:
      - name: column_name
        type: column
        description: The name of the column you want to convert
      - name: scale
        type: integer
        description: Number of decimal places. Defaults to 2.
```

tip

From dbt Core v1.10, you can opt into validating the arguments you define in macro documentation using the `validate_macro_args` behavior change flag. When enabled, dbt will:

* Infer arguments from the macro and include them in the [manifest.json](https://docs.getdbt.com/reference/artifacts/manifest-json.md) file if no arguments are documented.
* Raise a warning if documented argument names don't match the macro definition.
* Raise a warning if `type` fields don't follow [supported formats](https://docs.getdbt.com/reference/resource-properties/arguments.md#supported-types).

Learn more about [macro argument validation](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#macro-argument-validation).

#### Document a custom materialization[​](#document-a-custom-materialization "Direct link to Document a custom materialization")

When you create a [custom materialization](https://docs.getdbt.com/guides/create-new-materializations.md), dbt creates an associated macro with the following format:

```text
materialization_{materialization_name}_{adapter}
```

To document a custom materialization, use the previously mentioned format to determine the associated macro name(s) to document.

macros/properties.yml

```yaml
macros:
  - name: materialization_my_materialization_name_default
    description: A custom materialization to insert records into an append-only table and track when they were added.
  - name: materialization_my_materialization_name_xyz
    description: A custom materialization to insert records into an append-only table and track when they were added.
```

**Why does my dbt output have so many macros in it?**

The output of a `dbt run` may report over 100 macros in your project:

```shell
$ dbt run
Running with dbt=1.7.0
Found 1 model, 0 tests, 0 snapshots, 0 analyses, 138 macros, 0 operations, 0 seed files, 0 sources
```

This is because dbt ships with its own project, which also includes macros! You can learn more about this [here](https://discourse.getdbt.com/t/did-you-know-dbt-ships-with-its-own-project/764).

#### dbtonic Jinja[​](#dbtonic-jinja "Direct link to dbtonic Jinja")

Just like well-written python is pythonic, well-written dbt code is dbtonic.

##### Favor readability over DRY-ness[​](#favor-readability-over-dry-ness "Direct link to favor-readability-over-dry-ness")

Once you learn the power of Jinja, it's common to want to abstract every repeated line into a macro! Remember that using Jinja can make your models harder for other users to interpret — we recommend favoring readability when mixing Jinja with SQL, even if it means repeating some lines of SQL in a few places. If all your models are macros, it might be worth re-assessing.

##### Leverage package macros[​](#leverage-package-macros "Direct link to Leverage package macros")

Writing a macro for the first time? Check whether we've open sourced one in [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) that you can use, and save yourself some time!

##### Set variables at the top of a model[​](#set-variables-at-the-top-of-a-model "Direct link to Set variables at the top of a model")

`{% set ... %}` can be used to create a new variable, or update an existing one. We recommend setting variables at the top of a model, rather than hardcoding them inline.
This is a practice borrowed from many other coding languages, since it helps with readability, and comes in handy if you need to reference the variable in two places:

```sql
-- 🙅 This works, but can be hard to maintain as your code grows
{% for payment_method in ["bank_transfer", "credit_card", "gift_card"] %}
...
{% endfor %}

-- ✅ This is our preferred method of setting variables
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

{% for payment_method in payment_methods %}
...
{% endfor %}
```

---

### Job commands

A dbt production job allows you to set up a system to run a dbt job and job commands on a schedule, rather than running dbt commands manually from the command line or [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md). A job consists of commands that are "chained" together and executed as run steps. Each run step can succeed or fail, which may determine the job's run status (Success, Cancel, or Error).
Each job allows you to:

* Configure job commands
* View job run details, including timing, artifacts, and detailed run steps
* Access logs to view or help debug issues and historical invocations of dbt
* Set up notifications, and [more](https://docs.getdbt.com/docs/deploy/deployments.md#dbt-cloud)

#### Job command types[​](#job-command-types "Direct link to Job command types")

Job commands are specific tasks executed by the job, and you can configure them by either adding [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md) or using the checkbox options in the **Commands** section. During a job run, the commands are "chained" together and executed as run steps. When you add a dbt command in the **Commands** section, you can expect different outcomes compared to the checkbox options.

[![Configuring checkbox and commands list](/img/docs/dbt-cloud/using-dbt-cloud/job-commands.gif?v=2 "Configuring checkbox and commands list")](#)Configuring checkbox and commands list

##### Built-in commands[​](#built-in-commands "Direct link to Built-in commands")

Every job invocation automatically includes the [`dbt deps`](https://docs.getdbt.com/reference/commands/deps.md) command, meaning you don't need to add it to the **Commands** list in your job settings. Every job also includes run steps to clone your repository and connect to your data platform; if these run steps aren't successful, they can affect your job status.

**Job outcome** — During a job run, the built-in commands are "chained" together. This means if one of the run steps in the chain fails, then the next commands aren't executed, and the entire job fails with an "Error" job status.

[![A failed job that had an error during the dbt deps run step.](/img/docs/dbt-cloud/using-dbt-cloud/fail-dbtdeps.png?v=2 "A failed job that had an error during the dbt deps run step.")](#)A failed job that had an error during the dbt deps run step.
##### Checkbox commands[​](#checkbox-commands "Direct link to Checkbox commands")

For every job, you have the option to select the [Generate docs on run](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) or [Run source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) checkboxes, enabling you to run the commands automatically.

**Job outcome: Generate docs on run checkbox** — dbt executes the `dbt docs generate` command *after* the listed commands. If that particular run step in your job fails, the job can still succeed if all subsequent run steps are successful. Read [Set up documentation job](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) for more info.

**Job outcome: Source freshness checkbox** — dbt executes the `dbt source freshness` command as the first run step in your job. If that particular run step in your job fails, the job can still succeed if all subsequent run steps are successful. Read [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) for more info.

##### Command list[​](#command-list "Direct link to Command list")

You can add or remove as many dbt commands as necessary for every job. However, you need to have at least one dbt command. A few commands are listed as "dbt CLI" or "dbt Core" in the [dbt Command reference](https://docs.getdbt.com/reference/dbt-commands.md) page. This means they are meant for use in dbt Core or the dbt CLI, and not in the Studio IDE.

Using selectors

Use [selectors](https://docs.getdbt.com/reference/node-selection/syntax.md) as a powerful way to select and execute portions of your project in a job run. For example, to run tests for `one_specific_model`, use the selector `dbt test --select one_specific_model`. The job will still run if a selector doesn't match any models.
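As an illustrative sketch (the selector and tag names here are hypothetical), a job's **Commands** list combining selectors might look like this — each line becomes its own run step, executed in order after the built-in `dbt deps` step:

```text
dbt run --select staging+ --exclude tag:deprecated
dbt test --select one_specific_model
```

Because run steps are chained, if the `dbt run` step fails here, the `dbt test` step isn't executed and the job finishes with an "Error" status, per the chaining behavior described above.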
###### Compare changes custom commands[​](#compare-changes-custom-commands "Direct link to Compare changes custom commands")

For users that have Advanced CI's [compare changes](https://docs.getdbt.com/docs/deploy/advanced-ci.md#compare-changes) feature enabled and have selected the **dbt compare** checkbox, you can add custom dbt commands to optimize running the comparison (for example, to exclude specific large models, or groups of models with tags). Running comparisons on large models can significantly increase the time it takes for CI jobs to complete.

[![Add custom dbt commands when using dbt compare.](/img/docs/deploy/dbt-compare.jpg?v=2 "Add custom dbt commands when using dbt compare.")](#)Add custom dbt commands when using dbt compare.

The following examples highlight how you can customize the dbt compare command box:

* Exclude the large `fct_orders` model from the comparison to run a CI job on fewer or smaller models and reduce job time/resource consumption. Use the following command:

  ```sql
  --select state:modified --exclude fct_orders
  ```

* Exclude models based on tags for scenarios like when models share a common feature or function. Use the following command:

  ```sql
  --select state:modified --exclude tag:tagname_a tag:tagname_b
  ```

* Include models that were directly modified and also those one step downstream using the `modified+1` selector. Use the following command:

  ```sql
  --select state:modified+1
  ```

###### Job outcome[​](#job-outcome "Direct link to Job outcome")

During a job run, the commands are "chained" together and executed as run steps. If one of the run steps in the chain fails, then the subsequent steps aren't executed, and the job will fail.

In the following example image, the first four run steps are successful. However, if the fifth run step (`dbt run --select state:modified+ --full-refresh --fail-fast`) fails, then the next run steps aren't executed, and the entire job fails.
The failed job returns a non-zero [exit code](https://docs.getdbt.com/reference/exit-codes.md) and "Error" job status:

[![A failed job run that had an error during a run step](/img/docs/dbt-cloud/using-dbt-cloud/skipped-jobs.png?v=2 "A failed job run that had an error during a run step")](#)A failed job run that had an error during a run step

#### Job command failures[​](#job-command-failures "Direct link to Job command failures")

Job command failures can mean different things for different commands. Some common reasons why a job command may fail:

* **Failure at `dbt run`** — [`dbt run`](https://docs.getdbt.com/reference/commands/run.md) executes compiled SQL model files against the current target database. It will fail if there is an error in any of the built models. Tests on upstream resources prevent downstream resources from running; a failed test will cause them to be skipped.
* **Failure at `dbt test`** — [`dbt test`](https://docs.getdbt.com/reference/commands/test.md) runs tests defined on models, sources, snapshots, and seeds. A test can pass, fail, or warn depending on its [severity](https://docs.getdbt.com/reference/resource-configs/severity.md). Unless you set [warnings as errors](https://docs.getdbt.com/reference/global-configs/warnings.md), only an error stops the next step.
* **Failure at `dbt build`** — [`dbt build`](https://docs.getdbt.com/reference/commands/build.md) runs models, tests, snapshots, and seeds. This command executes resources in the DAG-specified order. If any upstream resource fails, all downstream resources are skipped, and the command exits with an error code of 1.
* **Selector failures**
  * If a [`select`](https://docs.getdbt.com/reference/node-selection/set-operators.md) matches multiple nodes and one of the nodes fails, then the job will have an exit code `1` and the subsequent command will fail. If you specified the [`--fail-fast`](https://docs.getdbt.com/reference/global-configs/failing-fast.md) flag, the first failure stops the run and terminates the connections for any models still in progress.
  * If a selector doesn't match any nodes, it's not considered a failure.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Job creation best practices](https://discourse.getdbt.com/t/job-creation-best-practices-in-dbt-cloud-feat-my-moms-lasagna/2980)
* [dbt Command reference](https://docs.getdbt.com/reference/dbt-commands.md)
* [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md)
* [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md)
* [Build and view your docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md)

---

### Job notifications

Set up notifications in dbt platform to receive alerts about the status of a job run. You can choose to be notified by one or more of the following job run statuses:

* **Succeeds** option — A job run completed successfully with no warnings or errors.
* **Warns** option — A job run encountered warnings from [data tests](https://docs.getdbt.com/docs/build/data-tests.md) or [source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) checks (if applicable).
* **Fails** option — A job run failed to complete.
* **Is canceled** option — A job run is canceled.
##### Notification options[​](#notification-options "Direct link to Notification options")

dbt platform currently supports the following notification channels:

* [Email](#email-notifications) — Available for all users
* [Slack (user-linked)](#slack-notifications) — Available for all users
* [Slack (account-level)](#slack-notifications-account) — Available in beta. To request access, contact your account manager.
* [Microsoft Teams](#microsoft-teams-notifications) — Available in beta. To request access, contact your account manager.

Microsoft Teams without the beta

If you don’t have access to the native Microsoft Teams integration (available in beta), you can still send job notifications to a Teams channel by using the channel’s email address as an external email, as explained in the next section, Email notifications.

#### Email notifications[​](#email-notifications "Direct link to Email notifications")

You can receive email alerts about jobs by configuring the dbt email notification settings.

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You must be either a *developer user* or an *account admin* to configure email notifications in dbt. For more details, refer to [Users and licenses](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md).
  * As a developer user, you can set up email notifications for yourself.
  * As an account admin, you can set up notifications for yourself and other team members.

##### Configure email notifications[​](#configure-email-notifications "Direct link to Configure email notifications")

1. Select your profile icon and then click **Notification settings**.
2. By default, dbt sends notifications to the email address on your **User profile** page. If you're an account admin, you can choose a different email address to receive notifications:
   1. Under **Job notifications**, click the **Notification email** dropdown.
   2. Select another address from the list.
The list includes **Internal Users** with access to the account and **External Emails** that have been added.
   3. To add an external email address, click the **Notification email** dropdown.
   4. Click **Add external email**.
   5. Enter the email address, and click **Add user**.

   After adding an external email, it becomes available for selection in the **Notification email** dropdown list. External emails can be addresses that are outside of your dbt account and also for third-party integrations like [channels in Microsoft Teams](https://support.microsoft.com/en-us/office/tip-send-email-to-a-channel-2c17dbae-acdf-4209-a761-b463bdaaa4ca) and [PagerDuty email integration](https://support.pagerduty.com/docs/email-integration-guide).

   note

   External emails and their notification settings persist until edited or removed, even if you remove the admin who added them from the account.

   [![Example of the Notification email dropdown](/img/docs/deploy/example-notification-external-email.png?v=2 "Example of the Notification email dropdown")](#)Example of the Notification email dropdown

3. Select the **Environment** for the jobs you want to receive notifications about from the dropdown.
4. Click **Edit** to configure the email notification settings. Choose one or more of the run statuses for each job you want to receive notifications about.
5. When you're done with the settings, click **Save**.

As an account admin, you can add more email recipients by choosing another **Notification email** from the dropdown, clicking **Edit** on the job notification settings, and then **Save** to apply the changes. To set up alerts on jobs from a different environment, select another **Environment** from the dropdown, **Edit** those job notification settings, and **Save** the changes.
[![Example of the Email notifications page](/img/docs/deploy/example-email-notification-settings-page.png?v=2 "Example of the Email notifications page")](#)Example of the Email notifications page

##### Unsubscribe from email notifications[​](#unsubscribe-from-email-notifications "Direct link to Unsubscribe from email notifications")

1. Select your profile icon and click on **Notification settings**.
2. On the **Email notifications** page, click **Unsubscribe from all email notifications**.

##### Send job notifications to a Microsoft Teams channel (email)[​](#email-job-notification-teams "Direct link to Send job notifications to a Microsoft Teams channel (email)")

You can send dbt job [notification emails](#configure-email-notifications) directly to a Microsoft Teams channel by using the channel’s email address.

1. In Microsoft Teams, get the email address for the channel you want to send notifications to. See [Send an email to a channel](https://support.microsoft.com/en-us/office/tip-send-email-to-a-channel-2c17dbae-acdf-4209-a761-b463bdaaa4ca).
2. In dbt platform, click on your profile in the left sidebar and then click **Notification settings**.
3. Under **Job notifications**, click the **Notification email** dropdown.
4. To add an external email address, click **Add external email** at the bottom of the dropdown.
5. Enter the Teams channel email address, and click **Add user**.
6. Make sure you select the Teams channel email from the **Notification email** dropdown (it might be selected already).
7. Choose the environment for the jobs you want to receive notifications from.
8. Click **Edit** and select the job statuses you want. Then click **Save**.

#### Slack notifications (user)[​](#slack-notifications "Direct link to Slack notifications (user)")

You can receive Slack alerts about jobs by setting up the Slack integration and then configuring the dbt Slack notification settings. dbt integrates with Slack via OAuth to ensure secure authentication.
This is the current Slack integration available for all users, set at the user level — not to be confused with the [Slack notifications at the account level](#slack-notifications-account) feature, which is available only in beta (to request access, contact your account manager). Only refer to these instructions if you *don't* have access to the beta features.

note

Virtual Private Cloud (VPC) admins must [contact support](mailto:support@getdbt.com) to complete the Slack integration. If there has been a change in user roles or Slack permissions where you no longer have access to edit a configured Slack channel, please [contact support](mailto:support@getdbt.com) for assistance.

##### Prerequisites[​](#prerequisites-1 "Direct link to Prerequisites")

* You have a Slack workspace that you want to receive job notifications from.
* You must be a Slack Workspace Owner.
* You must be an account admin to configure Slack notifications in dbt. For more details, refer to [Users and licenses](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md).
* The integration only supports *public* channels in the Slack workspace.

##### Set up the Slack integration[​](#set-up-the-slack-integration "Direct link to Set up the Slack integration")

1. Select **Account settings** and then select **Integrations** from the left sidebar.
2. Locate the **OAuth** section with the Slack application and click **Link**.

[![Link for the Slack app](/img/docs/dbt-cloud/Link-your-Slack-Profile.png?v=2 "Link for the Slack app")](#)Link for the Slack app

###### Logged in to Slack[​](#logged-in-to-slack "Direct link to Logged in to Slack")

If you're already logged in to Slack, the handshake only requires allowing the app access. If you're a member of multiple workspaces, you can select the appropriate workspace from the dropdown menu in the upper right corner.
[![Allow dbt access to Slack](/img/docs/dbt-cloud/Allow-dbt-to-access-slack.png?v=2 "Allow dbt access to Slack")](#)Allow dbt access to Slack

###### Logged out[​](#logged-out "Direct link to Logged out")

If you're logged out or the Slack app/website is closed, you must authenticate before completing the integration.

1. Complete the field defining the Slack workspace you want to integrate with dbt.

   [![Define the workspace](/img/docs/dbt-cloud/define-workspace.png?v=2 "Define the workspace")](#)Define the workspace

2. Sign in with an existing identity or use your email address and password.
3. Once you have authenticated successfully, accept the permissions.

   [![Allow dbt access to Slack](/img/docs/dbt-cloud/accept-permissions.png?v=2 "Allow dbt access to Slack")](#)Allow dbt access to Slack

##### Configure Slack notifications[​](#configure-slack-notifications "Direct link to Configure Slack notifications")

1. Select your profile icon and then click on **Notification settings**.
2. Select **Slack notifications** in the left sidebar.
3. Select the **Notification channel** you want to receive the job run notifications from the dropdown.

   [![Example of the Notification channel dropdown](/img/docs/deploy/example-notification-slack-channels.png?v=2 "Example of the Notification channel dropdown")](#)Example of the Notification channel dropdown

4. Select the **Environment** for the jobs you want to receive notifications about from the dropdown.
5. Click **Edit** to configure the Slack notification settings. Choose one or more of the run statuses for each job you want to receive notifications about.
6. When you're done with the settings, click **Save**.

To send alerts to another Slack channel, select another **Notification channel** from the dropdown, **Edit** those job notification settings, and **Save** the changes.
To set up alerts on jobs from a different environment, select another **Environment** from the dropdown, **Edit** those job notification settings, and **Save** the changes.

[![Example of the Slack notifications page](/img/docs/deploy/example-slack-notification-settings-page.png?v=2 "Example of the Slack notifications page")](#)Example of the Slack notifications page

##### Disable the Slack integration[​](#disable-the-slack-integration "Direct link to Disable the Slack integration")

1. Select **Account settings** and on the **Integrations** page, scroll to the **OAuth** section.
2. Click the **X** icon (on the far right of the Slack integration) and click **Unlink**. Channels that you configured will no longer receive Slack notifications.

*This is not an account-wide action.* Channels configured by other account admins will continue to receive Slack notifications if they still have active Slack integrations. To migrate ownership of a Slack channel notification configuration, have another account admin edit their configuration.

#### Slack notifications (account) [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#slack-notifications-account "Direct link to slack-notifications-account")

info

Configuring Slack notifications at the account level is currently available in beta. To request access, contact your account manager. Only refer to these instructions if you have access to the beta feature.

Integrate Slack with dbt platform at the account level to receive job notifications in Slack. dbt integrates with Slack via OAuth to ensure secure authentication. A single dbt platform account can integrate with one Slack workspace.

##### Prerequisites[​](#prerequisites-2 "Direct link to Prerequisites")

* You have a Slack workspace that you want to receive job notifications from.
* A dbt platform account admin must link the Slack app at the account level.
* Install the official dbt platform Slack app using the [steps outlined in the next section](#set-up-the-slack-integration-1).
  * To install the Slack app to a workspace, your Slack org must permit app installations. In some orgs, this requires approval from a Slack admin.
* The integration only supports *public* channels in the Slack workspace.

After an account admin links the Slack app for the account, [any licensed user](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) in the account can configure Slack job notifications so long as they are assigned to the **Account Admin**, **Owner**, or **Member** default [groups](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#groups). IT licenses don't have access to configure Slack job notifications.

##### Set up the Slack integration[​](#set-up-the-slack-integration-1 "Direct link to Set up the Slack integration")

The account-level Slack integration uses the official dbt platform Slack app, which is separate from the [user-linked Slack integration](#slack-notifications). To use the beta Slack notifications, you must unlink the old Slack app and then connect the new official app:

1. Go to **Account settings** > **Integrations** > **OAuth**.
2. Click the **X** icon next to Slack and select **Unlink**.
3. In the same OAuth section, click **Link** to connect the official Slack app. Until you do this, the account-level Slack option will not appear.

[![Link for the Slack app](/img/docs/dbt-cloud/Link-your-Slack-Profile.png?v=2 "Link for the Slack app")](#)Link for the Slack app

##### Logged in to Slack[​](#logged-in-to-slack-1 "Direct link to Logged in to Slack")

If you're already logged in to Slack, the integration only requires allowing the app access. If you're a member of multiple workspaces, you can select the appropriate workspace from the dropdown menu in the upper right corner.
[![Allow dbt access to Slack](/img/docs/dbt-cloud/Allow-dbt-to-access-slack.png?v=2 "Allow dbt access to Slack")](#)Allow dbt access to Slack

##### Logged out[​](#logged-out-1 "Direct link to Logged out")

If you're logged out or the Slack app/website is closed, you must authenticate before completing the integration.

1. Complete the field defining the Slack workspace you want to integrate with dbt.

   [![Define the workspace](/img/docs/dbt-cloud/define-workspace.png?v=2 "Define the workspace")](#)Define the workspace

2. Sign in with an existing identity or use your email address and password.
3. Once you have authenticated successfully, accept the permissions.

   [![Allow dbt access to Slack](/img/docs/dbt-cloud/accept-permissions.png?v=2 "Allow dbt access to Slack")](#)Allow dbt access to Slack

##### Configure Slack notifications[​](#configure-slack-notifications-1 "Direct link to Configure Slack notifications")

Configure the Slack channel you want to receive job notifications from.

1. Select your profile icon and then click on **Notification settings**.
2. Select **Slack notifications** in the left sidebar.
3. From the first dropdown, select the **Notification channel** you want to receive the job run notifications.

   [![Example of the Notification channel dropdown](/img/docs/deploy/example-notification-slack-channels.png?v=2 "Example of the Notification channel dropdown")](#)Example of the Notification channel dropdown

4. From the second dropdown, select the **Environment** for the jobs you want to receive notifications about.
5. Click **Edit** to configure the Slack notification settings. Choose one or more of the run statuses for each job you want to receive notifications about.
6. When you're done with the settings, click **Save**.

* To send alerts to another Slack channel, select another **Notification channel** from the dropdown, **Edit** those job notification settings, and **Save** the changes.
* To set up alerts on jobs from a different environment, select another **Environment** from the dropdown, **Edit** those job notification settings, and **Save** the changes.

[![Example of the Slack notifications page](/img/docs/deploy/example-slack-notification-settings-page.png?v=2 "Example of the Slack notifications page")](#)Example of the Slack notifications page

That's it! Your Slack channel is now set up to receive dbt job notifications at the account level. This integration is available throughout the account for all licensed users.

##### Disable the Slack integration[​](#disable-the-slack-integration-1 "Direct link to Disable the Slack integration")

In this step, you'll disable the Slack integration and remove the account-level Slack credentials. You can always re-enable the integration by following the [Set up the Slack integration](#set-up-the-slack-integration-1) steps.

1. Select **Account settings** and on the **Integrations** page, scroll to the **OAuth** section.
2. Click the **X** icon (on the far right of the Slack integration) and click **Unlink**.
   * This removes the account-level Slack credentials. All Slack notifications that rely on the account-level integration will stop sending.
   * If any legacy, user-linked Slack integrations still exist, those notifications may continue until the legacy link is removed. We recommend migrating to the new account-level app and removing legacy links.

#### Microsoft Teams notifications [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#microsoft-teams-notifications- "Direct link to microsoft-teams-notifications-")

info

Configuring Microsoft Teams notifications is currently in beta. To request access, contact your account manager.

You can receive Microsoft Teams alerts for your dbt jobs by connecting your Teams account to the dbt platform and configuring your notification preferences.
dbt integrates with Teams through Microsoft Entra to provide secure authentication. Only refer to these instructions if you have access to the beta feature. ##### Prerequisites[​](#prerequisites-3 "Direct link to Prerequisites") Before you begin: * You have a dbt platform account. * You have a Microsoft Teams account that you want to receive job notifications from. * You have permission to view the **Account integrations** and **Job notifications** pages in the dbt platform. ##### Set up Microsoft Teams[​](#set-up-microsoft-teams "Direct link to Set up Microsoft Teams") To enable Microsoft Teams job notifications, complete the following sections: 1. [Link dbt platform account to Teams](#link-dbt-platform-account-to-teams) — A user-level connection that links an individual dbt platform account (or a dedicated service account) to a Microsoft Teams user profile within your tenant. 2. [Configure Teams notifications](#configure-teams-notifications) — Configures which Teams channels receive job notifications. 3. [Disable the Teams integration](#disable-the-teams-integration) (optional) — Remove or reset the connection between dbt platform and Microsoft Teams. ##### Link dbt platform account to Teams[​](#link-dbt-platform-account-to-teams "Direct link to Link dbt platform account to Teams") info You can link any Teams user account from your tenant, but we recommend creating a dedicated account just for posting dbt notifications. During the OAuth process, you’ll need to sign in to a Microsoft account to complete the integration. * If you’re logged into a single Microsoft account, the integration will complete automatically. * If you’re logged into multiple accounts (or none), you’ll be prompted to select or log in to one.
[![Example of the Microsoft account popup](/img/docs/deploy/pick-account.png?v=2 "Example of the Microsoft account popup")](#)Example of the Microsoft account popup To link your dbt platform account to Microsoft Teams: 1. In dbt platform, go to the **Account settings** page by clicking on your account name and selecting **Account settings**. 2. In the left sidebar, select **Integrations**. 3. Scroll to the **OAuth** section. 4. Next to **Teams**, click the **Link** button. 5. You’ll either be prompted to choose your Microsoft account before completing the setup, or be returned directly to the dbt platform with your Teams profile linked. 6. Your dbt platform account is now linked to Microsoft Teams! dbt will now add the **dbt-cloud-integration app** to your Microsoft Entra tenant. This app manages authentication requests and permissions securely. [![Example of the dbt-cloud-integration app overview](/img/docs/deploy/dbt-cloud-integrations.png?v=2 "Example of the dbt-cloud-integration app overview")](#)Example of the dbt-cloud-integration app overview * The current Entra app permissions are: * `profile` * `openid` * `offline_access` * `Team.ReadBasic.All` * `TeamsActivity.Send` * `ChannelMessage.Send` * `ChannelMessage.Read.All` * `Channel.ReadBasic.All` ##### Configure Teams notifications[​](#configure-teams-notifications "Direct link to Configure Teams notifications") Once you’ve connected dbt platform and Teams, you can configure which Teams channels receive job notifications. The **Teams notifications** menu requires that you have an active integration with Teams on the account. info Currently, dbt only sends notifications to Teams channels (standard, shared, or private) that you belong to. 1. In the dbt platform, click your profile icon and select **Notification settings**. 2. Select **Teams notifications** in the left sidebar. 3.
From the first dropdown, select the **Notification team** that you want to send notifications to. 4. From the second dropdown, select the **Notification channel** you want to send notifications to. * dbt platform only sends notifications to Teams channels (standard, shared, or private) that *you* belong to. 5. From the next dropdown, select the environment for the jobs you want to receive notifications about. 6. Click **Edit** on the top right to configure the Teams job notification settings and customize which job statuses trigger notifications. 7. When finished, click **Save**. Your Teams channel is now set up to receive dbt job notifications! [![Example of the configure Teams notification page](/img/docs/deploy/configure-teams-notification.png?v=2 "Example of the configure Teams notification page")](#)Example of the configure Teams notification page ##### Disable the Teams integration[​](#disable-the-teams-integration "Direct link to Disable the Teams integration") Disabling and unlinking the Teams integration in the dbt platform removes it for the entire account. To disable it: 1. In the dbt platform, go to **Account settings**. 2. Click on **Integrations** and scroll down to **OAuth**. 3. On the far right of the **Teams** integration, click the **X** icon. 4. Confirm the unlinking by selecting **Unlink**. The Teams integration is now disabled. You can always re-enable the integration by following the [Set up Microsoft Teams](#set-up-microsoft-teams) steps.
--- ### Job scheduler The job scheduler is the backbone of running jobs in dbt, bringing power and simplicity to building data pipelines in both continuous integration and production contexts. The scheduler frees teams from having to build and maintain their own infrastructure, and ensures the timeliness and reliability of data transformations. The scheduler enables both cron-based and event-driven execution of dbt commands in the user’s data platform. Specifically, it handles: * Cron-based execution of dbt jobs that run on a predetermined cadence * Event-driven execution of dbt jobs triggered by the completion of another job ([trigger on job completion](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#trigger-on-job-completion)) * Event-driven execution of dbt jobs triggered when a pull request is merged into the branch ([merge jobs](https://docs.getdbt.com/docs/deploy/merge-jobs.md)) * Event-driven execution of dbt jobs triggered by API * Event-driven execution of dbt jobs manually triggered by a user selecting **Run now** The scheduler handles various tasks including: * Queuing jobs * Creating temporary environments to run the dbt commands required for those jobs * Providing logs for debugging and remediation * Storing dbt artifacts for direct consumption/ingestion by the Discovery API The scheduler also: * Uses [dbt's Git repository caching](https://docs.getdbt.com/docs/cloud/account-settings.md#git-repository-caching) to protect against third-party outages and improve job run reliability. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") * Powers running dbt in staging and production environments, bringing ease and confidence to CI/CD workflows and enabling observability and governance in deploying dbt at scale.
* Uses [Hybrid projects](https://docs.getdbt.com/docs/deploy/hybrid-projects.md) to upload dbt Core artifacts into dbt for central visibility, cross-project referencing, and easier collaboration. [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") * Uses [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) to decide what needs to be rebuilt based on source freshness, model staleness, and code changes. [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") #### Scheduler terms[​](#scheduler-terms "Direct link to Scheduler terms") Familiarize yourself with these useful terms to help you understand how the job scheduler works. | Term | Definition | | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Scheduler | The dbt engine that powers job execution. The scheduler queues scheduled or API-triggered job runs, prepares an environment to execute job commands in your cloud data platform, and stores and serves logs and artifacts that are byproducts of run execution. | | Job | A collection of run steps, settings, and a trigger to invoke dbt commands against a project in the user's cloud data platform. 
| | Job queue | The job queue acts as a waiting area for job runs when they are scheduled or triggered to run; runs remain in the queue until execution begins. More specifically, the scheduler checks the queue for runs that are due to execute, ensures the run is eligible to start, and then prepares an environment with the appropriate settings, credentials, and commands to begin execution. Once execution begins, the run leaves the queue. | | Over-scheduled job | A situation in which a cron-scheduled job's run duration becomes longer than the frequency of the job’s schedule, resulting in a job queue that grows faster than the scheduler can process the job’s runs. | | Deactivated job | A situation in which a job has reached 100 consecutive failing runs. | | Prep time | The time dbt takes to create a short-lived environment to execute the job commands in the user's cloud data platform. Prep time varies most significantly at the top of the hour, when the dbt scheduler experiences a lot of run traffic. | | Run | A single, unique execution of a dbt job. | | Run slot | Run slots control the number of jobs that can run concurrently. Each running job occupies a run slot for the duration of the run. To view the number of run slots available in your plan, check out the [dbt pricing page](https://www.getdbt.com/pricing). Starter and Developer plans are limited to one project each; for additional projects or more run slots, consider upgrading to an [Enterprise-tier plan](https://www.getdbt.com/pricing/). | | Threads | When dbt builds a project's DAG, it tries to parallelize the execution by using threads. The [thread](https://docs.getdbt.com/docs/running-a-dbt-project/using-threads.md) count is the maximum number of paths through the DAG that dbt can work on simultaneously. The default thread count in a job is 4. | | Wait time | The amount of time that dbt waits before running a job, either because there are no available run slots or because a previous run of the same job is still in progress. | #### Scheduler queue[​](#scheduler-queue "Direct link to Scheduler queue") The scheduler queues a deployment job to be processed when it's triggered to run by a [set schedule](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#schedule-days), [a job completed](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#trigger-on-job-completion), an API call, or manual action. Before the run starts, the scheduler checks these conditions to determine whether it can begin executing: * **Is there a run slot that's available on the account for use?** — If all run slots are occupied, the queued run will wait. The wait time is displayed in dbt. If there are long wait times, [upgrading to an Enterprise-tier plan](https://www.getdbt.com/contact/) can provide more run slots and allow for higher job concurrency. * **Does this same job have a run already in progress?** — The scheduler executes distinct runs of the same dbt job serially to avoid model build collisions. If there's a job already running, the queued job will wait, and the wait time will be displayed in dbt.
If there is an available run slot and there isn't an actively running instance of the job, the scheduler will prepare the job to run in your cloud data platform. This prep involves readying a Kubernetes pod with the right version of dbt installed, setting environment variables, loading data platform credentials, and authorizing the Git provider, amongst other environment-setting tasks. The time it takes to prepare the job is displayed as **Prep time** in the UI. [![An overview of a dbt job run](/img/docs/dbt-cloud/deployment/deploy-scheduler.png?v=2 "An overview of a dbt job run")](#)An overview of a dbt job run ##### Treatment of CI jobs[​](#treatment-of-ci-jobs "Direct link to Treatment of CI jobs") When compared to deployment jobs, the scheduler behaves differently when handling [continuous integration (CI) jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md). It queues a CI job to be processed when it's triggered to run by a Git pull request, and the conditions the scheduler checks to determine if the run can start executing are also different: * **Will the CI run consume a run slot?** — CI runs don't consume run slots and will never block production runs. * **Does this same job have a run already in progress?** — CI runs can execute concurrently (in parallel). CI runs build into unique temporary schemas, and CI checks execute in parallel to help increase team productivity. Teammates never have to wait to get a CI check review. ##### Treatment of merge jobs[​](#treatment-of-merge-jobs "Direct link to Treatment of merge jobs") When triggered by a *merged* Git pull request, the scheduler queues a [merge job](https://docs.getdbt.com/docs/deploy/merge-jobs.md) to be processed. * **Will the merge job run consume a run slot?** — Yes, merge jobs do consume run slots. * **Does this same job have a run already in progress?** — A merge job can only have one run in progress at a time.
If there are multiple runs queued up, the scheduler will enqueue the most recent run and cancel all the other runs. If there is a run in progress, it will wait until the run completes before queuing the next run. #### Job memory[​](#job-memory "Direct link to Job memory") In dbt, the setting to provision memory available to a job is defined at the account level and applies to each job running in the account; the memory limit cannot be customized per job. If a running job reaches its memory limit, the run is terminated with a "memory limit error" message. Jobs consume a lot of memory in the following situations: * A high thread count is specified * Custom dbt macros load data into memory instead of pushing compute down to the cloud data platform * A job generates dbt project documentation for a large and complex dbt project * To prevent problems with the job running out of memory, we recommend generating documentation in a separate job that is set aside for that task and removing `dbt docs generate` from all other jobs. This is especially important for large and complex projects. Refer to [dbt architecture](https://docs.getdbt.com/docs/cloud/about-cloud/architecture.md) for an architecture diagram and to learn how the data flows. #### Run cancellation for over-scheduled jobs[​](#run-cancellation-for-over-scheduled-jobs "Direct link to Run cancellation for over-scheduled jobs") Scheduler won't cancel API-triggered jobs The scheduler will not cancel over-scheduled jobs triggered by the [API](https://docs.getdbt.com/docs/dbt-cloud-apis/overview.md). The dbt scheduler prevents too many job runs from clogging the queue by canceling unnecessary ones. If a job takes longer to run than its scheduled frequency, the queue will grow faster than the scheduler can process the runs, leading to an ever-expanding queue with runs that don’t need to be processed (called *over-scheduled jobs*). For example, a job scheduled to run every 10 minutes that takes 30 minutes to complete will fall further behind with every scheduled run.
The scheduler prevents queue clog by canceling runs that aren't needed, ensuring there is only one run of the job in the queue at any given time. If a newer run is queued, the scheduler cancels any previously queued run for that job and displays an error message. [![The cancelled runs display an error message explaining why the run was cancelled and recommendations](/img/docs/dbt-cloud/deployment/run-error-message.png?v=2 "The cancelled runs display an error message explaining why the run was cancelled and recommendations")](#)The cancelled runs display an error message explaining why the run was cancelled and recommendations To prevent over-scheduling, users will need to take action by either refactoring the job so it runs faster or modifying its [schedule](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#schedule-days). #### Deactivation of jobs [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#deactivation-of-jobs- "Direct link to deactivation-of-jobs-") To reduce unnecessary resource consumption and reduce contention for run slots in your account, dbt will deactivate a [deploy job](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) or a [CI job](https://docs.getdbt.com/docs/deploy/ci-jobs.md) if it reaches 100 consecutive failing runs. A banner containing this message is displayed when a job is deactivated: "Job has been deactivated due to repeated run failures. To reactivate, verify the job is configured properly and run manually or reenable any trigger". When this happens, scheduled and triggered-to-run jobs will no longer be enqueued. 
To reactivate a deactivated job, you can either: * Update the job's settings to fix the issue and save the job (recommended) * Perform a manual run by clicking **Run now** on the job's page #### FAQs[​](#faqs "Direct link to FAQs") I'm receiving a 'This run exceeded your account's run memory limits' error in my failed job If you're receiving a `This run exceeded your account's run memory limits` error in your failed job, it means that the job exceeded the [memory limits](https://docs.getdbt.com/docs/deploy/job-scheduler.md#job-memory) set for your account. All dbt accounts have a pod memory of 600MiB, and memory limits are on a per-run basis. They're typically influenced by the amount of result data that dbt has to ingest and process, which is small but can become bloated unexpectedly by project design choices. ##### Common reasons[​](#common-reasons "Direct link to Common reasons") Some common reasons for higher memory usage are: * dbt run/build: Macros that capture large result sets from `run_query()` may not all be necessary and may be memory inefficient. * dbt docs generate: Source or model schemas with large numbers of tables (even if those tables aren't all used by dbt) cause the ingest of very large results for catalog queries. ##### Resolution[​](#resolution "Direct link to Resolution") There are various reasons why you could be experiencing this error, but they are mostly the outcome of retrieving too much data back into dbt. For example, using `run_query()` or similar macros, or using databases/schemas that contain many non-dbt related tables/views. Try to reduce the amount of data / number of rows retrieved back into dbt by refactoring the SQL in your `run_query()` operation using `group by`, `where`, or `limit` clauses. Additionally, you can also use a database/schema with fewer non-dbt related tables/views.
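To make that refactor concrete, here is a minimal sketch of the kind of `run_query()` change described above; the schema, table, and column names are made up for illustration:

```sql
-- Hypothetical macro snippet. Before: pulls every row back into dbt's memory.
{% set results = run_query("select status from raw_schema.events") %}

-- After: push the work down to the warehouse and return a small, bounded
-- result set instead.
{% set results = run_query(
    "select status, count(*) as n from raw_schema.events group by status limit 500"
) %}
```

The refactored query returns at most 500 rows regardless of how large the source table grows, so the result set dbt has to hold in memory stays flat.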
Video example As an additional resource, check out [this example video](https://www.youtube.com/watch?v=sTqzNaFXiZ8), which demonstrates how to refactor the sample code by reducing the number of rows returned. If you've tried the earlier suggestions and are still experiencing failed job runs with this error about hitting the memory limits of your account, please [reach out to support](mailto:support@getdbt.com). We're happy to help! ##### Additional resources[​](#additional-resources "Direct link to Additional resources") * [Blog post on how we shaved 90 mins off](https://docs.getdbt.com/blog/how-we-shaved-90-minutes-off-model) #### Related docs[​](#related-docs "Direct link to Related docs") * [dbt architecture](https://docs.getdbt.com/docs/cloud/about-cloud/architecture.md#dbt-cloud-features-architecture) * [Job commands](https://docs.getdbt.com/docs/deploy/job-commands.md) * [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) * [Webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md) * [dbt continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md) --- ### Jobs in the dbt platform These are the available job types in dbt: * [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) — Build production data assets. Runs on a schedule, by API, or after another job completes. * [Continuous integration (CI) jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md) — Test and validate code changes before merging. Triggered by commit to a PR or by API. * [Merge jobs](https://docs.getdbt.com/docs/deploy/merge-jobs.md) — Deploy merged changes into production.
Runs after a successful PR merge or by API. * [State-aware jobs](https://docs.getdbt.com/docs/deploy/state-aware-about.md) — Intelligently decide what needs to be rebuilt based on source freshness, code, or upstream data changes. Rebuild models only if they are older than the specified interval. The following comparison table describes the behaviors of the different job types: | | **Deploy jobs** | **CI jobs** | **Merge jobs** | **State-aware jobs** | | ---- | ---- | ---- | ---- | ---- | | Purpose | Builds production data assets. | Builds and tests new code before merging changes into production. | Builds merged changes into production or updates state for deferral. | Triggers model builds and job runs only when source data is updated. | | Trigger types | Triggered by a schedule, API, or the successful completion of another job. | Triggered by a commit to a PR or by API. | Triggered by a successful merge into the environment's branch or by API. | Triggered when code, sources, or upstream data change, at custom refresh intervals, and for custom source freshness configurations. | | Destination | Builds into a production database and schema. | Builds into a staging database and an ephemeral schema that lives for the lifetime of the PR. | Builds into a production database and schema. | Builds into a production database and schema. | | Execution mode | Runs execute sequentially, so as to not have collisions on the underlying DAG. | Runs execute in parallel to promote team velocity.
| Runs execute sequentially, so as to not have collisions on the underlying DAG. | | | Efficiency run savings | Detects over-scheduled jobs and cancels unnecessary runs to avoid queue clog. | Cancels existing runs when a newer commit is pushed to avoid redundant work. | N/A | Runs jobs and builds models *only* when source data is updated or if models are older than what you specified in the project refresh interval. | | State comparison | Only sometimes needs to detect state. | Almost always needs to compare state against the production environment to build on modified code and its dependents. | Almost always needs to compare state against the production environment to build on modified code and its dependents. | | | Job run duration | Limit is 24 hours. | Limit is 24 hours. | Limit is 24 hours. | Limit is 24 hours. | --- ### Joins Joins are a powerful part of MetricFlow and simplify the process of making all valid dimensions available for your metrics at query time, regardless of where they are defined in different semantic models. With joins, you can also create metrics using measures from different semantic models. Joins use `entities` defined in your semantic model configs as the join keys between tables. Assuming entities are defined in the semantic model, MetricFlow creates a graph using the semantic models as nodes and the join paths as edges to perform joins automatically. MetricFlow chooses the appropriate join type and avoids fan-out or chasm joins with other tables based on the entity types. What are fan-out or chasm joins?
* Fan-out joins are when one row in a table is joined to multiple rows in another table, resulting in more output rows than input rows. * Chasm joins are when two tables have a many-to-many relationship through an intermediate table, and the join results in duplicate or missing data. #### Types of joins[​](#types-of-joins "Direct link to Types of joins") Joins are auto-generated MetricFlow automatically generates the necessary joins to the defined semantic objects, eliminating the need for you to create new semantic models or configuration files. This section explains the different types of joins that can be used with entities and how to query them. MetricFlow uses these specific join strategies: * Primarily uses left joins when joining `fct` and `dim` models. Left joins make sure all rows from the "base" table are retained, while matching rows are included from the joined table. * For queries that involve multiple `fct` models, MetricFlow uses full outer joins to ensure all data points are captured, even when some `dim` or `fct` models are missing in certain tables. * MetricFlow restricts the use of fan-out and chasm joins. Refer to [SQL examples](#sql-examples) for more information on how MetricFlow handles joins in practice. The following table identifies which joins are allowed based on specific entity types to prevent the creation of risky joins. This table primarily represents left joins unless otherwise specified. For scenarios involving multiple `fct` models, MetricFlow uses full outer joins.
| entity type - Table A | entity type - Table B | Join type | | --------------------- | --------------------- | ------------------------ | | Primary | Primary | ✅ Left | | Primary | Unique | ✅ Left | | Primary | Foreign | ❌ Fan-out (Not allowed) | | Unique | Primary | ✅ Left | | Unique | Unique | ✅ Left | | Unique | Foreign | ❌ Fan-out (Not allowed) | | Foreign | Primary | ✅ Left | | Foreign | Unique | ✅ Left | | Foreign | Foreign | ❌ Fan-out (Not allowed) | ##### Semantic validation[​](#semantic-validation "Direct link to Semantic validation") MetricFlow performs semantic validation by executing `explain` queries in the data platform to ensure that the generated SQL executes without errors. This validation includes: * Verifying that all referenced tables and columns exist. * Ensuring the data platform supports SQL functions, such as `date_diff(x, y)`. * Checking for ambiguous joins or paths in multi-hop joins. If validation fails, MetricFlow surfaces errors for users to address before executing the query. #### Example[​](#example "Direct link to Example") The following example uses two semantic models with a common entity and shows a MetricFlow query that requires a join between the two semantic models: `transactions` and `user_signup`.

```bash
# In the dbt platform
dbt sl query --metrics average_purchase_price --group-by metric_time,user_id__type
```

```bash
# In dbt Core
mf query --metrics average_purchase_price --group-by metric_time,user_id__type
```

###### SQL examples[​](#sql-examples "Direct link to SQL examples") These SQL examples show how MetricFlow handles both left join and full outer join scenarios in practice: * SQL example for left join * SQL example for outer joins Using the previous example for `transactions` and `user_signup` semantic models, this shows a left join between those two semantic models.
```sql
select
  transactions.user_id,
  user_signup.type,
  avg(transactions.purchase_price) as average_purchase_price
from transactions
left outer join user_signup
  on transactions.user_id = user_signup.user_id
where transactions.purchase_price is not null
group by
  transactions.user_id,
  user_signup.type;
```

If you have multiple `fct` models, let's say `sales` and `returns`, MetricFlow uses full outer joins to ensure all data points are captured. This example shows a full outer join between the `sales` and `returns` semantic models.

```sql
select
  sales.user_id,
  sales.total_sales,
  returns.total_returns
from sales
full outer join returns
  on sales.user_id = returns.user_id
where sales.user_id is not null
   or returns.user_id is not null;
```

#### Multi-hop joins[​](#multi-hop-joins "Direct link to Multi-hop joins") MetricFlow allows users to join dimensions across a graph of entities by moving from one table to another within the graph. This is referred to as a "multi-hop join". MetricFlow can join up to three tables, supporting multi-hop joins with a limit of two hops. This does the following: * Enables complex data analysis without ambiguous paths. * Supports navigating through data models, like moving from `orders` to `customers` to `country` tables. While direct three-hop paths are limited to prevent confusion from multiple routes to the same data, MetricFlow does allow joining more than three tables if the joins don’t exceed two hops to reach a dimension. For example, if you have two models, `country` and `region`, where customers are linked to countries, which in turn are linked to regions, you can join all of them in a single SQL query and can dissect `orders` by `customer__country_country_name` but not by `customer__country__region_name`.
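All of these joins, single-hop and multi-hop alike, are driven by the `entities` declared in each semantic model's YAML. A minimal sketch follows; the model and column names are illustrative, not taken from the examples above:

```yaml
semantic_models:
  - name: orders
    model: ref('fct_orders')
    entities:
      - name: order_id
        type: primary
      # foreign entity that acts as the join key to the customers model
      - name: customer
        type: foreign
        expr: customer_id

  - name: customers
    model: ref('dim_customers')
    entities:
      - name: customer
        type: primary
        expr: customer_id
```

Because both models declare the shared `customer` entity, MetricFlow can left-join `orders` to `customers` and expose the dimensions defined on `customers` to metrics built on `orders`.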
![Multi-Hop-Join](/assets/images/multihop-diagram-03171b81496cb0fd452d2c2f0b5e0ed3.png "Example schema for reference") ##### Query multi-hop joins[​](#query-multi-hop-joins "Direct link to Query multi-hop joins") To query dimensions *without* a multi-hop join involved, you can use the fully qualified dimension name with the syntax entity double underscore (dunder) dimension, like `entity__dimension`. For dimensions retrieved by a multi-hop join, you need to additionally provide the entity path as a list, like `user_id`. --- ### Materializations #### Overview[​](#overview "Direct link to Overview") Materializations are strategies for persisting dbt models in a warehouse. There are five types of materializations built into dbt. They are: * table * view * incremental * ephemeral * materialized view You can also configure [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md?step=1) in dbt. Custom materializations are a powerful way to extend dbt's functionality to meet your specific needs. For a detailed guide on materializations, refer to [Materializations best practices](https://docs.getdbt.com/best-practices/materializations/1-guide-overview.md). For information about data streaming, refer to [How to handle real-time data](https://docs.getdbt.com/best-practices/how-we-handle-real-time-data/1-intro.md). Learn by video! For video tutorials on materializations, go to dbt Learn and check out the [Materializations fundamentals course](https://learn.getdbt.com/courses/materializations-fundamentals).
#### Configuring materializations[​](#configuring-materializations "Direct link to Configuring materializations")

By default, dbt models are materialized as views. Models can be configured with a different materialization by supplying the [`materialized` configuration](https://docs.getdbt.com/reference/resource-configs/materialized.md) parameter as shown in the following tabs.

* Project file
* Model file
* Property file

dbt_project.yml

```yaml
# The following dbt_project.yml configures a project that looks like this:
# .
# └── models
#     ├── csvs
#     │   ├── employees.sql
#     │   └── goals.sql
#     └── events
#         ├── stg_event_log.sql
#         └── stg_event_sessions.sql

name: my_project
version: 1.0.0
config-version: 2

models:
  my_project:
    events:
      # materialize all models in models/events as tables
      +materialized: table
    csvs:
      # this is redundant, and does not need to be set
      +materialized: view
```

Alternatively, materializations can be configured directly inside of the model SQL files. This can be useful if you are also setting performance-optimization configs for specific models (for example, [Redshift specific configurations](https://docs.getdbt.com/reference/resource-configs/redshift-configs.md) or [BigQuery specific configurations](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md)).

models/events/stg_event_log.sql

```sql
{{ config(
    materialized='table',
    sort='timestamp',
    dist='user_id'
) }}

select *
from ...
```

Materializations can also be configured in the model's `properties.yml` file. The following example shows the `table` materialization type. For a complete list of materialization types, refer to [materializations](https://docs.getdbt.com/docs/build/materializations.md#materializations).
models/properties.yml

```yaml
models:
  - name: events
    config:
      materialized: table
```

#### Materializations[​](#materializations "Direct link to Materializations")

##### View[​](#view "Direct link to View")

When using the `view` materialization, your model is rebuilt as a view on each run, via a `create view as` statement.

* **Pros:** No additional data is stored, and views on top of source data will always have the latest records in them.
* **Cons:** Views that perform a significant transformation, or are stacked on top of other views, are slow to query.
* **Advice:**
  * Generally start with views for your models, and only change to another materialization when you notice performance problems.
  * Views are best suited for models that do not do significant transformation, for example, renaming or recasting columns.

##### Table[​](#table "Direct link to Table")

When using the `table` materialization, your model is rebuilt as a table on each run, via a `create table as` statement.

* **Pros:** Tables are fast to query.
* **Cons:**
  * Tables can take a long time to rebuild, especially for complex transformations.
  * New records in underlying source data are not automatically added to the table.
* **Advice:**
  * Use the table materialization for any models being queried by BI tools, to give your end user a faster experience.
  * Also use the table materialization for any slower transformations that are used by many downstream models.

##### Incremental[​](#incremental "Direct link to Incremental")

`incremental` models allow dbt to insert or update records into a table since the last time that model was run.

* **Pros:** You can significantly reduce the build time by just transforming new records.
* **Cons:** Incremental models require extra configuration and are an advanced usage of dbt. Read more about using incremental models [here](https://docs.getdbt.com/docs/build/incremental-models.md).
* **Advice:**
  * Incremental models are best for event-style data.
  * Use incremental models when your `dbt run`s are becoming too slow (that is, don't start with incremental models).

##### Ephemeral[​](#ephemeral "Direct link to Ephemeral")

`ephemeral` models are not directly built into the database. Instead, dbt will interpolate the code from an ephemeral model into its dependent models using a common table expression (CTE). You can control the identifier for this CTE using a [model alias](https://docs.getdbt.com/docs/build/custom-aliases.md), but dbt will always prefix the model identifier with `__dbt__cte__`.

* **Pros:**
  * You can still write reusable logic.
  * Ephemeral models can help keep your data warehouse clean by reducing clutter (also consider splitting your models across multiple schemas by [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md)).
* **Cons:**
  * You cannot select directly from this model.
  * [Operations](https://docs.getdbt.com/docs/build/hooks-operations.md#about-operations) (for example, macros called using [`dbt run-operation`](https://docs.getdbt.com/reference/commands/run-operation.md)) cannot `ref()` ephemeral nodes.
  * Overuse of ephemeral materialization can also make queries harder to debug.
  * Ephemeral materialization doesn't support [model contracts](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md#where-are-contracts-supported).
* **Advice:** Use the ephemeral materialization for models that:
  * Perform very light-weight transformations early in your DAG,
  * Are only used in one or two downstream models, and
  * Don't need to be queried directly.

##### Materialized View[​](#materialized-view "Direct link to Materialized View")

The `materialized_view` materialization allows the creation and maintenance of materialized views in the target database. Materialized views are a combination of a view and a table, and serve use cases similar to incremental models.
* **Pros:**
  * Materialized views combine the query performance of a table with the data freshness of a view.
  * Materialized views operate much like incremental materializations; however, they can usually be refreshed on a regular cadence without manual intervention (depending on the database), forgoing the regular dbt batch refresh required with incremental materializations.
  * `dbt run` on materialized views corresponds to a code deployment, just like views.
* **Cons:**
  * Because materialized views are more complex database objects, database platforms tend to have fewer configuration options available; see your database platform's docs for more details.
  * Materialized views may not be supported by every database platform.
* **Advice:**
  * Consider materialized views for use cases where incremental models are sufficient, but you would like the data platform to manage the incremental logic and refresh.

###### Configuration Change Monitoring[​](#configuration-change-monitoring "Direct link to Configuration Change Monitoring")

This materialization makes use of the [`on_configuration_change`](https://docs.getdbt.com/reference/resource-configs/on_configuration_change.md) config, which aligns with the incremental nature of the namesake database object. This setting tells dbt to attempt to make configuration changes directly to the object when possible, as opposed to completely recreating the object to implement the updated configuration. Using `dbt-postgres` as an example, indexes can be dropped and created on the materialized view without the need to recreate the materialized view itself.

###### Scheduled Refreshes[​](#scheduled-refreshes "Direct link to Scheduled Refreshes")

In the context of a `dbt run` command, materialized views should be thought of as similar to views. For example, a `dbt run` command is only needed if there is the potential for a change in configuration or SQL; it's effectively a deploy action.
By contrast, a `dbt run` command is needed for a table in the same scenarios *and* when the data in the table needs to be updated. This also holds true for incremental and snapshot models, whose underlying relations are tables. In the table cases, the scheduling mechanism is either dbt or your local scheduler; there is no built-in functionality to automatically refresh the data behind a table. However, most platforms (Postgres excluded) provide functionality to configure automatic refreshes of a materialized view. Hence, materialized views work similarly to incremental models, with the benefit of not needing to run dbt to refresh the data. This assumes, of course, that auto refresh is turned on and configured in the model.

info

`dbt-snowflake` *does not* support materialized views; it uses Dynamic Tables instead. For details, refer to [Snowflake specific configurations](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#dynamic-tables).

#### Python materializations[​](#python-materializations "Direct link to Python materializations")

Python models support two materializations:

* `table`
* `incremental`

Incremental Python models support all the same [incremental strategies](https://docs.getdbt.com/docs/build/incremental-strategy.md) as their SQL counterparts. The specific strategies supported depend on your adapter.

Python models can't be materialized as `view` or `ephemeral`. Python isn't supported for non-model resource types (like tests and snapshots).
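As a sketch of the shape of a Python model's entry point with the `table` materialization: the `dbt` and `session` objects are supplied by dbt at run time, so the stub class below is a hypothetical stand-in that lets the interface be exercised outside of a dbt invocation:

```python
class _StubDbt:
    """Hypothetical stand-in for the dbt context object a Python model receives."""

    def __init__(self, upstream):
        self._upstream = upstream
        self.config_calls = {}

    def config(self, **kwargs):
        # Records the materialization config the model declares.
        self.config_calls.update(kwargs)

    def ref(self, name):
        # Returns the upstream "dataframe" (a plain list here).
        return self._upstream[name]


def model(dbt, session):
    # Python models may only be materialized as "table" or "incremental".
    dbt.config(materialized="table")
    df = dbt.ref("upstream_table")
    # ...transformations would go here...
    return df


dbt = _StubDbt({"upstream_table": [{"id": 1}, {"id": 2}]})
result = model(dbt, session=None)
print(dbt.config_calls)  # {'materialized': 'table'}
print(result)            # [{'id': 1}, {'id': 2}]
```

In a real project this function would live in a `.py` file under `models/`, and `dbt.ref()` would return an adapter-specific dataframe rather than a list.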
For incremental models, like SQL models, you will need to filter incoming tables to only new rows of data:

* Snowpark
* PySpark

models/my_python_model.py

```python
import snowflake.snowpark.functions as F

def model(dbt, session):
    dbt.config(materialized = "incremental")
    df = dbt.ref("upstream_table")

    if dbt.is_incremental:

        # only new rows compared to max in current table
        max_from_this = f"select max(updated_at) from {dbt.this}"
        df = df.filter(df.updated_at >= session.sql(max_from_this).collect()[0][0])

        # or only rows from the past 3 days
        df = df.filter(df.updated_at >= F.dateadd("day", F.lit(-3), F.current_timestamp()))

    ...

    return df
```

models/my_python_model.py

```python
import pyspark.sql.functions as F

def model(dbt, session):
    dbt.config(materialized = "incremental")
    df = dbt.ref("upstream_table")

    if dbt.is_incremental:

        # only new rows compared to max in current table
        max_from_this = f"select max(updated_at) from {dbt.this}"
        df = df.filter(df.updated_at >= session.sql(max_from_this).collect()[0][0])

        # or only rows from the past 3 days
        df = df.filter(df.updated_at >= F.date_add(F.current_timestamp(), F.lit(-3)))

    ...

    return df
```

**Note:** Incremental models are supported on BigQuery/Dataproc for the `merge` incremental strategy. The `insert_overwrite` strategy is not yet supported.
---

### Measures

**Measures are deprecated in the new spec**

Heads up, measures have been deprecated in favor of simple metrics under the `metrics:` key. Migrate by converting each measure to a `type: simple` metric. For more info, check out [Migrate to the latest YAML spec](https://docs.getdbt.com/docs/build/latest-metrics-spec.md) and [upgrade to dbt Fusion v2.0](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md).

Measures are aggregations performed on columns in your model. They can be used as final metrics or as building blocks for more complex metrics. Measures have several inputs, which are described in the following table along with their field types.

| Parameter | Description | Required | Type |
| --- | --- | --- | --- |
| [`name`](https://docs.getdbt.com/docs/build/measures.md#name) | Provide a name for the measure, which must be unique and can't be repeated across all semantic models in your dbt project. | Required | String |
| [`description`](https://docs.getdbt.com/docs/build/measures.md#description) | Describes the calculated measure. | Optional | String |
| [`agg`](https://docs.getdbt.com/docs/build/measures.md#aggregation) | dbt supports the following aggregations: `sum`, `max`, `min`, `average`, `median`, `count_distinct`, `percentile`, and `sum_boolean`. | Required | String |
| [`expr`](https://docs.getdbt.com/docs/build/measures.md#expr) | Either reference an existing column in the table or use a SQL expression to create or derive a new one. | Optional | String |
| [`non_additive_dimension`](https://docs.getdbt.com/docs/build/measures.md#non-additive-dimensions) | Non-additive dimensions can be specified for measures that cannot be aggregated over certain dimensions, such as bank account balances, to avoid producing incorrect results. | Optional | String |
| `agg_params` | Specific aggregation properties, such as a percentile. | Optional | Dict |
| `agg_time_dimension` | The time field. Defaults to the default agg time dimension for the semantic model. | Optional | String |
| `label` | String that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). Available in dbt version 1.7 or higher. | Optional | String |
| `create_metric` | Create a `simple` metric from a measure by setting `create_metric: True`. The `label` and `description` attributes will be automatically propagated to the created metric. Available in dbt version 1.7 or higher. | Optional | Boolean |
| `config` | Use the [`config`](https://docs.getdbt.com/reference/resource-properties/config.md) property to specify configurations for your metric. Supports the [`meta`](https://docs.getdbt.com/reference/resource-configs/meta.md) property, nested under `config`. | Optional | |

#### Measure spec[​](#measure-spec "Direct link to Measure spec")

An example of the complete YAML measures spec is below. The actual configuration of your measures will depend on the aggregation you're using.

```yaml
semantic_models:
  - name: semantic_model_name
    ## ...rest of the semantic model config
    measures:
      - name: The name of the measure
        description: 'same as always' ## Optional
        agg: the aggregation type
        expr: the field
        agg_params: 'specific aggregation properties such as a percentile' ## Optional
        agg_time_dimension: The time field. Defaults to the default agg time dimension for the semantic model. ## Optional
        non_additive_dimension: 'Use these configs when you need non-additive dimensions.' ## Optional
        config: ## Optional. Use the config property to specify configurations for your measure.
          meta: {} ## Optional. Set metadata for a resource and organize resources. Accepts plain text, spaces, and quotes.
```

##### Name[​](#name "Direct link to Name")

When you create a measure, you can either give it a custom name or use the `name` of the data platform column directly. If the measure's `name` differs from the column name, you need to add an `expr` to specify the column name. The `name` of the measure is used when creating a metric.

Measure names must be unique across all semantic models in a project and cannot be the same as an existing `entity` or `dimension` within that same model.

##### Description[​](#description "Direct link to Description")

The description describes the calculated measure. It's strongly recommended you create verbose and human-readable descriptions in this field.

##### Aggregation[​](#aggregation "Direct link to Aggregation")

The aggregation determines how the field will be aggregated. For example, a `sum` aggregation type over a granularity of `day` would sum the values across a given day. Supported aggregations include:

| Aggregation types | Description |
| --- | --- |
| sum | Sum across the values |
| min | Minimum across the values |
| max | Maximum across the values |
| average | Average across the values |
| sum_boolean | A sum for a boolean type |
| count_distinct | Distinct count of values |
| median | Median (p50) calculation across the values |
| percentile | Percentile calculation across the values |

###### Percentile aggregation example[​](#percentile-aggregation-example "Direct link to Percentile aggregation example")

If you're using the `percentile` aggregation, you must use the `agg_params` field to specify details for the percentile aggregation (such as what percentile to calculate and whether to use discrete or continuous calculations).

```yaml
name: p99_transaction_value
description: The 99th percentile transaction value
expr: transaction_amount_usd
agg: percentile
agg_params:
  percentile: .99
  use_discrete_percentile: False # False calculates the continuous percentile, True calculates the discrete percentile.
```

###### Percentile across supported engine types[​](#percentile-across-supported-engine-types "Direct link to Percentile across supported engine types")

The following table lists which SQL engines support continuous, discrete, approximate continuous, and approximate discrete percentiles.

| | Cont. | Disc. | Approx. cont. | Approx. disc. |
| --- | --- | --- | --- | --- |
| Snowflake | [Yes](https://docs.snowflake.com/en/sql-reference/functions/percentile_cont.html) | [Yes](https://docs.snowflake.com/en/sql-reference/functions/percentile_disc.html) | [Yes](https://docs.snowflake.com/en/sql-reference/functions/approx_percentile.html) (t-digest) | No |
| BigQuery | No (window) | No (window) | [Yes](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#approx_quantiles) | No |
| Databricks | [Yes](https://docs.databricks.com/sql/language-manual/functions/percentile_cont.html) | [No](https://docs.databricks.com/sql/language-manual/functions/percentile_disc.html) | No | [Yes](https://docs.databricks.com/sql/language-manual/functions/approx_percentile.html) |
| Redshift | [Yes](https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html) | No (window) | No | [Yes](https://docs.aws.amazon.com/redshift/latest/dg/r_APPROXIMATE_PERCENTILE_DISC.html) |
| [Postgres](https://www.postgresql.org/docs/9.4/functions-aggregate.html) | Yes | Yes | No | No |
| [DuckDB](https://duckdb.org/docs/sql/aggregates.html) | Yes | Yes | Yes (t-digest) | No |

##### Expr[​](#expr "Direct link to Expr")

If the `name` you specified for a measure doesn't match a column name in your model, you can use the `expr` parameter instead. This allows you to use any valid SQL to manipulate an underlying column name into a specific output. The `name` parameter then serves as an alias for your measure.

**Note:** When using SQL functions in the `expr` parameter, **always use data platform-specific SQL**. This is because outputs may differ depending on your specific data platform.

**For Snowflake users:** If you use a week-level function in the `expr` parameter, it'll now return Monday as the default week start day based on ISO standards. If you have any account or session level overrides for the `WEEK_START` parameter that fix it to a value other than 0 or 1, you will still see Monday as the week start.

If you use the `dayofweek` function in the `expr` parameter with the legacy Snowflake default of `WEEK_START = 0`, it will now return ISO-standard values of 1 (Monday) through 7 (Sunday) instead of Snowflake's legacy default values of 0 (Monday) through 6 (Sunday).
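The continuous-versus-discrete distinction controlled by `use_discrete_percentile` can be illustrated in plain Python. These helper functions are simplified stand-ins for the engines' `percentile_cont`/`percentile_disc` behavior, not the engines' own implementations:

```python
import math

def percentile_cont(values, p):
    """Continuous percentile: linearly interpolates between adjacent values."""
    xs = sorted(values)
    rank = p * (len(xs) - 1)
    lo = int(rank)
    frac = rank - lo
    if frac == 0:
        return float(xs[lo])
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])

def percentile_disc(values, p):
    """Discrete percentile: smallest actual value whose cumulative
    distribution is >= p, so the result always appears in the data."""
    xs = sorted(values)
    idx = math.ceil(p * len(xs)) - 1
    return float(xs[max(idx, 0)])

data = [10, 20, 30, 40]
print(percentile_cont(data, 0.5))  # 25.0 (interpolated, not in the data)
print(percentile_disc(data, 0.5))  # 20.0 (an actual value from the data)
```

With `use_discrete_percentile: False` the engine interpolates (the `25.0` case); with `True` it returns a value that actually occurs in the column (the `20.0` case).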
##### Model with different aggregations[​](#model-with-different-aggregations "Direct link to Model with different aggregations")

```yaml
semantic_models:
  - name: transactions
    description: A record of every transaction that takes place. Carts are considered multiple transactions for each sku.
    model: ref('schema.transactions')
    defaults:
      agg_time_dimension: transaction_date

    # --- entities ---
    entities:
      - name: transaction_id
        type: primary
      - name: customer_id
        type: foreign
      - name: store_id
        type: foreign
      - name: product_id
        type: foreign

    # --- measures ---
    measures:
      - name: transaction_amount_usd
        description: Total usd value of transactions
        expr: transaction_amount_usd
        agg: sum
        config:
          meta:
            used_in_reporting: true
      - name: transaction_amount_usd_avg
        description: Average usd value of transactions
        expr: transaction_amount_usd
        agg: average
      - name: transaction_amount_usd_max
        description: Maximum usd value of transactions
        expr: transaction_amount_usd
        agg: max
      - name: transaction_amount_usd_min
        description: Minimum usd value of transactions
        expr: transaction_amount_usd
        agg: min
      - name: quick_buy_transactions
        description: The total transactions bought as quick buy
        expr: quick_buy_flag
        agg: sum_boolean
      - name: distinct_transactions_count
        description: Distinct count of transactions
        expr: transaction_id
        agg: count_distinct
      - name: transaction_amount_avg
        description: The average value of transactions
        expr: transaction_amount_usd
        agg: average
      - name: transactions_amount_usd_valid # Notice here how we use expr to compute the aggregation based on a condition
        description: The total usd value of valid transactions only
        expr: case when is_valid = True then transaction_amount_usd else 0 end
        agg: sum
      - name: transactions
        description: The average value of transactions.
        expr: transaction_amount_usd
        agg: average
      - name: p99_transaction_value
        description: The 99th percentile transaction value
        expr: transaction_amount_usd
        agg: percentile
        agg_params:
          percentile: .99
          use_discrete_percentile: False # False calculates the continuous percentile, True calculates the discrete percentile.
      - name: median_transaction_value
        description: The median transaction value
        expr: transaction_amount_usd
        agg: median

    # --- dimensions ---
    dimensions:
      - name: transaction_date
        type: time
        expr: date_trunc('day', ts) # expr refers to underlying column ts
        type_params:
          time_granularity: day
      - name: is_bulk_transaction
        type: categorical
        expr: case when quantity > 10 then true else false end
```

##### Non-additive dimensions[​](#non-additive-dimensions "Direct link to Non-additive dimensions")

Some measures cannot be aggregated over certain dimensions, like time, because it could result in incorrect outcomes. Examples include bank account balances, where it does not make sense to carry over balances month-to-month, and monthly recurring revenue, where daily recurring revenue cannot be summed up to achieve monthly recurring revenue. You can specify non-additive dimensions to handle this, where certain dimensions are excluded from aggregation.

To demonstrate the configuration for non-additive measures, consider a subscription table that includes one row per date of the registered user, the user's active subscription plan(s), and the plan's subscription value (revenue) with the following columns:

* `date_transaction`: The daily date-spine.
* `user_id`: The ID of the registered user.
* `subscription_plan`: A column to indicate the subscription plan ID.
* `subscription_value`: A column to indicate the monthly subscription value (revenue) of a particular subscription plan ID.

Parameters under the `non_additive_dimension` will specify dimensions that the measure should not be aggregated over.
| Parameter | Description | Required |
| --- | --- | --- |
| `name` | This will be the name of the time dimension (that has already been defined in the data source) that the measure should not be aggregated over. | Required |
| `window_choice` | Choose either `min` or `max`, where `min` reflects the beginning of the time period and `max` reflects the end of the time period. | Required |
| `window_groupings` | Provide the entities that you would like to group by. | Optional |

```yaml
semantic_models:
  - name: subscriptions
    description: A subscription table with one row per date for each active user and their subscription plans.
    model: ref('your_schema.subscription_table')
    defaults:
      agg_time_dimension: subscription_date

    entities:
      - name: user_id
        type: foreign

    primary_entity: subscription

    dimensions:
      - name: subscription_date
        type: time
        expr: date_transaction
        type_params:
          time_granularity: day

    measures:
      - name: count_users
        description: Count of users at the end of the month
        expr: user_id
        agg: count_distinct
        non_additive_dimension:
          name: subscription_date
          window_choice: max
      - name: mrr
        description: Aggregate by summing all users' active subscription plans
        expr: subscription_value
        agg: sum
        non_additive_dimension:
          name: subscription_date
          window_choice: max
      - name: user_mrr
        description: Group by user_id to achieve each user's MRR
        expr: subscription_value
        agg: sum
        non_additive_dimension:
          name: subscription_date
          window_choice: max
          window_groupings:
            - user_id

metrics:
  - name: mrr_metrics
    type: simple
    type_params:
      measure: mrr
```

We can query the semi-additive metrics using the following syntax:

For the dbt platform CLI:

```bash
dbt sl query --metrics mrr_by_end_of_month --group-by subscription__subscription_date__month --order subscription__subscription_date__month
dbt sl query --metrics mrr_by_end_of_month --group-by subscription__subscription_date__week --order subscription__subscription_date__week
```

For dbt Core:

```bash
mf query --metrics mrr_by_end_of_month --group-by subscription__subscription_date__month --order subscription__subscription_date__month
mf query --metrics mrr_by_end_of_month --group-by subscription__subscription_date__week --order subscription__subscription_date__week
```

---

### Merge jobs in dbt

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

You can set up a merge job to implement a continuous deployment (CD) workflow in dbt. The merge job triggers a dbt job to run when someone merges Git pull requests into production. This workflow creates a seamless development experience where changes made in code will automatically update production data. Also, you can use this workflow for running `dbt compile` to update your environment's manifest so subsequent CI job runs are more performant. By using CD in dbt, you can take advantage of deferral to build only the edited model and any downstream changes. With merge jobs, state will be updated almost instantly, always giving the most up-to-date state information in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md).
Triggering merge jobs in monorepos

If you have a monorepo with several dbt projects, merging a single pull request in one of your projects will trigger jobs for all projects connected to the monorepo. To address this, you can use separate target branches per project (for example, `main-project-a`, `main-project-b`) to separate CI triggers.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You have a dbt account.
* You have set up a [connection with your Git provider](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md). This integration lets dbt run jobs on your behalf for job triggering.
* If you're using a native [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md) integration, you need a paid or self-hosted account that includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). If you're using GitLab Free, merge requests will trigger CI jobs, but CI job status updates (success or failure of the job) will not be reported back to GitLab.
* For deferral (which is the default), make sure there has been at least one successful job run in the environment you defer to.

#### Set up job trigger on Git merge[​](#set-up-merge-jobs "Direct link to Set up job trigger on Git merge")

1. On your deployment environment page, click **Create job** > **Merge job**.
2. Options in the **Job settings** section:
   * **Job name** — Specify the name for the merge job.
   * **Description** — Provide a description about the job.
   * **Environment** — By default, it's set to the environment you created the job from.
3. In the **Git trigger** section, the **Run on merge** option is enabled by default. Every time a PR merges (to a base branch configured in the environment) in your Git repo, this job will get triggered to run.
4.
Options in the **Execution settings** section:
   * **Commands** — By default, it includes the `dbt build --select state:modified+` command. This informs dbt to build only new or changed models and their downstream dependents. Importantly, state comparison can only happen when there is a deferred environment selected to compare state to. Click **Add command** to add more [commands](https://docs.getdbt.com/docs/deploy/job-commands.md) that you want to be invoked when this job runs.
   * **Compare changes against** — By default, it's set to compare changes against the environment you created the job from. This option allows dbt to check the state of the code in the PR against the code running in the deferred environment, so as to only check the modified code, instead of building the full table or the entire DAG. To change the default settings, you can select **No deferral**, **This job** for self-deferral, or choose a different environment.
5. (Optional) Options in the **Advanced settings** section:
   * **Environment variables** — Define [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) to customize the behavior of your project when this job runs.
   * **Target name** — Define the [target name](https://docs.getdbt.com/docs/build/custom-target-names.md). Similar to environment variables, this option lets you customize the behavior of the project.
   * **Run timeout** — Cancel this job if the run time exceeds the timeout value.
   * **dbt version** — By default, it's set to inherit the [dbt version](https://docs.getdbt.com/docs/dbt-versions/core.md) from the environment. dbt Labs strongly recommends that you don't change the default setting. This option to change the version at the job level is useful only when you upgrade a project to the next dbt version; otherwise, mismatched versions between the environment and job can lead to confusing behavior.
* **Threads** — By default, it’s set to 4 [threads](https://docs.getdbt.com/docs/local/profiles.yml.md#understanding-threads). Increase the thread count to increase model execution concurrency. [![Example of creating a merge job](/img/docs/dbt-cloud/using-dbt-cloud/example-create-merge-job.png?v=2 "Example of creating a merge job")](#)Example of creating a merge job #### Verify push events in Git[​](#verify-push-events-in-git "Direct link to Verify push events in Git") Merge jobs require push events so make sure they've been enabled in your Git provider, especially if you have an already-existing Git integration. However, for a new integration setup, you can skip this check since push events are typically enabled by default.  GitHub example The following is a GitHub example of when the push events are already set: [![Example of the Pushes option enabled in the GitHub settings](/img/docs/dbt-cloud/using-dbt-cloud/example-github-push-events.png?v=2 "Example of the Pushes option enabled in the GitHub settings")](#)Example of the Pushes option enabled in the GitHub settings  GitLab example The following is a GitLab example of when the push events are already set: [![Example of the Push events option enabled in the GitLab settings](/img/docs/dbt-cloud/using-dbt-cloud/example-gitlab-push-events.png?v=2 "Example of the Push events option enabled in the GitLab settings")](#)Example of the Push events option enabled in the GitLab settings  Azure DevOps example The following is an example of creating a new **Code pushed** trigger in Azure DevOps. Create a new service hooks subscription when code pushed events haven't been set: [![Example of creating a new trigger to push events in Azure Devops](/img/docs/dbt-cloud/using-dbt-cloud/example-azuredevops-new-event.png?v=2 "Example of creating a new trigger to push events in Azure Devops")](#)Example of creating a new trigger to push events in Azure Devops #### Was this page helpful? 
--- ### MetricFlow commands Once you define metrics in your dbt project, you can query metrics, dimensions, and dimension values, and validate your configs using the MetricFlow commands, available in both dbt Core and the [dbt Fusion engine](https://docs.getdbt.com/docs/fusion.md). To upgrade to Fusion, see [Get started with Fusion](https://docs.getdbt.com/docs/fusion/get-started-fusion.md). MetricFlow allows you to define and query metrics in your dbt project in [dbt platform](https://docs.getdbt.com/docs/cloud/about-develop-dbt.md) or [dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md). To experience the power of the universal [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) and dynamically query those metrics in downstream tools, you'll need a dbt [Starter, Enterprise, or Enterprise+](https://www.getdbt.com/pricing/) account. MetricFlow is compatible with Python versions 3.8 through 3.12. #### MetricFlow[​](#metricflow "Direct link to MetricFlow") * MetricFlow in Fusion or dbt platform * MetricFlow with dbt Core This section applies to dbt platform users running the dbt Fusion engine, where commands and validations execute remotely in dbt platform. * Run MetricFlow commands using the `dbt sl` prefix in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md), or the [VS Code extension](https://docs.getdbt.com/docs/install-dbt-extension.md).
* For CLI or VS Code/Cursor users, MetricFlow commands are embedded, which means you can immediately run them once you install the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) or [VS Code extension](https://docs.getdbt.com/docs/install-dbt-extension.md) and don't need to install MetricFlow separately. * Using MetricFlow with dbt platform doesn't require you to manage versioning — your dbt account will automatically manage the versioning. * dbt jobs support the `dbt sl validate` command to [automatically test your semantic nodes](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci). You can also add MetricFlow validations with your Git provider (such as GitHub Actions) by installing MetricFlow (`python -m pip install metricflow`). This allows you to run MetricFlow commands as part of your continuous integration checks on PRs. This section applies to dbt Core users running the dbt Core engine or users running [source available](https://www.getdbt.com/licenses-faq) Fusion locally who aren't on dbt platform. You can install [MetricFlow](https://github.com/dbt-labs/metricflow#getting-started) from [PyPI](https://pypi.org/project/dbt-metricflow/). You need to use `pip` to install MetricFlow on Windows or Linux operating systems: 1. Create or activate your virtual environment: `python -m venv venv`. 2. Run `pip install dbt-metricflow`. * You can install MetricFlow using PyPI as an extension of your dbt adapter in the command line. To do so, run `python -m pip install "dbt-metricflow[adapter_package_name]"`, replacing `adapter_package_name` with the name of your adapter. For example, for a Snowflake adapter, run `python -m pip install "dbt-metricflow[dbt-snowflake]"`. **Note**: you'll need to manage versioning between dbt Core, your adapter, and MetricFlow. Note that MetricFlow `mf` commands return an error if you have the Metafont LaTeX package installed, because both tools provide an executable named `mf`. To run `mf` commands, uninstall the package.
#### MetricFlow commands[​](#metricflow-commands "Direct link to MetricFlow commands") Use MetricFlow commands to retrieve metadata and query metrics. The following table lists the compatibility matrix for MetricFlow commands and where you can run them.

| Development setup | Engine | Hosted on | Prefix | Notes |
| --- | --- | --- | --- | --- |
| Studio IDE/dbt CLI, or VS Code extension (Fusion only) | dbt Fusion engine or dbt Core engine | dbt platform | `dbt sl` | Remote execution; the platform manages versions. VS Code extension users must have a `dbt_cloud.yml` file with a token to connect to dbt platform. |
| Open-source (no dbt platform project) | Fusion (source available) or dbt Core engine | Local machine | `mf` | Install and manage MetricFlow locally. |

* If you’re using Fusion with dbt platform and have a `dbt_cloud.yml` file with a valid token to connect to dbt platform, run MetricFlow commands using the `dbt sl` prefix. * This allows you to interact with metrics that are executed remotely on dbt platform (for example, from the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) or [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md)). * If you’re using the [Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) ([source available](https://www.getdbt.com/licenses-faq)) and aren't connected to dbt platform, install MetricFlow separately and use the `mf` prefix to run commands locally. * If you’re using dbt Core locally without Fusion, run MetricFlow commands using the `mf` prefix. - Commands for dbt platform - Commands for dbt Core This section applies to dbt platform users running the dbt Fusion engine or dbt Core engine, where commands and validations execute remotely in dbt platform. * Use the `dbt sl` prefix before the command name to execute commands in dbt platform (Studio IDE, VS Code/Cursor, dbt CLI) — for example, `dbt sl list metrics` to list all metrics. * For dbt platform users developing with a CLI or an editor (like VS Code), run the `dbt sl --help` command in the terminal to view a complete list of the MetricFlow commands and flags.
* The following table lists the commands compatible with dbt platform (Studio IDE, VS Code/Cursor, dbt CLI) powered by the dbt Fusion engine or dbt Core engine:

| Command | Description | Studio IDE | dbt CLI | VS Code/Cursor |
| --- | --- | --- | --- | --- |
| [`list metrics`](#list-metrics) | Lists metrics with dimensions. | ✅ | ✅ | ✅ |
| [`list dimensions`](#list-dimensions) | Lists unique dimensions for metrics. | ✅ | ✅ | ✅ |
| [`list dimension-values`](#list-dimension-values) | Lists dimension values with the corresponding metric. | ✅ | ✅ | ✅ |
| [`list entities`](#list-entities) | Lists all unique entities. | ✅ | ✅ | ✅ |
| [`list saved-queries`](#list-saved-queries) | Lists available saved queries. Use the `--show-exports` flag to display each export listed under a saved query or `--show-parameters` to show the full query parameters each saved query uses. | ✅ | ✅ | ✅ |
| [`query`](#query) | Query metrics, saved queries, and dimensions you want to see in the command line interface. Refer to [query examples](#query-examples) to query metrics and dimensions (such as querying metrics, using the `where` filter, adding an `order`, and more). | ✅ | ✅ | ✅ |
| [`validate`](#validate) | Validates semantic model configurations. | ✅ | ✅ | ✅ |
| [`export`](#export) | Runs exports for a singular saved query for testing and generating exports in your development environment. You can also use the `--select` flag to specify particular exports from a saved query. | ❌ | ✅ | ✅ |
| [`export-all`](#export-all) | Runs exports for multiple saved queries at once, saving time and effort. | ❌ | ✅ | ✅ |

Run dbt parse to reflect metric changes When you make changes to metrics, make sure to run `dbt parse` at a minimum to update the Semantic Layer. This updates the `semantic_manifest.json` file, reflecting your changes when querying metrics. By running `dbt parse`, you won't need to rebuild all the models.  How can I query or preview metrics with the dbt CLI? Check out the following video for a short demo of how to query or preview metrics with the dbt CLI: This section applies to dbt Core users running the dbt Core engine or users running [source available](https://www.getdbt.com/licenses-faq) Fusion locally who aren't on dbt platform. Commands and validations execute locally; use the `mf` prefix before the command name. For example, to list all metrics, run `mf list metrics`. * [`list metrics`](#list-metrics) — Lists metrics with dimensions. * [`list dimensions`](#list-dimensions) — Lists unique dimensions for metrics. * [`list dimension-values`](#list-dimension-values) — Lists dimension values with the corresponding metric. * [`list entities`](#list-entities) — Lists all unique entities. * [`validate-configs`](#validate-configs) — Validates semantic model configurations. * [`health-checks`](#health-checks) — Performs a data platform health check. * [`tutorial`](#tutorial) — Dedicated MetricFlow tutorial to help get you started. * [`query`](#query) — Query metrics and dimensions you want to see in the command line interface. Refer to [query examples](#query-examples) to help you get started. #### List metrics[​](#list-metrics "Direct link to List metrics") This command lists the metrics with their available dimensions: ```bash dbt sl list metrics # For dbt platform users (Core or Fusion engine) mf list metrics # For open-source users (Core or Fusion source available) Options: --search TEXT Filter available metrics by this search term --show-all-dimensions Show all dimensions associated with a metric. --help Show this message and exit. 
``` #### List dimensions[​](#list-dimensions "Direct link to List dimensions") This command lists all unique dimensions for a metric or multiple metrics. It displays only common dimensions when querying multiple metrics: ```bash dbt sl list dimensions --metrics <metric_name> # For dbt platform users (Core or Fusion engine) mf list dimensions --metrics <metric_name> # For open-source users (Core or Fusion source available) Options: --metrics SEQUENCE List dimensions by given metrics (intersection). Ex. --metrics bookings,messages --help Show this message and exit. ``` #### List dimension-values[​](#list-dimension-values "Direct link to List dimension-values") This command lists all dimension values with the corresponding metric: ```bash dbt sl list dimension-values --metrics <metric_name> --dimension <dimension_name> # For dbt platform users (Core or Fusion engine) mf list dimension-values --metrics <metric_name> --dimension <dimension_name> # For open-source users (Core or Fusion source available) Options: --dimension TEXT Dimension to query values from [required] --metrics SEQUENCE Metrics that are associated with the dimension [required] --end-time TEXT Optional iso8601 timestamp to constrain the end time of the data (inclusive) *Not available in the dbt platform/Fusion yet --start-time TEXT Optional iso8601 timestamp to constrain the start time of the data (inclusive) *Not available in the dbt platform/Fusion yet --help Show this message and exit. ``` #### List entities[​](#list-entities "Direct link to List entities") This command lists all unique entities: ```bash dbt sl list entities --metrics <metric_name> # For dbt platform users (Core or Fusion engine) mf list entities --metrics <metric_name> # For open-source users (Core or Fusion source available) Options: --metrics SEQUENCE List entities by given metrics (intersection). Ex. --metrics bookings,messages --help Show this message and exit. 
``` #### List saved queries[​](#list-saved-queries "Direct link to List saved queries") This command lists all available saved queries: ```bash dbt sl list saved-queries # For dbt platform users (Core or Fusion engine) ``` You can also add the `--show-exports` flag (or option) to show each export listed under a saved query: ```bash dbt sl list saved-queries --show-exports # For dbt platform users (Core or Fusion engine) ``` **Output** ```bash dbt sl list saved-queries --show-exports The list of available saved queries: - new_customer_orders exports: - Export(new_customer_orders_table, exportAs=TABLE) - Export(new_customer_orders_view, exportAs=VIEW) - Export(new_customer_orders, alias=orders, schemas=customer_schema, exportAs=TABLE) ``` #### Validate[​](#validate "Direct link to Validate") The following command performs validations against the defined semantic model configurations. * For dbt platform users (Fusion or Core) using the dbt platform CLI, or working locally with a valid `dbt_cloud.yml`: ```bash dbt sl validate ``` * For open-source users (Core or Fusion source available): ```bash mf validate-configs ``` ```bash Options: --timeout # dbt platform only Optional timeout for data warehouse validation in dbt platform. --dw-timeout INTEGER # dbt Core only Optional timeout for data warehouse validation steps. Default None. --skip-dw # dbt Core only Skips the data warehouse validations. --show-all # dbt Core only Prints warnings and future errors. --verbose-issues # dbt Core only Prints extra details about issues. --semantic-validation-workers INTEGER # dbt Core only Uses specified number of workers for large configs. --help Show this message and exit. ``` #### Health checks[​](#health-checks "Direct link to Health checks") The following command performs a health check against the data platform you provided in the configs. Note: in dbt platform, the `health-checks` command isn't required since dbt uses your configured credentials to perform the health check. 
```bash mf health-checks # For open-source users (Core or Fusion source available) ``` #### Tutorial[​](#tutorial "Direct link to Tutorial") Follow the dedicated MetricFlow tutorial to help you get started: ```bash mf tutorial # For open-source users (Core or Fusion source available) ``` #### Query[​](#query "Direct link to Query") Create a new query with MetricFlow and execute it against your data platform: ```bash dbt sl query --metrics <metric_name> --group-by <dimension_name> # For dbt platform users (Core or Fusion engine) dbt sl query --saved-query <saved_query_name> # For dbt platform users (Core or Fusion engine) mf query --metrics <metric_name> --group-by <dimension_name> # For open-source users (Core or Fusion source available) Options: --metrics SEQUENCE Syntax to query single metrics: --metrics metric_name For example, --metrics bookings To query multiple metrics, use --metrics followed by the metric names, separated by commas without spaces. For example, --metrics bookings,messages --group-by SEQUENCE Syntax to group by single dimension/entity: --group-by dimension_name For example, --group-by ds For multiple dimensions/entities, use --group-by followed by the dimension/entity names, separated by commas without spaces. For example, --group-by ds,org --end-time TEXT Optional iso8601 timestamp to constrain the end time of the data (inclusive). *Not available in the dbt platform/Fusion yet --start-time TEXT Optional iso8601 timestamp to constrain the start time of the data (inclusive) *Not available in the dbt platform/Fusion yet --where TEXT SQL-like where statement provided as a string and wrapped in quotes. All filter items must explicitly reference fields or dimensions that are part of your model. 
To query a single statement: --where "{{ Dimension('order_id__revenue') }} > 100" To query multiple statements: --where "{{ Dimension('order_id__revenue') }} > 100" --where "{{ Dimension('user_count') }} < 1000" # make sure to wrap each statement in quotes To add a dimension filter, use the `Dimension()` template wrapper to indicate that the filter item is part of your model. Refer to the FAQ for more info on how to do this using a template wrapper. --limit TEXT Limit the number of rows returned using an int, or leave blank for no limit. For example: --limit 100 --order-by SEQUENCE Specify metrics, dimensions, or group bys to order by. Add the `-` prefix to sort the query in descending (DESC) order. Leave blank for ascending (ASC) order. For example, to sort metric_time in DESC order: --order-by -metric_time To sort metric_time in ASC order and revenue in DESC order: --order-by metric_time,-revenue --csv FILENAME Provide filepath for data frame output to csv --compile (dbt platform/Fusion) / --explain (dbt Core) In the query output, show the query that was executed against the data warehouse --show-dataflow-plan Display dataflow plan in explain output --display-plans Display plans (such as metric dataflow) in the browser --decimals INTEGER Choose the number of decimal places to round for the numerical values --show-sql-descriptions Shows inline descriptions of nodes in displayed SQL --help Show this message and exit. ``` #### Query examples[​](#query-examples "Direct link to Query examples") This section shares various types of query examples that you can use to query metrics and dimensions. 
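To make the flag syntax above concrete, here's a small illustrative helper — hypothetical, not part of MetricFlow — that assembles a query's argument list the way the options describe it: comma-separated values without spaces for `--metrics` and `--group-by`, one `--where` flag per filter, and a `-` prefix on `--order-by` fields for descending order:

```python
def build_query_args(metrics, group_by=(), where=(), limit=None, order_by=()):
    """Assemble arguments for 'dbt sl query' / 'mf query' (illustrative only)."""
    args = ["--metrics", ",".join(metrics)]  # comma-separated, no spaces
    if group_by:
        args += ["--group-by", ",".join(group_by)]
    for clause in where:  # repeat --where once per filter, each wrapped in quotes
        args += ["--where", clause]
    if limit is not None:
        args += ["--limit", str(limit)]
    if order_by:  # a leading '-' on a field requests descending order
        args += ["--order-by", ",".join(order_by)]
    return args


print(" ".join(build_query_args(
    ["order_total"],
    group_by=["metric_time"],
    limit=10,
    order_by=["-metric_time"],
)))
# → --metrics order_total --group-by metric_time --limit 10 --order-by -metric_time
```

The same argument list works with either prefix, since `dbt sl query` and `mf query` share this option syntax.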
The query examples listed are: * [Query metrics](#query-metrics) * [Query dimensions](#query-dimensions) * [Add `order`/`limit` function](#add-orderlimit) * [Add `where` clause](#add-where-clause) * [Filter by time](#filter-by-time) * [Query saved queries](#query-saved-queries) ##### Query metrics[​](#query-metrics "Direct link to Query metrics") Use the following example to query multiple metrics by dimension and return the `order_total` and `users_active` metrics by `metric_time`. **Query** ```bash dbt sl query --metrics order_total,users_active --group-by metric_time # For dbt platform users (Core or Fusion engine) mf query --metrics order_total,users_active --group-by metric_time # For open-source users (Core or Fusion source available) ``` **Result** ```bash ✔ Success 🦄 - query completed after 1.24 seconds | METRIC_TIME | ORDER_TOTAL | |:--------------|---------------:| | 2017-06-16 | 792.17 | | 2017-06-17 | 458.35 | | 2017-06-18 | 490.69 | | 2017-06-19 | 749.09 | | 2017-06-20 | 712.51 | | 2017-06-21 | 541.65 | ``` ##### Query dimensions[​](#query-dimensions "Direct link to Query dimensions") You can include multiple dimensions in a query. For example, you can group by the `is_food_order` dimension to confirm if orders were for food or not. Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`. 
**Query** ```bash dbt sl query --metrics order_total --group-by order_id__is_food_order # For dbt platform users (Core or Fusion engine) mf query --metrics order_total --group-by order_id__is_food_order # For open-source users (Core or Fusion source available) ``` **Result** ```bash Success 🦄 - query completed after 1.70 seconds | METRIC_TIME | IS_FOOD_ORDER | ORDER_TOTAL | |:--------------|:----------------|---------------:| | 2017-06-16 | True | 499.27 | | 2017-06-16 | False | 292.90 | | 2017-06-17 | True | 431.24 | | 2017-06-17 | False | 27.11 | | 2017-06-18 | True | 466.45 | | 2017-06-18 | False | 24.24 | | 2017-06-19 | False | 300.98 | | 2017-06-19 | True | 448.11 | ``` ##### Add order/limit[​](#add-orderlimit "Direct link to Add order/limit") You can add order and limit functions to filter and present the data in a readable format. The following query limits the data set to 10 records and orders them by `metric_time`, descending. Note that using the `-` prefix sorts the query in descending order; without the `-` prefix, the query sorts in ascending order. Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`. 
**Query** ```bash # For dbt platform users (Core or Fusion engine) dbt sl query --metrics order_total --group-by order_id__is_food_order --limit 10 --order-by -metric_time # For open-source users (Core or Fusion source available) mf query --metrics order_total --group-by order_id__is_food_order --limit 10 --order-by -metric_time ``` **Result** ```bash ✔ Success 🦄 - query completed after 1.41 seconds | METRIC_TIME | IS_FOOD_ORDER | ORDER_TOTAL | |:--------------|:----------------|---------------:| | 2017-08-31 | True | 459.90 | | 2017-08-31 | False | 327.08 | | 2017-08-30 | False | 348.90 | | 2017-08-30 | True | 448.18 | | 2017-08-29 | True | 479.94 | | 2017-08-29 | False | 333.65 | | 2017-08-28 | False | 334.73 | ``` ##### Add where clause[​](#add-where-clause "Direct link to Add where clause") You can further filter the data set by adding a `where` clause to your query. The following example shows you how to query the `order_total` metric, grouped by `is_food_order` with multiple `where` statements (orders that are food orders and orders from the week starting on or after Feb 1st, 2024). **Query** ```bash # For dbt platform users (Core or Fusion engine) dbt sl query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True" --where "{{ TimeDimension('metric_time', 'week') }} >= '2024-02-01'" # For open-source users (Core or Fusion source available) mf query --metrics order_total --group-by order_id__is_food_order --where "{{ Dimension('order_id__is_food_order') }} = True" --where "{{ TimeDimension('metric_time', 'week') }} >= '2024-02-01'" ``` Notes: * The type of dimension changes the syntax you use. So if you have a date field, use `TimeDimension` instead of `Dimension`. * When you query a dimension, you need to specify the primary entity for that dimension. In the example just shared, the primary entity is `order_id`. 
**Result** ```bash ✔ Success 🦄 - query completed after 1.06 seconds | METRIC_TIME | IS_FOOD_ORDER | ORDER_TOTAL | |:--------------|:----------------|---------------:| | 2017-08-31 | True | 459.90 | | 2017-08-30 | True | 448.18 | | 2017-08-29 | True | 479.94 | | 2017-08-28 | True | 513.48 | | 2017-08-27 | True | 568.92 | | 2017-08-26 | True | 471.95 | | 2017-08-25 | True | 452.93 | | 2017-08-24 | True | 384.40 | | 2017-08-23 | True | 423.61 | | 2017-08-22 | True | 401.91 | ``` ##### Filter by time[​](#filter-by-time "Direct link to Filter by time") To filter by time, there are dedicated start and end time options. Using these options to filter by time allows MetricFlow to further optimize query performance by pushing down the where filter when appropriate. Note that when you query a dimension, you need to specify the primary entity for that dimension. In the following example, the primary entity is `order_id`. **Query** ```bash # For open-source users (Core or Fusion source available) mf query --metrics order_total --group-by order_id__is_food_order --limit 10 --order-by -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' ``` **Result** ```bash ✔ Success 🦄 - query completed after 1.53 seconds | METRIC_TIME | IS_FOOD_ORDER | ORDER_TOTAL | |:--------------|:----------------|---------------:| | 2017-08-27 | True | 568.92 | | 2017-08-26 | True | 471.95 | | 2017-08-25 | True | 452.93 | | 2017-08-24 | True | 384.40 | | 2017-08-23 | True | 423.61 | | 2017-08-22 | True | 401.91 | ``` ##### Query saved queries[​](#query-saved-queries "Direct link to Query saved queries") You can use saved queries for queries you run frequently. Replace `<saved_query_name>` with the name of your [saved query](https://docs.getdbt.com/docs/build/saved-queries.md). 
**Query** ```bash dbt sl query --saved-query <saved_query_name> # For dbt platform users (Core or Fusion engine) mf query --saved-query <saved_query_name> # For open-source users (Core or Fusion source available) ``` For example, if you use dbt and have a saved query named `new_customer_orders`, you would run `dbt sl query --saved-query new_customer_orders`. A note on querying saved queries When querying [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md), you can use parameters such as `where`, `limit`, `order`, `compile`, and so on. However, keep in mind that you can't access `metric` or `group_by` parameters in this context. This is because they are predetermined and fixed parameters for saved queries, and you can't change them at query time. If you would like to query more metrics or dimensions, you can build the query using the standard format. #### Additional query examples[​](#additional-query-examples "Direct link to Additional query examples") The following tabs present additional query examples, like exporting to a CSV. Select the tab that best suits your needs: * --compile/--explain flag * Export to CSV Add `--compile` (or `--explain` for dbt Core users) to your query to view the SQL generated by MetricFlow. 
**Query** ```bash # For dbt platform users (Core or Fusion engine) dbt sl query --metrics order_total --group-by metric_time,is_food_order --limit 10 --order-by -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' --compile # For open-source users (Core or Fusion source available) mf query --metrics order_total --group-by metric_time,is_food_order --limit 10 --order-by -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' --explain ``` **Result** ```bash ✔ Success 🦄 - query completed after 0.28 seconds 🔎 SQL (remove --compile to see data or add --show-dataflow-plan to see the generated dataflow plan): select metric_time , is_food_order , sum(order_cost) as order_total from ( select cast(ordered_at as date) as metric_time , is_food_order , order_cost from analytics.js_dbt_sl_demo.orders orders_src_1 where cast(ordered_at as date) between cast('2017-08-22' as timestamp) and cast('2017-08-27' as timestamp) ) subq_3 where is_food_order = True group by metric_time , is_food_order order by metric_time desc limit 10 ``` Add the `--csv file_name.csv` flag to export the results of your query to a CSV. The `--csv` flag is available in dbt Core only and not supported in dbt platform. **Query** ```bash # For open-source users (Core or Fusion source available) mf query --metrics order_total --group-by metric_time,is_food_order --limit 10 --order-by -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' --csv query_example.csv ``` **Result** ```bash ✔ Success 🦄 - query completed after 0.83 seconds 🖨 Successfully written query output to query_example.csv ``` #### Time granularity[​](#time-granularity "Direct link to Time granularity") Optionally, you can specify the time granularity you want your data to be aggregated at by appending two underscores and the unit of granularity you want to `metric_time`, the global time dimension. 
You can group the granularity by: `day`, `week`, `month`, `quarter`, and `year`. Below is an example for querying metric data at a monthly grain: ```bash dbt sl query --metrics revenue --group-by metric_time__month # For dbt platform users (Core or Fusion engine) mf query --metrics revenue --group-by metric_time__month # For open-source users (Core or Fusion source available) ``` #### Export[​](#export "Direct link to Export") Run [exports for a specific saved query](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md#exports-for-single-saved-query). Use this command to test and generate exports in your development environment. You can also use the `--select` flag to specify particular exports from a saved query. Refer to [exports in development](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md#exports-in-development) for more info. Export is available in dbt platform. ```bash dbt sl export # For dbt platform users (Core or Fusion engine) ``` #### Export-all[​](#export-all "Direct link to Export-all") Run [exports for multiple saved queries](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md#exports-for-multiple-saved-queries) at once. This command provides a convenient way to manage and execute exports for several queries simultaneously, saving time and effort. Refer to [exports in development](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md#exports-in-development) for more info. Export is available in dbt platform. ```bash dbt sl export-all # For dbt platform users (Core or Fusion engine) ``` #### FAQs[​](#faqs "Direct link to FAQs")  How can I add a dimension filter to a where filter? To add a dimension filter to a where filter, you have to indicate that the filter item is part of your model and use a template wrapper: `{{Dimension('primary_entity__dimension_name')}}`. Here's an example query: `dbt sl query --metrics order_total --group-by metric_time --where "{{Dimension('order_id__is_food_order')}} = True"`. 
Before using the template wrapper, however, set up your terminal to escape curly braces for the filter template to work. How to set up your terminal to escape curly braces: To configure your `.zshrc` profile to escape curly braces, you can use the `setopt` command to enable the `BRACECCL` option. This option will cause the shell to treat curly braces as literals and prevent brace expansion. Refer to the following steps to set it up:
1. Open your terminal. 2. Open your `.zshrc` file using a text editor like `nano`, `vim`, or any other text editor you prefer. You can use the following command to open it with `nano`: ```bash nano ~/.zshrc ``` 3. Add the following line to the file: ```bash setopt BRACECCL ``` 4. Save and exit the text editor (in `nano`, press Ctrl + O to save, and Ctrl + X to exit). 5. Source your `.zshrc` file to apply the changes: ```bash source ~/.zshrc ``` 6. After making these changes, your Zsh shell will treat curly braces as literal characters and will not perform brace expansion. This means that you can use curly braces without worrying about unintended expansions. Keep in mind that modifying your shell configuration files can have an impact on how your shell behaves. If you're not familiar with shell configuration, it's a good idea to make a backup of your `.zshrc` file before making any changes. If you encounter any issues or unexpected behavior, you can revert to the backup.  Why is my query limited to 100 rows in the dbt CLI? The default `limit` for queries issued from the dbt CLI is 100 rows. We set this default to prevent returning unnecessarily large data sets, as the dbt CLI is typically used to query the dbt Semantic Layer during the development process, not for production reporting or to access large data sets. For most workflows, you only need to return a subset of the data. However, you can change this limit if needed by setting the `--limit` option in your query. For example, to return 1000 rows, you can run `dbt sl list metrics --limit 1000`.  How can I query multiple metrics, group bys, or where statements? 
To query multiple metrics, group bys, or where statements in your command, follow this guidance: * To query multiple metrics and group bys, use the `--metrics` or `--group-by` syntax followed by the metric or dimension/entity names, separated by commas without spaces: * Multiple metrics example: `dbt sl query --metrics accounts_active,users_active` * Multiple dimension/entity example: `dbt sl query --metrics accounts_active,users_active --group-by metric_time__week,accounts__plan_tier` * To query multiple where statements, use the `--where` syntax and wrap the statement in quotes: * Multiple where statement example: `dbt sl query --metrics accounts_active,users_active --group-by metric_time__week,accounts__plan_tier --where "metric_time__week >= '2024-02-01'" --where "accounts__plan_tier = 'coco'"`  How can I sort my query in ascending or descending order? When you query metrics, use `--order-by` to specify metrics or groupings to order by. The `order_by` option applies to metrics, dimensions, and group bys. Add the `-` prefix to sort your query in descending (DESC) order. Leave blank for ascending (ASC) order: * For example, to query a metric and sort `metric_time` in descending order, run `dbt sl query --metrics order_total --group-by metric_time --order-by -metric_time`. Note that the `-` prefix in `-metric_time` sorts the query in descending order. * To query a metric and sort `metric_time` in ascending order and `revenue` in descending order, run `dbt sl query --metrics order_total --order-by metric_time,-revenue`. Note that `metric_time` without a prefix is sorted in ascending order and `-revenue` with a `-` prefix sorts the query in descending order. 
---

### MetricFlow time spine

#### Custom calendar Preview[​](#custom-calendar- "Direct link to custom-calendar-")

tip

Check out our mini guide on [how to create a time spine table](https://docs.getdbt.com/guides/mf-time-spine.md) to get started!

#### Related docs[​](#related-docs "Direct link to Related docs")

* [MetricFlow time granularity](https://docs.getdbt.com/docs/build/dimensions.md?dimension=time_gran#time)
* [MetricFlow time spine mini guide](https://docs.getdbt.com/guides/mf-time-spine.md)

---

### Metrics as dimensions with metric filters

[Metrics](https://docs.getdbt.com/docs/build/metrics-overview.md) provide users with valuable insights into their data, like the number of active users and overall performance trends, to inform business decisions. [Dimensions](https://docs.getdbt.com/docs/build/dimensions.md), on the other hand, help categorize data through attributes, like user type or the number of orders placed by a customer. To make informed business decisions, some metrics need the value of another metric as part of the metric definition, leading us to "metrics as dimensions". This document explains how you can use metrics as dimensions with metric filters, enabling you to create more complex metrics and gain more insights.
#### Reference a metric in a filter[​](#reference-a-metric-in-a-filter "Direct link to Reference a metric in a filter")

Use the `Metric()` object syntax to reference a metric in the `where` filter for another metric. The function for referencing a metric accepts a metric name and exactly one entity:

```yaml
{{ Metric('metric_name', group_by=['entity_name']) }}
```

##### Usage example[​](#usage-example "Direct link to Usage example")

As an example, a software-as-a-service (SaaS) company wants to count activated accounts. In this case, the definition of an activated account is an account with more than five data model runs. To express this metric in SQL, the company will:

* Write a query to calculate the number of data model runs per account.
* Then count the number of accounts that have more than five data model runs.

models/model\_name.sql

```sql
with data_models_per_user as (
    select
        account_id as account,
        count(model_runs) as data_model_runs
    from {{ ref('fct_model_runs') }}
    group by account_id
),

activated_accounts as (
    select count(distinct account_id) as activated_accounts
    from {{ ref('dim_accounts') }}
    left join data_models_per_user
        on {{ ref('dim_accounts') }}.account_id = data_models_per_user.account
    where data_models_per_user.data_model_runs > 5
)

select * from activated_accounts
```

This SQL query calculates the number of `activated_accounts` by using the `data_model_runs` metric as a dimension for the user entity. It filters based on the metric value scoped to the account entity. You can express this logic at the query level or in the metric's YAML configuration.
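As a minimal sketch of what that YAML configuration might look like, using the legacy metric spec (`type_params`/`measure`) that appears elsewhere on this site — the semantic model layout and measure names here are assumptions based on the SQL example above, not a definitive configuration:

```yaml
semantic_models:
  - name: accounts
    model: ref('dim_accounts')
    entities:
      - name: account
        type: primary
        expr: account_id
    measures:
      - name: accounts
        agg: sum
        expr: 1 # mirrors the sum(1) in the generated SQL shown below
  - name: data_model_runs
    model: ref('fct_model_runs')
    entities:
      - name: account
        type: foreign
        expr: account_id
    measures:
      - name: data_model_runs
        agg: sum
        expr: 1

metrics:
  - name: data_model_runs
    type: simple
    type_params:
      measure: data_model_runs
  - name: activated_accounts
    type: simple
    type_params:
      measure: accounts
    # reference another metric in this metric's filter
    filter: |
      {{ Metric('data_model_runs', group_by=['account']) }} > 5
```

The key piece is the `filter` on `activated_accounts`, which uses the `Metric()` object from this page; the rest of the configuration exists only to make the sketch self-contained.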
###### YAML configuration[​](#yaml-configuration "Direct link to YAML configuration")

Using the same `activated_accounts` example mentioned in [the usage example](#usage-example), a company can create [semantic models](https://docs.getdbt.com/docs/build/semantic-models.md) and [metrics](https://docs.getdbt.com/docs/build/metrics-overview.md), and use the `Metric()` object to reference the `data_model_runs` metric in the `activated_accounts` metric filter.

Let's break down the SQL the system generates from the metric definition when you run `dbt sl query --metrics activated_accounts` from the command line interface:

* The filter `{{ Metric('data_model_runs', group_by=['account']) }}` generates SQL similar to the `data_models_per_user` sub-query shown earlier:

```sql
select
    sum(1) as data_model_runs,
    account
from data_model_runs
group by account
```

* MetricFlow joins this query to the query generated by `accounts` on the group-by elements and applies the filter conditions:

```sql
select sum(1) as activated_accounts
from accounts
left join (
    select
        sum(1) as data_model_runs,
        account
    from data_model_runs
    group by account
) as subq
    on accounts.account = subq.account
where data_model_runs > 5
```

The intermediate table used to create this metric is the accounts table with the `data_model_runs` dimension:

| account | data\_model\_runs |
| ------- | ----------------- |
| 1       | 4                 |
| 2       | 7                 |
| 3       | 9                 |
| 4       | 1                 |

MetricFlow then filters this table to accounts with more than five data model runs and counts the number of accounts that meet this criterion:

| activated\_accounts |
| ------------------- |
| 2                   |

###### Query filter[​](#query-filter "Direct link to Query filter")

You can also use metrics in filters at the query level.
Run this command in the command line interface (CLI) to generate the same SQL query referenced earlier:

`dbt sl query --metrics accounts --where "{{ Metric('data_model_runs', group_by=['account']) }} > 5"`

The resulting SQL and data will be the same, except with the `accounts` metric name instead of `activated_accounts`.

#### Considerations[​](#considerations "Direct link to Considerations")

* When using a metric filter, ensure the sub-query can join to the outer query without fanning out the result (unexpectedly increasing the number of rows).
  * The example that filters accounts using `{{ Metric('data_model_runs', group_by=['account']) }}` is valid because it aggregates the model runs to the account level.
  * However, filtering `accounts` by `{{ Metric('data_model_runs', group_by=['model']) }}` isn't valid due to the one-to-many relationship between accounts and model runs, which leads to duplicate data.
* You can only group a metric by one entity. Support for grouping by multiple entities and dimensions is pending.
* In the future, you can use metrics as dimensions for use cases like the following:
  * User segments: Segment users by using the number of orders placed by a user in the last 7 days as a dimension.
  * Churn prediction: Use the number of support tickets an account submitted in the first 30 days to predict potential churn.
  * Activation tracking: Define account or user activation based on the specific actions taken within a certain number of days after signing up.
* Support for metric filters requiring multi-hop joins is pending.
---

### Microsoft Excel

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

The Semantic Layer offers a seamless integration with Excel Online and Desktop through a custom menu. This add-on allows you to build Semantic Layer queries and return data on your metrics directly within Excel.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You have [configured the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) and are using dbt v1.6 or higher.
* You need a Microsoft Excel account with access to install add-ons.
* You have a [dbt Environment ID](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#set-up-dbt-semantic-layer).
* You have a [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) or a [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) to authenticate with from a dbt account.
* You must have a dbt Starter, Enterprise, or Enterprise+ [account](https://www.getdbt.com/pricing). Suitable for both multi-tenant and single-tenant deployments.

tip

📹 For on-demand video learning, explore the [Querying the Semantic Layer with Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel) course to learn how to query metrics with Excel.

#### Installing the add-on[​](#installing-the-add-on "Direct link to Installing the add-on")

The Semantic Layer Microsoft Excel integration is available to download directly on [Microsoft AppSource](https://appsource.microsoft.com/en-us/product/office/WA200007100?tab=Overview).
You can download this add-on for both [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100\&rs=en-US\&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) and [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100\&rs=en-US\&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26\&isWac=True).

1. In Excel, authenticate with your Host, dbt Environment ID, and service token.
   * Access your Environment ID, Host, and URLs in your Semantic Layer settings. Generate a service token in the Semantic Layer settings or **API tokens** settings. Alternatively, you can create a personal access token by going to **API tokens** > **Personal tokens**.

   [![Access your Environment ID, Host, and URLs in your dbt Semantic Layer settings. Generate a service token in the Semantic Layer settings or API tokens settings](/img/docs/dbt-cloud/semantic-layer/sl-and-gsheets.png?v=2 "Access your Environment ID, Host, and URLs in your dbt Semantic Layer settings. Generate a service token in the Semantic Layer settings or API tokens settings")](#)
2. Start querying your metrics using the **Query Builder**. For more info on the menu functions, refer to [Query Builder functions](#query-builder-functions). To cancel a query while it's running, press the **Cancel** button.

When querying your data with Microsoft Excel:

* It returns the data to the cell you clicked on.
* Results that take longer than one minute to load into Excel will fail. This limit only applies to the loading process, not the time it takes for the data platform to run the query.
* If you're using this extension, make sure you're signed into Microsoft with the same Excel profile you used to set up the add-in. Log in with one profile at a time, as using multiple profiles at once might cause issues.
* Note that only standard granularities are currently available; custom time granularities aren't supported for this integration.

#### Query Builder functions[​](#query-builder-functions "Direct link to Query Builder functions")

The Microsoft Excel **Query Builder** custom menu has the following capabilities:

| Menu items | Description |
| ---------- | ----------- |
| Metrics    | Search and select metrics. |
| Group By   | Search and select dimensions or entities to group by. Dimensions are grouped by the entity of the semantic model they come from. You may choose dimensions on their own without metrics. |
| Time Range | Quickly select time ranges to look at the data, which applies to the main time series for the metrics (metric time), or build a more advanced filter using the "Custom" selection. |
| Where      | Filter your data. This includes categorical and time filters. |
| Order By   | Order your returned data. |
| Limit      | Set a limit for the rows of your output. |

Note: Click the **info** button next to any metric or dimension to see its defined description from your dbt project.

###### Modifying time granularity[​](#modifying-time-granularity "Direct link to Modifying time granularity")

When you select time dimensions in the **Group By** menu, you'll see a list of available time granularities. The lowest granularity is selected by default. Metric time is the default time dimension for grouping your metrics.

info

Note: [Custom time granularities](https://docs.getdbt.com/docs/build/metricflow-time-spine.md#add-custom-granularities) (like fiscal year) aren't currently supported or accessible in this integration. Only [standard granularities](https://docs.getdbt.com/docs/build/dimensions.md?dimension=time_gran#time) (like day, week, month, and so on) are available.
If you'd like to access custom granularities, consider using the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md).

###### Filtering data[​](#filtering-data "Direct link to Filtering data")

To use the filter functionality, choose the [dimension](https://docs.getdbt.com/docs/build/dimensions.md) you want to filter by and select the operation you want to filter on.

* For categorical dimensions, you can type a value into the search box or select from a populated list.
* For entities, you must type the value you're looking for, as they aren't all pre-loaded given the large number of values.
* Continue adding additional filters as needed with AND and OR.
* For time dimensions, you can use the time range selector to filter on presets or custom options. The time range selector applies only to the primary time dimension (`metric_time`). For all other time dimensions, use the "Where" option to apply filters.

###### Other settings[​](#other-settings "Direct link to Other settings")

If you'd like to query only the data values without the headers, you can optionally select the **Exclude column names** box. To return your results and keep any previously selected data below them intact, clear the **Clear trailing rows** box. By default, all trailing rows are cleared if there's stale data.

[![Run a query in the Query Builder. Use the arrow next to the Query button to select additional settings.](/img/docs/dbt-cloud/semantic-layer/query-builder.png?v=2 "Run a query in the Query Builder. Use the arrow next to the Query button to select additional settings.")](#)
#### Using saved selections[​](#using-saved-selections "Direct link to Using saved selections")

Saved selections allow you to save the inputs you've created in the Microsoft Excel **Query Builder** and easily access them again, so you don't have to continuously rebuild common queries from scratch. To create a saved selection:

1. Run a query in the **Query Builder**.
2. Save the selection by selecting the arrow next to the **Query** button and then selecting **Query & Save Selection**.
3. The application saves these selections, allowing you to view and edit them from the hamburger menu under **Saved Selections**.

##### Refreshing selections[​](#refreshing-selections "Direct link to Refreshing selections")

Set your saved selections to automatically refresh every time you load the add-on by selecting **Refresh on Load** when creating the saved selection. When you access the add-on and have saved selections that should refresh, you'll see "Loading..." in the cells that are refreshing. Public saved selections will refresh for anyone who edits the sheet.

What's the difference between saved selections and saved queries?

* Saved selections are saved components that you can create only when using the application.
* Saved queries, explained in the next section, are code-defined sets of data you create in your dbt project that you can easily access and use for building selections. You can also use the results from a saved query to create a saved selection.

#### Using saved queries[​](#using-saved-queries "Direct link to Using saved queries")

Access [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md), powered by MetricFlow, in Microsoft Excel to quickly get results from pre-defined sets of data. To access the saved queries in Microsoft Excel:

1. Open the hamburger menu in Microsoft Excel.
2. Navigate to **Saved Queries** to access the ones available to you.
3.
You can also select **Build Selection**, which allows you to explore the existing query. This won't change the original query defined in the code.

* If you use a `WHERE` filter in a saved query, Microsoft Excel displays the advanced syntax for this filter.

#### FAQs[​](#faqs "Direct link to FAQs")

I'm receiving a \`Failed ALPN\` error when trying to connect to the dbt Semantic Layer.

If you're receiving a `Failed ALPN` error when trying to connect to the dbt Semantic Layer with the various [data integration tools](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) (such as Tableau, DBeaver, Datagrip, ADBC, or JDBC), it typically happens when connecting from a computer behind a corporate VPN or proxy (like Zscaler or Check Point). The root cause is typically the proxy interfering with the TLS handshake, as the Semantic Layer uses gRPC/HTTP2 for connectivity. To resolve this:

* If your proxy supports gRPC/HTTP2 but isn't configured to allow ALPN, adjust its settings to allow ALPN, or create an exception for the dbt domain.
* If your proxy does not support gRPC/HTTP2, add an SSL interception exception for the dbt domain in your proxy settings.

This should help in successfully establishing the connection without the `Failed ALPN` error.

---

### Migrate to the latest YAML spec

The latest Semantic Layer specification creates an open standard for defining metrics and dimensions that works across multiple platforms. It simplifies authorship by embedding semantic annotations alongside each model, replacing measures with simple metrics, and promoting frequently used options to top-level keys.
With the new spec, you get simpler configuration without losing flexibility, faster onboarding for new contributors, and a clearer path to consistent, governed metrics across your organization.

Availability

The new YAML spec is currently available in the dbt Fusion engine and the dbt platform **Latest** release track, and is coming soon to dbt Core v1.12. For more information about availability, reach out to your account manager or post in the [#dbt-semantic-layer](https://getdbt.slack.com/archives/C046L0VTVR6) channel in the [dbt Community Slack](https://www.getdbt.com/community/join-the-community/).

#### Changes in the latest spec[​](#changes-in-the-latest-spec "Direct link to Changes in the latest spec")

This section highlights the key updates in the latest metrics spec and compares them to the legacy spec.

* [Semantic models](#semantic-models): These define the business logic for your metrics by specifying entities, dimensions, and how they relate to your data models. In the new spec, `semantic_model` is nested directly under each model in `models:` instead of being a top-level key.
* [Entities and dimensions](#entities-and-dimensions): Entities are the people, places, or things you want to group or join your metrics by (like `user_id` or `order_id`), while dimensions are the attributes you use to filter or slice your data (like `status` or `region`). In the new spec, both are defined directly under `columns:`.
* [Time dimension](#time-dimension): Time dimensions are the date or timestamp columns that let you analyze metrics over time (like `order_date` or `created_at`). In the new spec, set `agg_time_dimension` at the model level as the default time dimension for all metrics, with the option to override per metric. Define `granularity` at the column level instead of using the deprecated `time_granularity`.
* [Simple metrics](#simple-metrics): Metrics that directly reference a single column expression within a semantic model, without any additional columns involved.
Simple metrics replace measures in the new spec. Use `type: simple` metrics defined directly within the model to replace measures.
* [Advanced metrics](#advanced-metrics): These are metrics that combine or build upon other metrics, such as ratios, conversions, or derived calculations. In the new spec, define simple metrics inside the model, and create cross‑model metrics under a top‑level `metrics` block. The top-level key is required for any metric that depends on metrics or dimensions defined in a different semantic model.
* [`type_params`](#type_params): This is a wrapper key in the legacy spec that contains metric-specific configurations (for example, `expr` and `join_to_timespine`). `type_params` is deprecated in the new spec, and these parameters are promoted to top-level keys within each metric definition.

##### Semantic models[​](#semantic-models "Direct link to Semantic models")

The `semantic_model` key is embedded under `models`.

###### New spec

```yml
models:
  - name: fct_orders
    semantic_model:
      enabled: true # required
      name: fct_orders_semantic_model # optional override; defaults to value of model.name
```

###### Legacy spec

```yml
semantic_models:
  - name: orders
    model: ref('orders')
```

##### Entities and dimensions[​](#entities-and-dimensions "Direct link to Entities and dimensions")

Entities and dimensions are defined directly under columns, creating a 1:1 relationship between the physical columns and their semantic definitions.
###### New spec

```yml
models:
  - name: orders
    semantic_model:
      enabled: true
      agg_time_dimension: ordered_at
    columns:
      # entities
      - name: order_id
        entity:
          type: primary
          name: order
      - name: customer_id
        entity:
          type: foreign
          name: customer
      # time dimension
      - name: ordered_at
        granularity: day
        dimension:
          type: time
      # categorical dimension
      - name: order_status
        dimension:
          type: categorical
```

###### Legacy spec

```yml
semantic_models:
  - name: orders
    model: ref('orders')
    entities:
      - name: order
        type: primary
        expr: order_id
      - name: customer
        type: foreign
        expr: customer_id
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
      - name: status
        type: categorical
        expr: order_status
```

##### Time dimension[​](#time-dimension "Direct link to Time dimension")

* `agg_time_dimension`: Set once at the model level as the default time dimension for all metrics in that semantic model. You can still override it per metric with `agg_time_dimension`.
* `time_granularity`: Deprecated in the new spec. Define the native grain on the time dimension column with `granularity` (for example, `hour` or `day`).
###### New spec

```yml
models:
  - name: subscriptions
    semantic_model:
      enabled: true
      # default aggregation time dimension for metrics in this model
      agg_time_dimension: activated_at
    columns:
      - name: activated_at
        granularity: day # native grain on the column
        dimension:
          type: time
      - name: created_at
        granularity: hour # another time column with a different native grain
        dimension:
          type: time
    metrics:
      - name: active_subscriptions
        type: simple
        agg: count
        expr: 1
        # inherits agg_time_dimension: activated_at
      - name: signups_by_created_day
        type: simple
        agg: count
        expr: 1
        agg_time_dimension: created_at # override to use created_at as the time dimension
```

###### Legacy spec

```yml
semantic_models:
  - name: subscriptions
    model: ref('subscriptions')
    defaults:
      agg_time_dimension: activated_at
    dimensions:
      - name: activated_at
        type: time
        type_params:
          time_granularity: day
      - name: created_at
        type: time
        type_params:
          time_granularity: hour
    measures:
      - name: active_subscriptions
        agg: count

metrics:
  - name: active_subscriptions
    type: simple
    type_params:
      measure: active_subscriptions
```

##### Simple metrics[​](#simple-metrics "Direct link to Simple metrics")

Measures are deprecated in the new spec and are replaced with simple metrics.
###### New spec

```yml
models:
  - name: customers
    semantic_model:
      enabled: true
      agg_time_dimension: first_ordered_at
    columns:
      - name: customer_id
        entity:
          name: customer
          type: primary
      - name: first_ordered_at
        granularity: day
        dimension:
          type: time
    metrics:
      - name: lifetime_spend_pretax
        type: simple # simple metric
        agg: sum
        expr: amount_pretax
```

###### Legacy spec

```yml
semantic_models:
  - name: customers
    model: ref('customers')
    entities:
      - name: customer
        type: primary
        expr: customer_id
    dimensions:
      - name: first_ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: lifetime_spend_pretax
        agg: sum

metrics:
  - name: lifetime_spend_pretax
    type: simple
    type_params:
      measure: lifetime_spend_pretax
```

##### Advanced metrics[​](#advanced-metrics "Direct link to Advanced metrics")

Define simple metrics inside the model, and create cross‑model metrics under a top‑level `metrics` block. The top-level key is required for any metric that depends on metrics or dimensions defined in a different semantic model.

###### New spec

```yml
# define simple metrics where the data lives
models:
  - name: orders
    ...
    semantic_model:
      enabled: true
    metrics:
      - name: orders
        type: simple
        agg: count
        expr: 1
  - name: website
    semantic_model:
      enabled: true
    metrics:
      - name: sessions
        type: simple
        agg: count
        expr: 1

# advanced metrics under top-level metrics key
metrics:
  - name: orders_per_session
    type: ratio
    numerator: orders
    denominator: sessions
```

###### Legacy spec

```yml
semantic_models:
  - name: orders
    model: ref('orders')
    measures:
      - name: orders
        agg: count
  - name: website
    model: ref('website')
    measures:
      - name: sessions
        agg: count

metrics:
  - name: orders_per_session
    type: ratio
    type_params:
      numerator: { measure: orders }
      denominator: { measure: sessions }
```

##### `type_params`[​](#type_params "Direct link to type_params")

The `type_params` key is deprecated.
The following are direct keys on the metric:

* `expr`
* `percentile`
* `percentile_type`
* `non_additive_dimension: { name, window_agg, group_by }`
* `join_to_timespine`
* `fill_nulls_with`

###### New spec

```yml
models:
  - name: payments
    semantic_model:
      enabled: true
    metrics:
      - name: revenue_p95
        type: simple
        agg: percentile
        expr: amount
        percentile: 95.0
        percentile_type: discrete
```

###### Legacy spec

```yml
metrics:
  - name: revenue_p95
    type: simple
    type_params:
      expr: amount
      percentile: 95.0
      percentile_type: discrete
```

For [derived metrics](https://docs.getdbt.com/docs/build/derived.md), `type_params.metrics` is renamed `input_metrics`.

###### New spec

```yaml
metrics:
  - name: d7_booking_change
    description: Difference between bookings now and 7 days ago
    type: derived
    label: d7 bookings change
    expr: current_bookings - bookings_7_days_ago
    input_metrics:
      - name: bookings
        alias: current_bookings
      - name: bookings
        offset_window: 7 days
        alias: bookings_7_days_ago
```

###### Legacy spec

```yaml
metrics:
  - name: d7_booking_change
    description: Difference between bookings now and 7 days ago
    type: derived
    label: d7 bookings change
    type_params:
      expr: bookings - bookings_7_days_ago
      metrics:
        - name: bookings
          alias: current_bookings
        - name: bookings
          offset_window: 7 days
          alias: bookings_7_days_ago
```

For [ratio metrics](https://docs.getdbt.com/docs/build/ratio.md), `numerator` and `denominator` are now direct keys on the metric.

###### New spec

```yaml
metrics:
  - name: conversion_rate
    type: ratio
    numerator: conversions
    denominator: sessions
```

###### Legacy spec

```yaml
metrics:
  - name: conversion_rate
    type: ratio
    type_params:
      numerator: conversions
      denominator: sessions
```

For [cumulative metrics](https://docs.getdbt.com/docs/build/cumulative.md):

* `type_params.measure` is renamed `input_metric` and must reference a metric.
* `type_params.cumulative_type_params` values are direct keys on the metric: `window`, `grain_to_date`, and `period_agg`.
###### New spec

```yaml
metrics:
  - name: revenue_mtd_cumulative
    type: cumulative
    input_metric: revenue_daily
    window: 30d
    grain_to_date: month
    period_agg: sum
```

###### Legacy spec

```yaml
metrics:
  - name: revenue_mtd_cumulative
    type: cumulative
    type_params:
      measure: revenue_daily
      cumulative_type_params:
        window: 30d
        grain_to_date: month
        period_agg: sum
```

For [conversion metrics](https://docs.getdbt.com/docs/build/conversion.md), the following `type_params.conversion_type_params` values are direct keys on the metric:

* `entity`
* `calculation`
* `base_metric` (previously `base_measure`)
* `conversion_metric` (previously `conversion_measure`)
* `constant_properties`

###### New spec

```yaml
metrics:
  - name: paid_signup_conversion
    type: conversion
    entity: user_id
    calculation: conversion_rate
    base_metric: signups
    conversion_metric: paid_signups
    constant_properties:
      - base_property: plan
        conversion_property: plan
```

###### Legacy spec

```yaml
metrics:
  - name: paid_signup_conversion
    type: conversion
    type_params:
      conversion_type_params:
        entity: user_id
        calculation: conversion_rate
        base_measure: signups
        conversion_measure: paid_signups
        constant_properties:
          - base_property: plan
            conversion_property: plan
```

#### Migrating to the latest spec[​](#migrating-to-the-latest-spec "Direct link to Migrating to the latest spec")

Migrate your legacy metrics to the latest YAML spec using the dbt-autofix tool in your CLI, the [dbt VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md), or dbt platform's Studio IDE.

note

[dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) doesn't yet support generating semantic models with the latest YAML spec.

##### Package compatibility[​](#package-compatibility "Direct link to Package compatibility")

If your project uses dbt packages (listed in `packages.yml`) that define metrics or semantic models, the package maintainer must update those packages to use the latest YAML spec.
The [dbt-autofix tool](https://github.com/dbt-labs/dbt-autofix) only updates files in your current dbt project (like models, marts, and so on) and does not update installed packages under `dbt_packages/`. If an installed package still uses the legacy metrics spec, dbt may raise parsing or validation errors after migration. To update packages, a package maintainer should:

1. Run `dbt-autofix deprecations --semantic-layer` in the package repository.
2. Validate the changes by running:
   * For Fusion and dbt users in the dbt platform CLI, or locally with a valid `dbt_cloud.yml`:
     ```bash
     dbt parse
     dbt sl validate
     ```
     When using `dbt sl validate` locally, the command validates your local semantic manifest, not the platform's manifest. This means your uncommitted local changes are included in the validation.
   * For Fusion CLI users not connected to the dbt platform and using local MetricFlow:
     ```bash
     dbt parse
     mf validate-configs
     ```
3. Release a new version of the package with the updated metrics definitions.

After a compatible version is released, update your project to [install the new package version](https://docs.getdbt.com/docs/build/packages.md). You can then migrate your metrics to the latest spec with the following steps, depending on which tool you're using:

* [Using the CLI or VS Code extension](#using-the-cli-or-vs-code-extension)
* [Using the Studio IDE](#using-the-studio-ide)

##### Using the CLI or VS Code extension[​](#using-the-cli-or-vs-code-extension "Direct link to Using the CLI or VS Code extension")

The [dbt-autofix tool](https://github.com/dbt-labs/dbt-autofix) rewrites legacy metrics YAML into the latest format and produces a clear, reviewable diff in version control. Make sure you have installed the latest version of the autofix tool before migrating to the new spec using the CLI or the dbt VS Code extension.

1. In your CLI or in the VS Code extension, run the following command:
   ```bash
   dbt-autofix deprecations --semantic-layer
   ```
2.
Review the diff and resolve all flagged items.
3. Run parsing and validations:

   ```bash
   dbt parse
   mf validate-configs
   ```

##### Using the Studio IDE[​](#using-the-studio-ide "Direct link to Using the Studio IDE")

Convert your metrics in the Studio IDE in the dbt platform without having to install the `dbt-autofix` tool.

1. Navigate to the Studio IDE by clicking **Studio** in the left menu.
2. Make sure to save and commit your work before proceeding. The autofix command may overwrite any unsaved changes.
3. In the Studio IDE, run the following command:

   ```bash
   dbt-autofix deprecations --semantic-layer
   ```

4. Click **Commit and sync** in the top left of the Studio IDE to commit these changes to the project repository.

---

### Model access

"Model access" is not "User access"

**Model groups and access** and **user groups and access** mean two different things. "User groups and access" is a specific term used in dbt to manage permissions. Refer to [User access](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) for more info.

The two concepts will be closely related, as we develop multi-project collaboration workflows this year:

* Users with access to develop in a dbt project can view and modify **all** models in that project, including private models.
* Users in the same dbt account *without* access to develop in a project cannot view that project's private models, and they can take a dependency on its public models only.
#### Related documentation[​](#related-documentation "Direct link to Related documentation")

* [`groups`](https://docs.getdbt.com/docs/build/groups.md)
* [`access`](https://docs.getdbt.com/reference/resource-configs/access.md)

#### Groups[​](#groups "Direct link to Groups")

Models can be grouped under a common designation with a shared owner. For example, you could group together all models owned by a particular team, or related to modeling a specific data source (`github`).

Why define model `groups`? There are two reasons:

* It turns implicit relationships into an explicit grouping, with a defined owner. By thinking about the interface boundaries *between* groups, you can have a cleaner (less entangled) DAG. In the future, those interface boundaries could be appropriate as the interfaces between separate projects.
* It enables you to designate certain models as having "private" access—for use exclusively within that group. Other models will be restricted from referencing (taking a dependency on) those models. In the future, they won't be visible to other teams taking a dependency on your project—only "public" models will be.

If you follow our [best practices for structuring a dbt project](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md), you're probably already using subdirectories to organize your dbt project. It's easy to apply a `group` label to an entire subdirectory at once:

dbt\_project.yml

```yml
models:
  my_project_name:
    marts:
      customers:
        +group: customer_success
      finance:
        +group: finance
```

Each model can only belong to one `group`, and groups cannot be nested. If you set a different `group` in that model's YAML or in-file config, it will override the `group` applied at the project level.
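That precedence (a model-level `group` config wins over a project-level `+group` applied to its subdirectory) can be sketched roughly like this — a hypothetical illustration, not dbt's actual config resolver:

```python
# Hypothetical sketch of group-config precedence, not dbt's resolver:
# an in-file or YAML model-level group overrides the project-level
# +group applied to the model's subdirectory.

def effective_group(model_path, project_groups, model_group=None):
    """project_groups maps a directory prefix to a +group value."""
    if model_group is not None:
        return model_group  # model-level config wins
    for prefix, group in project_groups.items():
        if model_path.startswith(prefix):
            return group  # inherited from dbt_project.yml
    return None  # no group assigned

project_groups = {
    "marts/customers/": "customer_success",
    "marts/finance/": "finance",
}

print(effective_group("marts/finance/fct_payments.sql", project_groups))  # finance
print(effective_group("marts/finance/fct_payments.sql", project_groups,
                      model_group="customer_success"))  # override wins
```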
###### Considerations[​](#considerations "Direct link to Considerations")

There are some considerations to keep in mind when using model governance features:

* Model governance features like model access, contracts, and versions strengthen trust and stability in your dbt project. Because they add structure, they can make rollbacks harder (for example, removing model access) and increase maintenance if adopted too early. Before adding governance features, consider whether your dbt project is ready to benefit from them. Introducing governance while models are still changing can complicate future changes.
* Governance features are model-specific. They don't apply to other resource types, including snapshots, seeds, or sources. This is because these objects can change structure over time (for example, snapshots capture evolving historical data) and aren't suited to guarantees like contracts, access, or versioning.

#### Access modifiers[​](#access-modifiers "Direct link to Access modifiers")

Some models are implementation details, meant for reference only within their group of related models. Other models should be accessible through the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function across groups and projects. Models can set an [access modifier](https://en.wikipedia.org/wiki/Access_modifiers) to indicate their intended level of accessibility.

| Access    | Referenceable by                                                                         |
| --------- | ---------------------------------------------------------------------------------------- |
| private   | Same group                                                                               |
| protected | Same project (or installed as a package)                                                 |
| public    | Any group, package, or project. When defined, rerun a production job to apply the change |

If you try to reference a model outside of its supported access, you will see an error:

```shell
dbt run -s marketing_model
...
dbt.exceptions.DbtReferenceError: Parsing Error
  Node model.jaffle_shop.marketing_model attempted to reference node model.jaffle_shop.finance_model,
  which is not allowed because the referenced node is private to the finance group.
```

By default, all models are `protected`. This means that other models in the same project can reference them, regardless of their group. This is largely for backward compatibility when assigning groups to an existing set of models, as there may already be existing references across group assignments. However, it is recommended to set the access modifier of a new model to `private` to prevent other project resources from taking dependencies on models not intentionally designed for sharing across groups.

models/marts/customers.yml

```yaml
# First, define the group and owner
groups:
  - name: customer_success
    owner:
      name: Customer Success Team
      email: cx@jaffle.shop

# Then, add 'group' + 'access' modifier to specific models
models:
  # This is a public model -- it's a stable & mature interface for other teams/projects
  - name: dim_customers
    config:
      group: customer_success # changed to config in v1.10
      access: public # changed to config in v1.10

  # This is a private model -- it's an intermediate transformation intended for use in this context *only*
  - name: int_customer_history_rollup
    config:
      group: customer_success # changed to config in v1.10
      access: private # changed to config in v1.10

  # This is a protected model -- it might be useful elsewhere in *this* project,
  # but it shouldn't be exposed elsewhere
  - name: stg_customer__survey_results
    config:
      group: customer_success # changed to config in v1.10
      access: protected # changed to config in v1.10
```

Models with `materialized` set to `ephemeral` cannot have the access property set to public.
For example, if you have a model config set as:

models/my\_model.sql

```sql
{{ config(materialized='ephemeral') }}
```

And the model access is defined:

models/my\_project.yml

```yaml
models:
  - name: my_model
    config:
      access: public # changed to config in v1.10
```

It will lead to the following error:

```text
❯ dbt parse
02:19:30  Encountered an error:
Parsing Error
  Node model.jaffle_shop.my_model with 'ephemeral' materialization has an invalid value (public) for the access field
```

#### FAQs[​](#faqs "Direct link to FAQs")

##### How does model access relate to database permissions?[​](#how-does-model-access-relate-to-database-permissions "Direct link to How does model access relate to database permissions?")

These are different! Specifying `access: public` on a model does not trigger dbt to automagically grant `select` on that model to every user or role in your data platform when you materialize it. You have complete control over managing database permissions on every model/schema, as makes sense to you & your organization.

Of course, dbt can facilitate this by means of [the `grants` config](https://docs.getdbt.com/reference/resource-configs/grants.md), and other flexible mechanisms. For example:

* Grant access to downstream queriers on public models
* Restrict access to private models, by revoking default/future grants, or by landing them in a different schema

As we continue to develop multi-project collaboration, `access: public` will mean that other teams are allowed to start taking a dependency on that model. This assumes that they've requested, and you've granted them access, to select from the underlying dataset.

##### How do I ref a model from another project?[​](#how-do-i-ref-a-model-from-another-project "Direct link to How do I ref a model from another project?")

You can `ref` a model from another project in two ways:

1.
[Project dependency](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md): In dbt Enterprise, you can use project dependencies to `ref` a model. dbt uses a behind-the-scenes metadata service to resolve the reference, enabling efficient collaboration across teams and at scale.
2. ["Package" dependency](https://docs.getdbt.com/docs/build/packages.md): Another way to `ref` a model from another project is to treat the other project as a package dependency. This requires installing the other project as a package, including its full source code, as well as its upstream dependencies.

##### How do I restrict access to models defined in a package?[​](#how-do-i-restrict-access-to-models-defined-in-a-package "Direct link to How do I restrict access to models defined in a package?")

Source code installed from a package becomes part of your runtime environment. You can call macros and run models as if they were macros and models that you had defined in your own project. For this reason, model access restrictions are "off" by default for models defined in packages. You can reference models from that package regardless of their `access` modifier.

A project that is installed as a package can optionally restrict external `ref` access to just its public models. The package maintainer does this by setting a `restrict-access` config to `True` in `dbt_project.yml`.

By default, the value of this config is `False`. This means that:

* Models in the package with `access: protected` may be referenced by models in the root project, as if they were defined in the same project
* Models in the package with `access: private` may be referenced by models in the root project, so long as they also have the same `group` config

When `restrict-access: True`:

* Any `ref` from outside the package to a protected or private model in that package will fail.
* Only models with `access: public` can be referenced outside the package.
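Putting these rules together (access modifier, group, project boundary, and the package's `restrict-access` setting), the decision for a given `ref` can be sketched roughly as follows. This is a simplified illustration of the rules described above, not dbt's actual resolution logic, and it ignores concerns like model versions:

```python
# Rough sketch of the ref-permission rules described above.
# A simplification, not dbt's resolver.

def can_ref(target, referrer, restrict_access=False):
    """target/referrer are dicts with 'project', 'group', and 'access' keys."""
    same_project = target["project"] == referrer["project"]

    if target["access"] == "public":
        return True  # referenceable by any group, package, or project
    if same_project:
        # within one project: protected is open; private needs the same group
        if target["access"] == "protected":
            return True
        return target["group"] == referrer["group"]
    # cross-project: target lives in an installed package
    if restrict_access:
        return False  # only public models cross the package boundary
    if target["access"] == "protected":
        return True  # treated as if defined in the root project
    return target["group"] == referrer["group"]  # private: matching group config

finance_model = {"project": "jaffle_shop", "group": "finance", "access": "private"}
marketing_model = {"project": "jaffle_shop", "group": "marketing", "access": "protected"}
print(can_ref(finance_model, marketing_model))  # False: private, different group
```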
dbt\_project.yml

```yml
restrict-access: True # default is False
```

---

### Model contracts

#### Related documentation[​](#related-documentation "Direct link to Related documentation")

* [`contract`](https://docs.getdbt.com/reference/resource-configs/contract.md)
* [`columns`](https://docs.getdbt.com/reference/resource-properties/columns.md)
* [`constraints`](https://docs.getdbt.com/reference/resource-properties/constraints.md)

#### Why define a contract?[​](#why-define-a-contract "Direct link to Why define a contract?")

Defining a dbt model is as easy as writing a SQL `select` statement. Your query naturally produces a dataset with columns of names and types based on the columns you select and the transformations you apply. While this is ideal for quick and iterative development, for some models, constantly changing the shape of its returned dataset poses a risk when other people and processes are querying that model. It's better to define a set of upfront "guarantees" that define the shape of your model. We call this set of guarantees a "contract." While building your model, dbt will verify that your model's transformation will produce a dataset matching up with its contract, or it will fail to build.

###### Considerations[​](#considerations "Direct link to Considerations")

There are some considerations to keep in mind when using model governance features:

* Model governance features like model access, contracts, and versions strengthen trust and stability in your dbt project. Because they add structure, they can make rollbacks harder (for example, removing model access) and increase maintenance if adopted too early.
Before adding governance features, consider whether your dbt project is ready to benefit from them. Introducing governance while models are still changing can complicate future changes.
* Governance features are model-specific. They don't apply to other resource types, including snapshots, seeds, or sources. This is because these objects can change structure over time (for example, snapshots capture evolving historical data) and aren't suited to guarantees like contracts, access, or versioning.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

**These places support model contracts:**

* `dbt_project.yml` file
* `properties.yml` file
* SQL models
* Models materialized as one of the following:
  * `table`
  * `view` — views offer limited support for column names and data types, but not `constraints`
  * `incremental` — with `on_schema_change: append_new_columns` or `on_schema_change: fail`
* Certain data platforms, but the supported and [enforced `constraints`](https://docs.getdbt.com/reference/resource-properties/constraints.md) vary by platform

**These places do *NOT* support model contracts:**

* Python models
* SQL models materialized as `materialized_view` or `ephemeral`
* Custom materializations (unless added by the author)
* Models with recursive CTEs in BigQuery
* Other resource types, such as `sources`, `seeds`, `snapshots`, and so on

#### Define a contract[​](#define-a-contract "Direct link to Define a contract")

Let's say you have a model with a query like:

models/marts/dim\_customers.sql

```sql
-- lots of SQL

final as (

    select
        customer_id,
        customer_name,
        -- ... many more ...
    from ...

)

select * from final
```

To enforce a model's contract, set `enforced: true` under the `contract` configuration. When enforced, your contract *must* include every column's `name` and `data_type` (where `data_type` matches one that your data platform understands).
If your model is materialized as `table` or `incremental`, and depending on your data platform, you may optionally specify additional [constraints](https://docs.getdbt.com/reference/resource-properties/constraints.md), such as `not_null` (containing zero null values).

models/marts/customers.yml

```yaml
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: not_null
      - name: customer_name
        data_type: string
      ...
```

When building a model with a defined contract, dbt will do two things differently:

1. dbt will run a "preflight" check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. This check is agnostic to the order of columns specified in your model (SQL) or YAML spec.
2. dbt will include the column names, data types, and constraints in the DDL statements it submits to the data platform, which will be enforced while building or updating the model's table, and order the columns per the contract instead of your dbt model.

#### Platform constraint support[​](#platform-constraint-support "Direct link to Platform constraint support")

Select the adapter-specific tab for more information on [constraint](https://docs.getdbt.com/reference/resource-properties/constraints.md) support across platforms.

Constraints fall into three categories based on definability and platform enforcement:

* **Definable and enforced** — The model won't build if it violates the constraint.
* **Definable and not enforced** — The platform supports specifying the type of constraint, but a model can still build even if building the model violates the constraint. This constraint exists for metadata purposes only. This approach is more typical in cloud data warehouses than in transactional databases, where strict rule enforcement is more common.
* **Not definable and not enforced** — You can't specify the type of constraint for the platform.
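The "preflight" check described above boils down to an order-agnostic comparison of the contracted column names and data types against what the model's query would return. A rough sketch of that comparison (a hypothetical illustration, not dbt's implementation):

```python
# Rough sketch of a contract "preflight" check: compare the columns a query
# returns against the contracted names and data types, ignoring column order.
# Hypothetical illustration, not dbt's implementation.

def preflight(contract_columns, query_columns):
    """Both args map column name -> data type. Returns a list of problems."""
    problems = []
    for name, dtype in contract_columns.items():
        if name not in query_columns:
            problems.append(f"missing column: {name}")
        elif query_columns[name] != dtype:
            problems.append(f"wrong type for {name}: {query_columns[name]} != {dtype}")
    for name in query_columns:
        if name not in contract_columns:
            problems.append(f"unexpected column: {name}")
    return problems  # empty list means the contract is satisfied

contract = {"customer_id": "int", "customer_name": "string"}

# Column order doesn't matter, only names and types:
print(preflight(contract, {"customer_name": "string", "customer_id": "int"}))  # []
print(preflight(contract, {"customer_id": "text"}))
```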
###### Redshift

| Constraint type | Definable | Enforced |
| --------------- | --------- | -------- |
| not\_null       | ✅        | ✅       |
| primary\_key    | ✅        | ❌       |
| foreign\_key    | ✅        | ❌       |
| unique          | ✅        | ❌       |
| check           | ❌        | ❌       |

###### Snowflake

| Constraint type | Definable | Enforced |
| --------------- | --------- | -------- |
| not\_null       | ✅        | ✅       |
| primary\_key    | ✅        | ❌       |
| foreign\_key    | ✅        | ❌       |
| unique          | ✅        | ❌       |
| check           | ❌        | ❌       |

###### BigQuery

| Constraint type | Definable | Enforced |
| --------------- | --------- | -------- |
| not\_null       | ✅        | ✅       |
| primary\_key    | ✅        | ❌       |
| foreign\_key    | ✅        | ❌       |
| unique          | ❌        | ❌       |
| check           | ❌        | ❌       |

###### Postgres

| Constraint type | Definable | Enforced |
| --------------- | --------- | -------- |
| not\_null       | ✅        | ✅       |
| primary\_key    | ✅        | ✅       |
| foreign\_key    | ✅        | ✅       |
| unique          | ✅        | ✅       |
| check           | ✅        | ✅       |

###### Spark

Currently, `not_null` and `check` constraints are enforced only after a model is built. Because of this platform limitation, dbt considers these constraints definable but not enforced, which means they're not part of the *model contract* since they can't be enforced at build time. This table will change as the features evolve.

| Constraint type | Definable | Enforced |
| --------------- | --------- | -------- |
| not\_null       | ✅        | ❌       |
| primary\_key    | ✅        | ❌       |
| foreign\_key    | ✅        | ❌       |
| unique          | ✅        | ❌       |
| check           | ✅        | ❌       |

###### Databricks

Currently, `not_null` and `check` constraints are enforced only after a model is built. Because of this platform limitation, dbt considers these constraints definable but not enforced, which means they're not part of the *model contract* since they can't be enforced at build time. This table will change as the features evolve.

| Constraint type | Definable | Enforced |
| --------------- | --------- | -------- |
| not\_null       | ✅        | ✅       |
| primary\_key    | ✅        | ❌       |
| foreign\_key    | ✅        | ❌       |
| unique          | ❌        | ❌       |
| check           | ✅        | ✅       |

###### Athena

| Constraint type | Definable | Enforced |
| --------------- | --------- | -------- |
| not\_null       | ❌        | ❌       |
| primary\_key    | ❌        | ❌       |
| foreign\_key    | ❌        | ❌       |
| unique          | ❌        | ❌       |
| check           | ❌        | ❌       |

#### FAQs[​](#faqs "Direct link to FAQs")

##### Which models should have contracts?[​](#which-models-should-have-contracts "Direct link to Which models should have contracts?")

Any model meeting the criteria described above *can* define a contract. We recommend defining contracts for ["public" models](https://docs.getdbt.com/docs/mesh/govern/model-access.md) that are being relied on downstream.

* Inside of dbt: Shared with other groups, other teams, and [other dbt projects](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md).
* Outside of dbt: Reports, dashboards, or other systems & processes that expect this model to have a predictable structure. You might reflect these downstream uses with [exposures](https://docs.getdbt.com/docs/build/exposures.md).

##### How are contracts different from tests?[​](#how-are-contracts-different-from-tests "Direct link to How are contracts different from tests?")

A model's contract defines the **shape** of the returned dataset. If the model's logic or input data doesn't conform to that shape, the model does not build.
[Data tests](https://docs.getdbt.com/docs/build/data-tests.md) are a more flexible mechanism for validating the content of your model *after* it's built. So long as you can write the query, you can run the data test. Data tests are more configurable, such as with [custom severity thresholds](https://docs.getdbt.com/reference/resource-configs/severity.md). They are easier to debug after finding failures because you can query the already-built model, or [store the failing records in the data warehouse](https://docs.getdbt.com/reference/resource-configs/store_failures.md).

In some cases, you can replace a data test with its equivalent constraint. This has the advantage of guaranteeing the validation at build time, and it probably requires less compute (cost) in your data platform. The prerequisites for replacing a data test with a constraint are:

* Making sure that your data platform can support and enforce the constraint that you need. Most platforms only enforce `not_null`.
* Materializing your model as `table` or `incremental` (**not** `view` or `ephemeral`).
* Defining a full contract for this model by specifying the `name` and `data_type` of each column.

**Why aren't tests part of the contract?** To draw a parallel with software APIs, the structure of the API response is the contract. Quality and reliability ("uptime") are also very important attributes of an API's quality, but they are not part of the contract per se. When the contract changes in a backwards-incompatible way, it is a breaking change that requires a bump in major version.

##### Do I need to define every column for a contract?[​](#do-i-need-to-define-every-column-for-a-contract "Direct link to Do I need to define every column for a contract?")

Yes. dbt contracts apply to *all* columns defined in a model, and they require declaring explicit expectations about *all* of those columns. The explicit declaration of a contract is not an accident — it's very much the intent of this feature.
At the same time, for models with many columns, we understand that this can mean a *lot* of YAML. See [dbt-core#11764](https://github.com/dbt-labs/dbt-core/issues/11764) for discussion of potential approaches to generate and update model contract definitions.

##### How are breaking changes handled?[​](#how-are-breaking-changes-handled "Direct link to How are breaking changes handled?")

When comparing to a previous project state, dbt will look for breaking changes that could impact downstream consumers. If breaking changes are detected, dbt will present a contract error. Breaking changes include:

* Removing an existing column
* Changing the `data_type` of an existing column
* Removing or modifying one of the `constraints` on an existing column (dbt v1.6 or higher)
* Removing a contracted model by deleting, renaming, or disabling it (dbt v1.9 or higher). Versioned models will raise an error; unversioned models will raise a warning.

More details are available in the [contract reference](https://docs.getdbt.com/reference/resource-configs/contract.md#incremental-models-and-on_schema_change).

---

### Model notifications

Set up dbt to notify model owners through email about issues in your deployment environments.

Configure dbt to send email notifications to model owners about issues in deployment [environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md#types-of-environments) as soon as they happen — while the job is still running.
Model owners can specify which statuses to receive notifications about:

* **Success** and **Fails** for models
* **Warning**, **Success**, and **Fails** for tests

With model-level notifications, model owners can be the first to know about issues before anyone else (like stakeholders).

To be timely and keep the number of notifications reasonable when multiple models or tests trigger them, dbt observes the following guidelines when notifying the owners:

* Send a notification to each unique owner/email during a job run about any models (with status of failure/success) or tests (with status of warning/failure/success). Each owner receives only one notification, the initial one.
* No notifications are sent about subsequent models or tests while a dbt job is still running.
* Each owner/user who subscribes to notifications for one or more statuses (like failure, success, warning) will receive only *one* email notification at the end of the job run.
* The email includes a consolidated list of all models or tests that match the statuses the user subscribed to, instead of sending separate emails for each status.

Create configuration YAML files in your project for dbt to send notifications about the status of your models and tests in your deployment environments.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* Your dbt administrator has [enabled the appropriate account setting](#enable-access-to-model-notifications) for you.
* Your deployment environment(s) must be on a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) instead of a legacy dbt Core version.

#### Configure groups[​](#configure-groups "Direct link to Configure groups")

Define your [groups](https://docs.getdbt.com/docs/build/groups.md) in any `.yml` file in your [models directory](https://docs.getdbt.com/reference/project-configs/model-paths.md).
Each group's owner can now specify one or multiple email addresses to receive model-level notifications. The `email` field supports a single email address as a string or a list of multiple email addresses.

The following example shows how to define groups in a `groups.yml` file.

models/groups.yml

```yml
groups:
  - name: finance
    owner:
      # Email is required to receive model-level notifications, additional properties are also allowed.
      name: "Finance team"
      email: finance@dbtlabs.com

  - name: marketing
    owner:
      name: "Marketing team"
      email: marketing@dbtlabs.com
    config:
      meta:
        slack: '#marketing-team'

  # Example of multiple emails supported
  - name: documentation team
    owner:
      name: "Docs team"
      email:
        - docs@dbtlabs.com
        - community@dbtlabs.com
        - product@dbtlabs.com
    config:
      meta:
        slack: '#docs-fox'
```

tip

The `owner` field supports `name` and `email`, which are required values. Additional arbitrary fields (such as `favorite_food`) are deprecated and will no longer be allowed in a future release. To store additional metadata (like Slack channels, team info, or custom attributes), use `config.meta` instead.

#### Attach groups to models[​](#attach-groups-to-models "Direct link to Attach groups to models")

Attach groups to models as you would any other config, in either the `dbt_project.yml` or `whatever.yml` files. For example:

models/marts.yml

```yml
models:
  - name: sales
    description: "Sales data model"
    config:
      group: finance
  - name: campaigns
    description: "Campaigns data model"
    config:
      group: marketing
```

By assigning groups in the `dbt_project.yml` file, you can capture all models in a subdirectory at once. In this example, model notifications related to staging models go to the data engineering group, `marts/sales` models to the finance team, and `marts/campaigns` models to the marketing team.

dbt\_project.yml

```yml
config-version: 2
name: "jaffle_shop"

[...]
models:
  jaffle_shop:
    staging:
      +group: data_engineering
    marts:
      sales:
        +group: finance
      campaigns:
        +group: marketing
```

Attaching a group to a model also encompasses its tests, so you will also receive notifications for a model's test failures.

#### Enable access to model notifications[​](#enable-access-to-model-notifications "Direct link to Enable access to model notifications")

Provide dbt account members the ability to configure and receive alerts about issues with models or tests that are encountered during job runs.

To use model-level notifications, your dbt account must have access to the feature. Ask your dbt administrator to enable this feature for account members by following these steps:

1. Navigate to **Notification settings** from your profile name in the sidebar (lower left-hand side).
2. From **Email notifications**, enable the setting **Enable group/owner notifications on models** under the **Model notifications** section. Then, specify which statuses to receive notifications about (Success, Warning, and/or Fails).
3. Click **Save**.

![Example of the setting Enable group/owner notifications on models](/img/docs/dbt-cloud/example-enable-model-notifications.png)
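The consolidation behavior described earlier (one email per owner at the end of the run, listing only the statuses that owner subscribed to) can be sketched as follows. This is a hypothetical illustration, not dbt's notification service:

```python
# Hypothetical sketch of end-of-run notification consolidation:
# one email per owner, containing only the results whose status that
# owner subscribed to. Not dbt's notification service.
from collections import defaultdict

def consolidate(results, subscriptions):
    """results: (node, owner_email, status) tuples from a job run;
    subscriptions: owner_email -> set of subscribed statuses.
    Returns one consolidated email body (a list of items) per owner."""
    emails = defaultdict(list)
    for node, owner, status in results:
        if status in subscriptions.get(owner, set()):
            emails[owner].append((node, status))
    return dict(emails)

results = [
    ("sales", "finance@dbtlabs.com", "fail"),
    ("campaigns", "marketing@dbtlabs.com", "success"),
    ("test_sales_not_null", "finance@dbtlabs.com", "warn"),
]
subs = {
    "finance@dbtlabs.com": {"fail", "warn"},
    "marketing@dbtlabs.com": {"fail"},
}
print(consolidate(results, subs))
# finance gets one email listing both items; marketing gets no email
```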
---

### Model performance

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Catalog provides metadata on dbt runs for in-depth model performance and quality analysis. This feature assists in reducing infrastructure costs and saving time for data teams by highlighting where to fine-tune projects and deployments — such as model refactoring or job configuration adjustments.

![Overview of Performance page navigation.](/img/docs/collaborate/dbt-explorer/explorer-model-performance.gif)

On-demand learning

If you enjoy video courses, check out our [dbt Catalog on-demand course](https://learn.getdbt.com/courses/dbt-catalog) and learn how to best explore your dbt project(s)!

#### The Performance overview page[​](#the-performance-overview-page "Direct link to The Performance overview page")

You can pinpoint areas for performance enhancement by using the Performance overview page. This page presents a comprehensive analysis across all project models and displays the longest-running models, those most frequently executed, and the ones with the highest failure rates during runs/tests. Data can be segmented by environment and job type which can offer insights into:

* Most executed models (total count).
* Models with the longest execution time (average duration).
* Models with the most failures, detailing run failures (percentage and count) and test failures (percentage and count).

Each data point links to individual models in Catalog.

![Example of Performance overview page](/img/docs/collaborate/dbt-explorer/example-performance-overview-page.png)

You can view historical metadata for up to the past three months.
Select the time horizon using the filter, which defaults to a two-week lookback.

[![Example of dropdown](/img/docs/collaborate/dbt-explorer/ex-2-week-default.png?v=2 "Example of dropdown")](#)Example of dropdown

#### The Model performance tab[​](#the-model-performance-tab "Direct link to The Model performance tab")

The **Model performance** section in Catalog displays historical trends to help you identify optimization opportunities and understand model resource consumption.

##### Key metrics[​](#key-metrics "Direct link to Key metrics")

The **Model performance** section displays the following metrics that summarize the overall cost and optimization impact for your project:

* **Total cost reduction**
* **Total % reduction**
* **Total query run time reduction**
* **Reused assets** (when state-aware orchestration is enabled)

##### Filters[​](#filters "Direct link to Filters")

Use the time period filter to customize the data you want to view, from the last week up to the last 3 months. For the **Cost insights**, **Usage**, and **Query run time** tabs, you can set the view granularity to **Daily**, **Weekly**, or **Monthly**.

##### Visualization tabs[​](#visualization-tabs "Direct link to Visualization tabs")

* **Cost insights**: Shows the estimated warehouse costs incurred by this model and cost reduction from state-aware orchestration.
* **Usage**: Shows the estimated warehouse usage consumed by this model over time. The **Usage** tab represents generic usage for your warehouse. The specific unit depends on your data warehouse:
  * Snowflake: Credits
  * BigQuery: Slot hours or bytes scanned (currently combined into one generic usage number)
  * Databricks: Databricks Units (DBUs)
* **Query run time**: Shows the estimated query execution time and the reduction in run duration from state-aware orchestration.
* **Build time**: Shows average execution time for the model and how it trends over the selected period.
* **Build count**: Tracks how many times the model was built or reused, including any failures or errors.
* **Test results**: Displays test execution outcomes and pass/fail rates for tests on this model.
* **Consumption queries**: Shows queries running against this model, helping you understand downstream usage patterns.

##### Table view[​](#table-view "Direct link to Table view")

For the **Cost insights**, **Usage**, and **Query run time** tabs, you can access the table view by clicking **Show table**, which provides detailed optimization data such as models reused, usage reduction, and cost reduction. When viewing the table, you can export the data as a CSV file using the **Download** button.

##### Chart interactions[​](#chart-interactions "Direct link to Chart interactions")

For the **Build time** and **Build count** tabs:

* Click on any data point in the charts to see a detailed table listing all job runs for that day.
* Each row in the table provides a direct link to the run details if you want to investigate further.

---

### Model query history

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Model query history helps data teams track model usage by analyzing query logs. Model query history allows you to:

* View the count of consumption queries for a model based on the data warehouse's query logs.
* Give data teams insight into where to focus their time and infrastructure spend, by showing which data products are actually used.
* Enable analysts to find the most popular models used by other people.

Model query history is powered by a single consumption query of the query log table in your data warehouse, aggregated on a daily basis.

**What is a consumption query?** A consumption query is a metric counting the queries that have used the model in a given time period. It filters down to `select` statements only, to gauge model consumption, and excludes dbt model build and test executions. For example, if `model_super_santi` was queried 10 times in the past week, it would count as having 10 consumption queries for that particular time period.

**Support for Snowflake (Enterprise tier or higher) and BigQuery:** Model query history for Snowflake users is **only available for Enterprise tier or higher**. The feature also supports BigQuery. Support for additional platforms is coming soon.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

To access the feature, you should meet the following requirements:

1. You have a dbt account on an [Enterprise-tier plan](https://www.getdbt.com/pricing/). Single-tenant accounts should contact their account representative for setup.
2. You have set up a [production](https://docs.getdbt.com/docs/deploy/deploy-environments.md#set-as-production-environment) deployment environment for each project you want to explore, with at least one successful job run.
3. You have [admin permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) in dbt to edit project settings or production environment settings.
4. You use Snowflake or BigQuery as your data warehouse and can enable [query history permissions](#snowflake-model-query-history) or work with an admin to do so. Support for additional data platforms is coming soon.
   * For Snowflake users: You **must** have a Snowflake Enterprise tier or higher subscription.
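For intuition only, here is a hedged sketch of how a daily consumption-query count could be derived from Snowflake's query log. The `ACCOUNT_USAGE` views and their columns come from Snowflake's documentation, but the filter logic and model name are illustrative — this is not dbt's actual implementation:

```sql
-- Illustrative only: daily count of SELECT statements that touched a model's table.
-- Assumes the model materializes as a table named DIM_CUSTOMERS; adjust as needed.
select
    date_trunc('day', q.start_time)  as query_date,
    count(distinct q.query_id)       as consumption_queries
from snowflake.account_usage.query_history q
join snowflake.account_usage.access_history a
    on q.query_id = a.query_id,
    lateral flatten(input => a.direct_objects_accessed) obj
where q.query_type = 'SELECT'  -- select statements only; excludes builds and tests
    and obj.value:"objectName"::string ilike '%.DIM_CUSTOMERS'
group by 1
order by 1
```

This mirrors the definition above: only `select` statements count, and results are aggregated per day.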
#### Enable query history in dbt[​](#enable-query-history-in-dbt "Direct link to Enable query history in dbt") To enable model query history in dbt, follow these steps: 1. Navigate to **Orchestration** and then **Environments**. 2. Select the environment marked **PROD** and click **Settings**. 3. Click **Edit** and scroll to the **Query History** section. 4. Click the **Test Permissions** button to validate the deployment credentials permissions are sufficient to support query history. 5. Click the **Enable query history** box to enable. 6. **Save** your settings. dbt automatically enables query history for brand new environments. If query history fails to retrieve data, dbt automatically disables it to prevent unintended warehouse costs. * If the failure is temporary (like a network timeout), dbt may retry. * If the problem keeps happening (for example, missing permissions), dbt turns off query history so customers don’t waste warehouse compute. To turn it back on, click **Test Permissions** in **Environment settings**. If the test succeeds, dbt re-enables the environment. [![Enable query history in your environment settings.](/img/docs/collaborate/dbt-explorer/enable-query-history.png?v=2 "Enable query history in your environment settings.")](#)Enable query history in your environment settings. #### Credential permissions[​](#credential-permissions "Direct link to Credential permissions") This section explains the permissions and steps you need to enable and view model query history in Catalog. The model query history feature uses the credentials in your production environment to gather metadata from your data warehouse’s query logs. This means you may need elevated permissions with the warehouse. Before making any changes to your data platform permissions, confirm the configured permissions in dbt: 1. Navigate to **Deploy** and then **Environments**. 2. Select the Environment marked **PROD** and click **Settings**. 3. 
Look at the information under **Deployment credentials**.

   * Note: Querying query history incurs warehouse costs (uses credits).

[![Confirm your deployment credentials in your environment settings page.](/img/docs/collaborate/dbt-explorer/model-query-credentials.jpg?v=2 "Confirm your deployment credentials in your environment settings page.")](#)Confirm your deployment credentials in your environment settings page.

4. Copy or cross-reference those credential permissions with the warehouse permissions and grant your user the right permissions.

###### Snowflake model query history[​](#snowflake-model-query-history "Direct link to Snowflake model query history")

Model query history makes use of metadata tables available to [Snowflake Enterprise tier](https://docs.snowflake.com/en/user-guide/intro-editions#enterprise-edition) accounts or higher, `QUERY_HISTORY` and `ACCESS_HISTORY`. The Snowflake user in the production environment must have the `GOVERNANCE_VIEWER` permission to view the data.

Before enabling model query history, your `ACCOUNTADMIN` must run the following grant statement in Snowflake to grant access (replace `<deployment_role>` with the role used by your production credentials):

```sql
GRANT DATABASE ROLE SNOWFLAKE.GOVERNANCE_VIEWER TO ROLE <deployment_role>;
```

Without this grant, model query history won't display any data. For more details, view the [Snowflake docs](https://docs.snowflake.com/en/sql-reference/account-usage#enabling-other-roles-to-use-schemas-in-the-snowflake-database).

###### BigQuery model query history[​](#bigquery-model-query-history "Direct link to BigQuery model query history")

The model query history uses metadata from the [`INFORMATION_SCHEMA.JOBS` view](https://docs.cloud.google.com/bigquery/docs/information-schema-jobs) in BigQuery.
To access the metadata, the production environment user must have the correct [IAM role](https://docs.cloud.google.com/bigquery/docs/access-control#bigquery.resourceViewer) or permission to access this data:

* If you use a BigQuery provided role, we recommend `roles/bigquery.resourceViewer`.
* If you use a custom role, ensure it includes the `bigquery.jobs.listAll` permission.

#### View query history in Explorer[​](#view-query-history-in-explorer "Direct link to View query history in Explorer")

To enhance your discovery, you can view your model query history in various locations within Catalog:

* [View from Performance charts](#view-from-performance-charts)
* [View from Project lineage](#view-from-project-lineage)
* [View from Model list](#view-from-model-list)

##### View from Performance charts[​](#view-from-performance-charts "Direct link to View from Performance charts")

1. Navigate to Catalog by clicking **Catalog** in the navigation.
2. In the main **Overview** page, click on **Performance** under the **Project details** section. Scroll down to view the **Most consumed models**.
3. Use the dropdown menu on the right to select the desired time period, with options available for up to the past 3 months.

[![View most consumed models on the 'Performance' page in dbt Catalog.](/img/docs/collaborate/dbt-explorer/most-consumed-models.jpg?v=2 "View most consumed models on the 'Performance' page in dbt Catalog.")](#)View most consumed models on the 'Performance' page in dbt Catalog.

4. Click on a model for more details and go to the **Performance** tab.
5. On the **Performance** tab, scroll down to the **Model performance** section.
6. Select the **Consumption queries** tab to view the consumption queries over a given time for that model.

[![View consumption queries over time for a given model.](/img/docs/collaborate/model-consumption-queries.jpg?v=2 "View consumption queries over time for a given model.")](#)View consumption queries over time for a given model.
##### View from Project lineage[​](#view-from-project-lineage "Direct link to View from Project lineage")

1. To view your model in your project lineage, go to the main **Overview page** and click on **Project lineage.**
2. In the lower left of your lineage, click on **Lenses** and select **Consumption queries**.

[![View model consumption queries in your lineage using the 'Lenses' feature.](/img/docs/collaborate/dbt-explorer/model-consumption-lenses.jpg?v=2 "View model consumption queries in your lineage using the 'Lenses' feature.")](#)View model consumption queries in your lineage using the 'Lenses' feature.

3. Your lineage should display a small red box above each model, indicating the consumption query number. The number for each model represents the model consumption over the last 30 days.

##### View from Model list[​](#view-from-model-list "Direct link to View from Model list")

1. To view a list of models, go to the main **Overview page**.
2. In the left navigation, go to the **Resources** tab and click on **Models** to view the models list.
3. You can view the consumption query count for the models and sort by most or least consumed. The consumption query number for each model represents the consumption over the last 30 days.

[![View models consumption in the 'Models' list page under the 'Consumption' column.](/img/docs/collaborate/dbt-explorer/model-consumption-list.jpg?v=2 "View models consumption in the 'Models' list page under the 'Consumption' column.")](#)View models consumption in the 'Models' list page under the 'Consumption' column.
---

### Model versions

**Model versions, dbt\_project.yml versions, and .yml versions**

The word "version" appears in multiple places across the docs site, with different meanings:

* [Model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md) — A dbt Mesh feature that enables better governance and data model management by allowing you to track changes and updates to models over time.
* [dbt\_project.yml version](https://docs.getdbt.com/reference/project-configs/version.md#dbt_projectyml-versions) (optional) — The `dbt_project.yml` version is unrelated to Mesh and refers to the compatibility of the dbt project with a specific version of dbt.
* [.yml property file version](https://docs.getdbt.com/reference/project-configs/version.md#yml-property-file-versions) (optional) — Version numbers within .yml property files inform how dbt parses those YAML files. Unrelated to Mesh.

Versioning APIs is a hard problem in software engineering. The root of the challenge is that the producers and consumers of an API have competing incentives:

* Producers of an API need the ability to modify its logic and structure. There is a real cost to maintaining legacy endpoints forever, but losing the trust of downstream users is far costlier.
* Consumers of an API need to trust in its stability: their queries will keep working, and won't break without warning. Although migrating to a newer API version incurs an expense, an unplanned migration is far costlier.

When sharing a final dbt model with other teams or systems, that model is operating like an API. When the producer of that model needs to make significant changes, how can they avoid breaking the queries of its users downstream?

Model versioning is a tool to tackle this problem, thoughtfully and head-on. The goal is not to make the problem go away entirely, nor to pretend it's easier or simpler than it is.
###### Considerations[​](#considerations "Direct link to Considerations") There are some considerations to keep in mind when using model governance features: * Model governance features like model access, contracts, and versions strengthen trust and stability in your dbt project. Because they add structure, they can make rollbacks harder (for example, removing model access) and increase maintenance if adopted too early. Before adding governance features, consider whether your dbt project is ready to benefit from them. Introducing governance while models are still changing can complicate future changes. * Governance features are model-specific. They don't apply to other resource types, including snapshots, seeds, or sources. This is because these objects can change structure over time (for example, snapshots capture evolving historical data) and aren't suited to guarantees like contracts, access, or versioning. #### Related documentation[​](#related-documentation "Direct link to Related documentation") * [`versions`](https://docs.getdbt.com/reference/resource-properties/versions.md) * [`latest_version`](https://docs.getdbt.com/reference/resource-properties/latest_version.md) * [`include` and `exclude`](https://docs.getdbt.com/reference/resource-properties/versions.md#include) * [`ref` with `version` argument](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#versioned-ref) #### Why version a model?[​](#why-version-a-model "Direct link to Why version a model?") If a model defines a ["contract"](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md) (a set of guarantees for its structure), it's also possible to change that model's structure in a way that breaks the previous set of guarantees. This could be as obvious as removing or renaming a column, or more subtle, like changing its data type or nullability. One approach is to force every model consumer to immediately handle the breaking change as soon as it's deployed to production. 
This is actually the appropriate answer at many smaller organizations, or while rapidly iterating on a not-yet-mature set of data models. But it doesn’t scale well beyond that. Instead, for mature models at larger organizations, powering queries inside & outside dbt, the model owner can use **model versions** to: * Test "prerelease" changes (in production, in downstream systems) * Bump the latest version, to be used as the canonical source of truth * Offer a migration window off the "old" version During that migration window, anywhere that model is being used downstream, it can continue to be referenced at a specific version. dbt Core 1.6 introduced first-class support for **deprecating models** by specifying a [`deprecation_date`](https://docs.getdbt.com/reference/resource-properties/deprecation_date.md). Taken together, model versions and deprecation offer a pathway for model producers to *sunset* old models, and consumers the time to *migrate* across breaking changes. It's a way of managing change across an organization: develop a new version, bump the latest, slate the old version for deprecation, update downstream references, and then remove the old version. There is a real trade-off that exists here—the cost to frequently migrate downstream code, and the cost (and clutter) of materializing multiple versions of a model in the data warehouse. Model versions do not make that problem go away, but by setting a deprecation date, and communicating a clear window for consumers to gracefully migrate off old versions, they put a known boundary on the cost of that migration. #### When should you version a model?[​](#when-should-you-version-a-model "Direct link to When should you version a model?") By enforcing a model's contract, dbt can help you catch unintended changes to column names and data types that could cause a big headache for downstream queriers. If you're making these changes intentionally, you should create a new model version. 
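The lifecycle described above — bump the latest version, slate the old one for deprecation — can be sketched in YAML using the documented `latest_version` and `deprecation_date` properties. The model name and date here are illustrative:

```yaml
models:
  - name: dim_customers
    latest_version: 2
    versions:
      - v: 2                          # the new canonical version
      - v: 1
        deprecation_date: 2024-06-01  # illustrative: consumers have until this date to migrate
```

Setting a `deprecation_date` gives downstream consumers a clear, known boundary for their migration window.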
If you're making a non-breaking change, you don't need a new version—such as adding a new column, or fixing a bug in an existing column's calculation. Of course, it's possible to change a model's definition in other ways—recalculating a column in a way that doesn't change its name, data type, or enforceable characteristics—but would substantially change the results seen by downstream queriers. This is always a judgment call. As the maintainer of a widely-used model, you know best what's a bug fix and what's an unexpected behavior change. The process of sunsetting and migrating model versions requires real work, and likely significant coordination across teams. You should opt for non-breaking changes whenever possible. Inevitably, however, these non-breaking additions will leave your most important models with lots of unused or deprecated columns. Rather than constantly adding a new version for each small change, you should opt for a predictable cadence (once or twice a year, communicated well in advance) where you bump the "latest" version of your model, removing columns that are no longer being used. #### How is this different from "version control"?[​](#how-is-this-different-from-version-control "Direct link to How is this different from \"version control\"?") [Version control](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) allows your team to collaborate simultaneously on a single code repository, manage conflicts between changes, and review changes before deploying into production. In that sense, version control is an essential tool for versioning the deployment of an entire dbt project—always the latest state of the `main` branch. In general, only one version of your project code is deployed into an environment at a time. If something goes wrong, you have the ability to roll back changes by reverting a commit or pull request, or by leveraging data platform capabilities around "time travel." 
When you make updates to a model's source code — its logical definition, in SQL or Python, or related configuration — dbt can [compare your project to the previous state](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection), enabling you to rebuild only models that have changed, and models downstream of a change. In this way, it's possible to develop changes to a model, quickly test in CI, and efficiently deploy into production — all coordinated via your version control system.

**Versioned models are different.** Defining model `versions` is appropriate when people, systems, and processes beyond your team's control, inside or outside of dbt, depend on your models. You can neither simply migrate them all, nor break their queries on a whim. You need to offer a migration path, with clear diffs and deprecation dates.

Multiple versions of a model will live in the same code repository at the same time, and be deployed into the same data environment simultaneously. This is similar to how web APIs are versioned: multiple versions (usually two or three, not more) live simultaneously. Over time, newer versions come online, and older versions are sunsetted.

#### How is this different from just creating a new model?[​](#how-is-this-different-from-just-creating-a-new-model "Direct link to How is this different from just creating a new model?")

Honestly, it's only a little bit different! There isn't much magic here, and that's by design. You've always been able to copy-paste, create a new model file, and name it `dim_customers_v2.sql`. Why should you opt for a "real" versioned model instead?
As the **producer** of a versioned model: * You keep track of all live versions in one place, rather than scattering them throughout the codebase * You can reuse the model's configuration, and highlight just the diffs between versions * You can select models to build (or not) based on whether they're a `latest`, `prerelease`, or `old` version * dbt will notify consumers of your versioned model when new versions become available, or when they are slated for deprecation As the **consumer** of a versioned model: * You use a consistent `ref`, with the option of pinning to a specific live version * You will be notified throughout the life cycle of a versioned model All versions of a model preserve the model's original name. They are `ref`'d by that name, rather than the name of the file that they're defined in. By default, the `ref` resolves to the latest version (as declared by that model's maintainer), but you can also `ref` a specific version of the model, with a `version` keyword. Let's say that `dim_customers` has three versions defined: `v2` is the "latest", `v3` is "prerelease," and `v1` is an old version that's still within its deprecation window. Because `v2` is the latest version, it gets some special treatment: it can be defined in a file without a suffix, and `ref('dim_customers')` will resolve to `v2` if a version pin is not specified. 
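The `ref` resolution rules above can be shown as downstream SQL snippets, using dbt's documented `v` argument and the `dim_customers` versions from the running example:

```sql
-- Unpinned: resolves to the latest version (v2 in this example)
select * from {{ ref('dim_customers') }}

-- Pinned: keeps resolving to v1 through its deprecation window
select * from {{ ref('dim_customers', v=1) }}

-- Pinned to the prerelease version, for early testing
select * from {{ ref('dim_customers', v=3) }}
```

Pinning is what gives consumers a stable target during the migration window, while unpinned references move forward automatically when the maintainer bumps the latest version.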
The table below breaks down the standard conventions:

| v | version | `ref` syntax | File name | Database relation |
| - | ------------ | ---------------------------------------------------------- | ------------------------------------------------- | ---------------------------------------------------------------------------- |
| 3 | "prerelease" | `ref('dim_customers', v=3)` | `dim_customers_v3.sql` | `analytics.dim_customers_v3` |
| 2 | "latest" | `ref('dim_customers', v=2)` **and** `ref('dim_customers')` | `dim_customers_v2.sql` **or** `dim_customers.sql` | `analytics.dim_customers_v2` **and** `analytics.dim_customers` (recommended) |
| 1 | "old" | `ref('dim_customers', v=1)` | `dim_customers_v1.sql` | `analytics.dim_customers_v1` |

As you'll see in the implementation section below, a versioned model can reuse the majority of its YAML properties and configuration. Each version needs to only say how it *differs* from the shared set of attributes. This gives you, as the producer of a versioned model, the opportunity to highlight the differences across versions—which is otherwise difficult to detect in models with dozens or hundreds of columns—and to clearly track, in one place, all versions of the model which are currently live.

dbt also supports [`version`-based selection](https://docs.getdbt.com/reference/node-selection/methods.md#version). For example, you could define a [default YAML selector](https://docs.getdbt.com/reference/node-selection/yaml-selectors.md#default) that avoids running any old model versions in development, even while you continue to run them in production through a sunset and migration period. (You could accomplish something similar by applying `tags` to these models, and cycling through those tags over time.)
selectors.yml

```yml
selectors:
  - name: exclude_old_versions
    default: "{{ target.name == 'dev' }}"
    definition:
      method: fqn
      value: "*"
      exclude:
        - method: version
          value: old
```

Because dbt knows that these models are *actually the same model*, it can notify downstream consumers as new versions become available, and as older versions are slated for deprecation.

```bash
Found an unpinned reference to versioned model 'dim_customers'.
Resolving to latest version: my_model.v2
A prerelease version 3 is available. It has not yet been marked 'latest' by its maintainer.
When that happens, this reference will resolve to my_model.v3 instead.
  Try out v3: {{ ref('my_dbt_project', 'my_model', v='3') }}
  Pin to v2: {{ ref('my_dbt_project', 'my_model', v='2') }}
```

#### How to create a new version of a model[​](#how-to-create-a-new-version-of-a-model "Direct link to How to create a new version of a model")

Most often, you'll start with a model that is not yet versioned. Let's go back in time to when `dim_customers` was a simple standalone model, with an enforced contract. For simplicity, let's pretend it has only two columns, `customer_id` and `country_name`, though most mature models will have many more.

models/dim\_customers.sql

```sql
-- lots of sql

final as (

    select
        customer_id,
        country_name

    from ...

)

select * from final
```

models/schema.yml

```yaml
models:
  - name: dim_customers
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: customer_id
        description: This is the primary key
        data_type: int
      - name: country_name
        description: Where this customer lives
        data_type: varchar
```

Let's say you need to make a breaking change to the model: Removing the `country_name` column, which is no longer reliable. First, create a new model file (SQL or Python) encompassing those breaking changes. The default convention is naming the new file with a `_v` suffix. Let's make a new file, named `dim_customers_v2.sql`.
(We don't need to rename the existing model file just yet, while it's still the "latest" version.)

models/dim\_customers\_v2.sql

```sql
-- lots of sql

final as (

    select
        customer_id
        -- country_name has been removed!

    from ...

)

select * from final
```

Now, you could define properties and configuration for `dim_customers_v2` as a new standalone model, with no actual relation to `dim_customers` save a striking resemblance. Instead, we're going to declare that these are versions of the same model, both named `dim_customers`. We can define their properties in common, and then **just** highlight the diffs between them. (Or, you can choose to define each model version with full specifications, and repeat the values they have in common.)

* Diffs only (recommended)
* Fully specified

models/schema.yml

```yaml
models:
  - name: dim_customers
    latest_version: 1
    config:
      materialized: table
      contract: {enforced: true}
    columns:
      - name: customer_id
        description: This is the primary key
        data_type: int
      - name: country_name
        description: Where this customer lives
        data_type: varchar

    # Declare the versions, and highlight the diffs
    versions:

      - v: 1
        # Matches what's above -- nothing more needed

      - v: 2
        # Removed a column -- this is the breaking change!
        columns:
          # This means: use the 'columns' list from above, but exclude country_name
          - include: all
            exclude: [country_name]
```

models/schema.yml

```yaml
models:
  - name: dim_customers
    latest_version: 1

    # declare the versions, and fully specify them
    versions:
      - v: 2
        config:
          materialized: table
          contract: {enforced: true}
        columns:
          - name: customer_id
            description: This is the primary key
            data_type: int
          # no country_name column

      - v: 1
        config:
          materialized: table
          contract: {enforced: true}
        columns:
          - name: customer_id
            description: This is the primary key
            data_type: int
          - name: country_name
            description: Where this customer lives
            data_type: varchar
```

Note: If none of your model versions specify columns, you don't need to define columns at all and can omit the `columns`/`include`/`exclude` keys from the versioned model. In this case, dbt will automatically use all top-level columns for all versions.

The configuration above says: Instead of two unrelated models, I have two versioned definitions of the same model: `dim_customers_v1` and `dim_customers_v2`.

**Where are they defined?** dbt expects each model version to be defined in a file named `<model_name>_v<v>`. In this case: `dim_customers_v1.sql` and `dim_customers_v2.sql`. It's also possible to define the "latest" version in `dim_customers.sql` (no suffix), without additional configuration. Finally, you can override this convention by setting [`defined_in: any_file_name_you_want`](https://docs.getdbt.com/reference/resource-properties/versions.md#defined_in)—but we strongly encourage you to follow the convention, unless you have a very good reason.

**Where will they be materialized?** Each model version will create a database relation with alias `<model_name>_v<v>`. In this case: `dim_customers_v1` and `dim_customers_v2`. See [the section below](#configuring-database-location-with-alias) for more details on configuring aliases.
**Which version is "latest"?** If not specified explicitly, the `latest_version` would be `2`, because it's numerically greatest. In this case, we've explicitly specified `latest_version: 1`. That means `v2` is a "prerelease," in early development and testing. When we're ready to roll out `v2` to everyone by default, we would bump to `latest_version: 2`, or remove `latest_version` from the specification.

##### Configuring versioned models[​](#configuring-versioned-models "Direct link to Configuring versioned models")

You can reconfigure each version independently. For example, you could materialize `v2` as a table and `v1` as a view:

models/schema.yml

```yml
versions:
  - v: 2
    config:
      materialized: table
  - v: 1
    config:
      materialized: view
```

Like with all config inheritance, any configs set *within* the versioned model's definition (`.sql` or `.py` file) will take precedence over the configs set in YAML.

##### Configuring database location with `alias`[​](#configuring-database-location-with-alias "Direct link to configuring-database-location-with-alias")

Following the example, let's say you wanted `dim_customers_v1` to continue populating the database table named `dim_customers`. That's what the table was named previously, and you may have several other dashboards or tools expecting to read its data from `<database>.<schema>.dim_customers`. You could use the `alias` configuration:

models/schema.yml

```yml
- v: 1
  config:
    alias: dim_customers  # keep v1 in its original database location
```

**The pattern we recommend:** Create a view or table clone with the model's canonical name that always points to the latest version. By following this pattern, you can offer the same flexibility as `ref`, even if someone is querying outside of dbt. Want a specific version? Pin to version X by adding the `_vX` suffix. Want the latest version? No suffix, and the view will redirect you.

We intend to build this into `dbt-core` as out-of-the-box functionality.
(Upvote or comment on [dbt-core#7442](https://github.com/dbt-labs/dbt-core/issues/7442).) In the meantime, you can implement this pattern yourself with a custom macro and post-hook:

macros/create_latest_version_view.sql

```sql
{% macro create_latest_version_view() %}

    -- this hook will run only if the model is versioned, and only if it's the latest version
    -- otherwise, it's a no-op
    {% if model.get('version') and model.get('version') == model.get('latest_version') %}

        {% set new_relation = this.incorporate(path={"identifier": model['name']}) %}

        {% set existing_relation = load_relation(new_relation) %}

        {% if existing_relation and not existing_relation.is_view %}
            {{ drop_relation_if_exists(existing_relation) }}
        {% endif %}

        {% set create_view_sql -%}
            -- this syntax may vary by data platform
            create or replace view {{ new_relation }} as
                select * from {{ this }}
        {%- endset %}

        {% do log("Creating view " ~ new_relation ~ " pointing to " ~ this, info = true) if execute %}

        {{ return(create_view_sql) }}

    {% else %}

        -- no-op
        select 1 as id

    {% endif %}

{% endmacro %}
```

dbt_project.yml

```yml
# dbt_project.yml
models:
  post-hook:
    - "{{ create_latest_version_view() }}"
```

info

If your project has historically implemented [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) by reimplementing the `generate_alias_name` macro, and you'd like to start using model versions, you should update your custom implementation to account for model versions. Specifically, we'd encourage you to add [a condition like this one](https://github.com/dbt-labs/dbt-core/blob/ada8860e48b32ac712d92e8b0977b2c3c9749981/core/dbt/include/global_project/macros/get_custom_name/get_custom_alias.sql#L26-L30). Your existing implementation of `generate_alias_name` should not encounter any errors upon first upgrading to v1.5.
It's only when you create your first versioned model that you may see an error like:

```sh
dbt.exceptions.AmbiguousAliasError: Compilation Error
  dbt found two resources with the database representation "database.schema.model_name".
  dbt cannot create two resources with identical database representations.
  To fix this, change the configuration of one of these resources:
  - model.project_name.model_name.v1 (models/.../model_name.sql)
  - model.project_name.model_name.v2 (models/.../model_name_v2.sql)
```

We opted to use `generate_alias_name` for this functionality so that the logic remains accessible to end users, and can be reimplemented with custom logic.

##### Run a model with multiple versions[​](#run-a-model-with-multiple-versions "Direct link to Run a model with multiple versions")

To run a model with multiple versions, you can use the [`--select` flag](https://docs.getdbt.com/reference/node-selection/syntax.md). For example:

* Run all versions of `dim_customers`:

  ```bash
  dbt run --select dim_customers # Run all versions of the model
  ```

* Run only version 2 of `dim_customers`:

  You can use either of the following commands (both achieve the same result):

  ```bash
  dbt run --select dim_customers.v2 # Run a specific version of the model
  dbt run --select dim_customers_v2 # Alternative syntax for the specific version
  ```

* Run the latest version of `dim_customers` using the `--select` flag shorthand:

  ```bash
  dbt run -s dim_customers,version:latest # Run the latest version of the model
  ```

These commands provide flexibility in managing and executing different versions of a dbt model.

##### Optimizing model versions[​](#optimizing-model-versions "Direct link to Optimizing model versions")

How you define each model version is completely up to you. While it's easy to start by copy-pasting from one model's SQL definition into another, you should think about *what actually is changing* from one version to another.
For example, if your new model version is only renaming or removing certain columns, you could define one version as a view on top of the other one:

models/dim_customers_v2.sql

```sql
{{ config(materialized = 'view') }}

{% set dim_customers_v1 = ref('dim_customers', v=1) %}

select
    {{ dbt_utils.star(from=dim_customers_v1, except=["country_name"]) }}
from {{ dim_customers_v1 }}
```

Of course, if one model version makes meaningful and substantive changes to the logic of another, it may not be possible to optimize it in this way. At that point, the cost to human intuition and legibility is more important than the cost of recomputing similar transformations.

We expect to develop more opinionated recommendations as teams start adopting model versions in practice. One recommended pattern we can envision: Prioritize the definition of the `latest_version`, and define other versions (old and prerelease) based on their diffs from the latest. How?

* Define the properties and configuration for the latest version in the top-level model YAML, and the diffs for other versions below (via `include`/`exclude`)
* Where possible, define other versions as `select` transformations, which take the latest version as their starting point
* When bumping the `latest_version`, migrate the SQL and YAML accordingly.

In the example above, the third point might be tricky. It's easier to *exclude* `country_name` than it is to add it back in. Instead, we might need to keep around the full original logic for `dim_customers_v1`—but materialize it as a `view`, to minimize the data warehouse cost of building it. If downstream queriers see slightly degraded performance, it's still significantly better than broken queries, and all the more reason to migrate to the new "latest" version.
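The bullets above can be sketched in a single version spec. A hedged sketch following the `dim_customers` example, with the latest version fully described at the top level and `v1` carrying only its diffs (the exact include-plus-additional-columns behavior is worth verifying against your dbt version):

```yml
models:
  - name: dim_customers
    latest_version: 2
    config:
      materialized: table # the latest version carries the full build cost
    columns: # columns of the latest version
      - name: customer_id
        description: This is the primary key
        data_type: int
    versions:
      - v: 2 # latest: fully specified by the top-level properties above
      - v: 1
        config:
          materialized: view # keep v1's original logic, but make it cheap to build
        columns:
          - include: all
          - name: country_name # the column the latest version dropped
            description: Where this customer lives
            data_type: varchar
```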
#### Coordinate model versioning[​](#coordinate-model-versioning "Direct link to Coordinate model versioning")

Safely releasing a new model version requires coordination between model producers (who build the models) and model consumers (who depend on them). For practical guidance on how producers and consumers should communicate, test, and roll out versioned models across projects, refer to [Coordinating model versions best practices](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-6-coordinate-versions.md).

---

### Monitor jobs and alerts

Monitor your dbt jobs to help identify improvements, and set up alerts to proactively notify the right people or teams. This section covers dbt's capabilities for monitoring your jobs and setting up alerts to ensure seamless orchestration, including:

* [Visualize and orchestrate downstream exposures](https://docs.getdbt.com/docs/deploy/orchestrate-exposures.md) [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") — Automatically visualize and orchestrate exposures from dashboards and proactively refresh the underlying data sources during scheduled dbt jobs.
* [Leverage artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) — dbt generates and saves artifacts for your project, which it uses to power features like creating docs for your project and reporting freshness of your sources.
* [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) — Receive email, Slack, or Microsoft Teams notifications when a job run succeeds, encounters warnings, fails, or is canceled.
* [Model notifications](https://docs.getdbt.com/docs/deploy/model-notifications.md) — Receive email notifications about any issues encountered by your models and tests as soon as they occur while running a job.
* [Retry jobs](https://docs.getdbt.com/docs/deploy/retry-jobs.md) — Rerun your errored jobs from the start or from the point of failure.
* [Run visibility](https://docs.getdbt.com/docs/deploy/run-visibility.md) — View your run history to help identify where improvements can be made to scheduled jobs.
* [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) — Monitor data governance by enabling snapshots to capture the freshness of your data sources.
* [Webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md) — Use webhooks to send events about your dbt jobs' statuses to other systems.

To set up and add data health tiles to view data freshness and quality checks in your dashboard, refer to [data health tiles](https://docs.getdbt.com/docs/explore/data-tile.md).

[![An overview of a dbt job run which contains run summary, job trigger, run duration, and more.](/img/docs/dbt-cloud/deployment/deploy-scheduler.png?v=2 "An overview of a dbt job run which contains run summary, job trigger, run duration, and more.")](#)An overview of a dbt job run which contains run summary, job trigger, run duration, and more.
[![Run history dashboard allows you to monitor the health of your dbt project and displays jobs, job status, environment, timing, and more.](/img/docs/dbt-cloud/deployment/run-history.png?v=2 "Run history dashboard allows you to monitor the health of your dbt project and displays jobs, job status, environment, timing, and more.")](#)Run history dashboard allows you to monitor the health of your dbt project and displays jobs, job status, environment, timing, and more.

[![Access logs for run steps](/img/docs/dbt-cloud/deployment/access-logs.gif?v=2 "Access logs for run steps")](#)Access logs for run steps

---

### Navigate the dbt Insights interface

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Learn how to navigate the Insights interface and use its main components.

Insights provides an interactive interface for writing, running, and analyzing SQL queries. This section highlights the main components of Insights.

#### Query console[​](#query-console "Direct link to Query console")

The query console is the main component of Insights. It allows you to write, run, and analyze SQL queries. The Query console supports:

1. Query console editor, which allows you to write, run, and analyze SQL queries:
   * It supports syntax highlighting and autocomplete suggestions
   * Hyperlinks from `ref` calls in SQL code to the corresponding Explorer page
2. [Query console menu](#query-console-menu), which contains **Bookmark (icon)**, **Develop**, and **Run** buttons.
3.
[Query output panel](#query-output-panel), which sits below the query editor and displays the results of a query:
   * Has three tabs: **Data**, **Chart**, and **Details**, which allow you to analyze query execution and visualize results.
4. [Query console sidebar menu](#query-console-sidebar-menu), which contains the **Catalog**, **Bookmark**, **Query history**, and **Copilot** icons.

[![dbt Insights main interface with blank query editor](/img/docs/dbt-insights/insights-main.png?v=2 "dbt Insights main interface with blank query editor")](#)dbt Insights main interface with blank query editor

##### Query console menu[​](#query-console-menu "Direct link to Query console menu")

The Query console menu is located at the top right of the Query editor. It contains the **Bookmark**, **Develop**, and **Run** buttons:

* **Bookmark** button — Save your frequently used SQL queries as favorites for easier access.
  * When you click **Bookmark**, a **Bookmark Query Details** modal (pop-up box) will appear where you can add a **Title** and **Description**.
  * Let [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) do the writing for you — use the AI assistant to automatically generate a helpful description for your bookmark.
  * Access the newly created bookmark from the **Bookmark** icon in the [Query console sidebar menu](#query-console-sidebar-menu).
* **Develop** button — Open the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) or [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md) to continue editing your SQL query.
* **Run** button — Run your SQL query and view the results in the **Data** tab.

#### Semantic Layer querying[​](#semantic-layer-querying "Direct link to Semantic Layer querying")

Semantic Layer querying in dbt Insights lets you build queries against the Semantic Layer without writing SQL code. It guides you in creating queries based on available metrics, dimensions, and entities.
With this feature, you can:

* Build analyses from your predefined semantic layer metrics.
* Apply filters, time ranges, and aggregations tailored to the semantic model.
* View the underlying SQL code for each metric query.

To build a query in dbt Insights:

1. From the main menu, go to **Insights**.
2. Click **Build a query**.
3. Select what you want to include in your query.
   * Click **Add Metric** to select the metrics for your query.
   * Click **Add Group by** to choose the dimensions that break down your metric, such as time grain (day, week, month), region, product, or customer.
   * Click **Add Filter** to create a filter to narrow your results.
   * Click **Add Order by** to select how you want to sort the results of your query.
   * Click **Add Limit** and select the number of results you want to see when you run your query. If left blank, you will get all the results.
4. Click **Run** to run your query. Results are available in the **Data** tab. You can see the generated SQL code in the **Details** tab.

[![Semantic Layer querying within dbt Insights](/img/docs/dbt-insights/insights-query-builder-interface.png?v=2 "Semantic Layer querying within dbt Insights")](#)Semantic Layer querying within dbt Insights

[![Results are displayed in the Data tab](/img/docs/dbt-insights/insights-query-builder.png?v=2 "Results are displayed in the Data tab")](#)Results are displayed in the Data tab

[![The generated SQL code in the Details tab](/img/docs/dbt-insights/insights-query-builder-sql.png?v=2 "The generated SQL code in the Details tab")](#)The generated SQL code in the Details tab

#### Query output panel[​](#query-output-panel "Direct link to Query output panel")

The Query output panel is below the query editor and displays the results of a query. It displays the following tabs to analyze query execution and visualize results:

* **Data** tab — Preview your SQL results, with results paginated.
* **Details** tab — Displays succinct details of the executed SQL query:
  * Query metadata — Copilot's AI-generated title and description, along with the supplied SQL and compiled SQL.
  * Connection details — Relevant data platform connection information.
  * Query details — Query duration, status, column count, row count.
* **Chart** tab — Visualizes query results with built-in charts.
  * Use the chart icon to select the type of chart you want to visualize your results. Available chart types are **line chart, bar chart, or scatterplot**.
  * Use the **Chart settings** to customize the chart type and the columns you want to visualize.
* **Download** button — Allows you to export the results to CSV

[![dbt Insights Data tab](/img/docs/dbt-insights/insights-chart-tab.png?v=2 "dbt Insights Data tab")](#)dbt Insights Data tab

[![dbt Insights Chart tab](/img/docs/dbt-insights/insights-chart.png?v=2 "dbt Insights Chart tab")](#)dbt Insights Chart tab

[![dbt Insights Details tab](/img/docs/dbt-insights/insights-details.png?v=2 "dbt Insights Details tab")](#)dbt Insights Details tab

#### Query console sidebar menu[​](#query-console-sidebar-menu "Direct link to Query console sidebar menu")

The Query console sidebar menu contains the following options:

##### dbt Catalog[​](#dbt-catalog "Direct link to dbt Catalog")

**Catalog icon** — View your project's models, columns, metrics, and more using the integrated Catalog view.

[![dbt Insights dbt Catalog icon](/img/docs/dbt-insights/insights-explorer.png?v=2 "dbt Insights dbt Catalog icon")](#)dbt Insights dbt Catalog icon

##### Bookmark[​](#bookmark "Direct link to Bookmark")

Save and access your frequently used queries.
[![Manage your query bookmarks](/img/docs/dbt-insights/manage-bookmarks.png?v=2 "Manage your query bookmarks")](#)Manage your query bookmarks

##### Query history[​](#query-history "Direct link to Query history")

View past queries, their statuses (All, Success, Error, or Pending), start time, and duration. Search for past queries and filter by status. You can also re-run a query from the Query history.

[![dbt Insights Query history icon](/img/docs/dbt-insights/insights-query-history.png?v=2 "dbt Insights Query history icon")](#)dbt Insights Query history icon

##### dbt Copilot[​](#dbt-copilot "Direct link to dbt Copilot")

Use [dbt Copilot's AI assistant](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) to modify or generate queries using natural language prompts, or to chat with the Analyst agent to gather insights about your data. There are two ways you can use dbt Copilot in Insights to interact with your data:

[![dbt Copilot in Insights](/img/docs/dbt-insights/insights-copilot-tabs.png?v=2 "dbt Copilot in Insights")](#)dbt Copilot in Insights

* **Agent** [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") — Ask the Analyst agent questions to get intelligent data analysis with automated workflows, governed insights, and actionable recommendations. This is a conversational AI feature where you can ask natural language prompts and receive analysis in real time. To enable the Analyst agent, turn on beta features under **Account settings** > **Personal profile** > **Experimental features**. For detailed steps, see [Preview new dbt platform features](https://docs.getdbt.com/docs/dbt-versions/experimental-features.md).
Some sample questions you can ask the agent:

* *What region are my sales growing the fastest?*
* *What was the revenue last month?*
* *How should I optimize my marketing spend next quarter?*
* *How many customers do I have, broken down by customer type?*

The Analyst agent creates an analysis plan based on your question. The agent:

1. Gets context using your semantic models and metrics.
2. Generates SQL queries using your project's definitions.
3. Executes the SQL query and returns results with context.
4. Reviews and summarizes the generated insights and provides a comprehensive answer.

The agent can loop through these steps multiple times if it hasn't reached a complete answer, allowing for complex, multi-step analysis. For more information, see [Analyze data with the Analyst agent](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md#analyze-data-with-the-analyst-agent).

* **Generate SQL** — Build queries in Insights with natural language prompts to explore and query data with an intuitive, context-rich interface. For more information, see [Build queries](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md#build-queries).

#### LSP features[​](#lsp-features "Direct link to LSP features")

The following Language Server Protocol (LSP) features are available for projects upgraded to Fusion:

* **Live CTE previews:** Preview a CTE's output for faster validation and debugging.

[![Preview CTE in Insights](/img/docs/dbt-insights/preview-cte.png?v=2 "Preview CTE in Insights")](#)Preview CTE in Insights

* **Real-time error detection:** Automatically validate your SQL code to detect errors and surface warnings, without hitting the warehouse. This includes both dbt errors (like an invalid `ref`) and SQL errors (like an invalid column name or SQL syntax).
[![Live error detection](/img/docs/dbt-insights/sql-validation.png?v=2 "Live error detection")](#)Live error detection

* **`ref` suggestions:** Autocomplete model names when using the `ref()` function to reference other models in your project.

[![ref suggestions in Insights](/img/docs/dbt-insights/ref-autocomplete.png?v=2 "ref suggestions in Insights")](#)ref suggestions in Insights

* **Hover insights:** View context on tables, columns, and functions without leaving your code. Hover over any SQL element to see details like column names and data types.

[![Sample column details](/img/docs/dbt-insights/column-info.png?v=2 "Sample column details")](#)Sample column details

[![Sample column details](/img/docs/dbt-insights/column-hover.png?v=2 "Sample column details")](#)Sample column details

---

### Navigating the state-aware interface

Learn how to navigate the state-aware orchestration interface for better visibility into model builds and cost tracking.

#### Models built and reused chart[​](#models-built-and-reused-chart "Direct link to Models built and reused chart")

When you go to your **Account home**, you'll see a chart showing the number of models built and reused, giving you visibility into how state-aware orchestration is optimizing your data builds. This chart helps you to:

* **Track the effectiveness of state-aware orchestration** — See how state-aware orchestration reduces unnecessary model rebuilds by only building models when there are changes to the data or code. This chart provides transparency into how the optimization is working across your dbt implementation.
* **Analyze build patterns** — Gain insights into your project's build frequency and identify opportunities for further optimization.

You can also view the number of reused models per project on the **Account home** page.

[![Models built and reused chart in Account home](/img/docs/dbt-cloud/using-dbt-cloud/account-home-chart.png?v=2 "Models built and reused chart in Account home")](#)Models built and reused chart in Account home

[![View reused models count per project in the Accounts home page](/img/docs/deploy/sao-model-reuse.png?v=2 "View reused models count per project in the Accounts home page")](#)View reused models count per project in the Accounts home page

#### Model consumption view in jobs[​](#model-consumption-view-in-jobs "Direct link to Model consumption view in jobs")

State-aware jobs provide charts that show information about your job runs and how many models were built and reused by your job over the past week, 14 days, or 30 days. In the **Overview** section of your job, the following charts are available:

Under the **Runs** tab:

* **Recent runs**
* **Total run duration time**

[![Charts for Recent runs and Total run duration time](/img/docs/dbt-cloud/using-dbt-cloud/sao-runs-chart.png?v=2 "Charts for Recent runs and Total run duration time")](#)Charts for Recent runs and Total run duration time

Under the **Models** tab:

* **Models built**
* **Models reused**

[![Charts for Models built and Models reused](/img/docs/dbt-cloud/using-dbt-cloud/sao-models-chart.png?v=2 "Charts for Models built and Models reused")](#)Charts for Models built and Models reused

#### Logs view of built models[​](#logs-view-of-built-models "Direct link to Logs view of built models")

When running a job, a structured logs view shows which models were built, skipped, or reused.

[![Logs view of built models](/img/docs/dbt-cloud/using-dbt-cloud/sao-logs-view.png?v=2 "Logs view of built models")](#)Logs view of built models

1.
Each model has an icon indicating its status.
2. The **Reused** tab indicates the total number of reused models.
3. You can use the search bar or filter the logs to show **All**, **Success**, **Warning**, **Failed**, **Running**, **Skipped**, **Reused**, or **Debugged** messages.
4. Detailed log messages are provided to give more context on why models were built, reused, or skipped. These messages are highlighted in the logs.

#### Reused tag in the Latest status lens[​](#reused-tag-in-the-latest-status-lens "Direct link to Reused tag in the Latest status lens")

Lineage lenses are interactive visual filters in [dbt Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md#lenses) that show additional context on your lineage graph to understand how resources are defined or performing. When you apply a lens, tags become visible on the nodes in the lineage graph, indicating the lens value along with coloration based on that value. If you're significantly zoomed out, only the tags and their colors are visible in the graph.

The **Latest status** lens shows the status from the latest execution of the resource in the current environment. When you use this lens to view your lineage, models that were reused by state-aware orchestration are tagged with **Reused**.

[![Latest status lens showing reused models](/img/docs/dbt-cloud/using-dbt-cloud/sao-latest-status-lens.png?v=2 "Latest status lens showing reused models")](#)Latest status lens showing reused models

To view your lineage with the **Latest status** lens:

1. From the main menu, go to **Orchestration** > **Runs**.
2. Select your run.
3. Go to the **Lineage** tab. The lineage of your project appears.
4. In the **Lenses** field, select **Latest status**.

#### Clear cache button[​](#clear-cache-button "Direct link to Clear cache button")

State-aware orchestration uses a cached hash of both code and data state for each model in an environment, stored in Redis.
When running a job, dbt checks whether the hash for the model being built has changed between the saved state in Redis and the current state that the job would build. If there is a change, dbt builds the model. If there are no changes, dbt reuses the model from the last time it was built.

* To wipe this state clean and start again, clear the cache by going to **Orchestration** > **Environments**. Select your environment and click the **Clear cache** button.
* The **Clear cache** button is only available if you have enabled state-aware orchestration.
* After clearing the cache, the next run rebuilds every model from scratch. Subsequent runs rely on the regenerated cache.

[![Clear cache button](/img/docs/dbt-cloud/using-dbt-cloud/sao-clear-cache.png?v=2 "Clear cache button")](#)Clear cache button

---

### Orchestrate downstream exposures

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

Use the dbt [Cloud job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md) to proactively refresh downstream exposures and the underlying data sources (extracts) that power your Tableau Workbooks.

Available in private beta

Orchestrating exposures is currently available in private beta to dbt Enterprise accounts. To join the beta, contact your account representative.
Orchestrating exposures integrates with [downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md) and uses your `dbt build` job to ensure that Tableau extracts are updated regularly. Control the frequency of these refreshes by configuring environment variables in your dbt environment.

**Differences between visualizing and orchestrating downstream exposures**

The following table summarizes the differences between visualizing and orchestrating downstream exposures:

| Info | Set up and visualize downstream exposures | Orchestrate downstream exposures [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") |
| --- | --- | --- |
| Purpose | Automatically brings downstream assets into your dbt lineage. | Proactively refreshes the underlying data sources during scheduled dbt jobs. |
| Benefits | Provides visibility into data flow and dependencies. | Ensures BI tools always have up-to-date data without manual intervention. |
| Location | Exposed in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) | Exposed in [dbt scheduler](https://docs.getdbt.com/docs/deploy/deployments.md) |
| Supported BI tool | Tableau | Tableau |
| Use case | Helps users understand how models are used and reduces incidents. | Optimizes timeliness and reduces costs by running models when needed. |
#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

To orchestrate downstream exposures, you should meet the following prerequisites:

* [Configured downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md) and ensured desired exposures are included in your lineage.
* Verified your environment and jobs are on a supported dbt [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).
* Have a dbt account on the [Enterprise or Enterprise+ plan](https://www.getdbt.com/pricing/).
* Created a [production](https://docs.getdbt.com/docs/deploy/deploy-environments.md#set-as-production-environment) deployment environment for each project you want to explore, with at least one successful job run.
* Have [admin permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) in dbt to edit project settings or production environment settings.
* Configured a [Tableau personal access token (PAT)](https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm) whose creator has privileges to view and refresh the data sources used by your exposures. The PAT inherits the permissions of its creator. Use a PAT created by:
  * A Tableau server or site administrator
  * A data source owner or a project leader

#### Orchestrate downstream exposures[​](#orchestrate-downstream-exposures "Direct link to Orchestrate downstream exposures")

To orchestrate downstream exposures and see the refresh happen automatically during scheduled dbt jobs:

1. In dbt, click **Deploy**, then **Environments**, and select the **Environment variables** tab.
2. Click **Add variable** and set the [environment-level variable](https://docs.getdbt.com/docs/build/environment-variables.md#setting-and-overriding-environment-variables) `DBT_ACTIVE_EXPOSURES` to `1` within the environment where you want the refresh to happen.
3.
Set `DBT_ACTIVE_EXPOSURES_BUILD_AFTER` to control the maximum refresh frequency (in minutes) you want between each exposure refresh.
4. By default, the variable is set to **1440** minutes (24 hours). This means that downstream exposures won't refresh Tableau extracts more often than this set interval, even if the related models run more frequently.

[![Set the environment variable `DBT_ACTIVE_EXPOSURES` to `1`.](/img/docs/cloud-integrations/auto-exposures/active-exposures-env-var.jpg?v=2 "Set the environment variable `DBT_ACTIVE_EXPOSURES` to `1`.")](#)Set the environment variable `DBT_ACTIVE_EXPOSURES` to `1`.

5. Run a job in production. You will see the update each time a job runs in production.
   * If a job runs before the set interval has passed, dbt skips the downstream exposure refresh and marks it as `skipped` in the job logs.
6. View the downstream exposure logs in the dbt run job logs.

[![View the downstream exposure logs in the dbt run job logs.](/img/docs/cloud-integrations/auto-exposures/active-exposure-log.jpg?v=2 "View the downstream exposure logs in the dbt run job logs.")](#)View the downstream exposure logs in the dbt run job logs.

* View more details in the debug logs for any troubleshooting.
--- ### Organize your outputs [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/custom-schemas.md) ###### [Custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) [Learn how to use the `schema` configuration key to specify a custom schema.](https://docs.getdbt.com/docs/build/custom-schemas.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/custom-databases.md) ###### [Custom databases](https://docs.getdbt.com/docs/build/custom-databases.md) [Learn how to use the `database` configuration key to specify a custom database.](https://docs.getdbt.com/docs/build/custom-databases.md)
[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/custom-aliases.md) ###### [Custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) [Learn how to use the `alias` model configuration to change the name of a model's identifier in the database.](https://docs.getdbt.com/docs/build/custom-aliases.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/build/custom-target-names.md) ###### [Custom target names](https://docs.getdbt.com/docs/build/custom-target-names.md) [Learn how to define a custom target name for a dbt job.](https://docs.getdbt.com/docs/build/custom-target-names.md)

---

### Packages

Software engineers frequently modularize code into libraries. These libraries help programmers operate with leverage: they can spend more time focusing on their unique business logic, and less time implementing code that someone else has already spent the time perfecting. In dbt, libraries like these are called *packages*. dbt's packages are so powerful because so many of the analytic problems we encounter are shared across organizations, for example:

* transforming data from a consistently structured SaaS dataset, for example:
  * turning [Snowplow](https://hub.getdbt.com/dbt-labs/snowplow/latest/) or [Segment](https://hub.getdbt.com/dbt-labs/segment/latest/) pageviews into sessions
  * transforming [AdWords](https://hub.getdbt.com/dbt-labs/adwords/latest/) or [Facebook Ads](https://hub.getdbt.com/dbt-labs/facebook_ads/latest/) spend data into a consistent format
* writing dbt macros that perform similar functions, for example:
  * [generating SQL](https://github.com/dbt-labs/dbt-utils#sql-helpers) to union together two relations, pivot columns, or construct a surrogate key
  * creating [custom schema tests](https://github.com/dbt-labs/dbt-utils#schema-tests)
  * writing [audit queries](https://hub.getdbt.com/dbt-labs/audit_helper/latest/)
* building models and macros for a particular tool used in your data stack, for example:
  * models to understand [Redshift](https://hub.getdbt.com/dbt-labs/redshift/latest/) privileges
  * macros to work with data loaded by [Stitch](https://hub.getdbt.com/dbt-labs/stitch_utils/latest/)

dbt *packages* are in fact standalone dbt projects, with models, macros, and other resources that tackle a specific problem area. When you add a package to your project, the package's resources become part of your own project. This means:

* Models in the package will be materialized when you `dbt run`.
* You can use `ref` in your own models to refer to models from the package.
* You can use `source` to refer to sources in the package.
* You can use macros in the package in your own project.

Note that defining and installing dbt packages is different from [defining and installing Python packages](https://docs.getdbt.com/docs/build/python-models.md#using-pypi-packages).

#### Use cases[​](#use-cases "Direct link to Use cases")

The following setup will work for every dbt project:

* Add [any package dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#when-to-use-project-dependencies) to `packages.yml`
* Add [any project dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#when-to-use-package-dependencies) to `dependencies.yml`

However, you may be able to consolidate both into a single `dependencies.yml` file. Read the following section to learn more.
###### About packages.yml and dependencies.yml[​](#about-packagesyml-and-dependenciesyml "Direct link to About packages.yml and dependencies.yml")

The `dependencies.yml` file can contain both types of dependencies: "package" and "project" dependencies.

* [Package dependencies](https://docs.getdbt.com/docs/build/packages.md#how-do-i-add-a-package-to-my-project) let you add source code from someone else's dbt project into your own, like a library.
* Project dependencies provide a different way to build on top of someone else's work in dbt.
* Private packages are not supported in `dependencies.yml` because it intentionally doesn't support Jinja rendering or conditional configuration. This keeps the configuration static and predictable and ensures compatibility with other services.

If your dbt project doesn't require the use of Jinja within the package specifications, you can simply rename your existing `packages.yml` to `dependencies.yml`. However, if your project's package specifications use Jinja (for example, to add an environment variable or a [Git token method](https://docs.getdbt.com/docs/build/packages.md#git-token-method) in a private Git package specification), you should continue using the `packages.yml` file name.

Use the following toggles to understand the differences and determine when to use `dependencies.yml` or `packages.yml` (or both). Refer to the [FAQs](#faqs) for more info.

When to use Project dependencies

Project dependencies are designed for the [dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) and [cross-project reference](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref) workflow:

* Use `dependencies.yml` when you need to set up cross-project references between different dbt projects, especially in a dbt Mesh setup.
* Use `dependencies.yml` when you want to include both projects and non-private dbt packages in your project's dependencies.
* Use `dependencies.yml` for organization and maintainability if you're using both [cross-project refs](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref) and [dbt Hub packages](https://hub.getdbt.com/). This reduces the need for multiple YAML files to manage dependencies.

When to use Package dependencies

Package dependencies allow you to add source code from someone else's dbt project into your own, like a library:

* If you only use packages like those from the [dbt Hub](https://hub.getdbt.com/), remain with `packages.yml`.
* Use `packages.yml` when you want to download dbt packages, such as dbt projects, into your root or parent dbt project. Note that this doesn't contribute to the dbt Mesh workflow.
* Use `packages.yml` to include packages in your project's dependencies. This includes both public packages, such as those from the [dbt Hub](https://hub.getdbt.com/), and private packages. dbt now supports [native private packages](https://docs.getdbt.com/docs/build/packages.md#native-private-packages).
* [`packages.yml` supports Jinja rendering](https://docs.getdbt.com/docs/build/dbt-tips.md#yaml-tips) for historical reasons, allowing dynamic configurations. This can be useful if you need to insert values, like a [Git token method](https://docs.getdbt.com/docs/build/packages.md#git-token-method) from an environment variable, into your package specifications.

Previously, to use private Git repositories in dbt, you needed a workaround that involved embedding a Git token with Jinja. This was not ideal, as it required extra steps like creating a user and sharing a Git token. We’ve introduced support for [native private packages](https://docs.getdbt.com/docs/build/packages.md#native-private-packages-) to address this.
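If you do consolidate into a single file, a `dependencies.yml` can list project and package dependencies side by side. A minimal sketch, assuming the documented `projects:` and `packages:` keys; the project and package names here are placeholders:

```yaml
# dependencies.yml -- illustrative only; names are placeholders
projects:
  - name: upstream_finance        # another dbt project, referenced via cross-project ref()

packages:
  - package: dbt-labs/dbt_utils   # a public dbt Hub package
    version: [">=1.0.0", "<2.0.0"]
```

Because this file is not rendered with Jinja, any dependency that needs an environment variable (such as a tokenized Git URL) must stay in `packages.yml` instead.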
#### How do I create a package?[​](#how-do-i-create-a-package "Direct link to How do I create a package?")

Creating packages is an advanced use of dbt, but it can be a relatively simple task. The only strict requirement is the presence of a [`dbt_project.yml` file](https://docs.getdbt.com/reference/dbt_project.yml.md). The most common use cases for packages are sharing:

* [Models](https://docs.getdbt.com/docs/build/models.md) across multiple projects.
* [Macros](https://docs.getdbt.com/docs/build/jinja-macros.md) across multiple projects.

Note that packages can be [private](#private-packages); they don't need to be shared publicly. Private packages can be hosted on your own Git provider (for example, GitHub or GitLab). For instructions on creating dbt packages and additional information, refer to our guide [Building dbt packages](https://docs.getdbt.com/guides/building-packages.md?step=1).

#### How do I add a package to my project?[​](#how-do-i-add-a-package-to-my-project "Direct link to How do I add a package to my project?")

1. Add a file named `dependencies.yml` or `packages.yml` to your dbt project. This should be at the same level as your `dbt_project.yml` file.
2. Specify the package(s) you wish to add using one of the supported syntaxes, for example:

   ```yaml
   packages:
     - package: dbt-labs/snowplow
       version: 0.7.0

     - git: "https://github.com/dbt-labs/dbt-utils.git"
       revision: 0.9.2

     - local: /opt/dbt/redshift
   ```

   The default [`packages-install-path`](https://docs.getdbt.com/reference/project-configs/packages-install-path.md) is `dbt_packages`.
3. Run `dbt deps` to install the package(s). Packages get installed in the `dbt_packages` directory; by default this directory is ignored by git, to avoid duplicating the source code for the package.
#### How do I specify a package?[​](#how-do-i-specify-a-package "Direct link to How do I specify a package?")

You can specify a package using one of the following methods, depending on where your package is stored.

##### Hub packages (recommended)[​](#hub-packages-recommended "Direct link to Hub packages (recommended)")

dbt Labs hosts the [Package hub](https://hub.getdbt.com), a registry for dbt packages, as a courtesy to the dbt Community, but does not certify or confirm the integrity, operability, effectiveness, or security of any packages. Please read the [dbt Labs Package Disclaimer](https://hub.getdbt.com/disclaimer/) before installing Hub packages.

You can install available hub packages like so:

packages.yml

```yaml
packages:
  - package: dbt-labs/snowplow
    version: 0.7.3 # version number
```

Hub packages require a version to be specified; you can find the latest release number on dbt Hub. Since Hub packages use [semantic versioning](https://semver.org/), we recommend pinning your package to the latest patch version from a specific minor release, like so:

```yaml
packages:
  - package: dbt-labs/snowplow
    version: [">=0.7.0", "<0.8.0"]
```

`dbt deps` "pins" each package by default. See ["Pinning packages"](#pinning-packages) for details.

Where possible, we recommend installing packages via dbt Hub, since this allows dbt to handle duplicate dependencies. This is helpful in situations such as:

* Your project uses both the dbt-utils and Snowplow packages, and the Snowplow package *also* uses the dbt-utils package.
* Your project uses both the Snowplow and Stripe packages, both of which use the dbt-utils package.

In comparison, other package installation methods are unable to handle the duplicate dbt-utils package. Advanced users can choose to host an internal version of the package hub by basing it on [this repository](https://github.com/dbt-labs/hub.getdbt.com) and setting the `DBT_PACKAGE_HUB_URL` environment variable.
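Pointing dbt at an internal hub is then a matter of exporting the variable before installing packages. A hedged sketch; the mirror URL is a placeholder for your own deployment:

```shell
# hypothetical internal mirror URL; replace with your own hub deployment
export DBT_PACKAGE_HUB_URL="https://hub.internal.example.com/"
# then run: dbt deps
```

Set the variable in the same environment (shell, CI job, or dbt environment) where `dbt deps` runs, so package resolution hits your mirror instead of the public hub.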
###### Prerelease versions[​](#prerelease-versions "Direct link to Prerelease versions")

Some package maintainers may wish to push prerelease versions of packages to the dbt Hub, in order to test out new functionality or compatibility with a new version of dbt. A prerelease version is demarcated by a suffix, such as `a1` (first alpha), `b2` (second beta), or `rc3` (third release candidate). By default, `dbt deps` will not include prerelease versions when resolving package dependencies. You can enable the installation of prereleases in one of two ways:

* Explicitly specifying a prerelease version in your `version` criteria
* Setting `install_prerelease` to `true`, and providing a compatible version range

For example, both of the following configurations would successfully install `0.4.5-a2` for the [`dbt_artifacts` package](https://hub.getdbt.com/brooklyn-data/dbt_artifacts/latest/):

```yaml
packages:
  - package: brooklyn-data/dbt_artifacts
    version: 0.4.5-a2
```

```yaml
packages:
  - package: brooklyn-data/dbt_artifacts
    version: [">=0.4.4", "<0.4.6"]
    install_prerelease: true
```

##### Git packages[​](#git-packages "Direct link to Git packages")

Packages stored on a Git server can be installed using the `git` syntax, like so:

packages.yml

```yaml
packages:
  - git: "https://github.com/dbt-labs/dbt-utils.git" # git URL
    revision: 0.9.2 # tag or branch name
```

Add the Git URL for the package, and optionally specify a revision. The revision can be:

* a branch name
* a tagged release
* a specific commit (full 40-character hash)

Example of a revision specifying a 40-character hash:

```yaml
packages:
  - git: "https://github.com/dbt-labs/dbt-utils.git"
    revision: 4e28d6da126e2940d17f697de783a717f2503188
```

By default, `dbt deps` "pins" each package. See ["Pinning packages"](#pinning-packages) for details.
##### Internally hosted tarball URL[​](#internally-hosted-tarball-url "Direct link to Internally hosted tarball URL")

Some organizations have security requirements to pull resources only from internal services. To address the need to install packages from hosted environments such as Artifactory or cloud storage buckets, dbt Core enables you to install packages from internally-hosted tarball URLs.

```yaml
packages:
  - tarball: https://codeload.github.com/dbt-labs/dbt-utils/tar.gz/0.9.6
    name: 'dbt_utils'
```

Where `name: 'dbt_utils'` specifies the subfolder of `dbt_packages` that's created for the package source code to be installed within.

#### Private packages[​](#private-packages "Direct link to Private packages")

##### Native private packages [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#native-private-packages- "Direct link to native-private-packages-")

Native private packages let you install packages from [supported](#prerequisites) private Git repos using the `private` key, without having to configure a [token](#git-token-method) or write out a full Git URL. This simplifies setup and reduces credential management.

* dbt platform: Uses your existing Git [integration](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md) for authentication.
* Fusion locally: Uses your system's SSH configuration. Requires the [`provider` key](#using-the-provider-key).

###### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You must have the feature flag enabled. Contact your account team to request access.
* To use native private packages, you must have one of the following Git providers configured in the **Integrations** section of your **Account settings**:
  * [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md)
  * [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md)
    * Private packages only work within a single Azure DevOps project. If your repositories are in different projects within the same organization, you can't reference them in the `private` key at this time.
    * For Azure DevOps, use the `org/repo` path (not the `org_name/project_name/repo_name` path) with the project tier inherited from the integrated source repository.
  * [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md)
    * Every GitLab repo with private packages must also be a dbt project.
* If using Fusion locally, you must have an SSH key configured on your machine for the relevant Git provider and include the [`provider` key](#using-the-provider-key) in your package configuration.

###### Configuration[​](#configuration "Direct link to Configuration")

Use the `private` key in your `packages.yml` or `dependencies.yml` to clone package repos using your existing dbt Git integration, without having to provision an access token or create a dbt environment variable.

packages.yml

```yaml
packages:
  - private: dbt-labs/awesome_repo # your-org/your-repo path
  - package: normal packages [...]
```

Azure DevOps considerations

* Private packages currently only work if the package repository is in the same Azure DevOps project as the source repo.
* Use the `org/repo` path (not the normal ADO `org_name/project_name/repo_name` path) in the `private` key.
* Repositories in different Azure DevOps projects are not currently supported; support is planned for a future update.
You can use private packages by specifying `org/repo` in the `private` key:

packages.yml

```yaml
packages:
  - private: my-org/my-repo # Works if your ADO source repo and package repo are in the same project
```

You can pin private packages similar to regular dbt packages:

```yaml
packages:
  - private: dbt-labs/awesome_repo
    revision: "0.9.5" # Pin to a tag, branch, or complete 40-character commit hash
```

###### Using the `provider` key[​](#using-the-provider-key "Direct link to using-the-provider-key")

Add the `provider` key when:

* You are using multiple Git integrations or using the dbt Fusion engine.
* You are using Fusion locally (with the [Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) or the [VS Code extension](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)) (required).

```yaml
packages:
  - private: dbt-labs/awesome_repo
    provider: "github" # Supported values: "github", "gitlab", "azure_devops"
```

Fusion uses the `provider` value to construct the correct SSH URL for cloning:

| Provider | SSH URL format |
| -------------- | ----------------------------------- |
| `github` | `git@github.com:org/repo.git` |
| `gitlab` | `git@gitlab.com:org/repo.git` |
| `azure_devops` | `git@ssh.dev.azure.com:v3/org/repo` |

Fusion relies on your system's SSH configuration to authenticate and clone the private repository. If `git clone` works on your system for the private package repo, the private package install should work too.

##### SSH key method (CLI only)[​](#ssh-key-method-cli-only "Direct link to SSH key method (CLI only)")

note

This method uses the `git:` key with a full SSH URL, which is different from [native private packages](#native-private-packages) that use the `private:` key. For most use cases, native private packages are the recommended approach, as they simplify setup.
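Because both the `provider`-key flow and SSH cloning defer to your system's SSH setup, a standard `~/.ssh/config` entry is usually all that's needed. A minimal sketch; the identity file name is an assumption, so use whichever key you registered with your provider:

```text
# ~/.ssh/config -- illustrative; the IdentityFile path is a placeholder
Host github.com
  User git
  IdentityFile ~/.ssh/id_ed25519

Host ssh.dev.azure.com
  User git
  IdentityFile ~/.ssh/id_ed25519
```

You can sanity-check authentication with a plain `git clone` of the package repo (or `ssh -T git@github.com` for GitHub) before running `dbt deps`.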
If you're using the command line, private packages can be cloned via SSH and an SSH key. When you use SSH keys to authenticate to your Git remote server, you don’t need to supply your username and password each time. Read more about SSH keys, how to generate them, and how to add them to your Git provider here: [GitHub](https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh) and [GitLab](https://docs.gitlab.com/ee/user/ssh.html).

packages.yml

```yaml
packages:
  - git: "git@github.com:dbt-labs/dbt-utils.git" # git SSH URL
```

If you're using the dbt platform, the SSH key method will not work, but you can use [native private packages](#native-private-packages) or the [HTTPS Git token method](https://docs.getdbt.com/docs/build/packages.md#git-token-method).

##### Git token method[​](#git-token-method "Direct link to Git token method")

note

[Native private packages](#native-private-packages) are the recommended approach for GitHub, GitLab, and Azure DevOps. The Git token method is still functional in both Fusion and the dbt platform, but requires provisioning a personal access token. It can be useful as a fallback if you need to unblock yourself.

This method allows the user to clone via HTTPS by passing in a Git token via an environment variable. Be careful of the expiration date of any token you use, as an expired token could cause a scheduled run to fail. Additionally, user tokens can create a challenge if the user ever loses access to a specific repo.

dbt usage

If you are using dbt, you must adhere to the naming conventions for environment variables. Environment variables in dbt must be prefixed with either `DBT_` or `DBT_ENV_SECRET`. Environment variable keys are uppercased and case sensitive. When referencing `{{env_var('DBT_KEY')}}` in your project's code, the key must exactly match the variable defined in dbt's UI.
In GitHub:

packages.yml

```yaml
packages:
  # use this format when accessing your repository via a GitHub application token
  - git: "https://{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" # git HTTPS URL

  # use this format when accessing your repository via a classic personal access token
  - git: "https://{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" # git HTTPS URL

  # use this format when accessing your repository via a fine-grained personal access token (username sometimes required)
  - git: "https://GITHUB_USERNAME:{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" # git HTTPS URL
```

Read more about creating a GitHub personal access token [here](https://docs.github.com/en/enterprise-server@3.1/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token). You can also use a GitHub App installation [token](https://docs.github.com/en/rest/reference/apps#create-an-installation-access-token-for-an-app).

In GitLab:

packages.yml

```yaml
packages:
  - git: "https://{{env_var('DBT_USER_NAME')}}:{{env_var('DBT_ENV_SECRET_DEPLOY_TOKEN')}}@gitlab.example.com/dbt-labs/awesome_project.git" # git HTTPS URL
```

Read more about creating a GitLab deploy token [here](https://docs.gitlab.com/ee/user/project/deploy_tokens/#creating-a-deploy-token) and how to properly construct your HTTPS URL [here](https://docs.gitlab.com/ee/user/project/deploy_tokens/#git-clone-a-repository). Deploy tokens can be managed by Maintainers only.

In Azure DevOps:

packages.yml

```yaml
packages:
  - git: "https://{{env_var('DBT_ENV_SECRET_PERSONAL_ACCESS_TOKEN')}}@dev.azure.com/dbt-labs/awesome_project/_git/awesome_repo" # git HTTPS URL
```

Read more about creating a personal access token [here](https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?view=azure-devops\&tabs=preview-page#create-a-pat).
In Bitbucket:

packages.yml

```yaml
packages:
  - git: "https://{{env_var('DBT_USER_NAME')}}:{{env_var('DBT_ENV_SECRET_PERSONAL_ACCESS_TOKEN')}}@bitbucketserver.com/scm/awesome_project/awesome_repo.git" # for Bitbucket Server
```

Read more about creating a personal access token [here](https://confluence.atlassian.com/bitbucketserver/personal-access-tokens-939515499.html).

#### Configure subdirectory for packaged projects[​](#configure-subdirectory-for-packaged-projects "Direct link to Configure subdirectory for packaged projects")

In general, dbt expects `dbt_project.yml` to be located as a top-level file in a package. If the packaged project is instead nested in a subdirectory (perhaps within a much larger monorepo), you can optionally specify the folder path as `subdirectory`. dbt will attempt a [sparse checkout](https://git-scm.com/docs/git-sparse-checkout) of just the files located within that subdirectory. Note that you must be using a recent version of `git` (`>=2.26.0`).

packages.yml

```yaml
packages:
  - git: "https://github.com/dbt-labs/dbt-labs-experimental-features" # git URL
    subdirectory: "materialized-views" # name of subdirectory containing `dbt_project.yml`
```

##### Local packages[​](#local-packages "Direct link to Local packages")

A "local" package is a dbt project accessible from your local file system. Local packages are best suited to cases where there is a common collection of models and macros that you want to share across multiple downstream dbt projects (but each downstream project still has its own unique models, macros, etc.). You can install local packages by specifying the project's path. This works best when you nest the packaged project within a subdirectory relative to your current project's directory.

packages.yml

```yaml
packages:
  - local: relative/path/to/subdirectory
```

Other patterns may work in some cases, but not always.
For example, if you install this project as a package elsewhere, or try running it on a different system, the following paths may not resolve the way you expect, and installation can break:

packages.yml

```yaml
packages: # not recommended - support for these patterns varies
  - local: /../../redshift # relative path to a parent directory
  - local: /opt/dbt/redshift # absolute path on the system
```

There are a few specific use cases where we recommend using a "local" package:

1. **Monorepo**: When you have multiple projects, each nested in a subdirectory, within a monorepo. "Local" packages allow you to combine projects for coordinated development and deployment.
2. **Testing changes**: To test changes in one project or package within the context of a downstream project or package that uses it. By temporarily switching the installation to a "local" package, you can make changes to the former and immediately test them in the latter for quicker iteration. This is similar to [editable installs](https://pip.pypa.io/en/stable/topics/local-project-installs/) in Python.
3. **Nested project**: When you have a nested project that defines fixtures and tests for a project of utility macros, like [the integration tests within the `dbt-utils` package](https://github.com/dbt-labs/dbt-utils/tree/main/integration_tests).

#### What packages are available?[​](#what-packages-are-available "Direct link to What packages are available?")

To see the library of published dbt packages, check out the [dbt package hub](https://hub.getdbt.com)!

#### Fusion package compatibility[​](#fusion-package-compatibility "Direct link to Fusion package compatibility")

To determine if a package is compatible with the dbt Fusion engine, visit the [dbt package hub](https://hub.getdbt.com/) and look for the Fusion-compatible badge, or review the package's [`require-dbt-version` configuration](https://docs.getdbt.com/reference/project-configs/require-dbt-version.md#pin-to-a-range).
* Packages with a `require-dbt-version` that equals or contains `2.0.0` are compatible with Fusion. For example, `require-dbt-version: ">=1.10.0,<3.0.0"`. Even if a package doesn't reflect compatibility in the package hub, it may still work with Fusion. Work with package maintainers to track updates, and [thoroughly test packages](https://docs.getdbt.com/guides/fusion-package-compat?step=5) that aren't clearly compatible before deploying.
* Package maintainers who would like to make their package compatible with Fusion can refer to the [Fusion package upgrade guide](https://docs.getdbt.com/guides/fusion-package-compat.md) for instructions.

Fivetran package considerations:

* The Fivetran `source` and `transformation` packages have been combined into a single package.
* If you manually installed source packages like `fivetran/github_source`, ensure `fivetran/github` is installed and deactivate the transformation models.

###### Package compatibility messages[​](#package-compatibility-messages "Direct link to Package compatibility messages")

Inconsistent Fusion warnings and `dbt-autofix` logs

If you use [`dbt-autofix`](https://github.com/dbt-labs/dbt-autofix) while upgrading to Fusion in the Studio IDE or dbt VS Code extension, you may see different messages about package compatibility between `dbt-autofix` and Fusion warnings. Here's why:

* Fusion warnings are emitted based on a package's `require-dbt-version` and whether `require-dbt-version` contains `2.0.0`.
* Some packages are already Fusion-compatible even though package maintainers haven't yet updated `require-dbt-version`.
* `dbt-autofix` knows about these compatible packages and will not try to upgrade a package that it knows is already compatible.

This means that even if you see a Fusion warning for a package that `dbt-autofix` identifies as compatible, you don't need to change the package.
The message discrepancy is temporary while we implement and roll out `dbt-autofix`'s enhanced compatibility detection to Fusion warnings. Here's an example of a Fusion warning in the Studio IDE that says a package isn't compatible with Fusion but `dbt-autofix` indicates it is compatible:

```text
dbt1065: Package 'dbt_utils' requires dbt version [>=1.30,<2.0.0], but current version is 2.0.0-preview.72. This package may not be compatible with your dbt version. dbt(1065) [Ln 1, Col 1]
```

#### Advanced package configuration[​](#advanced-package-configuration "Direct link to Advanced package configuration")

##### Updating a package[​](#updating-a-package "Direct link to Updating a package")

When you update a version or revision in your `packages.yml` file, it isn't automatically updated in your dbt project. You should run `dbt deps` to update the package. You may also need to run a [full refresh](https://docs.getdbt.com/reference/commands/run.md) of the models in this package.

##### Uninstalling a package[​](#uninstalling-a-package "Direct link to Uninstalling a package")

When you remove a package from your `packages.yml` file, it isn't automatically deleted from your dbt project, as it still exists in your `dbt_packages/` directory. If you want to completely uninstall a package, you should either:

* delete the package directory in `dbt_packages/`; or
* run `dbt clean` to delete *all* packages (and any compiled models), followed by `dbt deps`.

##### Pinning packages[​](#pinning-packages "Direct link to Pinning packages")

Running [`dbt deps`](https://docs.getdbt.com/reference/commands/deps.md) "pins" each package by creating or updating the `package-lock.yml` file in the *project\_root* where `packages.yml` is recorded.

* The `package-lock.yml` file contains a record of all packages installed.
* If subsequent `dbt deps` runs contain no changes to `dependencies.yml` or `packages.yml`, dbt-core installs from `package-lock.yml`.
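As a rough illustration of what `dbt deps` records, a generated lock file pins each dependency to an exact resolved version. This is a hedged sketch, not generated output; the package, version, and hash values are placeholders:

```yaml
# package-lock.yml -- illustrative sketch; values are placeholders
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
sha1_hash: a158c48c59c2bb7d729d2a4e215aabe5bb4f3353
```

Committing this file to version control keeps installs reproducible across machines and CI runs.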
For example, if you use a branch name, the `package-lock.yml` file pins to the head commit. If you use a version range, it pins to the latest release. In either case, subsequent commits or versions will **not** be installed. To get new commits or versions, run `dbt deps --upgrade` or add `package-lock.yml` to your `.gitignore` file.

dbt will warn you if you install a package using the `git` syntax without specifying a revision (see below).

##### Configuring packages[​](#configuring-packages "Direct link to Configuring packages")

You can configure the models and seeds in a package from the `dbt_project.yml` file, like so:

dbt\_project.yml

```yml
vars:
  snowplow:
    'snowplow:timezone': 'America/New_York'
    'snowplow:page_ping_frequency': 10
    'snowplow:events': "{{ ref('sp_base_events') }}"
    'snowplow:context:web_page': "{{ ref('sp_base_web_page_context') }}"
    'snowplow:context:performance_timing': false
    'snowplow:context:useragent': false
    'snowplow:pass_through_columns': []

models:
  snowplow:
    +schema: snowplow

seeds:
  snowplow:
    +schema: snowplow_seeds
```

For example, when using a dataset-specific package, you may need to configure variables for the names of the tables that contain your raw data. Configurations made in your project YAML file (`dbt_project.yml`) will override any configurations in a package (either in the project YAML file of the package, or in config blocks).

##### Specifying unpinned Git packages[​](#specifying-unpinned-git-packages "Direct link to Specifying unpinned Git packages")

If your project specifies an "unpinned" Git package, you may see a warning like:

```text
The git package "https://github.com/dbt-labs/dbt-utils.git" is not pinned. This can introduce breaking changes into your project without warning!
```

This warning can be silenced by setting `warn-unpinned: false` in the package specification. **Note:** This is not recommended.
packages.yml

```yaml
packages:
  - git: https://github.com/dbt-labs/dbt-utils.git
    warn-unpinned: false
```

#### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting")

If you encounter errors while working with dbt packages, see the following FAQs:

Why am I receiving a Runtime Error in my packages?

If you're receiving the runtime error below in your `packages.yml` file, it may be due to an old version of your dbt_utils package that isn't compatible with your current dbt version.

```shell
Running with dbt=xxx
Runtime Error
  Failed to read package: Runtime Error
    Invalid config version: 1, expected 2
  Error encountered in dbt_utils/dbt_project.yml
```

Try updating the old version of the dbt_utils package in your `packages.yml` to the latest version found in the [dbt hub](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/):

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: xxx
```

If you've tried the workaround above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!

\[Error] Could not find my_project package

If a package name is included in the `search_order` of a project-level `dispatch` config, dbt expects that package to contain macros which are viable candidates for dispatching. If an included package does not contain *any* macros, dbt will raise an error like:

```shell
Compilation Error
  In dispatch: Could not find package 'my_project'
```

This does not mean the package or root project is missing; it means that any macros from it are missing, and so it is missing from the search spaces available to `dispatch`.

If you've tried the step above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!
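As a recap of the pinning guidance above, a `packages.yml` that pins a hub package to a version range and a git package to an explicit `revision` (which avoids the unpinned warning without silencing it) might look like this sketch. The version range and tag shown are illustrative; use a real release of the package:

```yaml
packages:
  # Hub package pinned to a compatible version range
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]

  # Git package pinned to a tag (or commit SHA), so the
  # "not pinned" warning doesn't apply
  - git: "https://github.com/dbt-labs/dbt-utils.git"
    revision: 1.1.1
```

After editing this file, run `dbt deps` to install the packages and record them in `package-lock.yml`.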
---

### Parallel microbatch execution

Use parallel batch execution to process your microbatch models faster.

The microbatch strategy offers the benefit of updating a model in smaller, more manageable batches. Depending on your use case, configuring your microbatch models to run in parallel offers faster processing than running batches sequentially.

Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially), for faster processing of your microbatch models.

dbt automatically detects whether a batch can be run in parallel in most cases, which means you don't need to configure this setting. However, the [`concurrent_batches` config](https://docs.getdbt.com/reference/resource-properties/concurrent_batches.md) is available as an override (not a gate), allowing you to specify whether batches should or shouldn't run in parallel in specific cases.

For example, if you have a microbatch model with 12 batches, you can configure those batches to run in parallel. Specifically, they'll run in parallel limited by the number of [available threads](https://docs.getdbt.com/docs/running-a-dbt-project/using-threads.md).

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

To use parallel execution, you must meet the following prerequisites:

* Use Snowflake as a supported adapter.
  * We'll continue to test and add concurrency support for more adapters in the future.
* A batch can only be run in parallel if:
  * The batch is *not* the first batch.
  * The batch is *not* the last batch.
#### How parallel batch execution works[​](#how-parallel-batch-execution-works "Direct link to How parallel batch execution works")

After checking for the conditions in the [prerequisites](#prerequisites), and if the `concurrent_batches` value isn't set, dbt will intelligently auto-detect whether the model invokes the [`{{ this }}`](https://docs.getdbt.com/reference/dbt-jinja-functions/this.md) Jinja function. If the model references `{{ this }}`, the batches will run sequentially, since `{{ this }}` represents the current model's database relation and referencing the same relation from concurrent batches causes conflicts.

Otherwise, if `{{ this }}` isn't detected (and the other conditions are met), the batches will run in parallel, which can be overridden when you [set a value for `concurrent_batches`](https://docs.getdbt.com/reference/resource-properties/concurrent_batches.md).

#### Parallel or sequential execution[​](#parallel-or-sequential-execution "Direct link to Parallel or sequential execution")

Choosing between parallel batch execution and sequential processing depends on the specific requirements of your use case.

* Parallel batch execution is faster but requires logic independent of batch execution order. For example, if you're developing a data pipeline for a system that processes user transactions in batches, each batch is executed in parallel for better performance. However, the logic used to process each transaction shouldn't depend on the order in which batches are executed or completed.
* Sequential processing is slower but essential for calculations like [cumulative metrics](https://docs.getdbt.com/docs/build/cumulative.md) in microbatch models. It processes data in the correct order, allowing each step to build on the previous one.

#### Configure `concurrent_batches`[​](#configure-concurrent_batches "Direct link to configure-concurrent_batches")

By default, dbt auto-detects whether batches can run in parallel for microbatch models, and this works correctly in most cases.
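As an illustration of the auto-detection rule described above, here's a minimal sketch of a microbatch model that dbt would run sequentially because it references `{{ this }}` (the model, source, and column names are hypothetical):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_at',
    begin='2024-01-01',
    batch_size='day'
  )
}}

-- Referencing {{ this }} means each batch reads from the same relation
-- the model writes to, so dbt detects it and runs the batches sequentially
-- to avoid read/write conflicts.
select
    e.event_id,
    e.event_at
from {{ ref('raw_events') }} as e
where e.event_id not in (select event_id from {{ this }})
```

Removing the `{{ this }}` reference would make the batches eligible to run in parallel, subject to the prerequisites above.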
However, you can override dbt's detection by setting the [`concurrent_batches` config](https://docs.getdbt.com/reference/resource-properties/concurrent_batches.md) in your `dbt_project.yml` or model `.sql` file to specify parallel or sequential execution, given you meet all the [conditions](#prerequisites):

dbt_project.yml

```yaml
models:
  +concurrent_batches: true # value set to true to run batches in parallel
```

models/my_model.sql

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',
    begin='2020-01-01',
    batch_size='day',
    concurrent_batches=true,
    ...
  )
}}

select ...
```

---

### Power BI

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

The Power BI integration enables you to query the Semantic Layer directly, allowing you to build dashboards with trusted, live data in Power BI. It provides a live connection to the Semantic Layer through Power BI Desktop or Power BI Service.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You have [configured the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md).
* You are on a supported [dbt release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) or on dbt v1.6 or higher.
* You installed [Power BI Desktop or Power BI On-premises Data Gateway](https://learn.microsoft.com/en-us/power-bi/connect-data/service-gateway-custom-connectors).
  * Power BI Service doesn't natively support custom connectors. To use the connector in Power BI Service, you must install and configure it on an On-premises Data Gateway.
* You need your [dbt host](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#3-view-connection-detail), [Environment ID](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#set-up-dbt-semantic-layer), and a [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) or a [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) to log in. This account should be set up with the Semantic Layer.
* You must have a dbt Starter or Enterprise-tier [account](https://www.getdbt.com/pricing). The integration is suitable for both Multi-tenant and Single-tenant deployments.

📹 Learn about the dbt Semantic Layer with on-demand video courses!

Explore our [dbt Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer) to learn how to define and query metrics in your dbt project. Additionally, dive into mini-courses for querying the dbt Semantic Layer in your favorite tools: [Tableau](https://courses.getdbt.com/courses/tableau-querying-the-semantic-layer), [Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel), [Hex](https://courses.getdbt.com/courses/hex-querying-the-semantic-layer), and [Mode](https://courses.getdbt.com/courses/mode-querying-the-semantic-layer).

#### Install the connector[​](#install-the-connector "Direct link to Install the connector")

Power BI versions

The Power BI connector may be incompatible with older versions of Power BI Desktop.
For the best results, we recommend installing the most recent version directly from the [Microsoft Store](https://apps.microsoft.com/detail/9ntxr16hnw1t?hl=en-US\&gl=US) or [Download Center](https://www.microsoft.com/en-us/download/details.aspx?id=58494).

The Semantic Layer Power BI connector consists of a custom `.pqx` Power BI connector and an ODBC driver. Install both using our Windows installer by following these steps:

1. Download the [`.msi` installer](https://github.com/dbt-labs/semantic-layer-powerbi-connector/releases/download/v1.0.0/dbt.Semantic.Layer.for.Power.BI.zip).
2. Run the installer and follow the on-screen instructions to install the ODBC driver and connector onto your Power BI Desktop.

##### Verify installation[​](#verify-installation "Direct link to Verify installation")

Note that users on older versions of Power BI may have to [configure the connector](#configure-the-connector) before they can verify the installation.

To verify the installation:

1. Open **ODBC Data Sources (64-bit)** on your computer.
2. Navigate to **System DSN** and verify that the `dbt Labs ODBC DSN` is registered.
3. Navigate to **Drivers** and verify that the `dbt Labs ODBC Driver` is installed.
4. Open Power BI Desktop, navigate to **Settings**, then **Data Source Settings**. Verify that the `dbt Semantic Layer` connector is properly loaded.

To allow published reports in Power BI Service to use the connector, an IT admin in your organization needs to install and configure the connector on an On-premises Data Gateway.

#### For IT admins[​](#for-it-admins "Direct link to For IT admins")

This section is for IT admins trying to install the ODBC driver and connector into an On-premises Data Gateway. To allow published reports to use the connector in Power BI Service, an IT admin must install and configure the connector:

1. Install the ODBC driver and connector into an On-premises Data Gateway.
Run the same `.msi` installer used for Power BI Desktop and install it on the machine where your gateway is hosted.

2. Copy the connector file to the gateway's custom connectors directory:
   1. Locate the `.pqx` file: `C:\Users\\Documents\Power BI Desktop\Custom Connectors\dbtSemanticLayer.pqx`.
   2. Copy it to the Power BI On-premises Data Gateway custom connectors directory: `C:\Windows\ServiceProfiles\PBIEgwService\Documents\Power BI Desktop\Custom Connectors`.
3. Verify the installation by following the steps from the [verify installation](#verify-installation) section.
4. Enable the connector in the Power BI Enterprise Gateway:
   1. Open the `EnterpriseGatewayConfigurator.exe`.
   2. Navigate to **Connectors**.
   3. Verify that the `dbt Semantic Layer` connector is installed and active.

For more information on how to set up custom connectors in the Power BI On-premises Data Gateway, refer to Power BI's [official documentation](https://learn.microsoft.com/en-us/power-bi/connect-data/service-gateway-custom-connectors).

#### Configure the connector[​](#configure-the-connector "Direct link to Configure the connector")

After installing the connector, you'll have to configure your project credentials to connect to the Semantic Layer from a report. To configure project credentials in Power BI Desktop:

1. Create a blank report.
2. On the top-left, click on **Get data**.
3. Search for Semantic Layer, then click **Connect**.
4. Fill in your connection details. You can find your Host and Environment ID under the Semantic Layer configuration for your dbt project.

   tip

   Make sure you select **DirectQuery** under **Data Connectivity mode** since the Semantic Layer connector does not support **Import** mode. See [Considerations](#considerations) for more details.

5. Click **OK** to proceed.

   [![Select DirectQuery mode](/img/docs/cloud-integrations/sl-pbi/pbi-directquery.jpg?v=2 "Select DirectQuery mode")](#)Select DirectQuery mode

6. On the next screen, paste your service or personal token and then click **Connect**.
7.
You should see a side pane with a few "virtual" tables. `ALL` represents all of your defined semantic layer objects. The other tables represent each of your saved queries. Select the one you want to load into your dashboard, then click **Load**.

[![Select tables in the side panel](/img/docs/cloud-integrations/sl-pbi/pbi-sidepanel.jpg?v=2 "Select tables in the side panel")](#)Select tables in the side panel

Now that you've configured the connector, the next section shows how to configure published reports to use it.

#### Configure published reports[​](#configure-published-reports "Direct link to Configure published reports")

The first time you hit **Publish** on a given report, configure Power BI Service to use your organization's On-premises Data Gateway to access data from the Semantic Layer:

1. On the top right, click on **Settings > Power BI settings**.

   [![Navigate to Settings > Power BI Settings](/img/docs/cloud-integrations/sl-pbi/pbi-settings.jpg?v=2 "Navigate to Settings > Power BI Settings")](#)Navigate to Settings > Power BI Settings

2. Navigate to the **Semantic models** tab and select your report in the sidebar on the left.
3. Under **Gateway and cloud connections**, select the **On-premises Data Gateway** where your IT admin has installed the Semantic Layer connector.
   * If the Status is **Not configured correctly**, you'll have to configure it.

   [![Configure the gateway connection](/img/docs/cloud-integrations/sl-pbi/pbi-gateway-cloud-connections.jpg?v=2 "Configure the gateway connection")](#)Configure the gateway connection

4. Click on the arrow under **Actions** and then click on **Manually add to gateway**.

   [![Manually add to gateway](/img/docs/cloud-integrations/sl-pbi/pbi-manual-gateway.jpg?v=2 "Manually add to gateway")](#)Manually add to gateway

5. Provide a name for your connection and enter your connection details.
   * Set the connection as **Encrypted** (Required).
Failing to do so will result in the Semantic Layer servers rejecting the connection.

[![Set the connection as Encrypted](/img/docs/cloud-integrations/sl-pbi/pbi-encrypted.jpg?v=2 "Set the connection as Encrypted")](#)Set the connection as Encrypted

6. Click **Create**. This will run a connection test (unless you choose to skip it). If the connection succeeds, the connection will be saved. You can now go back to your published report on Power BI Service to verify that data loads as expected.

#### Use the connector[​](#use-the-connector "Direct link to Use the connector")

This section describes how to use the Semantic Layer connector in Power BI. The Semantic Layer connector creates:

* A virtual table for each saved query.
* A `METRICS.ALL` table containing all metrics; dimensions and entities appear as regular dimension columns.

These tables do not actually map to an underlying table in your data warehouse. Instead, Power BI sends queries to these tables and (before actually executing on the warehouse) the Semantic Layer servers:

* Parse the SQL.
* Extract all the queried columns, group bys, and filters.
* Generate SQL to query your existing tables.
* Return data back to Power BI, which doesn't know any of this happened.

[![Power BI integration diagram](/img/docs/cloud-integrations/sl-pbi/sl-pbi.jpg?v=2 "Power BI integration diagram")](#)Power BI integration diagram

This allows for very flexible analytics workflows, like dragging and dropping metrics and slicing by dimensions and entities: the Semantic Layer will generate the appropriate SQL to actually query your data source for you.

###### Modifying time granularity[​](#modifying-time-granularity "Direct link to Modifying time granularity")

When you select time dimensions in the **Group By** menu, you'll see a list of available time granularities. The lowest granularity is selected by default. Metric time is the default time dimension for grouping your metrics.
info

Note: [Custom time granularities](https://docs.getdbt.com/docs/build/metricflow-time-spine.md#add-custom-granularities) (like fiscal year) aren't currently supported or accessible in this integration. Only [standard granularities](https://docs.getdbt.com/docs/build/dimensions.md?dimension=time_gran#time) (like day, week, month, and so on) are available. If you'd like to access custom granularities, consider using the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md).

#### Considerations[​](#considerations "Direct link to Considerations")

Not every "column" of `METRICS.ALL` is compatible with every other column

* `METRICS.ALL` combines all your existing metrics, entities, and dimensions. Queries must be valid Semantic Layer queries, otherwise they'll fail with MetricFlow query compilation errors.
* For saved query tables, all "columns" will be compatible with every other "column" since, by definition, saved queries are valid queries that can be sliced by any of the dimensions present in the query.

The dbt Semantic Layer connector does not support Import mode natively

* Use `DirectQuery` mode to ensure compatibility.
* `Import` mode tries to select an entire table to import into Power BI, which means it'll likely generate SQL that translates to an invalid Semantic Layer query that tries to query all metrics, dimensions, and entities at the same time.
* To import data into a Power BI report, select a valid combination of columns to import (something that will generate a valid Semantic Layer query).
  * You can use `Table.SelectColumns` for this: `= Table.SelectColumns(Source{[Item="ALL",Schema="METRICS",Catalog=null]}[Data], {"Total Profit", "Metric Time (Day)"})`
* Be aware that all calculations will happen inside of Power BI and won't pass through Semantic Layer servers. This could lead to incorrect or diverging results.
  * For example, the Semantic Layer is usually responsible for rolling up cumulative metrics to coarser time granularities. Doing a sum over all the weeks in a year to get a yearly granularity out of a weekly Semantic Layer query will most likely generate incorrect results. Instead, you should query the Semantic Layer directly to get accurate results.

The dbt Semantic Layer connector ignores aggregations defined in Power BI

* If you change the aggregation type of a metric from `SUM()` to `COUNT()` or anything else, nothing will change. This is because aggregation functions are defined in the Semantic Layer, and we ignore them when translating Power BI-generated SQL into Semantic Layer queries.
* Aggregations like `Count (Distinct)`, `Standard Deviation`, `Variance`, and `Median` in Power BI may return an error and not work at all.

What actions aren't supported?

The following are not supported:

* Custom modeling
* Joining tables
* Creating custom columns within a table
* Custom Data Analysis Expressions (DAX) or Power Query (PQ)

---

### Preview new and experimental features in the dbt platform

dbt Labs often tests experimental features before deciding whether to continue them through the [Product lifecycle](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md#dbt-cloud). You can access experimental features to preview beta features that haven't yet been generally released in dbt. You can toggle all experimental features on or off in your Profile settings.

Experimental features:

* May not be feature-complete or fully stable as we're actively developing them.
* Could be discontinued at any time.
* May require feedback from you to understand their limitations or impact. Each experimental feature collects feedback directly in dbt, which may inform dbt Labs' decisions about whether to implement it.
* May have limited technical support and be excluded from our Support SLAs.
* May not have public documentation available.

To enable or disable experimental features:

1. From dbt, click on your account name in the left side menu and select **Account settings**.
2. Go to **Personal profile** under the **Your profile** header.
3. Find **Experimental features** at the bottom of the **Your profile** page.
4. Click **Beta** to toggle the features on or off as shown in the following image.

![Experimental features](/assets/images/experimental-feats-a099dce8fc8f8ac6081f85df4b0aa379.png)

#### Beta terms and conditions[​](#beta-terms-and-conditions "Direct link to Beta terms and conditions")

By using or enabling features that are not yet in general release ("Beta Features"), you agree to the [Beta Features Terms and Conditions](https://docs.getdbt.com/assets/files/beta-tc-740ff696113c89c38a96bb70b968775e.pdf).

---

### Product lifecycles

dbt Labs is directly involved with the maintenance of three products:

* dbt Core: The [open-source](https://github.com/dbt-labs/dbt-core) software that's freely available.
* dbt platform: The cloud-based [SaaS solution](https://www.getdbt.com/signup), originally built on top of dbt Core. We're now introducing dbt's new engine, the dbt Fusion engine. For more information, refer to [the dbt Fusion engine](https://docs.getdbt.com/docs/fusion.md).
* dbt Fusion engine: The next-generation dbt engine, which is substantially faster than dbt Core and has built-in SQL comprehension technology to power the next generation of analytics engineering workflows. The dbt Fusion engine is designed to deliver data teams a lightning-fast development experience, intelligent cost savings, and improved governance.

All dbt features fall into a lifecycle category determined by their availability in the following products:

##### The dbt platform[​](#the-dbt-platform "Direct link to The dbt platform")

dbt features all fall into one of the following categories:

* **Beta:** Beta features are in development and might not be entirely stable; they should be used at the customer's risk, as breaking changes could occur. Beta features might not be fully documented, technical support is limited, and service level objectives (SLOs) might not be provided. Download the [Beta Features Terms and Conditions](https://docs.getdbt.com/assets/files/beta-tc-740ff696113c89c38a96bb70b968775e.pdf) for more details.

  If a beta feature is marked `Private`, it must be enabled by dbt Labs, and access is not self-service. If documentation is available, it will include instructions for requesting access.

* **Preview:** Preview features are stable and considered functionally ready for production deployments. Some planned additions and modifications to feature behaviors could occur before they become generally available. New functionality that is not backward compatible could also be introduced. Preview features include documentation, technical support, and service level objectives (SLOs). Features in preview are provided at no extra cost, although they might become paid features when they become generally available.

  If a preview feature is marked `Private`, it must be enabled by dbt Labs, and access is not self-service. Refer to the feature documentation for instructions on requesting access.
* **Generally available (GA):** Generally available features provide stable features introduced to all qualified dbt accounts. Service level agreements (SLAs) apply to GA features, including documentation and technical support. Certain GA feature availability is determined by the dbt version of the environment. To always receive the latest GA features, ensure your dbt [environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md) are on a supported [Release Track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).
* **Deprecated:** Features in this state are no longer being developed or enhanced by dbt Labs. They will continue functioning as-is, and their documentation will persist until their removal date. However, they are no longer subject to technical support.
* **Removed:** Removed features are no longer available on the platform in any capacity.

##### dbt Core[​](#dbt-core "Direct link to dbt Core")

We release dbt Core in the following lifecycle states. Core releases follow semantic versioning, which you can read more about in [About Core versions](https://docs.getdbt.com/docs/dbt-versions/core.md).

* **Unreleased:** We will include this functionality in the next minor version prerelease. However, we make no commitments about its behavior or implementation. As maintainers, we reserve the right to change any part of it, or remove it entirely (with an accompanying explanation).
* **Prerelease:**
  * **Beta:** The purpose of betas is to provide a first glimpse of the net-new features that will arrive in this minor version when it has its final release. The code included in a beta should work without regressions to existing functionality or negative interactions with other released features. Net-new features included in a beta *may be* incomplete or have known edge cases/limitations. Changes included in a beta are not "locked," and the maintainers reserve the right to change or remove them (with an explanation).
  * **Release Candidate:** The purpose of a release candidate is to offer a 2-week window for more extensive production-level testing, with the goal of catching regressions before they go live in a final release. Users can expect that features in a Release Candidate will work the same on release day. However, if we find a significant bug, we still reserve the right to change or remove the underlying behavior, with a clear explanation.
* **Released:** Ready for use in production.
* **Experimental:** Features we release for general availability, which we believe are usable in their current form, but for which we may document additional caveats.
* **Undocumented:** These are subsets of dbt Core functionality that are internal, not contracted, or intentionally left undocumented. Do not consider this functionality part of that release's product surface area.
* **Deprecated:** Features in this state are not actively worked on or enhanced by dbt Labs and will continue to function as-is until their removal date.
* **Removed:** Removed features no longer have any level of product functionality or platform support.

##### dbt Fusion engine[​](#dbt-fusion-engine "Direct link to dbt Fusion engine")

The dbt Fusion engine and [VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md) are currently in preview for local installations and beta in dbt.

* **Beta:** Beta features are still in development and are only available to select customers. Beta features are incomplete and might not be entirely stable; they should be used at the customer's risk, as breaking changes could occur. Beta features might not be fully documented, technical support is limited, and service level objectives (SLOs) might not be provided. Download the [Beta Features Terms and Conditions](https://docs.getdbt.com/assets/files/beta-tc-740ff696113c89c38a96bb70b968775e.pdf) for more details.
* **Preview:** Preview features are stable and considered functionally ready for production deployments that use supported features and do not depend on deprecated functionality. For more about the status of features and functionality, the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements) contain the most recent updates.
* **Path to Generally available (GA):** Learn what's required for the dbt Fusion engine to reach GA in our [Path to GA](https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga) blog post.

---

### Project dependencies

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Available on dbt [Enterprise or Enterprise+](https://www.getdbt.com/pricing) plans.

For a long time, dbt has supported code reuse and extension by installing other projects as [packages](https://docs.getdbt.com/docs/build/packages.md). When you install another project as a package, you are pulling in its full source code and adding it to your own. This enables you to call macros and run models defined in that other project.

While this is a great way to reuse code, share utility macros, and establish a starting point for common transformations, it's not a great way to enable collaboration across teams and at scale, especially in larger organizations.

dbt Labs supports an expanded notion of `dependencies` across multiple dbt projects:

* **Packages** — Familiar and pre-existing type of dependency.
You take this dependency by installing the package's full source code (like a software library).

* **Projects** — The dbt method to take a dependency on another project. Using a metadata service that runs behind the scenes, dbt resolves references on-the-fly to public models defined in other projects. You don't need to parse or run those upstream models yourself. Instead, you treat your dependency on those models as an API that returns a dataset. The maintainer of the public model is responsible for guaranteeing its quality and stability.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* Available in [dbt Enterprise or Enterprise+](https://www.getdbt.com/pricing). To use it, designate a [public model](https://docs.getdbt.com/docs/mesh/govern/model-access.md) and add a [cross-project ref](#how-to-write-cross-project-ref).
* For the upstream ("producer") project setup:
  * Configure models in the upstream project with [`access: public`](https://docs.getdbt.com/reference/resource-configs/access.md) and have at least one successful job run after defining `access`.
  * Define a [Production deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#set-as-production-environment) in the upstream project and make sure at least *one deployment job* has run successfully there. This job should generate a [`manifest.json` file](https://docs.getdbt.com/reference/artifacts/manifest-json.md), which includes the metadata needed for downstream projects.
  * If the upstream project has a Staging environment, run at least one successful deployment job there to ensure downstream cross-project references resolve correctly.
* Each project `name` must be unique in your dbt account.
For example, if you have a dbt project (codebase) for the `jaffle_marketing` team, avoid creating separate projects for `Jaffle Marketing - Dev` and `Jaffle Marketing - Prod`; use [environment-level isolation](https://docs.getdbt.com/docs/dbt-cloud-environments.md#types-of-environments) instead.
* dbt supports [Connections](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md#connection-management), available to all dbt users. Connections allow different data platform connections per environment, eliminating the need to duplicate projects. Projects can use multiple connections of the same warehouse type, and connections are reusable across projects and environments.
* The `dbt_project.yml` file is case-sensitive, which means the project name must exactly match the name in your `dependencies.yml`. For example, `jaffle_marketing`, not `JAFFLE_MARKETING`.

#### Use cases[​](#use-cases "Direct link to Use cases")

The following setup will work for every dbt project:

* Add [any package dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#when-to-use-package-dependencies) to `packages.yml`
* Add [any project dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#when-to-use-project-dependencies) to `dependencies.yml`

However, you may be able to consolidate both into a single `dependencies.yml` file. Read the following section to learn more.

###### About packages.yml and dependencies.yml[​](#about-packagesyml-and-dependenciesyml "Direct link to About packages.yml and dependencies.yml")

The `dependencies.yml` file can contain both types of dependencies: "package" and "project" dependencies.

* [Package dependencies](https://docs.getdbt.com/docs/build/packages.md#how-do-i-add-a-package-to-my-project) let you add source code from someone else's dbt project into your own, like a library.
* Project dependencies provide a different way to build on top of someone else's work in dbt.
* Private packages are not supported in `dependencies.yml` because the file intentionally doesn't support Jinja rendering or conditional configuration. This keeps configuration static and predictable and ensures compatibility with other services, like dbt.

If your dbt project doesn't require Jinja within the package specifications, you can simply rename your existing `packages.yml` to `dependencies.yml`. However, if your project's package specifications use Jinja, particularly for scenarios like adding an environment variable or a [Git token method](https://docs.getdbt.com/docs/build/packages.md#git-token-method) in a private Git package specification, you should continue using the `packages.yml` file name.

Use the following toggles to understand the differences and determine when to use `dependencies.yml` or `packages.yml` (or both). Refer to the [FAQs](#faqs) for more info.

When to use Project dependencies

Project dependencies are designed for the [dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) and [cross-project reference](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref) workflow:

* Use `dependencies.yml` when you need to set up cross-project references between different dbt projects, especially in a dbt Mesh setup.
* Use `dependencies.yml` when you want to include both projects and non-private dbt packages in your project's dependencies.
* Use `dependencies.yml` for organization and maintainability if you're using both [cross-project refs](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref) and [dbt Hub packages](https://hub.getdbt.com/). This reduces the need for multiple YAML files to manage dependencies.
When to use Package dependencies

Package dependencies allow you to add source code from someone else's dbt project into your own, like a library:

* If you only use packages like those from the [dbt Hub](https://hub.getdbt.com/), remain with `packages.yml`.
* Use `packages.yml` when you want to download dbt packages, such as other dbt projects, into your root or parent dbt project. Note that this doesn't contribute to the dbt Mesh workflow.
* Use `packages.yml` to include packages in your project's dependencies. This includes both public packages, such as those from the [dbt Hub](https://hub.getdbt.com/), and private packages. dbt now supports [native private packages](https://docs.getdbt.com/docs/build/packages.md#native-private-packages).
* [`packages.yml` supports Jinja rendering](https://docs.getdbt.com/docs/build/dbt-tips.md#yaml-tips) for historical reasons, allowing dynamic configurations. This can be useful if you need to insert values, like a [Git token method](https://docs.getdbt.com/docs/build/packages.md#git-token-method) from an environment variable, into your package specifications.

Previously, to use private Git repositories in dbt, you needed a workaround that involved embedding a Git token with Jinja. This wasn't ideal because it required extra steps like creating a user and sharing a Git token. We’ve introduced support for [native private packages](https://docs.getdbt.com/docs/build/packages.md#native-private-packages-) to address this.

#### Define project dependencies[​](#define-project-dependencies "Direct link to Define project dependencies")

If your dbt project relies on models from another project, you can define that relationship using project dependencies. The following steps walk you through specifying project dependencies in dbt:

1. Create a file called `dependencies.yml` at the root of your dbt project.
2.
In the `dependencies.yml` file, list the upstream dbt projects your project depends on, as they appear in their `dbt_project.yml` files.
3. (Optional) Define the specific models you expect from that upstream project to make the dependency explicit.
4. Use [`ref()`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) with the project name to reference upstream models in your SQL.
5. Commit the changes and ensure the dependency is configured in dbt.
6. dbt will resolve the dependency, ensure upstream projects are built first, and surface cross-project lineage in the lineage and DAG (Directed Acyclic Graph) views.

##### Example[​](#example "Direct link to Example")

As an example, let's say you work on the Marketing team at the Jaffle Shop. The name of your team's project is `jaffle_marketing`:

dbt\_project.yml

```yml
name: jaffle_marketing
```

As part of your modeling of marketing data, you need to take a dependency on two other projects:

* `dbt_utils` as a package: A collection of utility macros you can use while writing the SQL for your own models. This package is open source and maintained by dbt Labs.
* `jaffle_finance` as a project: Data models about the Jaffle Shop's revenue. This project is private and maintained by your colleagues on the Finance team. You want to select from some of this project's final models as a starting point for your own work.

Refer to [Use cases](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#use-cases) for information on package and project dependencies.

dependencies.yml

```yml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

projects:
  - name: jaffle_finance  # case sensitive and matches the 'name' in the 'dbt_project.yml'
```

What's happening here?

The `dbt_utils` package — When you run `dbt deps`, dbt will pull down this package's full contents (100+ macros) as source code and add them to your environment.
You can then call any macro from the package, just as you can call macros defined in your own project.

The `jaffle_finance` project — This is a new scenario. Unlike installing a package, the models in the `jaffle_finance` project will *not* be pulled down as source code and parsed into your project. Instead, dbt provides a metadata service that resolves references to [**public models**](https://docs.getdbt.com/docs/mesh/govern/model-access.md) defined in the `jaffle_finance` project.

##### Advantages[​](#advantages "Direct link to Advantages")

When you're building on top of another team's work, resolving the references in this way has several advantages:

* You're using an intentional interface designated by the model's maintainer with `access: public`.
* You're keeping the scope of your project narrow, and avoiding unnecessary resources and complexity. This is faster for you and faster for dbt.
* You don't need to mirror any conditional configuration of the upstream project, such as `vars`, environment variables, or `target.name`. You can reference the public models directly, wherever the Finance team is building them in production. Even if the Finance team makes changes like renaming the model, changing the name of its schema, or [bumping its version](https://docs.getdbt.com/docs/mesh/govern/model-versions.md), your `ref` would still resolve successfully.
* You eliminate the risk of accidentally building those models with `dbt run` or `dbt build`. While you can select those models, you can't actually build them. This prevents unexpected warehouse costs and permissions issues, and ensures proper ownership and cost allocation for each team's models.
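The `access: public` interface referenced above is declared by the producer in model YAML. A minimal sketch, assuming the Finance team exposes its `monthly_revenue` model (the file path is illustrative):

```yml
# jaffle_finance: models/marts/_finance__models.yml (illustrative path)
models:
  - name: monthly_revenue
    access: public  # designates this model as a stable cross-project interface
```

Only models configured this way can be resolved by downstream projects' cross-project refs.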
##### How to write cross-project ref[​](#how-to-write-cross-project-ref "Direct link to How to write cross-project ref")

**Writing `ref`:** Models referenced from a `project`-type dependency must use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models), including the project name:

models/marts/roi\_by\_channel.sql

```sql
with monthly_revenue as (

    select * from {{ ref('jaffle_finance', 'monthly_revenue') }}

),

...
```

###### Cycle detection[​](#cycle-detection "Direct link to Cycle detection")

You can enable bidirectional dependencies across projects, so these relationships can go in either direction. This means the `jaffle_finance` project can add a new model that depends on any public models produced by the `jaffle_marketing` project, so long as the new dependency doesn't introduce any node-level cycles. dbt checks for cycles across projects and raises errors if any are detected.

When setting up projects that depend on each other, it's important to do so in a stepwise fashion. Each project must run and produce public models before the original producer project can take a dependency on the original consumer project. For example, the order of operations would be as follows for a simple two-project setup:

1. The `project_a` project runs in a deployment environment and produces public models.
2. The `project_b` project adds `project_a` as a dependency.
3. The `project_b` project runs in a deployment environment and produces public models.
4. The `project_a` project adds `project_b` as a dependency.

For more guidance on how to use Mesh, refer to the dedicated [Mesh guide](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) and our freely available [Mesh learning course](https://learn.getdbt.com/courses/dbt-mesh).
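In the two-project sequence above, steps 2 and 4 amount to adding a `dependencies.yml` entry. A minimal sketch of step 2, under the project names used here:

```yml
# project_b/dependencies.yml
projects:
  - name: project_a  # must match the 'name' in project_a's dbt_project.yml
```

Step 4 is the mirror image: the same entry for `project_b` in `project_a`'s `dependencies.yml`, added only after `project_b` has run and produced public models.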
##### Safeguarding production data with staging environments[​](#safeguarding-production-data-with-staging-environments "Direct link to Safeguarding production data with staging environments")

When working in a Development environment, cross-project `ref`s normally resolve to the Production environment of the upstream project. To protect production data, set up a [Staging deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#staging-environment) within your projects. With a staging environment integrated into the project, Mesh automatically fetches public model information from the producer’s staging environment if the consumer is also in staging, and from the producer’s production environment if the consumer is in production. This ensures consistency between environments and adds a layer of security by preventing access to production data during development workflows.

Read [Why use a staging environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#why-use-a-staging-environment) for more information about the benefits.

###### Staging with downstream dependencies[​](#staging-with-downstream-dependencies "Direct link to Staging with downstream dependencies")

As soon as a Staging environment exists in a project, dbt begins using it to resolve cross-project references from downstream projects, with no "fail-over" to Production. This means dbt will consistently use metadata from the Staging environment to resolve references in downstream projects, even if there haven't been any successful runs in the configured Staging environment. To avoid causing downtime for downstream developers, define and trigger a job before marking the environment as Staging:

1. Create a new environment, but do NOT mark it as **Staging**.
2. Define a job in that environment.
3. Trigger the job to run, and ensure it completes successfully.
4. Update the environment to mark it as **Staging**.
##### Comparison[​](#comparison "Direct link to Comparison")

If you were to instead install the `jaffle_finance` project as a `package` dependency, you would be pulling down its full source code and adding it to your runtime environment. This means:

* dbt needs to parse and resolve more inputs (which is slower)
* dbt expects you to configure these models as if they were your own (with `vars`, env vars, etc.)
* dbt will run these models as your own unless you explicitly `--exclude` them
* You could be using the project's models in a way that their maintainer (the Finance team) hasn't intended

There are a few cases where installing another internal project as a package can be a useful pattern:

* Unified deployments — In a production environment, if the central data platform team of the Jaffle Shop wanted to schedule the deployment of models across both `jaffle_finance` and `jaffle_marketing`, they could use dbt's [selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md) to create a new "passthrough" project that installs both projects as packages.
* Coordinated changes — In development, if you wanted to test the effects of a change to a public model in an upstream project (`jaffle_finance.monthly_revenue`) on a downstream model (`jaffle_marketing.roi_by_channel`) *before* introducing changes to a staging or production environment, you can install `jaffle_finance` as a package within `jaffle_marketing`. The installation can point to a specific git branch; however, if you find yourself frequently needing to perform end-to-end testing across both projects, we recommend you re-examine whether this represents a stable interface boundary.

These are the exceptions, rather than the rule. Installing another team's project as a package adds complexity, latency, and risk of unnecessary costs.
By defining clear interface boundaries across teams, serving one team's public models as "APIs" to another, and enabling practitioners to develop with a more narrowly defined scope, we can enable more people to contribute with more confidence, while requiring less context upfront.

#### FAQs[​](#faqs "Direct link to FAQs")

Can I define private packages in the dependencies.yml file?

It depends on how you're accessing your private packages:

* If you're using [native private packages](https://docs.getdbt.com/docs/build/packages.md#native-private-packages), you can define them in the `dependencies.yml` file.
* If you're using the [git token method](https://docs.getdbt.com/docs/build/packages.md#git-token-method), you must define them in the `packages.yml` file instead, because conditional rendering (like Jinja-in-YAML) is not supported in `dependencies.yml`.

Why doesn’t an indirectly referenced upstream public model appear in Explorer?

For [project dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md) in Mesh, [Catalog](https://docs.getdbt.com/docs/explore/explore-multiple-projects.md) only displays directly referenced [public models](https://docs.getdbt.com/docs/mesh/govern/model-access.md) from upstream projects, even if an upstream model indirectly depends on another public model. For example, if:

* `project_b` adds `project_a` as a dependency
* `project_b`'s model `downstream_c` references `project_a.upstream_b`
* `project_a.upstream_b` references another public model, `project_a.upstream_a`

Then:

* In Explorer, only directly referenced public models (`upstream_b` in this case) appear.
* In the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) lineage view, however, `upstream_a` (the indirect dependency) *will* appear because dbt dynamically resolves the full dependency graph.
This behavior makes sure that Catalog only shows the immediate dependencies available to that specific project.

#### Related docs[​](#related-docs "Direct link to Related docs")

* Refer to the [Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) guide for more guidance on how to use Mesh.
* [Quickstart with Mesh](https://docs.getdbt.com/guides/mesh-qs.md)

---

### Project recommendations

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Catalog provides recommendations about your project from the `dbt_project_evaluator` [package](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) using metadata from the [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md).

* Catalog also offers a global view, showing all the recommendations across the project for easy sorting and summarizing.
* These recommendations provide insight into how you can create a better-documented, better-tested, and better-built dbt project, creating more trust and less confusion.
* For a seamless and consistent experience, recommendations use `dbt_project_evaluator`'s pre-defined settings and don't import customizations applied to your package or project.

On-demand learning: If you enjoy video courses, check out our [dbt Catalog on-demand course](https://learn.getdbt.com/courses/dbt-catalog) and learn how to best explore your dbt project(s)!
#### Recommendations page[​](#recommendations-page "Direct link to Recommendations page")

The Recommendations overview page includes two top-level metrics measuring the test and documentation coverage of the models in your project.

* **Model test coverage** — The percent of models in your project (models not from a package or imported via Mesh) with at least one dbt test configured on them.
* **Model documentation coverage** — The percent of models in your project (models not from a package or imported via Mesh) with a description.

[![Example of the Recommendations overview page with project metrics and the recommendations for all resources in the project](/img/docs/collaborate/dbt-explorer/example-recommendations-overview.png?v=2 "Example of the Recommendations overview page with project metrics and the recommendations for all resources in the project")](#)Example of the Recommendations overview page with project metrics and the recommendations for all resources in the project

#### List of rules[​](#list-of-rules "Direct link to List of rules")

The following table lists the rules currently defined in the `dbt_project_evaluator` [package](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/).
| Category | Name | Description | Package Docs Link |
| --- | --- | --- | --- |
| Modeling | Direct Join to Source | Model that joins both a model and source, indicating a missing staging model | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#direct-join-to-source) |
| Modeling | Duplicate Sources | More than one source node corresponds to the same data warehouse relation | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#duplicate-sources) |
| Modeling | Multiple Sources Joined | Models with more than one source parent, indicating lack of staging models | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#multiple-sources-joined) |
| Modeling | Root Model | Models with no parents, indicating potential hardcoded references and need for sources | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#root-models) |
| Modeling | Source Fanout | Sources with more than one model child, indicating a need for staging models | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#source-fanout) |
| Modeling | Unused Source | Sources that are not referenced by any resource | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#unused-sources) |
| Performance | Exposure Dependent on View | Exposures with at least one model parent materialized as a view, indicating potential query performance issues | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/performance/#exposure-parents-materializations) |
| Testing | Missing Primary Key Test | Models with insufficient testing on the grain of the model | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/testing/#missing-primary-key-tests) |
| Documentation | Undocumented Models | Models without a model-level description | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/documentation/#undocumented-models) |
| Documentation | Undocumented Source | Sources (collections of source tables) without descriptions | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/documentation/#undocumented-sources) |
| Documentation | Undocumented Source Tables | Source tables without descriptions | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/documentation/#undocumented-source-tables) |
| Governance | Public Model Missing Contract | Models with public access that do not have a model contract to ensure the data types | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/governance/#public-models-without-contracts) |

#### The Recommendations tab[​](#the-recommendations-tab "Direct link to The Recommendations tab")

Models, sources, and exposures each also have a **Recommendations** tab on their resource details page, with the specific recommendations that correspond to that resource:

[![Example of the Recommendations tab](/img/docs/collaborate/dbt-explorer/example-recommendations-tab.png?v=2 "Example of the Recommendations tab")](#)Example of the Recommendations tab
---

### Project variables

dbt provides a mechanism called [variables](https://docs.getdbt.com/reference/dbt-jinja-functions/var.md) to provide data to models for compilation. Variables allow you to define configurable values for your project instead of hardcoding them in SQL. You might use variables to [configure timezones](https://github.com/dbt-labs/snowplow/blob/0.3.9/dbt_project.yml#L22), set reporting date ranges, [avoid hardcoding table names](https://github.com/dbt-labs/quickbooks/blob/v0.1.0/dbt_project.yml#L23), or otherwise control how models are compiled.

To use a variable in a model, hook, or macro, use the `{{ var('...') }}` function. The `var()` function retrieves the value defined in your project or passed using `--vars`. For more information, see [About var function](https://docs.getdbt.com/reference/dbt-jinja-functions/var.md). Also refer to [YAML tips](https://docs.getdbt.com/docs/build/dbt-tips.md#yaml-tips) for more information about YAML.

##### Defining variables in `dbt_project.yml`[​](#defining-variables-in-dbt_projectyml "Direct link to defining-variables-in-dbt_projectyml")

info

Jinja is not supported within the `vars` config, and all values will be interpreted literally.

To define variables in a dbt project, add a `vars` config to your `dbt_project.yml` file. These `vars` can be scoped globally, or to a specific package imported in your project.

dbt\_project.yml

```yaml
name: my_dbt_project
version: 1.0.0
config-version: 2

vars:
  # The `start_date` variable will be accessible in all resources
  start_date: '2016-06-01'
  # The `platforms` variable is only accessible to resources in the my_dbt_project project
  my_dbt_project:
    platforms: ['web', 'mobile']
  # The `app_ids` variable is only accessible to resources in the snowplow package
  snowplow:
    app_ids: ['marketing', 'app', 'landing-page']

models:
  ...
```

##### Defining variables on the command line[​](#defining-variables-on-the-command-line "Direct link to Defining variables on the command line")

The `dbt_project.yml` file is a great place to define variables that rarely change. When you need to override a variable for a specific run, use the `--vars` command line option. For example, when you want to test with a different date range, run models with environment-specific settings, or adjust behavior dynamically.

Use `--vars` to pass one or more variables to a dbt command. Provide the argument as a YAML dictionary string. For example:

```text
$ dbt run --vars '{"event_type": "signup"}'
```

Inside a model or macro, access the value using the `var()` function:

```text
select '{{ var("event_type") }}' as event_type
```

When you pass variables using `--vars`, you can access them anywhere you use the `var()` function in your project. You can pass multiple variables at once:

```text
$ dbt run --vars '{event_type: signup, region: us}'
```

If only one variable is being set, the brackets are optional:

```text
$ dbt run --vars 'event_type: signup'
```

The `--vars` argument accepts a YAML dictionary as a string on the command line. YAML is convenient because it does not require strict quoting as JSON does. Both of the following are valid and equivalent:

```text
$ dbt run --vars '{"key": "value", "date": 20180101}'
$ dbt run --vars '{key: value, date: 20180101}'
```

Variables defined using `--vars` override values defined in `dbt_project.yml`. This makes `--vars` useful for temporarily overriding configuration without changing your committed project files. For the complete order of precedence (including package-scoped variables and default values defined in `var()`), see [Variable precedence](https://docs.getdbt.com/docs/build/project-variables.md#variable-precedence).

You can find more information on defining dictionaries with YAML [here](https://github.com/Animosity/CraftIRC/wiki/Complete-idiot%27s-introduction-to-yaml).
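Putting the pieces above together, a model can read a variable with a fallback default that applies when the variable isn't defined in `dbt_project.yml` or passed with `--vars`. A hedged sketch (the model, table, and variable names are illustrative):

```sql
-- models/filtered_events.sql (illustrative)
-- resolves to 'signup' when invoked as: dbt run --vars 'event_type: signup'
-- otherwise falls back to the default value 'page_view'
select *
from {{ ref('events') }}
where event_type = '{{ var("event_type", "page_view") }}'
```

The second argument to `var()` supplies the default; without it, an undefined variable raises a compilation error.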
##### Variable precedence[​](#variable-precedence "Direct link to Variable precedence")

If dbt is unable to find a definition for a variable after checking all possible variable declaration places, a compilation error will be raised.

**Note:** Variable scope is based on the node ultimately using that variable. Imagine the case where a model defined in the root project calls a macro defined in an installed package, and that macro, in turn, uses the value of a variable. The variable will be resolved based on the *root project's* scope, rather than the package's scope.

---

### Python models

Note that only specific data platforms support `dbt-py` models. Check the [platform configuration pages](https://docs.getdbt.com/reference/resource-configs.md) to confirm if Python models are supported.

Python models for Snowflake, BigQuery, and Databricks are supported in [Fusion](https://docs.getdbt.com/docs/fusion/about-fusion.md). Refer to the [supported features](https://docs.getdbt.com/docs/fusion/supported-features.md) page to learn more about Fusion.

We encourage you to:

* Read [the original discussion](https://github.com/dbt-labs/dbt-core/discussions/5261) that proposed this feature.
* Share your thoughts and ideas on [next steps for Python models](https://github.com/dbt-labs/dbt-core/discussions/5742).
* Join the **#dbt-core-python-models** channel in the [dbt Community Slack](https://www.getdbt.com/community/join-the-community/).

#### Overview[​](#overview "Direct link to Overview")

dbt Python (`dbt-py`) models can help you solve use cases that can't be solved with SQL. You can perform analyses using tools available in the open-source Python ecosystem, including state-of-the-art packages for data science and statistics. Before, you would have needed separate infrastructure and orchestration to run Python transformations in production. Python transformations defined in dbt are models in your project, with all the same capabilities around testing, documentation, and lineage.

models/my\_python\_model.py

```python
import ...

def model(dbt, session):

    my_sql_model_df = dbt.ref("my_sql_model")

    final_df = ...  # stuff you can't write in SQL!

    return final_df
```

models/config.yml

```yml
models:
  - name: my_python_model

    # Document within the same codebase
    description: My transformation written in Python

    # Configure in ways that feel intuitive and familiar
    config:
      materialized: table
      tags: ['python']

    # Test the results of my Python transformation
    columns:
      - name: id
        # Standard validation for 'grain' of Python results
        data_tests:
          - unique
          - not_null
    data_tests:
      # Write your own validation logic (in SQL) for Python results
      - custom_generic_test
```

[![SQL + Python, together at last](/img/docs/building-a-dbt-project/building-models/python-models/python-model-dag.png?v=2 "SQL + Python, together at last")](#)SQL + Python, together at last

To use dbt Python models, you need an adapter for a data platform that supports a fully featured Python runtime when using dbt Core or the Fusion engine. In a dbt Python model, all Python code is executed remotely on the platform. None of it is run by dbt locally. We believe in clearly separating *model definition* from *model execution*.
In this and many other ways, you'll find that dbt's approach to Python models mirrors its longstanding approach to modeling data in SQL.

We've written this guide assuming that you have some familiarity with dbt. If you've never written a dbt model before, we encourage you to start by first reading [dbt Models](https://docs.getdbt.com/docs/build/models.md). Throughout, we'll be drawing connections between Python models and SQL models, as well as making their differences clear.

##### What is a Python model?[​](#what-is-a-python-model "Direct link to What is a Python model?")

A dbt Python model is a function that reads in dbt sources or other models, applies a series of transformations, and returns a transformed dataset. DataFrame operations define the starting points, the end state, and each step along the way.

This is similar to the role of CTEs in dbt SQL models. We use CTEs to pull in upstream datasets, define (and name) a series of meaningful transformations, and end with a final `select` statement. You can run the compiled version of a dbt SQL model to see the data included in the resulting view or table. When you `dbt run`, dbt wraps that query in `create view`, `create table`, or more complex DDL to save its results in the database.

Instead of a final `select` statement, each Python model returns a final DataFrame. Each DataFrame operation is "lazily evaluated." In development, you can preview its data using methods like `.show()` or `.head()`. When you run a Python model, the full result of the final DataFrame will be saved as a table in your data warehouse.

dbt Python models have access to almost all of the same configuration options as SQL models. You can test and document them, add `tags` and `meta` properties, and grant access to their results to other users. You can select them by their name, file path, configurations, whether they are upstream or downstream of another model, or whether they have been modified compared to a previous project state.
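As a loose analogy outside of dbt (plain pandas, with the input DataFrame standing in for what `dbt.ref()` would return), the shape described above is a function from input DataFrames to a single returned DataFrame, with named intermediate steps playing the role of CTEs:

```python
import pandas as pd

def transform(orders: pd.DataFrame) -> pd.DataFrame:
    # a named intermediate step, like a CTE in a SQL model
    paid = orders[orders["status"] == "paid"]
    # the "final select": one DataFrame returned at the end
    return paid.groupby("customer_id", as_index=False)["amount"].sum()

orders = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 2],
        "status": ["paid", "paid", "paid", "refunded"],
        "amount": [10.0, 5.0, 7.0, 3.0],
    }
)
result = transform(orders)
```

This is only an analogy for the function's shape; in a real dbt Python model, the inputs come from `dbt.ref()`/`dbt.source()` and execution happens remotely on your data platform.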
##### Defining a Python model[​](#defining-a-python-model "Direct link to Defining a Python model")

Each Python model lives in a `.py` file in your `models/` folder. It defines a function named **`model()`**, which takes two parameters:

* **`dbt`**: A class compiled by dbt Core, unique to each model, that enables you to run your Python code in the context of your dbt project and DAG.
* **`session`**: A class representing your data platform’s connection to the Python backend. The session is needed to read in tables as DataFrames, and to write DataFrames back to tables. In PySpark, by convention, the `SparkSession` is named `spark` and is available globally. For consistency across platforms, we always pass it into the `model` function as an explicit argument called `session`.

The `model()` function must return a single DataFrame. On Snowpark (Snowflake), this can be a Snowpark or pandas DataFrame. On BigQuery, this can be a BigQuery DataFrames (BigFrames), pandas, or Spark DataFrame. Via PySpark (Databricks), this can be a Spark, pandas, or pandas-on-Spark DataFrame. For more information about choosing between pandas and native DataFrames, see [DataFrame API + syntax](#dataframe-api-and-syntax).

When you `dbt run --select python_model`, dbt will prepare and pass in both arguments (`dbt` and `session`). All you have to do is define the function. This is how every Python model should look:

models/my\_python\_model.py

```python
def model(dbt, session):

    ...

    return final_df
```

##### Referencing other models[​](#referencing-other-models "Direct link to Referencing other models")

Python models participate fully in dbt's directed acyclic graph (DAG) of transformations. Use the `dbt.ref()` method within a Python model to read data from other models (SQL or Python). If you want to read directly from a raw source table, use `dbt.source()`. These methods return DataFrames pointing to the upstream source, model, seed, or snapshot.
models/my\_python\_model.py

```python
def model(dbt, session):

    # DataFrame representing an upstream model
    upstream_model = dbt.ref("upstream_model_name")

    # DataFrame representing an upstream source
    upstream_source = dbt.source("upstream_source_name", "table_name")

    ...
```

Of course, you can `ref()` your Python model in downstream SQL models, too:

models/downstream\_model.sql

```sql
with upstream_python_model as (

    select * from {{ ref('my_python_model') }}

),

...
```

caution

Referencing [ephemeral](https://docs.getdbt.com/docs/build/materializations.md#ephemeral) models is currently not supported (see [feature request](https://github.com/dbt-labs/dbt-core/issues/7288)).

From dbt version 1.8, Python models also support dynamic configurations within Python f-strings. This allows for more nuanced and dynamic model configurations directly within your Python code. For example:

models/my\_python\_model.py

```python
# Previously, attempting to access a configuration value like this would result in None
print(f"{dbt.config.get('my_var')}")
# Output before change: None

# Now you can access the actual configuration value
# Assuming 'my_var' is configured to 5 for the current model
print(f"{dbt.config.get('my_var')}")
# Output after change: 5
```

This also means you can use `dbt.config.get()` within Python models to ensure that configuration values are retrievable and usable within Python f-strings.

#### Configuring Python models[​](#configuring-python-models "Direct link to Configuring Python models")

Just like SQL models, there are three ways to configure Python models:

1. In `dbt_project.yml`, where you can configure many models at once
2. In a dedicated `.yml` file, within the `models/` directory
3. Within the model's `.py` file, using the `dbt.config()` method

Calling the `dbt.config()` method will set configurations for your model within your `.py` file, similar to the `{{ config() }}` macro in `.sql` model files:

models/my\_python\_model.py

```python
def model(dbt, session):

    # setting configuration
    dbt.config(materialized="table")
```

There's a limit to how complex you can get with the `dbt.config()` method. It accepts *only* literal values (strings, booleans, and numeric types) and dynamic configuration values. Passing another function or a more complex data structure is not possible. The reason is that dbt statically analyzes the arguments to `config()` while parsing your model, without executing your Python code. If you need to set a more complex configuration, we recommend you define it using the [`config` property](https://docs.getdbt.com/reference/resource-properties/config.md) in a properties YAML file.

###### Accessing project context[​](#accessing-project-context "Direct link to Accessing project context")

dbt Python models don't use Jinja to render compiled code. Python models have limited access to global project contexts compared to SQL models. That context is made available from the `dbt` class, passed in as an argument to the `model()` function.

Out of the box, the `dbt` class supports:

* Returning DataFrames referencing the locations of other resources: `dbt.ref()` + `dbt.source()`
* Accessing the database location of the current model: `dbt.this()` (also: `dbt.this.database`, `.schema`, `.identifier`)
* Determining if the current model's run is incremental: `dbt.is_incremental`
* Accessing custom values stored in `meta`: `dbt.config.meta_get()`

It is possible to extend this context by "getting" values with `dbt.config.get()` after they are configured in the [model's config](https://docs.getdbt.com/reference/model-configs.md).
The `dbt.config.get()` method supports dynamic access to configurations within Python models, enhancing flexibility in model logic. This includes inputs such as `var`, `env_var`, and `target`. If you want to use those values for conditional logic in your model, we require setting them through a dedicated properties YAML file config:

models/config.yml

```yml
models:
  - name: my_python_model
    config:
      materialized: table
      target_name: "{{ target.name }}"
      specific_var: "{{ var('SPECIFIC_VAR') }}"
      specific_env_var: "{{ env_var('SPECIFIC_ENV_VAR') }}"
```

Then, within the model's Python code, use the `dbt.config.get()` function to *access* values of configurations that have been set:

models/my\_python\_model.py

```python
def model(dbt, session):
    target_name = dbt.config.get("target_name")
    specific_var = dbt.config.get("specific_var")
    specific_env_var = dbt.config.get("specific_env_var")

    orders_df = dbt.ref("fct_orders")

    # limit data in dev
    if target_name == "dev":
        orders_df = orders_df.limit(500)
```

###### Accessing custom meta values[​](#accessing-custom-meta-values "Direct link to Accessing custom meta values")

To store custom values, use the [`meta` config](https://docs.getdbt.com/reference/resource-configs/meta.md). For example, if you have a model named `my_python_model` and you want to store custom values, you can do the following:

models/schema.yml

```yml
models:
  - name: my_python_model
    config:
      meta:
        custom_value: "111"
        another_value: "abc"
```

Then access them in your Python model using the `dbt.config.meta_get()` method:

models/my\_python\_model.py

```python
def model(dbt, session):
    # Access custom values stored in meta directly
    custom_value = dbt.config.meta_get("custom_value")
    another_value = dbt.config.meta_get("another_value")

    # Use your custom values in your model logic
    orders_df = dbt.ref("fct_orders")
    ...
```

Alternative approach

You can also retrieve meta values using `dbt.config.get("meta")`, which returns the entire meta dictionary.
When using this approach, handle the case where `meta` might not be configured:

```python
custom_value = dbt.config.get("meta", {}).get("custom_value")
```

###### Dynamic configurations[​](#dynamic-configurations "Direct link to Dynamic configurations")

In addition to the existing methods of configuring Python models, you also have dynamic access to configuration values set with `dbt.config()` within Python models using f-strings. This increases the possibilities for custom logic and configuration management.

models/my\_python\_model.py

```python
def model(dbt, session):
    dbt.config(materialized="table")

    # Dynamic configuration access within Python f-strings,
    # which allows for real-time retrieval and use of configuration values.
    # Assuming 'my_var' is set to 5, this will print: Dynamic config value: 5
    print(f"Dynamic config value: {dbt.config.get('my_var')}")
```

##### Materializations[​](#materializations "Direct link to Materializations")

Python models support these materializations:

* `table` (default)
* `incremental`

Incremental Python models support all the same [incremental strategies](https://docs.getdbt.com/docs/build/incremental-strategy.md) as their SQL counterparts. The specific strategies supported depend on your adapter. For example, incremental models are supported on BigQuery with Dataproc for the `merge` incremental strategy; the `insert_overwrite` strategy is not yet supported.

Python models can't be materialized as `view` or `ephemeral`. Python isn't supported for non-model resource types (like tests and snapshots).
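For the `incremental` materialization, dbt exposes `dbt.is_incremental` so the model can branch between a full build and an incremental one. The control flow can be sketched with a minimal pure-Python mock; the `FakeDbt` class below is purely illustrative (it is not a real dbt API) and only mirrors the attributes a model function touches:

```python
# Illustrative stub only: mimics how dbt gates incremental logic.
# `FakeDbt` is NOT a real dbt class; plain lists of dicts stand in
# for DataFrames so the branching on `is_incremental` is visible.
class FakeDbt:
    def __init__(self, is_incremental, rows):
        self.is_incremental = is_incremental
        self._rows = rows

    def config(self, **kwargs):
        pass  # real dbt records these configs at parse time

    def ref(self, name):
        return list(self._rows)  # stand-in for a DataFrame


def model(dbt, session):
    dbt.config(materialized="incremental")
    rows = dbt.ref("upstream_table")
    if dbt.is_incremental:
        # keep only rows newer than the (mocked) max in the current table
        max_existing = 2
        rows = [r for r in rows if r["updated_at"] > max_existing]
    return rows


full = model(FakeDbt(False, [{"updated_at": t} for t in (1, 2, 3)]), None)
incr = model(FakeDbt(True, [{"updated_at": t} for t in (1, 2, 3)]), None)
print(len(full), len(incr))  # 3 1
```

On a full build all rows come through; on an incremental run only rows past the high-water mark survive, which is exactly the filter the platform-specific examples below express in DataFrame syntax.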
For incremental models, like SQL models, you need to filter incoming tables to only new rows of data:

* Snowpark
* BigQuery DataFrames
* PySpark

models/my\_python\_model.py

```python
import snowflake.snowpark.functions as F

def model(dbt, session):
    dbt.config(materialized = "incremental")
    df = dbt.ref("upstream_table")

    if dbt.is_incremental:

        # only new rows compared to max in current table
        max_from_this = f"select max(updated_at) from {dbt.this}"
        df = df.filter(df.updated_at >= session.sql(max_from_this).collect()[0][0])

        # or only rows from the past 3 days
        df = df.filter(df.updated_at >= F.dateadd("day", F.lit(-3), F.current_timestamp()))

    ...

    return df
```

models/my\_python\_model.py

```python
import datetime
import bigframes.pandas as bpd  # needed for bpd.read_gbq below

def model(dbt, session):
    dbt.config(materialized = "incremental")
    bdf = dbt.ref("upstream_table")

    if dbt.is_incremental:

        # only new rows compared to max in current table
        max_from_this = f"select max(updated_at) from {dbt.this}"
        bdf = bdf[bdf['updated_at'] >= bpd.read_gbq(max_from_this).values[0][0]]

        # or only rows from the past 3 days
        bdf = bdf[bdf['updated_at'] >= datetime.date.today() - datetime.timedelta(days=3)]

    ...

    return bdf
```

models/my\_python\_model.py

```python
import pyspark.sql.functions as F

def model(dbt, session):
    dbt.config(materialized = "incremental")
    df = dbt.ref("upstream_table")

    if dbt.is_incremental:

        # only new rows compared to max in current table
        max_from_this = f"select max(updated_at) from {dbt.this}"
        df = df.filter(df.updated_at >= session.sql(max_from_this).collect()[0][0])

        # or only rows from the past 3 days
        df = df.filter(df.updated_at >= F.date_add(F.current_timestamp(), F.lit(-3)))

    ...

    return df
```

#### Python-specific functionality[​](#python-specific-functionality "Direct link to Python-specific functionality")

##### Defining functions[​](#defining-functions "Direct link to Defining functions")

In addition to defining a `model` function, the Python model can import other functions or define its own.
Here's an example on Snowpark, defining a custom `add_one` function:

models/my\_python\_model.py

```python
def add_one(x):
    return x + 1

def model(dbt, session):
    dbt.config(materialized="table")
    temps_df = dbt.ref("temperatures")

    # warm things up just a little
    df = temps_df.withColumn("degree_plus_one", add_one(temps_df["degree"]))
    return df
```

Currently, Python functions defined in one dbt model can't be imported and reused in other models. Refer to [Code reuse](#code-reuse) for the potential patterns being considered.

##### Using PyPI packages[​](#using-pypi-packages "Direct link to Using PyPI packages")

You can also define functions that depend on third-party packages, so long as those packages are installed and available to the Python runtime on your data platform.

In this example, we use the `holidays` package to determine if a given date is a holiday in France. The code below uses the pandas API for simplicity and consistency across platforms. The exact syntax, and the need to refactor for multi-node processing, still vary.
* Snowpark
* BigQuery DataFrames
* PySpark

models/my\_python\_model.py

```python
import holidays

def is_holiday(date_col):
    # Chez Jaffle
    french_holidays = holidays.France()
    is_holiday = (date_col in french_holidays)
    return is_holiday

def model(dbt, session):
    dbt.config(
        materialized = "table",
        packages = ["holidays"]
    )

    orders_df = dbt.ref("stg_orders")

    df = orders_df.to_pandas()

    # apply our function
    # (columns need to be in uppercase on Snowpark)
    df["IS_HOLIDAY"] = df["ORDER_DATE"].apply(is_holiday)
    df["ORDER_DATE"].dt.tz_localize('UTC')  # convert from Number/Long to tz-aware Datetime

    # return final dataset (Pandas DataFrame)
    return df
```

models/my\_python\_model.py

```python
import holidays
import bigframes.pandas as bpd  # needed for bpd.DataFrame and bpd.to_datetime below

def model(dbt, session):
    dbt.config(submission_method="bigframes")
    data = {
        'id': [0, 1, 2],
        'name': ['Brian Davis', 'Isaac Smith', 'Marie White'],
        'birthday': ['2024-03-14', '2024-01-01', '2024-11-07']
    }
    bdf = bpd.DataFrame(data)
    bdf['birthday'] = bpd.to_datetime(bdf['birthday'])
    bdf['birthday'] = bdf['birthday'].dt.date
    us_holidays = holidays.US(years=2024)
    return bdf[bdf['birthday'].isin(us_holidays)]
```

models/my\_python\_model.py

```python
import holidays

def is_holiday(date_col):
    # Chez Jaffle
    french_holidays = holidays.France()
    is_holiday = (date_col in french_holidays)
    return is_holiday

def model(dbt, session):
    dbt.config(
        materialized = "table",
        packages = ["holidays"]
    )

    orders_df = dbt.ref("stg_orders")

    df = orders_df.to_pandas_on_spark()  # Spark 3.2+
    # df = orders_df.toPandas() in earlier versions

    # apply our function
    df["is_holiday"] = df["order_date"].apply(is_holiday)

    # convert back to PySpark
    df = df.to_spark()  # Spark 3.2+
    # df = session.createDataFrame(df) in earlier versions

    # return final dataset (PySpark DataFrame)
    return df
```

###### Configuring packages[​](#configuring-packages "Direct link to Configuring packages")

We encourage you to configure required packages and versions so dbt can track them in project metadata.
This configuration is required for the implementation on some platforms. If you need specific versions of packages, specify them.

models/my\_python\_model.py

```python
def model(dbt, session):
    dbt.config(
        packages = ["numpy==1.23.1", "scikit-learn"]
    )
```

models/config.yml

```yml
models:
  - name: my_python_model
    config:
      packages:
        - "numpy==1.23.1"
        - scikit-learn
```

###### User-defined functions (UDFs)[​](#user-defined-functions-udfs "Direct link to User-defined functions (UDFs)")

You can use the `@udf` decorator or `udf` function to define an "anonymous" function and call it within your `model` function's DataFrame transformation. This is a typical pattern for applying more complex functions as DataFrame operations, especially if those functions require inputs from third-party packages.

* [Snowpark Python: Creating UDFs](https://docs.snowflake.com/en/developer-guide/snowpark/python/creating-udfs.html)
* [BigQuery DataFrames UDFs](https://cloud.google.com/bigquery/docs/use-bigquery-dataframes#custom-python-functions)
* [PySpark functions: udf](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.udf.html)

tip

You can also define [SQL or Python UDFs](https://docs.getdbt.com/docs/build/udfs.md) as first-class resources under `/functions` with a matching YAML file. dbt builds them as part of the DAG, and you reference them from SQL using `{{ function('my_udf') }}`. These UDFs are reusable across tools (BI, notebooks, SQL clients) because they live in your warehouse.
* Snowpark
* BigQuery DataFrames
* PySpark

models/my\_python\_model.py

```python
import snowflake.snowpark.types as T
import snowflake.snowpark.functions as F
import numpy

def register_udf_add_random():
    add_random = F.udf(
        # use 'lambda' syntax, for simple functional behavior
        lambda x: x + numpy.random.normal(),
        return_type=T.FloatType(),
        input_types=[T.FloatType()]
    )
    return add_random

def model(dbt, session):

    dbt.config(
        materialized = "table",
        packages = ["numpy"]
    )

    temps_df = dbt.ref("temperatures")

    add_random = register_udf_add_random()

    # warm things up, who knows by how much
    df = temps_df.withColumn("degree_plus_random", add_random("degree"))
    return df
```

**Note:** Due to a Snowpark limitation, it is not currently possible to register complex named UDFs within stored procedures and, therefore, dbt Python models. We are looking to add native support for Python UDFs as a project/DAG resource type in a future release. For the time being, if you want to create a "vectorized" Python UDF via the Batch API, we recommend either:

* Writing [`create function`](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch.html) inside a SQL macro, to run as a hook or run-operation
* [Registering from a staged file](https://docs.snowflake.com/en/developer-guide/snowpark/python/creating-udfs#creating-a-udf-from-a-python-source-file) within your Python model code

models/my\_python\_model.py

```python
import bigframes.pandas as bpd  # needed for the decorator and bpd.DataFrame below

def model(dbt, session):
    dbt.config(submission_method="bigframes")

    # You can also use @bpd.udf
    @bpd.remote_function(dataset='jialuo_test_us')
    def my_func(x: int) -> int:
        return x * 1100

    data = {"int": [1, 2], "str": ['a', 'b']}
    bdf = bpd.DataFrame(data=data)

    bdf['int'] = bdf['int'].apply(my_func)
    return bdf
```

models/my\_python\_model.py

```python
import pyspark.sql.types as T
import pyspark.sql.functions as F
import numpy

# use a 'decorator' for more readable code
@F.udf(returnType=T.DoubleType())
def add_random(x):
    random_number = numpy.random.normal()
    return x + random_number

def model(dbt, session):
    dbt.config(
        materialized = "table",
        packages = ["numpy"]
    )

    temps_df = dbt.ref("temperatures")

    # warm things up, who knows by how much
    df = temps_df.withColumn("degree_plus_random", add_random("degree"))
    return df
```

###### Code reuse[​](#code-reuse "Direct link to Code reuse")

To re-use a Python function across multiple dbt models, you can define [Python UDFs](https://docs.getdbt.com/docs/build/udfs.md) under `/functions` with a matching YAML file. These UDFs live in your warehouse and can be reused across tools (BI, notebooks, SQL clients).

In the future, we're considering also adding support for private Python packages. In addition to importing reusable functions from public PyPI packages, many data platforms support uploading custom Python assets and registering them as packages. The upload process looks different across platforms, but your code’s actual `import` looks the same.

❓ dbt questions

* How can dbt help users when uploading or initializing private Python assets? Is this a new form of `dbt deps`?
* How can dbt support users who want to test custom functions? If defined as UDFs: "unit testing" in the database? If "pure" functions in packages: encourage adoption of `pytest`?

💬 Discussion: ["Python models: package, artifact/object storage, and UDF management in dbt"](https://github.com/dbt-labs/dbt-core/discussions/5741)

##### DataFrame API and syntax[​](#dataframe-api-and-syntax "Direct link to DataFrame API and syntax")

Over the past decade, most people writing [data transformations](https://www.getdbt.com/analytics-engineering/transformation/) in Python have adopted the DataFrame as their common abstraction. dbt follows this convention by returning `ref()` and `source()` as DataFrames, and it expects all Python models to return a DataFrame.

A DataFrame is a two-dimensional data structure (rows and columns).
It supports convenient methods for transforming that data and creating new columns from calculations performed on existing columns. It also offers convenient ways to preview data while developing locally or in a notebook.

That's about where the agreement ends. There are numerous frameworks with their own syntaxes and APIs for DataFrames. The [pandas](https://pandas.pydata.org/docs/) library offered one of the original DataFrame APIs, and its syntax is the most common for new data professionals to learn. Most newer DataFrame APIs are compatible with pandas-style syntax, though few can offer perfect interoperability. This is true for BigQuery DataFrames, Snowpark, and PySpark, which have their own DataFrame APIs.

When developing a Python model, you will find yourself asking these questions:

**Why pandas?** — It's the most common API for DataFrames. It makes it easy to explore sampled data and develop transformations locally. You can “promote” your code as-is into dbt models and run it in production for small datasets.

**Why *not* pandas?** — Performance. pandas runs "single-node" transformations, which cannot benefit from the parallelism and distributed computing offered by modern data warehouses. This quickly becomes a problem as you operate on larger datasets. Some data platforms support optimizations for code written using the pandas DataFrame API, preventing the need for major refactors. For example, [pandas on PySpark](https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_ps.html) offers support for 95% of pandas functionality, using the same API while still leveraging parallel processing.

❓ dbt questions

* When developing a new dbt Python model, should we recommend pandas-style syntax for rapid iteration and then refactor?
* Which open source libraries provide compelling abstractions across different data engines and vendor-specific APIs?
* Should dbt attempt to play a longer-term role in standardizing across them?
💬 Discussion: ["Python models: the pandas problem (and a possible solution)"](https://github.com/dbt-labs/dbt-core/discussions/5738)

#### Limitations[​](#limitations "Direct link to Limitations")

Python models have capabilities that SQL models do not. They also have some drawbacks compared to SQL models:

* **Time and cost.** Python models are slower to run than SQL models, and the cloud resources that run them can be more expensive. Running Python requires more general-purpose compute. That compute might sometimes live on a separate service or architecture from your SQL models. **However:** We believe that deploying Python models via dbt—with unified lineage, testing, and documentation—is, from a human standpoint, **dramatically** faster and cheaper. By comparison, spinning up separate infrastructure to orchestrate Python transformations in production and different tooling to integrate with dbt is much more time-consuming and expensive.
* **Syntax differences** are even more pronounced. Over the years, dbt has done a lot, via dispatch patterns and packages such as `dbt_utils`, to abstract over differences in SQL dialects across popular data warehouses. Python offers a **much** wider field of play. If there are five ways to do something in SQL, there are 500 ways to write it in Python, all with varying performance and adherence to standards. Those options can be overwhelming. As the maintainers of dbt, we will be learning from state-of-the-art projects tackling this problem and sharing guidance as we develop it.
* **These capabilities are very new.** As data warehouses develop new features, we expect them to offer cheaper, faster, and more intuitive mechanisms for deploying Python transformations. **We reserve the right to change the underlying implementation for executing Python models in future releases.** Our commitment to you is around the code in your model `.py` files, following the documented capabilities and guidance we're providing here.
* **Lack of `print()` support.** The data platform runs and compiles your Python model without dbt's oversight. This means it doesn't display the output of commands such as Python's built-in [`print()`](https://docs.python.org/3/library/functions.html#print) function in dbt's logs.

Alternatives to using print() in Python models

The following explains other methods you can use for debugging, such as writing messages to a dataframe column:

* Using platform logs: Use your data platform's logs to debug your Python models.
* Return logs as a dataframe: Create a dataframe containing your logs and build it into the warehouse.
* Develop locally with DuckDB: Test and debug your models locally using DuckDB before deploying them.

Here's an example of debugging in a Python model:

```python
def model(dbt, session):
    dbt.config(
        materialized = "table"
    )

    df = dbt.ref("my_source_table").df()

    # One option for debugging: write messages to temporary table column
    # Pros: visibility
    # Cons: won't work if table isn't building for some reason
    msg = "something"
    df["debugging"] = f"My debug message here: {msg}"

    return df
```

As a general rule, if there's a transformation you could write equally well in SQL or Python, we believe that well-written SQL is preferable: it's more accessible to a greater number of colleagues, and it's easier to write code that's performant at scale. If there's a transformation you *can't* write in SQL, or where ten lines of elegant and well-annotated Python could save you 1000 lines of hard-to-read Jinja-SQL, Python is the way to go.

---

### Ratio metrics

Ratio metrics allow you to create a ratio between two metrics.
You specify a numerator and a denominator metric. You can optionally apply filters, names, and aliases to both the numerator and denominator when computing the metric.

The parameters for ratio metrics are as follows:

The complete specification for ratio metrics is as follows:

For advanced data modeling, you can use `fill_nulls_with` and `join_to_timespine` to [set null metric values to zero](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md), ensuring numeric values for every data row.

#### Ratio metrics example[​](#ratio-metrics-example "Direct link to Ratio metrics example")

These examples demonstrate how to create ratio metrics in your model. They cover basic and advanced use cases, including applying filters to the numerator and denominator metrics.

###### Example 1[​](#example-1 "Direct link to Example 1")

This example is a basic ratio metric that calculates the ratio of food orders to total orders:

###### Example 2[​](#example-2 "Direct link to Example 2")

This example is a ratio metric that calculates the ratio of food orders to total orders, with a filter and alias applied to the numerator. Note that in order to add these attributes, you'll need to use an explicit key for the `name` attribute too.

#### Ratio metrics using different semantic models[​](#ratio-metrics-using-different-semantic-models "Direct link to Ratio metrics using different semantic models")

When the numerator and denominator in a ratio metric come from different semantic models, the system computes their values in sub-queries and then joins the result sets on common dimensions to calculate the final ratio. Here's an example of the SQL generated for such a ratio metric.
```sql
select
  subq_15577.metric_time as metric_time,
  cast(subq_15577.mql_queries_created_test as double)
    / cast(nullif(subq_15582.distinct_query_users, 0) as double) as mql_queries_per_active_user
from (
  select
    metric_time,
    sum(mql_queries_created_test) as mql_queries_created_test
  from (
    select
      cast(query_created_at as date) as metric_time,
      case when query_status in ('PENDING', 'MODE') then 1 else 0 end as mql_queries_created_test
    from prod_dbt.mql_query_base mql_queries_test_src_2552
  ) subq_15576
  group by metric_time
) subq_15577
inner join (
  select
    metric_time,
    count(distinct distinct_query_users) as distinct_query_users
  from (
    select
      cast(query_created_at as date) as metric_time,
      case when query_status in ('MODE', 'PENDING') then email else null end as distinct_query_users
    from prod_dbt.mql_query_base mql_queries_src_2585
  ) subq_15581
  group by metric_time
) subq_15582
on (
  (subq_15577.metric_time = subq_15582.metric_time)
  or (
    (subq_15577.metric_time is null)
    and (subq_15582.metric_time is null)
  )
)
```

#### Add filter[​](#add-filter "Direct link to Add filter")

Users can define constraints on input metrics for a ratio metric by applying a filter directly to the input metric, like so:

Note the `filter` and `alias` parameters for the metric referenced in the numerator:

* Use the `filter` parameter to apply a filter to the metric it's attached to.
* The `alias` parameter is used to avoid naming conflicts in the rendered SQL queries when the same metric is used with different filters.
* If there are no naming conflicts, the `alias` parameter can be left out.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Fill null values for simple, derived, or ratio metrics](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md)
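To make the preceding sections concrete, a ratio metric definition in YAML might look like the following sketch. The metric, model, and dimension names here are illustrative assumptions, not part of this project; refer to the official metric spec for the authoritative parameter list:

```yml
metrics:
  # basic ratio: food orders over total orders
  - name: food_order_pct
    description: "Ratio of food orders to total orders."
    label: Food order ratio
    type: ratio
    type_params:
      numerator: food_orders   # an existing metric
      denominator: orders      # an existing metric

  # filtered numerator: explicit `name` key is required when
  # adding attributes such as `filter` and `alias`
  - name: food_order_pct_filtered
    description: "Same ratio, with a filter and alias on the numerator."
    label: Food order ratio (filtered)
    type: ratio
    type_params:
      numerator:
        name: orders
        filter: |
          {{ Dimension('order__is_food_order') }} = true
        alias: food_orders
      denominator:
        name: orders
```

The `alias` on the filtered numerator avoids a naming collision with the unfiltered `orders` metric in the rendered SQL, matching the guidance in the Add filter section.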
---

### Release tracks in dbt platform

Since May 2024, new capabilities in the dbt framework have been delivered continuously to the dbt platform. Your projects and environments are upgraded automatically on a cadence that you choose, depending on your dbt plan. Previously, customers would pin to a minor version of dbt Core and receive only patch updates during that version's active support period.

Release tracks ensure that your project stays up to date with the modern capabilities of dbt and recent versions of dbt Core. This requires you to make one final update to your current jobs and environments. When that's done, you'll never have to think about managing, coordinating, or upgrading dbt versions again.

By moving your environments and jobs to release tracks, you get all the functionality in dbt as soon as it's ready. On the **Latest** release track, this includes access to features *before* they're available in final releases of dbt Core OSS.
#### Which release tracks are available?[​](#which-release-tracks-are-available "Direct link to Which release tracks are available?")

| Release track | Description | Plan availability | API value |
| --- | --- | --- | --- |
| **Latest Fusion** | The latest build of the new engine for dbt, available to select accounts. | All plans ([Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")) | `latest-fusion` |
| **Latest** | Provides a continuous release of the latest functionality in the dbt platform. Includes early access to new features of the dbt framework before they're available in dbt Core. | All plans | `latest` |
| **Compatible** | Provides a monthly release aligned with the most recent open source versions of dbt Core and adapters, plus functionality exclusively available in the dbt platform. See the [Compatible track changelog](https://docs.getdbt.com/docs/dbt-versions/compatible-track-changelog.md) for more information. | Starter, Enterprise, Enterprise+ | `compatible` |
| **Extended** | The previous month's **Compatible** release. | Enterprise, Enterprise+ | `extended` |
| **Fallback** | The previous month's **Extended** release. | Enterprise+ | `fallback` |

To configure an environment in the [dbt Admin API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) or [Terraform](https://registry.terraform.io/providers/dbt-labs/dbtcloud/latest) to use a release track, set `dbt_version` to the release track name:

* `latest-fusion` ([Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles"))
* `latest`
* `compatible`
* `extended`

#### Which release track should I choose?[​](#which-release-track-should-i-choose "Direct link to Which release track should I choose?")

Choose the **Latest** release track to continuously receive new features, fixes, and performance improvements — the latest and greatest dbt. This is the default for all customers on dbt.

Choose the **Compatible** and **Extended** release tracks if you need a less-frequent release cadence, the ability to test new dbt releases before they go live in production, and/or ongoing compatibility with the latest open source releases of dbt Core.

##### Using the Fallback release track[​](#using-the-fallback-release-track "Direct link to Using the Fallback release track")

The **Fallback** release track provides an emergency rollback option for account admins if you suspect a regression in the **Extended** track.

1. Enable it by going to **Account settings**.
2. Click the **Fallback** release track button in the dbt platform interface, rather than through environment settings.
3.
Fill in the details in the **Revert to Fallback** pop up to confirm and share any info with dbt Support. Switching to **Fallback** alerts the dbt Support team, who may reach out to help resolve the issue. This track is meant only as a temporary safety option to unblock you and not for ongoing use. You should return to "Extended" or "Compatible" once the issue is resolved. [![Fallback release track button in dbt platform](/img/docs/dbt-versions/rollback.png?v=2 "Fallback release track button in dbt platform")](#)Fallback release track button in dbt platform [![Fallback release track popup in dbt platform](/img/docs/dbt-versions/rollback-popup.png?v=2 "Fallback release track popup in dbt platform")](#)Fallback release track popup in dbt platform ##### Common architectures[​](#common-architectures "Direct link to Common architectures") **Default** - Majority of customers on all plans * Prioritize immediate access to fixes and features * Leave all environments on the **Latest** release track (default configuration) **Hybrid** - Starter, Enterprise, Enterprise+ * Prioritize ongoing compatibility between dbt and dbt Core for development & deployment using both products in the same dbt projects * Configure all environments to use the **Compatible** release track * Understand that new features will not be available until they are first released in dbt Core OSS (several months after the **Latest** release track) **Cautious** - Enterprise, Enterprise+, Business Critical * Prioritize "bake in" time for new features & fixes * Configure development & test environments to use the **Compatible** release track * Configure pre-production & production environments to use the **Extended** release track * Understand that new features will not be available until *a month after* they are first released in dbt Core OSS and the Compatible track. 
Developers (on **Compatible**) will get access to new features before they can leverage those capabilities in production (on **Extended**), and must be mindful of the additional delay.

**Virtual Private dbt or Single Tenant**

* Changes to all release tracks roll out as part of dbt instance upgrades once per week

#### Upgrading from older versions[​](#upgrading-from-older-versions "Direct link to Upgrading from older versions")

##### How to upgrade[​](#upgrade-tips "Direct link to How to upgrade")

If you regularly develop your dbt project in dbt, and you're still running on a legacy version of dbt Core, dbt Labs recommends that you try upgrading your project in a development environment first. [Override your dbt version in development](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#override-dbt-version). Then, launch the Studio IDE or dbt CLI and do your development work as usual. Everything should work as you expect. If you see something unexpected or surprising, revert to the previous version and record the differences you observed. [Contact dbt support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support) with your findings for a more detailed investigation.

Next, we recommend that you try upgrading your project's [deployment environment](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#environments). If your project has a [staging deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#staging-environment), upgrade and try working with it for a few days before you proceed with upgrading the production environment.

If your organization has multiple dbt projects, we recommend starting your upgrade with projects that are smaller, newer, or more familiar to your team. That way, if you do encounter any issues, it'll be easier and faster to troubleshoot those before proceeding to upgrade larger or more complex projects.
##### Considerations[​](#considerations "Direct link to Considerations")

To learn more about how dbt Labs deploys stable dbt upgrades safely, we recommend that you read our blog post: [How we're making sure you can confidently switch to the "Latest" release track in dbt](https://docs.getdbt.com/blog/latest-dbt-stability).

If you're running dbt version 1.6 or older, please know that your version of dbt Core has reached [end-of-life (EOL)](https://docs.getdbt.com/docs/dbt-versions/core.md#eol-version-support) and is no longer supported. We strongly recommend that you update to a newer version as soon as reasonably possible. dbt Labs has extended the critical support period of dbt Core v1.7 for dbt Enterprise-tier customers to March 2025. At that point, we will be encouraging all customers to select a release track for ongoing updates in dbt.

**I'm using an older version of dbt in the dbt platform. What should I do? What happens if I do nothing?**

If you're running dbt v1.6 or older, please know that your version of dbt Core has reached [end-of-life (EOL)](https://docs.getdbt.com/docs/dbt-versions/core.md#eol-version-support) and is no longer supported. We strongly recommend that you update to a newer version as soon as reasonably possible.

dbt Labs has extended the "Critical Support" period of dbt Core v1.7 for dbt Enterprise-tier customers while we work through the migration with those customers to release tracks. In the meantime, this means that v1.7 will continue to be accessible in dbt for Enterprise customers, jobs and environments on v1.7 for those customers will not be automatically migrated to "Latest," and dbt Labs will continue to fix critical bugs and security issues.

Starting in October 2024, dbt accounts on the Developer and Starter (formerly Teams) plans have been migrated to release tracks from older dbt Core versions.
If your account was migrated to the **Latest** release track and you notice new failures in scheduled jobs, please [contact dbt support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support) to report the problem or request an extension.

**What are other known issues when upgrading from older dbt Core versions?**

If you are upgrading from a very old, unsupported version of dbt Core, you may run into one of these edge cases after the upgrade to a newer version:

* \[v1.1] Customers on BigQuery should be aware that dbt sets a default [per-model timeout](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#job_execution_timeout_seconds) of 5 minutes. You may override this config in your connection details. Older versions of dbt (including v1.0) did not appropriately respect this timeout configuration.
* \[v1.3] Customers with non-dbt `.py` files defined within their project directories, such as `models/`. Since v1.3, dbt expects these files to be valid [Python models](https://docs.getdbt.com/docs/build/python-models.md). The customer needs to move these files out of their `models/` directory, or ignore them via `.dbtignore`.
* \[v1.5] Customers who have `--m` in their job definitions, instead of `-m` or `--models`. This autocompletion (`--m[odels]` for `--models`) has never been officially documented or supported. It was an implicit behavior of argparse (the CLI library used in dbt-core v1.0-1.4) that is not supported by `click` (the CLI library used in dbt-core since v1.5).
* \[v1.5] [Empty invalid `tests` configs start raising a validation error](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.5). Replace an empty `tests` config with `tests: []` or remove it altogether.
* \[v1.6] Performance optimization to `load_result` means you cannot call it on the same query result multiple times.
Instead, save it to a local variable once, and reuse that variable (context: [dbt-core#7371](https://github.com/dbt-labs/dbt-core/pull/7371)). You should [contact dbt support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support) to request an extension, during which you will need to make those updates.

**I see that my account was migrated to Latest. What should I do?**

For the vast majority of customers, there is no further action needed. If you see new failures in your scheduled jobs now that they are running on a newer version of dbt, you may need to update your project code to account for one of the edge cases described on this page. You should [contact dbt support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support) to request an extension, during which you will need to make those updates.

**What about breaking changes to packages (maintained by dbt Labs or by others)?**

When we talk about *latest version*, we're referring to the underlying runtime for dbt, not the versions of packages you're installing. Our continuous release for dbt includes testing against several popular dbt packages. This ensures that updates we make to dbt-core, adapters, or anywhere else are compatible with the code in those packages.

If a new version of a dbt package includes a breaking change (for example, a change to one of the macros in `dbt_utils`), you don't have to immediately use the new version. In your `packages` configuration (in `dependencies.yml` or `packages.yml`), you can still specify which versions or version ranges of packages you want dbt to install. If you're not already doing so, we strongly recommend [checking `package-lock.yml` into version control](https://docs.getdbt.com/reference/commands/deps.md#predictable-package-installs) for predictable package installs in deployment environments and a clear change history whenever you install upgrades.
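For instance, pinning a package to a version range in `packages.yml` might look like the following sketch (the package and range shown are illustrative, not a recommendation for your project):

```yaml
# packages.yml -- pin packages to known-good version ranges
# so a new major release can't break your project unexpectedly.
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.1.0", "<2.0.0"]  # any 1.x release from 1.1.0 onward
```

Running `dbt deps` resolves the range to an exact version, which is recorded in `package-lock.yml` for reproducible installs.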
If you upgrade to the **Latest** release track and immediately see something that breaks, please [contact support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support) and, in the meantime, downgrade to v1.7.

If you're already on the **Latest** release track and you observe a breaking change (like something worked yesterday, but today it isn't working, or works in a surprising/different way), please [contact support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support) immediately. Depending on your contracted support agreement, the dbt Labs team will respond within our SLA time and we would seek to roll back the change and/or roll out a fix (just as we would for any other part of dbt). This is the same whether the root cause of the breaking change is in the project code or in the code of a package.

If the package you've installed relies on *undocumented* functionality of dbt, it doesn't have the same guarantees as functionality that we've documented and tested. However, we will still do our best to avoid breaking them.

**I see that dbt Core version 1.8 was released in April 2024. Will a version 1.8 become available in the dbt platform?**

No. Going forward, customers will access new functionality and ongoing support in dbt by receiving automatic updates. We believe this is the best way for us to offer a reliable, stable, and secure runtime for dbt, and for you as dbt users to be able to consistently take advantage of new features.

In 2023 (and earlier), customers were expected to manage their own upgrades by selecting dbt Core versions, up to and including dbt Core v1.7, which was released in October 2023. (Way back in 2021, dbt customers would pick specific *patch releases* of dbt Core, such as upgrading from `v0.21.0` to `v0.21.1`. We've come a long way since then!)

In 2024, we've changed the way that new dbt functionality is made available for dbt customers. Behavior or breaking changes are gated behind opt-in flags.
Users don't need to spend valuable time managing their own upgrades. Currently, it is possible to receive continuous (daily) updates. We are adding other release cadence options for managed customers of dbt by the end of the year. Opting into a release cadence with automated upgrades is required for accessing any new functionality that we've released in 2024, and going forward.

We continue to release new minor versions of dbt Core (OSS). We most recently released dbt Core v1.9 on December 9, 2024. These releases always include a subset of the functionality that's already available to dbt platform customers, and always after the functionality has been available in the dbt platform.

If you have comments or concerns, we're happy to help. If you're an existing dbt customer, you may reach out to your account team or [contact support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support).

---

### Retry your dbt jobs

If your dbt job run completed with a status of **Error**, you can rerun it from start or from the point of failure in dbt.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You have a [dbt account](https://www.getdbt.com/signup).
* You must be using [dbt version](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) 1.6 or newer.
* dbt can successfully parse the project and generate a [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md).
* The most recent run of the job hasn't completed successfully. The latest status of the run is **Error**.
* The job command that failed in the run must be one that supports the [retry command](https://docs.getdbt.com/reference/commands/retry.md).

#### Rerun an errored job[​](#rerun-an-errored-job "Direct link to Rerun an errored job")

1. Select **Deploy** from the top navigation bar and choose **Run History**.
2. Choose the job run that has errored.
3. In the **Run Summary** tab on the job's **Run** page, expand the run step that failed. An error icon denotes the failed step.
4. Examine the error message and determine how to fix it. After you have made your changes, save and commit them to your [Git repo](https://docs.getdbt.com/docs/cloud/git/git-version-control.md).
5. Return to your job's **Run** page. In the upper right corner, click **Rerun** and choose **Rerun from start** or **Rerun from failure**.

If you chose to rerun from the failure point, a **Rerun failed steps** modal opens. The modal lists the run steps that will be invoked: the failed step and any skipped steps. To confirm these run steps, click **Rerun from failure**. The job reruns from the failed command in the previously failed run. A banner at the top of the **Run Summary** tab captures this with the message, "This run resumed execution from last failed step".

[![Example of the Rerun options in dbt](/img/docs/deploy/native-retry.gif?v=2 "Example of the Rerun options in dbt")](#)Example of the Rerun options in dbt

#### Related content[​](#related-content "Direct link to Related content")

* [Retry a failed run for a job](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/Retry%20Failed%20Job) API endpoint
* [Run visibility](https://docs.getdbt.com/docs/deploy/run-visibility.md)
* [Jobs](https://docs.getdbt.com/docs/deploy/jobs.md)
* [Job commands](https://docs.getdbt.com/docs/deploy/job-commands.md)
---

### Run visibility

You can view the history of your runs and the model timing dashboard to help identify where improvements can be made to jobs.

#### Run history[​](#run-history "Direct link to Run history")

The **Run history** dashboard in dbt helps you monitor the health of your dbt project. It provides a detailed overview of all your project's job runs, along with a variety of filters that let you focus on specific aspects. You can also use it to review recent runs, find errored runs, and track runs in progress. You can access it from the top navigation menu by clicking **Deploy** and then **Run history**.

The dashboard displays your full run history, including job name, status, associated environment, job trigger, commit SHA, schema, and timing info.

dbt developers can access their run history for the last 365 days through the dbt user interface (UI) and API. dbt Labs limits self-service retrieval of run history metadata to 365 days to improve dbt's performance.

[![Run history dashboard allows you to monitor the health of your dbt project and displays jobs, job status, environment, timing, and more.](/img/docs/dbt-cloud/deployment/run-history.png?v=2 "Run history dashboard allows you to monitor the health of your dbt project and displays jobs, job status, environment, timing, and more.")](#)Run history dashboard allows you to monitor the health of your dbt project and displays jobs, job status, environment, timing, and more.

#### Job run details[​](#job-run-details "Direct link to Job run details")

From the **Run history** dashboard, select a run to view complete details about it.
The job run details page displays the job trigger, commit SHA, time spent in the scheduler queue, all the run steps and their [logs](#access-logs), [model timing](#model-timing), and more. Click **Rerun now** to rerun the job immediately.

An example of a completed run with a configuration for a [job completion trigger](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#trigger-on-job-completion):

[![Example of run details](/img/docs/dbt-cloud/deployment/example-job-details.png?v=2 "Example of run details")](#)Example of run details

##### Run summary tab[​](#run-summary-tab "Direct link to Run summary tab")

You can view and download in-progress and historical logs for your dbt runs, which makes it easier to debug errors.

* To download logs for an individual step, select the step in the **Run summary** tab and click **Download** > **Download logs**.
* Note that when viewing debug logs, the log output is truncated. To view and export all debug logs for an individual step, click **Download** > **Download all debug logs**.

[![Download logs](/img/docs/dbt-cloud/deployment/download-logs.png?v=2 "Download logs")](#)Download logs

###### Log size limits[​](#log-size-limits "Direct link to Log size limits")

dbt enforces cumulative log size limits on run endpoints. If a single step's logs or the total run logs exceed this limit, dbt omits the logs. When dbt omits logs due to size, it displays a **Run logs are too large** banner and shows a message where the logs would usually appear. The run step also displays an **Unknown** status.

You can still download omitted logs. If the log file is too large, the download may fail. If that happens, you can [reach out to support](mailto:support@getdbt.com).

##### Lineage tab[​](#lineage-tab "Direct link to Lineage tab")

View the lineage graph associated with the job run so you can better understand the dependencies and relationships of the resources in your project.
To view a node's metadata directly in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), select it (double-click) from the graph.

[![Example of accessing dbt Catalog from the Lineage tab](/img/docs/collaborate/dbt-explorer/explorer-from-lineage.gif?v=2 "Example of accessing dbt Catalog from the Lineage tab")](#)Example of accessing dbt Catalog from the Lineage tab

##### Model timing tab [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#model-timing-tab- "Direct link to model-timing-tab-")

The **Model timing** tab displays the composition, order, and time each model takes in a job run. The visualization appears for successful jobs and highlights the top 1% of model durations. This helps you identify bottlenecks in your runs so you can investigate them and potentially make changes to improve their performance. You can find the dashboard on the [job's run details](#job-run-details) page.

[![The Model timing tab displays the top 1% of model durations and visualizes model bottlenecks](/img/docs/dbt-cloud/model-timing.png?v=2 "The Model timing tab displays the top 1% of model durations and visualizes model bottlenecks")](#)The Model timing tab displays the top 1% of model durations and visualizes model bottlenecks

##### Artifacts tab[​](#artifacts-tab "Direct link to Artifacts tab")

This tab provides a list of the artifacts generated by the job run. The files are saved and available for download.
[![Example of the Artifacts tab](/img/docs/dbt-cloud/example-artifacts-tab.png?v=2 "Example of the Artifacts tab")](#)Example of the Artifacts tab

##### Compare tab [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#compare-tab- "Direct link to compare-tab-")

The **Compare** tab is shown for [CI job runs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) with the **Run compare changes** setting enabled. It displays details about [the changes from the comparison dbt performed](https://docs.getdbt.com/docs/deploy/advanced-ci.md#compare-changes) between what's in your production environment and the pull request. To help you better visualize the differences, dbt highlights changes to your models in red (deletions) and green (inserts).

From the **Modified** section, you can view the following:

* **Overview** — High-level summary about the changes to the models, such as the number of primary keys that were added or removed.
* **Primary keys** — Details about the changes to the records.
* **Modified rows** — Details about the modified rows. Click **Show full preview** to display all columns.
* **Columns** — Details about the changes to the columns.

To view the dependencies and relationships of the resources in your project more closely, click **View in Catalog** to launch [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md).

[![Example of the Compare tab](/img/docs/dbt-cloud/example-ci-compare-changes-tab.png?v=2 "Example of the Compare tab")](#)Example of the Compare tab
---

### Saved queries

Saved queries are a way to save commonly used queries in MetricFlow. You can group metrics, dimensions, and filters that are logically related into a saved query. Saved queries are nodes and visible in the dbt DAG.

Saved queries serve as the foundational building block, allowing you to [configure exports](#configure-exports) in your saved query configuration. Exports take this functionality a step further by enabling you to [schedule and write saved queries](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) directly within your data platform using [dbt's job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md).

#### Parameters[​](#parameters "Direct link to Parameters")

To create a saved query, refer to the following parameters.

tip

Note that we use dot notation (`.`) to indicate whether a parameter is nested within another parameter. For example, `query_params.metrics` means the `metrics` parameter is nested under `query_params`.

If you use multiple metrics in a saved query, you will only be able to reference the common dimensions these metrics share in the `group_by` or `where` clauses. Use the entity name prefix with the Dimension object, like `Dimension('user__ds')`.

#### Configure saved query[​](#configure-saved-query "Direct link to Configure saved query")

Use saved queries to define and manage common Semantic Layer queries in YAML, including metrics and dimensions. Saved queries enable you to organize and reuse common MetricFlow queries within dbt projects. For example, you can group related metrics together for better organization, and include commonly used dimensions and filters.

In your saved query config, you can also leverage [caching](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md) with the dbt job scheduler to cache common queries, speed up performance, and reduce compute costs.
In the following example, you can set the saved query in the `semantic_model.yml` file:

semantic\_model.yml

Note that you can set `export_as` in both the saved query and the exports [config](https://docs.getdbt.com/reference/resource-properties/config.md), with the exports config value taking precedence. If a key isn't set in the exports config, it will inherit the saved query config value.

###### Where clause[​](#where-clause "Direct link to Where clause")

Use the following syntax to reference entities, dimensions, time dimensions, or metrics in filters, and refer to [Metrics as dimensions](https://docs.getdbt.com/docs/build/ref-metrics-in-filters.md) for details on how to use metrics as dimensions with metric filters:

```yaml
filter: |
  {{ Entity('entity_name') }}

filter: |
  {{ Dimension('primary_entity__dimension_name') }}

filter: |
  {{ TimeDimension('time_dimension', 'granularity') }}

filter: |
  {{ Metric('metric_name', group_by=['entity_name']) }}
```

###### Project-level saved queries[​](#project-level-saved-queries "Direct link to Project-level saved queries")

To enable saved queries at the project level, you can set the `saved-queries` configuration in the [`dbt_project.yml` file](https://docs.getdbt.com/reference/dbt_project.yml.md). This saves you time in configuring saved queries in each file:

dbt\_project.yml

```yaml
saved-queries:
  my_saved_query:
    +cache:
      enabled: true
```

For more information on `dbt_project.yml` and config naming conventions, see the [dbt\_project.yml reference page](https://docs.getdbt.com/reference/dbt_project.yml.md#naming-convention).

To build `saved_queries`:

* Make sure you set the right [environment variable](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md#set-environment-variable) in your environment.
* Run the command `dbt build --resource-type saved_query` using the [`--resource-type` flag](https://docs.getdbt.com/reference/global-configs/resource-type.md).
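Putting the parameters above together, a saved query definition in `semantic_model.yml` might look like the following sketch (the metric, dimension, and query names are illustrative, not from this page):

```yaml
# Hypothetical saved query: all metric/dimension names below are illustrative.
saved_queries:
  - name: weekly_metrics
    description: Orders and revenue by region, aggregated weekly.
    query_params:
      metrics:
        - orders
        - revenue
      group_by:
        - TimeDimension('order_id__ordered_at', 'week')
        - Dimension('customer__region')   # entity-prefixed dimension
      where:
        - "{{ Dimension('customer__region') }} IS NOT NULL"
```

Because both `orders` and `revenue` are queried together, only dimensions common to both metrics can appear in `group_by` or `where`.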
#### Configure exports[​](#configure-exports "Direct link to Configure exports")

Exports are an additional configuration added to a saved query. They define *how* to write a saved query, along with the schema and table name.

Once you've configured your saved query and set the foundational block, you can configure exports in the `saved_queries` YAML configuration file (the same file as your metric definitions). This will also allow you to [run exports](#run-exports) automatically within your data platform using [dbt's job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md).

The following is an example of a saved query with an export:

semantic\_model.yml

#### Run exports[​](#run-exports "Direct link to Run exports")

Once you've configured exports, you can take things a step further by running exports to automatically write saved queries within your data platform using [dbt's job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md). This feature is only available with [dbt's Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md).

For more information on how to run exports, refer to the [Exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) documentation.

#### FAQs[​](#faqs "Direct link to FAQs")

**Can I have multiple exports in a single saved query?**

Yes, this is possible. However, the difference would be the name, schema, and materialization strategy of the export.

**How can I select saved\_queries by their resource type?**

To include all saved queries in the dbt build run, use the [`--resource-type` flag](https://docs.getdbt.com/reference/global-configs/resource-type.md) and run the command `dbt build --resource-type saved_query`.
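Tying the export configuration described above back to a saved query, a sketch might look like this (the query, export, and schema names are illustrative, not from this page):

```yaml
# Hypothetical saved query with an export: all names below are illustrative.
saved_queries:
  - name: weekly_metrics
    query_params:
      metrics:
        - orders
      group_by:
        - Dimension('customer__region')
    exports:
      - name: weekly_metrics_table
        config:
          export_as: table      # materialization strategy for the export
          schema: reporting     # optional target schema
```

Each export in the list gets its own name and can set its own schema and materialization strategy.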
#### Related docs[​](#related-docs "Direct link to Related docs")

* [Validate semantic nodes in a CI job](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci)
* Configure [caching](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md)

---

### Semantic models

Tip

Use [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md), available for dbt Enterprise and Enterprise+ accounts, to generate semantic models (available in the Studio IDE only).

Semantic models are the foundation for data definition in MetricFlow, which powers the Semantic Layer.

📹 Learn about the dbt Semantic Layer with on-demand video courses! Explore our [dbt Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer) to learn how to define and query metrics in your dbt project. Additionally, dive into mini-courses for querying the dbt Semantic Layer in your favorite tools: [Tableau](https://courses.getdbt.com/courses/tableau-querying-the-semantic-layer), [Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel), [Hex](https://courses.getdbt.com/courses/hex-querying-the-semantic-layer), and [Mode](https://courses.getdbt.com/courses/mode-querying-the-semantic-layer).

Here we describe the semantic model components with examples:

#### Semantic models components[​](#semantic-models-components "Direct link to Semantic models components")

The complete spec for semantic models is below. The following example displays a complete configuration and detailed descriptions of each field:

##### Description[​](#description "Direct link to Description")

Includes important details of the semantic model.
This description will primarily be used by other configuration contributors. You can use the pipe operator (`|`) to include multiple lines in the description.

##### Primary entity[​](#primary-entity "Direct link to Primary entity")

You can define a primary entity using the following configs:

* Entity types
* Sample config

Here are the types of keys:

* **Primary** — Only one record per row in the table, and it includes every record in the data platform.
* **Unique** — Only one record per row in the table, but it may have a subset of records in the data platform. Null values may also be present.
* **Foreign** — Can have zero, one, or multiple instances of the same record. Null values may also be present.
* **Natural** — A column or combination of columns in a table that uniquely identifies a record based on real-world data. For example, the `sales_person_id` can serve as a natural key in a `sales_person_department` dimension table.

This example shows a semantic model with three entities and their entity types: `transaction` (primary), `order` (foreign), and `user` (foreign).

To reference a desired column, use the actual column name from the model in the `name` parameter. You can also use `name` as an alias to rename the column, and the `expr` parameter to refer to the original column name or a SQL expression of the column.

```yaml
entities:
  - name: transaction
    type: primary
  - name: order
    type: foreign
    expr: id_order
  - name: user
    type: foreign
    expr: substring(id_order FROM 2)
```

You can refer to entities (join keys) in a semantic model using the `name` parameter. Entity names must be unique within a semantic model, and identifier names can be non-unique across semantic models since MetricFlow uses them for [joins](https://docs.getdbt.com/docs/build/join-logic.md).

##### Dimensions[​](#dimensions "Direct link to Dimensions")

[Dimensions](https://docs.getdbt.com/docs/build/dimensions.md) are different ways to organize or look at data.
They are effectively the group by parameters for metrics. For example, you might group data by things like region, country, or job title.

#### Dependencies[​](#dependencies "Direct link to Dependencies")

#### Related docs[​](#related-docs "Direct link to Related docs")

---

### Set up automatic exposures in Tableau

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Set up and automatically populate downstream exposures for supported BI tool integrations, like Tableau. Visualize and orchestrate them through [dbt Catalog](https://docs.getdbt.com/docs/explore/explore-projects) and the [dbt job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md) for a richer experience.

As a data team, it’s critical that you have context into the downstream use cases and users of your data products. By leveraging automatic downstream [exposures](https://docs.getdbt.com/docs/build/exposures.md), you can:

* Gain a better understanding of how models are used in downstream analytics, improving governance and decision-making.
* Reduce incidents and optimize workflows by linking upstream models to downstream dependencies.
* Automate exposure tracking for supported BI tools, ensuring lineage is always up to date.
* [Orchestrate exposures](https://docs.getdbt.com/docs/cloud-integrations/orchestrate-exposures.md) to refresh the underlying data sources during scheduled dbt jobs, improving timeliness and reducing costs.
Orchestrating exposures is a way to ensure that your BI tools are updated regularly using the [dbt job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md). See the [previous page](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md) for more info.

In dbt, you can configure downstream exposures in two ways:

* Manually — Declared [explicitly](https://docs.getdbt.com/docs/build/exposures.md#declaring-an-exposure) in your project’s YAML files.
* Automatic — dbt [creates and visualizes downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md) automatically for supported integrations, removing the need for manual YAML definitions. These downstream exposures are stored in dbt’s metadata system, appear in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), and behave like manual exposures. However, they don’t exist in YAML files.

Tableau Server

If you're using Tableau Server, you need to add the [dbt IP addresses for your region](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) to your allowlist.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

To configure automatic downstream exposures, make sure you meet the following prerequisites:

1. Your environment and jobs are on a supported [dbt release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).
2. You have a dbt account on the [Enterprise or Enterprise+ plan](https://www.getdbt.com/pricing/).
3. You have set up a [production](https://docs.getdbt.com/docs/deploy/deploy-environments.md#set-as-production-environment) deployment environment for each project you want to explore, with at least one successful job run.
4. You have [proper permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to edit dbt project or production environment settings.
5. Use Tableau as your BI tool and enable metadata permissions, or work with an admin to do so.
Compatible with Tableau Cloud or Tableau Server with the Metadata API enabled.
6. You have configured a [Tableau personal access token (PAT)](https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm) whose creator has permission to view data sources. The PAT inherits the permissions of its creator, so ensure the Tableau user who created the token has [Connect permissions](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_concepts_permissions.htm).

##### Considerations[​](#considerations "Direct link to Considerations")

Configuring automatic downstream exposures with Tableau has the following considerations:

* You can only connect to a single Tableau site on the same server.
* If you're using Tableau Server, you need to [allowlist dbt's IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your dbt region.
* Tableau dashboards built using custom SQL queries aren't supported.
* Downstream exposures sync automatically *once per day* or when a user updates the selected collections.
* The database fully qualified names (FQNs) in Tableau must match those in the dbt build. To view all expected dependencies in your exposure, the FQNs must match, but they aren't case-sensitive. For example:

| Tableau FQN | dbt FQN | Result |
| --- | --- | --- |
| `analytics.dbt_data_team.my_model` | `analytics.dbt_data_team.my_model` | ✅ Matches and dependencies will display as expected. |
| `analytics.dbt_data_team.my_model` | `prod_analytics.dbt_data_team.my_model` | ❌ Doesn't match and not all expected dependencies will display. |

To troubleshoot this:

1.
In dbt, download the `manifest.json` from the most recent production run that includes the missing dependencies by clicking on the **Artifacts** tab and scrolling to `manifest.json`.
2. Run the following [GraphiQL](https://help.tableau.com/current/api/metadata_api/en-us/docs/meta_api_start.html#explore-the-metadata-api-schema-using-graphiql) query. Make sure to run the query at `your_tableau_server/metadata/graphiql`, where `your_tableau_server` is the value you provided for the Server URL when [setting up your Tableau integration](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md#set-up-in-tableau):

   ```graphql
   query {
     workbooks {
       name
       uri
       id
       luid
       projectLuid
       projectName
       upstreamTables {
         id
         name
         schema
         database {
           name
           connectionType
         }
       }
     }
   }
   ```
3. Compare database FQNs between `manifest.json` and the GraphiQL response. Make sure that `{database}.{schema}.{name}` matches in both. The following images are examples of FQNs that *match* in both `manifest.json` and the GraphiQL response and aren't case-sensitive:

   [![manifest.json example with lowercase FQNs.](/img/docs/cloud-integrations/auto-exposures/manifest-json-example.png?v=2 "manifest.json example with lowercase FQNs.")](#)manifest.json example with lowercase FQNs.

   [![GraphiQL response example with uppercase FQNs.](/img/docs/cloud-integrations/auto-exposures/graphiql-example.png?v=2 "GraphiQL response example with uppercase FQNs.")](#)GraphiQL response example with uppercase FQNs.
4. If the FQNs don't match, update your Tableau FQNs to match the dbt FQNs.
5. If you're still experiencing issues, contact [dbt Support](mailto:support@getdbt.com) and share the results with them.

#### Set up downstream exposures[​](#set-up-downstream-exposures "Direct link to Set up downstream exposures")

Set up downstream exposures in [Tableau](#set-up-in-tableau) and [dbt](#set-up-in-dbt) to ensure that your BI tool's extracts are updated automatically.
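The FQN comparison in the troubleshooting steps earlier can also be scripted. The following is a minimal sketch (not an official dbt tool, and the function names are illustrative): it assumes you've downloaded `manifest.json` locally and copied the `{database}.{schema}.{name}` strings from the GraphiQL response into a Python list, then reports which Tableau FQNs have no case-insensitive match among the models in the manifest.

```python
import json


def dbt_model_fqns(manifest_path):
    """Collect lowercase database.schema.name FQNs for models in a dbt manifest.json."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    fqns = set()
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") == "model":
            # The relation name falls back to the model name when no alias is set.
            name = node.get("alias") or node.get("name")
            fqns.add(f"{node['database']}.{node['schema']}.{name}".lower())
    return fqns


def fqns_missing_from_dbt(tableau_fqns, manifest_path):
    """Return Tableau FQNs with no case-insensitive match in the dbt manifest."""
    known = dbt_model_fqns(manifest_path)
    return [fqn for fqn in tableau_fqns if fqn.lower() not in known]
```

Any FQN this returns is one to fix on the Tableau side (or a sign the model was built into a different database/schema than expected).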
##### Set up in Tableau[​](#set-up-in-tableau "Direct link to Set up in Tableau")

This section explains the steps to configure the integration in Tableau. A Tableau site admin must complete these steps. Once configured in both Tableau and [dbt](#set-up-in-dbt), you can [view downstream exposures](#view-downstream-exposures) in Catalog.

1. Enable [personal access tokens (PATs)](https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm) for your Tableau account.

   [![Enable PATs for the account in Tableau](/img/docs/cloud-integrations/auto-exposures/tableau-enable-pat.jpg?v=2 "Enable PATs for the account in Tableau")](#)Enable PATs for the account in Tableau
2. Create a PAT to add to dbt to pull in Tableau metadata for the downstream exposures. When creating the token, you must have permission to access collections/folders, as the PAT only grants access matching the creator's existing privileges.

   [![Create PATs for the account in Tableau](/img/docs/cloud-integrations/auto-exposures/tableau-create-pat.jpg?v=2 "Create PATs for the account in Tableau")](#)Create PATs for the account in Tableau
3. Copy the **Secret** and the **Token name** for use in a later step in dbt. The secret is displayed only once, so store it in a safe location (like a password manager).

   [![Copy the secret and token name to enter them in dbt](/img/docs/cloud-integrations/auto-exposures/tableau-copy-token.jpg?v=2 "Copy the secret and token name to enter them in dbt")](#)Copy the secret and token name to enter them in dbt
4. Copy the **Server URL** and **Sitename**. You can find these in the URL while logged into Tableau.
   [![Locate the Server URL and Sitename in Tableau](/img/docs/cloud-integrations/auto-exposures/tablueau-serverurl.jpg?v=2 "Locate the Server URL and Sitename in Tableau")](#)Locate the Server URL and Sitename in Tableau

   For example, if the full URL is `10az.online.tableau.com/#/site/dbtlabspartner/explore`:
   * The **Server URL** is the fully qualified domain name, in this case `10az.online.tableau.com`.
   * The **Sitename** is the path fragment right after `site` in the URL, in this case `dbtlabspartner`.
5. With the following items copied, you are now ready to set up downstream exposures in dbt:
   * Server URL
   * Sitename
   * Token name
   * Secret

##### Set up in dbt[​](#set-up-in-dbt "Direct link to Set up in dbt")

1. In dbt, navigate to the **Dashboard** of the project you want to add the downstream exposure to and then select **Settings**.
2. Under the **Exposures** section, select **Add lineage integration** to add the Tableau connection.

   [![Select Add lineage integration to add the Tableau connection.](/img/docs/cloud-integrations/auto-exposures/cloud-add-integration.png?v=2 "Select Add lineage integration to add the Tableau connection.")](#)Select Add lineage integration to add the Tableau connection.
3. Enter the details for the exposure connection you collected from Tableau in the [previous step](#set-up-in-tableau) and click **Continue**. Note that all fields are case-sensitive.

   [![Enter the details for the exposure connection.](/img/docs/cloud-integrations/auto-exposures/cloud-integration-details.png?v=2 "Enter the details for the exposure connection.")](#)Enter the details for the exposure connection.
4. Select the collections you want to include for the downstream exposures and click **Save**.
   [![Select the collections you want to include for the downstream exposures.](/img/docs/cloud-integrations/auto-exposures/cloud-select-collections.png?v=2 "Select the collections you want to include for the downstream exposures.")](#)Select the collections you want to include for the downstream exposures.

   info

   dbt automatically imports and syncs any workbook within the selected collections. New additions to the collections appear in the lineage in dbt once per day, after the daily sync and a job run. dbt immediately starts a sync when you update the selected collections list, capturing new workbooks and removing irrelevant ones.
5. dbt imports everything in the collection(s), and you can continue to [view them](#view-downstream-exposures) in Catalog.

   [![View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.](/img/docs/cloud-integrations/auto-exposures/explorer-lineage2.jpg?v=2 "View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.")](#)View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.

#### View downstream exposures[​](#view-downstream-exposures "Direct link to View downstream exposures")

After setting up downstream exposures in dbt, you can view them in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) for a richer experience. Navigate to Catalog by selecting **Catalog** from the top-level navigation.

From the **Overview** page, you can view downstream exposures from a couple of places:

* [Exposures menu](#exposures-menu)
* [File tree](#file-tree)
* [Project lineage](#project-lineage)

##### Exposures menu[​](#exposures-menu "Direct link to Exposures menu")

View downstream exposures from the **Exposures** menu item under **Resources**. This menu provides a comprehensive list of all the exposures so you can quickly access and manage them. The menu displays the following information:

* **Name**: The name of the exposure.
* **Health**: The [data health signal](https://docs.getdbt.com/docs/explore/data-health-signals.md) of the exposure.
* **Type**: The type of exposure, such as `dashboard` or `notebook`.
* **Owner**: The owner of the exposure.
* **Owner email**: The email address of the exposure's owner.
* **Integration**: The BI tool that the exposure is integrated with.
* **Exposure mode**: Whether the exposure was defined automatically (**Auto**) or manually (**Manual**).

[![View from the dbt Catalog under the project menu.](/img/docs/cloud-integrations/auto-exposures/explorer-view-resources.png?v=2 "View from the dbt Catalog under the project menu.")](#)View from the dbt Catalog under the project menu.

##### File tree[​](#file-tree "Direct link to File tree")

Locate exposures directly within the **File tree** under the **imported\_from\_tableau** sub-folder. This view integrates exposures seamlessly with your project files, making it easy to find and reference them from your project's structure.

[![View from the dbt Catalog under the 'File tree' menu.](/img/docs/cloud-integrations/auto-exposures/explorer-view-file-tree.jpg?v=2 "View from the dbt Catalog under the 'File tree' menu.")](#)View from the dbt Catalog under the 'File tree' menu.

##### Project lineage[​](#project-lineage "Direct link to Project lineage")

View exposures from the **Project lineage** view, which visualizes the dependencies and relationships in your project. Exposures are represented with the Tableau icon, offering an intuitive way to see how they fit into your project's overall data flow.

[![View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.](/img/docs/cloud-integrations/auto-exposures/explorer-lineage2.jpg?v=2 "View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.")](#)View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.
[![View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.](/img/docs/cloud-integrations/auto-exposures/explorer-lineage.jpg?v=2 "View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.")](#)View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.

#### Orchestrate exposures [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#orchestrate-exposures- "Direct link to orchestrate-exposures-")

[Orchestrate exposures](https://docs.getdbt.com/docs/cloud-integrations/orchestrate-exposures.md) using the dbt [Cloud job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md) to proactively refresh the underlying data sources (extracts) that power your Tableau workbooks.

* Orchestrating exposures with a `dbt build` job ensures that downstream exposures, like Tableau extracts, are updated regularly and automatically.
* You can control the frequency of these refreshes by configuring environment variables.

To set up and proactively run exposures with the dbt job scheduler, refer to [Orchestrate exposures](https://docs.getdbt.com/docs/cloud-integrations/orchestrate-exposures.md).
---

### Set up Cost Insights

[Private beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

This guide walks you through setting up Cost Insights to track warehouse compute costs and cost reductions from state-aware orchestration across your dbt projects and models.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

Before setting up Cost Insights, ensure you have:

* A dbt account with the dbt Fusion engine enabled. Contact your account manager to enable Fusion for your account.
* An administrator role.
* A supported data warehouse: Snowflake, BigQuery, or Databricks.

To set up Cost Insights, follow these steps:

1. [Assign required permissions.](#assign-required-permissions)
2. [Configure platform metadata credentials.](#configure-platform-metadata-credentials)
3. [(Optional) Configure Cost Insights settings.](#configure-cost-insights-settings-optional)
4. [(Optional) Enable state-aware orchestration in your job settings.](#enable-state-aware-orchestration-optional)

After completing these setup steps, you can view cost and optimization data across multiple areas of the dbt platform. Refer to [Explore cost data](https://docs.getdbt.com/docs/explore/explore-cost-data.md) to learn more about the Cost Insights section and how to use it.
#### Assign required permissions[​](#assign-required-permissions "Direct link to Assign required permissions")

Users with the following [permission sets](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) can view cost data by default:

* Account Admin
* Account Viewer
* Cost Insights Admin
* Cost Insights Viewer
* Database Admin
* Git Admin
* Job Admin
* Project Creator
* Team Admin

For more information on how to assign permissions to users, refer to [About user access](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md).

#### Configure platform metadata credentials[​](#configure-platform-metadata-credentials "Direct link to Configure platform metadata credentials")

1. Click your account name at the bottom of the left-side menu and click **Account settings**.
2. Under **Settings**, go to **Connections**.
3. Select an existing connection or create a new connection for the project where you want to enable Cost Insights.
4. Enable platform metadata credentials for your connection.
   1. Go to the **Platform metadata credentials** section and click **Add credentials**.
   2. Add credentials with permissions to the warehouse tables. Expand each connection to see the permissions required.
      **Snowflake**

      * `read` permissions to the [`ORGANIZATION_USAGE`](https://docs.snowflake.com/en/sql-reference/organization-usage) and [`ACCOUNT_USAGE`](https://docs.snowflake.com/en/sql-reference/account-usage) schemas
      * A Snowflake database role assigned the following access:
        * `ACCOUNT_USAGE.QUERY_HISTORY`
        * `ACCOUNT_USAGE.QUERY_ATTRIBUTION_HISTORY`
        * `ACCOUNT_USAGE.ACCESS_HISTORY`
        * `ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY`
        * `ORGANIZATION_USAGE.USAGE_IN_CURRENCY_DAILY` (optional)

      **BigQuery**

      * `bigquery.datasets.get`
      * `bigquery.jobs.create`
      * `bigquery.jobs.listAll`

      **Databricks**

      * Access to a [Unity Catalog workspace](https://docs.databricks.com/aws/en/admin/system-tables/#requirements)
      * `USE` permissions on the catalog and schema
      * `SELECT` permissions on the following system tables:
        * [`system.billing`](https://docs.databricks.com/aws/en/admin/system-tables/billing)
        * [`system.pricing`](https://docs.databricks.com/aws/en/admin/system-tables/pricing)
        * [`system.query_history`](https://docs.databricks.com/aws/en/admin/system-tables/query-history)

      For more information, see the Databricks documentation on [granting access to system tables](https://docs.databricks.com/aws/en/admin/system-tables/#grant-access-to-system-tables).

      If you have multiple connections that reference the same account identifier, you will only be prompted to add platform metadata credentials to one of them. Other connections using the same account identifier will display a message indicating that credentials are already configured.
5. Verify that **Cost Insights** is enabled under **Features**. This feature is enabled by default when you configure platform metadata credentials.
6. Click **Save**.

#### Configure Cost Insights settings (optional)[​](#configure-cost-insights-settings-optional "Direct link to Configure Cost Insights settings (optional)")

By default, dbt uses standard warehouse pricing.
If you have custom pricing contracts, you can override these values, *except* for Databricks connections. The default values vary by warehouse:

| Warehouse | Default values |
| --- | --- |
| [Snowflake](https://www.snowflake.com/en/pricing-options/) | `price_per_credit` = $3 |
| [BigQuery](https://cloud.google.com/bigquery/pricing) | `price_per_slot_hour` = $0.04, `price_per_tib` = $6.25 |
| [Databricks](https://docs.databricks.com/aws/en/admin/system-tables/pricing) | dbt queries the `list_prices` system table directly, so there is no default value. |
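As a rough illustration of how these defaults translate into dollar amounts (the usage figures below are hypothetical, and the sketch assumes the default rates rather than custom contract pricing):

```python
# Default rates from the table above; override these if you have custom pricing.
PRICE_PER_CREDIT = 3.00      # Snowflake, USD per credit
PRICE_PER_SLOT_HOUR = 0.04   # BigQuery, USD per slot-hour
PRICE_PER_TIB = 6.25         # BigQuery, USD per TiB scanned


def snowflake_cost(credits_used: float) -> float:
    """Estimated cost of a run that consumed the given number of credits."""
    return credits_used * PRICE_PER_CREDIT


def bigquery_slot_cost(slot_hours: float) -> float:
    """Estimated cost of a query measured in slot-hours."""
    return slot_hours * PRICE_PER_SLOT_HOUR


def bigquery_scan_cost(tib_scanned: float) -> float:
    """Estimated on-demand cost of a query that scanned the given TiB."""
    return tib_scanned * PRICE_PER_TIB


# A job consuming 2.5 Snowflake credits costs 2.5 * $3 = $7.50.
print(snowflake_cost(2.5))      # 7.5
# A BigQuery query scanning 0.8 TiB costs 0.8 * $6.25 = $5.00.
print(bigquery_scan_cost(0.8))  # 5.0
```

Entering your own contract rate in the settings below simply replaces the constants in this calculation.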
To change the default value:

1. Click your account name at the bottom of the left-side menu and click **Account settings**.
2. Under **Settings**, go to **Connections**.
3. Select the connection where you want to configure Cost Insights settings.
4. Go to the **Cost Insights settings** section.
5. Enter your custom value in the **Price per credit** field.
6. Click **Save**.

These custom values will apply to all future cost calculations for this connection. If you clear these values, they will reset to the default warehouse pricing.

#### Enable state-aware orchestration (optional)[​](#enable-state-aware-orchestration-optional "Direct link to Enable state-aware orchestration (optional)")

Cost Insights displays cost data for your dbt models and jobs without state-aware orchestration. However, to understand the impact of optimizations and see cost reductions from model and test reuse, you must enable state-aware orchestration in your jobs. For steps on how to enable this feature, see [Setting up state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-setup.md).

note

For accounts already using state-aware orchestration before Cost Insights is enabled, at least one full model build must occur within the last 10 days to establish a baseline for cost reduction calculations. If you don't see cost reduction data, try running a full build to establish the baseline.

#### Disable Cost Insights[​](#disable-cost-insights "Direct link to Disable Cost Insights")

To disable Cost Insights, you must have an administrator role.

1. Click your account name at the bottom of the left-side menu and click **Account settings**.
2. Under **Settings**, go to **Connections**.
3. Select the connection where you want to disable Cost Insights.
4. Go to **Platform metadata credentials** and click **Edit**.
5. Go to the **Features** section and clear the **Cost Insights** option.
6. Click **Save**.
---

### Set up local MCP

[The local dbt MCP server](https://github.com/dbt-labs/dbt-mcp) runs locally on your machine and supports dbt Core, the dbt Fusion engine, and the dbt CLI. You can use it with or without a dbt platform account.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* [Install uv](https://docs.astral.sh/uv/getting-started/installation/) so you can run `dbt-mcp` and its [related dependencies](https://github.com/dbt-labs/dbt-mcp/blob/main/pyproject.toml) in an isolated virtual environment.
* Have a local dbt project (if you want to use dbt CLI commands).

#### Setup options[​](#setup-options "Direct link to Setup options")

Choose the setup method that best fits your workflow:

##### OAuth authentication with dbt platform [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#oauth-authentication-with-dbt-platform- "Direct link to oauth-authentication-with-dbt-platform-")

This method uses OAuth to authenticate with your dbt platform account. It's the simplest setup and doesn't require managing tokens or environment variables manually.

Static subdomains required

Only accounts with static subdomains (for example, `abc123` in `abc123.us1.dbt.com`) can use OAuth with MCP servers. Follow [these instructions](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) to find your account subdomain. If your account does not have a subdomain, contact support for more information.
###### Configuration options[​](#configuration-options "Direct link to Configuration options")

**dbt platform only**

This option is for users who only want dbt platform features (Discovery API, Semantic Layer, job management) without local CLI commands. When you use only the dbt platform, the CLI tools are automatically disabled. You can find the `DBT_HOST` field value in your dbt platform account information under **Access URLs**.

```json
{
  "mcpServers": {
    "dbt": {
      "command": "uvx",
      "args": ["dbt-mcp"],
      "env": {
        "DBT_HOST": "https://"
      }
    }
  }
}
```

**Note:** Replace the `DBT_HOST` placeholder with your actual host (for example, `abc123.us1.dbt.com`). This enables OAuth authentication without requiring local dbt installation.

**dbt platform + CLI**

This option is for users who want both dbt CLI commands and dbt platform features (Discovery API, Semantic Layer, job management). The `DBT_PROJECT_DIR` and `DBT_PATH` fields are required for CLI access. You can find the `DBT_HOST` field value in your dbt platform account information under **Access URLs**.

```json
{
  "mcpServers": {
    "dbt": {
      "command": "uvx",
      "args": ["dbt-mcp"],
      "env": {
        "DBT_HOST": "https://",
        "DBT_PROJECT_DIR": "/path/to/project",
        "DBT_PATH": "/path/to/dbt/executable"
      }
    }
  }
}
```

**Note:** Replace the `DBT_HOST` placeholder with your actual host (for example, `https://abc123.us1.dbt.com`). This enables OAuth authentication.

Once configured, your session connects to the dbt platform account, starts the OAuth authentication workflow, and then opens your account where you can select the project you want to reference.

[![Select your dbt platform project](/img/mcp/select-project.png?v=2 "Select your dbt platform project")](#)Select your dbt platform project

After completing OAuth setup, skip to [Test your configuration](#optional-test-your-configuration).
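Before launching your MCP client, you can sanity-check a config like the ones above by parsing it and confirming the placeholder host was filled in. This is a hypothetical helper (not part of `dbt-mcp`), sketched under the assumption that your config follows the `mcpServers` JSON shape shown earlier:

```python
import json


def check_dbt_mcp_config(raw: str) -> list:
    """Return a list of problems found in an MCP client config for the dbt server."""
    problems = []
    cfg = json.loads(raw)  # raises ValueError if the JSON itself is invalid
    dbt = cfg.get("mcpServers", {}).get("dbt")
    if dbt is None:
        return ["no 'dbt' entry under 'mcpServers'"]
    env = dbt.get("env", {})
    host = env.get("DBT_HOST", "")
    if host in ("", "https://"):
        problems.append("DBT_HOST is still a placeholder; set your actual host")
    return problems
```

For example, `check_dbt_mcp_config(open(config_path).read())` returns an empty list when the config looks complete; trailing commas in the JSON (which some editors insert) will surface as a parse error here rather than a silent failure in your client.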
##### CLI only (no dbt platform)[​](#cli-only-no-dbt-platform "Direct link to CLI only (no dbt platform)")

This option runs the MCP server locally and connects it to your local dbt project using `DBT_PROJECT_DIR` and `DBT_PATH`. If you're using the dbt Core or Fusion CLI and don't need access to dbt platform features (Discovery API, Semantic Layer, Administrative API), you can set up local MCP with just your dbt project information.

Add this configuration to your MCP client (refer to the specific [integration guides](#set-up-your-mcp-client) for exact file locations):

```json
{
  "mcpServers": {
    "dbt": {
      "command": "uvx",
      "args": ["dbt-mcp"],
      "env": {
        "DBT_PROJECT_DIR": "/path/to/your/dbt/project",
        "DBT_PATH": "/path/to/your/dbt/executable"
      }
    }
  }
}
```

###### Locating your paths[​](#locating-your-paths "Direct link to Locating your paths")

Follow the appropriate instructions for your OS to locate your paths:

**macOS/Linux**

* **DBT\_PROJECT\_DIR**: The full path to your dbt project folder.
  * Example: `/Users/yourname/dbt-projects/my_project`
  * This is the folder containing your `dbt_project.yml` file.
* **DBT\_PATH**: Find your dbt executable path by running this in a terminal:

  ```bash
  which dbt
  ```

  * Example output: `/opt/homebrew/bin/dbt`
  * Use this exact path in your configuration.

**Windows**

* **DBT\_PROJECT\_DIR**: The full path to your dbt project folder.
  * Example: `C:\Users\yourname\dbt-projects\my_project`
  * This is the folder containing your `dbt_project.yml` file.
  * Use forward slashes or escaped backslashes: `C:/Users/yourname/dbt-projects/my_project`
* **DBT\_PATH**: Find your dbt executable path by running this in Command Prompt or PowerShell:

  ```bash
  where dbt
  ```

  * Example output: `C:\Python39\Scripts\dbt.exe`
  * Use forward slashes or escaped backslashes: `C:/Python39/Scripts/dbt.exe`

After completing this setup, skip to [Test your configuration](#optional-test-your-configuration).
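If you'd rather not run `which`/`where` by hand, the same lookup works portably from Python's standard library. This is just a convenience sketch: `shutil.which` mirrors those shell commands, and the project check simply looks for the `dbt_project.yml` file mentioned above.

```python
import shutil
from pathlib import Path

# Equivalent of `which dbt` on macOS/Linux or `where dbt` on Windows.
dbt_path = shutil.which("dbt")
print("DBT_PATH:", dbt_path or "dbt not found on PATH")


def is_dbt_project(path: str) -> bool:
    """A directory is a valid DBT_PROJECT_DIR if it contains dbt_project.yml."""
    return (Path(path) / "dbt_project.yml").is_file()
```

Whatever path it prints is the value to paste into `DBT_PATH`, already in a slash style your MCP config can use.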
##### Environment variable configuration[​](#environment-variable-configuration "Direct link to Environment variable configuration")

If you need to configure multiple environment variables or prefer to manage them separately, you can put them in an environment file. If you're only using the dbt CLI commands, you don't need to supply the dbt platform-specific environment variables, and vice versa. Here is an example of the file:

```code
DBT_HOST=cloud.getdbt.com
DBT_PROD_ENV_ID=your-production-environment-id
DBT_DEV_ENV_ID=your-development-environment-id
DBT_USER_ID=your-user-id
DBT_ACCOUNT_ID=your-account-id
DBT_TOKEN=your-service-token
DBT_PROJECT_DIR=/path/to/your/dbt/project
DBT_PATH=/path/to/your/dbt/executable
DBT_HOST_PREFIX=your-account-prefix
```

You will need this file for integrating with MCP-compatible tools.

#### API and SQL tool settings[​](#api-and-sql-tool-settings "Direct link to API and SQL tool settings")

| Environment Variable | Required | Description |
| --- | --- | --- |
| `DBT_HOST` | Required | Your dbt platform [instance hostname](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md). **Important:** For multi-cell accounts, exclude the account prefix from the hostname and set it in `DBT_HOST_PREFIX` instead. The default is `cloud.getdbt.com`. |
| `DBT_HOST_PREFIX` | Only required for multi-cell instances | Your account prefix (for example, `abc123` from `abc123.us1.dbt.com`). Don't include this in `DBT_HOST`. If you're not using a multi-cell account, don't set this value. You can learn more about regions and hosting [here](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md). |
| `DBT_TOKEN` | Required | Your personal access token or service token from the dbt platform. **Note**: When using the Semantic Layer, it is recommended to use a personal access token. If you're using a service token, make sure that it has at least `Semantic Layer Only`, `Metadata Only`, and `Developer` permissions. |
| `DBT_ACCOUNT_ID` | Required for Administrative API tools | Your [dbt account ID](https://docs.getdbt.com/faqs/Accounts/find-user-id.md) |
| `DBT_PROD_ENV_ID` | Required | Your dbt platform production environment ID |
| `DBT_DEV_ENV_ID` | Optional | Your dbt platform development environment ID |
| `DBT_USER_ID` | Optional | Your dbt platform user ID ([docs](https://docs.getdbt.com/faqs/Accounts/find-user-id.md)) |

**Multi-cell configuration examples:**

✅ **Correct configuration:**

```bash
DBT_HOST=us1.dbt.com
DBT_HOST_PREFIX=abc123
```

❌ **Incorrect configuration (common mistake):**

```bash
DBT_HOST=abc123.us1.dbt.com  # Don't include the prefix in the host!
# DBT_HOST_PREFIX not set
```

If your full URL is `abc123.us1.dbt.com`, separate it as:

* `DBT_HOST=us1.dbt.com`
* `DBT_HOST_PREFIX=abc123`

#### dbt CLI settings[​](#dbt-cli-settings "Direct link to dbt CLI settings")

The local dbt-mcp supports all flavors of dbt, including dbt Core and the dbt Fusion engine.

| Environment Variable | Required | Description | Example |
| --- | --- | --- | --- |
| `DBT_PROJECT_DIR` | Required | The full path to where the repository of your dbt project is hosted locally. This is the folder containing your `dbt_project.yml` file. | macOS/Linux: `/Users/myname/reponame` Windows: `C:/Users/myname/reponame` |
| `DBT_PATH` | Required | The full path to your dbt executable (dbt Core/Fusion/dbt CLI). See [Locating your paths](#locating-your-paths) for how to find this. | macOS/Linux: `/opt/homebrew/bin/dbt` |
Windows: `C:/Python39/Scripts/dbt.exe` | | DBT\_CLI\_TIMEOUT | Optional | Configure the number of seconds before your agent will timeout dbt CLI commands. | Defaults to 60 seconds. | Search table... | | | | | | | ---------------- | - | - | - | - | | Loading table... | | | | | ##### Locating your `DBT_PATH`[​](#locating-your-dbt_path "Direct link to locating-your-dbt_path") Follow the instructions for your OS to locate your `DBT_PATH`:  macOS/Linux Run this command in your Terminal: ```bash which dbt ``` Example output: `/opt/homebrew/bin/dbt`  Windows Run this command in Command Prompt or PowerShell: ```bash where dbt ``` Example output: `C:\Python39\Scripts\dbt.exe` **Note:** Use forward slashes in your configuration: `C:/Python39/Scripts/dbt.exe` **Additional notes:** * You can set any environment variable supported by your dbt executable, like [the ones supported in dbt Core](https://docs.getdbt.com/reference/global-configs/about-global-configs.md#available-flags). * dbt MCP respects the standard environment variables and flags for usage tracking mentioned [here](https://docs.getdbt.com/reference/global-configs/usage-stats.md). * `DBT_WARN_ERROR_OPTIONS='{"error": ["NoNodesForSelectionCriteria"]}'` is automatically set so that the MCP server knows if no node is selected when running a dbt command. You can overwrite it if needed, but it provides a better experience when calling dbt from the MCP server, ensuring the tool selects valid nodes. #### Disabling tools[​](#disabling-tools "Direct link to Disabling tools") You can disable the following tool access on the local `dbt-mcp`: | Name | Default | Description | | ----------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | | `DISABLE_DBT_CLI` | `false` | Set this to `true` to disable dbt Core, dbt CLI, and dbt Fusion MCP tools. 
| | `DISABLE_SEMANTIC_LAYER` | `false` | Set this to `true` to disable dbt Semantic Layer MCP tools. | | `DISABLE_DISCOVERY` | `false` | Set this to `true` to disable dbt Discovery API MCP tools. | | `DISABLE_ADMIN_API` | `false` | Set this to `true` to disable dbt Administrative API MCP tools. | | `DISABLE_SQL` | `true` | Set this to `false` to enable SQL MCP tools. | | `DISABLE_DBT_CODEGEN` | `true` | Set this to `false` to enable [dbt codegen MCP tools](https://docs.getdbt.com/docs/dbt-ai/about-mcp.md#codegen-tools) (requires dbt-codegen package). | | `DISABLE_LSP` | `false` | Set this to `true` to disable dbt LSP/Fusion MCP tools. | | `DISABLE_MCP_SERVER_METADATA` | `true` | Set this to `false` to enable MCP server metadata tools (like `get_mcp_server_version`). | | `DISABLE_TOOLS` | "" | Set this to a list of tool names delimited by a `,` to disable specific tools. | Search table... | | | | | | | ---------------- | - | - | - | - | | Loading table... | | | | | ###### Using environment variables in your MCP client configuration[​](#using-environment-variables-in-your-mcp-client-configuration "Direct link to Using environment variables in your MCP client configuration") The recommended way to configure your MCP client is to use the `env` field in your JSON configuration file. This keeps all configuration in one file: ```json { "mcpServers": { "dbt": { "command": "uvx", "args": ["dbt-mcp"], "env": { "DBT_HOST": "cloud.getdbt.com", "DBT_TOKEN": "your-token-here", "DBT_PROD_ENV_ID": "12345", "DBT_PROJECT_DIR": "/path/to/project", "DBT_PATH": "/path/to/dbt" } } } } ``` ###### Using an `.env` file[​](#using-an-env-file "Direct link to using-an-env-file") If you prefer to manage environment variables in a separate file, you can create an `.env` file and reference it: ```json { "mcpServers": { "dbt": { "command": "uvx", "args": ["--env-file", "/path/to/.env", "dbt-mcp"] } } } ``` However, this approach requires managing two files instead of one. 
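Before wiring the server into an MCP client, it can help to sanity-check that your `.env` file is complete and that a multi-cell host isn't misconfigured. The following Python sketch is illustrative only (it is not part of dbt-mcp; the helper names are invented): it parses `KEY=value` lines and applies the rules from the tables above, including the rule that `DBT_HOST` must not contain the account prefix.

```python
# Hypothetical helper for checking a dbt-mcp .env file. The rules mirror the
# tables above, but none of these function names come from dbt-mcp itself.

REQUIRED = ["DBT_HOST", "DBT_TOKEN", "DBT_PROD_ENV_ID", "DBT_PROJECT_DIR", "DBT_PATH"]

def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def check_env(env: dict) -> list:
    """Return a list of problems found in the configuration."""
    problems = [f"missing {key}" for key in REQUIRED if not env.get(key)]
    prefix = env.get("DBT_HOST_PREFIX")
    host = env.get("DBT_HOST", "")
    if prefix and host.startswith(prefix + "."):
        # Multi-cell rule: the prefix belongs in DBT_HOST_PREFIX, not DBT_HOST.
        problems.append("DBT_HOST must not include the account prefix")
    return problems

env = parse_env("""
DBT_HOST=abc123.us1.dbt.com
DBT_HOST_PREFIX=abc123
DBT_TOKEN=your-service-token
DBT_PROD_ENV_ID=12345
DBT_PROJECT_DIR=/path/to/project
DBT_PATH=/path/to/dbt
""")
print(check_env(env))  # → ['DBT_HOST must not include the account prefix']
```

Running a check like this before launching `uvx dbt-mcp` surfaces the common multi-cell mistake shown above without waiting for a connection failure.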
#### (Optional) Test your configuration[​](#optional-test-your-configuration "Direct link to (Optional) Test your configuration")

In your command line tool, run the following to test your setup:

**If using the `env` field in JSON:**

```bash
export DBT_PROJECT_DIR=/path/to/project
export DBT_PATH=/path/to/dbt
uvx dbt-mcp
```

**If using an `.env` file:**

```bash
uvx --env-file /path/to/.env dbt-mcp
```

If there are no errors, your configuration is correct.

#### Set up your MCP client[​](#set-up-your-mcp-client "Direct link to Set up your MCP client")

After completing your configuration, follow the specific integration guide for your chosen tool:

* [Claude](https://docs.getdbt.com/docs/dbt-ai/integrate-mcp-claude.md)
* [Cursor](https://docs.getdbt.com/docs/dbt-ai/integrate-mcp-cursor.md)
* [VS Code](https://docs.getdbt.com/docs/dbt-ai/integrate-mcp-vscode.md)

#### Debug configurations[​](#debug-configurations "Direct link to Debug configurations")

These settings allow you to customize the MCP server’s logging level to help with diagnosing and troubleshooting.

| Name | Default | Description |
| --- | --- | --- |
| `DBT_MCP_LOG_LEVEL` | `INFO` | Environment variable to override the MCP server log level. Options are: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. |

To see more detail about what’s happening inside the MCP server and help debug issues, you can temporarily set the log level to `DEBUG`. We recommend setting it temporarily to avoid filling up disk space with logs.

#### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting")

###### Can't find `uvx` executable[​](#cant-find-uvx-executable "Direct link to cant-find-uvx-executable")

Some MCP clients may be unable to find `uvx` from the JSON config.
This will result in error messages like `Could not connect to MCP server dbt-mcp`, `Error: spawn uvx ENOENT`, or similar.

**Solution:** Locate the full path to `uvx` and use it in your configuration:

* **macOS/Linux:** Run `which uvx` in your Terminal.
* **Windows:** Run `where uvx` in CMD or PowerShell.

Then update your JSON configuration to use the full path (for example, `/opt/homebrew/bin/uvx` on macOS with Homebrew):

```json
{
  "mcpServers": {
    "dbt": {
      "command": "/full/path/to/uvx",
      "args": ["dbt-mcp"],
      "env": { ... }
    }
  }
}
```

##### OAuth login not initiating[​](#oauth-login-not-initiating "Direct link to Oauth login not initiating")

dbt MCP uses a lock file to avoid repeated authentication. In some cases, this can block authentication entirely. If this happens, close your client (for example, Cursor or Claude) and delete the local dbt MCP config files to reset the auth flow.

* On macOS and Linux, run: `rm -f ~/.dbt/mcp.yml ~/.dbt/mcp.lock`
* On [Windows](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.management/remove-item?view=powershell-7.5), run: `Remove-Item -Force $env:USERPROFILE\.dbt\mcp.yml, $env:USERPROFILE\.dbt\mcp.lock`

---

### Set up remote MCP

The remote MCP server uses an HTTP connection and makes calls to dbt-mcp hosted on the cloud-based dbt platform. This setup requires no local installation and is ideal for data consumption use cases.
#### When to use remote MCP[​](#when-to-use-remote-mcp "Direct link to When to use remote MCP")

The remote MCP server is the ideal choice when:

* You don't want to or are restricted from installing additional software (`uvx`, `dbt-mcp`) on your system.
* Your primary use case is *consumption-based*: querying metrics, exploring metadata, viewing lineage.
* You need access to the Semantic Layer and Discovery APIs without maintaining a local dbt project.
* You don't need to execute CLI commands.

Remote MCP does not support dbt CLI commands (`dbt run`, `dbt build`, `dbt test`, and more). If you need to execute dbt CLI commands, use the [local MCP server](https://docs.getdbt.com/docs/dbt-ai/setup-local-mcp.md) instead.

info

Only [`text_to_sql`](#sql) consumes dbt Copilot credits. Other MCP tools do not. When your account runs out of Copilot credits, the remote MCP server blocks all tools that run through it, even tools invoked from a local MCP server and [proxied](https://github.com/dbt-labs/dbt-mcp/blob/main/src/dbt_mcp/tools/toolsets.py#L24) to remote MCP (like SQL and remote Fusion tools). The tools remain blocked until your Copilot credits reset. If you need help, please reach out to your account manager.

#### Setup instructions[​](#setup-instructions "Direct link to Setup instructions")

1. Ensure that you have [AI features](https://docs.getdbt.com/docs/cloud/enable-dbt-copilot) turned on.
2. Obtain the following information from the dbt platform:
   * **dbt Cloud host**: Use this to form the full URL. For example, replace `YOUR_DBT_HOST_URL` here: `https://YOUR_DBT_HOST_URL/api/ai/v1/mcp/`. It may look like: `https://cloud.getdbt.com/api/ai/v1/mcp/`. If you have a multi-cell account, the host URL will be in the `ACCOUNT_PREFIX.us1.dbt.com` format. For more information, refer to [Access, Regions, & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md).
   * **Production environment ID**: You can find this on the **Orchestration** page in the dbt platform. Use this to set an `x-dbt-prod-environment-id` header.
   * **Token**: Generate either a personal access token or a service token. To fully utilize remote MCP, the token must have Semantic Layer and Developer permissions. Note: to use functionality that requires the `x-dbt-user-id` header, a personal access token is required.
3. For the remote MCP, you will pass headers through the JSON blob to configure required fields:

   **Configuration for APIs and SQL tools**

   | Header | Required | Description |
   | --- | --- | --- |
   | `Authorization` | Required | Your [personal access token (PAT)](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) or [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) from the dbt platform. **Note**: When using the Semantic Layer, we recommend using a PAT. If you're using a service token, make sure that it has at least `Semantic Layer Only`, `Metadata Only`, and `Developer` permissions. The value must be in the format `Token YOUR_DBT_ACCESS_TOKEN` or `Bearer YOUR_DBT_ACCESS_TOKEN`, replacing `YOUR_DBT_ACCESS_TOKEN` with your actual token. |
   | `x-dbt-prod-environment-id` | Required | Your dbt platform production environment ID |

   **Additional configuration for SQL tools**

   | Header | Required | Description |
   | --- | --- | --- |
   | `x-dbt-dev-environment-id` | Required for `execute_sql` | Your dbt platform development environment ID |
   | `x-dbt-user-id` | Required for `execute_sql` | Your dbt platform user ID ([see docs](https://docs.getdbt.com/faqs/Accounts/find-user-id.md)) |

   **Additional configuration for Fusion tools**

   Fusion tools, by default, defer to the environment provided via `x-dbt-prod-environment-id` for model and table metadata.

   | Header | Required | Description |
   | --- | --- | --- |
   | `x-dbt-dev-environment-id` | Required | Your dbt platform development environment ID |
   | `x-dbt-user-id` | Required | Your dbt platform user ID ([see docs](https://docs.getdbt.com/faqs/Accounts/find-user-id.md)) |
   | `x-dbt-fusion-disable-defer` | Optional | Default: `false`. When set to `true`, Fusion tools will not defer to the production environment and will instead use the models and table metadata from the development environment (`x-dbt-dev-environment-id`). |

   **Configuration to disable tools**

   | Header | Required | Description |
   | --- | --- | --- |
   | `x-dbt-disable-tools` | Optional | A comma-separated list of tools to disable. For instance: `get_all_models,text_to_sql,list_entities` |
   | `x-dbt-disable-toolsets` | Optional | A comma-separated list of toolsets to disable. For instance: `semantic_layer,sql,discovery` |
4. After establishing which headers you need, you can follow the [examples](https://github.com/dbt-labs/dbt-mcp/tree/main/examples) to create your own agent. The MCP protocol is programming language and framework agnostic, so use whatever helps you build agents. Alternatively, you can connect the remote dbt MCP server to MCP clients that support header-based authentication. You can use this example Cursor configuration, replacing `YOUR_DBT_HOST_URL`, `YOUR_DBT_ACCESS_TOKEN`, `PROD-ID`, `USER-ID`, and `DEV-ID` with your information:

```json
{
  "mcpServers": {
    "dbt": {
      "url": "https://YOUR_DBT_HOST_URL/api/ai/v1/mcp/",
      "headers": {
        "Authorization": "Token YOUR_DBT_ACCESS_TOKEN",
        "x-dbt-prod-environment-id": "PROD-ID",
        "x-dbt-user-id": "USER-ID",
        "x-dbt-dev-environment-id": "DEV-ID"
      }
    }
  }
}
```
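If you build your own agent instead of using a client configuration, the URL and header rules above are easy to centralize in one helper. The following Python sketch is illustrative (the function name and error handling are not from dbt-mcp); it assembles the endpoint URL and headers, enforcing that SQL tools get both `x-dbt-dev-environment-id` and `x-dbt-user-id`:

```python
# A minimal sketch (not part of dbt-mcp) that assembles the remote MCP URL and
# request headers described in the tables above.

def remote_mcp_config(host, token, prod_env_id, dev_env_id=None, user_id=None,
                      use_sql_tools=False):
    """Build the endpoint URL and request headers for the remote dbt MCP server."""
    if use_sql_tools and not (dev_env_id and user_id):
        # execute_sql requires both x-dbt-dev-environment-id and x-dbt-user-id.
        raise ValueError("SQL tools need a development environment ID and a user ID")
    headers = {
        "Authorization": f"Token {token}",
        "x-dbt-prod-environment-id": str(prod_env_id),
    }
    if dev_env_id:
        headers["x-dbt-dev-environment-id"] = str(dev_env_id)
    if user_id:
        headers["x-dbt-user-id"] = str(user_id)
    return f"https://{host}/api/ai/v1/mcp/", headers

url, headers = remote_mcp_config("cloud.getdbt.com", "YOUR_DBT_ACCESS_TOKEN", 12345)
print(url)  # → https://cloud.getdbt.com/api/ai/v1/mcp/
```

You would then pass the resulting URL and headers to whatever MCP client library your agent framework uses.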
---

### Set up the dbt Snowflake Native App

[Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

The [dbt Snowflake Native App](https://docs.getdbt.com/docs/cloud-integrations/snowflake-native-app.md) enables these features within the Snowflake user interface: Catalog, the **Copilot** chatbot, and dbt's orchestration observability features.

Configure both dbt and Snowflake to set up this integration. The high-level steps are as follows:

1. Set up the **Copilot** configuration.
2. Configure Snowflake.
3. Configure dbt.
4. Purchase and install the dbt Snowflake Native App.
5. Configure the app.
6. Verify successful installation of the app.
7. Onboard new users to the app.

The order of the steps is slightly different if you purchased the public listing of the Native App; you'll start by purchasing the Native App, satisfying the prerequisites, and then completing the remaining steps in order.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

The following are the prerequisites for dbt and Snowflake.

##### dbt[​](#dbt "Direct link to dbt")

* You must have a dbt account on an Enterprise-tier plan that's in an AWS or Azure region. If you don't already have one, please [contact us](mailto:sales_snowflake_marketplace@dbtlabs.com) to get started.
  * Currently, the Semantic Layer is unavailable for Azure ST instances, and the **Copilot** chatbot will not function in the dbt Snowflake Native App without it.
* Your dbt account must have permission to create a [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md). For details, refer to [Enterprise permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md).
* There's a dbt project with the [Semantic Layer configured](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) and metrics declared.
* You have set up a [production deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#set-as-production-environment).
* There has been at least one successful job run that includes a `docs generate` step in the deployment environment.

##### Snowflake[​](#snowflake "Direct link to Snowflake")

* You have **ACCOUNTADMIN** access in Snowflake.
* Your Snowflake account must have access to the Native App/SPCS integration and NA/SPCS configurations (Public Preview planned at end of June). If you're unsure, please check with your Snowflake account manager.
* The Snowflake account must be in an AWS Region. Azure is not currently supported for the Native App/SPCS integration.
* You have access to Snowflake Cortex through your Snowflake permissions and [Snowflake Cortex is available in your region](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#availability). Without this, Copilot will not work.

#### Set up the configuration for Copilot[​](#set-up-the-configuration-for-copilot "Direct link to Set up the configuration for Copilot")

Configure dbt and Snowflake Cortex to power the **Copilot** chatbot.

1. In dbt, browse to your Semantic Layer configurations.
   1. Navigate to the left-hand side panel and click your account name. From there, select **Account settings**.
   2. In the left sidebar, select **Projects** and choose your dbt project from the project list.
   3. In the **Project details** panel, click the **Edit Semantic Layer Configuration** link (which is below the **GraphQL URL** option).
2. In the **Semantic Layer Configuration Details** panel, identify the Snowflake credentials (which you'll use to access Snowflake Cortex) and the environment against which the Semantic Layer is run. Save the username, role, and the environment in a temporary location to use later on.

   [![Semantic Layer credentials](/img/docs/cloud-integrations/semantic_layer_configuration.png?v=2 "Semantic Layer credentials")](#)Semantic Layer credentials
3.
In Snowflake, verify that your SL and deployment users have been granted permission to use Snowflake Cortex. For more information, refer to [Required Privileges](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#required-privileges) in the Snowflake docs.

   By default, all users should have access to Snowflake Cortex. If this is disabled for you, open a Snowflake SQL worksheet and run these statements:

   ```sql
   create role cortex_user_role;
   grant database role SNOWFLAKE.CORTEX_USER to role cortex_user_role;
   grant role cortex_user_role to user SL_USER;
   grant role cortex_user_role to user DEPLOYMENT_USER;
   ```

   Make sure to replace `SL_USER` and `DEPLOYMENT_USER` with the appropriate usernames for your environment (`SNOWFLAKE.CORTEX_USER` is Snowflake's built-in Cortex database role).

#### Configure dbt[​](#configure-dbt "Direct link to Configure dbt")

Collect the following pieces of information from dbt to set up the application.

1. Navigate to the left-hand side panel and click your account name. From there, select **Account settings**. Then click **API tokens > Service tokens**. Create a service token with access to all the projects you want to access in the dbt Snowflake Native App. Grant these permission sets:

   * **Manage marketplace apps**
   * **Job Admin**
   * **Metadata Only**
   * **Semantic Layer Only**

   Make sure to save the token information in a temporary location to use later during Native App configuration. The following is an example of granting the permission sets to all projects:

   [![Example of a new service token for the dbt Snowflake Native App](/img/docs/cloud-integrations/example-snowflake-native-app-service-token.png?v=2 "Example of a new service token for the dbt Snowflake Native App")](#)Example of a new service token for the dbt Snowflake Native App
2. From the left sidebar, select **Account** and save this information in a temporary location to use later during Native App configuration:

   * **Account ID** — A numerical string representing your dbt account.
   * **Access URL** — If you have a North America multi-tenant account, use `cloud.getdbt.com` as the access URL. For all other regions, refer to [Access, Regions, & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) and look up the access URL you should use in the table.

#### Install the dbt Snowflake Native App[​](#install-the-dbt-snowflake-native-app "Direct link to Install the dbt Snowflake Native App")

1. Browse to the listing for the dbt Snowflake Native App:
   * **Private listing** (recommended) — Use the link from the email sent to you.
   * **Public listing** — Navigate to the [Snowflake Marketplace](https://app.snowflake.com/marketplace/listing/GZTYZSRT2R3).
2. Click **Get** on the listing to install the dbt Snowflake Native App. This can take several minutes. When installation is complete, an email is sent to you.

   A message will appear asking if you want to change the application name and grant access to the warehouse for installation. dbt Labs strongly recommends not changing the application name unless necessary.
3. When the dbt Snowflake Native App is successfully installed, click **Configure** in the modal window.

#### Configure the dbt Snowflake Native App[​](#configure-the-dbt-snowflake-native-app "Direct link to Configure the dbt Snowflake Native App")

1. On the **Activate dbt** page, click **Grant** in **Step 1: Grant Account Privileges**.
2. When privileges have been successfully granted, click **Review** in **Step 2: Allow Connections**. Walk through the **Connect to dbt External Access Integration** steps. You will need the dbt account information that you collected earlier. Enter your account ID, access URL, and API service token as the **Secret value** when prompted.
3. On the **Activate dbt** page, click **Activate** when you've established a successful connection to the dbt External Access Integration. It can take a few minutes to spin up the required Snowflake services and compute resources.
4.
When activation is complete, select the **Telemetry** tab and enable the option to share your `INFO` logs. The option might take some time to display because Snowflake needs to create the events table so it can be shared.
5. When the option is successfully enabled, click **Launch app**. Then, log in to the app with your Snowflake credentials. If it redirects you to a Snowsight worksheet (instead of the login page), the app hasn't finished installing. You can typically resolve this issue by refreshing the page.

The following is an example of the dbt Snowflake Native App after configuration:

[![Example of the dbt Snowflake Native App](/img/docs/cloud-integrations/example-dbt-snowflake-native-app.png?v=2 "Example of the dbt Snowflake Native App")](#)Example of the dbt Snowflake Native App

#### Verify the app installed successfully[​](#verify-the-app-installed-successfully "Direct link to Verify the app installed successfully")

To verify the app installed successfully, select any of the following from the sidebar:

* **Explore** — Launch Catalog and make sure you can access your dbt project information.
* **Jobs** — Review the run history of the dbt jobs.
* **Copilot** — From Catalog, click the Copilot button to ask the chatbot a question.

The following is an example of the **Copilot** chatbot with the suggested prompts near the top:

[![Example of the Copilot chatbot](/img/docs/cloud-integrations/example-ask-dbt-native-app.png?v=2 "Example of the Copilot chatbot")](#)Example of the Copilot chatbot

#### Onboard new users[​](#onboard-new-users "Direct link to Onboard new users")

1. From the sidebar in Snowflake, select **Data Products > Apps**. Choose **dbt** from the list to open the app's configuration page. Then, click **Manage access** (in the upper right) to onboard new users to the application.

   Grant the **APP_USER** role to the appropriate roles that should have access to the application but not the ability to edit the configurations.
   Grant the **APP_ADMIN** role to roles that should be able to edit or remove the configurations.
2. New users can access the app with either the Snowflake app URL that's been shared with them, or by clicking **Launch app** from the app's configuration page.

#### FAQs[​](#faqs "Direct link to FAQs")

**Unable to install the dbt Snowflake Native App from the Snowflake Marketplace**

The dbt Snowflake Native App is not available to Snowflake Free Trial accounts.

**Received the error message `Unable to access schema dbt_sl_llm` from Copilot**

Check that the SL user has been granted access to the `dbt_sl_llm` schema and make sure they have all the necessary permissions to read and write from the schema.

**Need to update the dbt configuration options used by the Native App**

If there's been an update to the dbt account ID, access URL, or API service token, you need to update the configuration for the dbt Snowflake Native App. In Snowflake, navigate to the app's configuration page and delete the existing configurations. Add the new configuration and then run `CALL app_public.restart_app();` in the application database in Snowsight.

**Are environment variables supported in the Native App?**

[Environment variables](https://docs.getdbt.com/docs/build/environment-variables.md), like `{{ env_var('DBT_WAREHOUSE') }}`, aren’t supported in the Semantic Layer yet. To use the **Copilot** feature, you must use the actual credentials instead.
---

### Setting up state-aware orchestration

[Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Set up state-aware orchestration to automatically determine which models to build by detecting changes in code or data, and only building the changed models each time a job runs.

important

The dbt Fusion engine is currently available for installation in:

* [Local command line interface (CLI) tools](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")
* [VS Code and Cursor with the dbt extension](https://docs.getdbt.com/docs/install-dbt-extension.md) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")
* [dbt platform environments](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine) [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

Join the conversation in our Community Slack channel [`#dbt-fusion-engine`](https://getdbt.slack.com/archives/C088YCAB6GH). Read the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements) for the latest updates.
#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

To use state-aware orchestration, make sure you meet these prerequisites:

* You must have a dbt [Enterprise or Enterprise+ account](https://www.getdbt.com/signup/) and a [Developer seat license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md).
* You have updated the environment that will run state-aware orchestration to the dbt Fusion engine. For more information, refer to [Upgrading to dbt Fusion engine](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md).
* You must have a dbt project connected to a [data platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md).
* You must have [access permission](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) to view, create, modify, or run jobs.
* You must set up a [deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md) that is production or staging only.
* You must use a deploy job. Continuous integration (CI) and merge jobs currently do not support state-aware orchestration.
* (Optional) To customize behavior, you have configured your model or source data with [advanced configurations](#advanced-configurations).

info

State-aware orchestration is available for SQL models only. Python models are not supported.

#### Default settings[​](#default-settings "Direct link to Default settings")

By default, for an Enterprise-tier account upgraded to the dbt Fusion engine, any newly created job will automatically be state-aware. Out of the box, without custom configurations, a job will only build models when either the code has changed or there’s new data in a source.

#### Create a job[​](#create-a-job "Direct link to Create a job")

New jobs are state-aware by default

For existing jobs, make them state-aware by selecting **Enable Fusion cost optimization features** on the **Job settings** page.
To create a state-aware job: 1. From your deployment environment page, click **Create job** and select **Deploy job**. 2. Options in the **Job settings** section: * **Job name**: Specify the name, for example, `Daily build`. * (Optional) **Description**: Provide a description of what the job does (for example, what the job consumes and what the job produces). * **Environment**: By default, it's set to the deployment environment you created the state-aware job from. 3. Options in the **Execution settings** and **Triggers** sections: [![Example of Triggers on the Deploy Job page](/img/docs/dbt-cloud/using-dbt-cloud/example-triggers-section.png?v=2 "Example of Triggers on the Deploy Job page")](#)Example of Triggers on the Deploy Job page * **Execution settings** section: * **Commands**: By default, it includes the `dbt build` command. Click **Add command** to add more [commands](https://docs.getdbt.com/docs/deploy/job-commands.md) that you want invoked when the job runs. * **Generate docs on run**: Enable this option if you want to [generate project docs](https://docs.getdbt.com/docs/build/documentation.md) when this deploy job runs. * **Enable Fusion cost optimization features**: Select this option to enable **State-aware orchestration**. **Efficient testing** is disabled by default. You can expand **More options** to enable or disable individual settings. * **Triggers** section: * **Run on schedule**: Run the deploy job on a set schedule. * **Timing**: Specify whether to [schedule](#schedule-days) the deploy job using **Intervals**, which runs the job every specified number of hours; **Specific hours**, which runs the job at specific times of day; or **Cron schedule**, which runs the job at times specified using [cron syntax](#cron-schedule). * **Days of the week**: By default, it's set to every day when **Intervals** or **Specific hours** is chosen for **Timing**.
* **Run when another job finishes**: Run the deploy job when another *upstream* deploy [job completes](#trigger-on-job-completion). * **Project**: Specify the parent project that has that upstream deploy job. * **Job**: Specify the upstream deploy job. * **Completes on**: Select the job run status(es) that will [enqueue](https://docs.getdbt.com/docs/deploy/job-scheduler.md#scheduler-queue) the deploy job. 4. (Optional) Options in the **Advanced settings** section: * **Environment variables**: Define [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) to customize the behavior of your project when the deploy job runs. * **Target name**: Define the [target name](https://docs.getdbt.com/docs/build/custom-target-names.md) to customize the behavior of your project when the deploy job runs. Environment variables and target names are often used interchangeably. * **Run timeout**: Cancel the deploy job if the run time exceeds the timeout value. * **Compare changes against**: By default, it's set to **No deferral**. Select either **Environment** or **This Job** to let dbt know what it should compare the changes against. 5. Click **Save**. You can see which models dbt builds in the run summary logs. Models that weren't rebuilt during the run are tagged as **Reused**, with context about why dbt skipped rebuilding them (saving you unnecessary compute). You can also see the reused models under the **Reused** tab. [![Example logs for state-aware orchestration](/img/docs/dbt-cloud/using-dbt-cloud/SAO_logs_view.png?v=2 "Example logs for state-aware orchestration")](#)Example logs for state-aware orchestration #### Delete a job[​](#delete-a-job "Direct link to Delete a job") To delete a job or multiple jobs in dbt: 1. Click **Deploy** on the navigation header. 2. Click **Jobs** and select the job you want to delete. 3. Click **Settings** on the top right of the page and then click **Edit**. 4.
Scroll to the bottom of the page and click **Delete job**.
[![Delete a job](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/delete-job.png?v=2 "Delete a job")](#)Delete a job 5. Confirm your action in the pop-up by clicking **Confirm delete** in the bottom right to delete the job immediately. This action cannot be undone. However, you can create a new job with the same information if the deletion was made in error. 6. Refresh the page, and the deleted job should now be gone. If you want to delete multiple jobs, you'll need to perform these steps for each job. If you're having any issues, feel free to [contact us](mailto:support@getdbt.com) for additional help. #### Advanced configurations[​](#advanced-configurations "Direct link to Advanced configurations") By default, dbt uses the warehouse metadata to check whether sources (or upstream models, in the case of dbt Mesh) are fresh. For more advanced use cases, dbt provides other options that let you specify what gets run by state-aware orchestration. You can use the following optional parameters to customize your state-aware orchestration:

| Parameter | Description | Allowed values | Supports Jinja |
| --- | --- | --- | --- |
| `loaded_at_field` | Specifies a specific column to use from the data. | Name of a timestamp column. For example, `created_at` or `"CAST(created_at AS TIMESTAMP)"`. | ✅ |
| `loaded_at_query` | Defines a custom freshness condition in SQL to account for partial loading or streaming data. | SQL string. For example, `"select {{ current_timestamp() }}"`. For a multi-line query, see the example after this table. | ✅ |
| `build_after.count` | Determines how many units of time must pass before a model can be rebuilt, to help reduce build frequency. | A positive integer or a Jinja expression. For example, `4` or `"{{ var('build_after_count', 4) }}"`. | ✅ |
| `build_after.period` | The time unit for the count, defining the build interval. | `minute`, `hour`, `day`, or a Jinja expression (for example, `"{{ var('build_after_period', 'day') }}"`). | ✅ |
| `build_after.updates_on` | Determines whether a model rebuild is triggered when any upstream dependency has fresh data or only when all upstream dependencies are fresh. | `any` (default) — use this value when you want a downstream model to rebuild if *any* of its upstream dependencies receives fresh data, even if others haven't. `all` — use this value to trigger a rebuild only when *all* upstream dependencies are fresh, minimizing unnecessary builds and reducing compute cost. Recommended for state-aware orchestration. | ❌ |

Some notes when using `loaded_at_field` or `loaded_at_query`:

* You can define either `loaded_at_field` or `loaded_at_query`, but not both.
* To use a multi-line SQL query for a `loaded_at_query` configuration, include your query as a YAML block so dbt can execute it as the custom freshness query. For example:

  ```yaml
  loaded_at_query: |
    select max(ingested_at) from {{ this }}
    where ingested_at >= current_timestamp - interval '3 days'
  ```

* If a source is a view in the data warehouse, dbt can't track updates from the warehouse metadata when the view changes. Without a `loaded_at_field` or `loaded_at_query`, dbt treats the source as "always fresh" and emits a warning during freshness checks.
To check freshness for sources that are views, add a `loaded_at_field` or `loaded_at_query` to your configuration. To learn more about model freshness and `build_after`, refer to [model `freshness` config](https://docs.getdbt.com/reference/resource-configs/freshness.md). To learn more about source and upstream model freshness configs, refer to [resource `freshness` config](https://docs.getdbt.com/reference/resource-properties/freshness.md). ##### Customizing behavior[​](#customizing-behavior "Direct link to Customizing behavior") You can optionally configure state-aware orchestration when you want to fine-tune orchestration behavior for these reasons: * **Defining source freshness:** By default, dbt uses metadata from the data warehouse to automatically detect when source data changes. Freshness configuration is not required for state-aware orchestration to work. You can optionally configure source freshness if you want to: * Receive alerts when sources don't update within your expected Service Level Agreement (SLA) using `warn_after`/`error_after`. * Specify a custom column using `loaded_at_field`. * Specify a custom SQL statement using `loaded_at_query` to define what freshness means. Not all source freshness is equal — especially with partial ingestion pipelines. You may want to delay a model build until your sources have received a larger volume of data or until a specific time window has passed. You can define what "fresh" means on a source-by-source basis using a custom freshness query. This lets you: * Add a time difference to account for late-arriving data * Delay freshness detection until a threshold is reached (for example, number of records or hours of data) The following examples show how to configure a source so that state-aware orchestration detects new upstream data only when your custom condition is met. 
* loaded\_at\_field * loaded\_at\_query

State-aware orchestration treats the source as fresh when the maximum value of the `loaded_at_field` column changes since the previous run:

models/sources.yml

```yaml
sources:
  - name: jaffle_shop
    config:
      freshness:
        warn_after: {count: 12, period: hour}
        error_after: {count: 24, period: hour}
      loaded_at_field: _etl_loaded_at
```

To define freshness with custom SQL, use `loaded_at_query`. State-aware orchestration runs the query to get a single timestamp. When that value changes compared to the previous run, the source is considered fresh.

models/sources.yml

```yaml
sources:
  - name: raw_orders
    tables:
      - name: orders
        loaded_at_query: |
          select max(ingested_at) from {{ this }}
          where ingested_at >= current_timestamp - interval '3 days'
```

In this example, dbt runs the custom `loaded_at_query` to get a single timestamp — the latest `ingested_at` within the last three days. On each run, dbt compares this new maximum timestamp to the value from the previous run. If the maximum timestamp is newer, state-aware orchestration considers the source to have fresh data and may trigger rebuilds. * **Reducing model build frequency** Some models don't need to be rebuilt every time their source data is updated. To control this: * Set a refresh interval on models, folders, or the project to define how often they should be rebuilt at most * This helps avoid overbuilding and reduces costs by only running what's really needed * **Changing the default from `any` to `all`** Based on what a model depends on upstream, you may want to wait until all upstream models have been refreshed rather than going as soon as there is any new data.
* Change what orchestration waits on from `any` to `all` for models, folders, or the project to wait until all upstream models have new data * This helps avoid overbuilding and reduces costs by building models once everything has been refreshed To configure and customize behavior, use the `build_after` config in any of the following places: * `dbt_project.yml` at the project level in YAML * `models/properties.yml` at the model level in YAML * `models/<model_name>.sql` at the model level in SQL These configurations are powerful because you can define a sensible default at the project level or for specific model folders, and override it for individual models or model groups that require more frequent updates. ##### Handling late-arriving data[​](#handling-late-arriving-data "Direct link to Handling late-arriving data") If your incremental models use a lookback window to capture [late-arriving data](https://docs.getdbt.com/best-practices/materializations/4-incremental-models.md#late-arriving-facts), make sure your freshness logic aligns with that window. When you use a `loaded_at_field` or `loaded_at_query`, state-aware orchestration uses that value to determine whether new data has arrived. When the `loaded_at` value reflects an event timestamp (for example, `event_date`), late-arriving records may not update this value if the event occurred in the past. In these cases, state-aware orchestration may not trigger a rebuild, even though your incremental model's lookback window would normally include those rows. To ensure state-aware orchestration detects late-arriving data, use `loaded_at_query` and make sure it aligns with the same lookback window used in your incremental filter.
See the following samples of a lookback window and its corresponding `loaded_at_query` value: * Lookback window * loaded\_at\_query

```sql
{{
  config(
    materialized='incremental',
    unique_key='order_id'
  )
}}

select * from {{ source('raw_orders', 'orders') }}

{% if is_incremental() %}
where ingested_at > (select max(ingested_at) from {{ this }}) - interval '3 days'
{% endif %}
```

```yaml
loaded_at_query: |
  select max(ingested_at) from {{ this }}
  where ingested_at >= current_timestamp - interval '3 days'
```

#### Example[​](#example "Direct link to Example") Let's use an example to illustrate how to customize a project so that a model and its parent model are rebuilt only if they haven't been refreshed in the past 4 hours — even if a job runs more frequently than that. A Jaffle shop has recently expanded globally and wants to reduce spend. The team learned about dbt's state-aware orchestration and wants to rebuild models only when needed. Maggie — the analytics engineer — wants to configure her dbt `jaffle_shop` project to only rebuild certain models if they haven't been refreshed in the last 4 hours, even if a job runs more often than that. To do this, she uses the model `freshness` config. This config helps state-aware orchestration decide *when* a model should be rebuilt. Note that for every `freshness` config, you're required to set values for both `count` and `period`. This applies to all `freshness` types: `freshness.warn_after`, `freshness.error_after`, and `freshness.build_after`.
Refer to the following examples for using the `freshness` config in the model file, in the project YAML file, and in the config block of the `model.sql` file: * Model YAML * Project YAML file * SQL file config

models/model.yml

```yaml
models:
  - name: dim_wizards
    config:
      freshness:
        build_after:
          count: 4        # how long to wait before rebuilding
          period: hour    # unit of time
          updates_on: all # only rebuild if all upstream dependencies have new data
  - name: dim_worlds
    config:
      freshness:
        build_after:
          count: 4
          period: hour
          updates_on: all
```

dbt_project.yml

```yaml
models:
  <project_name>:
    +freshness:
      build_after:
        count: 4
        period: hour
        updates_on: all
```

models/<model_name>.sql

```jinja
{{
  config(
    freshness={
      "build_after": {
        "count": 4,
        "period": "hour",
        "updates_on": "all"
      }
    }
  )
}}
```

With this config, dbt: * Checks if there's new data in the upstream sources * Checks when `dim_wizards` and `dim_worlds` were last built If any new data is available *and* at least 4 hours have passed, dbt rebuilds the models. You can override freshness rules set at higher levels in your dbt project. For example, in the project YAML file, you set:

dbt_project.yml

```yml
models:
  +freshness:
    build_after:
      count: 4
      period: hour
  jaffle_shop: # this needs to match your project `name:` in dbt_project.yml
    staging:
      +materialized: view
    marts:
      +materialized: table
```

This configuration means that every model in the project has a `build_after` of 4 hours. To change this for specific models or groups of models, you could set:

dbt_project.yml

```yml
models:
  +freshness:
    build_after:
      count: 4
      period: hour
  marts: # only applies to models inside the marts folder
    +freshness:
      build_after:
        count: 1
        period: hour
```

If you want to exclude a model from the freshness rule set at a higher level, set `freshness: null` for that model. With freshness disabled, state-aware orchestration falls back to its default behavior and builds the model whenever there's an upstream code or data change.
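For instance, excluding a single model from a project-wide rule might look like the following sketch (`fct_sessions` is a hypothetical model name; the folder layout is illustrative):

```yaml
models:
  +freshness:
    build_after:
      count: 4
      period: hour
  jaffle_shop:
    marts:
      fct_sessions: # hypothetical model; opts out of the project-wide freshness rule
        +freshness: null
```

With this override, `fct_sessions` rebuilds whenever there's an upstream code or data change, while the rest of the project keeps the 4-hour `build_after` window.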
##### Differences between `all` and `any`[​](#differences-between-all-and-any "Direct link to differences-between-all-and-any") * Since Maggie configured `updates_on: all`, *both* models must have new upstream data to trigger a rebuild. If only one model has fresh data and the other doesn't, nothing is built, which reduces unnecessary compute costs and saves time. * If Maggie wanted these models to rebuild more often (for example, if *any* upstream source has new data), she would use `updates_on: any` instead:

models/model.yml

```yaml
freshness:
  build_after:
    count: 1
    period: hour
    updates_on: any
```

This way, if either `dim_wizards` or `dim_worlds` has fresh upstream data and enough time has passed, dbt rebuilds the models. This method helps when the need for fresher data outweighs the cost. #### Related docs[​](#related-docs "Direct link to Related docs") * [State-aware orchestration configuration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) * [Artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) * [Continuous integration (CI) jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) * [`freshness`](https://docs.getdbt.com/reference/resource-configs/freshness.md) --- ### Simple metrics The following displays the complete specification for simple metrics, along with an example. For advanced data modeling, you can use `fill_nulls_with` and `join_to_timespine` to [set null metric values to zero](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md), ensuring numeric values for every data row.
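As a sketch of the parameters mentioned above (the metric and measure names are illustrative, and assume a matching measure exists in a semantic model):

```yaml
metrics:
  - name: order_total
    description: Sum of total order amounts, with nulls filled as zero.
    type: simple
    label: Order total
    type_params:
      measure:
        name: order_total       # assumed measure defined in a semantic model
        fill_nulls_with: 0      # replace null metric values with 0
        join_to_timespine: true # emit a row for every period, even with no data
```

Joining to the time spine ensures a value appears for every date in the queried range, which `fill_nulls_with` then sets to zero instead of null.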
#### Related docs[​](#related-docs "Direct link to Related docs") * [Fill null values for simple, derived, or ratio metrics](https://docs.getdbt.com/docs/build/fill-nulls-advanced.md) --- ### Snowflake and Apache Iceberg dbt supports materializing tables in the Iceberg table format in two ways: * The model configuration field `table_format = 'iceberg'` (legacy) * A catalog integration, configured in a config block (inside the `.sql` model file), a properties YAML file (model folder), or the project YAML file ([`dbt_project.yml`](https://docs.getdbt.com/reference/dbt_project.yml.md)) Catalog integration configuration: You need to create a `catalogs.yml` file to use the integration and apply that integration at the config level. Refer to [Configure catalog integration](#configure-catalog-integration-for-managed-iceberg-tables) for more information. We recommend using the Iceberg catalog configuration and applying the catalog in the model config for ease of use and to future-proof your code. Using `table_format = 'iceberg'` directly on the model configuration is a legacy approach and limits usage to just Snowflake Horizon as the catalog. Catalog support is available on dbt 1.10+.
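For example, applying a catalog integration in a properties YAML file might look like this sketch (it assumes a `catalog_horizon` integration is defined in `catalogs.yml`, as shown later on this page):

```yaml
models:
  - name: iceberg_model
    config:
      materialized: table
      catalog_name: catalog_horizon # catalog integration defined in catalogs.yml
```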
#### Creating Iceberg tables[​](#creating-iceberg-tables "Direct link to Creating Iceberg tables") dbt supports creating Iceberg tables for three of the Snowflake materializations: * [Table](https://docs.getdbt.com/docs/build/materializations.md#table) * [Incremental](https://docs.getdbt.com/docs/build/materializations.md#incremental) * [Dynamic Table](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#dynamic-tables) #### Iceberg catalogs[​](#iceberg-catalogs "Direct link to Iceberg catalogs") Snowflake supports Iceberg tables via built-in and external catalogs, including: * Snowflake Horizon (the built-in catalog) * Polaris/Open Catalog (managed Polaris) * Glue Data Catalog (supported in dbt-snowflake through a [catalog-linked database](https://docs.snowflake.com/en/user-guide/tables-iceberg-catalog-linked-database#label-catalog-linked-db-create) with Iceberg REST) * Iceberg REST-compatible catalogs dbt supports the Snowflake built-in catalog and Iceberg REST-compatible catalogs (including Polaris and Unity Catalog) on dbt-snowflake. To use an externally managed catalog (anything outside of the built-in catalog), you must set up a catalog integration by running a SQL command similar to the following examples. ##### External catalogs[​](#external-catalogs "Direct link to External catalogs") Example configurations for external catalogs: * Polaris/Open Catalog * Glue Data Catalog * Iceberg REST API You must set up a catalog integration to use Polaris/Open Catalog (managed Polaris). Example code:

```sql
CREATE CATALOG INTEGRATION my_polaris_catalog_int
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://<orgname>-<accountname>.snowflakecomputing.com/polaris/api/catalog'
    CATALOG_NAME = '<catalog_name>'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<client_id>'
    OAUTH_CLIENT_SECRET = '<client_secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;
```

Executing this will register the external Polaris catalog with Snowflake.
Once configured, dbt can create Iceberg tables in Snowflake that register the new database object with the catalog as metadata, and it can query Polaris-managed tables. To configure Glue Data Catalog as the external catalog, you need to set up two prerequisites: * **Create an AWS IAM role for Glue access:** Configure AWS permissions so Snowflake can read the Glue Catalog. This typically means creating an AWS IAM role that Snowflake will assume, with policies allowing Glue catalog read operations (at minimum, `glue:GetTable` and `glue:GetTables` on the relevant Glue databases). Attach a trust policy to enable Snowflake to assume this role (via an external ID). * **Set up the catalog integration:** In Snowflake, create a catalog integration of type GLUE. This registers the Glue Data Catalog information and the IAM role with Snowflake. For example:

```sql
CREATE CATALOG INTEGRATION my_glue_catalog_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'dbt_database'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/myGlueRole'
  GLUE_CATALOG_ID = '123456789012'
  GLUE_REGION = 'us-east-2'
  ENABLED = TRUE;
```

Glue Data Catalog supports the Iceberg REST specification, so you can connect to Glue via the Iceberg REST API. ###### Table materialization in Snowflake[​](#table-materialization-in-snowflake "Direct link to Table materialization in Snowflake") Starting in dbt Core v1.11, dbt-snowflake supports basic table materialization on Iceberg tables registered in a Glue catalog through a catalog-linked database. Note that incremental materializations are not yet supported. This feature requires the following: * **Catalog-linked database:** You must use a [catalog-linked database](https://docs.snowflake.com/en/user-guide/tables-iceberg-catalog-linked-database#label-catalog-linked-db-create) configured for your Glue Catalog integration.
* **Identifier format:** Table and column names must use only alphanumeric characters (letters and numbers), be lowercase, and be surrounded by double quotes for Glue compatibility. To specify Glue as the database type, add `catalog_linked_database_type: glue` under the `adapter_properties` section:

```yml
catalogs:
  - name: my_glue_catalog
    active_write_integration: glue_rest
    write_integrations:
      - name: glue_rest
        catalog_type: iceberg_rest
        table_format: iceberg
        adapter_properties:
          catalog_linked_database: catalog_linked_db_glue
          catalog_linked_database_type: glue
```

You can set up an integration for catalogs that are compatible with the open-source Apache Iceberg REST specification. Example code:

```sql
CREATE CATALOG INTEGRATION my_iceberg_catalog_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'dbt_database'
  REST_CONFIG = ( restConfigParams )
  REST_AUTHENTICATION = ( restAuthenticationParams )
  ENABLED = TRUE
  REFRESH_INTERVAL_SECONDS = <seconds>
  COMMENT = 'catalog integration for dbt iceberg tables'
```

For Unity Catalog with a bearer token:

```sql
CREATE OR REPLACE CATALOG INTEGRATION my_unity_catalog_int_pat
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'my_namespace'
  REST_CONFIG = (
    CATALOG_URI = 'https://my-api/api/2.1/unity-catalog/iceberg'
    CATALOG_NAME = '<catalog_name>'
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<bearer_token>'
  )
  ENABLED = TRUE;
```

After you have created the external catalog integration, you will be able to do two things: * **Query an externally managed table:** Snowflake can query Iceberg tables whose metadata lives in the external catalog. In this scenario, Snowflake is a "reader" of the external catalog. The table's data remains in external cloud storage (an AWS S3 or GCP bucket) as defined in the catalog storage configuration. Snowflake uses the catalog integration to fetch metadata via the REST API and then reads the data files from cloud storage.
* **Sync Snowflake-managed tables to an external catalog:** You can create a Snowflake Iceberg table that Snowflake manages via a cloud storage location, and then register/sync that table to the external catalog. This allows other engines to discover the table. #### dbt Catalog integration configurations for Snowflake[​](#dbt-catalog-integration-configurations-for-snowflake "Direct link to dbt Catalog integration configurations for Snowflake") The following table outlines the configuration fields required to set up a catalog integration for [Iceberg tables in Snowflake](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#iceberg-table-format).

| Field | Required | Accepted values |
| --- | --- | --- |
| `name` | yes | Name of the catalog integration |
| `catalog_name` | yes | The name of the catalog integration in Snowflake. For example, `my_dbt_iceberg_catalog` |
| `external_volume` | yes | `<external_volume_name>` |
| `table_format` | yes | `iceberg` |
| `catalog_type` | yes | `built_in`, `iceberg_rest` |
| `adapter_properties` | optional | See the following section |

You can connect to external Iceberg-compatible catalogs, such as Polaris and Unity Catalog, via the Iceberg REST `catalog_type`. Note that we only support Iceberg REST with [catalog-linked databases](https://docs.snowflake.com/en/user-guide/tables-iceberg-catalog-linked-database). ##### Adapter properties[​](#adapter-properties "Direct link to Adapter properties") These are the additional configurations, unique to Snowflake, that can be supplied and nested under `adapter_properties`.
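As a sketch of how these properties nest under a write integration (the names mirror the Snowflake Horizon example later on this page; the property values are illustrative):

```yaml
catalogs:
  - name: catalog_horizon
    active_write_integration: snowflake_write_integration
    write_integrations:
      - name: snowflake_write_integration
        external_volume: dbt_external_volume
        table_format: iceberg
        catalog_type: built_in
        adapter_properties:
          change_tracking: True                # enable change tracking on the table
          max_data_extension_time_in_days: 14  # illustrative; allowed range is 0-90
```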
###### Built-in catalog[​](#built-in-catalog "Direct link to Built-in catalog") ###### REST catalog[​](#rest-catalog "Direct link to REST catalog")

| Field | Required | Accepted values |
| --- | --- | --- |
| `auto_refresh` | Optional | `True` or `False` |
| `catalog_linked_database` | Required for `catalog_type: iceberg_rest` | Catalog-linked database name |
| `catalog_linked_database_type` | Optional | Catalog-linked database type. For example, `glue` |
| `max_data_extension_time_in_days` | Optional | `0` to `90`, with a default of `14` |
| `target_file_size` | Optional | Values like `'AUTO'`, `'16MB'`, `'32MB'`, `'64MB'`, `'128MB'`. Case-insensitive |

* **`storage_serialization_policy`:** The serialization policy tells Snowflake what kind of encoding and compression to perform on the table data files. If not specified at table creation, the table inherits the value set at the schema, database, or account level. If the value isn't specified at any level, the table uses the default value. You can't change the value of this parameter after table creation. * **`max_data_extension_time_in_days`:** The maximum number of days Snowflake can extend the data retention period for tables to prevent streams on the tables from becoming stale. The `MAX_DATA_EXTENSION_TIME_IN_DAYS` parameter enables you to limit this automatic extension period to control storage costs for data retention, or for compliance reasons. * **`data_retention_time_in_days`:** For managed Iceberg tables, you can set a retention period for Snowflake Time Travel and for undropping the table, overriding the default account values.
For tables that use an external catalog, Snowflake uses the value of the `DATA_RETENTION_TIME_IN_DAYS` parameter to set a retention period for Snowflake Time Travel and undropping the table. When the retention period expires, Snowflake does not delete the Iceberg metadata or snapshots from your external cloud storage. * **`change_tracking`:** Specifies whether to enable change tracking on the table. * **`catalog_linked_database`:** [Catalog-linked databases](https://docs.snowflake.com/en/user-guide/tables-iceberg-catalog-linked-database) (CLDs) in Snowflake ensure that Snowflake can automatically sync metadata (including namespaces and Iceberg tables) from the external Iceberg catalog and register it as remote tables in the catalog-linked database. We require catalog-linked databases for building Iceberg tables with external catalogs because, without them, dbt can't truly manage the table end-to-end. Snowflake does not support dropping the Iceberg table in the external catalog on non-CLDs; it only allows unlinking the Snowflake table, which creates a discrepancy with how dbt expects to manage the materialized object. * **`auto_refresh`:** Specifies whether Snowflake should automatically poll the external Iceberg catalog for metadata updates. If `REFRESH_INTERVAL_SECONDS` isn't set on the catalog integration, the default refresh interval is 30 seconds. * **`target_file_size`:** Specifies a target Parquet file size. The default is `AUTO`. ##### Configure catalog integration for managed Iceberg tables[​](#configure-catalog-integration-for-managed-iceberg-tables "Direct link to Configure catalog integration for managed Iceberg tables") 1. Create a `catalogs.yml` at the top level of your dbt project.

An example of Snowflake Horizon as the catalog:

```yaml
catalogs:
  - name: catalog_horizon
    active_write_integration: snowflake_write_integration
    write_integrations:
      - name: snowflake_write_integration
        external_volume: dbt_external_volume
        table_format: iceberg
        catalog_type: built_in
        adapter_properties:
          change_tracking: True
```

2. Add the `catalog_name` config parameter in either a config block (inside the `.sql` model file), a properties YAML file (model folder), or your project YAML file (`dbt_project.yml`).

An example of `iceberg_model.sql`:

```sql
{{
  config(
    materialized='table',
    catalog_name='catalog_horizon'
  )
}}

select * from {{ ref('jaffle_shop_customers') }}
```

3. Execute the dbt model with `dbt run -s iceberg_model`. For more information, refer to our documentation on [Snowflake configurations](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md). ##### Limitations[​](#limitations "Direct link to Limitations") The syncing experience differs depending on the catalog you choose. Some catalogs refresh automatically, and you can set parameters to do so with your catalog integration. Other catalogs might require a separate job to manage the metadata sync. --- ### Source freshness dbt provides a helpful interface around dbt's [source data freshness](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness) calculations. When a dbt job is configured to snapshot source data freshness, dbt will render a user interface showing you the state of the most recent snapshot. This interface is intended to help you determine if your source data freshness is meeting the service level agreement (SLA) that you've defined for your organization. [![Data Sources in dbt](/img/docs/dbt-cloud/using-dbt-cloud/data-sources-next.png?v=2 "Data Sources in dbt")](#)Data Sources in dbt ##### Enabling source freshness snapshots[​](#enabling-source-freshness-snapshots "Direct link to Enabling source freshness snapshots") [`dbt build`](https://docs.getdbt.com/reference/commands/build.md) does *not* include source freshness checks when building and testing resources in your DAG.
Instead, you can use one of these common patterns for defining jobs:

* Add `dbt build` to the run step to run models, tests, and so on.
* Select the **Generate docs on run** checkbox to automatically [generate project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md).
* Select the **Run source freshness** checkbox to enable [source freshness](#checkbox) as the first step of the job.

[![Selecting source freshness](/img/docs/dbt-cloud/select-source-freshness.png?v=2 "Selecting source freshness")](#)Selecting source freshness

To enable source freshness snapshots, first make sure to configure your sources to [snapshot freshness information](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness). You can add source freshness to the list of commands in the job run steps or enable the checkbox. However, you can expect different outcomes when you configure a job by selecting the **Run source freshness** checkbox compared to adding the command to the run steps. Review the following options and outcomes:

* **Select checkbox**: The **Run source freshness** checkbox in your **Execution Settings** runs `dbt source freshness` as the first step in your job and won't break subsequent steps if it fails. If you want a job dedicated *exclusively* to running freshness checks, you still need to include at least one placeholder step, such as `dbt compile`.
* **Add as a run step**: Add the `dbt source freshness` command anywhere in your list of run steps. However, if your source data is out of date, this step will "fail" and subsequent steps will not run. dbt will trigger email notifications (if configured) based on the end state of this step. You can create a new job to snapshot source freshness. If you *do not* want your models to run when your source data is out of date, it can be a good idea to run `dbt source freshness` as the first step in your job. Otherwise, we recommend adding `dbt source freshness` as the last step in the job, or creating a separate job just for this task.

[![Adding a step to snapshot source freshness](/img/docs/dbt-cloud/using-dbt-cloud/job-step-source-freshness.png?v=2 "Adding a step to snapshot source freshness")](#)Adding a step to snapshot source freshness

##### Source freshness snapshot frequency[​](#source-freshness-snapshot-frequency "Direct link to Source freshness snapshot frequency")

It's important that your freshness jobs run frequently enough to snapshot data latency in accordance with your SLAs. For example, if you have a one-hour SLA on a particular dataset, snapshotting the freshness of that table once daily would not be appropriate. As a good rule of thumb, you should run your source freshness jobs with at least double the frequency of your shortest SLA. Here's an example table of some reasonable snapshot frequencies given typical SLAs:

| SLA | Snapshot frequency |
| ------ | ------------------ |
| 1 hour | 30 mins |
| 1 day | 12 hours |
| 1 week | About daily |

#### Further reading[​](#further-reading "Direct link to Further reading")

* Refer to [Artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) for more info on how to create dbt artifacts, share links to the latest documentation, and share source freshness reports with your team.
* Source freshness for Snowflake is calculated using the `LAST_ALTERED` column. Read about the limitations in [Snowflake configs](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#source-freshness-known-limitation).
---

### SQL models

#### Related reference docs[​](#related-reference-docs "Direct link to Related reference docs")

* [Model configurations](https://docs.getdbt.com/reference/model-configs.md)
* [Model properties](https://docs.getdbt.com/reference/model-properties.md)
* [`run` command](https://docs.getdbt.com/reference/commands/run.md)
* [`ref` function](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md)

#### Getting started[​](#getting-started "Direct link to Getting started")

Building your first models

If you're new to dbt, we recommend that you read a [quickstart guide](https://docs.getdbt.com/guides.md) to build your first dbt project with models.

dbt's Python capabilities are an extension of its capabilities with SQL models. If you're new to dbt, we recommend that you read this page first, before reading ["Python Models"](https://docs.getdbt.com/docs/build/python-models.md).

A SQL model is a `select` statement. Models are defined in `.sql` files (typically in your `models` directory):

* Each `.sql` file contains one model / `select` statement.
* The model name is inherited from the filename, and references to a model must match its filename, including case. Mismatched casing can prevent dbt from applying configurations correctly and may affect metadata in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md).
* We strongly recommend using underscores in model names, not dots. For example, use `models/my_model.sql` instead of `models/my.model.sql`.
* Models can be nested in subdirectories within the `models` directory.
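Putting those rules together, a model file is just a named `select`. A minimal sketch, with an assumed raw source table and columns:

```sql
-- models/stg_orders.sql (illustrative; the source table and columns are assumptions)
select
    id as order_id,
    user_id as customer_id,
    order_date
from raw.jaffle_shop.orders
```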
Refer to [How we style our dbt models](https://docs.getdbt.com/best-practices/how-we-style/1-how-we-style-our-dbt-models.md) for details on how we recommend you name your models.

When you execute the [`dbt run` command](https://docs.getdbt.com/reference/commands/run.md), dbt will build this model in your data warehouse by wrapping it in a `create view as` or `create table as` statement.

For example, consider this `customers` model:

models/customers.sql

```sql
with customer_orders as (

    select
        customer_id,
        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from jaffle_shop.orders

    group by 1

)

select
    customers.customer_id,
    customers.first_name,
    customers.last_name,
    customer_orders.first_order_date,
    customer_orders.most_recent_order_date,
    coalesce(customer_orders.number_of_orders, 0) as number_of_orders

from jaffle_shop.customers

left join customer_orders using (customer_id)
```

When you execute `dbt run`, dbt will build this as a *view* named `customers` in your target schema:

```sql
create view dbt_alice.customers as (

    with customer_orders as (

        select
            customer_id,
            min(order_date) as first_order_date,
            max(order_date) as most_recent_order_date,
            count(order_id) as number_of_orders

        from jaffle_shop.orders

        group by 1

    )

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from jaffle_shop.customers

    left join customer_orders using (customer_id)

)
```

Why a *view* named `dbt_alice.customers`? By default dbt will:

* Create models as views
* Build models in a target schema you define
* Use your file name as the view or table name in the database

You can use *configurations* to change any of these behaviors — more on that later.

##### FAQs[​](#faqs "Direct link to FAQs")

How can I see the SQL that dbt is running?
To check out the SQL that dbt is running, you can look in:

* dbt:
  * Within the run output, click on a model name, and then select "Details"
* dbt Core:
  * The `target/compiled/` directory for compiled `select` statements
  * The `target/run/` directory for compiled `create` statements
  * The `logs/dbt.log` file for verbose logging

Do I need to create my target schema before running dbt?

Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you.

If I rerun dbt, will there be any downtime as models are rebuilt?

Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime). The implementation of this varies on each warehouse; check out the [logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to see the SQL dbt is executing.

What happens if the SQL in my query is bad or I get a database error?

If there's a mistake in your SQL, dbt will return the error that your database returns.

```shell
$ dbt run --select customers
Running with dbt=1.9.0
Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources

14:04:12 | Concurrency: 1 threads (target='dev')
14:04:12 |
14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN]
14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s]
14:04:13 |
14:04:13 | Finished running 1 view model in 1.68s.

Completed with 1 error and 0 warnings:

Database Error in model customers (models/customers.sql)
  Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15]
  compiled SQL at target/run/jaffle_shop/customers.sql

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors.
Which SQL dialect should I write my models in? Or which SQL dialect does dbt use?

dbt can feel like magic, but it isn't actually magic. Under the hood, it's running SQL in your own warehouse — your data is not processed outside of your warehouse. As such, your models should use the **SQL dialect of your own database**. Then, when dbt wraps your `select` statements in the appropriate DDL or DML, it will use the correct syntax for your warehouse — all of this logic is written into dbt.

You can find more information about the databases, platforms, and query engines that dbt supports in the [Supported Data Platforms](https://docs.getdbt.com/docs/supported-data-platforms.md) docs.

Want to go a little deeper on how this works? Consider a snippet of SQL that works on each warehouse:

models/test_model.sql

```sql
select 1 as my_column
```

To replace an existing table, here's an *illustrative* example of the SQL dbt will run on different warehouses (the actual SQL can get much more complicated than this!)
* Redshift
* BigQuery
* Snowflake

```sql
-- you can't create or replace on Redshift, so use a transaction to do this in an atomic way
begin;

create table "dbt_alice"."test_model__dbt_tmp" as (
    select 1 as my_column
);

alter table "dbt_alice"."test_model" rename to "test_model__dbt_backup";
alter table "dbt_alice"."test_model__dbt_tmp" rename to "test_model";

commit;

begin;
drop table if exists "dbt_alice"."test_model__dbt_backup" cascade;
commit;
```

```sql
-- Make an API call to create a dataset (no DDL interface for this)
create or replace table `dbt-dev-87681`.`dbt_alice`.`test_model` as (
    select 1 as my_column
);
```

```sql
create schema if not exists analytics.dbt_alice;

create or replace table analytics.dbt_alice.test_model as (
    select 1 as my_column
);
```

#### Configuring models[​](#configuring-models "Direct link to Configuring models")

Configurations are "model settings" that you can set in your `dbt_project.yml` file, *and* in your model file using a `config` block. Some example configurations include:

* Changing the materialization that a model uses — a [materialization](https://docs.getdbt.com/docs/build/materializations.md) determines the SQL that dbt uses to create the model in your warehouse.
* Building models into separate [schemas](https://docs.getdbt.com/docs/build/custom-schemas.md).
* Applying [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to a model.

The following diagram shows an example directory structure of a models folder:

```text
models
├── staging
└── marts
    └── marketing
```

Here's an example of a model configuration:

dbt_project.yml

```yaml
name: jaffle_shop
config-version: 2
...

models:
  jaffle_shop: # this matches the `name:` config
    +materialized: view # this applies to all models in the current project
    marts:
      +materialized: table # this applies to all models in the `marts/` directory
      marketing:
        +schema: marketing # this applies to all models in the `marts/marketing/` directory
```

models/customers.sql

```sql
{{ config(
    materialized="view",
    schema="marketing"
) }}

with customer_orders as ...
```

It is important to note that configurations are applied hierarchically — a configuration applied to a subdirectory will override any general configurations. You can learn more about configurations in the [reference docs](https://docs.getdbt.com/reference/model-configs.md).

##### FAQs[​](#faqs-1 "Direct link to FAQs")

What materializations are available in dbt?

dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md); this is an advanced feature of dbt.

What model configurations exist?

You can also configure:

* [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection
* [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas
* [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename
* Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md)
* Warehouse-specific configurations for performance (e.g.
`sort` and `dist` keys on Redshift, `partitions` on BigQuery)

Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more.

#### Building dependencies between models[​](#building-dependencies-between-models "Direct link to Building dependencies between models")

You can build dependencies between models by using the [`ref` function](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) in place of table names in a query. Use the name of another model as the argument for `ref`.

* Model
* Compiled code in dev
* Compiled code in prod

models/customers.sql

```sql
with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

...
```

```sql
create view dbt_alice.customers as (

    with customers as (

        select * from dbt_alice.stg_customers

    ),

    orders as (

        select * from dbt_alice.stg_orders

    ),

    ...

)
...
```

```sql
create view analytics.customers as (

    with customers as (

        select * from analytics.stg_customers

    ),

    orders as (

        select * from analytics.stg_orders

    ),

    ...

)
...
```

dbt uses the `ref` function to:

* Determine the order to run the models in, by creating a directed acyclic graph (DAG).

[![The DAG for our dbt project](/img/dbt-dag.png?v=2 "The DAG for our dbt project")](#)The DAG for our dbt project

* Manage separate environments — dbt will replace the model specified in the `ref` function with the database name for the table (or view). Importantly, this is environment-aware — if you're running dbt with a target schema named `dbt_alice`, it will select from an upstream table in the same schema. Check out the tabs above to see this in action.

Additionally, the `ref` function encourages you to write modular transformations, so that you can re-use models and reduce repeated code.
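For example, a downstream model can build on `customers` instead of repeating its joins. A minimal sketch (the model name and segment logic are illustrative, reusing columns from the `customers` example above):

```sql
-- models/customer_segments.sql (illustrative)
-- reuses the customers model via ref instead of duplicating its transformation logic
select
    customer_id,
    case when number_of_orders >= 3 then 'repeat' else 'new' end as segment
from {{ ref('customers') }}
```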
#### Testing and documenting models[​](#testing-and-documenting-models "Direct link to Testing and documenting models")

You can also document and test models — skip ahead to the sections on [testing](https://docs.getdbt.com/docs/build/data-tests.md) and [documentation](https://docs.getdbt.com/docs/build/documentation.md) for more information.

#### Additional FAQs[​](#additional-faqs "Direct link to Additional FAQs")

Are there any example dbt models?

Yes!

* **Quickstart Tutorial:** You can build your own example dbt project in the [quickstart guide](https://docs.getdbt.com/docs/get-started-dbt.md)
* **Jaffle Shop:** A demonstration project (closely related to the tutorial) for a fictional e-commerce store ([main source code](https://github.com/dbt-labs/jaffle-shop) and [source code using duckdb](https://github.com/dbt-labs/jaffle_shop_duckdb))
* **GitLab:** GitLab's internal dbt project is open source and is a great example of how to use dbt at scale ([source code](https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt))
* **dummy-dbt:** A containerized dbt project that populates the Sakila database in Postgres and populates dbt seeds, models, snapshots, and tests. The project can be used for testing and experimentation purposes ([source code](https://github.com/gmyrianthous/dbt-dummy))
* **Google Analytics 4:** A demonstration project that transforms the Google Analytics 4 BigQuery exports to various models ([source code](https://github.com/stacktonic-com/stacktonic-dbt-example-project), [docs](https://stacktonic.com/article/google-analytics-big-query-and-dbt-a-dbt-example-project))
* **Make Open Data:** A production-grade ELT pipeline, with tests, documentation, and CI/CD (GHA), about French open data (housing, demography, geography, etc.). It can be used to learn with voluminous and ambiguous data.
Contributions are welcome ([source code](https://github.com/make-open-data/make-open-data), [docs](https://make-open-data.fr/))

If you have an example project to add to this list, suggest an edit by clicking **Edit this page** below.

Can I store my models in a directory other than the `models` directory in my project?

By default, dbt expects the files defining your models to be located in the `models` subdirectory of your project. To change this, update the [model-paths](https://docs.getdbt.com/reference/project-configs/model-paths.md) configuration in your `dbt_project.yml` file, like so:

dbt_project.yml

```yml
model-paths: ["transformations"]
```

Can I build my models in a schema other than my target schema or split my models across multiple schemas?

Yes! Use the [schema](https://docs.getdbt.com/reference/resource-configs/schema.md) configuration in your `dbt_project.yml` file, or use a `config` block:

dbt_project.yml

```yml
name: jaffle_shop
...

models:
  jaffle_shop:
    marketing:
      +schema: marketing # models in the `models/marketing/` subdirectory will use the marketing schema
```

models/customers.sql

```sql
{{ config(
    schema='core'
) }}
```

Do ref-able resource names need to be unique?

Within one project: yes! To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*.

A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference.
Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace. Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this.

How do I remove deleted models from my data warehouse?

If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from being a view or table to being ephemeral.)

When you remove models from your dbt project, you should manually drop the related relations from your schema.

As I create more models, how should I keep my project organized? What should I name my models?

There's no one best way to structure a project! Every organization is unique. If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md).

If models can only be `select` statements, how do I insert records?

For those coming from an ETL (Extract Transform Load) paradigm, there's often a desire to write transformations as `insert` and `update` statements. In comparison, dbt will wrap your `select` query in a `create table as` statement, which can feel counter-productive.

* If you wish to use `insert` statements for performance reasons (i.e. to reduce the data that is processed), consider [incremental models](https://docs.getdbt.com/docs/build/incremental-models.md)
* If you wish to use `insert` statements because your source data is constantly changing (e.g.
to create "Type 2 Slowly Changing Dimensions"), consider [snapshotting your source data](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness) and building models on top of your snapshots.

Why can't I just write DML in my transformations?

###### `select` statements make transformations accessible[​](#select-statements-make-transformations-accessible "Direct link to select-statements-make-transformations-accessible")

More people know how to write `select` statements than DML, making the transformation layer accessible to more people!

###### Writing good DML is hard[​](#writing-good-dml-is-hard "Direct link to Writing good DML is hard")

If you write the DDL / DML yourself, you can end up tangled in problems like:

* What happens if the table already exists? Or if this table already exists as a view, but now I want it to be a table?
* What if the schema already exists? Or, should I check if the schema already exists?
* How do I replace a model atomically (such that there's no downtime for someone querying the table)?
* What if I want to parameterize my schema so I can run these transformations in a development environment?
* What order do I need to run these statements in? If I run a `cascade`, does it break other things?

Each of these problems *can* be solved, but they are unlikely to be the best use of your time.

###### dbt does more than generate SQL[​](#dbt-does-more-than-generate-sql "Direct link to dbt does more than generate SQL")

You can test your models, generate documentation, create snapshots, and more!

###### You reduce your vendor lock-in[​](#you-reduce-your-vendor-lock-in "Direct link to You reduce your vendor lock in")

SQL dialects tend to diverge the most in DML and DDL (rather than in `select` statements) — check out the example [here](https://docs.getdbt.com/faqs/Models/sql-dialect.md). By writing less SQL, you can make a migration to a new database technology easier.
If you do need to write custom DML, there are ways to do this in dbt using [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md).

How do I specify column types?

Simply cast the column to the correct type in your model:

```sql
select
    id,
    created::timestamp as created
from some_other_table
```

You might have this question if you're used to running statements like this:

```sql
create table dbt_alice.my_table (
    id integer,
    created timestamp
);

insert into dbt_alice.my_table (
    select id, created from some_other_table
)
```

In comparison, dbt would build this table using a `create table as` statement:

```sql
create table dbt_alice.my_table as (
    select id, created from some_other_table
)
```

So long as your model queries return the correct column type, the table you create will also have the correct column type.

To define additional column options:

* Rather than enforcing uniqueness and not-null constraints on your column, use dbt's [data testing](https://docs.getdbt.com/docs/build/data-tests.md) functionality to check that your assertions about your model hold true.
* Rather than creating default values for a column, use SQL to express defaults (e.g. `coalesce(updated_at, current_timestamp()) as updated_at`)
* In edge cases where you *do* need to alter a column (e.g. column-level encoding on Redshift), consider implementing this via a [post-hook](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md).

---

### Supported data platforms

dbt connects to and runs SQL against your database, warehouse, lake, or query engine.
These SQL-speaking platforms are collectively referred to as *data platforms*. dbt connects with data platforms by using a dedicated adapter plugin for each. Plugins are built as Python modules that dbt Core discovers if they are installed on your system. Refer to the [Build, test, document, and promote adapters](https://docs.getdbt.com/guides/adapter-creation.md) guide for details.

You can [connect](https://docs.getdbt.com/docs/connect-adapters.md) to adapters and data platforms natively in dbt or install them manually using dbt Core. You can also further customize how dbt works with your specific data platform via configuration: see [Configuring Postgres](https://docs.getdbt.com/reference/resource-configs/postgres-configs.md) for an example.

#### Types of Adapters[​](#types-of-adapters "Direct link to Types of Adapters")

There are two types of adapters available today:

* **Trusted** — [Trusted adapters](https://docs.getdbt.com/docs/trusted-adapters.md) are those whose maintainers have decided to participate in the Trusted Adapter Program and have committed to meeting its requirements. For adapters supported in dbt, maintainers have undergone an additional rigorous process that covers contractual requirements for development, documentation, user experience, and maintenance.
* **Community** — [Community adapters](https://docs.getdbt.com/docs/community-adapters.md) are open source and maintained by community members. These adapters are not part of the Trusted Adapter Program and could have usage inconsistencies.

Considerations for depending on an open-source project:

1. Does it work?
2. Does anyone "own" the code, or is anyone liable for ensuring it works?
3. Do bugs get fixed quickly?
4. Does it stay up-to-date with new dbt Core features?
5. Is the usage substantial enough to self-sustain?
6. Do other known projects depend on this library?
---

### Tableau

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

The Tableau integration allows you to use worksheets to query the Semantic Layer directly and produce your dashboards with trusted data. It provides a live connection to the Semantic Layer through Tableau Desktop or Tableau Server.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You have [configured the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) and are using dbt v1.6 or higher.
* You must have [Tableau Desktop](https://www.tableau.com/en-gb/products/desktop) version 2021.1 or later, Tableau Server, or [Tableau Cloud](https://www.tableau.com/products/cloud-bi).
* Log in to Tableau Desktop (with Cloud or Server credentials) or Tableau Cloud. You can also use a licensed Tableau Server deployment.
* You need your [dbt host](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#3-view-connection-detail), [Environment ID](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#set-up-dbt-semantic-layer), and a [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) or a [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) to log in. This account should be set up with the Semantic Layer.
* You must have a dbt Starter or Enterprise-tier [account](https://www.getdbt.com/pricing).
The integration is suitable for both multi-tenant and single-tenant deployments.

📹 Learn about the dbt Semantic Layer with on-demand video courses!

Explore our [dbt Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer) to learn how to define and query metrics in your dbt project. Additionally, dive into mini-courses for querying the dbt Semantic Layer in your favorite tools: [Tableau](https://courses.getdbt.com/courses/tableau-querying-the-semantic-layer), [Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel), [Hex](https://courses.getdbt.com/courses/hex-querying-the-semantic-layer), and [Mode](https://courses.getdbt.com/courses/mode-querying-the-semantic-layer).

#### Installing the connector[​](#installing-the-connector "Direct link to Installing the connector")

The Semantic Layer Tableau connector is available to download directly on [Tableau Exchange](https://exchange.tableau.com/products/1020). The connector is supported in Tableau Desktop, Tableau Server, and Tableau Cloud.

Alternatively, you can follow these steps to install the connector. Note that these steps only apply to Tableau Desktop and Tableau Server. The connector for Tableau Cloud is managed by Tableau.

1. Download the GitHub [connector file](https://github.com/dbt-labs/semantic-layer-tableau-connector/releases/latest/download/dbt_semantic_layer.taco) locally and add it to your default folder:

| Operating system | Tableau Desktop | Tableau Server |
| ---------------- | --------------- | -------------- |
| Windows | `C:\Users\[Windows User]\Documents\My Tableau Repository\Connectors` | `C:\Program Files\Tableau\Connectors` |
| Mac | `/Users/[user]/Documents/My Tableau Repository/Connectors` | Not applicable |
| Linux | `/opt/tableau/connectors` | `/opt/tableau/connectors` |

2.
Install the [JDBC driver](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md) to the folder based on your operating system: * Windows: `C:\Program Files\Tableau\Drivers` * Mac: `~/Library/Tableau/Drivers` or `/Library/JDBC` or `~/Library/JDBC` * Linux: `/opt/tableau/tableau_driver/jdbc` 3. Open Tableau Desktop or Tableau Server and find the **Semantic Layer by dbt Labs** connector on the left-hand side. You may need to restart these applications for the connector to be available. 4. Connect with your Host, Environment ID, and service or personal token information dbt provides during the [Semantic Layer configuration](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md). * In Tableau Server, the authentication screen may show "User" & "Password" instead, in which case the User is the Environment ID and the password is the Service Token. #### Using the integration[​](#using-the-integration "Direct link to Using the integration") 1. **Authentication** — Once you authenticate, the system will direct you to the data source page. 2. **Access all Semantic Layer Objects** — Use the "ALL" data source to access all the metrics, dimensions, and entities configured in your Semantic Layer. Note that the "METRICS\_AND\_DIMENSIONS" data source has been deprecated and replaced by "ALL". Be sure to use a live connection since extracts are not supported at this time. 3. **Access saved queries** — You can optionally access individual [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) that you've defined. These will also show up as unique data sources when you log in. 4. **Access worksheet** — From your data source selection, go directly to a worksheet in the bottom left-hand corner. 5. **Query metrics and dimensions** — Then, you'll find all the metrics, dimensions, and entities that are available to query on the left side of your window based on your selection. 
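Under the hood, the connector and the [JDBC driver](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md) reach the Semantic Layer with the same three credentials you entered above. As a rough sketch of how they combine into a connection string (the `jdbc:arrow-flight-sql` scheme and parameter names follow the JDBC API docs, but treat this as illustrative and verify the exact format there):

```python
def sl_jdbc_url(host: str, environment_id: int, token: str) -> str:
    """Assemble a Semantic Layer JDBC connection string.

    Sketch based on the format shown in dbt's JDBC API docs
    (arrow-flight-sql over port 443); confirm parameter names there.
    """
    return (
        f"jdbc:arrow-flight-sql://{host}:443"
        f"?environmentId={environment_id}&token={token}"
    )

# Hypothetical host, environment ID, and token for illustration only
url = sl_jdbc_url("semantic-layer.cloud.getdbt.com", 123456, "dbts_xxx")
```

The same three values are what Tableau Server maps onto its "User"/"Password" fields, which is why the Environment ID and token can stand in for a username and password.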
Visit the [Tableau documentation](https://help.tableau.com/current/pro/desktop/en-us/gettingstarted_overview.htm) to learn more about how to use Tableau worksheets and dashboards. ##### Publish from Tableau Desktop to Tableau Server[​](#publish-from-tableau-desktop-to-tableau-server "Direct link to Publish from Tableau Desktop to Tableau Server") * **From Desktop to Server** — Like any Tableau workflow, you can publish your workbook from Tableau Desktop to Tableau Server. For step-by-step instructions, visit Tableau's [publishing guide](https://help.tableau.com/current/pro/desktop/en-us/publish_workbooks_share.htm). ###### Modifying time granularity[​](#modifying-time-granularity "Direct link to Modifying time granularity") When you select time dimensions in the **Group By** menu, you'll see a list of available time granularities. The lowest granularity is selected by default. Metric time is the default time dimension for grouping your metrics. info Note: [Custom time granularities](https://docs.getdbt.com/docs/build/metricflow-time-spine.md#add-custom-granularities) (like fiscal year) aren't currently supported or accessible in this integration. Only [standard granularities](https://docs.getdbt.com/docs/build/dimensions.md?dimension=time_gran#time) (like day, week, month, and so on) are available. If you'd like to access custom granularities, consider using the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md). #### Things to note[​](#things-to-note "Direct link to Things to note") **Aggregation**
* All metrics are shown as using the "SUM" aggregation type in Tableau's UI, and this cannot be altered using Tableau's interface. * The Semantic Layer controls the aggregation type in code and it is intentionally fixed. Keep in mind that the underlying aggregation in the Semantic Layer might not be "SUM" ("SUM" is Tableau's default). **Data sources and display**
* In the "ALL" data source, Tableau surfaces all metrics and dimensions from the Semantic Layer on the left-hand side. Note that not all metrics and dimensions can be combined; you will receive an error message if a particular dimension cannot be sliced with a metric (or vice versa). You can use saved queries for smaller pieces of data that you want to combine.
* To display available metrics and dimensions, the Semantic Layer returns metadata for a fake table with the dimensions and metrics as 'columns' on this table. Because of this, you can't actually query this table for previews or extracts.

**Calculations and querying**
* Certain table calculations like "Totals" and "Percent Of" may not be accurate when using metrics aggregated in a non-additive way (such as count distinct).
* In any of our Semantic Layer interfaces (not only Tableau), you must include a [time dimension](https://docs.getdbt.com/docs/build/cumulative.md#limitations) when working with any cumulative metric that has a time window or granularity.
* Calculated fields are supported for creating parameter filters or dynamically selecting metrics and dimensions. However, other uses of calculated fields are not supported.
* *Note: For calculated field use cases that are not currently covered, please reach out to dbt Support and share them so we can further understand.*
* When using saved queries that include filters, we will automatically apply any filters that the query has.

#### Unsupported functionality[​](#unsupported-functionality "Direct link to Unsupported functionality")

The following Tableau features aren't supported at this time; however, the Semantic Layer may support some of this functionality in a future release:

* Updating the data source page
* Using "Extract" mode to view your data
* Unioning Tables
* Writing Custom SQL / Initial SQL
* Table Extensions
* Cross-Database Joins
* Some functions in Analysis --> Create Calculated Field
* Filtering on a Date Part time dimension for a Cumulative metric type
* Changing your date dimension to use "Week Number"
* Performing joins between tables that the Semantic Layer creates. It handles joins for you, so there's no need to join components in the Semantic Layer. Note that you *can* join tables from the Semantic Layer to ones outside your data platform.
* The Tableau integration doesn't currently display descriptive labels defined in your `metrics` configuration, meaning custom labels won't be visible when those metrics are imported or queried into Tableau.
#### FAQs[​](#faqs "Direct link to FAQs")

**I'm receiving a `Failed ALPN` error when trying to connect to the dbt Semantic Layer.**

If you're receiving a `Failed ALPN` error when trying to connect to the dbt Semantic Layer with the various [data integration tools](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) (such as Tableau, DBeaver, Datagrip, ADBC, or JDBC), it typically happens when connecting from a computer behind a corporate VPN or proxy (like Zscaler or Check Point). The root cause is typically the proxy interfering with the TLS handshake, as the Semantic Layer uses gRPC/HTTP2 for connectivity. To resolve this:

* If your proxy supports gRPC/HTTP2 but isn't configured to allow ALPN, adjust its settings to allow ALPN, or create an exception for the dbt domain.
* If your proxy does not support gRPC/HTTP2, add an SSL interception exception for the dbt domain in your proxy settings.

This should help in successfully establishing the connection without the `Failed ALPN` error.

---

### Trusted adapters

Trusted adapters take part in the Trusted Adapter Program, including a commitment to meet the program's requirements. They are maintained by dbt Labs, partners, and community members. Trusted adapters in dbt undergo an additional rigorous process that covers development, documentation, user experience, and maintenance requirements. We strongly recommend using them in production environments. For further details, refer to [What it means to be trusted](https://docs.getdbt.com/guides/adapter-creation.md?step=8#what-it-means-to-be-trusted).
Free and open-source tools for the data professional are increasingly abundant. This is by and large a *good thing*; however, it requires due diligence that wasn't needed in a paid-license, closed-source software world. As a user, there are important questions to answer before taking a dependency on an open-source project. The trusted adapter designation is meant to streamline this process for end users.

##### Trusted adapter specifications[​](#trusted-adapter-specifications "Direct link to Trusted adapter specifications")

Refer to the [Build, test, document, and promote adapters](https://docs.getdbt.com/guides/adapter-creation.md) guide for more information, particularly if you are an adapter maintainer considering having your adapter added to the trusted list.

##### Trusted adapters[​](#trusted-adapters "Direct link to Trusted adapters")

![](/img/icons/alloydb.svg)

###### AlloyDB

* [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-postgresql-alloydb.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/alloydb-setup.md)
[![](https://badge.fury.io/py/dbt-postgres.svg/)](https://badge.fury.io/py/dbt-postgres) dbt platformdbt Core ![](/img/icons/apache-spark.svg) ###### Apache Spark * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-apache-spark.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/spark-setup.md)
[![](https://badge.fury.io/py/dbt-spark.svg/)](https://badge.fury.io/py/dbt-spark) dbt platformdbt Core ![](/img/icons/athena.svg) ###### Athena * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-amazon-athena.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/athena-setup.md)

[![](https://badge.fury.io/py/dbt-athena.svg/)](https://badge.fury.io/py/dbt-athena) dbt platformdbt Core ![](/img/icons/azure-synapse-analytics.svg) ###### Azure Synapse * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-azure-synapse-analytics.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/azuresynapse-setup.md)
[![](https://badge.fury.io/py/dbt-synapse.svg/)](https://badge.fury.io/py/dbt-synapse) dbt platformdbt Core ![](/img/icons/bigquery.svg) ###### BigQuery * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-bigquery.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md)
* [Install with dbt Fusion](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md)
[![](https://badge.fury.io/py/dbt-bigquery.svg/)](https://badge.fury.io/py/dbt-bigquery) dbt platformdbt CoreFusion ![](/img/icons/clickhouse.svg) ###### ClickHouse * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/clickhouse-setup.md)
[![](https://badge.fury.io/py/dbt-clickhouse.svg/)](https://badge.fury.io/py/dbt-clickhouse) dbt Core ![](/img/icons/databricks.svg) ###### Databricks * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-databricks.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/databricks-setup.md)
* [Install with dbt Fusion](https://docs.getdbt.com/docs/local/connect-data-platform/databricks-setup.md)
[![](https://badge.fury.io/py/dbt-databricks.svg/)](https://badge.fury.io/py/dbt-databricks) dbt platformdbt CoreFusion ![](/img/icons/dremio.svg) ###### Dremio * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/dremio-setup.md)

[![](https://badge.fury.io/py/dbt-dremio.svg/)](https://badge.fury.io/py/dbt-dremio) dbt Core ![](/img/icons/glue.svg) ###### Glue * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/glue-setup.md)

[![](https://badge.fury.io/py/dbt-glue.svg/)](https://badge.fury.io/py/dbt-glue) dbt Core ![](/img/icons/exasol.svg) ###### Exasol * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/exasol-setup.md)

[![](https://badge.fury.io/py/dbt-exasol.svg/)](https://badge.fury.io/py/dbt-exasol) dbt Core ![](/img/icons/dbt-ibm-netezza.svg) ###### IBM Netezza * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/ibmnetezza-setup.md)

[![](https://badge.fury.io/py/dbt-ibm-netezza.svg/)](https://badge.fury.io/py/dbt-ibm-netezza) dbt Core ![](/img/icons/lakebase.svg) ###### Databricks Lakebase * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/lakebase-setup.md)

[![](https://badge.fury.io/py/dbt-postgres.svg/)](https://badge.fury.io/py/dbt-postgres) dbt platformdbt Core ![](/img/icons/materialize.svg) ###### Materialize * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/materialize-setup.md)

[![](https://badge.fury.io/py/dbt-materialize.svg/)](https://badge.fury.io/py/dbt-materialize) dbt Core ![](/img/icons/fabric_warehouse.svg) ###### Microsoft Fabric Warehouse * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-microsoft-fabric.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/fabric-setup.md)
[![](https://badge.fury.io/py/dbt-fabric.svg/)](https://badge.fury.io/py/dbt-fabric) dbt platformdbt Core ![](/img/icons/fabric_lakehouse.svg) ###### Microsoft Fabric Lakehouse * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/fabricspark-setup.md)
[![](https://badge.fury.io/py/dbt-fabricspark.svg/)](https://badge.fury.io/py/dbt-fabricspark) dbt Core ![](/img/icons/oracle.svg) ###### Oracle Autonomous Database * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/oracle-setup.md)
[![](https://badge.fury.io/py/dbt-oracle.svg/)](https://badge.fury.io/py/dbt-oracle) dbt Core ![](/img/icons/postgres.svg) ###### Postgres * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-postgresql-alloydb.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/postgres-setup.md)
[![](https://badge.fury.io/py/dbt-postgres.svg/)](https://badge.fury.io/py/dbt-postgres) dbt platformdbt Core ![](/img/icons/redshift.svg) ###### Redshift * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-redshift.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/redshift-setup.md)
* [Install with dbt Fusion](https://docs.getdbt.com/docs/local/connect-data-platform/redshift-setup.md)
[![](https://badge.fury.io/py/dbt-redshift.svg/)](https://badge.fury.io/py/dbt-redshift) dbt platformdbt CoreFusion ![](/img/icons/risingwave.svg) ###### RisingWave * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/risingwave-setup.md)

[![](https://badge.fury.io/py/dbt-risingwave.svg/)](https://badge.fury.io/py/dbt-risingwave) dbt Core ![](/img/icons/salesforce.svg) ###### Salesforce Data 360 * [Install with dbt Fusion](https://docs.getdbt.com/docs/local/connect-data-platform/salesforce-data-cloud-setup.md)
Fusion ![](/img/icons/singlestore.svg) ###### SingleStore * [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/singlestore-setup.md)

[![](https://badge.fury.io/py/dbt-singlestore.svg/)](https://badge.fury.io/py/dbt-singlestore) dbt Core ![](/img/icons/snowflake.svg) ###### Snowflake * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-snowflake.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/snowflake-setup.md)
* [Install with dbt Fusion](https://docs.getdbt.com/docs/local/connect-data-platform/snowflake-setup.md)
[![](https://badge.fury.io/py/dbt-snowflake.svg/)](https://badge.fury.io/py/dbt-snowflake) dbt platformdbt CoreFusion ![](/img/icons/starburst.svg) ###### Starburst/Trino * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-starburst-trino.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/trino-setup.md)
[![](https://badge.fury.io/py/dbt-trino.svg/)](https://badge.fury.io/py/dbt-trino) dbt platformdbt Core ![](/img/icons/teradata.svg) ###### Teradata * [Set up in the dbt platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-teradata.md)
* [Install with dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/teradata-setup.md)
[![](https://badge.fury.io/py/dbt-teradata.svg/)](https://badge.fury.io/py/dbt-teradata) dbt platformdbt Core

---

### Unit tests

💡Did you know... Available from dbt v1.8 or with the [dbt "Latest" release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).

Historically, dbt's test coverage was confined to [“data” tests](https://docs.getdbt.com/docs/build/data-tests.md), assessing the quality of input data or resulting datasets' structure. However, these tests could only be executed *after* building a model. dbt also supports an additional type of test: unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs *before* you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability.

#### Before you begin[​](#before-you-begin "Direct link to Before you begin")

* We currently only support unit testing SQL models.
* We currently only support adding unit tests to models in your *current* project.
* We currently *don't* support unit testing models that use the [`materialized view`](https://docs.getdbt.com/docs/build/materializations.md#materialized-view) materialization.
* We currently *don't* support unit testing models that use recursive SQL.
* We currently *don't* support unit testing models that use introspective queries.
* If your model has multiple versions, by default the unit test will run on *all* versions of your model.
Read [unit testing versioned models](https://docs.getdbt.com/reference/resource-properties/unit-testing-versions.md) for more information.

* Unit tests must be defined in a YML file in your [`models/` directory](https://docs.getdbt.com/reference/project-configs/model-paths.md).
* Table names must be aliased in order to unit test `join` logic.
* Include all [`ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) or [`source`](https://docs.getdbt.com/reference/dbt-jinja-functions/source.md) model references in the unit test configuration as `input`s to avoid "node not found" errors during compilation.

###### Adapter-specific caveats[​](#adapter-specific-caveats "Direct link to Adapter-specific caveats")

* You must specify all fields in a BigQuery `STRUCT` in a unit test. You cannot use only a subset of fields in a `STRUCT`.
* Redshift customers need to be aware of a [limitation when building unit tests](https://docs.getdbt.com/reference/resource-configs/redshift-configs.md#unit-test-limitations) that requires a workaround.
* Redshift sources need to be in the same database as the models.

tip

Check out our [Unit tests on-demand course](https://learn.getdbt.com/learn/course/unit-testing/welcome-to-unit-testing-5min/introduction-to-unit-testing) to learn how to add unit tests and more!

Read the [reference doc](https://docs.getdbt.com/reference/resource-properties/unit-tests.md) for more details about formatting your unit tests.

##### When to add a unit test to your model[​](#when-to-add-a-unit-test-to-your-model "Direct link to When to add a unit test to your model")

You should unit test a model:

* When your SQL contains complex logic:
  * Regex
  * Date math
  * Window functions
  * `case when` statements when there are many `when`s
  * Truncation
* When you're writing custom logic to process input data, similar to creating a function.
* We don't recommend unit testing functions like `min()`, since these functions are tested extensively by the warehouse. If an unexpected issue arises, it's more likely a result of issues in the underlying data rather than the function itself. Therefore, fixture data in the unit test won't provide valuable information.
* Logic for which you had bugs reported before.
* Edge cases not yet seen in your actual data that you want to handle.
* Prior to refactoring the transformation logic (especially if the refactor is significant).
* Models with high "criticality" (public, contracted models or models directly upstream of an exposure).

##### When to run unit tests[​](#when-to-run-unit-tests "Direct link to When to run unit tests")

dbt Labs strongly recommends only running unit tests in development or CI environments. Since the inputs of the unit tests are static, there's no need to use additional compute cycles running them in production. Use them in development for a test-driven approach and in CI to ensure changes don't break them.

Use the [resource type](https://docs.getdbt.com/reference/global-configs/resource-type.md) flag `--exclude-resource-type` or the `DBT_EXCLUDE_RESOURCE_TYPES` environment variable to exclude unit tests from your production builds and save compute.
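For example, a production job could skip unit tests with either approach (command sketch; pair it with whatever selectors your job already uses):

```shell
# Exclude unit tests from a production build using the CLI flag
dbt build --exclude-resource-type unit_test

# Or equivalently, via the environment variable
DBT_EXCLUDE_RESOURCE_TYPES=unit_test dbt build
```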
#### Unit testing a model[​](#unit-testing-a-model "Direct link to Unit testing a model")

This example creates a new `dim_customers` model with a field `is_valid_email_address` that calculates whether or not the customer’s email is valid:

```sql
with customers as (
    select * from {{ ref('stg_customers') }}
),

accepted_email_domains as (
    select * from {{ ref('top_level_email_domains') }}
),

check_valid_emails as (
    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customers.email,
        coalesce(
            regexp_like(
                customers.email,
                '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
            ) = true
            and accepted_email_domains.tld is not null,
            false
        ) as is_valid_email_address
    from customers
    left join accepted_email_domains
        on customers.email_top_level_domain = lower(accepted_email_domains.tld)
)

select * from check_valid_emails
```

The logic posed in this example can be challenging to validate. You can add a unit test to this model to ensure the `is_valid_email_address` logic captures all known edge cases: emails without `.`, emails without `@`, and emails from invalid domains.

```yaml
unit_tests:
  - name: test_is_valid_email_address
    description: "Check my is_valid_email_address logic captures all known edge cases - emails without ., emails without @, and emails from invalid domains."
    model: dim_customers
    given:
      - input: ref('stg_customers')
        rows:
          - {email: cool@example.com, email_top_level_domain: example.com}
          - {email: cool@unknown.com, email_top_level_domain: unknown.com}
          - {email: badgmail.com, email_top_level_domain: gmail.com}
          - {email: missingdot@gmailcom, email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
    expect:
      rows:
        - {email: cool@example.com, is_valid_email_address: true}
        - {email: cool@unknown.com, is_valid_email_address: false}
        - {email: badgmail.com, is_valid_email_address: false}
        - {email: missingdot@gmailcom, is_valid_email_address: false}
```

The previous example defines the mock data using the inline `dict` format, but you can also use `csv` or `sql`, either inline or in a separate fixture file. Store your fixture files in a `fixtures` subdirectory in any of your [test paths](https://docs.getdbt.com/reference/project-configs/test-paths.md). For example, `tests/fixtures/my_unit_test_fixture.sql`. When using the `dict` or `csv` format, you only have to define the mock data for the columns relevant to you. This enables you to write succinct and *specific* unit tests.

note

The direct parents of the model that you’re unit testing (in this example, `stg_customers` and `top_level_email_domains`) need to exist in the warehouse before you can execute the unit test. Use the [`--empty`](https://docs.getdbt.com/reference/commands/build.md#the---empty-flag) flag to build an empty version of the models to save warehouse spend.

```bash
dbt run --select "stg_customers top_level_email_domains" --empty
```

Alternatively, use `dbt build` to, in lineage order:

* Run the unit tests on your model.
* Materialize your model in the warehouse.
* Run the data tests on your model.

Now you’re ready to run this unit test. You have a couple of options for commands depending on how specific you want to be:

* `dbt test --select dim_customers` runs *all* of the tests on `dim_customers`.
* `dbt test --select "dim_customers,test_type:unit"` runs all of the *unit* tests on `dim_customers`.
* `dbt test --select test_is_valid_email_address` runs the test named `test_is_valid_email_address`.

```shell
dbt test --select test_is_valid_email_address
16:03:49  Running with dbt=1.8.0-a1
16:03:49  Registered adapter: postgres=1.8.0-a1
16:03:50  Found 6 models, 5 seeds, 4 data tests, 0 sources, 0 exposures, 0 metrics, 410 macros, 0 groups, 0 semantic models, 1 unit test
16:03:50
16:03:50  Concurrency: 5 threads (target='postgres')
16:03:50
16:03:50  1 of 1 START unit_test dim_customers::test_is_valid_email_address ................... [RUN]
16:03:51  1 of 1 FAIL 1 dim_customers::test_is_valid_email_address ............................ [FAIL 1 in 0.26s]
16:03:51
16:03:51  Finished running 1 unit_test in 0 hours 0 minutes and 0.67 seconds (0.67s).
16:03:51
16:03:51  Completed with 1 error and 0 warnings:
16:03:51
16:03:51  Failure in unit_test test_is_valid_email_address (models/marts/unit_tests.yml)
16:03:51    actual differs from expected:
@@ ,email            ,is_valid_email_address
→  ,cool@example.com ,True→False
   ,cool@unknown.com ,False
...,...               ,...
16:03:51
16:03:51    compiled Code at models/marts/unit_tests.yml
16:03:51
16:03:51  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

The clever regex statement wasn’t as clever as initially thought, as the model incorrectly flagged `cool@example.com` as an invalid email address.
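The failure traces back to escaping inside the SQL string literal. Outside of dbt, you can sanity-check the intended pattern with plain Python `re` (a standalone illustration using the same edge-case emails as the fixtures; the warehouse's `regexp_like` may differ in dialect details):

```python
import re

# The intended pattern, written as a raw regex without any
# SQL string-literal escaping
pattern = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def looks_like_email(address: str) -> bool:
    """Return True when the address matches the email pattern."""
    return pattern.match(address) is not None

expected = {
    "cool@example.com": True,     # well-formed address
    "badgmail.com": False,        # missing the @
    "missingdot@gmailcom": False, # missing the . in the domain
}
results = {addr: looks_like_email(addr) for addr in expected}
```

Checking the regex in isolation like this separates "is the pattern itself right?" from "did the pattern survive the SQL string-escaping rules?", which is exactly the distinction the failing unit test surfaced.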
Updating the regex logic to `'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'` (those pesky escape characters) and rerunning the unit test solves the problem:

```shell
dbt test --select test_is_valid_email_address
16:09:11  Running with dbt=1.8.0-a1
16:09:12  Registered adapter: postgres=1.8.0-a1
16:09:12  Found 6 models, 5 seeds, 4 data tests, 0 sources, 0 exposures, 0 metrics, 410 macros, 0 groups, 0 semantic models, 1 unit test
16:09:12
16:09:13  Concurrency: 5 threads (target='postgres')
16:09:13
16:09:13  1 of 1 START unit_test dim_customers::test_is_valid_email_address ................... [RUN]
16:09:13  1 of 1 PASS dim_customers::test_is_valid_email_address .............................. [PASS in 0.26s]
16:09:13
16:09:13  Finished running 1 unit_test in 0 hours 0 minutes and 0.75 seconds (0.75s).
16:09:13
16:09:13  Completed successfully
16:09:13
16:09:13  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
```

Your model is now ready for production! Adding this unit test helped catch an issue with the SQL logic *before* you materialized `dim_customers` in your warehouse and will better ensure the reliability of this model in the future.

#### Unit testing incremental models[​](#unit-testing-incremental-models "Direct link to Unit testing incremental models")

When configuring your unit test, you can override the output of macros, vars, or environment variables. This enables you to unit test your incremental models in "full refresh" and "incremental" modes.

note

Incremental models need to exist in the database first before running unit tests or doing a `dbt build`. Use the [`--empty` flag](https://docs.getdbt.com/reference/commands/build.md#the---empty-flag) to build an empty version of the models to save warehouse spend. You can also optionally select only your incremental models using the [`--select` flag](https://docs.getdbt.com/reference/node-selection/syntax.md#shorthand).
```shell
dbt run --select "config.materialized:incremental" --empty
```

After running the command, you can then perform a regular `dbt build` for that model and then run your unit test.

When testing an incremental model, the expected output is the **result of the materialization** (what will be merged/inserted), not the resulting model itself (what the final table will look like after the merge/insert).

For example, say you have an incremental model in your project:

my\_incremental\_model.sql

```sql
{{
    config(
        materialized='incremental'
    )
}}

select * from {{ ref('events') }}

{% if is_incremental() %}
where event_time > (select max(event_time) from {{ this }})
{% endif %}
```

You can define unit tests on `my_incremental_model` to ensure your incremental logic is working as expected:

```yml
unit_tests:
  - name: my_incremental_model_full_refresh_mode
    model: my_incremental_model
    overrides:
      macros:
        # unit test this model in "full refresh" mode
        is_incremental: false
    given:
      - input: ref('events')
        rows:
          - {event_id: 1, event_time: 2020-01-01}
    expect:
      rows:
        - {event_id: 1, event_time: 2020-01-01}

  - name: my_incremental_model_incremental_mode
    model: my_incremental_model
    overrides:
      macros:
        # unit test this model in "incremental" mode
        is_incremental: true
    given:
      - input: ref('events')
        rows:
          - {event_id: 1, event_time: 2020-01-01}
          - {event_id: 2, event_time: 2020-01-02}
          - {event_id: 3, event_time: 2020-01-03}
      - input: this
        # contents of current my_incremental_model
        rows:
          - {event_id: 1, event_time: 2020-01-01}
    expect:
      # what will be inserted/merged into my_incremental_model
      rows:
        - {event_id: 2, event_time: 2020-01-02}
        - {event_id: 3, event_time: 2020-01-03}
```

There is currently no way to unit test whether the dbt framework inserted/merged the records into your existing model correctly, but [we're investigating support for this in the future](https://github.com/dbt-labs/dbt-core/issues/8664).
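The incremental-mode expectation boils down to the `where event_time > (select max(event_time) from {{ this }})` filter: only rows newer than what's already materialized count as output. A tiny standalone sketch of that selection, using rows mirroring the fixtures above (plain Python, purely illustrative):

```python
# Rows already materialized in the model (the `this` input)
existing = [{"event_id": 1, "event_time": "2020-01-01"}]

# Rows in the upstream ref('events') input
events = [
    {"event_id": 1, "event_time": "2020-01-01"},
    {"event_id": 2, "event_time": "2020-01-02"},
    {"event_id": 3, "event_time": "2020-01-03"},
]

# Incremental mode keeps only rows newer than the current max,
# which is exactly what the unit test's `expect` block describes
max_seen = max(row["event_time"] for row in existing)
to_insert = [row for row in events if row["event_time"] > max_seen]
```

This is why the `expect` rows list only events 2 and 3: the unit test asserts the rows that would be merged/inserted, not the final contents of the table.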
#### Unit testing a model that depends on ephemeral model(s)[​](#unit-testing-a-model-that-depends-on-ephemeral-models "Direct link to Unit testing a model that depends on ephemeral model(s)")

If you want to unit test a model that depends on an ephemeral model, you must use `format: sql` for that input.

```yml
unit_tests:
  - name: my_unit_test
    model: dim_customers
    given:
      - input: ref('ephemeral_model')
        format: sql
        rows: |
          select 1 as id, 'emily' as first_name
    expect:
      rows:
        - {id: 1, first_name: emily}
```

#### Unit test exit codes[​](#unit-test-exit-codes "Direct link to Unit test exit codes")

Unit test successes and failures are represented by two exit codes:

* Pass (0)
* Fail (1)

Exit codes differ from data test success and failure outputs because they don't directly reflect failing data tests. Data tests are queries designed to check specific conditions in your data, and they return one row per failed test case (for example, the number of values with duplicates for the `unique` test). dbt reports the number of failing records as failures. In contrast, each unit test represents one 'test case', so results are always 0 (pass) or 1 (fail) regardless of how many records failed within that test case. Learn about [exit codes](https://docs.getdbt.com/reference/exit-codes.md) for more information.
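In CI scripting, this means a unit-test run can be gated on the process exit code alone. A minimal sketch (assumes the `dbt` CLI is on `PATH`; the selector name comes from the earlier example):

```python
import subprocess

def unit_tests_passed(returncode: int) -> bool:
    """Interpret a `dbt test` exit code: 0 means every selected
    unit test passed; 1 means at least one test case failed,
    regardless of how many rows differed within it."""
    return returncode == 0

def run_unit_test(selector: str) -> bool:
    # Sketch: invoke dbt and reduce the run to a pass/fail signal
    result = subprocess.run(["dbt", "test", "--select", selector])
    return unit_tests_passed(result.returncode)

# Usage in a CI gate (would invoke dbt for real):
# if not run_unit_test("test_is_valid_email_address"):
#     raise SystemExit("unit tests failed; aborting deploy")
```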
#### Additional resources[​](#additional-resources "Direct link to Additional resources")

* [Unit testing reference page](https://docs.getdbt.com/reference/resource-properties/unit-tests.md)
* [Supported data formats for mock data](https://docs.getdbt.com/reference/resource-properties/data-formats.md)
* [Unit testing versioned models](https://docs.getdbt.com/reference/resource-properties/unit-testing-versions.md)
* [Unit test inputs](https://docs.getdbt.com/reference/resource-properties/unit-test-input.md)
* [Unit test overrides](https://docs.getdbt.com/reference/resource-properties/unit-test-overrides.md)
* [Platform-specific data types](https://docs.getdbt.com/reference/resource-properties/data-types.md)

---

### Upgrade versions in dbt platform

In dbt, both [jobs](https://docs.getdbt.com/docs/deploy/jobs.md) and [environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md) are configured to use a specific version of dbt Core. The version can be upgraded at any time.

#### Environments[​](#environments "Direct link to Environments")

Navigate to the settings page of an environment, then click **Edit**. Click the **dbt version** dropdown bar and make your selection. You can select a [release track](#release-tracks) to receive ongoing updates (recommended), or a legacy version of dbt Core. Be sure to save your changes before navigating away.
[![Example environment settings in dbt](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-environment-settings.png?v=2 "Example environment settings in dbt")](#)Example environment settings in dbt ##### Release Tracks[​](#release-tracks "Direct link to Release Tracks") Starting in 2024, your project gets upgraded automatically on a cadence that you choose: The **Latest** track ensures you have up-to-date dbt functionality, and early access to new features of the dbt framework. The **Compatible** and **Extended** tracks are designed for customers who need a less-frequent release cadence, the ability to test new dbt releases before they go live in production, and/or ongoing compatibility with the latest open source releases of dbt Core. As a best practice, dbt Labs recommends that you test the upgrade in development first; use the [Override dbt version](#override-dbt-version) setting to test *your* project on the latest dbt version before upgrading your deployment environments and the default development environment for all your colleagues. To upgrade an environment in the [dbt Admin API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) or [Terraform](https://registry.terraform.io/providers/dbt-labs/dbtcloud/latest), set `dbt_version` to the name of your release track: * `latest-fusion` [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") (available to select accounts) * `latest` (default) * `compatible` (available to Starter, Enterprise, Enterprise+ plans) * `extended` (available to all Enterprise plans) ##### Override dbt version[​](#override-dbt-version "Direct link to Override dbt version") Configure your project to use a different dbt version than what's configured in your [development environment](https://docs.getdbt.com/docs/dbt-cloud-environments.md#types-of-environments). 
This *override* only affects your user account, no one else's. Use this to safely test new dbt features before upgrading the dbt version for your projects. 1. Click your account name from the left side panel and select **Account settings**. 2. Choose **Credentials** from the sidebar and select a project. This opens a side panel. 3. In the side panel, click **Edit** and scroll to the **User development settings** section. 4. Choose a version from the **dbt version** dropdown and click **Save**. An example of overriding the configured version to [**Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) for the selected project: [![Example of overriding the dbt version on your user account](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png?v=2 "Example of overriding the dbt version on your user account")](#)Example of overriding the dbt version on your user account 5. (Optional) Verify that dbt will use your override setting to build the project by invoking a `dbt build` command in the Studio IDE's command bar. Expand the **System Logs** section and find the output's first line. It should begin with `Running with dbt=` and list the version dbt is using.
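The verification in step 5 can be automated by parsing that first log line. A minimal sketch, assuming the `Running with dbt=` format described above; release-track runs print a different line and yield `None` here:

```python
import re

def parse_dbt_version(first_log_line: str):
    """Return the version from a line like 'Running with dbt=1.10.6',
    or None for the release-track form ('Running dbt...') or anything
    unrecognized."""
    match = re.search(r"Running with dbt=(\S+)", first_log_line)
    return match.group(1) if match else None
```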

For users on Release tracks, the output will display `Running dbt...` instead of a specific version, reflecting the flexibility and continuous automatic updates provided by the release track functionality.

#### dbt Fusion engine [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#dbt-fusion-engine- "Direct link to dbt-fusion-engine-")

dbt Labs has introduced the new [dbt Fusion engine](https://docs.getdbt.com/docs/fusion.md), a ground-up rebuild of dbt. This is currently in private preview on the dbt platform. Eligible customers can update environments to Fusion using the same workflows as v1.x, but remember:

* If you don't see the `Latest Fusion` release track as an option, you should check with your dbt Labs account team about eligibility.
* To increase the compatibility of your project, update all jobs and environments to the **Latest** release track and read more about the changes in our [upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md).
* Make sure you're using a supported adapter and authentication method:
  * **BigQuery**: Service Account / User Token, Native OAuth, External OAuth ([required permissions](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#required-permissions))
  * **Databricks**: Service Account / User Token, Native OAuth
  * **Redshift**: Username / Password, IAM profile
  * **Snowflake**: Username / Password, Native OAuth, External OAuth, key pair using a modern PKCS#8 method, MFA
* Once you upgrade your development environment(s) to `Latest Fusion`, every user will have to restart the IDE.

[![Upgrade to the Fusion engine in your environment settings.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-fusion.png?v=2 "Upgrade to the Fusion engine in your environment settings.")](#)Upgrade to the Fusion engine in your environment settings.
##### Upgrading environments to Fusion [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#upgrading-environments-to-fusion- "Direct link to upgrading-environments-to-fusion-") When you're ready to upgrade your project(s) to dbt Fusion engine, there are some tools available to you in the dbt platform UI to help you get started. The Fusion upgrade assistant will step you through the process of preparing and upgrading your projects. [![The Fusion upgrade assistant.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/fusion-upgrade-gui.png?v=2 "The Fusion upgrade assistant.")](#)The Fusion upgrade assistant. ###### Prerequisites[​](#prerequisites "Direct link to Prerequisites") To take advantage of the upgrade assistant, you'll need to meet the following prerequisites: * Your dbt project must be updated to use the **Latest** release track. * You must have a `developer` license. * You must have the beta enabled for your account. For more information, please contact your account manager. ###### Assign access to upgrade[​](#assign-access-to-upgrade "Direct link to Assign access to upgrade") By default, all users can view the Fusion upgrade workflows. The actions they can take will ultimately be limited by their assigned permissions and access to environments. You can fine-tune who can access the upgrade with the combination of a new account setting and the `Fusion admin` permission set. From your **Account settings**: 1. Navigate to the **Account** screen. 2. Click **Edit** and scroll to the bottom, and click the box next to **Enable Fusion migration** permissions. 3. Click **Save**. [![Limit access to the Fusion upgrade workflows.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/fusion-migration-permissions.png?v=2 "Limit access to the Fusion upgrade workflows.")](#)Limit access to the Fusion upgrade workflows. 
This hides the Fusion upgrade workflow from users who don't have the `Fusion admin` permission set, including the highest levels of admin access. To grant access to the upgrade workflows for specific projects and/or specific users:

1. Navigate to an existing group in your **Account settings** and click **Edit**, or click [**Create group**](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#create-new-groups) to create a new one.
2. Scroll to the **Access and permissions** section and click **Add permission**.
3. Select the **Fusion admin** permission set from the dropdown and then select the project(s) you want the users to access.
4. Click **Save**.

[![Assign Fusion admin to groups and projects.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/assign-fusion-admin.png?v=2 "Assign Fusion admin to groups and projects.")](#)Assign Fusion admin to groups and projects.

The Fusion upgrade workflow helps identify areas of the project that need to be updated and provides tools for manually resolving and autofixing any errors.

###### Upgrade your development environment[​](#upgrade-your-development-environment "Direct link to Upgrade your development environment")

To begin the process of upgrading to Fusion with the assistant:

1. From the project homepage or sidebar menu, click the **Start Fusion upgrade** or **Get started** button. You will be redirected to the Studio IDE. [![Start the Fusion upgrade.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/start-upgrade.png?v=2 "Start the Fusion upgrade.")](#)Start the Fusion upgrade.
2. At the top of the Studio IDE, click **Check deprecation warnings**. [![Begin the process of parsing for deprecation warnings.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/check-deprecations.png?v=2 "Begin the process of parsing for deprecation warnings.")](#)Begin the process of parsing for deprecation warnings.
3.
dbt parses your project for the deprecations and presents a list of all deprecation warnings along with the option to **Autofix warnings**. Autofixing attempts to correct all syntax errors automatically. See [Fix deprecation warnings](https://docs.getdbt.com/docs/cloud/studio-ide/autofix-deprecations.md) for more information.
4. Once the deprecation warnings have been resolved, click the **Enable Fusion** button. This upgrades your development environment to Fusion! [![You're now ready to upgrade to Fusion in your development environment!](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/autofix-success.png?v=2 "You're now ready to upgrade to Fusion in your development environment!")](#)You're now ready to upgrade to Fusion in your development environment!

Now that you've upgraded your development environment to Fusion, you're ready to start the process of upgrading your Production, Staging, and General environments. Follow your organization's standard procedures and use the [release tracks](#release-tracks) to upgrade.

###### Upgrade considerations[​](#upgrade-considerations "Direct link to Upgrade considerations")

Keep in mind the following considerations during the upgrade process:

* **Manifest incompatibility** — Fusion is backwards-compatible and can read dbt Core [manifests](https://docs.getdbt.com/reference/artifacts/manifest-json.md). However, dbt Core isn't forward-compatible and can't read Fusion manifests. Fusion produces a `v20` manifest, while the latest version of dbt Core still produces a `v12` manifest. As a result, mixing dbt Core and Fusion manifests across environments breaks cross-environment features.
To avoid this, use `state:modified`, `--defer`, and cross-environment `dbt docs generate` only after *all* environments are running the latest Fusion version. Using these features before all environments are on Fusion may cause errors and failures. * **State-aware orchestration** — If using [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md), dbt doesn’t detect a change if a table or view is dropped outside of dbt, as the cache is unique to each dbt platform environment. This means state-aware orchestration will not rebuild that model until either there is new data or a change in the code that the model uses. * **Workarounds:** * Use the **Clear cache** button on the target Environment page to force a full rebuild (acts like a reset), or * Temporarily disable State-aware orchestration for the job and rerun it. #### Jobs[​](#jobs "Direct link to Jobs") Each job in dbt can be configured to inherit parameters from the environment it belongs to. [![Settings of a dbt job](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/job-settings.png?v=2 "Settings of a dbt job")](#)Settings of a dbt job The example job seen in the screenshot above belongs to the environment "Prod". It inherits the dbt version of its environment as shown by the **Inherited from ENVIRONMENT\_NAME (DBT\_VERSION)** selection. You may also manually override the dbt version of a specific job to be any of the current Core releases supported by Cloud by selecting another option from the dropdown. #### Supported versions[​](#supported-versions "Direct link to Supported versions") dbt Labs has always encouraged users to upgrade dbt Core versions whenever a new minor version is released. We released our first major version of dbt - `dbt 1.0` - in December 2021. Alongside this release, we updated our policy on which versions of dbt Core we will support in the dbt platform. > **Starting with v1.0, all subsequent minor versions are available in dbt. 
Versions are actively supported, with patches and bug fixes, for 1 year after their initial release. At the end of the 1-year window, we encourage all users to upgrade to a newer version for better ongoing maintenance and support.** We provide different support levels for different versions, which may include new features, bug fixes, or security patches: * **[Active](https://docs.getdbt.com/docs/dbt-versions/core.md#current-version-support)**: In the first few months after a minor version's initial release, we patch it with bugfix releases. These include fixes for regressions, new bugs, and older bugs / quality-of-life improvements. We implement these changes when we have high confidence that they're narrowly scoped and won't cause unintended side effects. * **[Critical](https://docs.getdbt.com/docs/dbt-versions/core.md#current-version-support)**: When a newer minor version ships, the previous one transitions to "Critical Support" for the remainder of its one-year window. Patches during this period are limited to critical security and installation fixes. After the one-year window ends, the version reaches end of life. * **[End of Life](https://docs.getdbt.com/docs/dbt-versions/core.md#end-of-life-versions)**: Minor versions that have reached EOL no longer receive new patch releases. * **Deprecated**: dbt Core versions that are no longer maintained by dbt Labs, nor supported in the dbt platform. We'll continue to update the following release table so that users know when we plan to stop supporting different versions of Core in dbt. 
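The one-year policy above can be expressed as a small date calculation. A sketch of the support windows; the day-before-anniversary end date mirrors the release table's convention, and deprecation (removal from the platform) is a later stage not modeled here:

```python
from datetime import date, timedelta

def support_end_date(initial_release: date) -> date:
    """Support ends the day before the first anniversary of the release."""
    anniversary = initial_release.replace(year=initial_release.year + 1)
    return anniversary - timedelta(days=1)

def support_level(initial_release: date, next_minor_release, today: date) -> str:
    """Active until the next minor version ships, then Critical until the
    one-year window closes, then End of Life."""
    if today > support_end_date(initial_release):
        return "End of Life"
    if next_minor_release is not None and today >= next_minor_release:
        return "Critical"
    return "Active"
```

With the dates from the release table below, v1.11 (released Dec 19, 2025) is under Active Support through Dec 18, 2026, matching the table.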
##### Latest releases[​](#latest-releases "Direct link to Latest releases")

| dbt Core | Initial release | Support level and end date |
| --- | --- | --- |
| [**v1.11**](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.11.md) | Dec 19, 2025 | **Active Support — Dec 18, 2026** |
| [**v1.10**](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.10.md) | Jun 16, 2025 | **Critical Support — Jun 15, 2026** |
| [**v1.9**](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.9.md) | Dec 9, 2024 | Deprecated ⛔️ |
| [**v1.8**](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.8.md) | May 9, 2024 | Deprecated ⛔️ |
| [**v1.7**](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.7.md) | Nov 2, 2023 | End of Life ⚠️ |
| [**v1.6**]() | Jul 31, 2023 | End of Life ⚠️ |
| [**v1.5**]() | Apr 27, 2023 | End of Life ⚠️ |
| [**v1.4**]() | Jan 25, 2023 | End of Life ⚠️ |
| [**v1.3**]() | Oct 12, 2022 | End of Life ⚠️ |
| [**v1.2**]() | Jul 26, 2022 | Deprecated ⛔️ |
| [**v1.1**]() | Apr 28, 2022 | Deprecated ⛔️ |
| [**v1.0**]() | Dec 3, 2021 | Deprecated ⛔️ |
| **v0.X** ⛔️ | (Various dates) | Deprecated ⛔️ |

All functionality in dbt Core since the v1.7 release is available in [dbt release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md), which provide automated upgrades at a cadence appropriate for your team. Release tracks are required for the Developer and Starter plans on dbt. Accounts using older dbt versions will be migrated to the **Latest** release track.
For customers of dbt: dbt Labs strongly recommends migrating environments on older and unsupported versions to [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) or a supported version. In 2025, dbt Labs will remove the oldest dbt Core versions from availability in dbt platform, starting with v1.0 -- v1.2. Starting with v1.0, dbt will ensure that you're always using the latest compatible patch release of `dbt-core` and plugins, including all the latest fixes. You may also choose to try prereleases of those patch releases before they are generally available. For more on version support and future releases, see [Understanding dbt Core versions](https://docs.getdbt.com/docs/dbt-versions/core.md). ##### Need help upgrading?[​](#need-help-upgrading "Direct link to Need help upgrading?") If you want more advice on how to upgrade your dbt projects, check out our [migration guides](https://docs.getdbt.com/docs/dbt-versions/core-upgrade.md) and our [upgrading Q\&A page](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#upgrading-legacy-versions-under-10). ##### Testing your changes before upgrading[​](#testing-your-changes-before-upgrading "Direct link to Testing your changes before upgrading") Once you know what code changes you'll need to make, you can start implementing them. We recommend you: * Create a separate dbt project, "Upgrade project", to test your changes before making them live in your main dbt project. * In your "Upgrade project", connect to the same repository you use for your production project. * Set the development environment [settings](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) to run the latest version of dbt Core. * Check out a branch `dbt-version-upgrade`, make the appropriate updates to your project, and verify your dbt project compiles and runs with the new version in the Studio IDE. 
* If upgrading directly to the latest version results in too many issues, try testing your project iteratively on successive minor versions. There are years of development and a few breaking changes between distant versions of dbt Core (for example, 1.0 --> 1.10). The likelihood of experiencing problems upgrading between successive minor versions is much lower, which is why upgrading regularly is recommended. * Once you have your project compiling and running on the latest version of dbt in the development environment for your `dbt-version-upgrade` branch, try replicating one of your production jobs to run off your branch's code. * You can do this by creating a new deployment environment for testing, setting the custom branch to 'ON' and referencing your `dbt-version-upgrade` branch. You'll also need to set the dbt version in this environment to the latest dbt Core version. [![Setting your testing environment](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-environment.png?v=2 "Setting your testing environment")](#)Setting your testing environment * Then add a job to the new testing environment that replicates one of the production jobs your team relies on. * If that job runs smoothly, you should be all set to merge your branch into main. * Then change your development and deployment environments in your main dbt project to run off the newest version of dbt Core. --- ### Upgrading to dbt utils v1.0 For the first time, [dbt utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) is crossing the major version boundary.
From [last month’s blog post](https://www.getdbt.com/blog/announcing-dbt-v1.3-and-utils/): > It’s time to formalize what was already unofficial policy: you can rely on dbt utils in the same way as you do dbt Core, with stable interfaces and consistent and intuitive naming. Just like the switch to dbt Core 1.0 last year, there are some breaking changes as we standardized and prepared for the future. Most changes can be handled with find-and-replace. If you need help, post on the [Community Forum](https://discourse.getdbt.com) or in the [#package-ecosystem](https://getdbt.slack.com/archives/CU4MRJ7QB) channel on Slack.

#### New features[​](#new-features "Direct link to New features")

* `get_single_value()` — An easy way to pull a single value from a SQL query, instead of having to access the `[0][0]`th element of a `run_query` result.
* `safe_divide()` — Returns null when the denominator is 0, instead of throwing a divide-by-zero error.
* New `not_empty_string` test — A simpler wrapper than `expression_is_true` for checking the length of a column.

#### Enhancements[​](#enhancements "Direct link to Enhancements")

* Many tests are more meaningful when you run them against subgroups of a table. For example, you may need to validate that recent data exists for every turnstile, rather than for the data source as a whole. Add the new `group_by_columns` argument to your tests to do so. Review [this article](https://www.emilyriederer.com/post/grouping-data-quality/) by the test's author for more information.
* With the addition of an on-by-default `quote_identifiers` argument in the `star()` macro, you can now disable quoting if necessary.
* The `recency` test now has an optional `ignore_time_component` argument which can be used when testing against a date column. This prevents the time of day the test runs from causing false negatives/positives.
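The `safe_divide()` macro listed under New features compiles to cross-database SQL; as a rough Python analogue of its semantics (an illustration, not the macro's actual implementation), including SQL-style null propagation:

```python
def safe_divide(numerator, denominator):
    """Return None (SQL null) for a zero or null denominator, and
    propagate null inputs, instead of raising a divide-by-zero error."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator
```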
#### Fixes[​](#fixes "Direct link to Fixes")

* `union()` now includes/excludes columns case-insensitively
* `slugify()` prefixes an underscore when the first character is a digit
* The `expression_is_true` test doesn’t output `*` unless storing failures, a cost improvement for BigQuery.

#### Breaking Changes[​](#breaking-changes "Direct link to Breaking Changes")

##### Changes to `surrogate_key()`:[​](#changes-to-surrogate_key "Direct link to changes-to-surrogate_key")

* `surrogate_key()` has been replaced by `generate_surrogate_key()`. The original treated null values and blank strings the same, which could lead to duplicate keys being created. `generate_surrogate_key()` does not have this flaw. Compare the [surrogate keys calculated for these columns](https://docs.google.com/spreadsheets/d/1qWfdbieUOSgkzdY0kmJ9iCgdqyWccA0R-6EW0EgaMQc/edit#gid=0): ![A table comparing the behavior of surrogate\_key and generate\_surrogate\_key](/assets/images/surrogate_key_behaviour-2248a1a7c8bfa9df30140fadc6021a99.png)

Changing the calculation method for surrogate keys, even for the better, could have significant consequences in downstream uses (such as snapshots and incremental models which use this column as their `unique_key`). As a result, it's possible to opt into the legacy behavior by setting the following variable in your dbt project:

```yaml
# dbt_project.yml
vars:
  surrogate_key_treat_nulls_as_empty_strings: true # turn on legacy behavior
```

By creating a new macro instead of updating the behavior of the old one, we are requiring all projects that use this macro to make an explicit decision about which approach is better for their context. **Our recommendation is that existing users should opt into the legacy behavior** unless you are confident that either:

* your surrogate keys never contained nulls, or
* your surrogate keys are not used for incremental models, snapshots or other stateful artifacts and so can be regenerated with new values without issue.
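The flaw described above is easy to reproduce outside the warehouse. In this Python sketch, the `-` delimiter and the null sentinel string are assumptions about the macros' internals, but the collision behavior (and the new list-based signature) match the descriptions on this page:

```python
# Demonstrates why the legacy surrogate_key() could create duplicate keys:
# it coalesced both nulls and blank strings to '', so ('a', None) and
# ('a', '') hashed identically. generate_surrogate_key() maps nulls to a
# distinct sentinel instead. Sentinel and delimiter are assumptions.
import hashlib

def legacy_surrogate_key(*fields) -> str:
    """Varargs signature, nulls coalesced to '' (the old behavior)."""
    parts = ["" if f is None else str(f) for f in fields]
    return hashlib.md5("-".join(parts).encode()).hexdigest()

def generate_surrogate_key(fields: list) -> str:
    """List signature, nulls mapped to a sentinel (the new behavior)."""
    parts = ["_dbt_utils_surrogate_key_null_" if f is None else str(f)
             for f in fields]
    return hashlib.md5("-".join(parts).encode()).hexdigest()
```

Running both on `('a', None)` and `('a', '')` shows the legacy macro colliding while the new one produces distinct keys, which is exactly why regenerated keys can break stateful downstream artifacts.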
**Warning to package maintainers**: You cannot assume one behavior or the other, as each project can customize its behavior.

##### Functionality now native to dbt Core:[​](#functionality-now-native-to-dbt-core "Direct link to Functionality now native to dbt Core:")

* The `expression_is_true` test no longer has a dedicated `condition` argument. Instead, use `where`, which is [now available natively to all tests](https://docs.getdbt.com/reference/resource-configs/where.md):

```yaml
models:
  - name: old_syntax
    tests:
      - dbt_utils.expression_is_true:
          expression: "col_a + col_b = total"
          # replace this...
          condition: "created_at > '2018-12-31'"
  - name: new_syntax
    tests:
      - dbt_utils.expression_is_true:
          expression: "col_a + col_b = total"
          # ...with this...
          where: "created_at > '2018-12-31'"
```

**Note** — This may cause some tests to get the same autogenerated names. To resolve this, you can [define a custom name for a test](https://docs.getdbt.com/reference/resource-properties/data-tests.md#define-a-custom-name-for-one-test).

* The deprecated `unique_where` and `not_null_where` tests have been removed, because [where is now available natively to all tests](https://docs.getdbt.com/reference/resource-configs/where.md). To migrate, find and replace `dbt_utils.unique_where` with `unique` and `dbt_utils.not_null_where` with `not_null`.
* `dbt_utils.current_timestamp()` is replaced by `dbt.current_timestamp()`.
  * Note that Postgres and Snowflake’s implementation of `dbt.current_timestamp()` differs from the old `dbt_utils` one ([full details here](https://github.com/dbt-labs/dbt-utils/pull/597#issuecomment-1231074577)). If you use Postgres or Snowflake and need identical backwards-compatible behavior, use `dbt.current_timestamp_backcompat()`. This discrepancy will hopefully be reconciled in a future version of dbt Core.
* All other cross-db macros have moved to the dbt namespace, with no changes necessary other than replacing `dbt_utils.` with `dbt.`.
Review the [cross database macros documentation](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros.md) for the full list.

* In your code editor, you can do a global find and replace with regex: `\{\{\s*dbt_utils\.(any_value|bool_or|cast_bool_to_text|concat|dateadd|datediff|date_trunc|escape_single_quotes|except|hash|intersect|last_day|length|listagg|position|replace|right|safe_cast|split_part|string_literal|type_bigint|type_float|type_int|type_numeric|type_string|type_timestamp)` → `{{ dbt.$1`

##### Removal of `insert_by_period` materialization[​](#removal-of-insert_by_period-materialization "Direct link to removal-of-insert_by_period-materialization")

* The `insert_by_period` materialization has been moved to the [experimental-features repo](https://github.com/dbt-labs/dbt-labs-experimental-features/tree/main/insert_by_period). To continue to use it, add the below to your packages.yml file:

```yaml
packages:
  - git: https://github.com/dbt-labs/dbt-labs-experimental-features
    subdirectory: insert_by_period
    revision: XXXX # optional but highly recommended. Provide a full git sha hash, e.g. 1c0bfacc49551b2e67d8579cf8ed459d68546e00. If not provided, uses the current HEAD.
```

##### Removal of deprecated legacy behavior:[​](#removal-of-deprecated-legacy-behavior "Direct link to Removal of deprecated legacy behavior:")

* `safe_add()` only works with a list of arguments; use `{{ dbt_utils.safe_add(['column_1', 'column_2']) }}` instead of varargs `{{ dbt_utils.safe_add('column_1', 'column_2') }}`.
* Several long-promised deprecations to `deduplicate()` have been applied:
  * The `group_by` argument is replaced by `partition_by`.
  * `relation_alias` is removed.
If you need an alias, you can pass it directly to the `relation` argument. * `order_by` is now mandatory. Pass a static value like `1` if you don’t care how they are deduplicated. * The deprecated `table` argument has been removed from `unpivot()`. Use `relation` instead. #### Resolving error messages[​](#resolving-error-messages "Direct link to Resolving error messages") After upgrading, these are common error messages you may encounter, along with their resolutions. `dict object has no attribute MACRO_NAME` **Cause**: No macro called `MACRO_NAME` exists. This is most likely because the macro has moved to the `dbt` namespace (see above). It could also be because you haven't run dbt deps or have misspelled a macro's name. **Resolution**: For [cross-database macros](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros.md), change `dbt_utils.MACRO_NAME()` to `dbt.MACRO_NAME()`. `macro 'dbt_macro__generate_surrogate_key' takes not more than 1 argument(s)` **Cause**: `generate_surrogate_key()` requires a single argument containing a list of columns, not a set of varargs. **Resolution**: Change to `dbt_utils.generate_surrogate_key(['column_1', 'column_2'])` - note the square brackets. `The dbt_utils.surrogate_key has been replaced by dbt_utils.generate_surrogate_key` **Cause**: `surrogate_key()` has been replaced. **Resolution**: 1. Decide whether you need to enable backwards compatibility [as detailed above](#changes-to-surrogate_key). 2. Find and replace `dbt_utils.surrogate_key` with `dbt_utils.generate_surrogate_key`. `macro dbt_macro__test_expression_is_true takes no keyword argument condition` **Cause**: `condition` has been removed from the `expression_is_true` test, now that `where` is available on all tests automatically. **Resolution**: Replace `condition` with `where`. `No materialization insert_by_period was found for adapter` **Cause**: `insert_by_period` has moved to the experimental features repo (see above). 
**Resolution**: Install the package as [described above](#removal-of-insert_by_period-materialization). `dbt found two tests with the name "XXX".` **Cause**: Changing from `condition` to `where` in the `expression_is_true` test can leave two tests with the same autogenerated name, because configs are not part of a test's unique name. **Resolution**: Define a [custom name for your test](https://docs.getdbt.com/reference/resource-properties/tests#define-a-custom-name-for-one-test). --- ### Upgrading to the dbt Fusion engine (v2.0)

**Important**: The dbt Fusion engine is currently available for installation in:

* [Local command line interface (CLI) tools](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")
* [VS Code and Cursor with the dbt extension](https://docs.getdbt.com/docs/install-dbt-extension.md) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")
* [dbt platform environments](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine) [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

Join the conversation in our Community Slack channel [`#dbt-fusion-engine`](https://getdbt.slack.com/archives/C088YCAB6GH). Read the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements) for the latest updates.
#### More information about Fusion[​](#more-information-about-fusion "Direct link to More information about Fusion") Fusion marks a significant update to dbt. While many of the workflows you've grown accustomed to remain unchanged, there are a lot of new ideas, and a lot of old ones going away. The following is a list of the full scope of our current release of the Fusion engine, including implementation, installation, deprecations, and limitations: * [About the dbt Fusion engine](https://docs.getdbt.com/docs/fusion/about-fusion.md) * [About the dbt extension](https://docs.getdbt.com/docs/about-dbt-extension.md) * [New concepts in Fusion](https://docs.getdbt.com/docs/fusion/new-concepts.md) * [Supported features matrix](https://docs.getdbt.com/docs/fusion/supported-features.md) * [Installing Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) * [Installing VS Code extension](https://docs.getdbt.com/docs/install-dbt-extension.md) * [Fusion release track](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine) * [Quickstart for Fusion](https://docs.getdbt.com/guides/fusion.md?step=1) * [Upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md) * [Fusion licensing](http://www.getdbt.com/licenses-faq) #### What to know before upgrading[​](#what-to-know-before-upgrading "Direct link to What to know before upgrading") dbt Core and dbt Fusion share a common language spec—the code in your project. dbt Labs is committed to providing feature parity with dbt Core wherever possible. At the same time, we want to take this opportunity to *strengthen the framework* by removing deprecated functionality, rationalizing confusing behavior, and providing more rigorous validation on erroneous inputs. This means that there is some work involved in preparing an existing dbt project for readiness on Fusion. 
That work is documented below. It should be simple, straightforward, and in many cases auto-fixable with the [`dbt-autofix`](https://github.com/dbt-labs/dbt-autofix) helper. You can find more information about what's changing in the dbt Fusion engine [changelog](https://github.com/dbt-labs/dbt-fusion/blob/main/CHANGELOG.md).

###### Upgrade considerations[​](#upgrade-considerations "Direct link to Upgrade considerations")

Keep in mind the following considerations during the upgrade process:

* **Manifest incompatibility** — Fusion is backwards-compatible and can read dbt Core [manifests](https://docs.getdbt.com/reference/artifacts/manifest-json.md). However, dbt Core isn't forward-compatible and can't read Fusion manifests. Fusion produces a `v20` manifest, while the latest version of dbt Core still produces a `v12` manifest. As a result, mixing dbt Core and Fusion manifests across environments breaks cross-environment features. To avoid this, use `state:modified`, `--defer`, and cross-environment `dbt docs generate` only after *all* environments are running the latest Fusion version. Using these features before all environments are on Fusion may cause errors and failures.
* **State-aware orchestration** — If you use [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md), dbt doesn't detect a change if a table or view is dropped outside of dbt, because the cache is unique to each dbt platform environment. This means state-aware orchestration will not rebuild that model until there is either new data or a change in the code that the model uses. Workarounds:
  * Use the **Clear cache** button on the target Environment page to force a full rebuild (acts like a reset), or
  * Temporarily disable state-aware orchestration for the job and rerun it.
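Because mixing Fusion (`v20`) and dbt Core (`v12`) manifests breaks cross-environment features, it can help to confirm which schema version each environment's artifact carries before turning on `--defer` or `state:modified` comparisons. A minimal sketch (the helper name is ours; in practice you'd load `target/manifest.json`, dbt's default artifact path):

```python
def manifest_schema_version(manifest: dict) -> str:
    """Return the manifest schema version tag, e.g. 'v12' (dbt Core) or 'v20' (Fusion)."""
    # dbt records a schema URL like https://schemas.getdbt.com/dbt/manifest/v12.json
    url = manifest["metadata"]["dbt_schema_version"]
    return url.rsplit("/", 1)[-1].removesuffix(".json")

# In practice, json.load() this from target/manifest.json in each environment
core_manifest = {"metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v12.json"}}
print(manifest_schema_version(core_manifest))  # v12
```

If environments report different versions, hold off on cross-environment features until all of them are running Fusion.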
##### Supported adapters[​](#supported-adapters "Direct link to Supported adapters")

The following adapters and authentication methods are supported in the dbt Fusion engine:

**BigQuery**

* Service Account / User Token
* Native OAuth
* External OAuth
* [Required permissions](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#required-permissions)

**Databricks**

* Service Account / User Token
* Native OAuth

**Redshift**

* Username / Password
* IAM profile

**Snowflake**

* Username / Password
* Native OAuth
* External OAuth
* Key pair using a modern PKCS#8 method
* MFA

##### A clean slate[​](#a-clean-slate "Direct link to A clean slate")

dbt Labs is committed to moving forward with Fusion, and it will not support any deprecated functionality (see the [Changes overview](https://docs.getdbt.com/reference/changes-overview.md) for details):

* All [deprecation warnings](https://docs.getdbt.com/reference/deprecations.md) must be resolved before upgrading to the new engine. This includes historic deprecations and [new ones as of dbt Core v1.10](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.10.md#deprecation-warnings).
* All [behavior change flags](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#behaviors) will be removed (generally enabled). You can no longer opt out of them using `flags:` in your `dbt_project.yml`.

##### Ecosystem packages[​](#ecosystem-packages "Direct link to Ecosystem packages")

The most popular `dbt-labs` packages (`dbt_utils`, `audit_helper`, `dbt_external_tables`, `dbt_project_evaluator`) are already compatible with Fusion. External packages published by organizations outside of dbt Labs may use outdated code or incompatible features that fail to parse with the new Fusion engine. We're working with those package maintainers to make packages available for Fusion. Packages that require an upgrade to a new release for Fusion compatibility will be documented in this upgrade guide.
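When a package does need a new release for Fusion compatibility, the fix is usually just bumping its pinned version range in `packages.yml`. An illustrative sketch (the version range below is invented; check the [dbt package hub](https://hub.getdbt.com/) for the actual Fusion-compatible release of each package):

```yml
# packages.yml — pin to a release whose require-dbt-version includes 2.0.0
# (the range shown is illustrative, not a real recommendation)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.3.0", "<2.0.0"]
```

After editing the pin, rerun `dbt deps` to install the updated release.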
##### Changed functionality[​](#changed-functionality "Direct link to Changed functionality")

When developing the Fusion engine, there were opportunities to improve the dbt framework: failing earlier (when possible), fixing bugs, optimizing run order, and deprecating flags that are no longer relevant. The result is a handful of specific and nuanced changes to existing behavior. When upgrading to Fusion, you should expect the following changes in functionality:

###### Parse time printing of relations will print out the fully qualified name, instead of an empty string[​](#parse-time-printing-of-relations-will-print-out-the-full-qualified-name-instead-of-an-empty-string "Direct link to Parse time printing of relations will print out the fully qualified name, instead of an empty string")

In dbt Core v1, when printing the result of `get_relation()`, the parse time output for that Jinja would print `None` (the undefined object coerces to the string “None”). In Fusion, to help with intelligent batching of `get_relation()` calls (and significantly speed up `dbt compile`), dbt needs to construct a relation object with the fully qualified name resolved at parse time for the `get_relation()` adapter call. Constructing a relation object with the fully qualified name in Fusion produces different behavior than dbt Core in `print()`, `log()`, or any Jinja macro that outputs to `stdout` or `stderr` at parse time.
Example:

```jinja
{% set relation = adapter.get_relation(
    database=db_name,
    schema=db_schema,
    identifier='a') %}
{{ print('relation: ' ~ relation) }}

{% set relation_via_api = api.Relation.create(
    database=db_name,
    schema=db_schema,
    identifier='a'
) %}
{{ print('relation_via_api: ' ~ relation_via_api) }}
```

The output after `dbt parse` in dbt Core v1:

```text
relation: None
relation_via_api: my_db.my_schema.my_table
```

The output after `dbt parse` in Fusion:

```text
relation: my_db.my_schema.my_table
relation_via_api: my_db.my_schema.my_table
```

###### Deprecated flags[​](#deprecated-flags "Direct link to Deprecated flags")

What are "deprecated flags"?

Deprecated flags are command-line flags (like `--models`, `--print`) that you pass to dbt commands. These are being removed in Fusion. This is different from:

* [Deprecation warnings](https://docs.getdbt.com/reference/deprecations.md) — Features in your project code (models, YAML, macros) that need to be updated
* [Behavior change flags](https://docs.getdbt.com/reference/global-configs/behavior-changes.md) — Flags in `dbt_project.yml` that let you opt in/out of new behaviors

See the [Changes overview](https://docs.getdbt.com/reference/changes-overview.md) for a full comparison.

Some historic CLI flags in dbt Core will no longer do anything in Fusion. If you pass them into a dbt command using Fusion, the command will not error, but the flag will do nothing (and warn accordingly).

One exception to this rule: The `--models` / `--model` / `-m` flag was renamed to `--select` / `-s` way back in dbt Core v0.21 (Oct 2021). Silently skipping this flag means ignoring your command's selection criteria, which could mean building your entire DAG when you only meant to select a small subset. For this reason, the `--models` / `--model` / `-m` flag **will raise an error** in Fusion. Please update your job definitions accordingly.
| Flag name | Remediation |
| --- | --- |
| `dbt seed` [`--show`](https://docs.getdbt.com/reference/commands/seed.md) | N/A |
| [`--print` / `--no-print`](https://docs.getdbt.com/reference/global-configs/print-output.md) | No action required |
| [`--printer-width`](https://docs.getdbt.com/reference/global-configs/print-output.md#printer-width) | No action required |
| [`--source`](https://docs.getdbt.com/reference/commands/deps.md#non-hub-packages) | No action required |
| [`--record-timing-info` / `-r`](https://docs.getdbt.com/reference/global-configs/record-timing-info.md) | No action required |
| [`--cache-selected-only` / `--no-cache-selected-only`](https://docs.getdbt.com/reference/global-configs/cache.md) | No action required |
| [`--clean-project-files-only` / `--no-clean-project-files-only`](https://docs.getdbt.com/reference/commands/clean.md#--clean-project-files-only) | No action required |
| `--single-threaded` / `--no-single-threaded` | No action required |
| `dbt source freshness` [`--output` / `-o`](https://docs.getdbt.com/docs/deploy/source-freshness.md) | |
| [`--config-dir`](https://docs.getdbt.com/reference/commands/debug.md) | No action required |
| [`--resource-type` / `--exclude-resource-type`](https://docs.getdbt.com/reference/global-configs/resource-type.md) | Change to `--resource-types` / `--exclude-resource-types` |
| `--show-resource-report` / `--no-show-resource-report` | No action required |
| [`--log-cache-events` / `--no-log-cache-events`](https://docs.getdbt.com/reference/global-configs/logs.md#logging-relational-cache-events) | No action required |
| `--use-experimental-parser` / `--no-use-experimental-parser` | No action required |
| [`--empty-catalog`](https://docs.getdbt.com/reference/commands/cmd-docs.md#dbt-docs-generate) | |
| [`--compile` / `--no-compile`](https://docs.getdbt.com/reference/commands/cmd-docs.md#dbt-docs-generate) | |
| `--inline-direct` | No action required |
| `--partial-parse-file-diff` / `--no-partial-parse-file-diff` | No action required |
| `--partial-parse-file-path` | No action required |
| `--populate-cache` / `--no-populate-cache` | No action required |
| `--static-parser` / `--no-static-parser` | No action required |
| `--use-fast-test-edges` / `--no-use-fast-test-edges` | No action required |
| [`--introspect` / `--no-introspect`](https://docs.getdbt.com/reference/commands/compile.md#introspective-queries) | No action required |
| `--inject-ephemeral-ctes` / `--no-inject-ephemeral-ctes` | |
| [`--partial-parse` / `--no-partial-parse`](https://docs.getdbt.com/reference/parsing.md#partial-parsing) | No action required |

###### Conflicting package versions when a local package depends on a hub package which the root package also wants will error[​](#conflicting-package-versions-when-a-local-package-depends-on-a-hub-package-which-the-root-package-also-wants-will-error "Direct link to Conflicting package versions when a local package depends on a hub package which the root package also wants will error")

If a local package depends on a hub package that the root package also wants, `dbt deps` doesn't resolve conflicting versions in dbt Core v1; it will install whatever the root project requests.
Fusion will present an error:

```bash
error: dbt8999: Cannot combine non-exact versions: =0.8.3 and =1.1.1
```

###### Parse will fail on nonexistent macro invocations and adapter methods[​](#parse-will-fail-on-nonexistent-macro-invocations-and-adapter-methods "Direct link to Parse will fail on nonexistent macro invocations and adapter methods")

When you call a nonexistent macro in dbt:

```sql
select
    id as payment_id,
    -- my_nonexistent_macro is a macro that DOES NOT EXIST
    {{ my_nonexistent_macro('amount') }} as amount_usd
from app_data.payments
```

Or a nonexistent adapter method:

```sql
{{ adapter.does_not_exist() }}
```

In dbt Core v1, `dbt parse` passes, but `dbt compile` fails. Fusion will error out during `parse`.

###### Parse will fail on missing generic test[​](#parse-will-fail-on-missing-generic-test "Direct link to Parse will fail on missing generic test")

When you have an undefined generic test in your project:

```yaml
models:
  - name: dim_wizards
    data_tests:
      - does_not_exist
```

In dbt Core v1, `dbt parse` passes, but `dbt compile` fails. Fusion will error out during `parse`.

###### Parse will fail on missing variable[​](#parse-will-fail-on-missing-variable "Direct link to Parse will fail on missing variable")

When you have an undefined variable in your project:

```sql
select {{ var('does_not_exist') }} as my_column
```

In dbt Core v1, `dbt parse` passes, but `dbt compile` fails. Fusion will error out during `parse`.

###### Stricter evaluation of duplicate docs blocks[​](#stricter-evaluation-of-duplicate-docs-blocks "Direct link to Stricter evaluation of duplicate docs blocks")

In older versions of dbt Core, it was possible to create scenarios with duplicate [docs blocks](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks). For example, you can have two packages with identical docs blocks referenced by an unqualified name in your dbt project.
In this case, dbt Core would use whichever docs block is referenced without any warnings or errors. Fusion adds stricter evaluation of docs block names to prevent such ambiguity. It will present an error if it detects duplicate names:

```bash
dbt found two docs with the same name: 'docs_block_title' in files: 'models/crm/_crm.md' and 'docs/crm/business_class_marketing.md'
```

To resolve this error, rename any duplicate docs blocks.

###### End of support for legacy manifest versions[​](#end-of-support-for-legacy-manifest-versions "Direct link to End of support for legacy manifest versions")

You can no longer interoperate with pre-1.8 versions of dbt Core if you're a:

* Hybrid customer running Fusion and an old (pre-v1.8) version of dbt Core
* Customer upgrading from an old (pre-v1.8) version of dbt Core to Fusion

Fusion cannot interoperate with the old manifest, which powers features like deferral for `state:modified` comparison.

###### `dbt clean` will not delete any files in configured resource paths or files outside the project directory[​](#dbt-clean-will-not-delete-any-files-in-configured-resource-paths-or-files-outside-the-project-directory "Direct link to dbt-clean-will-not-delete-any-files-in-configured-resource-paths-or-files-outside-the-project-directory")

In dbt Core v1, `dbt clean` deletes:

* Any files outside the project directory if `clean-targets` is configured with an absolute path or relative path containing `../`, though there is an opt-in config to disable this (`--clean-project-files-only` / `--no-clean-project-files-only`).
* Any files in the `asset-paths` or `doc-paths` (even though other resource paths, like `model-paths` and `seed-paths`, are restricted).

In Fusion, `dbt clean` will not delete any files in configured resource paths or files outside the project directory.
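To make the new `dbt clean` behavior concrete, consider a `clean-targets` list like the following (the paths are illustrative):

```yml
# dbt_project.yml
clean-targets:
  - target
  - dbt_packages
  - ../shared_artifacts   # outside the project: dbt Core v1 would delete this
                          # (unless --clean-project-files-only); Fusion skips it
```

Under Fusion, only the in-project, non-resource paths (`target`, `dbt_packages`) are removed.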
###### All unit tests are run first in `dbt build`[​](#all-unit-tests-are-run-first-in-dbt-build "Direct link to all-unit-tests-are-run-first-in-dbt-build")

In dbt Core v1, the direct parents of the model being unit tested needed to exist in the warehouse to retrieve the needed column name and type information, so `dbt build` runs the unit tests (and their dependent models) *in lineage order*. In Fusion, `dbt build` runs *all* of the unit tests *first*, and then builds the rest of the DAG, thanks to built-in column name and type awareness.

###### Configuring `--threads`[​](#configuring---threads "Direct link to configuring---threads")

dbt Core runs with `--threads 1` by default. You can increase this number to run more nodes in parallel on the remote data platform, up to the max parallelism enabled by the DAG. Fusion handles threading differently depending on your data platform:

| Adapter | Behavior |
| --- | --- |
| **Snowflake** | Fusion ignores user-set threads and automatically optimizes parallelism for maximum performance. The only supported override is `threads: 1`, which can also help resolve timeout issues if set. |
| **Databricks** | Fusion ignores user-set threads and automatically optimizes parallelism for maximum performance. The only supported override is `threads: 1`, which can also help resolve timeout issues if set. |
| **BigQuery** | Fusion respects user-set threads to manage API rate limits. Setting `--threads 0` (or omitting the setting) allows Fusion to dynamically optimize parallelism. |
| **Redshift** | Fusion respects user-set threads to manage concurrency limits. Setting `--threads 0` (or omitting the setting) allows Fusion to dynamically optimize parallelism. |

For more information, refer to [Using threads](https://docs.getdbt.com/docs/running-a-dbt-project/using-threads.md#fusion-engine-thread-optimization).

###### Continue to compile unrelated nodes after hitting a compile error[​](#continue-to-compile-unrelated-nodes-after-hitting-a-compile-error "Direct link to Continue to compile unrelated nodes after hitting a compile error")

As soon as dbt Core's `compile` encounters an error compiling one of your models, dbt stops and doesn't compile anything else. When Fusion's `compile` encounters an error, it will skip nodes downstream of the one that failed to compile, but it will keep compiling the rest of the DAG (in parallel, up to the number of configured / optimal threads).

###### Seeds with extra commas don't result in extra columns[​](#seeds-with-extra-commas-dont-result-in-extra-columns "Direct link to Seeds with extra commas don't result in extra columns")

In dbt Core v1, if you have an additional comma on your seed, dbt creates a seed with an additional empty column. For example, the following seed file (with an extra comma):

```text
animal,
dog,
cat,
bear,
```

Will produce this table when `dbt seed` is executed:

| animal | b |
| ------ | - |
| dog | |
| cat | |
| bear | |

Fusion will not produce this extra column in the table resulting from `dbt seed`:

| animal |
| ------ |
| dog |
| cat |
| bear |
###### Move standalone anchors under `anchors:` key[​](#move-standalone-anchors-under-anchors-key "Direct link to move-standalone-anchors-under-anchors-key")

As part of the ongoing process of making the dbt authoring language more precise, unexpected top-level keys in a YAML file will result in errors. A common use case behind these unexpected keys is standalone anchor definitions at the top level of a YAML file. You can use the new top-level `anchors:` key as a container for these reusable configuration blocks. For example, rather than using this configuration:

models/\_models.yml

```yml
# id_column is not a valid name for a top-level key in the dbt authoring spec, and will raise an error
id_column: &id_column_alias
  name: id
  description: This is a unique identifier.
  data_type: int
  data_tests:
    - not_null
    - unique

models:
  - name: my_first_model
    columns:
      - *id_column_alias
      - name: unrelated_column_a
        description: This column is not repeated in other models.
  - name: my_second_model
    columns:
      - *id_column_alias
```

Move the anchor under the `anchors:` key instead:

models/\_models.yml

```yml
anchors:
  - &id_column_alias
    name: id
    description: This is a unique identifier.
    data_type: int
    data_tests:
      - not_null
      - unique

models:
  - name: my_first_model
    columns:
      - *id_column_alias
      - name: unrelated_column_a
        description: This column is not repeated in other models.
  - name: my_second_model
    columns:
      - *id_column_alias
```

This move is only necessary for fragments defined outside of the main YAML structure. For more information about this new key, see [anchors](https://docs.getdbt.com/reference/resource-properties/anchors.md).
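Moving an anchor under `anchors:` only changes where the alias is defined; the expanded YAML is identical wherever the alias is referenced. A quick way to convince yourself, assuming PyYAML (`pip install pyyaml`) is available (this is just a YAML inspection aid, not part of dbt):

```python
import yaml  # PyYAML — an assumption, not a dbt dependency

with_anchors_key = """
anchors:
  - &id_column_alias
    name: id
    data_tests: [not_null, unique]
models:
  - name: my_first_model
    columns:
      - *id_column_alias
"""

parsed = yaml.safe_load(with_anchors_key)
# The alias expands to the full mapping wherever it is referenced
print(parsed["models"][0]["columns"][0])
```

The printed column mapping is exactly the mapping defined under `anchors:`.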
###### Algebraic operations in Jinja macros[​](#algebraic-operations-in-jinja-macros "Direct link to Algebraic operations in Jinja macros")

In dbt Core, you can set algebraic functions in the return function of a Jinja macro:

```jinja
{% macro my_macro() %}
    return('xyz') + 'abc'
{% endmacro %}
```

This is no longer supported in Fusion and will return an error:

```bash
error: dbt1501: Failed to add template invalid operation: return() is called in a non-block context
```

This is not a common use case and there is no deprecation warning for this behavior in dbt Core. The supported format is:

```jinja
{% macro my_macro() %}
    return('xyzabc')
{% endmacro %}
```

##### Accessing custom configurations in meta[​](#accessing-custom-configurations-in-meta "Direct link to Accessing custom configurations in meta")

`config.get()` and `config.require()` don't return values from the `meta` dictionary. If you try to access a key that only exists in `meta`, dbt emits a warning:

```bash
warning: The key 'my_key' was not found using config.get('my_key'), but was detected as a custom config under 'meta'. Please use config.meta_get('my_key') or config.meta_require('my_key') instead.
```

Behavior when a key exists only in `meta`:

| Method | Behavior |
| --- | --- |
| `config.get('my_key')` | Returns the default value and emits a warning. |
| `config.require('my_key')` | Raises an error and emits a warning. |

To access custom configurations stored under `meta`, use the explicit methods:

```jinja
{% set owner = config.meta_get('owner') %}
{% set has_pii = config.meta_require('pii') %}
```

For more information, see [config.meta\_get()](https://docs.getdbt.com/reference/dbt-jinja-functions/config.md#configmeta_get) and [config.meta\_require()](https://docs.getdbt.com/reference/dbt-jinja-functions/config.md#configmeta_require).
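The lookup rules for `config.get()` and `config.require()` can be sketched in plain Python. This is an illustrative model, not dbt's actual implementation; the function names and dicts are ours:

```python
# Illustrative model of Fusion's config vs. meta lookup rules — not dbt source code.
def config_get(config: dict, meta: dict, key: str, default=None):
    """Like config.get(): a meta-only key falls back to the default, with a warning."""
    if key in config:
        return config[key]
    if key in meta:
        print(f"warning: '{key}' only found under 'meta'; use config.meta_get('{key}')")
    return default

def config_require(config: dict, meta: dict, key: str):
    """Like config.require(): a meta-only key raises instead of falling back."""
    if key in config:
        return config[key]
    raise KeyError(f"'{key}' is not set as a config (meta keys need config.meta_require)")

cfg = {"materialized": "table"}
meta = {"owner": "data-eng"}
print(config_get(cfg, meta, "materialized"))  # table
print(config_get(cfg, meta, "owner", "n/a"))  # warning, then n/a
```

In real projects, the fix is simply switching meta lookups to `config.meta_get()` / `config.meta_require()` as described above.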
##### Fusion compiler[​](#fusion-compiler "Direct link to Fusion compiler")

###### Snowflake model functions[​](#snowflake-model-functions "Direct link to Snowflake model functions")

Fusion supports [Snowflake ML model functions](https://docs.snowflake.com/en/guides-overview-ml-functions), which allow you to call machine learning models directly in SQL. Because model function return types are flexible and defined by the underlying model, Fusion uses simplified type checking:

* **Arguments:** Fusion accepts any arguments without strict type validation.
* **Return type:** Fusion treats all model function results as `VARIANT`. To use the result in your models, cast it to the expected type:

```sql
select
    my_model!predict(input_column)::float as prediction_score
from {{ ref('my_table') }}
```

##### Package support[​](#package-support "Direct link to Package support")

To determine if a package is compatible with the dbt Fusion engine, visit the [dbt package hub](https://hub.getdbt.com/) and look for the Fusion-compatible badge, or review the package's [`require-dbt-version` configuration](https://docs.getdbt.com/reference/project-configs/require-dbt-version.md#pin-to-a-range).

* Packages with a `require-dbt-version` range that includes `2.0.0` are compatible with Fusion. For example, `require-dbt-version: ">=1.10.0,<3.0.0"`. Even if a package doesn't reflect compatibility in the package hub, it may still work with Fusion. Work with package maintainers to track updates, and [thoroughly test packages](https://docs.getdbt.com/guides/fusion-package-compat?step=5) that aren't clearly compatible before deploying.
* Package maintainers who would like to make their package compatible with Fusion can refer to the [Fusion package upgrade guide](https://docs.getdbt.com/guides/fusion-package-compat.md) for instructions.

Fivetran package considerations:

* The Fivetran `source` and `transformation` packages have been combined into a single package.
* If you manually installed source packages like `fivetran/github_source`, you need to ensure `fivetran/github` is installed and deactivate the transformation models.

###### Package compatibility messages[​](#package-compatibility-messages "Direct link to Package compatibility messages")

Inconsistent Fusion warnings and `dbt-autofix` logs

If you use [`dbt-autofix`](https://github.com/dbt-labs/dbt-autofix) while upgrading to Fusion in the Studio IDE or dbt VS Code extension, you may see different messages about package compatibility between `dbt-autofix` and Fusion warnings. Here's why:

* Fusion warnings are emitted based on a package's `require-dbt-version` and whether `require-dbt-version` contains `2.0.0`.
* Some packages are already Fusion-compatible even though package maintainers haven't yet updated `require-dbt-version`.
* `dbt-autofix` knows about these compatible packages and will not try to upgrade a package that it knows is already compatible.

This means that even if you see a Fusion warning for a package that `dbt-autofix` identifies as compatible, you don't need to change the package. The message discrepancy is temporary while we implement and roll out `dbt-autofix`'s enhanced compatibility detection to Fusion warnings. Here's an example of a Fusion warning in the Studio IDE that says a package isn't compatible with Fusion but `dbt-autofix` indicates it is compatible:

```text
dbt1065: Package 'dbt_utils' requires dbt version [>=1.30,<2.0.0], but current version is 2.0.0-preview.72. This package may not be compatible with your dbt version. dbt(1065) [Ln 1, Col 1]
```
---

### Upgrading to v1.0

##### Resources[​](#resources "Direct link to Resources")

* [Discourse](https://discourse.getdbt.com/t/3180)
* [Changelog](https://github.com/dbt-labs/dbt-core/blob/1.0.latest/CHANGELOG.md)
* [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md)

#### What to know before upgrading[​](#what-to-know-before-upgrading "Direct link to What to know before upgrading")

dbt Core major version 1.0 includes a number of breaking changes! Wherever possible, we have offered backwards compatibility for old behavior, and (where necessary) made migration simple.

##### Renamed fields in `dbt_project.yml`[​](#renamed-fields-in-dbt_projectyml "Direct link to renamed-fields-in-dbt_projectyml")

**These affect everyone:**

* [model-paths](https://docs.getdbt.com/reference/project-configs/model-paths.md) have replaced `source-paths` in `dbt_project.yml`.
* [seed-paths](https://docs.getdbt.com/reference/project-configs/seed-paths.md) have replaced `data-paths` in `dbt_project.yml` with a default value of `seeds`.
* The [packages-install-path](https://docs.getdbt.com/reference/project-configs/packages-install-path.md) was updated from `modules-path`. Additionally, the default value is now `dbt_packages` instead of `dbt_modules`. You may need to update this value in [`clean-targets`](https://docs.getdbt.com/reference/project-configs/clean-targets.md).
* The default for `quote_columns` is now `True` for all adapters other than Snowflake.
**These probably don't:**

* The default value of [test-paths](https://docs.getdbt.com/reference/project-configs/test-paths.md) has been updated to be the plural `tests`.
* The default value of [analysis-paths](https://docs.getdbt.com/reference/project-configs/analysis-paths.md) has been updated to be the plural `analyses`.

##### Tests[​](#tests "Direct link to Tests")

The two **test types** are now "singular" and "generic" (instead of "data" and "schema", respectively). The `test_type:` selection method accepts `test_type:singular` and `test_type:generic`. (It will also accept `test_type:schema` and `test_type:data` for backwards compatibility.)

**Not backwards compatible:** The `--data` and `--schema` flags to `dbt test` are no longer supported, and tests no longer have the tags `'data'` and `'schema'` automatically applied. Updated docs: [data tests](https://docs.getdbt.com/docs/build/data-tests.md), [test selection](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md), [selection methods](https://docs.getdbt.com/reference/node-selection/methods.md).

The `greedy` flag/property has been renamed to **`indirect_selection`**, which is now `eager` by default. **Note:** This reverts test selection to its pre-v0.20 behavior by default. `dbt test -s my_model` *will* select multi-parent tests, such as `relationships`, that depend on unselected resources. To achieve the behavior of v0.20 + v0.21, set `--indirect-selection=cautious` on the CLI or `indirect_selection: cautious` in YAML selectors. Updated docs: [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md), [yaml selectors](https://docs.getdbt.com/reference/node-selection/yaml-selectors.md).

##### Global macros[​](#global-macros "Direct link to Global macros")

Global project macros have been reorganized, and some old unused macros have been removed: `column_list`, `column_list_for_create_table`, `incremental_upsert`.
This is unlikely to affect your project.

##### Installation[​](#installation "Direct link to Installation")

* [Installation docs](https://docs.getdbt.com/docs/supported-data-platforms.md) reflect adapter-specific installations.
* `python -m pip install dbt` is no longer supported, and will raise an explicit error. Install the specific adapter plugin you need as `python -m pip install dbt-<adapter>`.
* `brew install dbt` is no longer supported. Install the specific adapter plugin you need (among Postgres, Redshift, Snowflake, or BigQuery) as `brew install dbt-<adapter>`.
* Removed official support for Python 3.6, which is reaching end of life on December 23, 2021.

##### For users of adapter plugins[​](#for-users-of-adapter-plugins "Direct link to For users of adapter plugins")

* **BigQuery:** Support for ingestion-time-partitioned tables has been officially deprecated in favor of modern approaches. Use `partition_by` and incremental modeling strategies instead. For more information, refer to [Incremental models](https://docs.getdbt.com/docs/build/incremental-models.md).

##### For maintainers of plugins + other integrations[​](#for-maintainers-of-plugins--other-integrations "Direct link to For maintainers of plugins + other integrations")

We've introduced a new [**structured event interface**](https://docs.getdbt.com/reference/events-logging.md), and we've transitioned all dbt logging to use this new system. **This includes a breaking change for adapter plugins**, requiring a very simple migration. For more details, see the [`events` module README](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/events/README.md#adapter-maintainers). If you maintain a different kind of plugin that *needs* legacy logging, for the time being, you can re-enable it with an env var (`DBT_ENABLE_LEGACY_LOGGER=True`); be advised that we will remove this capability in a future version of dbt Core.
The [**dbt RPC Server**](https://docs.getdbt.com/reference/commands/rpc.md) has been split out from `dbt-core` and is now packaged separately. Its functionality will be fully deprecated by the end of 2022, in favor of a new dbt Server. Instead of `dbt rpc`, use `dbt-rpc serve`.

**Artifacts:** New schemas (manifest v4, run results v4, sources v3). Notable changes: add `metrics` nodes; schema test + data test nodes are renamed to generic test + singular test nodes; freshness threshold default values look slightly different.

##### Deprecations from long ago[​](#deprecations-from-long-ago "Direct link to Deprecations from long ago")

Several under-the-hood changes from past minor versions, tagged with deprecation warnings, have now been fully removed.

* The `packages` argument of [dispatch](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md) has been deprecated and will raise an exception when used.
* The `adapter_macro` macro has been deprecated. Instead, use the [dispatch](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md) method to find a macro and call the result.
* The `release` arg has been removed from the `execute_macro` method.

#### New features and changed documentation[​](#new-features-and-changed-documentation "Direct link to New features and changed documentation")

* Add [metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md), a new node type
* [Generic tests](https://docs.getdbt.com/best-practices/writing-custom-generic-tests.md) can be defined in `tests/generic` (new), in addition to `macros/` (as before)
* [Parsing](https://docs.getdbt.com/reference/parsing.md): partial parsing and static parsing have been turned on by default.
* [Global configs](https://docs.getdbt.com/reference/global-configs/about-global-configs.md) have been standardized.
Related updates to [global CLI flags](https://docs.getdbt.com/reference/global-configs/about-global-configs.md) and [`profiles.yml`](https://docs.getdbt.com/docs/local/profiles.yml.md). * [The `init` command](https://docs.getdbt.com/reference/commands/init.md) has a whole new look and feel. It's no longer just for first-time users. * Add `result:` subselectors for smarter reruns when dbt models have errors and tests fail. See examples: [Pro-tips for Workflows](https://docs.getdbt.com/best-practices/best-practice-workflows.md#pro-tips-for-workflows) * Secret-prefixed [env vars](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md) are now allowed only in `profiles.yml` + `packages.yml` --- ### Upgrading to v1.1 ##### Resources[​](#resources "Direct link to Resources") * [Changelog](https://github.com/dbt-labs/dbt-core/blob/1.1.latest/CHANGELOG.md) * [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md) * [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) #### What to know before upgrading[​](#what-to-know-before-upgrading "Direct link to What to know before upgrading") There are no breaking changes for code in dbt projects and packages. We are committed to providing backwards compatibility for all versions 1.x. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new). ##### For maintainers of adapter plugins[​](#for-maintainers-of-adapter-plugins "Direct link to For maintainers of adapter plugins") We have reworked the testing suite for adapter plugin functionality.
For details on the new testing suite, refer to the "Test your adapter" step in the [Build, test, document, and promote adapters](https://docs.getdbt.com/guides/adapter-creation.md) guide. The abstract methods `get_response` and `execute` now only return `connection.AdapterResponse` in type hints. Previously, they could return a string. We encourage you to update your methods to return an object of class `AdapterResponse`, or implement a subclass specific to your adapter. This also gives you the opportunity to add fields specific to your adapter's query execution, such as `rows_affected` or `bytes_processed`. ##### For consumers of dbt artifacts (metadata)[​](#for-consumers-of-dbt-artifacts-metadata "Direct link to For consumers of dbt artifacts (metadata)") The manifest schema version will be updated to v5. The only change is to the default value of `config` for parsed nodes. For users of [state-based functionality](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection), such as the `state:modified` selector, recall that: > The `--state` artifacts must be of schema versions that are compatible with the currently running dbt version. If you have two jobs, whereby one job compares or defers to artifacts produced by the other, you'll need to upgrade both at the same time. If there's a mismatch, dbt will alert you with this error message:

```text
Expected a schema version of "https://schemas.getdbt.com/dbt/manifest/v5.json" in /manifest.json, but found "https://schemas.getdbt.com/dbt/manifest/v4.json". Are you running with a different version of dbt?
```

#### New and changed documentation[​](#new-and-changed-documentation "Direct link to New and changed documentation") [**Incremental models**](https://docs.getdbt.com/docs/build/incremental-models.md) can now accept a list of multiple columns as their `unique_key`, for models that need a combination of columns to uniquely identify each row.
This is supported by the most common data warehouses, for incremental strategies that make use of the `unique_key` config (`merge` and `delete+insert`). [**Generic tests**](https://docs.getdbt.com/reference/resource-properties/data-tests.md) can define custom names. This is useful to "prettify" the synthetic name that dbt applies automatically. It's also needed to disambiguate cases where the same generic test is defined multiple times with different configurations. [**Sources**](https://docs.getdbt.com/reference/source-properties.md) can define configuration inline with other `.yml` properties, just like other resource types. The only supported config is `enabled`; you can use this to dynamically enable/disable sources based on environment or package variables. ##### Advanced and experimental functionality[​](#advanced-and-experimental-functionality "Direct link to Advanced and experimental functionality") **Fresh Rebuilds.** There's a new *experimental* selection method in town: [`source_status:fresher`](https://docs.getdbt.com/reference/node-selection/methods.md#source_status). Much like the `state:` and `result:` methods, the goal is to use dbt metadata to run your DAG more efficiently. If dbt has access to previous and current results of `dbt source freshness` (the `sources.json` artifact), dbt can compare them to determine which sources have loaded new data, and select only resources downstream of "fresher" sources. Read more in [Understanding State](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection) and [CI/CD in dbt](https://docs.getdbt.com/docs/deploy/continuous-integration.md). [**dbt-Jinja functions**](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md) have a new landing page, and two new members: * [`print`](https://docs.getdbt.com/reference/dbt-jinja-functions/print.md) exposes the Python `print()` function.
It can be used as an alternative to `log()`, and together with the `QUIET` config, for advanced macro-driven workflows. * [`selected_resources`](https://docs.getdbt.com/reference/dbt-jinja-functions/selected_resources.md) exposes, at runtime, the list of DAG nodes selected by the current task. [**Global configs**](https://docs.getdbt.com/reference/global-configs/about-global-configs.md) include some new additions: * `QUIET` and `NO_PRINT`, to control which log messages dbt prints to terminal output. For use in advanced macro-driven workflows, such as [codegen](https://hub.getdbt.com/dbt-labs/codegen/latest/). * `CACHE_SELECTED_ONLY` is an *experimental* config that can significantly speed up dbt's start-of-run preparations, in cases where you're running only a few models from a large project that manages many schemas. ##### For users of specific adapters[​](#for-users-of-specific-adapters "Direct link to For users of specific adapters") **dbt-bigquery** added support for finer-grained configuration of query timeout and retry when defining your [connection profile](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md). **dbt-spark** added support for a [`session` connection method](https://docs.getdbt.com/docs/local/connect-data-platform/spark-setup.md#session), for use with a pySpark session, to support rapid iteration when developing advanced or experimental functionality. This connection method is not recommended for new users, and it is not supported in the dbt platform. ##### Dependencies[​](#dependencies "Direct link to Dependencies") [Python compatibility](https://docs.getdbt.com/faqs/Core/install-python-compatibility.md): dbt Core officially supports Python 3.10
--- ### Upgrading to v1.10 #### Resources[​](#resources "Direct link to Resources") * dbt Core [v1.10 changelog](https://github.com/dbt-labs/dbt-core/blob/1.10.latest/CHANGELOG.md) * [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md) * [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#release-tracks) #### What to know before upgrading[​](#what-to-know-before-upgrading "Direct link to What to know before upgrading") dbt Labs is committed to providing backward compatibility for all versions 1.x. Any behavior changes will be accompanied by a [behavior change flag](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#behavior-change-flags) to provide a migration window for existing projects. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new). Starting in 2024, dbt provides the functionality from new versions of dbt Core via [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) with automatic upgrades. If you have selected the **Latest** release track in dbt, you already have access to all the features, fixes, and other functionality that is included in dbt Core v1.10! If you have selected the **Compatible** release track, you will have access in the next monthly **Compatible** release after the dbt Core v1.10 final release. For users of dbt Core, since v1.8, we recommend explicitly installing both `dbt-core` and `dbt-<adapter>`. This may become required for a future version of dbt.
For example:

```shell
python3 -m pip install dbt-core dbt-snowflake
```

#### New and changed features and functionality[​](#new-and-changed-features-and-functionality "Direct link to New and changed features and functionality") New features and functionality available in dbt Core v1.10 ##### The `--sample` flag[​](#the---sample-flag "Direct link to the---sample-flag") Large data sets can slow down dbt build times, making it harder for developers to test new code efficiently. The [`--sample` flag](https://docs.getdbt.com/docs/build/sample-flag.md), available for the `run` and `build` commands, helps reduce build times and warehouse costs by running dbt in sample mode. It generates filtered refs and sources using time-based sampling, allowing developers to validate outputs without building entire models. ##### Move standalone anchors under `anchors:` key[​](#move-standalone-anchors-under-anchors-key "Direct link to move-standalone-anchors-under-anchors-key") As part of the ongoing process of making the dbt authoring language more precise, dbt Core v1.10 raises a warning when it sees an unexpected top-level key in a properties YAML file. A common use case behind these unexpected keys is standalone anchor definitions at the top level of a properties YAML file. You can use the new top-level `anchors:` key as a container for these reusable configuration blocks. For example, rather than using this configuration:

models/_models.yml

```yml
id_column: &id_column_alias
  name: id
  description: This is a unique identifier.
  data_type: int
  data_tests:
    - not_null
    - unique

models:
  - name: my_first_model
    columns:
      - *id_column_alias
      - name: unrelated_column_a
        description: This column is not repeated in other models.
  - name: my_second_model
    columns:
      - *id_column_alias
```

Move the anchor under the `anchors:` key instead:

models/_models.yml

```yml
anchors:
  - &id_column_alias
    name: id
    description: This is a unique identifier.
    data_type: int
    data_tests:
      - not_null
      - unique

models:
  - name: my_first_model
    columns:
      - *id_column_alias
      - name: unrelated_column_a
        description: This column is not repeated in other models.
  - name: my_second_model
    columns:
      - *id_column_alias
```

This move is only necessary for fragments defined outside of the main YAML structure. For more information about this new key, see [anchors](https://docs.getdbt.com/reference/resource-properties/anchors.md). ##### Parsing `catalogs.yml`[​](#parsing-catalogsyml "Direct link to parsing-catalogsyml") dbt Core can now parse the `catalogs.yml` file. This is an important milestone in the journey to supporting external catalogs for Iceberg tables, as it enables write integrations. You'll be able to provide a config specifying a catalog integration for your producer model. For example:

```yml
catalogs:
  - name: catalog_dave
    # materializing the data to an external location, and metadata to that data catalog
    write_integrations:
      - name: databricks_glue_write_integration
        external_volume: databricks_external_volume_prod
        table_format: iceberg
        catalog_type: unity
```

The implementation for the model would look like this:

models/schemas.yml

```yaml
models:
  - name: my_second_public_model
    config:
      catalog_name: catalog_dave
```

Check out our [docs on external catalog support](https://docs.getdbt.com/docs/mesh/iceberg/about-catalogs.md) today! We'll have more information about this in the coming weeks, but this is an exciting step in the journey to cross-platform support. ##### Integrating dbt Core artifacts with dbt projects[​](#integrating-dbt-core-artifacts-with-dbt-projects "Direct link to Integrating dbt Core artifacts with dbt projects") With [hybrid projects](https://docs.getdbt.com/docs/deploy/hybrid-projects.md), dbt Core users working in the command line interface (CLI) can execute runs that seamlessly upload [artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md) into dbt.
This enhances hybrid dbt Core/dbt deployments by: * Fostering collaboration between dbt + dbt Core users by enabling them to visualize and perform [cross-project references](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref) to models defined in dbt Core projects. This feature unifies dbt + dbt Core workflows for a more connected dbt experience. * Giving dbt and dbt Core users insights into their models and assets in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md). To view Catalog, you must have a [developer or read-only license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md). * (Coming soon) Enabling users working in the [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md) to build off of models already created by a central data team in dbt Core rather than having to start from scratch. Hybrid projects are available as a private beta to [dbt Enterprise accounts](https://www.getdbt.com/pricing). Contact your account representative to register your interest in the beta. ##### Managing changes to legacy behaviors[​](#managing-changes-to-legacy-behaviors "Direct link to Managing changes to legacy behaviors") dbt Core v1.10 introduces new flags for [managing changes to legacy behaviors](https://docs.getdbt.com/reference/global-configs/behavior-changes.md). You may opt into recently introduced changes (disabled by default), or opt out of mature changes (enabled by default), by setting `True` / `False` values, respectively, for `flags` in `dbt_project.yml`. You can read more about each of these behavior changes in the following links: * (Introduced, disabled by default) [`validate_macro_args`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#macro-argument-validation).
If the flag is set to `True`, dbt will raise a warning if the argument names you've added in your macro YAMLs don't match the argument names in your macro, or if the argument types aren't valid according to the [supported types](https://docs.getdbt.com/reference/resource-properties/arguments.md#supported-types). * (Introduced, disabled by default) [`require_all_warnings_handled_by_warn_error`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#warn-error-handler-for-all-warnings). If this flag is set to `True`, all warnings raised during a run will be routed through the `--warn-error` / `--warn-error-options` handler. This ensures consistent behavior when promoting warnings to errors or silencing them. When the flag is `False` (which is the current default), only some warnings are processed by the handler; others may bypass it. Turning it on for projects that use `--warn-error` (or `--warn-error-options='{"error":"all"}'`) may cause builds to fail on warnings that were previously ignored, so we recommend enabling it gradually, one project at a time. ##### Deprecation warnings[​](#deprecation-warnings "Direct link to Deprecation warnings") Starting in `v1.10`, you will receive deprecation warnings for dbt code that will become invalid in the future, including: * Custom inputs (for example, unrecognized resource properties, configurations, and top-level keys) * Duplicate YAML keys in the same file * Unexpected Jinja blocks (for example, `{% endmacro %}` tags without a corresponding `{% macro %}` tag) * Some `properties` are moving to `configs` * And more dbt will start raising these warnings in version `1.10`, but making these changes will not be a prerequisite for using it. We at dbt Labs understand that it will take existing users time to migrate their projects, and it is not our goal to disrupt anyone with this update. The goal is to enable you to work with more safety, feedback, and confidence going forward.
What does this mean for you? 1. If your project (or dbt package) encounters a new deprecation warning in `v1.10`, plan to update your invalid code soon. Although it’s just a warning for now, in a future version, dbt will enforce stricter validation of the inputs in your project. Check out the [`dbt-autofix` tool](https://github.com/dbt-labs/dbt-autofix) to autofix many of these! 2. In the future, the [`meta` config](https://docs.getdbt.com/reference/resource-configs/meta.md) will be the only place to put custom user-defined attributes. Everything else will be strongly typed and strictly validated. If you have an extra attribute you want to include in your project, or a model config you want to access in a custom materialization, you must nest it under `meta` moving forward. 3. If you are using the [`--warn-error` flag](https://docs.getdbt.com/reference/global-configs/warnings.md) (or `--warn-error-options '{"error": "all"}'`) to promote all warnings to errors, this will include new deprecation warnings coming to dbt Core. If you don’t want these to be promoted to errors, the `--warn-error-options` flag gives you more granular control over exactly which types of warnings are treated as errors. You can set `"warn": ["Deprecations"]` (new as of `v1.10`) to continue treating the deprecation warnings as warnings. 4. The `--models` / `--model` / `-m` flag was renamed to `--select` / `-s` way back in dbt Core v0.21 (Oct 2021). Silently skipping this flag means ignoring your command's selection criteria, which could mean building your entire DAG when you only meant to select a small subset. For this reason, the `--models` / `--model` / `-m` flag **will raise a warning** in dbt Core v1.10, and an error in Fusion. Please update your job definitions accordingly. ###### Custom inputs[​](#custom-inputs "Direct link to Custom inputs") Historically, dbt has allowed you to configure inputs largely unconstrained.
A common example of this is setting custom YAML properties:

```yml
models:
  - name: my_model
    description: A model in my project.
    dbt_is_awesome: true # a custom property
```

dbt detects the unrecognized custom property (`dbt_is_awesome`) and silently continues. Without a set of strictly defined inputs, it becomes challenging to validate your project's configuration. This creates unintended issues such as: * Silently ignoring misspelled properties and configurations (for example, `desciption:` instead of `description:`). * Unintended collisions with user code when dbt introduces a new “reserved” property or configuration. If you have an unrecognized custom property, you will receive a warning, and in a future version, dbt will cease to support custom properties. Moving forward, these should be nested under the [`meta` config](https://docs.getdbt.com/reference/resource-configs/meta.md), which will be the only place to put custom user-defined attributes:

```yml
models:
  - name: my_model
    description: A model in my project.
    config:
      meta:
        dbt_is_awesome: true
```

###### Custom keys not nested under meta[​](#custom-keys-not-nested-under-meta "Direct link to Custom keys not nested under meta") Previously, you could define any additional fields directly under `config`, which could lead to collisions between pre-existing user-defined configurations and official configurations of the dbt framework. In the future, the `meta` config will be the sole location for custom user-defined attributes. Everything else will be strongly typed and strictly validated.
If you have an extra attribute you want to include in your project, or a model config you want to access in a custom materialization, you must nest it under `meta` moving forward:

```yaml
models:
  - name: my_model
    config:
      meta:
        custom_config_key: value
    columns:
      - name: my_column
        config:
          meta:
            some_key: some_value
```

###### Duplicate keys in the same yaml file[​](#duplicate-keys-in-the-same-yaml-file "Direct link to Duplicate keys in the same yaml file") If two identical keys exist in the same properties YAML file, you will get a warning, and in a future version, dbt will stop supporting duplicate keys. Previously, if identical keys existed in the same properties YAML file, dbt silently overwrote the earlier one, using the last configuration listed in the file.

profiles.yml

```yml
my_profile:
  target: my_target
  outputs: ...

my_profile: # dbt would use only this profile key
  target: my_other_target
  outputs: ...
```

Moving forward, you should delete unused keys or move them to a separate properties YAML file. ###### Unexpected Jinja blocks[​](#unexpected-jinja-blocks "Direct link to Unexpected Jinja blocks") If you have an orphaned Jinja block, you will receive a warning, and in a future version, dbt will stop supporting unexpected Jinja blocks. Previously, these orphaned Jinja blocks were silently ignored.

macros/my_macro.sql

```sql
{% endmacro %} # orphaned endmacro jinja block

{% macro hello() %}
hello!
{% endmacro %}
```

Moving forward, you should delete these orphaned Jinja blocks. ###### Properties moving to configs[​](#properties-moving-to-configs "Direct link to Properties moving to configs") Some historical properties are moving entirely to configs.
This will include: `freshness`, `meta`, `tags`, `docs`, `group`, and `access`. If you previously set one of the impacted properties, such as `freshness`:

```yaml
sources:
  - name: ecom
    schema: raw
    description: E-commerce data for the Jaffle Shop
    freshness:
      warn_after:
        count: 24
        period: hour
```

You should now set it under `config`:

```yaml
sources:
  - name: ecom
    schema: raw
    description: E-commerce data for the Jaffle Shop
    config:
      freshness:
        warn_after:
          count: 24
          period: hour
```

###### Custom output path for source freshness[​](#custom-output-path-for-source-freshness "Direct link to Custom output path for source freshness") The ability to override the default path for `sources.json` via the `--output` or `-o` flags has been deprecated. You can still set the path for all artifacts in the step with `--target-path`, but you will receive a warning if you try to set the path for just source freshness. ###### Warn error options[​](#warn-error-options "Direct link to Warn error options") The `warn_error_options` options `include` and `exclude` have been deprecated and replaced with `error` and `warn`, respectively.

```yaml
...
flags:
  warn_error_options:
    error: # Previously called "include"
    warn: # Previously called "exclude"
    silence: # To silence or ignore warnings
      - NoNodesForSelectionCriteria
```

#### Adapter-specific features and functionalities[​](#adapter-specific-features-and-functionalities "Direct link to Adapter-specific features and functionalities") **Snowflake column size change** [Snowflake plans to increase](https://docs.snowflake.com/en/release-notes/bcr-bundles/un-bundled/bcr-2118) the default column size for string and binary data types in May 2026. `dbt-snowflake` versions below v1.10.6 may fail to build certain incremental models when this change is deployed.
**Assess impact and required actions** If you're using a `dbt-snowflake` version below v1.10.6 or have not yet migrated to a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) in the dbt platform, your adapter version is incompatible with this change and may fail to build incremental models that meet *both* of the following conditions: * Contain string columns with collation defined * Use the `on_schema_change='sync_all_columns'` config To check whether this change affects your project, run the following [list](https://docs.getdbt.com/reference/commands/list.md) command:

```bash
dbt ls -s config.materialized:incremental,config.on_schema_change:sync_all_columns --resource-type model
```

* If the command returns `No nodes selected!`, no action is required. * If the command returns one or more models (for example, `Found 1000 models, 644 macros`), you may be impacted if those models have string columns that don't specify a width. In that case, upgrade to a version that includes the fix: * **dbt Core**: `dbt-snowflake` v1.10.6 or later. For upgrade instructions, see [Upgrade adapters](https://docs.getdbt.com/docs/local/install-dbt.md#upgrade-adapters). * **dbt platform**: Any release track (Latest, Compatible, Extended, or Fallback). * **dbt Fusion engine**: v2.0.0-preview.147 or higher. This ensures your incremental models can safely handle schema changes while maintaining required collation settings. ##### Snowflake[​](#snowflake "Direct link to Snowflake") * You can use the `platform_detection_timeout_seconds` parameter to control how long the Snowflake connector waits when detecting the cloud platform where the connection is being made. For more information, see [Snowflake setup](https://docs.getdbt.com/docs/local/connect-data-platform/snowflake-setup.md#platform_detection_timeout_seconds).
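As a sketch, the `platform_detection_timeout_seconds` parameter sits alongside the other connection fields in a Snowflake profile. Everything in this example other than that parameter (profile name, account, and so on) is a hypothetical placeholder; see the Snowflake setup page for the full list of connection fields:

```yml
# profiles.yml — hypothetical profile; only platform_detection_timeout_seconds
# is the parameter under discussion, all other values are placeholders
my_snowflake_profile:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: my_account
      user: my_user
      database: analytics
      warehouse: transforming
      schema: public
      # wait up to 5 seconds when detecting the cloud platform for this connection
      platform_detection_timeout_seconds: 5
```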
##### BigQuery[​](#bigquery "Direct link to BigQuery") * `dbt-bigquery` cancels BigQuery jobs that exceed their configured timeout by sending a cancellation request. If the request succeeds, dbt stops the job. If the request fails, the BigQuery job may keep running in the background until it finishes or you cancel it manually. For more information, see [Timeout and retries](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#timeouts-and-retries). #### Quick hits[​](#quick-hits "Direct link to Quick hits") * Provide the [`loaded_at_query`](https://docs.getdbt.com/reference/resource-properties/freshness.md#loaded_at_query) property for source freshness to specify custom SQL to generate the `maxLoadedAt` time stamp on the source (versus the [built-in query](https://github.com/dbt-labs/dbt-adapters/blob/6c41bedf27063eda64375845db6ce5f7535ef6aa/dbt/include/global_project/macros/adapters/freshness.sql#L4-L16), which uses the `loaded_at_field`). You cannot define `loaded_at_query` if the `loaded_at_field` config is also provided. * Provide validation for macro arguments using the [`validate_macro_args`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#macro-argument-validation) flag, which is disabled by default. When enabled, this flag checks that documented macro argument names match those in the macro definition and validates their types against a supported format. Previously, dbt did not enforce standard argument types, treating the type field as documentation-only. If no arguments are documented, dbt infers them from the macro and includes them in the manifest.json file. Learn more about [supported types](https://docs.getdbt.com/reference/resource-properties/arguments.md#supported-types). 
* You can use the [`config.meta_get()`](https://docs.getdbt.com/reference/dbt-jinja-functions/config.md#configmeta_get) and [`config.meta_require()`](https://docs.getdbt.com/reference/dbt-jinja-functions/config.md#configmeta_require) functions to access custom configurations stored under `meta`. --- ### Upgrading to v1.11 #### Resources[​](#resources "Direct link to Resources") * [dbt Core v1.11 changelog](https://github.com/dbt-labs/dbt-core/blob/1.11.latest/CHANGELOG.md) * [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md) * [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#release-tracks) #### What to know before upgrading[​](#what-to-know-before-upgrading "Direct link to What to know before upgrading") dbt Labs is committed to providing backward compatibility for all versions 1.x. Any behavior changes will be accompanied by a [behavior change flag](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#behavior-change-flags) to provide a migration window for existing projects. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new). Starting in 2024, dbt provides the functionality from new versions of dbt Core via [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) with automatic upgrades. If you have selected the **Latest** release track in dbt, you already have access to all the features, fixes, and other functionality included in the latest dbt Core version!
If you have selected the **Compatible** release track, you will have access in the next monthly **Compatible** release after the dbt Core v1.11 final release. We continue to recommend explicitly installing both `dbt-core` and `dbt-<adapter>`. This may become required for a future version of dbt. For example:

```shell
python3 -m pip install dbt-core dbt-snowflake
```

#### New and changed features and functionality[​](#new-and-changed-features-and-functionality "Direct link to New and changed features and functionality") New features and functionality available in dbt Core v1.11 ##### User-defined functions (UDFs)[​](#user-defined-functions-udfs "Direct link to User-defined functions (UDFs)") dbt Core v1.11 introduces support for user-defined functions (UDFs), which enable you to define and register custom functions in your warehouse. Like macros, UDFs promote code reuse, but they are objects in the warehouse, so you can reuse the same logic in tools outside dbt. Key features include: * **Define UDFs as first-class dbt resources**: Create UDF files in a `functions/` directory with corresponding YAML configuration. * **Execution**: Create, update, and rename UDFs as part of DAG execution using `dbt build --select "resource_type:function"` * **DAG integration**: When executing `dbt build`, UDFs are built before models that reference them, ensuring proper dependency management. * **New `function()` macro**: Reference UDFs in your models using the `{{ function('function_name') }}` Jinja macro. * **Deferral**: When you run a dbt command with `--defer` and `--state`, `function()` calls resolve to the UDF in the state manifest, so you can run models that depend on UDFs without building those UDFs first. Read more about UDFs, including prerequisites and how to define and use them, in the [UDF documentation](https://docs.getdbt.com/docs/build/udfs.md).
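As a sketch of the `function()` macro described above, a model might call a UDF like this. The model name, UDF name, and columns are hypothetical, and the file layout for the UDF itself (in `functions/`, with its YAML configuration) is covered in the UDF documentation:

```sql
-- models/orders_enriched.sql (hypothetical model)
-- {{ function('cents_to_dollars') }} resolves to the warehouse object for the UDF,
-- which is then called like any other SQL function
select
  order_id,
  {{ function('cents_to_dollars') }}(amount_cents) as amount_usd
from {{ ref('stg_orders') }}
```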
##### Managing changes to legacy behaviors[​](#managing-changes-to-legacy-behaviors "Direct link to Managing changes to legacy behaviors") dbt Core v1.11 introduces new flags for [managing changes to legacy behaviors](https://docs.getdbt.com/reference/global-configs/behavior-changes.md). You may opt into recently introduced changes (disabled by default), or opt out of mature changes (enabled by default), by setting `True` / `False` values, respectively, for `flags` in `dbt_project.yml`. You can read more about each of these behavior changes in the following links: * (Introduced, disabled by default) [`require_unique_project_resource_names`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#unique-project-resource-names). This flag is set to `False` by default. With this setting, if two unversioned resources in the same package share the same name, dbt continues to run and raises a [`DuplicateNameDistinctNodeTypesDeprecation`](https://docs.getdbt.com/reference/deprecations.md#duplicatenamedistinctnodetypesdeprecation) warning. When set to `True`, dbt raises a `DuplicateResourceNameError` error. * (Introduced, disabled by default) [`require_ref_searches_node_package_before_root`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#package-ref-search-order). This flag is set to `False` by default. With this setting, when dbt resolves a `ref()` in a package model, it searches for the referenced model in the root project *first*, then in the package where the model is defined. When set to `True`, dbt searches the package where the model is defined *before* searching the root project. 
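For example, a project opting into both of the new behaviors above would set the flags in `dbt_project.yml` (a sketch; both flags default to `False`):

```yml
# dbt_project.yml
flags:
  require_unique_project_resource_names: true
  require_ref_searches_node_package_before_root: true
```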
##### Deprecation warnings enabled by default[​](#deprecation-warnings-enabled-by-default "Direct link to Deprecation warnings enabled by default")

Deprecation warnings from JSON schema validation are now enabled by default when validating your YAML configuration files (such as `schema.yml` and `dbt_project.yml`) for projects using the Snowflake, Databricks, BigQuery, and Redshift adapters. These warnings help you proactively identify and update deprecated configurations (such as misspelled config keys, deprecated properties, or incorrect data types).

You'll see the following deprecation warnings by default:

* [CustomKeyInConfigDeprecation](https://docs.getdbt.com/reference/deprecations.md#customkeyinconfigdeprecation)
* [CustomKeyInObjectDeprecation](https://docs.getdbt.com/reference/deprecations.md#customkeyinobjectdeprecation)
* [CustomTopLevelKeyDeprecation](https://docs.getdbt.com/reference/deprecations.md#customtoplevelkeydeprecation)
* [MissingPlusPrefixDeprecation](https://docs.getdbt.com/reference/deprecations.md#missingplusprefixdeprecation)
* [SourceOverrideDeprecation](https://docs.getdbt.com/reference/deprecations.md#sourceoverridedeprecation)

Each deprecation type can be silenced using the [warn-error-options](https://docs.getdbt.com/reference/global-configs/warnings.md#configuration) project configuration.
For example, to silence all of the above deprecations within `dbt_project.yml`:

dbt_project.yml

```yml
flags:
  warn_error_options:
    silence:
      - CustomTopLevelKeyDeprecation
      - CustomKeyInConfigDeprecation
      - CustomKeyInObjectDeprecation
      - MissingPlusPrefixDeprecation
      - SourceOverrideDeprecation
```

Alternatively, the `--warn-error-options` flag can be used to silence specific deprecations from the command line:

```sh
dbt parse --warn-error-options '{"silence": ["CustomTopLevelKeyDeprecation", "CustomKeyInConfigDeprecation", "CustomKeyInObjectDeprecation", "MissingPlusPrefixDeprecation", "SourceOverrideDeprecation"]}'
```

To silence *all* deprecation warnings within `dbt_project.yml`:

dbt_project.yml

```yml
flags:
  warn_error_options:
    silence:
      - Deprecations
```

Similarly, all deprecation warnings can be silenced via the `--warn-error-options` command line flag:

```sh
dbt parse --warn-error-options '{"silence": ["Deprecations"]}'
```

#### Adapter-specific features and functionalities[​](#adapter-specific-features-and-functionalities "Direct link to Adapter-specific features and functionalities")

##### Snowflake[​](#snowflake "Direct link to Snowflake")

* The Snowflake adapter supports basic table materialization on Iceberg tables registered in a Glue catalog through a [catalog-linked database](https://docs.snowflake.com/en/user-guide/tables-iceberg-catalog-linked-database#label-catalog-linked-db-create). For more information, see [Glue Data Catalog](https://docs.getdbt.com/docs/mesh/iceberg/snowflake-iceberg-support.md#external-catalogs).
* The `cluster_by` configuration is supported in dynamic tables. For more information, see [Dynamic table clustering](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#dynamic-table-clustering).
* The `immutable_where` configuration is supported in dynamic tables. For more information, see [Snowflake configurations](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#immutable-where).
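For instance, a dynamic table model could opt into clustering with a config block like the following (the warehouse name, lag, and column names are placeholders; `target_lag` and `snowflake_warehouse` are the standard dynamic table settings):

```sql
{{ config(
    materialized='dynamic_table',
    snowflake_warehouse='transforming',
    target_lag='1 hour',
    cluster_by=['event_date']
) }}

select event_id, event_date, payload
from {{ ref('stg_events') }}
```

Snowflake then maintains clustering on `event_date` as the dynamic table refreshes.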
##### BigQuery[​](#bigquery "Direct link to BigQuery")

* To improve performance, dbt can issue a single batch query when calculating source freshness through metadata, instead of executing one query per source. To enable this feature, set [`bigquery_use_batch_source_freshness`](https://docs.getdbt.com/reference/global-configs/bigquery-changes.md#the-bigquery_use_batch_source_freshness-flag) to `True`.

##### Redshift[​](#redshift "Direct link to Redshift")

* The [`redshift_skip_autocommit_transaction_statements`](https://docs.getdbt.com/reference/global-configs/redshift-changes.md#the-redshift_skip_autocommit_transaction_statements-flag) flag is now `True` by default. When `autocommit=True` (the default since dbt-redshift 1.5), dbt now skips sending unnecessary `BEGIN`/`COMMIT`/`ROLLBACK` statements, improving performance by reducing round trips to Redshift. To preserve the legacy behavior, set the flag to `False`.

##### Spark[​](#spark "Direct link to Spark")

* New profile configurations have been added to enhance [retry handling for PyHive connections](https://docs.getdbt.com/reference/resource-configs/spark-configs.md#retry-handling-for-pyhive-connections):
  * `poll_interval`: Controls how frequently the adapter polls the Thrift server to check if an async query has completed.
  * `query_timeout`: Adds an overall timeout (in seconds) for query execution. If a query exceeds the set duration during polling, it raises a `DbtRuntimeError`. This helps prevent indefinitely hanging queries.
  * `query_retries`: Handles connection loss during query polling by automatically retrying.

#### Quick hits[​](#quick-hits "Direct link to Quick hits")

You will find these quick hits in dbt Core v1.11:

* The `dbt ls` command can now write out nested keys. This makes it easier to debug and troubleshoot your project.
Example: `dbt ls --output json --output-keys config.materialized`

* Manifest metadata now includes `run_started_at`, providing better tracking of when dbt runs were initiated.
* When a model is disabled, unit tests for that model are automatically disabled as well.
* You can use the new [`config.meta_get()`](https://docs.getdbt.com/reference/dbt-jinja-functions/config.md#configmeta_get) and [`config.meta_require()`](https://docs.getdbt.com/reference/dbt-jinja-functions/config.md#configmeta_require) functions to access custom configurations stored under `meta`. These functions have been backported to dbt Core v1.10.

---

### Upgrading to v1.12

Beta coming soon

dbt Core v1.12 is not yet available in beta. We will update this guide when it becomes available.

#### Resources[​](#resources "Direct link to Resources")

* dbt Core v1.12 changelog (coming soon)
* [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#release-tracks)

#### What to know before upgrading[​](#what-to-know-before-upgrading "Direct link to What to know before upgrading")

dbt Labs is committed to providing backward compatibility for all versions 1.x. Any behavior changes will be accompanied by a [behavior change flag](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#behavior-change-flags) to provide a migration window for existing projects. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new).
dbt provides the functionality from new versions of dbt Core via [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) with automatic upgrades. If you have selected the **Latest** release track in dbt, you already have access to all the features, fixes, and other functionality included in the latest dbt Core version! If you have selected the **Compatible** release track, you will have access in the next monthly **Compatible** release after the dbt Core v1.12 final release.

We continue to recommend explicitly installing both `dbt-core` and `dbt-<adapter>`. This may become required for a future version of dbt. For example:

```sh
python3 -m pip install dbt-core dbt-snowflake
```

#### New and changed features and functionality[​](#new-and-changed-features-and-functionality "Direct link to New and changed features and functionality")

##### Support for `vars.yml` [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#support-for-varsyml- "Direct link to support-for-varsyml-")

You can use the [`vars.yml`](https://docs.getdbt.com/docs/build/project-variables.md#defining-variables-in-varsyml) file, located at the project root, to define project variables. This keeps variable definitions in one place and helps simplify `dbt_project.yml`. Variables defined in `vars.yml` are parsed *before* `dbt_project.yml`, so you can reference them in `dbt_project.yml` using `{{ var('...') }}`. You can continue to define variables in `dbt_project.yml` as before, but you cannot define variables in both files. For details and precedence, refer to [Project variables](https://docs.getdbt.com/docs/build/project-variables.md).
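For example (the variable names and values here are illustrative), a `vars.yml` at the project root might look like:

```yml
# vars.yml
vars:
  start_date: '2020-01-01'
  default_schema_prefix: analytics
```

Because `vars.yml` is parsed first, `dbt_project.yml` can reference these values with `{{ var('start_date') }}`, and models can read them with the usual `var()` function.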
##### Managing changes to legacy behaviors[​](#managing-changes-to-legacy-behaviors "Direct link to Managing changes to legacy behaviors")

dbt Core v1.12 introduces new flags for [managing changes to legacy behaviors](https://docs.getdbt.com/reference/global-configs/behavior-changes.md). You may opt into recently introduced changes (disabled by default), or opt out of mature changes (enabled by default), by setting `True` / `False` values, respectively, for `flags` in `dbt_project.yml`.

You can read more about each of these behavior changes in the following links:

* (Introduced, disabled by default) [`require_valid_schema_from_generate_schema_name`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#valid-schema-from-generate_schema_name). This flag is set to `False` by default. With this setting, dbt raises the [`GenerateSchemaNameNullValueDeprecation`](https://docs.getdbt.com/reference/deprecations.md#generateschemanamenullvaluedeprecation) warning when a custom `generate_schema_name` macro returns a `null` value. When set to `True`, dbt enforces stricter validation and raises a parsing error instead of a warning.
* (Introduced, disabled by default) [`require_sql_header_in_test_configs`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#sql_header-in-data-tests). When set to `True`, you can set [`sql_header`](https://docs.getdbt.com/reference/resource-configs/sql_header.md) in the `config` of a generic data test at the model or column level in your `properties.yml` file. For more information, refer to [Data test configurations](https://docs.getdbt.com/reference/data-test-configs.md).
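With `require_sql_header_in_test_configs` enabled, a column-level generic test could carry a header along these lines (the `data_tests` key and the query tag statement are illustrative; see the linked data test configuration docs for the exact shape):

```yml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique:
              config:
                sql_header: "alter session set query_tag = 'dbt_tests';"
```

The header runs in the same session before the test query, which is useful for session-level settings such as query tags.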
#### Adapter-specific features and functionalities[​](#adapter-specific-features-and-functionalities "Direct link to Adapter-specific features and functionalities")

##### Snowflake[​](#snowflake "Direct link to Snowflake")

* You can create Snowflake dynamic tables as transient (no [Fail-safe period](https://docs.snowflake.com/en/user-guide/data-failsafe)) by setting the [`transient`](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#transient-dynamic-tables) config on models. When `transient` is not set on a model, the [`snowflake_default_transient_dynamic_tables`](https://docs.getdbt.com/reference/global-configs/snowflake-changes.md#the-snowflake_default_transient_dynamic_tables-flag) flag controls the default. Set this flag to `True` to make all dynamic tables transient by default.

##### BigQuery[​](#bigquery "Direct link to BigQuery")

* Added the [`bigquery_reject_wildcard_metadata_source_freshness`](https://docs.getdbt.com/reference/global-configs/bigquery-changes.md#the-bigquery_reject_wildcard_metadata_source_freshness-flag) flag. When you set this flag to `True`, dbt raises a `DbtRuntimeError` if you run metadata-based source freshness checks with wildcard table identifiers (for example, `events_*`), preventing incorrect freshness results.
* You can configure BigQuery job link logging with `job_link_info_level_log`. By default, dbt logs job links at the debug level. To log job links at the info level, set `job_link_info_level_log: true` in your BigQuery profile. This makes job links visible in dbt logs for easier access to the BigQuery console. For more information, see [BigQuery setup](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#job_link_info_level_log).
* You can set `job_execution_timeout_seconds` per model, snapshot, seed, or test, in addition to the profile-level configuration. The per-resource value takes precedence over the default set at the profile level.
For more information, refer to [BigQuery setup](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#job_execution_timeout_seconds).

##### Redshift[​](#redshift "Direct link to Redshift")

* Added support for the `query_group` session parameter, allowing dbt to tag queries for Redshift Workload Manager routing and query logging. When configured in a profile, dbt sets `query_group` when opening a connection and the value applies for the duration of that session. You can also configure `query_group` at the model level to temporarily override the default value for a specific model, and dbt reverts the value at the end of model materialization. For more information, see [Redshift configurations](https://docs.getdbt.com/reference/resource-configs/redshift-configs.md#session-configuration).

#### Quick hits[​](#quick-hits "Direct link to Quick hits")

**Coming soon**

---

### Upgrading to v1.2

##### Resources[​](#resources "Direct link to Resources")

* [Changelog](https://github.com/dbt-labs/dbt-core/blob/1.2.latest/CHANGELOG.md)
* [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md)

#### What to know before upgrading[​](#what-to-know-before-upgrading "Direct link to What to know before upgrading")

There are no breaking changes for code in dbt projects and packages. We are committed to providing backwards compatibility for all versions 1.x. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new).
##### For consumers of dbt artifacts (metadata)[​](#for-consumers-of-dbt-artifacts-metadata "Direct link to For consumers of dbt artifacts (metadata)")

The manifest schema version has been updated to `v6`. The relevant changes are:

* Change to `config` default, which includes a new `grants` property with default value `{}`
* Addition of a `metrics` property, to any node which could reference metrics using the `metric()` function

For users of [state-based selection](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection): This release also includes new logic declaring forwards compatibility for older manifest versions. While running dbt Core v1.2, it should be possible to use `state:modified --state ...` selection against a manifest produced by dbt Core v1.0 or v1.1.

#### For maintainers of adapter plugins[​](#for-maintainers-of-adapter-plugins "Direct link to For maintainers of adapter plugins")

See GitHub discussion [dbt-labs/dbt-core#5468](https://github.com/dbt-labs/dbt-core/discussions/5468) for detailed information.

#### New and changed functionality[​](#new-and-changed-functionality "Direct link to New and changed functionality")

* **[Grants](https://docs.getdbt.com/reference/resource-configs/grants.md)** are natively supported in `dbt-core` for the first time. That support extends to all standard materializations and the most popular adapters. If you already use hooks to apply simple grants, we encourage you to use built-in `grants` to configure your models, seeds, and snapshots instead. This will enable you to [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) up your duplicated or boilerplate code.
* **[Metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md)** now support an `expression` type (metrics-on-metrics), as well as a `metric()` function to use when referencing metrics from within models, macros, or `expression`-type metrics.
For more information on how to use expression metrics, check out the [**`dbt_metrics` package**](https://github.com/dbt-labs/dbt_metrics).

* **[dbt-Jinja functions](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md)** now include the [`itertools` Python module](https://docs.getdbt.com/reference/dbt-jinja-functions/modules.md#itertools), as well as the [set](https://docs.getdbt.com/reference/dbt-jinja-functions/set.md) and [zip](https://docs.getdbt.com/reference/dbt-jinja-functions/zip.md) functions.
* **[Node selection](https://docs.getdbt.com/reference/node-selection/syntax.md)** includes a [file selection method](https://docs.getdbt.com/reference/node-selection/methods.md#file) (`-s model.sql`), and [yaml selector](https://docs.getdbt.com/reference/node-selection/yaml-selectors.md) inheritance.
* **[Global configs](https://docs.getdbt.com/reference/global-configs/about-global-configs.md)** now include CLI flag and environment variable settings for [`target-path`](https://docs.getdbt.com/reference/global-configs/json-artifacts.md) and [`log-path`](https://docs.getdbt.com/reference/global-configs/logs.md), which can be used to override the values set in `dbt_project.yml`.

##### Specific adapters[​](#specific-adapters "Direct link to Specific adapters")

* [Postgres](https://docs.getdbt.com/docs/local/connect-data-platform/postgres-setup.md) and [Redshift](https://docs.getdbt.com/docs/local/connect-data-platform/redshift-setup.md) profiles support a `retries` config, if dbt encounters an operational error or timeout when opening a connection. The default is 1 retry.
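A Redshift profile using the `retries` config might look like this (connection details are placeholders):

```yml
my_profile:
  target: dev
  outputs:
    dev:
      type: redshift
      host: my-cluster.example.us-east-1.redshift.amazonaws.com
      user: analyst
      password: "{{ env_var('REDSHIFT_PASSWORD') }}"
      dbname: analytics
      schema: dbt_dev
      port: 5439
      threads: 4
      retries: 3  # retry opening the connection up to 3 times on operational errors
```

The same `retries` key applies to Postgres profiles.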
---

### Upgrading to v1.3

##### Resources[​](#resources "Direct link to Resources")

* [Changelog](https://github.com/dbt-labs/dbt-core/blob/1.3.latest/CHANGELOG.md)
* [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md)

#### What to know before upgrading[​](#what-to-know-before-upgrading "Direct link to What to know before upgrading")

We are committed to providing backward compatibility for all versions 1.x. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new).

There are three changes in dbt Core v1.3 that may require action from some users:

1. If you have a `profiles.yml` file located in the root directory where you run dbt, dbt will start preferring that profiles file over the default location on your machine. [You can read more details here](https://docs.getdbt.com/docs/local/profiles.yml.md#advanced-customizing-a-profile-directory).
2. If you already have `.py` files defined in the `model-paths` of your dbt project, dbt will start trying to read them as Python models. You can use [the new `.dbtignore` file](https://docs.getdbt.com/reference/dbtignore.md) to tell dbt to ignore those files.
3. If you have custom code accessing the `raw_sql` property of models (with the [model](https://docs.getdbt.com/reference/dbt-jinja-functions/model.md) or [graph](https://docs.getdbt.com/reference/dbt-jinja-functions/graph.md) objects), it has been renamed to `raw_code`. This is a change to the manifest contract, described in more detail below.

##### For users of dbt Metrics[​](#for-users-of-dbt-metrics "Direct link to For users of dbt Metrics")

The names of metric properties have changed, with backward compatibility.
Those changes are:

* Renamed `type` to `calculation_method`
* Renamed `sql` to `expression`
* Renamed `expression` calculation method metrics to `derived` calculation method metrics

We plan to keep backward compatibility for a full minor version. Defining metrics with the old names will raise an error in dbt Core v1.4.

##### For consumers of dbt artifacts (metadata)[​](#for-consumers-of-dbt-artifacts-metadata "Direct link to For consumers of dbt artifacts (metadata)")

We have updated the manifest schema version to `v7`. This includes the changes to metrics described above and a few other changes related to the addition of Python models:

* Renamed `raw_sql` to `raw_code`
* Renamed `compiled_sql` to `compiled_code`
* A new top-level node property, `language` (`'sql'` or `'python'`)

For users of [state-based selection](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection): This release includes logic providing backward and forward compatibility for older manifest versions. While running dbt Core v1.3, it should be possible to use `state:modified --state ...` selection against a manifest produced by dbt Core v1.0 and higher.

##### For maintainers of adapter plugins[​](#for-maintainers-of-adapter-plugins "Direct link to For maintainers of adapter plugins")

GitHub discussion with details: [dbt-labs/dbt-core#6011](https://github.com/dbt-labs/dbt-core/discussions/6011)

#### New and changed documentation[​](#new-and-changed-documentation "Direct link to New and changed documentation")

* **[Python models](https://docs.getdbt.com/docs/build/python-models.md)** are natively supported in `dbt-core` for the first time, on data warehouses that support Python runtimes.
* Updates made to **[Metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md)** reflect their new syntax for definition, as well as additional properties that are now available.
* Plus, a few related updates to **[exposure properties](https://docs.getdbt.com/reference/exposure-properties.md)**: `config`, `label`, and `name` validation.
* **[Custom `node_color`](https://docs.getdbt.com/reference/resource-configs/docs.md)** in `dbt-docs`. For the first time, you can control the colors displayed in dbt's DAG. Want bronze, silver, and gold layers? It's at your fingertips.
* **[`profiles.yml`](https://docs.getdbt.com/docs/local/profiles.yml.md#advanced-customizing-a-profile-directory)** search order now looks in the current working directory before `~/.dbt`.

##### Quick hits[​](#quick-hits "Direct link to Quick hits")

* **["Full refresh"](https://docs.getdbt.com/reference/resource-configs/full_refresh.md)** flag supports a short name, `-f`.
* **[The "config" selection method](https://docs.getdbt.com/reference/node-selection/methods.md#config)** supports boolean and list config values, in addition to strings.
* Two new dbt-Jinja context variables for accessing invocation metadata: [`invocation_args_dict`](https://docs.getdbt.com/reference/dbt-jinja-functions/flags.md#invocation_args_dict) and [`dbt_metadata_envs`](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md#custom-metadata).
---

### Upgrading to v1.4

##### Resources[​](#resources "Direct link to Resources")

* [Changelog](https://github.com/dbt-labs/dbt-core/blob/1.4.latest/CHANGELOG.md)
* [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md)

**Final release:** January 25, 2023

dbt Core v1.4 is a "behind-the-scenes" release. We've been hard at work rebuilding `dbt-core` internals on top of more-solid foundations, to enable an exciting year of new feature development. Check out the [v1.5 milestone](https://github.com/dbt-labs/dbt-core/milestone/82) in GitHub for a preview of what's planned for April.

#### What to know before upgrading[​](#what-to-know-before-upgrading "Direct link to What to know before upgrading")

dbt Labs is committed to providing backward compatibility for all versions 1.x. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new).

##### For consumers of dbt artifacts (metadata)[​](#for-consumers-of-dbt-artifacts-metadata "Direct link to For consumers of dbt artifacts (metadata)")

The manifest schema version has been updated to `v8`. These changes are relevant for people who parse or analyze the contents of the `manifest.json` file, or who have custom code accessing the [`model`](https://docs.getdbt.com/reference/dbt-jinja-functions/model.md) or [`graph`](https://docs.getdbt.com/reference/dbt-jinja-functions/graph.md) variables, for example, `{{ model.root_path }}`. Relevant changes:

* The `root_path` attribute has been removed for non-seed nodes to reduce duplicative information.
* Unused attributes have been removed from seed nodes (including `depends_on.nodes`), and from `macros` (including `tags`).
* The `unique_id` of docs blocks now starts with `doc` for consistency with other resource types.
##### For maintainers of adapter plugins[​](#for-maintainers-of-adapter-plugins "Direct link to For maintainers of adapter plugins")

> **TL;DR** Not much heavy lifting for this minor version. We anticipate more work for `1.5.0`. We plan to release betas early & often, and provide guidance on upgrading.

The high-level changes are:

* Add support for Python 3.11
* Rename/replace deprecated exception functions
* Add support for Incremental Predicates (if applicable)
* Make use of new adapter-zone tests

For more detailed information and to ask any questions, please visit [dbt Core/discussions/6624](https://github.com/dbt-labs/dbt-core/discussions/6624).

#### New and changed documentation[​](#new-and-changed-documentation "Direct link to New and changed documentation")

* [**Events and structured logging**](https://docs.getdbt.com/reference/events-logging.md): dbt's event system got a makeover. Expect more consistency in the availability and structure of information, backed by type-safe event schemas.
* [**Python support**](https://docs.getdbt.com/faqs/Core/install-python-compatibility.md): Python 3.11 was released in October 2022. It is officially supported in dbt-core v1.4, although full support depends also on the adapter plugin for your data platform. According to the Python maintainers, "Python 3.11 is between 10-60% faster than Python 3.10." We encourage you to try [`dbt parse`](https://docs.getdbt.com/reference/commands/parse.md) with dbt Core v1.4 + Python 3.11, and compare the timing with dbt Core v1.3 + Python 3.10. Let us know what you find!
* [**Metrics**](https://docs.getdbt.com/docs/build/build-metrics-intro.md): `time_grain` is optional, to provide better ergonomics around metrics that aren't time-bound.
* **dbt-Jinja context:** The [`local_md5`](https://docs.getdbt.com/reference/dbt-jinja-functions/local_md5.md) context method will calculate an [MD5 hash](https://en.wikipedia.org/wiki/MD5) for use *within* dbt. (Not to be confused with SQL `md5`!)
* [**Exposures**](https://docs.getdbt.com/docs/build/exposures.md) can now depend on `metrics`.
* [**"Tarball" packages**](https://docs.getdbt.com/docs/build/packages.md#internally-hosted-tarball-URL): Some organizations have security requirements to pull resources only from internal services. To address the need to install packages from hosted environments (such as Artifactory or cloud storage buckets), it's possible to specify any accessible URL where a compressed dbt package can be downloaded.
* [**Granular "warn error" configuration**](https://docs.getdbt.com/reference/global-configs/warnings.md): Thanks to a full cleanup and consolidation of warning and exception classes within `dbt-core`, it is now possible to define a more granular `--warn-error-options` configuration that specifies the exact warnings you do (or don't) want dbt to treat as errors.
* [**Deferral**](https://docs.getdbt.com/reference/node-selection/defer.md#favor-state) supports an optional configuration, `--favor-state`.

##### Advanced configurations for incremental models[​](#advanced-configurations-for-incremental-models "Direct link to Advanced configurations for incremental models")

* [**`incremental_predicates`** config](https://docs.getdbt.com/docs/build/incremental-strategy.md#about-incremental_predicates) is now supported on the most popular adapters, enabling greater flexibility when tuning performance in `merge` and `delete` statements against large datasets.
* **BigQuery:** The `insert_overwrite` incremental strategy supports a new (old) mechanism, [`time_ingestion_partitioning`](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md#partitioning-by-an-ingestion-date-or-timestamp) + [`copy_partitions`](#copying-ingestion-time-partitions), that can yield significant savings in cost + time for large datasets.
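As an example of `incremental_predicates` (the `DBT_INTERNAL_DEST` alias and seven-day window follow the pattern in the linked docs; table and column names are illustrative):

```sql
{{ config(
    materialized='incremental',
    unique_key='session_id',
    incremental_strategy='merge',
    incremental_predicates=[
        "DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)"
    ]
) }}

select session_id, user_id, session_start
from {{ ref('stg_sessions') }}

{% if is_incremental() %}
where session_start > (select max(session_start) from {{ this }})
{% endif %}
```

The predicate limits the `merge` scan on the destination table to the last seven days instead of the full table.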
##### Updates to Python models[​](#updates-to-python-models "Direct link to Updates to Python models")

* Python models are [configured to materialize](https://docs.getdbt.com/docs/build/python-models.md) as `table` by default.
* Python models [running on Snowpark](https://docs.getdbt.com/docs/build/python-models.md) will use "anonymous" stored procedures by default, enabling a small speedup and a cleaner query history.

---

### Upgrading to v1.5

dbt Core v1.5 is a feature release, with two significant additions:

1. [**Model governance**](https://docs.getdbt.com/docs/mesh/govern/about-model-governance.md) — access, contracts, versions — the first phase of [multi-project deployments](https://github.com/dbt-labs/dbt-core/discussions/6725)
2. A Python entry point for [**programmatic invocations**](https://docs.getdbt.com/reference/programmatic-invocations.md), at parity with the CLI

#### Resources[​](#resources "Direct link to Resources")

* [Changelog](https://github.com/dbt-labs/dbt-core/blob/1.5.latest/CHANGELOG.md)
* [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md)
* [Release schedule](https://github.com/dbt-labs/dbt-core/issues/6715)

#### What to know before upgrading[​](#what-to-know-before-upgrading "Direct link to What to know before upgrading")

dbt Labs is committed to providing backward compatibility for all versions 1.x, with the exception of any changes explicitly mentioned below.
If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new).

##### Behavior changes[​](#behavior-changes "Direct link to Behavior changes")

Why changes to previous behavior? This release includes significant new features, and rework to `dbt-core`'s CLI and initialization flow. As part of refactoring its internals from [`argparse`](https://docs.python.org/3/library/argparse.html) to [`click`](https://click.palletsprojects.com), we made a handful of changes to runtime configuration. The net result of these changes is more consistent and practical configuration options, and a more legible codebase.

***Wherever possible, we will provide backward compatibility and deprecation warnings for at least one minor version before actually removing the old functionality.*** In those cases, we still reserve the right to fully remove backwards compatibility for deprecated functionality in a future v1.x minor version of `dbt-core`.

Setting `log-path` and `target-path` in `dbt_project.yml` has been deprecated for consistency with other invocation-specific runtime configs ([dbt-core#6882](https://github.com/dbt-labs/dbt-core/issues/6882)). We recommend setting them via env var or CLI flag instead.

The `dbt list` command will now include `INFO`-level logs by default. Previously, the `list` command (and *only* the `list` command) had `WARN`-level stdout logging, to support piping its results to [`jq`](https://jqlang.github.io/jq/manual/), a file, or another process.
To achieve that goal, you can use either of the following parameters:

* `dbt list --log-level warn` (recommended; equivalent to the previous default)
* `dbt list --quiet` (suppresses all logging below `ERROR` level, except for "printed" messages and `list` output)

The following env vars have been renamed, for consistency with the convention followed by all other parameters:

* `DBT_DEFER_TO_STATE` → `DBT_DEFER`
* `DBT_FAVOR_STATE_MODE` → `DBT_FAVOR_STATE`
* `DBT_NO_PRINT` → `DBT_PRINT`
* `DBT_ARTIFACT_STATE_PATH` → `DBT_STATE`

As described in [dbt-core#7169](https://github.com/dbt-labs/dbt-core/pull/7169), command-line parameters that could previously fail silently will no longer be silent. See [dbt-labs/dbt-core#7158](https://github.com/dbt-labs/dbt-core/issues/7158) and [dbt-labs/dbt-core#6800](https://github.com/dbt-labs/dbt-core/issues/6800) for more examples of the behavior we are fixing.

An empty `tests:` key in a YAML file now raises a validation error, instead of being silently skipped. You can resolve this by removing the empty `tests:` key, or by setting it to an empty list explicitly:

```yml
# ❌ this will raise an error
models:
  - name: my_model
    tests:
    config: ...

# ✅ this is fine
models:
  - name: my_model
    tests: []  # todo! add tests later
    config: ...
```

Some options that could previously be specified *after* a subcommand can now only be specified *before* it. This includes each option's inverse; for example, `--write-json` and `--no-write-json`.
The affected options are:

```bash
--cache-selected-only | --no-cache-selected-only
--debug, -d | --no-debug
--deprecated-print | --deprecated-no-print
--enable-legacy-logger | --no-enable-legacy-logger
--fail-fast, -x | --no-fail-fast
--log-cache-events | --no-log-cache-events
--log-format
--log-format-file
--log-level
--log-level-file
--log-path
--macro-debugging | --no-macro-debugging
--partial-parse | --no-partial-parse
--partial-parse-file-path
--populate-cache | --no-populate-cache
--print | --no-print
--printer-width
--quiet, -q | --no-quiet
--record-timing-info, -r
--send-anonymous-usage-stats | --no-send-anonymous-usage-stats
--single-threaded | --no-single-threaded
--static-parser | --no-static-parser
--use-colors | --no-use-colors
--use-colors-file | --no-use-colors-file
--use-experimental-parser | --no-use-experimental-parser
--version, -V, -v
--version-check | --no-version-check
--warn-error
--warn-error-options
--write-json | --no-write-json
```

Additionally, some options that could previously be specified *before* a subcommand can now only be specified *after* it. Any option *not* in the above list must appear *after* the subcommand from v1.5 and later. For example, `--profiles-dir`.

The built-in [collect\_freshness](https://github.com/dbt-labs/dbt-core/blob/1.5.latest/core/dbt/include/global_project/macros/adapters/freshness.sql) macro now returns the entire `response` object, instead of just the `table` result. If you're using a custom override for `collect_freshness`, make sure you're also returning the `response` object; otherwise, some of your dbt commands will never finish. For example:

```sql
{{ return(load_result('collect_freshness')) }}
```

Finally: the [built-in `generate_alias_name` macro](https://github.com/dbt-labs/dbt-core/blob/1.5.latest/core/dbt/include/global_project/macros/get_custom_name/get_custom_alias.sql) now includes logic to handle versioned models.
If your project has reimplemented the `generate_alias_name` macro with custom logic, and you want to start using [model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md), you will need to update the logic in your macro. While this is **not** a prerequisite for upgrading to v1.5—only for using the new feature—we recommend making the update during your upgrade, whether you're planning to use model versions tomorrow or far in the future.

Likewise, if your project has reimplemented the `ref` macro with custom logic, you will need to update the logic in your macro as described [here](https://docs.getdbt.com/reference/dbt-jinja-functions/builtins.md).

##### For consumers of dbt artifacts (metadata)

The [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md) schema version will be updated to `v9`. Specific changes:

* Addition of `groups` as a top-level key
* Addition of `access`, `constraints`, `version`, and `latest_version` as top-level node attributes for models
* Addition of `constraints` as a column-level attribute
* Addition of `group` and `contract` as node configs
* To support model versions, the type of `refs` has changed from `List[List[str]]` to `List[RefArgs]`, with nested keys `name: str`, `package: Optional[str] = None`, and `version: Union[str, float, NoneType] = None`

##### For maintainers of adapter plugins

For more detailed information and to ask questions, please read and comment on the GH discussion: [dbt-labs/dbt-core#7213](https://github.com/dbt-labs/dbt-core/discussions/7213).
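For tooling that consumes manifests across schema versions, the `refs` type change can be absorbed with a small shim. The sketch below is illustrative and not part of dbt: `normalize_ref` is a hypothetical helper, and treating a two-element list as `[package, name]` follows the pre-v9 convention.

```python
def normalize_ref(ref):
    """Normalize a manifest `refs` entry to the v9 RefArgs-style mapping.

    Pre-v9 manifests stored refs as lists: ["model_name"] or
    ["package_name", "model_name"]. v9 stores mappings with
    name / package / version keys.
    """
    if isinstance(ref, dict):
        return ref  # already v9-shaped
    if len(ref) == 2:
        package, name = ref
    else:
        package, name = None, ref[0]
    return {"name": name, "package": package, "version": None}
```

A consumer can run every ref through this shim and then handle only the v9 shape downstream.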
#### New and changed documentation

##### Model governance

The first phase of supporting dbt deployments at scale, across multiple projects with clearly defined ownership and interface boundaries. [Read about model governance](https://docs.getdbt.com/docs/mesh/govern/about-model-governance.md), all of which is new in v1.5.

##### Revamped CLI

Compile and preview dbt models and `--inline` dbt-SQL queries on the CLI using:

* [`dbt compile`](https://docs.getdbt.com/reference/commands/compile.md)
* [`dbt show`](https://docs.getdbt.com/reference/commands/show.md) (new!)

[Node selection methods](https://docs.getdbt.com/reference/node-selection/methods.md) can use Unix-style wildcards to glob nodes matching a pattern:

```text
dbt ls --select "tag:team_*"
```

And (!): a first-ever entry point for [programmatic invocations](https://docs.getdbt.com/reference/programmatic-invocations.md), at parity with CLI commands. Run `dbt --help` to see new & improved help documentation :)

##### Quick hits

* The [`version: 2` top-level key](https://docs.getdbt.com/reference/project-configs/version.md) is now **optional** in all YAML files. Also, the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and `version:` top-level keys are now optional in `dbt_project.yml` files.
* [Events and logging](https://docs.getdbt.com/reference/events-logging.md): Added `node_relation` (`database`, `schema`, `identifier`) to the `node_info` dictionary, available on node-specific events
* Support for setting `--project-dir` via an environment variable: [`DBT_PROJECT_DIR`](https://docs.getdbt.com/reference/dbt_project.yml.md)
* More granular configurations for logging (to set [log format](https://docs.getdbt.com/reference/global-configs/logs.md#log-formatting), [log levels](https://docs.getdbt.com/reference/global-configs/logs.md#log-level), and [colorization](https://docs.getdbt.com/reference/global-configs/logs.md#color)) and [cache population](https://docs.getdbt.com/reference/global-configs/cache.md#cache-population)
* [dbt overwrites the `manifest.json` file](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats.md#overwrites-the-manifestjson) during parsing, which means when you reference `--state` from the `target/` directory, you may encounter a warning indicating that the saved manifest wasn't found.

---

### Upgrading to v1.6

dbt Core v1.6 has three significant areas of focus:

1. Next milestone of [multi-project deployments](https://github.com/dbt-labs/dbt-core/discussions/6725): improvements to contracts, groups/access, and versions; and building blocks for cross-project `ref`
2. Semantic layer re-launch: dbt Core and [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md) integration
3.
Mechanisms to support mature deployment at scale (`dbt clone` and `dbt retry`)

#### Resources

* [Changelog](https://github.com/dbt-labs/dbt-core/blob/1.6.latest/CHANGELOG.md)
* [dbt Core installation guide](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md)
* [Release schedule](https://github.com/dbt-labs/dbt-core/issues/7481)

#### What to know before upgrading

dbt Labs is committed to providing backward compatibility for all versions 1.x, with the exception of any changes explicitly mentioned below. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new).

##### Behavior changes

**Action required if your project defines `metrics`**

The [spec for metrics](https://github.com/dbt-labs/dbt-core/discussions/7456) has changed and now uses [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md). If your dbt project defines metrics, you must migrate to dbt v1.6 because the YAML spec has moved from dbt\_metrics to MetricFlow. Any tests you have won't compile on v1.5 or older.

* dbt Core v1.6 does not support Python 3.7, which reached end of life in June 2023. Supported Python versions are 3.8, 3.9, 3.10, and 3.11.
* As part of the [dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) re-launch, the spec for `metrics` has changed significantly.
* The manifest schema version is now v10.
* dbt Labs is ending support for Homebrew installation of dbt Core and adapters. See [the discussion](https://github.com/dbt-labs/dbt-core/discussions/8277) for more details.
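One of the behavior changes above, dropping Python 3.7, is easy to surface early in install or CI scripts with a small guard. This is an illustrative sketch, not dbt code; the helper name and the frozen version window (3.8 through 3.11, as stated for v1.6 at release time) are assumptions.

```python
import sys

# Support window stated for dbt Core v1.6 at release time (frozen here
# for illustration; later patch releases may extend it).
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 11)

def python_supported(version_info=None):
    """Return True if the interpreter's (major, minor) falls in the window."""
    vi = sys.version_info if version_info is None else version_info
    return MIN_SUPPORTED <= (vi[0], vi[1]) <= MAX_SUPPORTED
```

A script could call `python_supported()` before invoking `pip install dbt-core` and fail fast with a clear message instead of a confusing install error.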
##### For consumers of dbt artifacts (metadata)

The [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md) schema version has been updated to `v10`. Specific changes:

* Addition of `semantic_models` and changes to `metrics` attributes
* Addition of `deprecation_date` as a model property
* Addition of `on_configuration_change` as a default node configuration (to support materialized views)
* Small type changes to `contracts` and `constraints`
* Manifest `metadata` includes `project_name`

##### For maintainers of adapter plugins

For more detailed information and to ask questions, please read and comment on the GH discussion: [dbt-labs/dbt-core#7958](https://github.com/dbt-labs/dbt-core/discussions/7958).

#### New and changed documentation

##### MetricFlow

* [**Build your metrics**](https://docs.getdbt.com/docs/build/build-metrics-intro.md) with MetricFlow, a key component of the Semantic Layer. You can define your metrics and build semantic models with MetricFlow, available on the command line (CLI) for dbt Core v1.6 beta or higher.
##### Materialized views

Supported on:

* [Postgres](https://docs.getdbt.com/reference/resource-configs/postgres-configs.md#materialized-view)
* [Redshift](https://docs.getdbt.com/reference/resource-configs/redshift-configs.md#materialized-view)
* [Snowflake](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#dynamic-tables)
* [Databricks](https://docs.getdbt.com/reference/resource-configs/databricks-configs.md#materialized-views-and-streaming-tables)

##### New commands for mature deployment

[`dbt retry`](https://docs.getdbt.com/reference/commands/retry.md) executes the previously run command from the point of failure. Rebuild just the nodes that errored or were skipped in a previous run/build/test, rather than starting over from scratch.

[`dbt clone`](https://docs.getdbt.com/reference/commands/clone.md) leverages each data platform's functionality for creating lightweight copies of dbt models from one environment into another. Useful when quickly spinning up a new development environment, or promoting specific models from a staging environment into production.

##### Multi-project collaboration

[**Deprecation date**](https://docs.getdbt.com/reference/resource-properties/deprecation_date.md): Models can declare a deprecation date that will warn model producers and downstream consumers. This enables clear migration windows for versioned models, and provides a mechanism to facilitate the removal of immature or little-used models, helping to avoid project bloat.

[Model names](https://docs.getdbt.com/faqs/Project/unique-resource-names.md) can be duplicated across different namespaces (projects/packages), so long as they are unique within each project/package.
We strongly encourage using [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) when referencing a model from a different package/project.

There is more consistency and flexibility around packages: resources defined in a package will respect variable and global macro definitions within the scope of that package.

* `vars` defined in a package's `dbt_project.yml` are now available in the resolution order when compiling nodes in that package, though CLI `--vars` and the root project's `vars` will still take precedence. See ["Variable Precedence"](https://docs.getdbt.com/docs/build/project-variables.md#variable-precedence) for details.
* `generate_x_name` macros (defining custom rules for database, schema, and alias naming) follow the same pattern as other "global" macros for package-scoped overrides. See [macro dispatch](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md) for an overview of the patterns that are possible.

**Closed Beta - dbt Enterprise**

[**Project dependencies**](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md): Introduces `dependencies.yml` and dependent `projects` as a feature of dbt Enterprise. Allows enforcing model access (public vs. protected/private) across project/package boundaries. Enables cross-project `ref` of public models, without requiring the installation of upstream source code.

##### Deprecated functionality

The ability for installed packages to override built-in materializations without explicit opt-in from the user is being deprecated.

* Overriding a built-in materialization from an installed package raises a deprecation warning.
* Using a custom materialization from an installed package does not raise a deprecation warning.
* Using a built-in materialization package override from the root project via a wrapping materialization is still supported.
For example:

```sql
{% materialization view, default %}
  {{ return(my_cool_package.materialization_view_default()) }}
{% endmaterialization %}
```

##### Quick hits

* [`state:unmodified` and `state:old`](https://docs.getdbt.com/reference/node-selection/methods.md#state) for [MECE](https://en.wikipedia.org/wiki/MECE_principle) stateful selection
* [`invocation_args_dict`](https://docs.getdbt.com/reference/dbt-jinja-functions/flags.md#invocation_args_dict) includes the full `invocation_command` as a string
* [`dbt debug --connection`](https://docs.getdbt.com/reference/commands/debug.md) to test just the data platform connection specified in a profile
* [`dbt docs generate --empty-catalog`](https://docs.getdbt.com/reference/commands/cmd-docs.md) to skip catalog population while generating docs
* [`--defer-state`](https://docs.getdbt.com/reference/node-selection/defer.md) enables more granular control
* [`dbt ls`](https://docs.getdbt.com/reference/commands/list.md) adds the semantic model selection method, allowing `dbt ls -s "semantic_model:*"` and `dbt ls --resource-type semantic_model`
* The syntax for secret environment variables has changed from the `DBT_ENV_SECRET_` prefix to `DBT_ENV_SECRET`; the closing underscore is no longer required
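The secret environment variable change is a prefix-matching rule, which can be illustrated in a few lines of Python. This is an illustrative sketch, not dbt's implementation; `secret_env_vars` is a hypothetical helper.

```python
import os

SECRET_PREFIX = "DBT_ENV_SECRET"  # v1.6+: no trailing underscore required

def secret_env_vars(environ=None):
    """Names of env vars matching the secret prefix. Old-style names such
    as DBT_ENV_SECRET_TOKEN still match, because the underscore is simply
    part of the variable name."""
    env = os.environ if environ is None else environ
    return sorted(name for name in env if name.startswith(SECRET_PREFIX))
```

The point of the change: existing `DBT_ENV_SECRET_*` variables keep working, while names like `DBT_ENV_SECRETKEY` now also qualify as secrets.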
---

### Upgrading to v1.7

#### Resources

* [Changelog](https://github.com/dbt-labs/dbt-core/blob/1.7.latest/CHANGELOG.md)
* [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md)
* [Release schedule](https://github.com/dbt-labs/dbt-core/issues/8260)

#### What to know before upgrading

dbt Labs is committed to providing backward compatibility for all versions 1.x, with the exception of any changes explicitly mentioned below. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new).

**Snowflake column size change**

[Snowflake plans to increase](https://docs.snowflake.com/en/release-notes/bcr-bundles/un-bundled/bcr-2118) the default column size for string and binary data types in May 2026. `dbt-snowflake` versions below v1.10.6 may fail to build certain incremental models when this change is deployed.

**Assess impact and required actions**

If you're using a `dbt-snowflake` version below v1.10.6, or have not yet migrated to a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) in the dbt platform, your adapter version is incompatible with this change and may fail to build incremental models that meet *both* of the following conditions:

* Contain string columns with collation defined
* Use the `on_schema_change='sync_all_columns'` config

To check whether this change affects your project, run the following [list](https://docs.getdbt.com/reference/commands/list.md) command:

```bash
dbt ls -s config.materialized:incremental,config.on_schema_change:sync_all_columns --resource-type model
```

* If the command returns `No nodes selected!`, no action is required.
* If the command returns one or more models (for example, `Found 1000 models, 644 macros`), you may be impacted if those models have string columns that don't specify a width. In that case, upgrade to a version that includes the fix:
  * **dbt Core**: `dbt-snowflake` v1.10.6 or later. For upgrade instructions, see [Upgrade adapters](https://docs.getdbt.com/docs/local/install-dbt.md#upgrade-adapters).
  * **dbt platform**: Any release track (Latest, Compatible, Extended, or Fallback).
  * **dbt Fusion engine**: v2.0.0-preview.147 or higher.

This ensures your incremental models can safely handle schema changes while maintaining required collation settings.

##### Behavior changes

dbt Core v1.7 expands the set of sources you can configure freshness for. Previously, freshness was limited to sources with a `loaded_at_field`; now, freshness can be generated from warehouse metadata tables when available. As part of this change, the `loaded_at_field` is no longer required to generate source freshness. If a source has a `freshness:` block, dbt will attempt to calculate freshness for that source:

* If a `loaded_at_field` is provided, dbt will calculate freshness via a select query (previous behavior).
* If a `loaded_at_field` is *not* provided, dbt will calculate freshness via warehouse metadata tables when possible (new behavior).

This is a relatively small behavior change, but worth calling out in case you notice that dbt is calculating freshness for *more* sources than before. To exclude a source from freshness calculations, explicitly set `freshness: null`.

Beginning with v1.7, running [`dbt deps`](https://docs.getdbt.com/reference/commands/deps.md) creates or updates the `package-lock.yml` file in the *project\_root* where `packages.yml` is recorded.
The `package-lock.yml` file contains a record of all packages installed; if subsequent `dbt deps` runs find no updated packages in `dependencies.yml` or `packages.yml`, dbt-core installs from `package-lock.yml`. To retain the pre-v1.7 behavior, there are two main options:

1. Use `dbt deps --upgrade` everywhere `dbt deps` was used previously.
2. Add `package-lock.yml` to your `.gitignore` file.

#### New and changed features and functionality

* [`dbt docs generate`](https://docs.getdbt.com/reference/commands/cmd-docs.md) now supports `--select` to generate [catalog metadata](https://docs.getdbt.com/reference/artifacts/catalog-json.md) for a subset of your project.
* [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) can now be generated from warehouse metadata tables.

##### MetricFlow enhancements

* Automatically create metrics on measures with [`create_metric: true`](https://docs.getdbt.com/docs/build/semantic-models.md).
* Optional [`label`](https://docs.getdbt.com/docs/build/semantic-models.md) in semantic\_models, measures, dimensions, and entities.
* New configurations for semantic models: [enable/disable](https://docs.getdbt.com/reference/resource-configs/enabled.md), [group](https://docs.getdbt.com/reference/resource-configs/group.md), and [meta](https://docs.getdbt.com/reference/resource-configs/meta.md).
* Support for `fill_nulls_with` and `join_to_timespine` for metric nodes.
* `saved_queries` extends governance beyond the semantic objects to their consumption.

##### For consumers of dbt artifacts (metadata)

* The [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md) schema version has been updated to v11.
* The [run\_results](https://docs.getdbt.com/reference/artifacts/run-results-json.md) schema version has been updated to v5.
* There are a few specific changes to [catalog.json](https://docs.getdbt.com/reference/artifacts/catalog-json.md):
  * Added [node attributes](https://docs.getdbt.com/reference/artifacts/run-results-json.md) related to compilation (`compiled`, `compiled_code`, `relation_name`) to the `catalog.json`.
  * The nodes dictionary in the `catalog.json` can now be "partial" if `dbt docs generate` is run with a selector.

##### Model governance

dbt Core v1.5 introduced model governance, which we're continuing to refine. v1.7 includes these additional features and functionality:

* **[Breaking change detection](https://docs.getdbt.com/reference/resource-properties/versions.md#detecting-breaking-changes) for models with contracts enforced:** When dbt detects a breaking change to a model with an enforced contract during state comparison, it will now raise an error for versioned models and a warning for models that are not versioned.
* **[Set `access` as a config](https://docs.getdbt.com/reference/resource-configs/access.md):** You can now set a model's `access` within config blocks in the model's SQL file, or in the project YAML file (`dbt_project.yml`) for an entire subfolder at once.
* **[Type aliasing for model contracts](https://docs.getdbt.com/reference/resource-configs/contract.md):** dbt will use each adapter's built-in type aliasing for user-provided data types—meaning you can always write `string`, and dbt will translate it to `text` on Postgres/Redshift. This is "on" by default, but you can opt out.
* **[Raise warning for numeric types](https://docs.getdbt.com/reference/resource-configs/contract.md):** A bare `numeric` in a model contract defaults to a precision and scale such as `numeric(38,0)`, which might round decimals unexpectedly.
dbt will now warn you if it finds a numeric type without specified precision/scale.

##### dbt clean

[dbt clean](https://docs.getdbt.com/reference/commands/clean.md) only cleans paths within the current working directory. The `--no-clean-project-files-only` flag will delete all paths specified in the `clean-targets` section of `dbt_project.yml`, even if they're outside the dbt project.

Supported flags:

* `--clean-project-files-only` (default)
* `--no-clean-project-files-only`

##### Additional attributes in run\_results.json

The run\_results.json file now includes three attributes related to the `applied` state that complement `unique_id`:

* `compiled`: Boolean entry of the node compilation status (`False` after parsing, but `True` after compiling).
* `compiled_code`: Rendered string of the code that was compiled (empty after parsing, but the full string after compiling).
* `relation_name`: The fully qualified name of the object that was (or will be) created/updated within the database.

##### Deprecated functionality

The ability for installed packages to override built-in materializations without explicit opt-in from the user is being deprecated.

* Overriding a built-in materialization from an installed package raises a deprecation warning.
* Using a custom materialization from an installed package does not raise a deprecation warning.
* Using a built-in materialization package override from the root project via a wrapping materialization is still supported.
For example:

```sql
{% materialization view, default %}
  {{ return(my_cool_package.materialization_view_default()) }}
{% endmaterialization %}
```

##### Quick hits

With these quick hits, you can now:

* Configure a [`delimiter`](https://docs.getdbt.com/reference/resource-configs/delimiter.md) for a seed file.
* Use packages with the same git repo and a unique subdirectory.
* Access the `date_spine` macro directly from dbt-core (moved over from dbt-utils).

Also, the syntax for secret environment variables has changed from the `DBT_ENV_SECRET_` prefix to `DBT_ENV_SECRET`; the closing underscore is no longer required.

---

### Upgrading to v1.8

#### Resources

* [Changelog](https://github.com/dbt-labs/dbt-core/blob/1.8.latest/CHANGELOG.md)
* [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md)

#### What to know before upgrading

dbt Labs is committed to providing backward compatibility for all versions 1.x, except for any changes explicitly mentioned on this page. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new).

#### Release tracks

Starting in 2024, dbt provides the functionality from new versions of dbt Core via [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) with automatic upgrades.
Select a release track in your development, staging, and production [environments](https://docs.getdbt.com/docs/deploy/deploy-environments.md) to access everything in dbt Core v1.8+ and more. To upgrade an environment via the [dbt Admin API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) or [Terraform](https://registry.terraform.io/providers/dbt-labs/dbtcloud/latest), set `dbt_version` to the string `latest`.

#### New and changed features and functionality

Features and functionality new in dbt v1.8.

##### New dbt Core adapter installation procedure

Before dbt Core v1.8, whenever you would `pip install` a data warehouse adapter for dbt, `pip` would automatically install `dbt-core` alongside it. The dbt adapter directly depended on components of `dbt-core`, and `dbt-core` depended on the adapter for execution. This bidirectional dependency made it difficult to develop adapters independently of `dbt-core`.

Beginning in v1.8, [`dbt-core` and adapters are decoupled](https://github.com/dbt-labs/dbt-adapters/discussions/87). Going forward, your installations should explicitly include *both* `dbt-core` *and* the desired adapter. The new `pip` installation command should look like this:

```shell
pip install dbt-core dbt-ADAPTER_NAME
```

For example, you would use the following command if you use Snowflake:

```shell
pip install dbt-core dbt-snowflake
```

For the time being, we have maintained install-time dependencies to avoid breaking existing scripts in surprising ways; `pip install dbt-snowflake` will continue to install the latest versions of both `dbt-core` and `dbt-snowflake`. Given that we may remove this implicit dependency in future versions, we strongly encourage you to update install scripts **now**.
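One way to audit an install script for the new explicit-pin requirement is to verify that both distributions are actually installed, rather than relying on the implicit dependency. This is an illustrative sketch using only the standard library; `missing_packages` is a hypothetical helper, not a dbt utility.

```python
from importlib.metadata import distribution, PackageNotFoundError

def missing_packages(required):
    """Return the required distributions that aren't installed, so an
    install script can fail fast instead of relying on the implicit
    dbt-core dependency that may be removed in future versions."""
    missing = []
    for name in required:
        try:
            distribution(name)  # raises if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_packages(["dbt-core", "dbt-snowflake"])` returns an empty list only when both are explicitly present in the environment.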
**Snowflake column size change**

[Snowflake plans to increase](https://docs.snowflake.com/en/release-notes/bcr-bundles/un-bundled/bcr-2118) the default column size for string and binary data types in May 2026. `dbt-snowflake` versions below v1.10.6 may fail to build certain incremental models when this change is deployed.

**Assess impact and required actions**

If you're using a `dbt-snowflake` version below v1.10.6, or have not yet migrated to a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) in the dbt platform, your adapter version is incompatible with this change and may fail to build incremental models that meet *both* of the following conditions:

* Contain string columns with collation defined
* Use the `on_schema_change='sync_all_columns'` config

To check whether this change affects your project, run the following [list](https://docs.getdbt.com/reference/commands/list.md) command:

```bash
dbt ls -s config.materialized:incremental,config.on_schema_change:sync_all_columns --resource-type model
```

* If the command returns `No nodes selected!`, no action is required.
* If the command returns one or more models (for example, `Found 1000 models, 644 macros`), you may be impacted if those models have string columns that don't specify a width. In that case, upgrade to a version that includes the fix:
  * **dbt Core**: `dbt-snowflake` v1.10.6 or later. For upgrade instructions, see [Upgrade adapters](https://docs.getdbt.com/docs/local/install-dbt.md#upgrade-adapters).
  * **dbt platform**: Any release track (Latest, Compatible, Extended, or Fallback).
  * **dbt Fusion engine**: v2.0.0-preview.147 or higher.

This ensures your incremental models can safely handle schema changes while maintaining required collation settings.
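The `dbt ls` selector in that callout combines two config predicates. As a mental model only, the same filter can be expressed over a list of model configs in plain Python; `possibly_affected` is a hypothetical helper, and you should run the real `dbt ls` command against your project.

```python
def possibly_affected(models):
    """Keep models that are incremental AND configured with
    on_schema_change='sync_all_columns', mirroring the dbt ls selector
    config.materialized:incremental,config.on_schema_change:sync_all_columns."""
    return [
        m["name"]
        for m in models
        if m.get("config", {}).get("materialized") == "incremental"
        and m.get("config", {}).get("on_schema_change") == "sync_all_columns"
    ]
```

Note the comma in the selector means both conditions must hold, which is why the Python version uses `and` rather than `or`.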
##### Unit Tests

Historically, dbt's test coverage was confined to [“data” tests](https://docs.getdbt.com/docs/build/data-tests.md), assessing the quality of input data or resulting datasets' structure. In v1.8, we're introducing native support for [unit testing](https://docs.getdbt.com/docs/build/unit-tests.md). Unit tests validate your SQL modeling logic on a small set of static inputs **before** you materialize your full model in production. They support a test-driven development approach, improving both the efficiency of developers and the reliability of code.

Starting from v1.8, when you execute the `dbt test` command, it will run both unit and data tests. Use the [`test_type`](https://docs.getdbt.com/reference/node-selection/methods.md#test_type) method to run only unit or data tests:

```shell
dbt test --select "test_type:unit"   # run all unit tests
dbt test --select "test_type:data"   # run all data tests
```

Unit tests are defined in YML files in your `models/` directory and are currently only supported on SQL models. To distinguish unit tests from data tests, the `tests:` config has been renamed to `data_tests:`. Both are currently supported for backward compatibility.

###### New `data_tests:` syntax

The `tests:` syntax is changing to reflect the addition of unit tests. Start migrating your [data test](https://docs.getdbt.com/docs/build/data-tests.md#new-data_tests-syntax) YML to use `data_tests:` after you upgrade to v1.8 to prevent issues in the future.
```yml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
```

###### The `--empty` flag

The [`run`](https://docs.getdbt.com/reference/commands/run.md#the-%60--empty%60-flag) and [`build`](https://docs.getdbt.com/reference/commands/build.md#the---empty-flag) commands now support the `--empty` flag for building schema-only dry runs. The `--empty` flag limits the refs and sources to zero rows. dbt will still execute the model SQL against the target data warehouse but will avoid expensive reads of input data. This validates dependencies and ensures your models will build properly.

##### Deprecated functionality

The ability for installed packages to override built-in materializations without explicit opt-in from the user is being deprecated.

* Overriding a built-in materialization from an installed package raises a deprecation warning.
* Using a custom materialization from an installed package does not raise a deprecation warning.
* Using a built-in materialization package override from the root project via a wrapping materialization is still supported. For example:

```sql
{% materialization view, default %}
    {{ return(my_cool_package.materialization_view_default()) }}
{% endmaterialization %}
```

##### Managing changes to legacy behaviors

dbt Core v1.8 has introduced flags for [managing changes to legacy behaviors](https://docs.getdbt.com/reference/global-configs/behavior-changes.md). You may opt into recently introduced changes (disabled by default), or opt out of mature changes (enabled by default), by setting `True` / `False` values, respectively, for `flags` in `dbt_project.yml`.
You can read more about each of these behavior changes in the following links:

* (Mature, enabled by default) [Require explicit package overrides for builtin materializations](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#require_explicit_package_overrides_for_builtin_materializations)
* (Introduced, disabled by default) [Require resource names without spaces](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#require_resource_names_without_spaces)
* (Introduced, disabled by default) [Run project hooks (`on-run-*`) in the `dbt source freshness` command](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#source_freshness_run_project_hooks)

#### Quick hits

* Custom defaults of [global config flags](https://docs.getdbt.com/reference/global-configs/about-global-configs.md) should be set in the `flags` dictionary in [`dbt_project.yml`](https://docs.getdbt.com/reference/dbt_project.yml.md), instead of in [`profiles.yml`](https://docs.getdbt.com/docs/local/profiles.yml.md). Support for `profiles.yml` has been deprecated.
* New CLI flags [`--resource-type`/`--exclude-resource-type`](https://docs.getdbt.com/reference/global-configs/resource-type.md) for including/excluding resources from dbt `build`, `run`, and `clone`.
* To improve performance, dbt now issues a single (batch) query when calculating `source freshness` through metadata, instead of executing a query per source.
* The syntax for environment variable secrets has changed from the `DBT_ENV_SECRET_` prefix to `DBT_ENV_SECRET` and no longer requires the trailing underscore.
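Putting the pieces above together, behavior-change flags and custom global config defaults both live in the `flags` dictionary of `dbt_project.yml`. A sketch using flag names from this page (values are illustrative choices, not recommendations):

```yml
# dbt_project.yml (illustrative values)
flags:
  # opt into recently introduced behavior changes
  require_resource_names_without_spaces: true
  source_freshness_run_project_hooks: true
  # mature change, enabled by default; set false only to opt out temporarily
  require_explicit_package_overrides_for_builtin_materializations: true
```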
---

### Upgrading to v1.9

#### Resources

* [dbt Core 1.9 changelog](https://github.com/dbt-labs/dbt-core/blob/1.9.latest/CHANGELOG.md)
* [dbt Core CLI Installation guide](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Cloud upgrade guide](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#release-tracks)

#### What to know before upgrading

dbt Labs is committed to providing backward compatibility for all versions 1.x. Any behavior changes will be accompanied by a [behavior change flag](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#behavior-change-flags) to provide a migration window for existing projects. If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new).

Starting in 2024, dbt provides the functionality from new versions of dbt Core via [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) with automatic upgrades. If you have selected the **Latest** release track in dbt, you already have access to all the features, fixes, and other functionality that is included in dbt Core v1.9! If you have selected the **Compatible** release track, you will have access in the next monthly **Compatible** release after the dbt Core v1.9 final release.

For users of dbt Core, since v1.8 we recommend explicitly installing both `dbt-core` and your adapter package (`dbt-<adapter>`). This may become required for a future version of dbt. For example:

```shell
python3 -m pip install dbt-core dbt-snowflake
```

#### New and changed features and functionality

Features and functionality new in dbt v1.9.
##### Microbatch `incremental_strategy`

info

If you use a custom microbatch macro, set the [`require_batched_execution_for_custom_microbatch_strategy`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#custom-microbatch-strategy) behavior flag in your `dbt_project.yml` to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag as dbt will handle microbatching automatically for any model using the microbatch strategy.

Incremental models are, and have always been, a *performance optimization* — for datasets that are too large to be dropped and recreated from scratch every time you do a `dbt run`. Learn more about [incremental models](https://docs.getdbt.com/docs/build/incremental-models-overview.md).

Historically, managing incremental models involved several manual steps and responsibilities, including:

* Add a snippet of dbt code (in an `is_incremental()` block) that uses the already-existing table (`this`) as a rough bookmark, so that only new data gets processed.
* Pick one of the strategies for smushing old and new data together (`append`, `delete+insert`, or `merge`).
* If anything goes wrong, or your schema changes, you can always "full-refresh" by running the same simple query that rebuilds the whole table from scratch.

While this works for many use cases, there’s a clear limitation with this approach: *Some datasets are just too big to fit into one query.*

Starting in Core 1.9, you can use the new [microbatch strategy](https://docs.getdbt.com/docs/build/incremental-microbatch.md#what-is-microbatch-in-dbt) to optimize your largest datasets -- **process your event data in discrete periods with their own SQL queries, rather than all at once.** The benefits include:

* Simplified query design: Write your model query for a single batch of data.
dbt will use your `event_time`, `lookback`, and `batch_size` configurations to automatically generate the necessary filters for you, making the process more streamlined and reducing the need for you to manage these details.
* Independent batch processing: dbt automatically breaks down the data to load into smaller batches based on the specified `batch_size` and processes each batch independently, improving efficiency and reducing the risk of query timeouts. If some of your batches fail, you can use `dbt retry` to load only the failed batches.
* Targeted reprocessing: To load a *specific* batch or batches, you can use the CLI arguments `--event-time-start` and `--event-time-end`.
* [Automatic parallel batch execution](https://docs.getdbt.com/docs/build/parallel-batch-execution.md): Process multiple batches at the same time, instead of one after the other (sequentially), for faster processing of your microbatch models. dbt intelligently auto-detects if your batches can run in parallel, while also allowing you to manually override parallel execution with the [`concurrent_batches` config](https://docs.getdbt.com/reference/resource-properties/concurrent_batches.md).

Currently, microbatch is supported on these adapters, with more to come:

* postgres
* redshift
* snowflake
* bigquery
* spark
* databricks

##### Snapshots improvements

Beginning in dbt Core 1.9, we've streamlined snapshot configuration and added a handful of new configurations to make dbt **snapshots easier to configure, run, and customize.** These improvements include:

* New snapshot specification: Snapshots can now be configured in a YAML file, which provides a cleaner and more consistent setup.
* New `snapshot_meta_column_names` config: Allows you to customize the names of meta fields (for example, `dbt_valid_from`, `dbt_valid_to`, etc.) that dbt automatically adds to snapshots. This increases flexibility to tailor metadata to your needs.
* `target_schema` is now optional for snapshots: When omitted, snapshots will use the schema defined for the current environment.
* Standard `schema` and `database` configs supported: Snapshots will now be consistent with other dbt resource types. You can specify where environment-aware snapshots should be stored.
* Warning for incorrect `updated_at` data type: To ensure data integrity, you'll see a warning if the `updated_at` field specified in the snapshot configuration is not the proper data type or timestamp.
* Set a custom current indicator for the value of `dbt_valid_to`: Use the [`dbt_valid_to_current` config](https://docs.getdbt.com/reference/resource-configs/dbt_valid_to_current.md) to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.
* Use the [`hard_deletes`](https://docs.getdbt.com/reference/resource-configs/hard-deletes.md) configuration to get more control over how to handle rows deleted from the source. Supported methods are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`. Setting `hard_deletes='new_record'` allows you to track hard deletes by adding a new record when a row becomes "deleted" in the source.

Read more about [Snapshots meta fields](https://docs.getdbt.com/docs/build/snapshots.md#snapshot-meta-fields).

To learn how to safely migrate existing snapshots, refer to [Snapshot configuration migration](https://docs.getdbt.com/reference/snapshot-configs.md#snapshot-configuration-migration) for more information.
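Pulling these pieces together, a YAML snapshot using the new specification might look like the following sketch (the source, column names, and chosen config values are illustrative):

```yml
# snapshots/orders_snapshot.yml (illustrative names and values)
snapshots:
  - name: orders_snapshot
    relation: source('jaffle_shop', 'orders')
    config:
      schema: snapshots          # standard schema config, environment-aware
      unique_key: id
      strategy: timestamp
      updated_at: updated_at
      dbt_valid_to_current: "to_date('9999-12-31')"  # custom current indicator
      hard_deletes: new_record   # track hard deletes as new records
```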
##### Some `properties` moved to `configs`

The following `properties` were moved to `configs` in [Core v1.10](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.10.md) and backported to Core v1.9:

* [`freshness`](https://docs.getdbt.com/reference/resource-properties/freshness.md) for sources
* [`meta`](https://docs.getdbt.com/reference/resource-configs/meta.md) under `columns`
* [`tags`](https://docs.getdbt.com/reference/resource-configs/tags.md) under `columns`

##### `state:modified` improvements

We’ve made improvements to `state:modified` behaviors to help reduce the risk of false positives and negatives. Read more about [the `state:modified` behavior flag](#managing-changes-to-legacy-behaviors) that unlocks this improvement:

* Added environment-aware enhancements for environments where the logic purposefully differs (for example, materializing as a table in `prod` but a `view` in dev).

##### Managing changes to legacy behaviors

dbt Core v1.9 has a handful of new flags for [managing changes to legacy behaviors](https://docs.getdbt.com/reference/global-configs/behavior-changes.md). You may opt into recently introduced changes (disabled by default), or opt out of mature changes (enabled by default), by setting `True` / `False` values, respectively, for `flags` in `dbt_project.yml`.

You can read more about each of these behavior changes in the following links:

* (Introduced, disabled by default) [`state_modified_compare_more_unrendered_values`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#behavior-change-flags).
Set to `True` to start persisting `unrendered_database` and `unrendered_schema` configs during source parsing, and to compare unrendered values during `state:modified` checks, reducing false positives due to environment-aware logic when selecting `state:modified`.
* (Introduced, disabled by default) [`skip_nodes_if_on_run_start_fails` project config flag](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#behavior-change-flags). If the flag is set and **any** `on-run-start` hook fails, mark all selected nodes as skipped.
  * `on-run-start/end` hooks are **always** run, regardless of whether they passed or failed last time.
* (Introduced, disabled by default) [`require_nested_cumulative_type_params`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#cumulative-metrics). If the flag is set to `True`, users will receive an error instead of a warning if they're not properly formatting cumulative metrics using the new [`cumulative_type_params`](https://docs.getdbt.com/docs/build/cumulative.md#parameters) nesting.
* (Introduced, disabled by default) [`require_batched_execution_for_custom_microbatch_strategy`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#custom-microbatch-strategy). Set to `True` if you use a custom microbatch macro to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag as dbt will handle microbatching automatically for any model using the microbatch strategy.

#### Adapter-specific features and functionalities

##### Snowflake column size change

[Snowflake plans to increase](https://docs.snowflake.com/en/release-notes/bcr-bundles/un-bundled/bcr-2118) the default column size for string and binary data types in May 2026. `dbt-snowflake` versions below v1.10.6 may fail to build certain incremental models when this change is deployed.
###### Assess impact and required actions

If you're using a `dbt-snowflake` version below v1.10.6 or have not yet migrated to a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) in the dbt platform, your adapter version is incompatible with this change and may fail to build incremental models that meet *both* of the following conditions:

* Contain string columns with collation defined
* Use the `on_schema_change='sync_all_columns'` config

To check whether this change affects your project, run the following [list](https://docs.getdbt.com/reference/commands/list.md) command:

```bash
dbt ls -s config.materialized:incremental,config.on_schema_change:sync_all_columns --resource-type model
```

* If the command returns `No nodes selected!`, no action is required.
* If the command returns one or more models (for example, `Found 1000 models, 644 macros`), you may be impacted if those models have string columns that don't specify a width. In that case, upgrade to a version that includes the fix:
  * **dbt Core**: `dbt-snowflake` v1.10.6 or later. For upgrade instructions, see [Upgrade adapters](https://docs.getdbt.com/docs/local/install-dbt.md#upgrade-adapters).
  * **dbt platform**: Any release track (Latest, Compatible, Extended, or Fallback).
  * **dbt Fusion engine**: v2.0.0-preview.147 or higher.

This ensures your incremental models can safely handle schema changes while maintaining required collation settings.

##### Redshift

* Support IAM Role auth

##### Snowflake

* Iceberg Table Format — Support will be available on three out-of-the-box materializations: table, incremental, and dynamic tables.
* Breaking change — When upgrading from dbt 1.8 to 1.9, `{{ target.account }}` replaces underscores with dashes. For example, if `target.account` is set to `sample_company`, the compiled code now generates `sample-company`.
[Refer to the `dbt-snowflake` issue](https://github.com/dbt-labs/dbt-snowflake/issues/1286) for more information.

##### BigQuery

* Can cancel running queries on keyboard interrupt
* Auto-drop intermediate tables created by incremental models to save resources

##### Spark

* Support overriding the ODBC driver connection string, which enables you to provide custom connections

#### Quick hits

We also made some quality-of-life improvements in Core 1.9, enabling you to:

* Maintain data quality now that dbt returns an error (versioned models) or warning (unversioned models) when someone [removes a contracted model by deleting, renaming, or disabling](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md#how-are-breaking-changes-handled) it.
* Document [data tests](https://docs.getdbt.com/reference/resource-properties/description.md).
* Use `ref` and `source` in [foreign key constraints](https://docs.getdbt.com/reference/resource-properties/constraints.md).
* Use `dbt test` with the `--resource-type` / `--exclude-resource-type` flag, making it possible to include or exclude data tests (`test`) or unit tests (`unit_test`).
* Use the [`enabled`](https://docs.getdbt.com/reference/resource-configs/enabled.md) config for unit tests. Defaults to `true` if not defined.

---

### User-defined functions

User-defined functions (UDFs) enable you to define and register custom functions in your warehouse.
Like [macros](https://docs.getdbt.com/docs/build/jinja-macros.md), UDFs promote code reuse, but they are objects in the warehouse, so you can reuse the same logic in tools outside dbt, such as BI tools, data science notebooks, and more. UDFs are particularly valuable for sharing logic across multiple tools, standardizing complex business calculations, improving performance for compute-intensive operations (since they're compiled and optimized by your warehouse's query engine), and version controlling custom logic within your dbt project.

dbt creates, updates, and renames UDFs as part of DAG execution. The UDF is built in the warehouse before the model that references it. Refer to [listing and building UDFs](https://docs.getdbt.com/docs/build/udfs.md#listing-and-building-udfs) for more info on how to build UDFs in your project. Refer to [Function properties](https://docs.getdbt.com/reference/function-properties.md) or [Function configurations](https://docs.getdbt.com/reference/function-configs.md) for more information on the configs/properties for UDFs.

#### Prerequisites

* Make sure you're using dbt platform's **Latest Fusion** or **Latest** [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) or dbt Core v1.11.
* Use one of the following adapters:
  * dbt Core: BigQuery, Snowflake, Redshift, Postgres, Databricks
  * dbt Fusion engine: BigQuery, Snowflake, Redshift, Databricks

UDF support

Additional languages (for example, Java, JavaScript, Scala) aren't currently supported when developing UDFs. See the [Limitations](#limitations) section below for the full list of currently supported UDF capabilities.

#### Defining UDFs in dbt

You can define SQL and Python UDFs in dbt. Python UDFs are supported in Snowflake and BigQuery when using dbt Core or Fusion. Follow these steps to define UDFs in dbt:

1.
Create a SQL or Python file under the `functions` directory. For example, this UDF checks if a string represents a positive integer:

Define a SQL UDF in a SQL file.

functions/is\_positive\_int.sql

```sql
-- syntax for BigQuery, Snowflake, and Databricks
REGEXP_INSTR(a_string, '^[0-9]+$')

-- syntax for Redshift and Postgres
SELECT REGEXP_INSTR(a_string, '^[0-9]+$')
```

Define a Python UDF in a Python file.

functions/is\_positive\_int.py

```py
import re

def main(a_string):
    return 1 if re.search(r'^[0-9]+$', a_string or '') else 0
```

**Note**: You can specify configs in a config block in the SQL file or in the corresponding properties YAML file in the next step (Step 2).

2. Specify the function name and define the config, properties, return type, and optional arguments in a corresponding properties YAML file. For example:

functions/schema.yml

```yml
functions:
  - name: is_positive_int # required
    description: My UDF that returns 1 if a string represents a naked positive integer (like "10"; "+8" is not allowed). # optional
    config:
      schema: udf_schema
      database: udf_db
      volatility: deterministic
    arguments: # optional
      - name: a_string # required if arguments is specified
        data_type: string # required if arguments is specified
        description: The string that I want to check if it's representing a positive integer (like "10")
        default_value: "'1'" # optional, available in Snowflake and Postgres
    returns: # required
      data_type: integer # required
```

The following configs are required when defining a Python UDF:

* [`runtime_version`](https://docs.getdbt.com/reference/resource-configs/runtime-version.md) — Specify the Python version to run.
Supported values are:
  * [Snowflake](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-introduction): `3.10`, `3.11`, `3.12`, and `3.13`
  * [BigQuery](https://cloud.google.com/bigquery/docs/user-defined-functions-python): `3.11`
* [`entry_point`](https://docs.getdbt.com/reference/resource-configs/entry-point.md) — Specify the Python function to be called.

For example:

functions/schema.yml

```yml
functions:
  - name: is_positive_int # required
    description: My UDF that returns 1 if a string represents a naked positive integer (like "10"; "+8" is not allowed). # optional
    config:
      runtime_version: "3.11" # required
      entry_point: main # required
      schema: udf_schema
      database: udf_db
      volatility: deterministic
    arguments: # optional
      - name: a_string # required if arguments is specified
        data_type: string # required if arguments is specified
        description: The string that I want to check if it's representing a positive integer (like "10")
        default_value: "'1'" # optional, available in Snowflake and Postgres
    returns: # required
      data_type: integer # required
```

volatility is warehouse-specific

Note that `volatility` is accepted in dbt for both SQL and Python UDFs, but the handling of it is warehouse-specific. BigQuery ignores `volatility` and dbt displays a warning. In Snowflake, `volatility` is applied when creating the UDF. Refer to [volatility](https://docs.getdbt.com/reference/resource-configs/volatility.md) for more information.

3. Run one of the following `dbt build` commands to build your UDFs and create them in the warehouse:

Build all UDFs:

```bash
dbt build --select "resource_type:function"
```

Or build a specific UDF:

```bash
dbt build --select is_positive_int
```

When you run `dbt build`, both the `functions/schema.yml` file and the corresponding SQL or Python file (for example, `functions/is_positive_int.sql` or `functions/is_positive_int.py`) work together to generate the `CREATE FUNCTION` statement.
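Because the Python UDF body is plain Python, you can sanity-check the handler locally before building it in the warehouse. A quick sketch reusing the `main` handler from the example above:

```python
import re

def main(a_string):
    # Mirrors the UDF body: 1 for digits-only strings, else 0.
    return 1 if re.search(r'^[0-9]+$', a_string or '') else 0

print(main("10"))   # digits only -> 1
print(main("+8"))   # leading sign rejected -> 0
print(main(None))   # None treated as empty -> 0
```

Running the handler locally is a cheap way to pin down edge cases (signs, decimals, `NULL` inputs) before the function is deployed.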
The rendered `CREATE FUNCTION` statement depends on which adapter you're using. For example:

For SQL UDFs:

Snowflake:

```sql
CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(a_string STRING DEFAULT '1')
RETURNS INTEGER
LANGUAGE SQL
IMMUTABLE
AS $$
REGEXP_INSTR(a_string, '^[0-9]+$')
$$;
```

Redshift:

```sql
CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(a_string VARCHAR)
RETURNS INTEGER
IMMUTABLE
AS $$
SELECT REGEXP_INSTR(a_string, '^[0-9]+$')
$$ LANGUAGE SQL;
```

BigQuery:

```sql
CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(a_string STRING)
RETURNS INT64
AS (
  REGEXP_INSTR(a_string, r'^[0-9]+$')
);
```

Databricks:

```sql
CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(a_string STRING)
RETURNS INT
DETERMINISTIC
RETURN REGEXP_INSTR(a_string, '^[0-9]+$');
```

Postgres:

```sql
CREATE OR REPLACE FUNCTION udf_schema.is_positive_int(a_string text DEFAULT '1')
RETURNS int
LANGUAGE sql
IMMUTABLE
AS $$
SELECT regexp_instr(a_string, '^[0-9]+$')
$$;
```

For Python UDFs:

Snowflake:

```sql
CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(a_string STRING DEFAULT '1')
RETURNS INTEGER
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
HANDLER = 'main'
AS $$
import re

def main(a_string):
    return 1 if re.search(r'^[0-9]+$', a_string or '') else 0
$$;
```

BigQuery:

```sql
CREATE OR REPLACE FUNCTION udf_db.udf_schema.is_positive_int(a_string STRING)
RETURNS INT64
LANGUAGE python
OPTIONS(runtime_version="python-3.11", entry_point="main")
AS r'''
import re

def main(a_string):
    return 1 if re.search(r'^[0-9]+$', a_string or '') else 0
''';
```

4. Reference the UDF in a model using the `{{ function(...) }}` macro.
For example:

models/my\_model.sql

```sql
select
    maybe_positive_int_column,
    {{ function('is_positive_int') }}(maybe_positive_int_column) as is_positive_int
from {{ ref('a_model_i_like') }}
```

When using [`--defer`](https://docs.getdbt.com/reference/node-selection/defer.md), `function()` resolves to the UDF definition from the state manifest (for example, a production environment) if the function is not selected or not yet built in your target environment. This allows models that depend on UDFs to run successfully in [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md) and development workflows.

5. Run `dbt compile` to see how the UDF is referenced. In the following example, `{{ function('is_positive_int') }}` is replaced by the UDF name `udf_db.udf_schema.is_positive_int`.

models/my\_model.sql

```sql
select
    maybe_positive_int_column,
    udf_db.udf_schema.is_positive_int(maybe_positive_int_column) as is_positive
from analytics.dbt_schema.a_model_i_like
```

In your DAG, a UDF node is created from the SQL/Python and YAML definitions, and there will be a dependency between `is_positive_int` → `my_model`.

[![The DAG for the UDF node](/img/docs/building-a-dbt-project/UDF-DAG.png?v=2 "The DAG for the UDF node")](#)

After defining a UDF, if you update the SQL/Python file that contains its function body (`is_positive_int.sql` or `is_positive_int.py` in this example) or its configurations, your changes will be applied to the UDF in the warehouse the next time you `build`.

#### Using UDFs in unit tests

You can use [unit tests](https://docs.getdbt.com/docs/build/unit-tests.md) to validate models that reference UDFs. Before running unit tests, make sure the function exists in your warehouse.
To ensure that the function exists for a unit test, run:

```bash
dbt build --select "+my_model_to_test" --empty
```

Following the example in [Defining UDFs in dbt](#defining-udfs-in-dbt), here's an example of a unit test that validates a model that calls a UDF:

tests/test\_is\_positive\_int.yml

```yml
unit_tests:
  - name: test_is_positive_int
    description: "Check my is_positive_int logic captures edge cases"
    model: my_model
    given:
      - input: ref('a_model_i_like')
        rows:
          - { maybe_positive_int_column: 10 }
          - { maybe_positive_int_column: -4 }
          - { maybe_positive_int_column: +8 }
          - { maybe_positive_int_column: 1.0 }
    expect:
      rows:
        - { maybe_positive_int_column: 10, is_positive: true }
        - { maybe_positive_int_column: -4, is_positive: false }
        - { maybe_positive_int_column: +8, is_positive: true }
        - { maybe_positive_int_column: 1.0, is_positive: true }
```

#### Listing and building UDFs

Use the [`list` command](https://docs.getdbt.com/reference/commands/list.md#listing-functions) to list UDFs in your project: `dbt list --select "resource_type:function"` or `dbt list --resource-type function`.

Use the [`build` command](https://docs.getdbt.com/reference/commands/build.md#functions) to select UDFs when building a project: `dbt build --select "resource_type:function"`.

For more information about selecting UDFs, see the examples in [Node selector methods](https://docs.getdbt.com/reference/node-selection/methods.md#file).

#### Limitations

* Creating UDFs in other languages (for example, Java, JavaScript, or Scala) is not yet supported.
* Python UDFs are supported in Snowflake and BigQuery only (when using dbt Core or Fusion). Other warehouses aren't yet supported for Python UDFs.
* Only scalar and aggregate functions are currently supported.
For more information, see [Supported function types](https://docs.getdbt.com/reference/resource-configs/type.md#supported-function-types).

#### Related FAQs

When should I use a UDF instead of a macro?

Both user-defined functions (UDFs) and macros let you reuse logic across your dbt project, but they work in fundamentally different ways. Here's when to use each:

###### Use UDFs when:

You need logic accessible outside dbt

UDFs are created in your warehouse and can be used by BI tools, data science notebooks, SQL clients, or any other tool that connects to your warehouse. Macros only work within dbt.

You want to standardize warehouse-native functions

UDFs let you create reusable warehouse functions for data validation, custom formatting, or business-specific calculations that need to be consistent across all your data tools. Once created, they become part of your warehouse's function catalog.

You want dbt to manage the function lifecycle

dbt manages UDFs as part of your DAG execution, ensuring they're created before models that reference them. You can version control UDF definitions alongside your models, test changes in development environments, and deploy them together through CI/CD pipelines.

Jinja compiles at creation time, not on each function call

You can use Jinja (loops, conditionals, macros, `ref`, `source`, `var`) inside a UDF configuration. dbt resolves that Jinja **when the UDF is created**, and the resulting SQL body is what gets stored in your warehouse. Jinja influences the function when it's created, whereas arguments influence it when it runs in the warehouse:

* ✅ **Allowed:** Jinja that depends on project or build-time state — for example, `var("can_do_things")`, a static `ref('orders')`, or environment-specific logic. These are all evaluated once at creation time.
* ❌ **Not allowed:** Jinja that depends on **function arguments** passed at runtime.
The compiler can't see those, so dynamic `ref(ref_name)` or conditional Jinja based on argument values won't work.

**You need Python logic that runs in your warehouse**

A Python UDF creates a Python function directly within your data warehouse, which you can invoke using SQL.
This makes it easier to apply complex transformations, calculations, or logic that would be difficult or verbose to express in SQL. Python UDFs support conditionals and looping within the function logic itself (using Python syntax), and execute at runtime, not at compile time like macros. Python UDFs are currently supported in Snowflake and BigQuery.

###### Use macros when:

**You need to generate SQL at compile time**

Macros generate SQL dynamically **before** it's sent to the warehouse (at compile time). This is essential for:

* Building different SQL for different warehouses
* Generating repetitive SQL patterns (like creating dozens of similar columns)
* Creating entire model definitions or DDL statements
* Dynamically referencing models based on project structure

UDFs execute **at query runtime** in the warehouse. While they can use Jinja templating in their definitions, they don't generate new SQL queries; they're pre-defined functions that get called by your SQL.

**You want to generate DDL or DML statements**

Currently, SQL and Python UDFs are supported. Java and Scala UDFs are planned for future releases.

**You need to adapt SQL across different warehouses**

Macros can use Jinja conditional logic to generate warehouse-specific SQL (see [cross-database macros](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros.md)), making your dbt project portable across platforms. UDFs are warehouse-specific objects. Even though UDFs can include Jinja templating in their definitions, each warehouse has different syntax for creating functions, different supported data types, and different SQL dialects. You would need to define separate UDF files for each warehouse you support.
**Your logic needs access to dbt context**

Both macros and UDFs can use Jinja, which means they can access dbt context variables like `{{ ref() }}`, `{{ source() }}`, environment variables, and project configurations. You can even call a macro from within a UDF (and vice versa) to combine dynamic SQL generation with runtime execution. However, the difference between the two is *when* the logic runs:

* Macros run at compile time, generating SQL before it's sent to the warehouse.
* UDFs run inside the warehouse at query time.

**You want to avoid creating warehouse objects**

Macros don't create anything in your warehouse; they just generate SQL at compile time. UDFs create actual function objects in your warehouse that need to be managed.

###### Can I use both together?

Yes! You can use a macro to call a UDF or call a macro from within a UDF, combining the benefits of both. The following example shows a macro that wraps a UDF call and supplies a default value for its `scale` argument:

```sql
{% macro cents_to_dollars(column_name, scale=2) %}
    {{ function('cents_to_dollars') }}({{ column_name }}, {{ scale }})
{% endmacro %}
```

###### Related documentation

* [User-defined functions](https://docs.getdbt.com/docs/build/udfs.md)
* [Jinja macros](https://docs.getdbt.com/docs/build/jinja-macros.md)

---

### Validations

Validations refer to the process of checking whether a system or configuration meets the expected requirements or constraints.
In the case of the Semantic Layer, powered by MetricFlow, there are three built-in validations: [parsing](#parsing), [semantic](#semantic), and [data platform](#data-platform). These validations ensure that configuration files follow the expected schema, the semantic graph doesn't violate any constraints, and semantic definitions in the graph exist in the physical table, providing effective data governance support. These three validation steps occur sequentially and must succeed before proceeding to the next step. The code that handles validation [can be found here](https://github.com/dbt-labs/dbt-semantic-interfaces/tree/main/dbt_semantic_interfaces/validations) for those who want to dive deeper into this topic.

#### Validations command

You can run validations from the dbt platform or the command line with the following [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md). In dbt, you need developer credentials to run `dbt sl validate` in the IDE or CLI, and deployment credentials to run it in CI.

* For Fusion and dbt users in the dbt platform CLI or locally with a valid `dbt_cloud.yml`:

  ```bash
  dbt sl validate
  ```

  This runs parsing, semantic, and (where supported) data platform validations. When using `dbt sl validate` locally, the command validates your local semantic manifest, not the platform's manifest. This means your uncommitted local changes are included in the validation.

* For dbt Core (open source) users or Fusion CLI users not connected to the dbt platform and using local MetricFlow:

  ```bash
  mf validate-configs
  ```

  This runs parsing and semantic validations.
#### Availability by environment

Validation behavior and availability differ depending on your environment and setup:

| Environment | Who can use | Parsing | Semantic syntax | Data platform | How to run |
| --- | --- | --- | --- | --- | --- |
| dbt Fusion engine | dbt platform users for full Semantic Layer features | ✅ | ✅ \* | ✅ | Parsing validations run automatically while generating the semantic manifest. In development, semantic syntax validations run automatically on the dbt platform if `dbt_cloud.yml` is configured; if not, run them manually using `mf validate-configs`. Data platform validations don't run automatically for Fusion; you must run `dbt sl validate`. |
| dbt CLI | dbt platform users | ✅ | ✅ | ✅ | Run any dbt CLI command; validations execute automatically except data platform validations. You must run `dbt sl validate` to run data platform validations. |
| dbt Core | Open source users | ✅ | ✅ | ❌ | Use dbt Core for parsing/builds. Run additional validation manually with the MetricFlow CLI. |
| MetricFlow CLI | Open source users | ✅ | ✅ | ✅ | Run `mf validate-configs` locally to validate and test metrics. |

\*Jobs run in **Orchestration** or **Studio IDE** run this validation automatically.

#### Parsing

In this validation step, we ensure your config files follow the defined schema for each semantic graph object and can be parsed successfully. It validates the schema for the following core objects:

#### Semantic syntax

This syntactic validation step occurs after we've built your semantic graph. The Semantic Layer, powered by MetricFlow, runs a suite of tests to ensure that your semantic graph doesn't violate any constraints. For example, we check to see if names are unique, or if metrics referenced in materialization exist. The current semantic rules we check for are:

#### Data platform

This type of validation checks to see if the semantic definitions in your semantic graph exist in the underlying physical table. To test this, we run queries against your data platform to ensure the generated SQL for semantic models, dimensions, and metrics will execute. We run the following checks:

You can run semantic validations (against your semantic layer) in a CI job to guarantee any code changes made to dbt models don't break these metrics.
For more information, refer to [semantic validation in CI](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci).

---

### Version upgrade guides

#### [📄️ Upgrading to the dbt Fusion engine (v2.0)](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md)

[New features and changes in Fusion](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md)

---

### View documentation

dbt provides intuitive and scalable tools for viewing your dbt documentation. Detailed documentation is essential for your developers and other stakeholders to gain shared context for your dbt project. You can view documentation in two complementary ways, depending on your needs:

| Option | Description | Availability |
| --- | --- | --- |
| [**dbt Docs**](#dbt-docs) | Generates a static website with model lineage, metadata, and documentation that can be hosted on your web server (like S3 or Netlify). | dbt Core or dbt Developer plans |
| [**Catalog**](https://docs.getdbt.com/docs/explore/explore-projects.md) | The premier documentation experience in dbt. Builds on dbt Docs to provide a dynamic, real-time interface with rich [metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata), customizable views, deep insight into your project and resources, and collaborative tools. | dbt Starter, Enterprise, or Enterprise+ plans |

#### Navigating your documentation

The following sections describe how to navigate your documentation in Catalog and dbt Docs.

##### Catalog ([Starter](https://www.getdbt.com/pricing), [Enterprise](https://www.getdbt.com/pricing), [Enterprise +](https://www.getdbt.com/pricing))

[Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) offers a dynamic, interactive way to explore your models, sources, and lineage. To access Catalog, navigate to the **Catalog** option in the dbt navigation menu.

![Example of Catalog's resource details page and its lineage.](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2)

![Navigate Catalog to discover your project's resources and lineage.](/img/docs/collaborate/dbt-explorer/explorer-main-page.gif?v=2)

Catalog offers users a comprehensive suite of features to enhance data project navigation and understanding, like:

* Interactive lineage visualization for your project's DAG to understand relationships between resources.
* Resource search bar with comprehensive filters to help find project resources efficiently and quickly.
* Model performance insights to access metadata on dbt runs for in-depth analysis of model performance and quality.
* Project recommendations with suggestions to improve test coverage and documentation across your data estate.
* Data health signals to monitor the health and performance of each resource through data health indicators.
* Model query history to track consumption queries on your models to gain deeper insights into data usage.
* Downstream exposures to automatically expose relevant data models from tools like Tableau to enhance visibility.

For additional details and instructions on how to explore your lineage, navigate your resources, view model query history and data health signals, feature availability, and more, refer to [Discover data with Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md).

##### dbt Docs

dbt Docs provides valuable insights into your dbt Core or dbt Developer plan projects. The interface enables you to navigate to the documentation for specific models. That might look something like this:

![Auto-generated documentation for a dbt model](/img/docs/building-a-dbt-project/testing-and-documentation/f2221dc-Screen_Shot_2018-08-14_at_6.29.55_PM.png?v=2)

Here, you can see a representation of the project structure, a markdown description for a model, and a list of all of the columns (with documentation) in the model. From the dbt Docs page, click the green button in the bottom-right corner of the webpage to expand a "mini-map" of your DAG. This pane displays the immediate parents and children of the model that you're exploring.
![Opening the DAG mini-map](/img/docs/building-a-dbt-project/testing-and-documentation/ec77c45-Screen_Shot_2018-08-14_at_6.31.56_PM.png?v=2)

In this example, the `fct_subscription_transactions` model only has one direct parent. By clicking the "Expand" button in the top-right corner of the window, we can pivot the graph horizontally and view the full lineage for our model. This lineage is filterable using the `--select` and `--exclude` flags, which are consistent with the semantics of [model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md). Further, you can right-click to interact with the DAG, jump to documentation, or share links to your graph visualization with your coworkers.

![The full lineage for a dbt model](/img/docs/building-a-dbt-project/testing-and-documentation/ac97fba-Screen_Shot_2018-08-14_at_6.35.14_PM.png?v=2)

#### Deploy the documentation site

Effortlessly deploy documentation in Catalog or dbt Docs to make it available to your teams.

**Security**: The `dbt docs serve` command is only intended for local/development hosting of the documentation site. Please use one of the methods listed in the next section (or similar) to ensure that your documentation site is hosted securely!

##### Catalog ([Starter](https://www.getdbt.com/pricing), [Enterprise](https://www.getdbt.com/pricing), [Enterprise +](https://www.getdbt.com/pricing))

Catalog automatically updates documentation after each production or staging job run using the metadata generated. This means it always has the latest results for your project with no manual deployment required.
For details on how Catalog uses metadata to automatically update documentation, refer to [Generate metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata). To learn how to deploy your documentation site, see [Build and view your docs with dbt](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md).

##### dbt Docs

dbt Docs was built to make it easy to host on the web. The site is "static," meaning you don't need any "dynamic" servers to serve the docs. You can host your documentation in several ways:

* Host on [Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html) (optionally [with IP access restrictions](https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html#example-bucket-policies-use-case-3)).
* Publish with [Netlify](https://discourse.getdbt.com/t/publishing-dbt-docs-to-netlify/121).
* Use your own web server like Apache/Nginx.
* If you're on a dbt Developer plan, see [Build and view your docs with dbt](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs) to learn how to deploy your documentation site.

Interested in using Catalog for the complete dbt documentation experience? Sign up for a free [dbt trial](https://www.getdbt.com/signup) or [contact us](https://www.getdbt.com/contact).
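To illustrate the static-site hosting options above, here's a minimal, hypothetical Python sketch (the `docs_files_to_upload` helper is illustrative, not part of dbt) that pairs every file in dbt's generated `target/` directory with the object key it would be published under; the resulting pairs could then be fed to an S3 client such as boto3's `upload_file`:

```python
from pathlib import Path

def docs_files_to_upload(target_dir: str) -> list[tuple[str, str]]:
    # Walk dbt's target/ output and pair each file's local path with the
    # object key to publish it under (its path relative to target/).
    root = Path(target_dir)
    return [
        (str(path), path.relative_to(root).as_posix())
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]
```

For example, `target/index.html` would map to the key `index.html`, preserving the site's structure in the bucket.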
---

### Visualize and orchestrate downstream exposures ([Enterprise](https://www.getdbt.com/pricing), [Enterprise +](https://www.getdbt.com/pricing))

Visualize and orchestrate downstream exposures in dbt to automatically generate exposures from dashboards and proactively refresh the underlying data sources (like Tableau extracts) during scheduled dbt jobs. The following table summarizes the differences between visualizing and orchestrating downstream exposures:

| Info | Set up and visualize downstream exposures | Orchestrate downstream exposures ([Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles)) |
| --- | --- | --- |
| Purpose | Automatically brings downstream assets into your dbt lineage. | Proactively refreshes the underlying data sources during scheduled dbt jobs. |
| Benefits | Provides visibility into data flow and dependencies. | Ensures BI tools always have up-to-date data without manual intervention. |
| Location | Exposed in dbt [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) | Exposed in [dbt scheduler](https://docs.getdbt.com/docs/deploy/deployments.md) |
| Supported BI tool | Tableau | Tableau |
| Use case | Helps users understand how models are used and reduces incidents. | Optimizes timeliness and reduces costs by running models when needed. |

Check out the following sections for more information on visualizing and orchestrating downstream exposures:

###### [Set up and visualize downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md)

[Set up downstream exposures automatically from dashboards to understand how models are used in downstream tools for a richer downstream lineage.](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md)

###### [Orchestrate downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/orchestrate-exposures.md)

[Proactively refresh the underlying data sources (like Tableau extracts) using the dbt scheduler during scheduled dbt jobs.](https://docs.getdbt.com/docs/cloud-integrations/orchestrate-exposures.md)
---

### Visualize downstream exposures ([Enterprise](https://www.getdbt.com/pricing), [Enterprise +](https://www.getdbt.com/pricing))

Downstream exposures integrate natively with Tableau (Power BI coming soon) and auto-generate downstream lineage in Catalog for a richer experience. As a data team, it's critical that you have context into the downstream use cases and users of your data products.
By leveraging downstream [exposures](https://docs.getdbt.com/docs/build/exposures.md) automatically, data teams can:

* Gain a better understanding of how models are used in downstream analytics, improving governance and decision-making.
* Reduce incidents and optimize workflows by linking upstream models to downstream dependencies.
* Automate exposure tracking for supported BI tools, ensuring lineage is always up to date.
* [Orchestrate exposures](https://docs.getdbt.com/docs/cloud-integrations/orchestrate-exposures.md) to refresh the underlying data sources during scheduled dbt jobs, improving timeliness and reducing costs. Orchestrating exposures is essentially a way to ensure that your BI tools are updated regularly by using the [dbt job scheduler](https://docs.getdbt.com/docs/deploy/deployments.md).

For more info on the differences between visualizing and orchestrating exposures, see [Visualize and orchestrate downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md). For how to configure downstream exposures automatically from dashboards in Tableau, prerequisites, and more, refer to [Configure downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md).

##### Supported plans

Downstream exposures are available on all dbt [Enterprise-tier plans](https://www.getdbt.com/pricing/). Currently, you can only connect to a single Tableau site on the same server.

**Tableau Server**: If you're using Tableau Server, you need to [allowlist dbt's IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your dbt region.

#### View downstream exposures

After setting up downstream exposures in dbt, you can view them in [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) for a richer experience.
Navigate to Catalog by selecting **Catalog** from the top-level navigation. From the **Overview** page, you can view downstream exposures from a couple of places:

* [Exposures menu](#exposures-menu)
* [File tree](#file-tree)
* [Project lineage](#project-lineage)

##### Exposures menu

View downstream exposures from the **Exposures** menu item under **Resources**. This menu provides a comprehensive list of all the exposures so you can quickly access and manage them. The menu displays the following information:

* **Name**: The name of the exposure.
* **Health**: The [data health signal](https://docs.getdbt.com/docs/explore/data-health-signals.md) of the exposure.
* **Type**: The type of exposure, such as `dashboard` or `notebook`.
* **Owner**: The owner of the exposure.
* **Owner email**: The email address of the owner of the exposure.
* **Integration**: The BI tool that the exposure is integrated with.
* **Exposure mode**: The type of exposure defined: **Auto** or **Manual**.

![View from the dbt Catalog under the project menu.](/img/docs/cloud-integrations/auto-exposures/explorer-view-resources.png?v=2)

##### File tree

You can locate exposures directly within the **File tree**, under the **imported_from_tableau** sub-folder. This view integrates exposures seamlessly with your project files, making it easy to find and reference them from your project's structure.

![View from the dbt Catalog under the 'File tree' menu.](/img/docs/cloud-integrations/auto-exposures/explorer-view-file-tree.jpg?v=2)

##### Project lineage

You can also view exposures from the **Project lineage** view, which visualizes the dependencies and relationships in your project.
Exposures are represented with the Tableau icon, offering an intuitive way to see how they fit into your project's overall data flow.

![View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.](/img/docs/cloud-integrations/auto-exposures/explorer-lineage2.jpg?v=2)

![View from the dbt Catalog in your Project lineage view, displayed with the Tableau icon.](/img/docs/cloud-integrations/auto-exposures/explorer-lineage.jpg?v=2)

---

### Webhooks for your jobs ([Starter](https://www.getdbt.com/pricing), [Enterprise](https://www.getdbt.com/pricing), [Enterprise +](https://www.getdbt.com/pricing))

With dbt, you can create outbound webhooks to send events (notifications) about your dbt jobs to your other systems. Your other systems can listen for (subscribe to) these events to further automate your workflows or to help trigger automation flows you have set up.

A webhook is an HTTP-based callback function that allows event-driven communication between two different web applications. This allows you to get the latest information on your dbt jobs in real time.
Without it, you would need to make API calls repeatedly to check if there are any updates that you need to account for (polling). Because of this, webhooks are also called *push APIs* or *reverse APIs* and are often used for infrastructure development.

dbt sends a JSON payload to your application's endpoint URL when your webhook is triggered. For example, you can send a [Slack](https://docs.getdbt.com/guides/zapier-slack.md) notification, send a [Microsoft Teams](https://docs.getdbt.com/guides/zapier-ms-teams.md) notification, or [open a PagerDuty incident](https://docs.getdbt.com/guides/serverless-pagerduty.md) when a dbt job fails. You can create webhooks for these events from the [dbt web-based UI](#create-a-webhook-subscription) and by using the [dbt API](#api-for-webhooks):

* `job.run.started` — Run started.
* `job.run.completed` — Run completed. This can be a run that has failed or succeeded.
* `job.run.errored` — Run errored.

dbt retries sending each event five times. dbt keeps a log of each webhook delivery for 30 days. Every webhook has its own **Recent Deliveries** section, which lists at a glance whether a delivery was successful or failed.

A webhook in dbt has a timeout of 10 seconds. This means that if the endpoint doesn't respond within 10 seconds, the webhook processor will time out. As a result, if the client responds successfully after the 10-second timeout, the client records a success on its side while the dbt webhooks system interprets the delivery as a failure.

**Videos**: If you're interested in course learning with videos, check out the [Webhooks on-demand course](https://learn.getdbt.com/courses/webhooks) from dbt Labs. You can also check out the free [dbt Fundamentals course](https://learn.getdbt.com/courses/dbt-fundamentals).

#### Prerequisites

* You have a dbt account that is on the [Starter or Enterprise-tier](https://www.getdbt.com/pricing/) plan.
* For `write` access to webhooks:
  * **Enterprise-tier plans** — Permission sets are the same for both API service tokens and the dbt UI. You, or the API service token, must have the Account Admin, Admin, or Developer [permission set](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md).
  * **Starter plan accounts** — For the dbt UI, you need to have a [Developer license](https://docs.getdbt.com/docs/cloud/manage-access/self-service-permissions.md).
* You have a multi-tenant or an AWS single-tenant deployment model in dbt. For more information, refer to [Tenancy](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md).
* Your destination system supports [Authorization headers](#troubleshooting).

#### Create a webhook subscription[​](#create-a-webhook-subscription "Direct link to Create a webhook subscription")

1. Navigate to **Account settings** in dbt (by clicking your account name in the left side panel).
2. Go to the **Webhooks** section and click **Create webhook**.
3. Configure your new webhook:
   * **Webhook name** — Enter a name for your outbound webhook.
   * **Description** — Enter a description of the webhook.
   * **Events** — Choose the event you want to trigger this webhook. You can subscribe to more than one event.
   * **Jobs** — Specify the job(s) you want the webhook to trigger on, or leave this field empty for the webhook to trigger on all jobs in your account. By default, dbt configures your webhook at the account level.
   * **Endpoint** — Enter your application's endpoint URL, where dbt can send the event(s) to.
4. When done, click **Save**.

dbt provides a secret token that you can use to [check for the authenticity of a webhook](#validate-a-webhook). It’s strongly recommended that you perform this check on your server to protect yourself from fake (spoofed) requests.

info

Note that dbt automatically deactivates a webhook after 5 consecutive failed attempts to send events to your endpoint.
To re-activate the webhook, locate it in the webhooks list and click the reactivate button to enable it and continue receiving events.

To find the appropriate dbt access URL for your region and plan, refer to [Regions & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md).

##### Differences between completed and errored webhook events[​](#completed-errored-event-difference "Direct link to Differences between completed and errored webhook events")

The `job.run.errored` events are a subset of the `job.run.completed` events. If you subscribe to both, you will receive two notifications when your job encounters an error. However, dbt triggers the two events at different times:

* `job.run.completed` — This event only fires once the job’s metadata and artifacts have been ingested and are available from the dbt Admin and Discovery APIs.
* `job.run.errored` — This event fires immediately, so the job’s metadata and artifacts might not have been ingested yet. This means that information might not be available for you to use.

If your integration depends on data from the Admin API (such as accessing the logs from the run) or the Discovery API (such as accessing model-by-model statuses), use the `job.run.completed` event and filter on `runStatus` or `runStatusCode`. If your integration doesn’t depend on additional data, or if improved delivery performance is more important to you, use `job.run.errored` and build your integration to handle API calls that might not return data for a short period at first.

#### Validate a webhook[​](#validate-a-webhook "Direct link to Validate a webhook")

You can use the secret token provided by dbt to validate that webhooks received by your endpoint were actually sent by dbt. Official webhooks include an `Authorization` header that contains an HMAC-SHA256 hash of the request body, computed with the secret token as the key.
An example of verifying the authenticity of the webhook in Python:

```python
import hashlib
import hmac
import os

# `request` and `request_body` come from your web framework.
auth_header = request.headers.get('authorization', None)
app_secret = os.environ['MY_DBT_CLOUD_AUTH_TOKEN'].encode('utf-8')
signature = hmac.new(app_secret, request_body, hashlib.sha256).hexdigest()
return signature == auth_header
```

Note that the destination system must support [Authorization headers](#troubleshooting) for the webhook to work correctly. You can test your endpoint's support by sending a request with curl and an Authorization header, like this:

```shell
curl -H 'Authorization: 123' -X POST https://
```

#### Inspect HTTP requests[​](#inspect-http-requests "Direct link to Inspect HTTP requests")

When working with webhooks, it’s good practice to use tools like [RequestBin](https://requestbin.com/) and [Requestly](https://requestly.io/). These tools allow you to inspect your HTTP requests, response payloads, and response headers so you can debug and test webhooks before incorporating them into your systems.
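The verification fragment above assumes a web framework is supplying `request` and `request_body`. As a self-contained sketch of the same check (the secret and payload values here are made up for illustration), using `hmac.compare_digest` for a timing-safe comparison:

```python
import hashlib
import hmac
import json

def is_valid_signature(secret: str, request_body: bytes, auth_header: str) -> bool:
    """Return True if auth_header matches the HMAC-SHA256 of the raw request body."""
    expected = hmac.new(secret.encode("utf-8"), request_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information during the comparison
    return hmac.compare_digest(expected, auth_header or "")

# Simulate a delivery: dbt signs the raw JSON body with the webhook's secret token.
secret = "12345abcde"  # made-up stand-in for the secret dbt provides
body = json.dumps({"eventType": "job.run.completed"}).encode("utf-8")
signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

print(is_valid_signature(secret, body, signature))  # True
print(is_valid_signature(secret, body, "spoofed"))  # False
```

Validate against the raw request bytes exactly as received; re-serializing the parsed JSON can change whitespace or key order and invalidate the signature.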
#### Examples of JSON payloads[​](#examples-of-json-payloads "Direct link to Examples of JSON payloads")

An example of a webhook payload for a run that's started:

```json
{
  "accountId": 1,
  "webhookId": "wsu_12345abcde",
  "eventId": "wev_2L6Z3l8uPedXKPq9D2nWbPIip7Z",
  "timestamp": "2023-01-31T19:28:15.742843678Z",
  "eventType": "job.run.started",
  "webhookName": "test",
  "data": {
    "jobId": "123",
    "jobName": "Daily Job (dbt build)",
    "runId": "12345",
    "environmentId": "1234",
    "environmentName": "Production",
    "dbtVersion": "1.0.0",
    "projectName": "Snowflake Github Demo",
    "projectId": "167194",
    "runStatus": "Running",
    "runStatusCode": 3,
    "runStatusMessage": "None",
    "runReason": "Kicked off from the UI by test@test.com",
    "runStartedAt": "2023-01-31T19:28:07Z"
  }
}
```

An example of a webhook payload for a completed run:

```json
{
  "accountId": 1,
  "webhookId": "wsu_12345abcde",
  "eventId": "wev_2L6ZDoilyiWzKkSA59Gmc2d7FDD",
  "timestamp": "2023-01-31T19:29:35.789265936Z",
  "eventType": "job.run.completed",
  "webhookName": "test",
  "data": {
    "jobId": "123",
    "jobName": "Daily Job (dbt build)",
    "runId": "12345",
    "environmentId": "1234",
    "environmentName": "Production",
    "dbtVersion": "1.0.0",
    "projectName": "Snowflake Github Demo",
    "projectId": "167194",
    "runStatus": "Success",
    "runStatusCode": 10,
    "runStatusMessage": "None",
    "runReason": "Kicked off from the UI by test@test.com",
    "runStartedAt": "2023-01-31T19:28:07Z",
    "runFinishedAt": "2023-01-31T19:29:32Z"
  }
}
```

An example of a webhook payload for an errored run:

```json
{
  "accountId": 1,
  "webhookId": "wsu_12345abcde",
  "eventId": "wev_2L6m5BggBw9uPNuSmtg4MUiW4Re",
  "timestamp": "2023-01-31T21:15:20.419714619Z",
  "eventType": "job.run.errored",
  "webhookName": "test",
  "data": {
    "jobId": "123",
    "jobName": "dbt Vault",
    "runId": "12345",
    "environmentId": "1234",
    "environmentName": "dbt Vault Demo",
    "dbtVersion": "1.0.0",
    "projectName": "Snowflake Github Demo",
    "projectId": "167194",
    "runStatus": "Errored",
    "runStatusCode": 20,
    "runStatusMessage": "None",
    "runReason": "Kicked off from the UI by test@test.com",
    "runStartedAt": "2023-01-31T21:14:41Z",
    "runErroredAt": "2023-01-31T21:15:20Z"
  }
}
```

#### API for webhooks[​](#api-for-webhooks "Direct link to API for webhooks")

You can use the dbt API to create new webhooks that you want to subscribe to, get detailed information about your webhooks, and manage the webhooks associated with your account. The following sections describe the API endpoints you can use for this.

Access URLs

dbt is hosted in multiple regions in the world and each region has a different access URL. People on Enterprise-tier plans can choose to have their account hosted in any one of these regions. For a complete list of available dbt access URLs, refer to [Regions & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md).

##### List all webhook subscriptions[​](#list-all-webhook-subscriptions "Direct link to List all webhook subscriptions")

List all webhooks that are available from a specific dbt account.

###### Request[​](#request "Direct link to Request")

```shell
GET https://{your access URL}/api/v3/accounts/{account_id}/webhooks/subscriptions
```

###### Path parameters[​](#path-parameters "Direct link to Path parameters")

| Name | Description |
| --- | --- |
| `your access URL` | The login URL for your dbt account. |
| `account_id` | The dbt account the webhooks are associated with. |
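As a sketch of how you might build this request from Python (the access URL, account ID, token placeholder, and `Bearer` authorization scheme below are illustrative assumptions; use the access URL and token type your account supports):

```python
import json
import urllib.request

def list_webhooks_request(access_url: str, account_id: int, token: str) -> urllib.request.Request:
    """Build the GET request for all webhook subscriptions on an account."""
    url = f"https://{access_url}/api/v3/accounts/{account_id}/webhooks/subscriptions"
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",  # assumed scheme; a dbt API token goes here
        "Accept": "application/json",
    })

req = list_webhooks_request("cloud.getdbt.com", 123, "<your-api-token>")
print(req.full_url)

# Sending the request requires network access and a valid token:
# with urllib.request.urlopen(req) as resp:
#     payload = json.load(resp)
#     for hook in payload["data"]:
#         print(hook["id"], hook["name"], hook["event_types"])
```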
###### Response sample[​](#response-sample "Direct link to Response sample")

```json
{
  "data": [
    {
      "id": "wsu_12345abcde",
      "account_identifier": "act_12345abcde",
      "name": "Webhook for jobs",
      "description": "A webhook for when jobs are started",
      "job_ids": [
        "123",
        "321"
      ],
      "event_types": [
        "job.run.started"
      ],
      "client_url": "https://test.com",
      "active": true,
      "created_at": "1675735768491774",
      "updated_at": "1675787482826757",
      "account_id": "123",
      "http_status_code": "0"
    },
    {
      "id": "wsu_12345abcde",
      "account_identifier": "act_12345abcde",
      "name": "Notification Webhook",
      "description": "Webhook used to trigger notifications in Slack",
      "job_ids": [],
      "event_types": [
        "job.run.completed",
        "job.run.started",
        "job.run.errored"
      ],
      "client_url": "https://test.com",
      "active": true,
      "created_at": "1674645300282836",
      "updated_at": "1675786085557224",
      "http_status_code": "410",
      "dispatched_at": "1675786085548538",
      "account_id": "123"
    }
  ],
  "status": {
    "code": 200
  },
  "extra": {
    "pagination": {
      "total_count": 2,
      "count": 2
    },
    "filters": {
      "offset": 0,
      "limit": 10
    }
  }
}
```

###### Response schema[​](#response-schema "Direct link to Response schema")

| Name | Description | Possible values |
| --- | --- | --- |
| `data` | List of available webhooks for the specified dbt account ID. | |
| `id` | The webhook ID. This is a universally unique identifier (UUID) that's unique across all regions, including multi-tenant and single-tenant. | |
| `account_identifier` | The unique identifier for *your* dbt account. | |
| `name` | Name of the outbound webhook. | |
| `description` | Description of the webhook. | |
| `job_ids` | The specific jobs the webhook is set to trigger for. When the list is empty, the webhook triggers for all jobs in your account; by default, dbt configures webhooks at the account level. | Empty list or a list of job IDs |
| `event_types` | The event type(s) the webhook is set to trigger on. | One or more of `job.run.started`, `job.run.completed`, `job.run.errored` |
| `client_url` | The endpoint URL for an application where dbt can send event(s) to. | |
| `active` | A Boolean value indicating whether the webhook is active or not. | `true` or `false` |
| `created_at` | Timestamp of when the webhook was created. | |
| `updated_at` | Timestamp of when the webhook was last updated. | |
| `http_status_code` | The latest HTTP status of the webhook. | Any [HTTP response status code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status). A value of `0` means the webhook has never been triggered. |
| `dispatched_at` | Timestamp of when the webhook was last dispatched to the specified endpoint URL. | |
| `account_id` | The dbt account ID. | |

##### Get details about a webhook[​](#get-details-about-a-webhook "Direct link to Get details about a webhook")

Get detailed information about a specific webhook.

###### Request[​](#request-1 "Direct link to Request")

```shell
GET https://{your access URL}/api/v3/accounts/{account_id}/webhooks/subscription/{webhook_id}
```

###### Path parameters[​](#path-parameters-1 "Direct link to Path parameters")

| Name | Description |
| --- | --- |
| `your access URL` | The login URL for your dbt account. |
| `account_id` | The dbt account the webhook is associated with. |
| `webhook_id` | The webhook you want detailed information on. |
###### Response sample[​](#response-sample-1 "Direct link to Response sample")

```json
{
  "data": {
    "id": "wsu_12345abcde",
    "account_identifier": "act_12345abcde",
    "name": "Webhook for jobs",
    "description": "A webhook for when jobs are started",
    "event_types": [
      "job.run.started"
    ],
    "client_url": "https://test.com",
    "active": true,
    "created_at": "1675789619690830",
    "updated_at": "1675793192536729",
    "dispatched_at": "1675793192533160",
    "account_id": "123",
    "job_ids": [],
    "http_status_code": "0"
  },
  "status": {
    "code": 200
  }
}
```

###### Response schema[​](#response-schema-1 "Direct link to Response schema")

| Name | Description | Possible values |
| --- | --- | --- |
| `id` | The webhook ID. | |
| `account_identifier` | The unique identifier for *your* dbt account. | |
| `name` | Name of the outbound webhook. | |
| `description` | Complete description of the webhook. | |
| `event_types` | The event type the webhook is set to trigger on. | One or more of `job.run.started`, `job.run.completed`, `job.run.errored` |
| `client_url` | The endpoint URL for an application where dbt can send event(s) to. | |
| `active` | A Boolean value indicating whether the webhook is active or not. | `true` or `false` |
| `created_at` | Timestamp of when the webhook was created. | |
| `updated_at` | Timestamp of when the webhook was last updated. | |
| `dispatched_at` | Timestamp of when the webhook was last dispatched to the specified endpoint URL. | |
| `account_id` | The dbt account ID. | |
| `job_ids` | The specific jobs the webhook is set to trigger for. When the list is empty, the webhook triggers for all jobs in your account; by default, dbt configures webhooks at the account level. | Empty list or a list of job IDs |
| `http_status_code` | The latest HTTP status of the webhook. | Any [HTTP response status code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status). A value of `0` means the webhook has never been triggered. |

##### Create a new webhook subscription[​](#create-a-new-webhook-subscription "Direct link to Create a new webhook subscription")

Create a new outbound webhook and specify the endpoint URL that will be subscribing (listening) to the webhook's events.

###### Request sample[​](#request-sample "Direct link to Request sample")

```shell
POST https://{your access URL}/api/v3/accounts/{account_id}/webhooks/subscriptions
```

```json
{
  "event_types": [
    "job.run.started"
  ],
  "name": "Webhook for jobs",
  "client_url": "https://test.com",
  "active": true,
  "description": "A webhook for when jobs are started",
  "job_ids": [
    123,
    321
  ]
}
```

###### Path parameters[​](#path-parameters-2 "Direct link to Path parameters")

| Name | Description |
| --- | --- |
| `your access URL` | The login URL for your dbt account. |
| `account_id` | The dbt account the webhook is associated with. |

###### Request parameters[​](#request-parameters "Direct link to Request parameters")

| Name | Description | Possible values |
| --- | --- | --- |
| `event_types` | Enter the event you want to trigger this webhook. You can subscribe to more than one event. | One or more of `job.run.started`, `job.run.completed`, `job.run.errored` |
| `name` | Enter the name of your webhook. | |
| `client_url` | Enter your application's endpoint URL, where dbt can send the event(s) to. | |
| `active` | Enter a Boolean value to indicate whether your webhook is active or not. | `true` or `false` |
| `description` | Enter a description of your webhook. | |
| `job_ids` | Enter the specific jobs you want the webhook to trigger on, or leave this parameter as an empty list. If this is an empty list, the webhook triggers for all jobs in your account; by default, dbt configures webhooks at the account level. | Empty list or a list of job IDs |

###### Response sample[​](#response-sample-2 "Direct link to Response sample")

```json
{
  "data": {
    "id": "wsu_12345abcde",
    "account_identifier": "act_12345abcde",
    "name": "Webhook for jobs",
    "description": "A webhook for when jobs are started",
    "job_ids": [
      "123",
      "321"
    ],
    "event_types": [
      "job.run.started"
    ],
    "client_url": "https://test.com",
    "hmac_secret": "12345abcde",
    "active": true,
    "created_at": "1675795644808877",
    "updated_at": "1675795644808877",
    "account_id": "123",
    "http_status_code": "0"
  },
  "status": {
    "code": 201
  }
}
```

###### Response schema[​](#response-schema-2 "Direct link to Response schema")

| Name | Description | Possible values |
| --- | --- | --- |
| `id` | The webhook ID. | |
| `account_identifier` | The unique identifier for *your* dbt account. | |
| `name` | Name of the outbound webhook. | |
| `description` | Complete description of the webhook. | |
| `job_ids` | The specific jobs the webhook is set to trigger for. When the list is empty, the webhook triggers for all jobs in your account; by default, dbt configures webhooks at the account level. | Empty list or a list of job IDs |
| `event_types` | The event type the webhook is set to trigger on. | One or more of `job.run.started`, `job.run.completed`, `job.run.errored` |
| `client_url` | The endpoint URL for an application where dbt can send event(s) to. | |
| `hmac_secret` | The secret key for your new webhook. You can use this key to [validate the authenticity of this webhook](#validate-a-webhook). | |
| `active` | A Boolean value indicating whether the webhook is active or not. | `true` or `false` |
| `created_at` | Timestamp of when the webhook was created. | |
| `updated_at` | Timestamp of when the webhook was last updated. | |
| `account_id` | The dbt account ID. | |
| `http_status_code` | The latest HTTP status of the webhook. | Any [HTTP response status code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status). A value of `0` means the webhook has never been triggered. |

##### Update a webhook[​](#update-a-webhook "Direct link to Update a webhook")

Update the configuration details for a specific webhook.

###### Request sample[​](#request-sample-1 "Direct link to Request sample")

```shell
PUT https://{your access URL}/api/v3/accounts/{account_id}/webhooks/subscription/{webhook_id}
```

```json
{
  "event_types": [
    "job.run.started"
  ],
  "name": "Webhook for jobs",
  "client_url": "https://test.com",
  "active": true,
  "description": "A webhook for when jobs are started",
  "job_ids": [
    123,
    321
  ]
}
```

###### Path parameters[​](#path-parameters-3 "Direct link to Path parameters")

| Name | Description |
| --- | --- |
| `your access URL` | The login URL for your dbt account. |
| `account_id` | The dbt account the webhook is associated with. |
| `webhook_id` | The webhook you want to update. |

###### Request parameters[​](#request-parameters-1 "Direct link to Request parameters")

| Name | Description | Possible values |
| --- | --- | --- |
| `event_types` | Update the event type the webhook is set to trigger on. You can subscribe to more than one. | One or more of `job.run.started`, `job.run.completed`, `job.run.errored` |
| `name` | Change the name of your webhook. | |
| `client_url` | Update the endpoint URL for an application where dbt can send event(s) to. | |
| `active` | Change the Boolean value indicating whether the webhook is active or not. | `true` or `false` |
| `description` | Update the webhook's description. | |
| `job_ids` | Change which jobs you want the webhook to trigger for. Or, you can use an empty list to trigger it for all jobs in your account. | Empty list or a list of job IDs |

###### Response sample[​](#response-sample-3 "Direct link to Response sample")

```json
{
  "data": {
    "id": "wsu_12345abcde",
    "account_identifier": "act_12345abcde",
    "name": "Webhook for jobs",
    "description": "A webhook for when jobs are started",
    "job_ids": [
      "123"
    ],
    "event_types": [
      "job.run.started"
    ],
    "client_url": "https://test.com",
    "active": true,
    "created_at": "1675798888416144",
    "updated_at": "1675804719037018",
    "http_status_code": "200",
    "account_id": "123"
  },
  "status": {
    "code": 200
  }
}
```

###### Response schema[​](#response-schema-3 "Direct link to Response schema")

| Name | Description | Possible values |
| --- | --- | --- |
| `id` | The webhook ID. | |
| `account_identifier` | The unique identifier for *your* dbt account. | |
| `name` | Name of the outbound webhook. | |
| `description` | Complete description of the webhook. | |
| `job_ids` | The specific jobs the webhook is set to trigger for. When the list is empty, the webhook triggers for all jobs in your account; by default, dbt configures webhooks at the account level. | Empty list or a list of job IDs |
| `event_types` | The event type the webhook is set to trigger on. | One or more of `job.run.started`, `job.run.completed`, `job.run.errored` |
| `client_url` | The endpoint URL for an application where dbt can send event(s) to. | |
| `active` | A Boolean value indicating whether the webhook is active or not. | `true` or `false` |
| `created_at` | Timestamp of when the webhook was created. | |
| `updated_at` | Timestamp of when the webhook was last updated. | |
| `http_status_code` | The latest HTTP status of the webhook. | Any [HTTP response status code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status). A value of `0` means the webhook has never been triggered. |
| `account_id` | The dbt account ID. | |

##### Test a webhook[​](#test-a-webhook "Direct link to Test a webhook")

Test a specific webhook.

###### Request[​](#request-2 "Direct link to Request")

```shell
GET https://{your access URL}/api/v3/accounts/{account_id}/webhooks/subscription/{webhook_id}/test
```

###### Path parameters[​](#path-parameters-4 "Direct link to Path parameters")

| Name | Description |
| --- | --- |
| `your access URL` | The login URL for your dbt account. |
| `account_id` | The dbt account the webhook is associated with. |
| `webhook_id` | The webhook you want to test. |

###### Response sample[​](#response-sample-4 "Direct link to Response sample")

```json
{
  "data": {
    "verification_error": null,
    "verification_status_code": "200"
  },
  "status": {
    "code": 200
  }
}
```

##### Delete a webhook[​](#delete-a-webhook "Direct link to Delete a webhook")

Delete a specific webhook.

###### Request[​](#request-3 "Direct link to Request")

```shell
DELETE https://{your access URL}/api/v3/accounts/{account_id}/webhooks/subscription/{webhook_id}
```

###### Path parameters[​](#path-parameters-5 "Direct link to Path parameters")

| Name | Description |
| --- | --- |
| `your access URL` | The login URL for your dbt account. |
| `account_id` | The dbt account the webhook is associated with. |
| `webhook_id` | The webhook you want to delete. |

###### Response sample[​](#response-sample-5 "Direct link to Response sample")

```json
{
  "data": {
    "id": "wsu_12345abcde"
  },
  "status": {
    "code": 200,
    "is_success": true
  }
}
```

#### Related docs[​](#related-docs "Direct link to Related docs")

* [dbt CI](https://docs.getdbt.com/docs/deploy/continuous-integration.md)
* [Use dbt's webhooks with other SaaS apps](https://docs.getdbt.com/guides.md?tags=Webhooks)

#### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting")

If your destination system isn't receiving dbt webhooks, ensure it allows Authorization headers. dbt webhooks send an Authorization header, and if your endpoint doesn't support this, it may be incompatible. Services like Azure Logic Apps and Power Automate may not accept Authorization headers, so they won't work with dbt webhooks. You can test your endpoint's support by sending a request with curl and an Authorization header, like this:

```shell
curl -H 'Authorization: 123' -X POST https://
```

---

### Weekly dbt single-tenant release notes

Single-tenant release notes for weekly updates. Release notes fall into one of these categories:

* **New:** New products and features
* **Enhancement:** Performance improvements and feature enhancements
* **Fix:** Bug and security fixes
* **Behavior change:** A change to existing behavior that doesn't fit into the other categories, such as feature deprecations or changes to default settings

Release notes are grouped by date for single-tenant environments.
#### March 11, 2026[​](#march-11-2026 "Direct link to March 11, 2026")

#### New[​](#new "Direct link to New")

##### Deployment and Configuration[​](#deployment-and-configuration "Direct link to Deployment and Configuration")

* **Self-serve Snowflake private endpoint requests:** You can request a new Snowflake private endpoint from account settings by pasting the output from `SELECT SYSTEM$GET_PRIVATELINK_CONFIG();`, then track the request status in the private endpoints table. This is available for Enterprise Business Critical accounts only; contact your account manager to enable it. For other connection types, contact .

#### Enhancements[​](#enhancements "Direct link to Enhancements")

##### Orchestration and Run Status[​](#orchestration-and-run-status "Direct link to Orchestration and Run Status")

* **Run retries support dbt Fusion runs:** You can now retry failed runs as long as your environment is on dbt Core version `1.6` or higher, or on dbt Fusion.

##### Integrations[​](#integrations "Direct link to Integrations")

* **More reliable Slack notifications:** Slack channel discovery and notifications now retry on Slack rate limits to reduce dropped messages during busy periods.

##### APIs, Identity, and Administration[​](#apis-identity-and-administration "Direct link to APIs, Identity, and Administration")

* **Improved OpenAPI typing for large integers:** OpenAPI schemas now mark 64-bit integer fields as `format: int64` to improve generated client types.
* **Clearer credentials schemas:** Credentials OpenAPI docs now use a `type` discriminator (`postgres`, `redshift`, `snowflake`, `bigquery`, and `adapter`) to improve code generation and request validation.
#### Fixes[​](#fixes "Direct link to Fixes")

##### Orchestration and Run Status[​](#orchestration-and-run-status-1 "Direct link to Orchestration and Run Status")

* **More reliable job search:** Searching jobs with numeric terms (for example, `12`) no longer triggers API validation errors, so you can load job lists reliably.
* **Clearer cross-project publication errors:** When dbt platform cannot fetch a publication artifact for an upstream project declared in `dependencies.yml`, you now see which project is missing an artifact and guidance to run the upstream environment at least once.

##### Integrations[​](#integrations-1 "Direct link to Integrations")

* **More accurate Microsoft Teams notification triggers:** Microsoft Teams notifications now use the correct trigger event type for each notification, so you see the expected run outcome context in the message.

##### APIs, Identity, and Administration[​](#apis-identity-and-administration-1 "Direct link to APIs, Identity, and Administration")

* **More accurate error responses during permission checks:** You now receive more accurate errors from permission checks, and underlying service errors surface instead of being reported as authorization failures.

##### Deployment and Configuration[​](#deployment-and-configuration-1 "Direct link to Deployment and Configuration")

* **Clearer private endpoint validation errors:** Creating a private endpoint now returns a `400` error with a clear message when `snowflake_output` is malformed or not valid JSON.

#### Behavior Changes[​](#behavior-changes "Direct link to Behavior Changes")

##### Orchestration and Run Status[​](#orchestration-and-run-status-2 "Direct link to Orchestration and Run Status")

* **Model timing unavailable for dbt Fusion runs:** You now see an informational notice instead of the Model timing chart for dbt Fusion runs because dbt Fusion handles threading differently.
##### APIs, Identity, and Administration[​](#apis-identity-and-administration-2 "Direct link to APIs, Identity, and Administration")

* **System for Cross-domain Identity Management (SCIM) `id` fields are now strings:** SCIM schema discovery now reports `id` fields as strings for users and groups.

#### March 4, 2026[​](#march-4-2026 "Direct link to March 4, 2026")

#### Enhancements[​](#enhancements-1 "Direct link to Enhancements")

##### Orchestration and Run Status[​](#orchestration-and-run-status-3 "Direct link to Orchestration and Run Status")

* **Clearer SAO description**: Job settings now describe state-aware orchestration (SAO) as only building models when data or code changes are detected.
* **Direct links for cost optimization setup**: Fusion cost optimization settings now link to account-level Cost Insights settings and setup documentation so you can validate cost data and savings.

##### APIs, Identity, and Administration[​](#apis-identity-and-administration-3 "Direct link to APIs, Identity, and Administration")

* **Confirmation when enabling manual SCIM updates**: When you enable manual updates for System for Cross-domain Identity Management (SCIM), dbt platform now asks you to confirm so you do not accidentally allow changes outside your identity provider.
* **More reliable SCIM group provisioning**: When a SCIM-provisioned user with an expired invite is added to a SCIM-managed group through a SCIM request, the invite is now automatically resent during group assignment. This helps prevent errors caused by unaccepted invites.

##### dbt platform[​](#dbt-platform "Direct link to dbt platform")

* **Project names and descriptions handle empty values better**: Projects with missing names now show as “Untitled Project,” and you can save project descriptions as empty.
##### Studio IDE

* **Removed non-functional "Open Settings" actions**: Studio IDE no longer shows "Open Settings" buttons in editor notifications because Studio IDE does not expose VS Code settings, so the action could not help you resolve issues.

#### Fixes

##### Catalog

* **More reliable file tree loading**: Catalog no longer gets stuck loading the file tree on initial page load.
* **Clearer trust signals**: Trust signals now suppress less-severe upstream-source issues when a more severe issue is present, so badges and messages are easier to interpret.

##### Integrations

* **Clearer deploy key decryption errors**: When dbt platform cannot decrypt a deploy key, you now get a clearer failure instead of a generic git credentials error.

##### Studio IDE

* **Cleaner LSP disconnects**: If authentication fails when you connect to the Language Server Protocol (LSP) WebSocket, the connection now closes cleanly instead of failing with an internal server error, so you should see fewer unexpected disconnects.
* **Improved timeout handling and authentication stability**: Reduced environment setup timeouts and resolved intermittent authentication failures during busy periods.
* **Clearer invalid credentials error**: If your development connection credentials are invalid, you now see a clearer error message to help you diagnose the issue faster.

#### Behavior Changes

##### Orchestration and Run Status

* **`versionless` dbt version is no longer accepted**: dbt platform now treats `versionless` as deprecated and updates existing environments and jobs to use `latest`. If you set `dbt_version` in an API integration or automation, update it to send `latest` instead.

##### Webhooks

* **Account identifier required for run-based notifications**: If you send events that include a `run_id`, you must also provide an `account_identifier` so the service can validate and resolve the correct account before dispatch. If `account_identifier` is missing, the event fails instead of falling back to a `run_id`-only lookup.

#### February 25, 2026

#### New

##### Catalog

* **Saved queries now ingested for lineage and governance**: Saved query definitions (including tags, exports, parameters, and lineage relationships) are now captured during ingestion so they can participate in Catalog lineage and governance workflows.

#### Enhancements

##### dbt platform

* **System logs now surface warnings and errors**: Run step structured logs now show an indicator when system warnings or errors are present, making issues easier to spot during run triage.
* **Region labels now use backend display names**: Account Settings now shows the backend-provided region display name for clearer, more accurate region labeling.
* **SCIM create group UI change**: Updates the UI to improve the experience of managing groups with SCIM enabled.
* **Updated the post-invite message for SSO accounts**: After a user accepts an invite, the UI now explains that they must log in using SSO to fully redeem the invite and access the account. This replaces the previous "Joined successfully" message and helps avoid confusion when users accept an invite but do not complete the SSO login flow.
##### Studio IDE and Copilot

* **Improved crash recovery and not-found routing**: Studio IDE now catches unexpected render failures with a top-level error boundary and shows Not Found more reliably for unknown in-project routes.
* **Improved navigation accessibility and semantics in Studio IDE**: The main navigation trigger area is now a navigation element with improved focus and labeling.
* **Reduced shortcut conflicts with VS Code search**: When Visual Studio Code (VS Code) search is enabled, Studio IDE avoids unregistering Quick Open and suppresses conflicting command palette shortcuts.

##### Catalog and Insights Data

* **More accurate source freshness outdated status in Catalog**: Source freshness Outdated status can now be computed at query time, improving freshness status filtering consistency.
* **Improved search and lineage usability in Catalog**: Search results better support column-level navigation, very long queries show a clear validation error, and lineage visuals have improved alignment and reduced edge clutter.
* **Improved cross-project lineage and function awareness in Catalog**: Lineage graph building now includes cross-project dependencies and supports function nodes as first-class lineage entities.

##### APIs, Identity, and Administration

* **Project deletion now supported in Admin v2 and v3 Projects APIs**: Projects APIs now explicitly support DELETE with stricter permission checks.
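For automation that needs the newly supported project deletion, a minimal sketch of building the request is below. The path shape and `Token` header mirror common dbt Admin API conventions but are assumptions here; verify the exact route and required permissions in the API reference before use.

```python
def build_project_delete_request(host: str, account_id: int, project_id: int, token: str):
    """Build the pieces of a project DELETE call.

    The v3-style path below is an assumption based on other Admin API
    routes; confirm it against the API reference before relying on it.
    """
    url = f"https://{host}/api/v3/accounts/{account_id}/projects/{project_id}/"
    headers = {
        "Authorization": f"Token {token}",  # token must carry the stricter delete permission
        "Accept": "application/json",
    }
    return "DELETE", url, headers
```

The returned triple can be handed to any HTTP client; keeping request construction separate from the client makes the permission-denied case easy to test.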
#### Behavior Changes

##### Webhooks

* **Updated job run event field presence and status normalization**: Webhook payloads now include `runFinishedAt` only for completed events and `runErroredAt` only for errored events; canceled runs no longer include `runCanceledAt`, and run status is normalized from Cancelled to Canceled. Enabling JSON preserve order can also change key ordering, so consumers should parse JSON rather than string-compare payloads.

##### Insights APIs

* **Optional source freshness expiration windows**: Source freshness expiration windows can optionally derive from each source's freshness criteria rather than a fixed window. You must enable this in your deployment.

##### Deployment and Configuration

* **Source ingestion may skip sources for extremely large manifests in Catalog**: For very large `manifest.json` files, ingestion may strip sources above a configurable threshold to prevent out-of-memory failures. Set `SOURCE_INGESTION_THRESHOLD=0` if you must always ingest sources regardless of size.
* **Removed deprecated object storage settings in Studio IDE**: The deprecated settings `project_storage_bucket_name` and `project_storage_object_prefix` have been removed. Migrate to `object_storage_bucket_name` and `object_storage_object_prefix`.

#### February 18, 2026

#### New

##### Cost Insights

* **Estimated warehouse compute costs**: Cost Insights shows estimated warehouse compute costs and run times for your dbt projects and models, directly in the dbt platform. It highlights cost reductions and efficiency gains from optimizations like state-aware orchestration across your project dashboard, model pages, and job details. This feature is in private beta. To request access, contact your account manager.

#### Enhancements

##### Studio IDE

* **Reduced conflicts across multiple tabs**: Studio IDE can pause the Language Server Protocol (LSP) in background tabs and resume on return, improving stability when the editor is open in more than one tab.
* **More informative header and more editor space**: Adds a Visual Studio Code-style header showing a dbt badge and the current project name, with an option to hide surrounding chrome for more editor space. Contact your account manager to enable.
* **Clearer file and folder creation errors**: Surfaces more actionable filesystem errors (for example, name too long and file-is-a-directory) instead of generic failures.
* **Copy relative path**: Adds a Copy Relative Path action that respects `dbt_project_subdirectory` for quicker navigation and sharing.
* **Friendlier lineage error messages**: Improves user-facing errors for lineage failures (including server errors and cases where upstream returns HTML instead of JSON).
* **More reliable private connectivity selection**: Improves private endpoint filtering by adapter type and updates Studio IDE to use the correct v3 Private Endpoints endpoint.

##### Canvas

* **More reliable Add Sources CSV uploads**: Improves Comma-Separated Values (CSV) upload progress, resume behavior, and common error handling during Add Sources.

##### Catalog

* **Faster and more usable lineage for large projects**: Improves directed acyclic graph (DAG) performance by rendering only visible elements and improving layout for disconnected nodes.
* **Safer search result interactions**: Improves keyboard and hover behavior in the search dropdown and avoids showing stale results while searches are loading.

##### dbt platform

* **More informative user invite statuses**: Shows clearer invite statuses (invitation sent and invitation accepted) and supports an "accepted, login pending" status for Single Sign-On (SSO).
* **Unpaid billing banner enabled by default**: The unpaid billing banner is no longer feature-flagged and displays when applicable, while billing link visibility remains permission-based.
* **System for Cross-domain Identity Management (SCIM)**: Bug fixes and improvements related to managed invites for easier processing.

##### dbt Copilot and agents

* **Streaming control for server-sent events**: Adds Server-Sent Events (SSE) streaming control so clients can choose chunk streaming or message streaming. This enables more responsive Copilot experiences in environments that support streaming.
* **More reliable similar models requests**: Improves responsiveness for AI Similar Models and Similar Sources requests by enforcing tighter embedding and database timeouts aligned to request deadlines. You should see faster, more consistent results when exploring related models.
* **dbt Copilot: Improved bring your own key error handling**: Categorizes OpenAI failures with Bring Your Own Key (BYOK) awareness so BYOK failures return the expected 424-class behavior instead of generic 500-series errors. This makes it easier to diagnose and resolve key or configuration issues.
* **Expanded dbt Model Context Protocol tooling**: Updates dbt Model Context Protocol (MCP) tooling, including adding `get_all_macros` and improving error categorization, enabling more accurate responses.
#### Fixes

##### Studio IDE and Catalog

* **More reliable search and replace**: Ensures bulk edits stay in sync after server-side edits to prevent stale content from overwriting changes.
* **Correct search preview highlighting**: Fixes preview and match highlighting assembly so match ranges align correctly in multi-line previews.
* **Improved startup failure experience**: Shows a proper error layout and notification on unrecoverable initialization failures.

##### Canvas

* **Fewer Add Sources UI interruptions**: Prevents incorrect tab closing after uploads complete and avoids showing the floating node panel when not on a file tab.

##### Catalog

* **Public model lineage across environments**: Fixes lineage resolution for public model parents when the producer model lives in a non-default environment.

##### dbt Copilot and Agents

* **Reduced resource growth under load**: Fixes an OpenAI connection pool leak that could lead to out-of-memory (OOM) conditions under sustained load. You should see fewer slowdowns during high-traffic periods.
* **Fewer related models timeouts**: Reduces intermittent failures when attaching related models by increasing internal timeouts for related-model fetching. You should experience fewer timeout errors when working with related models.

#### Behavior Changes

##### Studio IDE

* **Prevent destructive root operations**: Prevents rename and delete operations on the repository root and shows clearer warnings.
* **Resumable dbt command log streaming**: Improves dbt command log streaming reliability by resuming from the last known Command Line Interface (CLI) event offset. Contact your account manager to enable.

##### Admin and APIs

* **Job Admin gains write access in Profiles API**: Job Admin now includes `profiles_write`, which can change what Job Admin users can do where Profiles are enabled.
* **Search parameter renamed**: The v3 Private Endpoints query parameter `name_search` is renamed to `search`, and search now matches both the endpoint name and the endpoint value.
* **Connections: Postgres database name required**: Postgres connection validation now requires a non-empty database name.
* **User credentials: Prevent sharing credentials across users**: Prevents associating the same active credentials object with multiple users, returning a conflict instead of silently duplicating associations.

##### Integrations

* **GitHub: More flexible repository URL schemes**: GitHub shared webhooks now accept repository URLs using https, git, and Secure Shell (SSH) formats.
* **Slack: Tighter permission gating for settings**: Slack linking and notification settings are more strictly gated by the relevant permissions.
* **Slack: Permission check aligned to job notification access**: Slack integration listing now uses the job notifications read permission, reducing incorrect permission-denied scenarios.

##### CLI Runtime

* **Shorter default request timeouts**: Reduces default timeouts from 60 seconds to 5 seconds for Cloud Config and Cloud Artifact calls, causing requests to fail faster in high-latency environments unless overridden.
* **OpenTelemetry logs: Corrected JSON field name**: Corrects the OpenTelemetry (OTel) log payload field name to `additional_message` (from the misspelled `addtional_message`), which may require updates to downstream parsing.
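A downstream parser affected by the OTel field rename above can accept both spellings during the transition. A minimal sketch; the surrounding payload shape beyond this one field is an assumption:

```python
import json

def read_additional_message(raw: str):
    """Return the OTel log payload's message field.

    Prefers the corrected `additional_message` key and falls back to the
    old misspelled `addtional_message` for logs emitted before the fix.
    """
    record = json.loads(raw)
    if "additional_message" in record:
        return record["additional_message"]
    # Older CLI versions emitted the misspelled key.
    return record.get("addtional_message")
```

Once all emitters are upgraded, the fallback branch can be removed.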
#### February 11, 2026

#### Enhancements

##### Catalog

* **Faster model graph rendering for large projects**: Improved model graph layout performance to reduce load time in larger projects.
* **Faster similar models results**: Similar Models lookup now uses an optimized vector search strategy to reduce timeouts on large projects.

##### Studio IDE

* **Clearer project root in Catalog file tree**: When your dbt project is in a subdirectory, the project root is highlighted in the Catalog file tree.
* **More native rename and delete in Catalog file tree**: Rename and delete actions now use native editor behaviors when using the Catalog file tree.
* **More reliable in-browser formatting**: Formatting updates now apply directly to the active editor buffer to reduce prompts and inconsistent results.
* **Cleaner code generation workflow**: Code generation no longer creates a temporary file in your repository during generation.

##### dbt platform

* **Fusion compatibility validation on environments**: Environment settings now prevent saving a Fusion dbt version with an incompatible connection and surface field-level validation errors.
* **Smarter Fusion defaults during connection setup**: When setting up a new connection, Fusion-eligible adapters now default to the latest Fusion version to reduce misconfiguration during setup.
* **Improved Private Link endpoint management**: Private Endpoints can be sorted by status and connections, and endpoint details now show associated connections and environments.

##### Run Logs

* **More reliable invocation event streaming**: Invocation event streaming is more reliable for long-running jobs by deriving totals from the latest stream event identifier.
* **Reduced Redis usage after log streams complete**: Log streaming now cleans up Redis keys after a stream completes, reducing stale keys and Redis memory pressure for high-volume runs.

#### Fixes

##### dbt Copilot

* **Consistent usage limit messaging in Insights and Studio IDE**: When users hit the usage limit, dbt disables Copilot and shows a clear message, including the reset date when available.

##### Studio IDE

* **Git status decorations registered once**: Fixed duplicate Git status decorations in the file tree that could cause visual issues and performance impact.
* **Avoid automatic pull on primary branch**: Studio IDE no longer runs an automatic pull on the primary branch, reducing unexpected changes during development.
* **Clearer file operation validation errors**: File operations now return structured validation errors and explicitly reject names that exceed operating system limits.
* **More reliable command log refresh and finalization**: Command logs for the dbt Cloud Command Line Interface (CLI) are refreshed and finalized more reliably.

##### Run Automation

* **Correct account attribution for automatically triggered runs**: Scheduler-triggered runs now include account context, improving run attribution and preventing some downstream triggers from running without proper context.
* **Reject malformed account identifiers for exposure events**: Exposure-generated events now validate that account identifiers are numeric before triggering follow-on automation.

##### Webhooks

* **More compatible run completion payload for canceled and errored runs**: Webhook payloads now include consistent completion and error timestamps, and canceled runs include a canceled timestamp and normalized status.
* **Restored dual dispatch for some failure and completion triggers**: When both failure and completion triggers are configured, errored runs may generate two webhook deliveries to match legacy behavior.

##### dbt Project Metadata

* **Manifest ingestion: Accept functions section in manifest.json**: Ingestion now accepts the `functions` section (for example, Snowflake user-defined functions (UDFs)) to prevent parse failures on newer manifest schemas.
* **Macro metadata: More consistent timestamps and argument comparison**: Macro metadata persistence now uses more consistent Coordinated Universal Time (UTC) timestamps and improves argument comparison to reduce noisy or incorrect macro updates.

#### Behavior Changes

##### dbt platform APIs

* **Removed credential configuration fields from responses**: Profiles API responses no longer include credential configuration and extended attributes; use the appropriate credentials and configuration endpoints instead.
* **Filter connections by Private Endpoint**: The Account Connections list supports filtering by Private Endpoint identifier for easier management.
* **Additional ordering options**: The Private Endpoints list now supports ordering by endpoint state and connection count.
* **Private Link: Updated license permission defaults**: User licenses now include read access for Private Link resources, which may change who can view Private Link-related settings.

##### Studio IDE

* **Metric generation writes directly to active file**: Generated metrics are now written directly into the active model file instead of using an accept-and-reject diff flow.
#### February 4, 2026

#### New

##### Studio IDE

* **Studio IDE: Copilot link in console toolbar**: Adds a link that opens Copilot from the console toolbar. You can use Copilot to read files and list directories for better context.
* **Studio IDE: Copy repo-relative path command**: Adds a command to copy a file path relative to your dbt project subdirectory, making it easier to share paths in runbooks and support tickets.

#### Enhancements

##### dbt platform

* **dbt platform: Fusion eligibility and compatibility indicators in setup flows**: Improves Fusion setup by showing "Fusion compatible" indicators during connection setup.
* **dbt platform: Compare Changes shows partial success warnings**: When Compare Changes subqueries fail, the experience now surfaces a partial success state with expandable warning details to make troubleshooting faster.
* **dbt platform: In-progress run logs preserve text selection**: Improves log usability during in-progress runs by preserving text selection while logs auto-refresh and rerender.
* **dbt platform: Job completion trigger job picker search**: Adds server-side search and clearer loading and empty states to the job picker for job-completion triggers.
* **dbt platform: Job artifacts content types and downloads**: Improves artifact handling for job documentation and run artifacts by strengthening HTML detection, defaulting empty paths to `index.html`, and returning clearer `Content-Type` headers and download filenames.
* **dbt platform: Private Endpoints API listing and pagination improvements**: Improves Private Endpoints API v3 list behavior with validated query parameters, filtering, limit and offset pagination, and `connection_count` in responses.
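Limit/offset pagination like the Private Endpoints listing described above can be consumed with a generic page walker. A sketch under the assumption that each page comes back as a plain list from a `fetch_page(limit=…, offset=…)` callable you supply (the real endpoint wraps results in a JSON envelope, so adapt accordingly):

```python
def iter_all(fetch_page, limit=100):
    """Yield every item from a limit/offset-paginated list endpoint.

    `fetch_page` is any callable returning one page as a list; a page
    shorter than `limit` signals the end of the collection.
    """
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit
```

Stopping on a short page rather than a total count keeps the walker correct even when items are created or deleted between page fetches.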
##### Studio IDE

* **Studio IDE: Format file more reliable in subdirectories**: Improves formatting reliability by consistently using the active editor content and a stable repo-relative path when invoking formatting.
* **Studio IDE: Better stability for tabs and Git operations**: Reduces errors when working with non-file tabs and improves robustness around tab-close and Git checkout flows.
* **Studio IDE: Sidebar layout improvements for embedded panels**: Improves embedded panel sizing to reduce clipping and scrolling issues in the sidebar.
* **Studio IDE: Fusion prompts reflect actual eligibility**: Improves Fusion banners and prompts by checking project eligibility via a Fusion status endpoint to reduce confusing prompts for ineligible projects.

##### Catalog and Discovery

* **Catalog: Improved cross-project lineage for dbt Mesh**: Improves cross-project lineage ("public ancestors") computation to better match expected external lineage boundaries in dbt Mesh experiences.

##### Insights

* **Insights: More reliable Copilot Agent requests and context handoff**: Standardizes Copilot Agent requests to the API and includes active tab content as context to improve reliability of agent runs and handoff.

#### Fixes

##### dbt platform

* **dbt platform: Webhook form editing more resilient**: Improves webhook subscription editing reliability with asynchronous data and fixes a multiselect focus issue that could cause accidental option selection.
* **dbt platform: Run warning emails render correctly**: Fixes HTML email markup that could break rendering for run warning notifications.
* **dbt platform: Profiles URLs moved under project dashboard**: Profile create and view routes now live under `/dashboard/:accountId/projects/:projectId/profiles/...`, which may affect bookmarks and direct links.

##### Studio IDE

* **Studio IDE: Cleaner command history list**: Removes hidden background commands (such as listing and parsing commands) from command history to reduce noise.
* **Studio IDE: More reliable inline compile and show output**: Improves robustness of inline compile and show output attachment, including cases with tricky quoting and newlines, reducing missing results during interactive use.
* **Studio IDE: More reliable log downloads for dbt commands**: Fixes log download behavior so downloads correctly serve either the active `dbt.log` or the finalized compressed log.
* **Studio IDE: More reliable artifact uploads to Microsoft Azure Blob Storage**: Fixes edge cases where gzipped artifacts (such as manifests) could fail to upload due to upload stream handling, improving upload reliability.
* **Studio IDE: More stable Language Server Protocol (LSP) sessions in workers**: Reduces noisy disconnect and cleanup errors when multiple WebSocket connections and processes map to the same invocation, improving session stability.

##### Catalog

* **Catalog: Search highlighting displays correctly with multiple matches**: Fixes search result highlighting when the backend returns multiple highlights per field, improving readability of matches. Search highlights now display as compact badges with counts for easier scanning of results.
* **Catalog: Environment filtering more accurate in search results**: Improves environment-scoped Catalog search filtering by using merged environment identifiers and preserving warehouse-only assets via a dedicated sentinel value.
* **Catalog: Public models return empty list when none exist**: Improves behavior for environments with no public models by returning an empty list instead of falling into follow-on query logic.

##### Copilot

* **Copilot: More reliable Model Context Protocol (MCP) connections during long tool calls**: Improves keep-alive behavior so connections shut down cleanly when the client disconnects, reducing noisy failures.
* **Copilot: Semantic Layer tools only offered when available**: Prevents failing tool calls by hiding Semantic Layer tools when the Semantic Layer is not available for the user or environment.
* **Copilot: More accurate HTTP error responses**: Improves error reporting by walking wrapped exceptions and exception groups to return the most specific status code and detail available.
* **Copilot: Empty tool outputs no longer cause failures**: Treats empty tool outputs as valid results (for example, "no matches") to reduce unnecessary "tool call failed" errors.

#### Behavior Changes

##### dbt platform

* **dbt platform: Fusion default dbt version selection more restrictive**: During connection setup, the default dbt version now only defaults to `latest-fusion` when the selected adapter is Fusion-compatible and the project and account are eligible.
* **dbt platform: dbt version enforcement now project-aware**: dbt version "allowed version" checks now account for `project_id` across jobs and environments, including Application Programming Interface (API)-triggered runs, improving correctness for overrides and automatic mapping to allowed equivalents when possible.
* **dbt platform: Connected app refresh tokens now last 7 days**: Refresh token expiration for connected app OAuth flows increased from 8 hours to 7 days, reducing re-authorization frequency.
##### Studio IDE

* **Studio IDE: File stat timestamps now milliseconds**: File stat responses now return modified time and created time as integer milliseconds since epoch instead of float seconds; integrations consuming these endpoints may need to adjust.
* **Studio IDE: Language Server Protocol deferral controls expanded**: The Language Server Protocol (LSP) WebSocket now supports `defer_env_id` to defer against a specific environment and `no_defer=true` to explicitly disable deferral.
* **Studio IDE: Deferral toggle applied more consistently to LSP connections**: When "defer to production" is turned off, Studio IDE now passes `no_defer=true` to align editor intelligence with the selected deferral behavior.

##### Catalog

* **Catalog: Source freshness outdated status removed**: The freshness status value `outdated` was removed; unconfigured freshness is now handled explicitly as `unconfigured`, and sources will no longer report `outdated`.
* **Catalog: Rows per page selector removed from tables**: The rows-per-page selector was removed, and pagination now uses a fixed page size.

##### Orchestration and Run Status

* **Orchestration: Cached and stale outcome status mapping updated**: Cached nodes are now consistently surfaced as Reused with clearer reasons, and stale outcomes are treated as errors, which can change the statuses operators see in run output and telemetry.
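An integration adjusting to the file stat timestamp change above can normalize both the legacy float-seconds and the new integer-milliseconds forms before comparing or displaying them. A minimal sketch:

```python
from datetime import datetime, timezone

def stat_time_to_utc(value):
    """Convert a file stat timestamp to an aware UTC datetime.

    Integer values are treated as the new milliseconds-since-epoch form;
    floats as the legacy seconds-since-epoch form.
    """
    seconds = value / 1000 if isinstance(value, int) else float(value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Branching on the value's type keeps old cached responses and new ones comparable during the transition.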
#### January 28, 2026

##### New

* **Canvas**
  * **New two-step "upload source" API for more resilient uploads**: Use `POST /v1/workspaces/{workspace_id}/upload-source` to create an upload, then `PATCH /v1/workspaces/{workspace_id}/upload-source/{file_id}/process` to stream processing progress via Server-Sent Events (SSE).

##### Enhancements

* **Catalog & Search**
  * **Improved search relevance and highlighting**: Ranking now boosts results by modeling layer, and highlighting is more consistent (including support for multiple highlight snippets per field).
* **dbt platform**
  * **Private endpoints details page**: The dbt platform now includes a Private Endpoint details view with endpoint properties, connectivity status, and associated projects.
  * **Fusion-aware default dbt version during setup**: Connection setup and environment creation can now default to `latest-fusion` for eligible projects.
* **Studio IDE**
  * **Search and replace in files**: Adds a dedicated sidebar search experience. Contact your account manager to enable.
  * **Autofix now includes package upgrades**: Upgrade flows can proceed from fixing deprecations into package upgrades in the same guided run.
  * **Editor UI polish**: Fixed multiple layout and styling issues for a more consistent editor experience.

##### Fixes

* **dbt platform**
  * **Run logs render ANSI and structured output more reliably**: Improved rendering and cleanup of escape sequences in step logs.
  * **More accurate source freshness status in multi-job environments**: Freshness status is preserved when a run lacks freshness results but freshness remains configured.
  * **More robust seed artifact ingestion**: Ingestion now tolerates missing or null `schema` fields in the manifest to avoid failures.
* **Studio IDE**
  * **CLI project sync no longer fails on broken symlinks**: Sync skips missing symlink targets instead of failing the whole sync.
  * **IDE abort is clearer when a command is missing**: Aborting a command that no longer exists returns a specific "no-command-found" response.
  * **More robust inline command results**: Malformed inline commands no longer break result processing; `show --inline` with an empty result returns an empty preview table.
* **Canvas**
  * **Clearer errors for duplicate uploaded-source names**: Creating an uploaded-source model with a duplicate name now returns HTTP 409 with an actionable message.
  * **Failed uploads are now visible via file state**: Uploaded-source processing records a failure state instead of deleting the file record, improving retry and resume workflows.
  * **Invocation status streaming reliability**: The invocation status SSE endpoint now correctly awaits the status stream.

##### Behavior changes

* **Catalog & Search**
  * **Search highlight fields deprecated and highlights shape expanded**: `AccountSearchHit.highlight` and `AccountSearchHit.matchedField` are deprecated. `AccountSearchHit.highlights` now supports multiple highlight snippets per field (arrays).
* **dbt platform**
  * **Deprecations**: The "Adaptive" job type is deprecated. `last_checked_at` is deprecated and no longer populated in run responses.
* **Canvas**
  * **Existing CSV upload SSE endpoint deprecated**: Migrate to the new two-step [upload source](https://docs.getdbt.com/docs/cloud/use-canvas.md#upload-data-to-canvas) flow.

#### January 21, 2026

##### New

* **dbt platform**
  * **Favorites are now available in Catalog**: Add resources to favorites and organize your frequently accessed resources in the Catalog navigation.
* **Connectivity / private networking** * **New v3 API endpoint to fetch a specific PrivateLink endpoint**: You can now retrieve individual PrivateLink endpoints by ID, enabling better automation and troubleshooting workflows. ##### Enhancements[​](#enhancements-7 "Direct link to Enhancements") * **dbt platform** * **Run artifacts are now searchable**: Find specific artifacts faster in run history with the new artifacts search box and improved empty states. * **Webhooks editor is more stable**: The webhook form no longer resets while job options are loading, and server-generated fields now display reliably after creation. * **Fusion onboarding completion card can be dismissed**: After completing the Fusion onboarding checklist, you can now dismiss the card and it will stay dismissed. * **Cross-project lineage is now generally available**: Cross-project lineage is now enabled for all applicable accounts. * **Catalog & Search** * **Improved Catalog search relevance and performance**: Enhanced search scoring and matching provides more accurate results, with better column matching and highlighting for large catalogs. * **Search results are refreshed when column metadata changes**: Column name and description updates now automatically trigger re-indexing, ensuring search results stay current. * **Search typeahead includes "View all results"**: Quickly access full search results from the typeahead dropdown with the new footer link. * **Cleaner environment dropdown behavior**: The environment selector now only shows "Staging" when your account has projects with a staging environment configured. * **Studio IDE** * **Clearer error messages when fetching dev credentials and defer state**: IDE-related endpoints now return more specific and helpful error messages for common configuration issues and timeouts. * **Studio console and command log viewer improvements**: Enhanced command log viewer with improved download capabilities and more consistent error log viewing. 
##### Fixes[​](#fixes-6 "Direct link to Fixes") * **AI-assisted workflows** * **Enhancement:** [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) adds missing column descriptions more accurately. Copilot generated documentation now correctly detects column names across various `schema.yml` files, adds only missing descriptions, and preserves existing ones. * **Catalog & lineage** * **Fixes missing auto-generated exposures in model lineage**: Auto-generated exposures now appear correctly in lineage views. * **Catalog search no longer errors when a warehouse connection name is missing**: Search now handles missing connection names gracefully without causing errors. * **Improved security: malformed identity headers are rejected cleanly**: Requests with invalid authentication tokens now fail safely with clear error messages. * **Studio IDE** * **Command status is more reliable when Cloud CLI invocation data expires**: Commands that can't be fetched are now properly marked as failed instead of staying in a "running" state. * **APIs** * **Jobs API deferral validation is stricter and clearer**: Job deferral settings are now validated to ensure the deferring job and environment exist within the same account, with improved error messages. ##### Behavior changes[​](#behavior-changes-7 "Direct link to Behavior changes") * **dbt platform** * **Account Insights default page size changed to 5 rows**: Tables in Account Insights now display 5 rows per page by default (previously 10). * **Webhooks** * **Webhook timestamps are now consistently UTC RFC3339 with `Z`**: All webhook timestamp fields (`run_started_at`, `run_finished_at`, `timestamp`) now use UTC with `Z` suffix and higher precision. Missing/invalid timestamps emit `1970-01-01T00:00:00Z` instead of empty strings. Update webhook consumers if needed. * **Webhook `run_status` string changed from `Error` to `Errored`**: Update webhook consumers that parse this status value strictly. 
* **Runs / ingestion** * **Very large exposure sets are now limited during ingestion**: Projects with more than 5,000 exposures will skip exposure ingestion to prevent performance issues. All other artifact ingestion continues normally. Contact support if you need to increase this limit. #### January 14, 2026[​](#january-14-2026 "Direct link to January 14, 2026") ##### New[​](#new-6 "Direct link to New") * **dbt platform** * **Fusion migration readiness endpoint**: Added an API endpoint to determine whether a project is eligible for Fusion migration. ##### Enhancements[​](#enhancements-8 "Direct link to Enhancements") * **Copilot and AI** * **More resilient agent runs**: Agent tool execution errors now return structured responses instead of failing the entire run. * **Better project context retrieval**: Agent toolsets include additional retrieval and search capabilities for more relevant responses. * **Improved Azure OpenAI verification**: Azure OpenAI connection verification now uses GPT-5-compatible parameters for GPT-5 deployments. * **BYOK for Azure OpenAI**: Added support for Azure Foundry URLs with automatic endpoint parsing to reduce setup friction. * **Insights and Catalog** * **Semantic Layer querying now generally available (GA)**: Build SQL queries against the Semantic Layer without writing SQL code. * **Improved search relevance**: Search scoring prioritizes exact and multi-term matches more strongly, with better highlighting and column-description matching. * **Catalog UX improvements**: Search labels are more consistent, and the embedded lineage view loads more responsively. * **Studio IDE** * **Unified Studio IDE**: Studio now loads a single unified IDE package. * **Defer-to-production honors `defer-env-id` override**: Studio now respects `dbt-cloud.defer-env-id` settings when Cloud CLI runtime is supported. * **Improved log exporting**: Download and copy behavior for command logs is more consistent, including debug logs. 
* **Enhanced multi-edit support**: The IDE now supports multiple explicit edits in one request with safer validation. * **Clearer Cloud CLI session errors**: Session creation returns clearer error messages and guidance for setup issues. * **dbt platform** * **Settings detail pages in resizable drawer**: Settings detail experiences now use an improved drawer-based UI. * **More resilient profile creation**: Profile creation now handles dependencies and failures more gracefully. * **Enhanced logging limits for in-progress runs**: Logs for in-progress runs are also limited by memory usage, in addition to the existing 1,000-line limit. ##### Fixes[​](#fixes-7 "Direct link to Fixes") * **dbt platform** * **Profiles API clearing extended attributes**: The Profiles API now allows unsetting extended attributes by setting `extended_attributes_id` to null. * **Recently viewed more reliable**: Recently viewed entries now update atomically and retain the 5 most recent items. * **Run log tailing improvements**: Debug logs for completed runs now consistently fetch only the tail of the log. * **Studio IDE** * **More reliable `show` and `compile`**: CLI flags to disable caching are now positioned correctly to avoid parsing issues. * **Canvas preview improvements**: Fixed argument ordering so `--no-defer` is interpreted consistently. ##### Behavior changes[​](#behavior-changes-8 "Direct link to Behavior changes") * **dbt platform** * **dbt v1.7 end-of-life**: dbt v1.7 is now labeled as end-of-life in version lifecycle messaging. #### January 7, 2026[​](#january-7-2026 "Direct link to January 7, 2026") No changes of note this week. 
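As an aside on the January 14 logging change above (logs for in-progress runs are capped by both the existing 1,000-line limit and a memory limit), the combined policy can be sketched as follows. This is a simplified illustration, not dbt's implementation; the byte-cap value is an assumption.

```python
def tail_log(lines, max_lines=1000, max_bytes=64_000):
    """Return the most recent log lines under two caps: a line-count
    limit (the documented 1,000-line cap) and a total-byte limit
    (illustrative value). Walks backwards so the newest lines win."""
    tail, size = [], 0
    for line in reversed(lines):
        cost = len(line.encode("utf-8"))
        # Stop once either cap would be exceeded by keeping this line.
        if len(tail) >= max_lines or size + cost > max_bytes:
            break
        tail.append(line)
        size += cost
    tail.reverse()  # restore chronological order
    return tail
```

Whichever cap is hit first wins, so very wide lines can shrink the tail well below 1,000 lines.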
#### December 24, 2025[​](#december-24-2025 "Direct link to December 24, 2025") ##### New[​](#new-7 "Direct link to New") * **AI Codegen** * **File-aware LangGraph agents**: Analysts can now drop `@path` references in the bundled CLI to stream local files into `/private/v1/agents/run`, which are auto-rendered as text inside the run so copilots have the exact config or SQL snippet you referenced. * **dbt platform** * **Slack Copilot feedback loops**: Copilot replies now carry inline "Did that answer your question?" buttons, so you can rate answers without leaving Slack. * **Codex workflows** * **Databricks cost tracking for Model Cost Over Time**: A Databricks history provider and DBU-based cost query now surface daily model cost alongside Snowflake coverage, so Databricks tenants get unified FinOps reporting. * **Canvas** * **CSV upload GA**: The CSV upload endpoint is now generally available. ##### Enhancements[​](#enhancements-9 "Direct link to Enhancements") * **Cloud artifacts** * **Better similar-model suggestions**: Attachment workflows now only recommend meaningfully related models. * **dbt platform** * **Unified SSO & SCIM admin**: Settings consolidate SSO + SCIM, add an empty state for auto-generated slugs, and render read-only login URLs so admins can start configuration without touching slug fields. * **SCIM token management polish**: Token tables gain fixed pagination, inline search, consistent iconography, and clearer deletion warnings to avoid accidental cuts to live integrations. * **Twice the per-environment custom variables**: The v3 API/UI now allow up to 20 scoped environment variables before enforcing limits, giving larger projects more room for secrets. * **Canvas** * **Dialect-aware projection SQL**: SELECT \* RENAME/EXCEPT support now respects each warehouse's syntax using schema metadata, so SQL previews and column metadata stay accurate across Snowflake, Databricks, BigQuery, and Redshift. 
##### Fixes[​](#fixes-8 "Direct link to Fixes") * **dbt platform** * **Webhook editor keeps job selections**: Default values are cached after the first render and stop resetting once the user edits the form, eliminating accidental job-list clearing while tabbing through fields. * **Codex GraphQL** * **Exposure parents mirror the manifest**: `parentsModels` and `parentsSources` now derive from the manifest's `parents` list, so exposures with mixed upstreams display complete lineage in both the GraphQL API and UI. ##### Behavior changes[​](#behavior-changes-9 "Direct link to Behavior changes") * **dbt platform** * **Legacy Cost Management UI retired**: All cost management pages and hooks were removed, and platform metadata credentials now only expose catalog ingestion and Cost Insights toggles, eliminating dead-end controls. #### December 17, 2025[​](#december-17-2025 "Direct link to December 17, 2025") ##### New[​](#new-8 "Direct link to New") * **dbt platform** * **Feature licensing service**: A new `/accounts//feature-licenses` endpoint issues short-lived JWTs that encode entitled features, and service/PAT authentication now checks that a caller holds an active license on the target account before any Fusion-enabled workflow runs. * **Databricks platform metadata credentials**: Databricks warehouses can register platform metadata credentials (token plus optional catalog), enabling catalog ingestion, metadata sharing, and Cost Insights pipelines without custom adapters. ##### Enhancements[​](#enhancements-10 "Direct link to Enhancements") * **dbt platform** * **Large list pagination**: The Projects and Credentials lists in Settings now paginate after 25 rows (with search boxes and skeleton states), keeping navigation responsive for large deployments.
* **Metadata Explorer** * **Model context & lineage polish**: Model panels now show materialization type, lineage renders metadata strips only when content exists, and upstream public-model columns load automatically for better cross-project visibility. * **Freshness clarity & Studio navigation**: Source tiles respect the `meta5161ExpiredUnconfiguredSources` flag (showing warn/error thresholds) and "Open in IDE" links now point at `/studio/{accountId}/projects/{projectId}` to drop users directly into dbt Studio. * **Insights UI** * **Copilot guardrails**: The Copilot listener now hydrates builder tabs only when a semantic-layer payload arrives, preventing plain-SQL replies from overwriting editor state. * **dbt CLI** * **Improved monorepo support for file sync and the IDE**: * File sync now anchors itself to the invocation directory, making monorepo structures behave more predictably. * Nested `dependencies.yml` files correctly trigger dependency installs. * The IDE’s LSP and file sync now recognize dbt subdirectories properly. * Exclusion lists remain accurate even in multi-project repositories. * **Notifications system** * **Webhook auditability**: Outbound calls now persist the exact JSON body in webhook history, making allowlisting and troubleshooting easier. * **Studio** * **Git sidebar & file refresh parity**: The file tree now mirrors Cloud VCS statuses (including conflicts) and automatically invalidates caches after `dbt deps`/`dbt clean`, so new or removed files appear without a reload. * **Log viewers & Autofix UX**: Command and interactive query logs adopt the new accordion-based viewer, and Autofix sessions in Fusion treat plain `parse` commands as the trigger for deprecation summaries, keeping remediation flows consistent. ##### Fixes[​](#fixes-9 "Direct link to Fixes") * **dbt platform** * **Environment variable editor stability**: Editing one variable no longer backfills blank cells with previously edited values, preventing accidental overrides. 
* **Cost optimization indicator accuracy**: Job pages once again display “Cost optimization features” whenever Fusion actually runs (and gating conditions are met), so users see the right coverage status regardless of feature-flag permutations. ##### Behavior changes[​](#behavior-changes-10 "Direct link to Behavior changes") * **dbt platform** * **Stronger tenant identity enforcement**: Service/PAT calls without an active license now fail authentication, Slack Copilot sessions build a scoped identity JWT for the invoking user, and SSO providers enforce auto-generated slugs (draft configs can’t be targeted), reducing misconfiguration risk. * **dbt CLI** * **User-isolated invocation history**: Every invocation lookup validates the caller’s user ID, preventing admins from accidentally reading another developer’s runs when multiple accounts share a CLI server. * **IDE server** * **Enhanced security for support-assisted sessions:** Support impersonation sessions now restrict the execution of `show`, `run`, `build`, and `test` commands. Artifacts generated by `dbt show` are also short-lived and will automatically expire after 15 minutes to limit unintended data retention. * **dbt Orchestration** * **Fusion compare support & new dependency**: Fusion tracks now treat `dbt compare` as a supported command (no more target-path hacks). #### December 10, 2025[​](#december-10-2025 "Direct link to December 10, 2025") ##### Enhancements[​](#enhancements-11 "Direct link to Enhancements") * **AI codegen API**: Streaming middleware enforces request-scoped instrumentation across every AI endpoint, offloads warehouse calls to threads, and exposes human-readable tool names, while gating keyword search behind a feature flag for approved tenants.
* **dbt platform** * **Operations clarity**: Environment profile drawers link directly to connection settings and treat Snowflake fields as optional, while Compare Changes and run-step drawers now explain whether steps failed or were skipped so troubleshooting is faster. * **Collaboration & notifications**: Slack Copilot mentions are now more reliable, with hardened workers, support for CSV attachments, and improved logging. Webhook channels now accept longer URLs, handle “warning-only” subscriptions correctly, and automatically clean up corrupted job IDs. * **Profile & credential management**: Environment APIs accept `secondary_profile_ids`, run acquisition favors profile-backed credentials, and whoami/auth metrics are scrubbed so cross-platform profiles stay in sync. * **dbt CLI server**: Improved stability and performance for large projects. * **Studio IDE**: For dbt Fusion logging, node start and end times will now properly be displayed in command output. * **Studio IDE**: Copilot Chat automatically appears anywhere AI entitlements exist, preview runs auto-cancel when nodes change, and keyboard shortcuts respect native keymaps with clear UI labels. * **Studio IDE**: Tab view, console pane, and command drawer have been redesigned to enhance efficiency and multitasking. ##### Fixes[​](#fixes-10 "Direct link to Fixes") * **Studio IDE server**: Branch creation now returns explicit feedback for bad branch names/SHAs and detects unauthorized Git errors earlier, making automation failures actionable. #### December 3, 2025[​](#december-3-2025 "Direct link to December 3, 2025") ##### New[​](#new-9 "Direct link to New") * **dbt platform** * **Autofix deprecation warnings**: When deprecations are detected, you now see "Autofix deprecation warnings." 
* **Autofix Packages detailed results**: After running Autofix, you see a results panel with upgraded packages (with links), packages left unchanged and why, and quick access to `packages.yml` to help assess Fusion readiness and next steps. ##### Enhancements[​](#enhancements-12 "Direct link to Enhancements") * **dbt platform** * **Code Quality tab improvements** * Clearer lint/format actions (SQLFluff, Prettier), better empty states, visible Config button when applicable, and simplified logs retrieval. * Applies to SQL, JSON, YAML, and Markdown workflows. * **Editor experience** * Upgraded editor for stability. * Improved container sizing/overflow. * "Save" overlay only appears when tabs are open. * Minor action‑bar refinements. ##### Fixes[​](#fixes-11 "Direct link to Fixes") * **dbt platform lineage and command pane stability**: Reliability improved by aligning with updated IDE and VS Code command APIs; eliminates intermittent skips. ##### Behavior changes[​](#behavior-changes-11 "Direct link to Behavior changes") * **dbt platform:** dbt Core “versionless” renamed to “latest” so it's consistent and clear across tenants.
--- ### Write queries with exports [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Exports enhance [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) by running your saved queries and writing the output to a table or view within your data platform. Saved queries are a way to save and reuse commonly used queries in MetricFlow; exports take this functionality a step further by: * Enabling you to write these queries within your data platform using the dbt job scheduler. * Providing an integration path for tools that don't natively support the Semantic Layer by exposing tables of metrics and dimensions. Essentially, exports are like any other table in your data platform: they enable you to query metric definitions through any SQL interface or connect to downstream tools without a first-class [Semantic Layer integration](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md). Running an export counts towards [queried metrics](https://docs.getdbt.com/docs/cloud/billing.md#what-counts-as-a-queried-metric) usage. Querying the resulting table or view from the export does not count toward queried metric usage. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * You have a dbt account on a [Starter or Enterprise-tier](https://www.getdbt.com/pricing/) plan. * You use one of the following data platforms: Snowflake, BigQuery, Databricks, Redshift, or Postgres. * You are on [dbt version](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) 1.7 or newer. * You have the Semantic Layer [configured](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md) in your dbt project.
* You have a dbt environment with the [job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md) enabled. * You have a [saved query](https://docs.getdbt.com/docs/build/saved-queries.md) and [export configured](https://docs.getdbt.com/docs/build/saved-queries.md#configure-exports) in your dbt project. In your configuration, leverage [caching](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md) to cache common queries, speed up performance, and reduce compute costs. * You have the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) installed. Note that exports aren't supported in the Studio IDE yet. #### Benefits of exports[​](#benefits-of-exports "Direct link to Benefits of exports") The following section explains the main benefits of using exports: * **DRY representation**: Data teams often generate tens, hundreds, or even thousands of tables that denormalize data into summary or metric mart tables. The main benefit of exports is creating a "Don't Repeat Yourself (DRY)" representation of the logic to construct each metric, dimension, join, filter, and so on. This allows you to reuse those components for long-term scalability, even if you're replacing manually written SQL models with references to the metrics or dimensions in saved queries. * **Easier changes**: Exports ensure that changes to metrics and dimensions are made in one place and then cascade to those various destinations seamlessly. This prevents the problem of needing to update a metric across every model that references that same concept. * **Caching**: Use exports to pre-populate the cache, so that you're pre-computing what you need to serve users through the dynamic Semantic Layer APIs.
###### Considerations[​](#considerations "Direct link to Considerations") Exports offer many benefits, but some use cases fall outside those advantages: * Business users may still struggle to consume from tens, hundreds, or thousands of tables, and choosing the right one can be a challenge. * Business users may also make mistakes when aggregating and filtering from the pre-built tables. For these use cases, use the dynamic [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) instead of exports. #### Run exports[​](#run-exports "Direct link to Run exports") Before you're able to run exports in development or production, you'll need to make sure you've [configured saved queries and exports](https://docs.getdbt.com/docs/build/saved-queries.md) in your dbt project. In your saved query config, you can also leverage [caching](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md) with the dbt job scheduler to cache common queries, speed up performance, and reduce compute costs. There are a few ways to run an export: * [Run exports in development](#exports-in-development) using the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) to test the output before production (you can configure exports in the Studio IDE; however, running them directly in the Studio IDE isn't supported yet). * If you're using the Studio IDE, use `dbt build` to run exports. Make sure you have the [environment variable](#set-environment-variable) enabled. * [Run exports in production](#exports-in-production) using the [dbt job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md) to write these queries within your data platform. #### Exports in development[​](#exports-in-development "Direct link to Exports in development") You can run an export in your development environment using your development credentials if you want to test the output of the export before production.
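For orientation, a saved query with an export is configured in your project YAML before any of the commands below can run. The following is a sketch based on the saved-queries configuration docs linked above; all names and values are illustrative:

```yaml
saved_queries:
  - name: sq_name
    query_params:
      metrics:
        - order_total              # illustrative metric name
      group_by:
        - TimeDimension('metric_time', 'day')
    exports:
      - name: order_metrics        # illustrative export name
        config:
          export_as: table         # or: view
          schema: analytics        # optional; overrides the default schema
          alias: order_metrics_tbl # optional; overrides the default table name
```

The `name`, `export_as`, `schema`, and `alias` fields here correspond to the CLI options described in the next section.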
This section explains the different commands and options available to run exports in development. * Use the [`dbt sl export` command](#exports-for-single-saved-query) to test and generate exports in your development environment for a single saved query. You can also use the `--select` flag to specify particular exports from a saved query. * Use the [`dbt sl export-all` command](#exports-for-multiple-saved-queries) to run exports for multiple saved queries at once. This command provides a convenient way to manage and execute exports for several queries simultaneously, saving time and effort. * If you're using the Studio IDE, use `dbt build` to run exports. Make sure you have the [environment variable](#set-environment-variable) enabled before running the command. ##### Exports for single saved query[​](#exports-for-single-saved-query "Direct link to Exports for single saved query") Use the following command to run exports in the dbt CLI: ```bash dbt sl export ``` The following table lists the options for the `dbt sl export` command, using the `--` flag prefix to specify the parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `name` | String | Required | Name of the `export` object. |
| `saved-query` | String | Required | Name of the saved query that contains the exports. |
| `select` | List or String | Optional | Names of exports to select from the saved query. |
| `exclude` | String | Optional | Names of exports to exclude from the saved query. |
| `export-as` | String | Optional | Type of export to create from the `export_as` types available in the config. Options available are `table` or `view`. |
| `schema` | String | Optional | Schema to use for creating the table or view. |
| `alias` | String | Optional | Table alias to use to write the table or view. |
You can also run any export defined for the saved query and write the table or view in your development environment. Refer to the following command example and output: ```bash dbt sl export --saved-query sq_name ``` The output would look something like this: ```bash Polling for export status - query_id: 2c1W6M6qGklo1LR4QqzsH7ASGFs.. Export completed. ``` ##### Use the select flag[​](#use-the-select-flag "Direct link to Use the select flag") You can have multiple exports for a saved query, and by default, all exports are run for a saved query. You can use the `--select` flag in [development](#exports-in-development) to select one or more specific exports. Note that you can’t sub-select metrics or dimensions from the saved query; selection applies only to exports and their configuration, such as table format or schema. For example, the following command runs `export_1` and `export_2` and doesn't work with the `--alias` or `--export-as` flags: ```bash dbt sl export --saved-query sq_name --select export_1,export_2 ``` **Overriding export configurations** The `--select` flag is mainly used to include or exclude specific exports. If you need to change these settings, you can use the following flags to override export configurations: * `--export-as` — Defines the materialization type (table or view) for the export. This creates a new export with its own settings and is useful for testing in development. * `--schema` — Specifies the schema to use for the written table or view. * `--alias` — Assigns a custom alias to the written table or view. This overrides the default export name. Note that the `--select` flag *can't* be used with `--alias` or `--schema`.
For example, you can use the following command to create a new export named `new_export` as a table: ```bash dbt sl export --saved-query sq_number1 --export-as table --alias new_export ``` ##### Exports for multiple saved queries[​](#exports-for-multiple-saved-queries "Direct link to Exports for multiple saved queries") Use the `dbt sl export-all` command to run exports for multiple saved queries at once. This is different from the `dbt sl export` command, which only runs exports for a single saved query. For example, to run exports for multiple saved queries, you can use: ```bash dbt sl export-all ``` The output would look something like this: ```bash Exports completed: - Created TABLE at `DBT_SL_TEST.new_customer_orders` - Created VIEW at `DBT_SL_TEST.new_customer_orders_export_alias` - Created TABLE at `DBT_SL_TEST.order_data_key_metrics` - Created TABLE at `DBT_SL_TEST.weekly_revenue` Polling completed ``` The `dbt sl export-all` command provides the flexibility to manage multiple exports in a single command. #### Exports in production[​](#exports-in-production "Direct link to Exports in production") Running exports in production keeps the tables and views they generate up to date in your data platform, so downstream tools always query current data. Exports use the default credentials of the production environment. To enable exports to run saved queries and write them within your data platform, perform the following steps: 1. [Set an environment variable](#set-environment-variable) in dbt. 2. [Create and execute an export](#create-and-execute-exports) in a job run. ##### Set environment variable[​](#set-environment-variable "Direct link to Set environment variable") 1. Click **Deploy** in the top navigation bar and choose **Environments**. 2. Select **Environment variables**. 3.
[Set the environment variable](https://docs.getdbt.com/docs/build/environment-variables.md#setting-and-overriding-environment-variables) key to `DBT_EXPORT_SAVED_QUERIES` and the environment variable's value to `TRUE` (`DBT_EXPORT_SAVED_QUERIES=TRUE`). Doing this ensures saved queries and exports are included in your dbt build job. For example, running `dbt build -s sq_name` runs the equivalent of `dbt sl export --saved-query sq_name` in the dbt job scheduler. If exports aren't needed, you can set the value to `FALSE` (`DBT_EXPORT_SAVED_QUERIES=FALSE`). ![Add an environment variable to run exports in your production run.](/img/docs/dbt-cloud/semantic-layer/env-var-dbt-exports.png?v=2 "Add an environment variable to run exports in your production run.") When you run a build job, any saved queries downstream of the dbt models in that job will also run. To make sure your export data is up-to-date, run the export as a downstream step (after the model). ##### Create and execute exports[​](#create-and-execute-exports "Direct link to Create and execute exports") 1. Create a [deploy job](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) and ensure the `DBT_EXPORT_SAVED_QUERIES=TRUE` environment variable is set, as described in [Set environment variable](#set-environment-variable). * This enables you to run any export that needs to be refreshed after a model is built. * Use the [selector syntax](https://docs.getdbt.com/reference/node-selection/syntax.md) `--select` or `-s` option in your build command to specify a particular dbt model or saved query to run. For example, to run all saved queries downstream of the `orders` semantic model, use the following command: ```bash dbt build --select orders+ ``` 2. After dbt finishes building the models, the MetricFlow Server processes the exports, compiles the necessary SQL, and executes it against your data platform.
It directly executes a "create table" statement, so the data stays within your data platform. 3. Review the exports' execution details in the job logs and confirm the export ran successfully. This helps with troubleshooting and ensures accuracy. Since saved queries are integrated into the dbt DAG, all outputs related to exports are available in the job logs. 4. Your data is now available in the data platform for querying! 🎉 #### FAQs[​](#faqs "Direct link to FAQs") **Can I have multiple exports in a single saved query?** Yes, this is possible. However, each export must differ in its name, schema, or materialization strategy. **How do I run all exports for a saved query?** * In production runs, you can build the saved query by calling it directly in the build command, or by building a model and any exports downstream of that model. * In development, you can run all exports with `dbt sl export --saved-query sq_name`. **Will I run duplicate exports if multiple models are downstream of my saved query?** dbt will only run each export once, even if it builds multiple models that are downstream of the saved query. For example, you could have a saved query called `order_metrics`, which has metrics from both the `orders` and `order_items` semantic models. You can run a job that includes both models using `dbt build`. This runs both the `orders` and `order_items` models; however, it only runs the `order_metrics` export once. **Can I reference an export as a dbt model using `ref()`?** No, you won't be able to reference an export using `ref()`. Exports are treated as leaf nodes in your DAG. Modifying an export could lead to inconsistencies with the original metrics from the Semantic Layer. **How can I select saved queries by their resource type?** To include all saved queries in the dbt build run, use the [`--resource-type` flag](https://docs.getdbt.com/reference/global-configs/resource-type.md) and run the command `dbt build --resource-type saved_query`.
#### Related docs[​](#related-docs "Direct link to Related docs")

* [Validate semantic nodes in a CI job](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci)
* Configure [caching](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-cache.md)
* [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md)

---

## FAQs

### [Error] Could not find my_project package

If a package name is included in the `search_order` of a project-level `dispatch` config, dbt expects that package to contain macros that are viable candidates for dispatching. If an included package does not contain *any* macros, dbt will raise an error like:

```shell
Compilation Error
  In dispatch: Could not find package 'my_project'
```

This does not mean the package or root project is missing; it means that the package contains no macros, so it is missing from the search spaces available to `dispatch`.

If you've tried the step above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!
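For reference, the project-level `dispatch` config this error refers to is set in `dbt_project.yml`; a minimal sketch, with an illustrative namespace and search order:

```yaml
# dbt_project.yml
dispatch:
  - macro_namespace: dbt_utils
    # every package listed here must contain at least one macro,
    # or dbt raises "Could not find package '...'"
    search_order: ['my_project', 'dbt_utils']
```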
---

### Account-specific features

The features in dbt are tailored to each organization's unique configuration, including user permissions, project setup, and subscription level, with guidance provided to help teams make the most of their available capabilities. This document provides a comprehensive overview of account-specific features in dbt according to plan type.

#### Copilot[​](#copilot "Direct link to Copilot")

[Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) is an AI-powered assistant designed to accelerate your development workflow and help you focus on delivering high-quality data. Copilot is available to all users in dbt, but limits are imposed according to plan type. Have a look at [dbt's pricing](https://www.getdbt.com/pricing) for more information.

#### Copilot features[​](#copilot-features "Direct link to Copilot features")

##### Codegen [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")Enterprise+[​](#codegen- "Direct link to codegen-")

Copilot codegen refers to the code generation capabilities provided by Copilot, an AI-powered assistant integrated into dbt. This feature allows users to generate SQL code, documentation, tests, and semantic models directly from natural language prompts, helping automate and accelerate common analytics engineering workflows.

Copilot codegen uses metadata such as relationships, lineage, and model context from your dbt projects to produce contextually accurate code.
This helps avoid mistakes common with generic AI tools by ensuring generated code matches your actual schema and conventions.

The code Copilot generates may include:

* Base/staging/semantic models (including SQL for new models)
* YAML files for documentation or tests
* Inline SQL expressions
* Semantic model structures and metrics

Copilot codegen is available in the Studio IDE, Canvas, and (soon) Insights, making it possible to generate and edit code directly within these interfaces.

##### Bring your own key (BYOK) [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")Enterprise+[​](#bring-your-own-key-byok- "Direct link to bring-your-own-key-byok-")

BYOK allows users to provide and manage their own encryption or API keys, rather than relying on keys managed by a vendor or third party. This gives organizations greater control over data security, compliance, and contracts. BYOK means users can bring and configure their own OpenAI or Azure OpenAI API key. With BYOK, users have more control over privacy, observability, and security for their data and metadata.

Take note of the following when using BYOK:

* When you use your own API key, your contract with the LLM provider (not dbt Labs') applies. You are responsible for managing costs, usage limits, and data handling. This means ownership and liability for API use rests with the user, not dbt Labs.
* dbt Labs does not impose usage limits on the user's key, as it does with internally managed keys.
* OpenAI projects with [data residency controls](https://platform.openai.com/docs/guides/your-data#data-residency-controls) enabled and configured for the United States (project region set to US) don't currently support BYOK. These projects can only use the API key in the dbt platform configuration. Specifying custom endpoints required for data residency isn't yet supported, and we're evaluating a solution for this.
  To use BYOK, ensure your OpenAI project doesn't have data residency controls enabled. Projects without project region settings will use the standard OpenAI endpoint (`https://api.openai.com`) and support BYOK.
* Currently, BYOK in dbt supports OpenAI and Azure-hosted OpenAI API keys. Users enter their key through the [account settings](https://docs.getdbt.com/docs/cloud/account-settings.md), and requests made by Copilot or other AI features are billed directly to the customer by the respective provider.

info

The Copilot experience with BYOK and Azure OpenAI will not use metadata information in Insights, Canvas, or the Studio IDE. Without this contextual data, the LLM's responses may be suboptimal compared to those generated by the default dbt AI service. This is a temporary limitation, and we are working on an update that will enable the use of Azure OpenAI APIs.

If you choose to BYOK, we don't monitor or collect any data related to your usage.

Some of the reasons organizations require BYOK include:

* Regulatory and compliance demands (for example, keeping encryption keys or sensitive operations under customer control)
* Assurances about how and where data is processed
* Ability to negotiate and manage their own vendor contracts
* Ability to collect their own observability metrics

Note that BYOK is different from bring your own cloud (BYOC). BYOK refers to key or credential management, whereas BYOC refers to running software workloads in your own cloud environment.

##### Natural language in Canvas [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")Enterprise+[​](#natural-language-in-canvas- "Direct link to natural-language-in-canvas-")

Natural language in Canvas refers to the ability to build data models visually in Canvas using plain language prompts, powered by GenAI (Copilot). You can describe what you want to build or transform, and the tool generates the underlying SQL and transformation steps for you.
No SQL expertise is required. It's aimed at making data modeling more accessible to less-technical users or anyone who prefers a drag-and-drop or conversational interface over hand-coding SQL.

Natural language lets users translate business questions or transformation requests directly into data workflows. This accelerates the process of creating governed, production-ready models while maintaining best practices and version control. You can edit Canvas models collaboratively, and you can see both the graphical workflow and the SQL code it produces.

The natural language capability is fully integrated into the Canvas workspace. You can start with a blank model and generate models or transformation steps by specifying requirements in everyday language. Copilot interprets the request, constructs the model in the Canvas, and presents it visually, making it easy to refine, preview, and publish changes.

This approach is especially valuable for analysts and business users, allowing broader participation in data transformation tasks without losing dbt's governance, reproducibility, and code review processes.

#### Canvas [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")Enterprise+[​](#canvas- "Direct link to canvas-")

Canvas enables efficient data access and transformation through a visual interface, combining the benefits of code-driven development with AI-assisted code generation for a seamless, flexible experience.

#### dbt Insights [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")Enterprise+[​](#dbt-insights- "Direct link to dbt-insights-")

Insights is an interactive feature in dbt designed for writing, running, and analyzing SQL queries within an intuitive interface.
It brings together SQL query execution, results visualization, and integration with dbt metadata and documentation, all in one place.

It supports key features such as query history, the ability to export results to CSV, basic charting (for example, line and bar charts), and direct links to Catalog and the Studio IDE for a seamless workflow between exploration and development.

Analysts can quickly analyze metrics across data, while engineers can leverage context, metadata, and dbt lineage details to debug or validate data models.

You can save and share frequently used SQL queries, and explore documentation or data lineage as you work. Each query's results are, for now, limited to 500 rows (with plans to increase this).

The interface supports syntax highlighting, code completion, asset linking (to easily reference dbt models/tables), and connects to the Semantic Layer for querying metrics or columns by name.

While Insights supports some light visualizations and query sharing, it is not intended to replace BI tools for reporting or dashboarding. Instead, it's focused on fast ad hoc analysis and insight generation. Integrations allow users to "jump off" into downstream BI tools with their queries if needed.

#### dbt Mesh cross platform [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")Enterprise+[​](#dbt-mesh-cross-platform- "Direct link to dbt-mesh-cross-platform-")

dbt Mesh cross-platform (sometimes called "cross-platform Mesh" or "cross-platform dbt Mesh") is a capability in dbt Mesh that allows for referencing models and sharing lineage across multiple dbt projects, even when those projects use different data warehouse platforms.

#### SCIM [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")Enterprise+[​](#scim- "Direct link to scim-")

SCIM (System for Cross-Domain Identity Management) automates the management of user identities and groups, enhancing security and simplifying admin tasks.
It allows for real-time user provisioning, deprovisioning, and profile updates in dbt, primarily using Okta as the identity provider.

#### Hybrid projects [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")Enterprise+[​](#hybrid-projects- "Direct link to hybrid-projects-")

Hybrid projects refer to a setup where both dbt Core and dbt are utilized within the same organization, often working on the same codebase or data platform. This approach enables different teams or contributors to work in the environment that aligns best with their preferences or workflows, while still benefiting from shared assets and centralized metadata.

#### Enterprise security Enterprise+[​](#enterprise-security- "Direct link to enterprise-security-")

Enterprise security includes robust capabilities for managing network access and user permissions, designed to safeguard sensitive data. Two widely used features that support these efforts are PrivateLink and IP allowlisting.

##### PrivateLink[​](#privatelink "Direct link to PrivateLink")

PrivateLink provides a secure and private connection between dbt and your organization's environments, such as databases, version control systems, or data warehouses. This setup ensures that traffic remains within the AWS network, avoiding exposure to the public internet.

##### IP allowlist[​](#ip-allowlist "Direct link to IP allowlist")

IP restrictions (IP allowlist/blocklist) let organizations control which IPs can access their dbt account.

#### Projects and run slots[​](#projects-and-run-slots "Direct link to Projects and run slots")

The number of projects and run slots available to your organization varies based on your selected plan tier. For detailed information, please refer to our [pricing page](https://www.getdbt.com/pricing).

#### Upgrade plan[​](#upgrade-plan "Direct link to Upgrade plan")

dbt offers a range of plans with varying features to suit different organizational needs.
For information on the different plan types and upgrading your plan, refer to our document on [How to upgrade a dbt Cloud account](https://docs.getdbt.com/faqs/Accounts/cloud-upgrade-instructions.md).

#### Related content[​](#related-content "Direct link to Related content")

* [Billing](https://docs.getdbt.com/docs/cloud/billing.md)

---

### Add a seed file

1. Add a seed file:

   seeds/country_codes.csv

   ```text
   country_code,country_name
   US,United States
   CA,Canada
   GB,United Kingdom
   ...
   ```

2. Run `dbt seed`.
3. Reference the seed in a downstream model:

   models/something.sql

   ```sql
   select * from {{ ref('country_codes') }}
   ```

---

### Are the results of freshness stored anywhere?

Yes! The `dbt source freshness` command will output a pass/warning/error status for each table selected in the freshness snapshot.

Additionally, dbt will write the freshness results to a file in the `target/` directory called `sources.json` by default. You can also override this destination by passing the `-o` flag to the `dbt source freshness` command.

After enabling source freshness within a job, configure [Artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) in your **Project Details** page, which you can find by selecting your account name on the left side menu in dbt and clicking **Account settings**.
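For context, the tables included in the freshness snapshot are the ones with a `freshness` config on their source. A minimal sketch, assuming a hypothetical `jaffle_shop` source with an `_etl_loaded_at` timestamp column:

```yaml
# models/sources.yml (hypothetical source definition)
sources:
  - name: jaffle_shop
    loaded_at_field: _etl_loaded_at   # column used to measure freshness
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
```

Running `dbt source freshness` then records a pass/warn/error result for `jaffle_shop.orders` in `target/sources.json`.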
You can see the current status for source freshness by clicking **View Sources** in the job page.

---

### Are there any example dbt projects?

Yes!

* **Quickstart Tutorial:** You can build your own example dbt project in the [quickstart guide](https://docs.getdbt.com/docs/get-started-dbt.md)
* **Jaffle Shop:** A demonstration project (closely related to the tutorial) for a fictional e-commerce store ([main source code](https://github.com/dbt-labs/jaffle-shop) and [source code using duckdb](https://github.com/dbt-labs/jaffle_shop_duckdb))
* **GitLab:** GitLab's internal dbt project is open source and is a great example of how to use dbt at scale ([source code](https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt))
* **dummy-dbt:** A containerized dbt project that populates the Sakila database in Postgres and populates dbt seeds, models, snapshots, and tests. The project can be used for testing and experimentation purposes ([source code](https://github.com/gmyrianthous/dbt-dummy))
* **Google Analytics 4:** A demonstration project that transforms the Google Analytics 4 BigQuery exports to various models ([source code](https://github.com/stacktonic-com/stacktonic-dbt-example-project), [docs](https://stacktonic.com/article/google-analytics-big-query-and-dbt-a-dbt-example-project))
* **Make Open Data:** A production-grade ELT pipeline with tests, documentation, and CI/CD (GHA) about French open data (housing, demography, geography, etc). It can be used to learn with voluminous and ambiguous data.
  Contributions are welcome ([source code](https://github.com/make-open-data/make-open-data), [docs](https://make-open-data.fr/))

If you have an example project to add to this list, suggest an edit by clicking **Edit this page** below.

---

### Can I add tests and descriptions in a SQL config block?

dbt has the ability to define node configs in YAML files, in addition to `config()` blocks and `dbt_project.yml`. But the reverse isn't always true: there are some things in `.yml` files that can *only* be defined there.

Certain properties are special, because:

* They have a unique Jinja rendering context
* They create new project resources
* They don't make sense as hierarchical configuration
* They're older properties that haven't yet been redefined as configs

These properties are:

* [`description`](https://docs.getdbt.com/reference/resource-properties/description.md)
* [`tests`](https://docs.getdbt.com/reference/resource-properties/data-tests.md)
* [`docs`](https://docs.getdbt.com/reference/resource-configs/docs.md)
* `columns`
* [`quote`](https://docs.getdbt.com/reference/resource-properties/columns.md#quote)
* [`source` properties](https://docs.getdbt.com/reference/source-properties.md) (e.g. `loaded_at_field`, `freshness`)
* [`exposure` properties](https://docs.getdbt.com/reference/exposure-properties.md) (e.g. `type`, `maturity`)
* [`macro` properties](https://docs.getdbt.com/reference/resource-properties/arguments.md) (e.g. `arguments`)
---

### Can I build my models in a schema other than my target schema or split my models across multiple schemas?

Yes! Use the [schema](https://docs.getdbt.com/reference/resource-configs/schema.md) configuration in your `dbt_project.yml` file, or use a `config` block:

dbt_project.yml

```yml
name: jaffle_shop
...

models:
  jaffle_shop:
    marketing:
      +schema: marketing # models in the `models/marketing/` subdirectory will use the marketing schema
```

models/customers.sql

```sql
{{
  config(
    schema='core'
  )
}}
```

---

### Can I build my seeds in a schema other than my target schema or can I split my seeds across multiple schemas?

Yes! Use the [schema](https://docs.getdbt.com/reference/resource-configs/schema.md) configuration in your `dbt_project.yml` file.

dbt_project.yml

```yml
name: jaffle_shop
...

seeds:
  jaffle_shop:
    +schema: mappings # all seeds in this project will use the schema "mappings" by default
    marketing:
      +schema: marketing # seeds in the "seeds/marketing/" subdirectory will use the schema "marketing"
```
---

### Can I connect my dbt project to two databases?

The meaning of the term 'database' varies with each major warehouse manager. Hence, the answer to "can a dbt project connect to more than one database?" depends on the warehouse used in your tech stack.

* dbt projects connecting to warehouses like Snowflake or BigQuery, where one set of credentials can draw from all datasets or 'projects' available to an account, are *sometimes* said to connect to more than one database.
* dbt projects connecting to warehouses like Redshift and Postgres, which tie one set of credentials to one database, are said to connect to one database only.

Sidestep the 'one database problem' by relying on ELT thinking (i.e. extract -> load -> transform). Remember, dbt is not a loader: with few exceptions, it doesn't move data from sources to a warehouse. dbt is a transformer. It enters the picture after extractors and loaders have funneled sources into a warehouse, and it moves and combines data inside the warehouse itself.

Hence, instead of thinking "how do I connect my dbt project to two databases?", ask "what loader services will best prepare our warehouse for dbt transformations?"

For more on the modern 'ELT-powered' data stack, see the "dbt and the modern BI stack" section of this [dbt blog post](https://blog.getdbt.com/what-exactly-is-dbt).
---

### Can I define private packages in the dependencies.yml file?

It depends on how you're accessing your private packages:

* If you're using [native private packages](https://docs.getdbt.com/docs/build/packages.md#native-private-packages), you can define them in the `dependencies.yml` file.
* If you're using the [git token method](https://docs.getdbt.com/docs/build/packages.md#git-token-method), you must define them in the `packages.yml` file instead of the `dependencies.yml` file. This is because conditional rendering (like Jinja-in-YAML) is not supported in `dependencies.yml`.

---

### Can I document things other than models, like sources, seeds, and snapshots?

Yes! You can document almost everything in your project using the `description:` key. Check out the reference docs on [descriptions](https://docs.getdbt.com/reference/resource-properties/description.md) for more info!

---

### Can I pay via invoice?

Currently, for Starter plans, self-service dbt payments must be made with a credit card, and by default they will be billed monthly based on the number of [active developer seats and usage](https://docs.getdbt.com/docs/cloud/billing.md).
We don't have any plans to do invoicing for self-service teams in the near future, but we *do* currently support invoices for companies on the **dbt Enterprise or Enterprise+ plans**. Feel free to [contact us](https://www.getdbt.com/contact) to build your Enterprise pricing.

---

### Can I set a different connection at the environment level?

dbt supports [Connections](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md#connection-management), available to all dbt users. Connections allow a different data platform connection per environment, eliminating the need to duplicate projects. Projects can only use multiple connections of the same warehouse type. Connections are reusable across projects and environments.

In dbt Core, you can maintain separate production and development environments through the use of [`targets`](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md) within a [profile](https://docs.getdbt.com/docs/local/profiles.yml.md). dbt Core users can define different targets in their `profiles.yml`, which means you can have targets for different data warehouses under the same profile.

---

### Can I set test failure thresholds?

You can use the `error_if` and `warn_if` configs to set custom failure thresholds in your tests.
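As a sketch of how those configs attach to a test in a model's YAML (the model and column names here are hypothetical):

```yaml
# models/orders.yml (hypothetical model)
models:
  - name: orders
    columns:
      - name: status
        data_tests:
          - not_null:
              config:
                warn_if: ">10"   # warn if more than 10 rows fail the test
                error_if: ">100" # error only if more than 100 rows fail
```

With this config, a run with 50 failing rows produces a warning rather than an error.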
For more details, see the [severity reference](https://docs.getdbt.com/reference/resource-configs/severity.md).

You can also try the following solutions:

* Setting the [severity](https://docs.getdbt.com/reference/resource-configs/severity.md) to `warn` or `error`
* Writing a [custom generic test](https://docs.getdbt.com/best-practices/writing-custom-generic-tests.md) that accepts a threshold argument ([example](https://discourse.getdbt.com/t/creating-an-error-threshold-for-schema-tests/966))

---

### Can I store my data tests in a directory other than the `tests` directory in my project?

By default, dbt expects your singular data test files to be located in the `tests` subdirectory of your project, and generic data test definitions to be located in `tests/generic` or `macros`.

To change this, update the [test-paths](https://docs.getdbt.com/reference/project-configs/test-paths.md) configuration in your `dbt_project.yml` file, like so:

dbt_project.yml

```yml
test-paths: ["my_cool_tests"]
```

Then, you can define generic data tests in `my_cool_tests/generic/`, and singular data tests everywhere else in `my_cool_tests/`.

---

### Can I store my models in a directory other than the `models` directory in my project?
By default, dbt expects the files defining your models to be located in the `models` subdirectory of your project.

To change this, update the [model-paths](https://docs.getdbt.com/reference/project-configs/model-paths.md) configuration in your `dbt_project.yml` file, like so:

dbt_project.yml

```yml
model-paths: ["transformations"]
```

---

### Can I store my seeds in a directory other than the `seeds` directory in my project?

By default, dbt expects your seed files to be located in the `seeds` subdirectory of your project.

To change this, update the [seed-paths](https://docs.getdbt.com/reference/project-configs/seed-paths.md) configuration in your `dbt_project.yml` file, like so:

dbt_project.yml

```yml
seed-paths: ["custom_seeds"]
```

---

### Can I store my snapshots in a directory other than the `snapshots` directory in my project?

By default, dbt expects your snapshot files to be located in the `snapshots` subdirectory of your project.

To change this, update the [snapshot-paths](https://docs.getdbt.com/reference/project-configs/snapshot-paths.md) configuration in your `dbt_project.yml` file, like so:

dbt_project.yml

```yml
snapshot-paths: ["custom_snapshots"]
```

Note that you cannot co-locate snapshots and models in the same directory.
---

### Can I test the uniqueness of two columns?

Yes, there are a few different options for testing the uniqueness of two columns.

Consider an orders table that contains records from multiple countries, and the combination of ID and country code is unique:

| order_id | country_code |
| -------- | ------------ |
| 1 | AU |
| 2 | AU |
| ... | ... |
| 1 | US |
| 2 | US |
| ... | ... |

Here are some approaches:

###### 1. Create a unique key in the model and test that[​](#1-create-a-unique-key-in-the-model-and-test-that "Direct link to 1. Create a unique key in the model and test that")

models/orders.sql

```sql
select
  country_code || '-' || order_id as surrogate_key,
  ...
```

models/orders.yml

```yml
models:
  - name: orders
    columns:
      - name: surrogate_key
        data_tests:
          - unique
```

###### 2. Test an expression[​](#2-test-an-expression "Direct link to 2. Test an expression")

models/orders.yml

```yml
models:
  - name: orders
    data_tests:
      - unique:
          arguments: # available in v1.10.5 and higher. Older versions can set this as the top-level property.
            column_name: "(country_code || '-' || order_id)"
```

###### 3. Use the `dbt_utils.unique_combination_of_columns` test[​](#3-use-the-dbt_utilsunique_combination_of_columns-test "Direct link to 3-use-the-dbt_utilsunique_combination_of_columns-test")

This is especially useful for large datasets since it is more performant. Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) for more information.
models/orders.yml

```yml
models:
  - name: orders
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          arguments: # available in v1.10.5 and higher. Older versions can set this as the top-level property.
            combination_of_columns:
              - country_code
              - order_id
```

---

### Can I use a YAML file extension?

No. At present, dbt will only search for files with a `.yml` file extension. In a future release of dbt, dbt will also search for files with a `.yaml` file extension.

---

### Can I use environment variables in my profile?

Yes! Check out the docs on [environment variables](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md) for more information.

---

### Can I use seeds to load raw data?

Seeds should **not** be used to load raw data (for example, large CSV exports from a production database).

Since seeds are version controlled, they are best suited to files that contain business-specific logic, for example a list of country codes or user IDs of employees.
Loading CSVs using dbt's seed functionality is not performant for large files. Consider using a different tool to load these CSVs into your data warehouse.

---

### Can self-hosted GitLab instances only be connected via dbt Enterprise plans?

Presently, yes: this is only available to Enterprise users, because the GitLab app redirect URL for auth can only be customized on an Enterprise plan. Check out our [pricing page](https://www.getdbt.com/pricing/) for more information, or feel free to [contact us](https://www.getdbt.com/contact) to discuss Enterprise pricing.

---

### Debug "Snapshot target is not a snapshot table" errors

If you see the following error when you try executing the snapshot command:

> Snapshot target is not a snapshot table (missing `dbt_scd_id`, `dbt_valid_from`, `dbt_valid_to`)

Double-check that you haven't inadvertently caused your snapshot to behave like a table materialization by setting its `materialized` config to `table`. Prior to dbt version 1.4, it was possible to have a snapshot like this:

```sql
{% snapshot snappy %}
    {{ config(materialized = 'table', ...) }}
    ...
{% endsnapshot %}
```

In that case, dbt silently treated the snapshot like a table (issuing `create or replace table ...` statements) instead of actually snapshotting data (SCD type 2 via `insert`/`merge` statements).

In dbt versions 1.4 and higher, dbt instead raises a parsing error that reads:

```text
A snapshot must have a materialized value of 'snapshot'
```

This tells you to change your `materialized` config to `snapshot`. But when you make that change, you might encounter an error message saying that certain fields, like `dbt_scd_id`, are missing. This error happens because, when dbt treated snapshots as tables, it didn't include the necessary [snapshot meta-fields](https://docs.getdbt.com/docs/build/snapshots.md#snapshot-meta-fields) in your target table. Since those meta-fields don't exist, dbt correctly identifies that you're trying to create a snapshot in a table that isn't actually a snapshot.

When this happens, you have to start from scratch: drop the "snapshot" that isn't a real snapshot table, then re-snapshot your source data as if it were the first time. `dbt snapshot` will then create a new snapshot and insert the snapshot meta-fields as expected.

---

### Do hooks run with seeds?

Yes! The following hooks are available:

* [pre-hooks & post-hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md)
* [on-run-start & on-run-end hooks](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md)

Configure these in your `dbt_project.yml` file.
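For example, a sketch of a seed post-hook in `dbt_project.yml` (the project name, role, and grant statement are hypothetical):

```yml
seeds:
  my_project: # hypothetical project name
    # run after each seed table is loaded
    +post-hook: "grant select on {{ this }} to role reporter"
```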
---

### Do hooks run with snapshots?

Yes! The following hooks are available for snapshots:

* [pre-hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md)
* [post-hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md)
* [on-run-start](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md)
* [on-run-end](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md)

---

### Do I need to add a YAML entry for a column for it to appear in the docs site?

Fortunately, no! dbt will introspect your warehouse to generate a list of columns in each relation and match it with the list of columns in your `.yml` files. As such, any undocumented columns will still appear in your documentation.

---

### Do I need to create my target schema before running dbt?

Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you.
---

### Do ref-able resource names need to be unique?

Within one project: yes! To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*.

A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace.

Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this.

---

### Does dbt offer extract and load functionality?

dbt is a transformation tool.
It is *not* designed for extract or load functionality, and dbt Labs strongly recommends against using dbt in this way. Support is not provided for extract or load functionality.

---

### Does my `.yml` file containing tests and descriptions need to be named `schema.yml`?

No! You can name this file whatever you want (including `whatever_you_want.yml`), so long as:

* The file is in your `models/` directory¹
* The file has a `.yml` extension

Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information.

¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory: `seeds/`, `snapshots/`, and `macros/` respectively.

---

### Does my operating system have prerequisites?

Your operating system may require pre-installation setup before installing dbt Core with pip. After downloading and installing any dependencies specific to your development environment, you can proceed with the [pip installation of dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md).

##### CentOS

CentOS requires Python and some other dependencies to successfully install and run dbt Core.
To install Python and other dependencies:

```shell
sudo yum install redhat-rpm-config gcc libffi-devel \
  python-devel openssl-devel
```

##### MacOS

MacOS requires Python 3.8 or higher to successfully install and run dbt Core. To check the Python version:

```shell
python --version
```

If you need a compatible version, you can download and install [Python version 3.9 or higher for MacOS](https://www.python.org/downloads/macos).

If your machine runs on an Apple M1 architecture, we recommend that you install dbt via [Rosetta](https://support.apple.com/en-us/HT211861). This is necessary for certain dependencies that are only supported on Intel processors.

##### Ubuntu/Debian

Ubuntu requires Python and other dependencies to successfully install and run dbt Core. To install Python and other dependencies:

```shell
sudo apt-get install git libpq-dev python-dev python3-pip
sudo apt-get remove python-cffi
sudo pip install --upgrade cffi
pip install cryptography~=3.4
```

##### Windows

Windows requires Python and git to successfully install and run dbt Core. Install [Git for Windows](https://git-scm.com/downloads) and [Python version 3.9 or higher for Windows](https://www.python.org/downloads/windows/).

For further questions, please see the [Python compatibility FAQ](https://docs.getdbt.com/faqs/Core/install-python-compatibility.md).

---

### Does the Cost Insights feature incur warehouse costs?
dbt issues lightweight, read-only queries against your warehouse to retrieve metadata and to power features such as Cost Insights. dbt scopes and filters these queries to minimize impact, and most customers see negligible costs (typically on the order of cents).

---

### Errors importing a repository on dbt project set up

If you don't see your repository listed, double-check that:

* Your repository is in a GitLab group you have access to. dbt will not read repos associated with a user.

If you do see your repository listed but are unable to import it successfully, double-check that:

* You are a maintainer of that repository. Only users with maintainer permissions can set up repository connections.

If you imported a repository using dbt's native integration with GitLab, you should be able to see that the clone strategy uses a `deploy_token`. If it relies on an SSH key instead, the repository was not set up using the native GitLab integration, but rather the generic git clone option. The repository must be reconnected in order to get the benefits of the native integration.
---

### GitLab token refresh message

When you connect dbt to a GitLab repository, GitLab automatically creates a [project access token](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html) in your GitLab repository in the background. dbt uses this token to send the job run status for CI jobs back to GitLab.

By default, the project access token follows a naming pattern: `dbt token for GitLab project: `. If you have multiple tokens in your repository, look for one that follows this pattern to identify the correct token used by dbt.

If you're receiving a "Refresh token" message, don't worry: dbt automatically refreshes this project access token for you, which means you never have to manually rotate it.

If you still experience any token refresh errors, try disconnecting and reconnecting the repository in your dbt project to refresh the token. For any issues, please reach out to the Support team and we'll be happy to help!

---

### How can I connect dbt to a Google Source repository?

Although we don't officially support Google Cloud as a git repository provider, the workaround below using the SSH URL method should help you to connect:

* First, "import" your repository into dbt using the SSH URL provided to you by GCP. That will look something like: `ssh://drew@fishtownanalytics.com@source.developers.google.com:2022/p/dbt-integration-tests/r/drew-debug`
* After importing the repo, you should see a public key generated by dbt for the repository. Copy that public key into a new SSH key for your user.
* After saving this SSH key, dbt should be able to read and write to this repo.
If you've tried the workaround above and are still having issues connecting, reach out to the Support team and we'll be happy to help!

---

### How can I consolidate projects in dbt?

Consolidating your dbt projects can be an enormous task, and there is no universal solution. But there are some common approaches to project consolidation in dbt that you can follow, depending on the scope of the work that needs to be done.

If you have multiple projects that contain production-worthy code, there are rarely straightforward solutions to merging them. Let's suppose you have `Main Project` and `Smaller Subset Project`.

#### Files and Folders

##### Git and the local directory

Reference these [git merge commands](https://gist.github.com/msrose/2feacb303035d11d2d05) to help complete the migration plan. Using the commands will help retain git commit history, but you might end up with duplicate folders called `models`, `tests`, and so on, and you will most likely still have to move files around manually.

Another option is to use an external code editor (for example, VS Code) to move files from the `Smaller Subset Project` to the `Main Project`. This is what internal dbt Labs experts recommend: it keeps you informed about what comes over to the main project and lets you make any minor tweaks to the folder hierarchy at the same time.
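As a sketch of the git-based approach, you can merge one repository's history into another with `--allow-unrelated-histories`, which retains the commit history of both. The repository names, file contents, and committer identity below are hypothetical stand-ins:

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# hypothetical stand-ins for the two existing repositories
git init -q main-project
git init -q smaller-subset-project
(cd smaller-subset-project \
  && echo "select 1 as id" > subset_model.sql \
  && git add . \
  && git -c user.email=dev@example.com -c user.name=dev commit -qm "subset model")
(cd main-project \
  && echo "select 2 as id" > main_model.sql \
  && git add . \
  && git -c user.email=dev@example.com -c user.name=dev commit -qm "main model")

# pull the subset project's history into the main project
cd main-project
git remote add subset ../smaller-subset-project
git fetch -q subset HEAD
git -c user.email=dev@example.com -c user.name=dev \
  merge -q --allow-unrelated-histories -m "merge subset project" FETCH_HEAD
```

After the merge, both files (and both histories) live in `main-project`; moving the incoming files into your preferred folder hierarchy (`models/`, `tests/`, and so on) happens in ordinary follow-up commits.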
##### Manual migration with multiple browser tabs

If you only have a couple of models or macros that you want to consolidate, copy the raw file contents from your git provider in `Smaller Subset Project`. Then, in the Studio IDE, paste the contents into a new file in your `Main Project`. Alternatively, you can download those files from your git provider (the `Smaller Subset Project` repo) and upload them back to your other repository (the `Main Project` repo).

This doesn't scale well and could bypass change controls, so it might only be a viable solution for organizations with only a few files.

#### Production jobs

If you have multiple projects with deployment environments deploying jobs, this poses another challenge. Assuming all the models from `Smaller Subset Project` can be consolidated into `Main Project`, the commands within your jobs will take on a new meaning. In lieu of refactoring your global job strategy at the same time, you can add tags to the incoming project models and use them in your job commands, with the help of node selection syntax.

`Main Project` job command example: `dbt build --exclude tag:smaller_subset_project`

`Smaller Subset Project` job command example: `dbt build --select tag:smaller_subset_project`

---

### How can I fix my .gitignore file?

A `.gitignore` file specifies which files git should intentionally ignore, or 'untrack'. dbt indicates untracked files in the project file explorer pane by putting the file or folder name in *italics*.
If you encounter issues like problems reverting changes, checking out or creating a new branch, or not being prompted to open a pull request after a commit in the Studio IDE, this usually indicates a problem with the [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) file. The file may be missing, or it may lack the entries required for dbt to work correctly.

The following sections describe how to fix the `.gitignore` file.

##### Fix in the Studio IDE

Adding the correct entries to your `.gitignore` file won't automatically remove (or 'untrack') files or folders that have already been tracked by git. The updated `.gitignore` only prevents new files or folders from being tracked. So you'll need to first fix the `.gitignore` file, then perform some additional git operations to untrack any incorrect files or folders.

1. Launch the Studio IDE into the project that is being fixed, by selecting **Develop** on the menu bar.
2. In your **File Explorer**, check to see if a `.gitignore` file exists at the root of your dbt project folder. If it doesn't exist, create a new file.
3. Open the new or existing `.gitignore` file, and add the following:

   ```bash
   target/
   dbt_packages/
   logs/
   # legacy -- renamed to dbt_packages in dbt v1
   dbt_modules/
   ```

   **Note:** You can place these lines anywhere in the file, as long as they're on separate lines. Each entry matches the directory and all of its nested files and folders. Avoid adding a trailing `'*'` to the lines, such as `target/*`. For more info on `.gitignore` syntax, refer to the [Git docs](https://git-scm.com/docs/gitignore).
4. Save the changes, but *don't commit*.
5. Restart the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen and selecting **Restart IDE**.
   [![Restart the IDE by clicking the three dots on the lower right or click on the Status bar](/img/docs/dbt-cloud/cloud-ide/restart-ide.png?v=2 "Restart the IDE by clicking the three dots on the lower right or click on the Status bar")](#)
6. Once the Studio IDE restarts, go to the **File Catalog** and delete the following files or folders (if they exist). No data will be lost:
   * `target`, `dbt_modules`, `dbt_packages`, `logs`
7. **Save** and then **Commit and sync** the changes.
8. Restart the Studio IDE again using the same procedure as step 5.
9. Once the Studio IDE restarts, use the **Create a pull request** (PR) button under the **Version Control** menu to start the process of integrating the changes.
10. When the git provider's website opens to a page with the new PR, follow the necessary steps to complete and merge the PR into the main branch of that repository.
    * **Note:** The 'main' branch might also be called 'master', 'dev', 'qa', 'prod', or something else depending on organizational naming conventions. The goal is to merge these changes into the root branch that all other development branches are created from.
11. Return to the Studio IDE and use the **Change Branch** button to switch to the main branch of the project.
12. Once the branch has changed, click the **Pull from remote** button to pull in all the changes.
13. Verify the changes by making sure the files/folders in the `.gitignore` file are in *italics*.

[![A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).](/img/docs/dbt-cloud/cloud-ide/gitignore-italics.png?v=2 "A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).")](#)
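If you're working from a local git client instead of the Studio IDE, the untracking steps above can be sketched with `git rm --cached`, which untracks already-committed folders without deleting them from disk. The demo repository and file contents below are hypothetical:

```shell
set -e
# hypothetical demo repo that has mistakenly committed a target/ folder
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir target && echo "-- compiled sql" > target/model.sql
git add . && git -c user.email=dev@example.com -c user.name=dev commit -qm "initial commit"

# 1. fix the .gitignore
printf 'target/\ndbt_packages/\nlogs/\ndbt_modules/\n' > .gitignore

# 2. untrack the folders that were already committed (files stay on disk)
git rm -r -q --cached --ignore-unmatch target/ dbt_packages/ logs/ dbt_modules/

# 3. commit the .gitignore fix along with the removals
git add .gitignore
git -c user.email=dev@example.com -c user.name=dev commit -qm "stop tracking build artifacts"
```

The `--ignore-unmatch` flag lets the same command work whether or not each folder was ever tracked.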
##### Fix in the git provider

Sometimes it's necessary to use the git provider's web interface to fix a broken `.gitignore` file. Although the specific steps may vary across providers, the general process remains the same. There are two options for this approach: editing the main branch directly if allowed, or creating a pull request to implement the changes if required.

**Edit in the main branch**

When permissions allow it, you can edit the `.gitignore` directly on the main branch of your repo:

1. Go to your repository's web interface.
2. Switch to the main branch and the root directory of your dbt project.
3. Find the `.gitignore` file. Create a blank one if it doesn't exist.
4. Edit the file in the web interface, adding the following entries:

   ```bash
   target/
   dbt_packages/
   logs/
   # legacy -- renamed to dbt_packages in dbt v1
   dbt_modules/
   ```

5. Commit (save) the file.
6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost:
   * `target`, `dbt_modules`, `dbt_packages`, `logs`
7. Commit (save) the deletions to the main branch.
8. Switch to the Studio IDE and open the project that you're fixing.
9. [Rollback your repo to remote](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide) in the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen, then selecting **Rollback to remote**.
   * **Note:** Rollback to remote resets your repo back to an earlier clone from your remote. Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt.
10. Once you rollback to remote, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch.
11. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries, and make sure the untracked files/folders in the `.gitignore` file are in *italics*.
12. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development!

**Unable to edit the main branch**

If you can't edit the `.gitignore` directly on the main branch of your repo, follow these steps:

1. Go to your repository's web interface.
2. Switch to an existing development branch, or create a new branch just for these changes (this is often faster and cleaner).
3. Find the `.gitignore` file. Create a blank one if it doesn't exist.
4. Edit the file in the web interface, adding the following entries:

   ```bash
   target/
   dbt_packages/
   logs/
   # legacy -- renamed to dbt_packages in dbt v1
   dbt_modules/
   ```

5. Commit (save) the file.
6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost:
   * `target`, `dbt_modules`, `dbt_packages`, `logs`
7. Commit (save) the deletions.
8. Open a merge request using the git provider's web interface. The merge request should merge the changes into the 'main' branch that all development branches are created from.
9. Follow the necessary procedures to get the branch approved and merged into the 'main' branch. You can delete the branch after the merge is complete.
10. Once the merge is complete, go back to the Studio IDE and open the project that you're fixing.
11. [Rollback your repo to remote](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide) in the Studio IDE by clicking on the three dots next to the **Studio IDE Status** button on the lower right corner of the Studio IDE screen, then selecting **Rollback to remote**.
    * **Note:** Rollback to remote resets your repo back to an earlier clone from your remote. Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt.
12. Once you rollback to remote, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch.
13. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries, and make sure the untracked files/folders in the `.gitignore` file are in *italics*.
14. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development!

For additional guidance, refer to this [detailed video](https://www.loom.com/share/9b3b8e2b617f41a8bad76ec7e42dd014).

---

### How can I see the SQL that dbt is running?
To check out the SQL that dbt is running, you can look in:

* dbt:
  * Within the run output, click on a model name, and then select **Details**
* dbt Core:
  * The `target/compiled/` directory for compiled `select` statements
  * The `target/run/` directory for compiled `create` statements
  * The `logs/dbt.log` file for verbose logging

---

### How can I set up the right permissions in BigQuery?

To use service account impersonation, first create the service account you want to impersonate. Then grant the users that should be able to impersonate this service account the `roles/iam.serviceAccountTokenCreator` role on the service account resource. You also need to grant the service account the same role on itself. This allows it to create short-lived tokens identifying itself, and allows your human users (or other service accounts) to do the same. More information on this scenario is available [in the Google Cloud docs](https://cloud.google.com/iam/docs/understanding-service-accounts#directly_impersonating_a_service_account).

Once you've granted the appropriate permissions, you'll need to enable the [IAM Service Account Credentials API](https://console.cloud.google.com/apis/library/iamcredentials.googleapis.com). Enabling the API and granting the role are eventually consistent operations: they can take up to 7 minutes to fully complete, but usually propagate within 60 seconds. Give it a few minutes, then add the `impersonate_service_account` option to your BigQuery profile configuration.
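As a sketch, the option lands in your `profiles.yml` like this (the profile, project, dataset, and service account names are hypothetical):

```yml
my_bigquery_profile: # hypothetical profile name
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: dbt_dev
      threads: 4
      impersonate_service_account: dbt-runner@my-gcp-project.iam.gserviceaccount.com
```

With this in place, dbt authenticates as you (via `oauth`) and then runs queries using short-lived credentials for the service account.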

---

### How can I update my billing information?

If you want to change your account's credit card details, go to the left side panel, click **Account settings** → **Billing** → scroll to **Payment information**. Enter the new credit card details in the respective fields, then click **Update payment information**. Only the *account owner* can make this change.

To change your billing name or location address, send our Support team a message at support@getdbt.com with the newly updated information, and we can make that change for you!

---

### How can we move our project from a managed repository to a self-hosted repository?

dbt Labs can send your managed repository through a ZIP file in its current state for you to push up to a git provider. After that, you'd just need to update the [repo in your project](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) to point to the new repository.

When you're ready to do this, [contact the dbt Labs Support team](mailto:support@getdbt.com) with your request and your managed repo URL, which you can find by navigating to your project settings. To find project settings:

1. From dbt, click on your account name in the left side menu and select **Account settings**.
2. Click **Projects**, and then select your project.
3.
Under **Repository** in the project details page, you can find your managed repo URL.

---

### How did dbt choose which schema to build my models in?

By default, dbt builds models in your target schema. To change your target schema:

* If you're developing in **dbt**, this is set for each user when you first use a development environment.
* If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file.

If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md).

Note: on BigQuery, `dataset` is used interchangeably with `schema`.

---

### How do I access documentation in dbt Catalog?

If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state.

Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project.

dbt Developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but doesn't offer the same speed, metadata, or visibility as Catalog.

---

### How do I build one seed at a time?

You can use a `--select` option with the `dbt seed` command, like so:

```shell
$ dbt seed --select country_codes
```

There is also an `--exclude` option. Check out more in the [model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md) documentation.

---

### How do I change a user license type to read-only in dbt?

To change the license type for a user from `developer` to `read-only` or `IT` in dbt, you must be an account owner or have admin privileges. You might make this change to free up a billable seat while retaining the user's access to view the information in the dbt account.

1. From dbt, click on your account name in the left side menu and select **Account settings**.

   ![Navigate to account settings](/img/docs/dbt-cloud/Navigate-to-account-settings.png?v=2 "Navigate to account settings")

2. In **Account Settings**, select **Users** under **Teams**.
3. Select the user you want to remove and click **Edit** at the bottom of their profile.
4.
For the **License** option, choose **Read-only** or **IT** (from **Developer**), and click **Save**.

![Change user's license type](/img/docs/dbt-cloud/change_user_to_read_only_20221023.gif?v=2 "Change user's license type")

**License types override group permissions.** User license types always override their assigned group permission sets. For example, a user with a Read-Only license cannot perform administrative actions, even if they belong to an Account Admin group. This ensures that license restrictions are always enforced, regardless of group membership.

---

### How do I create dependencies between models?

When you use the `ref` [function](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md), dbt automatically infers the dependencies between models. For example, consider a model, `customer_orders`, like so:

models/customer_orders.sql

```sql
select
    customer_id,
    min(order_date) as first_order_date,
    max(order_date) as most_recent_order_date,
    count(order_id) as number_of_orders
from {{ ref('stg_orders') }}
group by 1
```

**There's no need to explicitly define these dependencies.** dbt will understand that the `stg_orders` model needs to be built before the above model (`customer_orders`). When you execute `dbt run`, you will see these being built in order:

```txt
$ dbt run
Running with dbt=1.9.0
Found 2 models, 28 data tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 0 seed files, 3 sources

11:42:52 | Concurrency: 8 threads (target='dev_snowflake')
11:42:52 |
11:42:52 | 1 of 2 START sql view model dbt_claire.stg_jaffle_shop__orders .......
```
```txt
[RUN]
11:42:55 | 1 of 2 OK created sql view model dbt_claire.stg_jaffle_shop__orders .. [CREATE VIEW in 2.50s]
11:42:55 | 2 of 2 START sql view model dbt_claire.customer_orders .............. [RUN]
11:42:56 | 2 of 2 OK created sql view model dbt_claire.customer_orders ......... [CREATE VIEW in 0.60s]
11:42:56 | Finished running 2 view models in 15.13s.

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
```

To learn more about building a dbt project, we recommend you complete the [quickstart guide](https://docs.getdbt.com/guides.md).

---

### How do I debug my Jinja?

You should get familiar with checking the compiled SQL in `target/compiled/` and the logs in `logs/dbt.log` to see what dbt is running behind the scenes.

You can also use the [log](https://docs.getdbt.com/reference/dbt-jinja-functions/log.md) function to debug Jinja by printing objects to the command line.

---

### How do I define a column type?

Your warehouse's SQL engine automatically assigns a [datatype](https://www.w3schools.com/sql/sql_datatypes.asp) to every column, whether it's found in a source or model.

To force SQL to treat a column as a certain datatype, use `cast` functions:

models/order_prices.sql

```sql
select
    cast(order_id as integer),
    cast(order_price as numeric(6,2)) -- a more generic way of doing type conversion
from {{ ref('stg_orders') }}
```

Many modern data warehouses now support `::` syntax as a shorthand for `cast(<column> as <type>)`:

models/orders_prices_colon_syntax.sql

```sql
select
    order_id::integer,
    order_price::numeric(6,2) -- you might find this in Redshift, Snowflake, and Postgres
from {{ ref('stg_orders') }}
```

Be warned: reading in data and casting it may not always yield expected results, and every warehouse has its own subtleties. Certain casts may not be allowed (for example, on BigQuery, you can't cast a `boolean`-type value to a `float64`). Casts that involve a loss of precision (for example, `float` to `integer`) rely on your SQL engine to make a best guess or follow a rule that competing services may not share. When performing casts, it's imperative that you are familiar with your warehouse's casting rules so you can best label fields in your sources and models. Thankfully, popular database services tend to have type docs; see [Redshift](https://docs.amazonaws.cn/en_us/redshift/latest/dg/r_CAST_function.html) and [BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_rules).

---

### How do I delete a project in dbt?

To delete a project in dbt, you must be the account owner or have admin privileges.

1. From dbt, click on your account name in the left side menu and select **Account settings**.
   ![Navigate to account settings](/img/docs/dbt-cloud/Navigate-to-account-settings.png?v=2 "Navigate to account settings")

2. In **Account Settings**, select **Projects**. Click the project you want to delete from the **Projects** page.
3. Click the edit icon in the lower right-hand corner of the **Project Details**. A **Delete** option appears on the left side of the same details view.
4. Click **Delete**. Confirm the action to delete the project without additional password prompts. The project is deleted immediately after confirmation, and this action cannot be undone.

![Delete projects](/img/docs/dbt-cloud/delete_projects_from_dbt_cloud.png?v=2 "Delete projects")

---

### How do I delete a user in dbt?

To delete a user in dbt, you must be an account owner or have admin privileges. If the user has a `developer` license type, deleting them will open up their seat for another user or allow the admins to lower the total number of seats.

1. From dbt, click on your account name in the left side menu and select **Account settings**.

   ![Navigate to account settings](/img/docs/dbt-cloud/Navigate-to-account-settings.png?v=2 "Navigate to account settings")

2. In **Account settings**, select **Users** under **Teams**.
3. Select the user you want to delete, then click **Edit**.
4. Click **Delete** in the bottom left. Click **Confirm Delete** to immediately delete the user without additional password prompts. This action cannot be undone. However, you can re-invite the user with the same information if the deletion was made in error.
![Deleting a user](/img/docs/dbt-cloud/delete_user.png?v=2 "Deleting a user")

If you are on a **Starter** plan and you're deleting users to reduce the number of billable seats, follow these steps to lower the license count and avoid being overcharged:

1. In **Account Settings**, select **Billing**.
2. Under **Billing details**, enter the number of developer seats you want and make sure you fill in all the payment details, including the **Billing address** section. If you leave any field blank, you won't be able to save your changes.
3. Click **Update Payment Information** to save your changes.

![Navigate to Account settings -> Users to modify dbt users](/img/docs/dbt-cloud/faq-account-settings-billing.png?v=2 "Navigate to Account settings -> Users to modify dbt users")

#### Related docs

* [dbt licenses](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#licenses)

---

### How do I document macros?

To document macros, use a [properties file](https://docs.getdbt.com/reference/macro-properties.md) and nest the configurations under a `macros:` key.

#### Example

macros/properties.yml

```yml
macros:
  - name: cents_to_dollars
    description: A macro to convert cents to dollars
    arguments:
      - name: column_name
        type: column
        description: The name of the column you want to convert
      - name: precision
        type: integer
        description: Number of decimal places. Defaults to 2.
```
**Tip:** From dbt Core v1.10, you can opt into validating the arguments you define in macro documentation using the `validate_macro_args` behavior change flag. When enabled, dbt will:

* Infer arguments from the macro and include them in the [manifest.json](https://docs.getdbt.com/reference/artifacts/manifest-json.md) file if no arguments are documented.
* Raise a warning if documented argument names don't match the macro definition.
* Raise a warning if `type` fields don't follow [supported formats](https://docs.getdbt.com/reference/resource-properties/arguments.md#supported-types).

Learn more about [macro argument validation](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#macro-argument-validation).

#### Document a custom materialization

When you create a [custom materialization](https://docs.getdbt.com/guides/create-new-materializations.md), dbt creates an associated macro with the following format:

```text
materialization_{materialization_name}_{adapter}
```

To document a custom materialization, use the previously mentioned format to determine the associated macro name(s) to document.

macros/properties.yml

```yaml
macros:
  - name: materialization_my_materialization_name_default
    description: A custom materialization to insert records into an append-only table and track when they were added.
  - name: materialization_my_materialization_name_xyz
    description: A custom materialization to insert records into an append-only table and track when they were added.
```

---

### How do I exclude a table from a freshness snapshot?
Some tables in a data source may be updated infrequently. If you've set a `freshness` property at the source level, this table is likely to fail checks. To work around this, you can set the table's freshness to null (`freshness: null`) to "unset" the freshness for a particular table:

models/<filename>.yml

```yaml
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    config:
      freshness:
        warn_after: {count: 12, period: hour}
        error_after: {count: 24, period: hour}
      loaded_at_field: _etl_loaded_at
    tables:
      - name: orders
      - name: product_skus
        config:
          freshness: null # do not check freshness for this table
```

---

### How do I load data into my warehouse?

dbt assumes that you already have a copy of your data in your data warehouse. We recommend you use an off-the-shelf tool like [Stitch](https://www.stitchdata.com/) or [Fivetran](https://fivetran.com/) to get data into your warehouse.

**Can dbt be used to load data?** No, dbt does not extract or load data. It focuses on the transformation step only.

---

### How do I populate the owner column in the generated docs?

You cannot change the `owner` column in your generated documentation.
dbt pulls the `owner` field in `dbt-docs` from database metadata ([catalog.json](https://docs.getdbt.com/reference/artifacts/catalog-json.md)), meaning the `owner` of that table in the database. With the exception of [exposures](https://docs.getdbt.com/docs/build/exposures.md), dbt does not pull this value from an `owner` field set within dbt.

Generally, dbt's database user owns the tables created in the database. The service responsible for ingesting or loading the data usually owns the source tables.

If you set `meta.owner`, that field appears under **meta** (pulled from dbt), but still not under the top-level `owner` field.

#### Example

The following example shows a model with `meta.owner` so it appears under **meta** in the docs. Replace `DATA_TEAM_EMAIL` with your own values.

models/stg_orders.yml

```yaml
models:
  - name: stg_orders
    description: "Staging table for order events."
    config:
      meta:
        owner: "DATA_TEAM_EMAIL"
    columns:
      - name: order_id
        description: "Primary key for orders."
      - name: order_date
        description: "Date when order was placed."
```

---

### How do I preserve leading zeros in a seed?

If you need to preserve leading zeros (for example, in a zipcode or mobile number), include the leading zeros in your seed file, and use the `column_types` [configuration](https://docs.getdbt.com/reference/resource-configs/column_types.md) with a varchar datatype of the correct length.
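As a sketch, assuming a hypothetical seed file `seeds/store_zipcodes.csv` with a `zipcode` column, the `dbt_project.yml` configuration could look like this (the project and seed names are illustrative):

```yaml
seeds:
  jaffle_shop: # your project name, as set in dbt_project.yml
    store_zipcodes: # hypothetical seed: seeds/store_zipcodes.csv
      +column_types:
        # without an explicit type, "02139" may be inferred as an integer and load as 2139
        zipcode: varchar(5)
```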

---

### How do I remove deleted models from my data warehouse?

If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from being a view or table to being ephemeral.)

When you remove models from your dbt project, you should manually drop the related relations from your schema.

---

### How do I run data tests on just my sources?

To run data tests on all sources, use the following command:

```shell
$ dbt test --select "source:*"
```

(You can also use the `-s` shorthand here instead of `--select`.)

To run data tests on one source (and all of its tables):

```shell
$ dbt test --select source:jaffle_shop
```

And, to run data tests on one source table only:

```shell
$ dbt test --select source:jaffle_shop.orders
```

---

### How do I run models downstream of a seed?
You can run models downstream of a seed using the [model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md), treating the seed like a model. For example, the following would run all models downstream of a seed named `country_codes`:

```shell
$ dbt run --select country_codes+
```

---

### How do I run models downstream of one source?

To run models downstream of a source, use the `source:` selector:

```shell
$ dbt run --select source:jaffle_shop+
```

(You can also use the `-s` shorthand here instead of `--select`.)

To run models downstream of one source table:

```shell
$ dbt run --select source:jaffle_shop.orders+
```

Check out the [model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md) for more examples!

---

### How do I run one model at a time?

To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model:

```shell
$ dbt run --select customers
```

Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples.

---

### How do I run one snapshot at a time?

To run one snapshot, use the `--select` flag, followed by the name of the snapshot:

```shell
$ dbt snapshot --select order_snapshot
```

Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples.

---

### How do I set a datatype for a column in my seed?

dbt will infer the datatype for each column based on the data in your CSV. You can also explicitly set a datatype using the `column_types` [configuration](https://docs.getdbt.com/reference/resource-configs/column_types.md) like so:

dbt_project.yml

```yml
seeds:
  jaffle_shop: # you must include the project name
    warehouse_locations:
      +column_types:
        zipcode: varchar(5)
```

---

### How do I snapshot freshness for one source only?

Use the `--select` flag to snapshot freshness for specific sources.
For example:

```shell
# Snapshot freshness for all Jaffle Shop tables:
$ dbt source freshness --select source:jaffle_shop

# Snapshot freshness for a particular source table:
$ dbt source freshness --select source:jaffle_shop.orders

# Snapshot freshness for multiple particular source tables:
$ dbt source freshness --select source:jaffle_shop.orders source:jaffle_shop.customers
```

See the [`source freshness` command reference](https://docs.getdbt.com/reference/commands/source.md) for more information.

---

### How do I specify column types?

Simply cast the column to the correct type in your model:

```sql
select
    id,
    created::timestamp as created
from some_other_table
```

You might have this question if you're used to running statements like this:

```sql
create table dbt_alice.my_table (
    id integer,
    created timestamp
);

insert into dbt_alice.my_table (
    select id, created from some_other_table
);
```

In comparison, dbt would build this table using a `create table as` statement:

```sql
create table dbt_alice.my_table as (
    select id, created from some_other_table
)
```

So long as your model queries return the correct column type, the table you create will also have the correct column type.

To define additional column options:

* Rather than enforcing uniqueness and not-null constraints on your column, use dbt's [data testing](https://docs.getdbt.com/docs/build/data-tests.md) functionality to check that your assertions about your model hold true.
* Rather than creating default values for a column, use SQL to express defaults (e.g. `coalesce(updated_at, current_timestamp()) as updated_at`).
* In edge-cases where you *do* need to alter a column (e.g.
column-level encoding on Redshift), consider implementing this via a [post-hook](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md).

---

### How do I test and document seeds?

To test and document seeds, use a [properties file](https://docs.getdbt.com/reference/configs-and-properties.md) and nest the configurations under a `seeds:` key.

#### Example

seeds/properties.yml

```yml
seeds:
  - name: country_codes
    description: A mapping of two letter country codes to country names
    columns:
      - name: country_code
        data_tests:
          - unique
          - not_null
      - name: country_name
        data_tests:
          - unique
          - not_null
```

---

### How do I test one model at a time?

Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model:

```shell
$ dbt test --select customers
```

Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular.

---

### How do I transfer account ownership to another user?

You can transfer your dbt [access control](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) to another user by following the steps below, depending on your dbt account plan:

| Account plan | Steps |
| --- | --- |
| **Developer** | You can transfer ownership by changing the email directly on your dbt profile page, which you can access using this URL when you replace `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan: `https://YOUR_ACCESS_URL/settings/profile`. Before doing this, please ensure that you unlink your GitHub profile. The email address of the new account owner cannot be associated with another dbt account. |
| **Starter** | Existing account admins with account access can add users to, or remove users from, the owner group. |
| **Enterprise or Enterprise+** | Account admins can add users to, or remove users from, a group with Account Admin permissions. |
| | **If all account owners left the company** | If the account owner has left your organization, you will need to work with *your* IT department to have incoming emails forwarded to the new account owner. Once your IT department has redirected the emails, you can request to reset the user password. Once you log in, you can change the email on the Profile page when you replace `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan: `https://YOUR_ACCESS_URL/settings/profile`. | Search table... | | | | | | | ---------------- | - | - | - | - | | Loading table... | | | | | When you make any account owner and email changes: * The new email address *must* be verified through our email verification process. * You can update any billing email address or [Notifications Settings](https://docs.getdbt.com/docs/deploy/job-notifications.md) to reflect the new account owner changes, if applicable. * When transferring account ownership, please ensure you [unlink](https://docs.getdbt.com/faqs/Accounts/git-account-in-use.md) your GitHub account in dbt. This is because you can only have your Git account linked to one dbt user account. #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### How do I troubleshoot if cost data isn't appearing? If cost data isn't appearing in Cost Insights, check the following: * Verify that platform metadata credentials are configured in your account settings and that the credential test is passing. For more information, see [Set up Cost Insights](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md#configure-platform-metadata-credentials). 
* Ensure you have one of the required permissions to view cost data. For more information, see [Assign required permissions](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md#assign-required-permissions). * Confirm that at least one job is running in a production environment. Cost data only appears after jobs have executed. * Cost data refreshes daily and reflects the previous day's usage, which means there is a lag of up to one day between when a job runs and when its cost data appears. If you just ran a job, wait until the next day to see the data. * After enabling Cost Insights, dbt looks back 10 days to build baselines for cost reduction calculations. If you don't see cost reduction data, ensure you have sufficient job history within the last 10 days. #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### How do I use the 'Custom Branch' settings in a dbt Environment? In dbt environments, you can change your git settings to use a different branch in your dbt project repositories besides the default branch. When you make this change, you run dbt on a custom branch. When specified, dbt executes models using the custom branch setting for that environment. Development and deployment environments have slightly different effects. To specify a custom branch: 1. Edit an existing environment or create a new one 2. Select **Only run on a custom branch** under General Settings 3. Specify the **branch name or tag** #### Development[​](#development "Direct link to Development") In a development environment, the primary branch (usually named `main`) is protected in your connected repositories. 
You can directly edit, format, or lint files and execute dbt commands in your protected default git branch. Since the Studio IDE prevents commits to the protected branch, you can commit those changes to a new branch when you're ready. Specifying a **Custom branch** overrides the default behavior. It makes the custom branch protected and enables you to create new development branches from it. You can directly edit, format, or lint files and execute dbt commands in your custom branch, but you cannot make commits to it. dbt prompts you to commit those changes to a new branch. Only one branch can be protected. If you specify a custom branch, the primary branch is no longer protected. If you want to protect the primary branch and prevent any commits on it, you need to set up branch protection rules in your git provider settings. This ensures your primary branch remains secure and no new commits can be made to it. For example, if you want to use the `develop` branch of a connected repository: 1. Go to an environment and click **Settings** > **Edit** to edit the environment. 2. Select **Only run on a custom branch** in **General settings**. 3. Enter **develop** as the name of your custom branch. 4. Click **Save**. [![Configuring a custom base repository branch](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/dev-environment-custom-branch.png?v=2 "Configuring a custom base repository branch")](#)Configuring a custom base repository branch #### Deployment[​](#deployment "Direct link to Deployment") When running jobs in a deployment environment, dbt will clone your project from your connected repository before executing your models. By default, dbt uses the default branch of your repository (commonly the `main` branch). To specify a different version of your project for dbt to execute during job runs in a particular environment, you can edit the Custom Branch setting as shown in the previous steps. #### Was this page helpful? 
YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### How do I write long-form explanations in my descriptions? If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions: ```yml models: - name: customers description: > Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ``` 2. Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions: ```yml models: - name: customers description: | ### Lorem ipsum * dolor sit amet, consectetur adipisicing elit, sed do eiusmod * tempor incididunt ut labore et dolore magna aliqua. ``` 3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### How does increasing job frequency affect cost reduction estimates? Cost reduction metrics reflect how dbt optimizes compute costs by reusing existing results instead of running the same model again. 
When you increase your job run frequency (for example, because performance improvements make it easier to schedule jobs more often), dbt has more opportunities to reuse models. As reuse increases, dbt optimizes more compute, which means your reported cost reductions may also increase. This metric shows the efficiency impact of reuse within your current workload. It reflects the compute costs that dbt reduces by reusing models instead of rebuilding them, rather than showing your total warehouse spend reduction. #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### How is state-aware orchestration different from using selectors in dbt Core? In dbt Core, running with the selectors `state:modified+` and `source_status:fresher+` builds models that either: * Have changed since the prior run (`state:modified+`) * Have upstream sources that are fresher than in the prior run (`source_status:fresher+`) Instead of relying only on these selectors and prior-run artifacts, state-aware orchestration decides whether to rebuild a model based on: * Compiled SQL diffs that ignore non-meaningful changes like whitespace and comments * Upstream data changes at runtime and model-level freshness settings * Shared state across jobs While dbt Core uses selectors like `state:modified+` and `source_status:fresher+` to decide what to build *only for a single run in a single job*, state-aware orchestration with Fusion maintains a *shared, real-time model state across every job in the environment* and uses that state to determine whether a model’s code or upstream data have actually changed before rebuilding. This ensures dbt only rebuilds models when something has changed, no matter which job runs them. 
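To build intuition for what a "compiled SQL diff that ignores non-meaningful changes" means, here is a minimal, hypothetical sketch (not dbt's real change-detection logic) that strips comments and collapses whitespace before comparing two compiled queries:

```python
import re

def normalize_sql(sql: str) -> str:
    """Crude normalization: drop comments, collapse whitespace, lowercase.

    Illustrative toy only -- dbt's actual implementation is more sophisticated.
    """
    sql = re.sub(r"--[^\n]*", "", sql)               # strip line comments
    sql = re.sub(r"/\*.*?\*/", "", sql, flags=re.S)  # strip block comments
    return re.sub(r"\s+", " ", sql).strip().lower()  # collapse whitespace

def is_meaningful_change(old_sql: str, new_sql: str) -> bool:
    """True only if the queries differ after normalization."""
    return normalize_sql(old_sql) != normalize_sql(new_sql)

old = "SELECT id, name FROM customers  -- pull all customers"
new = """
select id,
       name
from customers  /* reformatted, same logic */
"""
print(is_meaningful_change(old, new))  # whitespace/comment-only edit: False
```

Under this kind of comparison, reformatting a model or editing its comments would not trigger a rebuild, while an actual change to the query logic would.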
---

### How often is cost data refreshed?

Cost data refreshes daily and reflects the previous day's usage. This means there is a lag of up to one day between when a job runs and when its cost data appears in Cost Insights.

---

### How often should I run the snapshot command?

Snapshots are a batch-based approach to [change data capture](https://en.wikipedia.org/wiki/Change_data_capture). The `dbt snapshot` command must be run on a schedule to ensure that changes to tables are actually recorded! While individual use cases may vary, snapshots are intended to be run between hourly and daily. If you find yourself snapshotting more frequently than that, consider whether there's a more appropriate way to capture changes in your source data tables.

---

### How should I structure my project?

There's no one best way to structure a project! Every organization is unique.
If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md). #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### How to delete a job or environment in dbt? To delete an environment or job in dbt, you must have a `developer` [license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) and have the necessary [access permissions](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md). ##### Delete a job[​](#delete-a-job "Direct link to Delete a job") To delete a job or multiple jobs in dbt: 1. Click **Deploy** on the navigation header. 2. Click **Jobs** and select the job you want to delete. 3. Click **Settings** on the top right of the page and then click **Edit**. 4. Scroll to the bottom of the page and click **Delete job** to delete the job.
[![Delete a job](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/delete-job.png?v=2 "Delete a job")](#)Delete a job 5. Confirm your action in the pop-up by clicking **Confirm delete** in the bottom right to delete the job immediately. This action cannot be undone. However, you can create a new job with the same information if the deletion was made in error. 6. Refresh the page, and the deleted job should now be gone. If you want to delete multiple jobs, you'll need to perform these steps for each job. If you're having any issues, feel free to [contact us](mailto:support@getdbt.com) for additional help. ##### Delete an environment[​](#delete-an-environment "Direct link to Delete an environment") Deleting an environment automatically deletes its associated job(s). If you want to keep those jobs, move them to a different environment first. Follow these steps to delete an environment in dbt: 1. Click **Deploy** on the navigation header and then click **Environments** 2. Select the environment you want to delete. 3. Click **Settings** on the top right of the page and then click **Edit**. 4. Scroll to the bottom of the page and click **Delete** to delete the environment. [![Delete an environment](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/delete-environment.png?v=2 "Delete an environment")](#)Delete an environment 5. Confirm your action in the pop-up by clicking **Confirm delete** in the bottom right to delete the environment immediately. This action cannot be undone. However, you can create a new environment with the same information if the deletion was made in error. 6. Refresh your page and the deleted environment should now be gone. To delete multiple environments, you'll need to perform these steps to delete each one. If you're having any issues, feel free to [contact us](mailto:support@getdbt.com) for additional help. #### Was this page helpful? 
---

### How to generate HAR files

HTTP Archive (HAR) files are used to gather data from users' browsers, which dbt Support uses to troubleshoot network or resource issues. The data includes detailed timing information about the requests made between the browser and the server. The following sections describe how to generate HAR files using common browsers such as [Google Chrome](#google-chrome), [Mozilla Firefox](#mozilla-firefox), [Apple Safari](#apple-safari), and [Microsoft Edge](#microsoft-edge).

**Info:** Remove or hide any confidential or personally identifying information before you send the HAR file to dbt Labs. You can edit the file using a text editor.

##### Google Chrome[​](#google-chrome "Direct link to Google Chrome")

1. Open Google Chrome.
2. Click on **View** --> **Developer Tools**.
3. Select the **Network** tab.
4. Ensure that Google Chrome is recording. A red button (🔴) indicates that a recording is already in progress. Otherwise, click **Record network log**.
5. Select **Preserve Log**.
6. Clear any existing logs by clicking **Clear network log** (🚫).
7. Go to the page where the issue occurred and reproduce the issue.
8. Click **Export HAR** (the down arrow icon) to export the file as HAR. The icon is located on the same row as the **Clear network log** button.
9. Save the HAR file.
10. Upload the HAR file to the dbt Support ticket thread.

##### Mozilla Firefox[​](#mozilla-firefox "Direct link to Mozilla Firefox")

1. Open Firefox.
2. Click the application menu and then **More tools** --> **Web Developer Tools**.
3. In the developer tools docked tab, select **Network**.
4. Go to the page where the issue occurred and reproduce the issue.
The page automatically starts recording as you navigate.
5. When you're finished, click **Pause/Resume recording network log**.
6. Right-click anywhere in the **File** column and select **Save All as HAR**.
7. Save the HAR file.
8. Upload the HAR file to the dbt Support ticket thread.

##### Apple Safari[​](#apple-safari "Direct link to Apple Safari")

1. Open Safari.
2. If the **Develop** menu doesn't appear in the menu bar, go to **Safari** and then **Settings**.
3. Click **Advanced**.
4. Select the **Show features for web developers** checkbox.
5. From the **Develop** menu, select **Show Web Inspector**.
6. Click the **Network** tab.
7. Go to the page where the issue occurred and reproduce the issue.
8. When you're finished, click **Export**.
9. Save the file.
10. Upload the HAR file to the dbt Support ticket thread.

##### Microsoft Edge[​](#microsoft-edge "Direct link to Microsoft Edge")

1. Open Microsoft Edge.
2. Click the **Settings and more** menu (...) to the right of the toolbar and then select **More tools** --> **Developer tools**.
3. Click **Network**.
4. Ensure that Microsoft Edge is recording. A red button (🔴) indicates that a recording is already in progress. Otherwise, click **Record network log**.
5. Go to the page where the issue occurred and reproduce the issue.
6. When you're finished, click **Stop recording network log**.
7. Click **Export HAR** (the down arrow icon) or press **Ctrl + S** to export the file as HAR.
8. Save the HAR file.
9. Upload the HAR file to the dbt Support ticket thread.

##### Additional resources[​](#additional-resources "Direct link to Additional resources")

Check out the [How to generate a HAR file in Chrome](https://www.loom.com/share/cabdb7be338243f188eb619b4d1d79ca) video for a visual guide on how to generate HAR files in Chrome.

---

### How to migrate git providers

To migrate from one git provider to another, follow these steps to minimize disruption:

1. Outside of dbt, import your existing repository into your new provider. By default, connecting your repository in one account won't automatically disconnect it from another account. For example, if you're migrating from GitHub to Azure DevOps, you'll need to import your existing repository (GitHub) into your new Git provider (Azure DevOps). For detailed steps on how to do this, refer to your Git provider's documentation (such as [GitHub](https://docs.github.com/en/migrations/importing-source-code/using-github-importer/importing-a-repository-with-github-importer), [GitLab](https://docs.gitlab.com/ee/user/project/import/repo_by_url.html), or [Azure DevOps](https://learn.microsoft.com/en-us/azure/devops/repos/git/import-git-repository?view=azure-devops)).
2. Go back to dbt and set up your [integration for the new Git provider](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md), if needed.
3. Disconnect the old repository in dbt by going to **Account Settings** and then **Projects**.
4. Click on the **Repository** link, then click **Edit** and **Disconnect**.
[![Disconnect and reconnect your Git repository in your dbt Account settings page.](/img/docs/dbt-cloud/disconnect-repo.png?v=2 "Disconnect and reconnect your Git repository in your dbt Account settings page.")](#)Disconnect and reconnect your Git repository in your dbt Account settings page.
5. Click **Confirm Disconnect**.
6.
On the same page, connect to the new Git provider repository by clicking **Configure Repository**.
   * If you're using the native integration, you may need to authorize it via OAuth.
7. That's it, you should now be connected to the new Git provider! 🎉

As a tip, we recommend refreshing your page and the Studio IDE before performing any actions.

---

### How to upgrade a dbt account

dbt offers [several plans](https://www.getdbt.com/pricing/) with different features that meet your needs. This document is for dbt admins and explains how to select a plan in order to continue using dbt.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

Before you begin:

* You *must* be part of the [Owner](https://docs.getdbt.com/docs/cloud/manage-access/self-service-permissions.md) user group to make billing changes. Users not included in this group will not see these options.
* All amounts shown in dbt are in U.S. Dollars (USD).
* When your trial expires, your account's default plan enrollment will be a Starter plan.

#### Select a plan[​](#select-a-plan "Direct link to Select a plan")

When your [14-day trial](https://www.getdbt.com/signup/) ends or if your subscription payment is past due, you'll need to select a plan in order to continue using your account:

* Upon logging in, you should see an "Account locked" pop-up message with instructions to unlock your account and update your payment details.
* Click **Go to Billing** to go to the billing page.
* Under **Billing**, you can review the available dbt [plans](https://www.getdbt.com/pricing/) and their features.

To unlock your account and select a plan, review the following guidance per plan type:

##### Developer plan[​](#developer-plan "Direct link to Developer plan")

1. To select a Developer plan, click **Select plan** on the right.
2. Confirm your plan selection on the pop-up message.
3. This automatically unlocks your dbt account, and you can now enjoy the benefits of the Developer plan. 🎉

[![](/img/docs/dbt-cloud/downgrade-dev-flow.gif?v=2)](#)

###### Plan allocation[​](#plan-allocation "Direct link to Plan allocation")

If you select a plan but have too many seats or projects for that plan (for example, if you select the Developer plan but have more than one developer seat), you'll be directed to the users & projects pages to make edits.

##### Starter plan[​](#starter-plan "Direct link to Starter plan")

1. When your trial expires, your account's default plan enrollment will be a Starter plan.
2. To unlock your account and continue using the Starter plan, click **Select plan** under the Starter column.
3. Enter your payment information and seat purchases. Then click **Save**.
4. This automatically unlocks your dbt account, and you can now enjoy the benefits of the Starter plan. 🎉

[![](/img/docs/dbt-cloud/trial-team-flow.png?v=2)](#) [![](/img/docs/dbt-cloud/trial-team-payments-flow.png?v=2)](#)

##### Enterprise plan[​](#enterprise-plan "Direct link to Enterprise plan")

1.
If you're interested in one of our Enterprise-tier plans, select the Enterprise tab under **Billing**.
2. Click **Contact Sales** on the right. This opens a chat window for you to contact the dbt Support team, who will connect you to our Sales team.
3. Once you submit your request, our Sales team will contact you with more information.
[![](/img/docs/dbt-cloud/enterprise-upgrade.gif?v=2)](#)
4. Alternatively, you can [contact](https://www.getdbt.com/contact/) our Sales team directly to chat about how dbt can help you and your team.

#### Related questions[​](#related-questions "Direct link to Related questions")

For commonly asked billing questions, refer to the dbt [pricing page](https://www.getdbt.com/pricing/).

**How does billing work?**

Starter plans are billed monthly on the credit card used to sign up, based on [developer seat count and usage](https://docs.getdbt.com/docs/cloud/billing.md). You'll also be sent a monthly receipt to the billing email of your choice. You can change any billing information on your **Account Settings** > **Billing** page.

Enterprise-tier plan customers are billed annually based on the number of developer seats, as well as any additional services and features in your chosen plan.

**Can I upgrade or downgrade my plan?**

Yes, you can upgrade or downgrade at any time. Account Owners can access their dedicated billing section via the account settings page. If you're not sure which plan is right for you, get in touch and we'll be happy to help you find one that fits your needs.

**Can I pay by invoice?**

Currently, dbt Starter plan payments must be made with a credit card, and by default they will be billed monthly based on the number of [developer seats and usage](https://docs.getdbt.com/docs/cloud/billing.md). We don't have any plans to do invoicing for Starter plan accounts in the near future, but we do currently support invoices for companies on the dbt Enterprise-tier plan.
Feel free to [contact](https://www.getdbt.com/contact/) us to build your Enterprise pricing plan. Why did I receive a **Failed payment** error email? This means we were unable to charge the credit card you have on file, or you have not provided an updated card for payment. If you're a current account owner with a card on file, contact your credit card issuer to inquire as to why your card was declined or update the credit card on your account. Your Account Owner can update payment details in the **Account Settings** -> **Billing** page. Click **Edit** next to your card details, double check your information is up-to-date, and we'll give it another go at the next billing run. #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### I got an "unused model configurations" error message, what does this mean? You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more. #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### I need to use quotes to select from my source, what should I do? This is reasonably common on Snowflake in particular. 
By default, dbt will not quote the database, schema, or identifier for the source tables that you've specified. To force dbt to quote one of these values, use the [`quoting` property](https://docs.getdbt.com/reference/resource-properties/quoting.md):

models/<filename>.yml

```yaml
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    quoting:
      database: true
      schema: true
      identifier: true
    tables:
      - name: order_items
      - name: orders
        # This overrides the `jaffle_shop` quoting config
        quoting:
          identifier: false
```

---

### I'm getting a "Partial parsing enabled: 1 files deleted, 0 files added, 2 files changed" compilation error in dbt?

If you're receiving this error, try deleting the `target/partial_parse.msgpack` file from your project and refreshing your IDE.

If you've tried the workaround above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!

---

### I'm getting a "Session occupied" error in dbt CLI?

If you're receiving a `Session occupied` error in the dbt CLI, or if you're experiencing a long-running session, you can use the `dbt invocation list` command in a separate terminal window to view the status of your active session. This helps debug the issue and identify the arguments that are causing the long-running session.
To cancel an active session, use the `Ctrl + Z` shortcut. To learn more about the `dbt invocation` command, see the [dbt invocation command reference](https://docs.getdbt.com/reference/commands/invocation.md).

Alternatively, you can reattach to your existing session with `dbt reattach`, then press `Ctrl + C` and choose to cancel the invocation.

---

### I'm receiving a 'This run exceeded your account's run memory limits' error in my failed job

If you're receiving a `This run exceeded your account's run memory limits` error in your failed job, it means that the job exceeded the [memory limits](https://docs.getdbt.com/docs/deploy/job-scheduler.md#job-memory) set for your account. All dbt accounts have a pod memory of 600MiB, and memory limits apply on a per-run basis. Memory usage is typically influenced by the amount of result data that dbt has to ingest and process, which is small but can become bloated unexpectedly by project design choices.

##### Common reasons[​](#common-reasons "Direct link to Common reasons")

Some common reasons for higher memory usage are:

* dbt run/build: Macros that capture large result sets from run query may not all be necessary and may be memory inefficient.
* dbt docs generate: Source or model schemas with large numbers of tables (even if those tables aren't all used by dbt) cause the ingest of very large results for catalog queries.

##### Resolution[​](#resolution "Direct link to Resolution")

There are various reasons why you could be experiencing this error, but they are mostly the outcome of retrieving too much data back into dbt. For example, using `run_query()` operations or similar macros, or even using databases/schemas that have a lot of other non-dbt related tables/views.

Try to reduce the amount of data (number of rows) retrieved back into dbt by refactoring the SQL in your `run_query()` operation using `group`, `where`, or `limit` clauses. Additionally, you can use a database/schema with fewer non-dbt related tables/views.

**Video example**

As an additional resource, check out [this example video](https://www.youtube.com/watch?v=sTqzNaFXiZ8), which demonstrates how to refactor the sample code by reducing the number of rows returned.

If you've tried the earlier suggestions and are still experiencing failed job runs with this error about hitting the memory limits of your account, please [reach out to support](mailto:support@getdbt.com). We're happy to help!

##### Additional resources[​](#additional-resources "Direct link to Additional resources")

* [Blog post on how we shaved 90 mins off](https://docs.getdbt.com/blog/how-we-shaved-90-minutes-off-model)

---

### I'm receiving a "Permission denied while getting Drive credential" error when trying to query from Google Drive?

If you're seeing the following error when you try to query a dataset from a Google Drive document in the Studio IDE, we'll do our best to get you unstuck with the steps below!

```text
Access denied: BigQuery BigQuery: Permission denied while getting Drive credentials
```

Usually, this error indicates that you haven't granted the BigQuery service account access to the specific Google Drive document.
If you're seeing this error, try granting the service account (the **Client email** field shown [here](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-bigquery.md)) that you use for your BigQuery connection in dbt permission to your Google Drive or Google Sheet. Do this directly in the Google document: click the **Share** button and enter the client email.

If you experience this error when using OAuth, and you have verified your access to the Google Sheet, you may need to grant permissions for gcloud to access Google Drive:

```text
gcloud auth application-default login --disable-quota-project
```

For more info, see the [gcloud auth application-default documentation](https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login).

If you've tried the earlier steps and are still experiencing this behavior, try using the following command to log into Google Cloud and enable access to Google Drive. It also updates the Application Default Credentials (ADC) file, which many Google Cloud libraries use to authenticate API calls.

```text
gcloud auth login --enable-gdrive-access --update-adc
```

For more info, refer to the [gcloud auth login documentation](https://cloud.google.com/sdk/gcloud/reference/auth/login#--enable-gdrive-access).

If you've tried the steps above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!

---

### I'm receiving a 403 error 'Forbidden: Access denied' when using service tokens

All [service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) traffic is subject to IP restrictions.
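If you call the API programmatically, you can distinguish this IP-restriction rejection from other 403s by checking the `account_access_denied` field in the response body. A minimal sketch (the helper name is ours, not a dbt SDK function):

```python
import json

# Hypothetical helper (not part of any dbt SDK): classify a 403 body
# returned by the dbt API when a service token's IP is not allowlisted.

def is_ip_allowlist_rejection(body: str) -> bool:
    """True when a 403 response indicates the caller's IP was blocked."""
    payload = json.loads(body)
    status = payload.get("status", {})
    data = payload.get("data", {})
    return (
        status.get("code") == 403
        and status.get("is_success") is False
        and data.get("account_access_denied") is True
    )

example = """{
  "status": {"code": 403, "is_success": false,
             "user_message": "Forbidden: Access denied",
             "developer_message": null},
  "data": {"account_access_denied": true}
}"""

print(is_ip_allowlist_rejection(example))
```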
When using a service token, the following 403 response error indicates the IP is not on the allowlist. To resolve this, add your third-party integration CIDRs (network addresses) to your allowlist. The following is an example of the 403 response error:

```json
{
  "status": {
    "code": 403,
    "is_success": false,
    "user_message": "Forbidden: Access denied",
    "developer_message": null
  },
  "data": {
    "account_id": null,
    "user_id": null,
    "is_service_token": null,
    "account_access_denied": true
  }
}
```

---

### I'm receiving a git rev-list master error in the IDE?

If you're unable to access the Studio IDE due to the following error message, these steps should help you get unstuck:

```shell
git rev-list master..origin/main --count
fatal: ambiguous argument 'master..origin/main': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git [...] -- [...]'
```

Usually this error indicates that the primary branch has been renamed, or that dbt was unable to determine which branch is your primary branch. No worries, we have a few workarounds for you to try:

**Workaround 1**

Take a look at your Environment Settings. If you **do not** have a custom branch filled in your Environment Settings:

1. Disconnect and reconnect your repository [connection](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) on your Project Settings page. This should allow dbt to pick up the new name of your primary branch (for example, `main`).
2. In the Environment Settings, set the custom branch to 'master' and refresh the Studio IDE.
**Workaround 2**

Take a look at your Environment Settings. If you **do** have a custom branch filled in your Environment Settings:

1. Disconnect and reconnect your repository [connection](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) on your Project Settings page. This should allow dbt to pick up the new name of your primary branch (for example, `main`).
2. In the Environment Settings, remove the custom branch and refresh the Studio IDE.

If you've tried the workarounds above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!

---

### I'm receiving a NoneType object has no attribute error in the IDE?

If you're unable to access the Studio IDE due to the following error message, these steps should help you get unstuck:

```shell
'NoneType' object has no attribute 'enumerate_fields'
```

Usually this error indicates that you tried connecting your database via [SSH tunnel](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-redshift.md#connecting-using-an-ssh-tunnel). If you're seeing this error, double-check that you have supplied the following items:

* the hostname
* the username
* the port of the bastion server

If you've tried the steps above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!
---

### I'm receiving a Runtime Error Could not find profile named 'user' error?

If you're unable to access the Studio IDE due to the following error message, these steps should help you get unstuck:

```shell
Running with dbt=1.9.0
Encountered an error while reading the project:
ERROR: Runtime Error
Could not find profile named 'user'
Runtime Error
Could not run dbt
```

Usually this error indicates missing or stale credentials. No worries, we have a few workarounds for you to try:

**In the Studio IDE:** If this is happening in the Studio IDE, navigate to the Profile settings where your development credentials are configured. Once there, re-enter or re-authorize your credentials to get around this error message.

**In a job:** If this is happening in a job, you may have changed the deployment environment in which the job is configured without re-entering your deployment credentials when saving those changes. To fix this, go back into the deployment environment settings, re-enter your credentials (either the private key/private key passphrase or the username and password), and kick off a new job run.

If you've tried the steps above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!
---

### I'm receiving a 'Your IDE session experienced an unknown error and was terminated. Please contact support' error

If you're seeing the following error when you launch the Studio IDE, it could be due to a few scenarios, but it commonly indicates a missing repository:

```shell
Your session experienced an unknown error and was terminated. Please contact support.
```

You can try to resolve this by adding a repository, such as a [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or your preferred Git account. To add your Git account, navigate to **Project** > **Repository** and select your repository. If you're still running into this error, please contact the Support team for help.

---

### I'm receiving a `Failed ALPN` error when trying to connect to the dbt Semantic Layer

If you're receiving a `Failed ALPN` error when trying to connect to the dbt Semantic Layer with the various [data integration tools](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) (such as Tableau, DBeaver, Datagrip, ADBC, or JDBC), it typically happens when connecting from a computer behind a corporate VPN or proxy (like Zscaler or Check Point). The root cause is typically the proxy interfering with the TLS handshake, as the Semantic Layer uses gRPC/HTTP2 for connectivity.
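To see where the handshake goes wrong, here's a sketch (not a dbt tool; the `diagnose` helper is ours) of what a healthy gRPC client does: it offers `h2` via ALPN during the TLS handshake and expects the server, not the proxy, to select it back.

```python
import ssl

# Sketch, not a dbt utility: a gRPC/HTTP2 client offers "h2" through
# ALPN; a proxy that strips the extension leaves nothing negotiated.

def make_grpc_client_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2"])  # gRPC requires HTTP/2
    return ctx

def diagnose(selected):
    """Interpret ssl_sock.selected_alpn_protocol() after a handshake."""
    if selected == "h2":
        return "ok: HTTP/2 negotiated end to end"
    if selected is None:
        return "failed: ALPN stripped, likely by an intervening proxy"
    return f"failed: downgraded to {selected!r} by an intervening proxy"

ctx = make_grpc_client_context()
print(diagnose(None))
```

When the proxy allows ALPN through, `selected_alpn_protocol()` returns `"h2"`; a `None` result matches the stripped-handshake failure described above.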
To resolve this:

* If your proxy supports gRPC/HTTP2 but isn't configured to allow ALPN, adjust its settings to allow ALPN, or create an exception for the dbt domain.
* If your proxy does not support gRPC/HTTP2, add an SSL interception exception for the dbt domain in your proxy settings.

This should help establish the connection without the `Failed ALPN` error.

---

### I'm seeing a 'GitHub and dbt latest permissions' error

If you see the error `This account needs to accept the latest permissions for the dbt GitHub App` in dbt, it usually means the permissions for the dbt GitHub App are out of date. To solve this issue, you'll need to update the permissions for the dbt GitHub App in your GitHub account. This FAQ shares a couple of ways you can do it.

#### Update permissions[​](#update-permissions "Direct link to Update permissions")

A GitHub organization admin will need to update the permissions in GitHub for the dbt GitHub App. If you're not the admin, reach out to your organization admin to request this.

1. Navigate to your GitHub account. Click on the top-right profile icon and then **Settings** (or your personal settings if using a non-organization account).
2. Go to **Integrations** and then select **Applications** to identify any necessary permission changes. Note that a GitHub repository admin may not see the same permission request.
![Navigate to Application settings to identify permission changes.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/github-applications.png?v=2)

3. Click **Review request** and then click the **Accept new permissions** button on the next page.

![Grant access to the dbt app by accepting the new permissions.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/github-review-request.png?v=2)

For more info on GitHub permissions, refer to [access permissions](https://docs.github.com/en/get-started/learning-about-github/access-permissions-on-github). Alternatively, try [disconnecting your GitHub account](#disconnect-github) in dbt, detailed in the following section.

#### Disconnect GitHub[​](#disconnect-github "Direct link to Disconnect GitHub")

Disconnect the GitHub and dbt integration in dbt.

1. In dbt, go to **Account Settings**.
2. In **Projects**, select the project experiencing the issue.
3. Click the repository link under **Repository**.
4. In the **Repository details** page, click **Edit**.
5. Click **Disconnect** to remove the GitHub integration.

![Disconnect and reconnect your git repository in your dbt Account settings pages.](/img/docs/dbt-cloud/disconnect-repo.png?v=2)

6. Click **Confirm Disconnect**.
7. Return to your **Project details** page and reconnect your repository by clicking the **Configure Repository** link.
8. Click **GitHub** and select your repository.
#### Support[​](#support "Direct link to Support")

If you've tried these workarounds and are still experiencing this behavior, reach out to the [dbt Support](mailto:support@getdbt.com) team and we'll be happy to help!

---

### I'm seeing a GitLab authentication out of date error loop

If you're seeing a 'GitLab Authentication is out of date' 500 server error page, this usually occurs when the deploy keys in the repository settings in dbt and GitLab do not match. No worries: this is a known issue the dbt Labs team is working on, and we have a few workarounds for you to try.

###### First workaround[​](#first-workaround "Direct link to First workaround")

1. Disconnect the repo from the project in dbt.
2. Go to GitLab and click on **Settings** > **Repository**.
3. Under Repository Settings, remove/revoke active dbt deploy tokens and deploy keys.
4. Attempt to reconnect your repository via dbt.
5. Check GitLab to make sure that the new deploy key has been added.
6. Once you've confirmed it's added, refresh dbt and try developing again.

###### Second workaround[​](#second-workaround "Direct link to Second workaround")

1. Keep the repo in the project as is; don't disconnect.
2. Copy the deploy key generated in dbt.
3. Go to GitLab and click on **Settings** > **Repository**.
4. Under Repository Settings, manually add the deploy key to your GitLab project (with the `Grant write permissions` box checked).
5. Go back to dbt, refresh your page, and try developing again.

If you've tried the workarounds above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!
---

### If I can name these files whatever I'd like, what should I name them?

It's up to you! Here are a few options:

* Default to the existing terminology: `schema.yml` (though this does make it hard to find the right file over time)
* Use the same name as your directory (assuming you're using sensible names for your directories)
* If you test and document one model (or seed, snapshot, macro, etc.) per file, you can give it the same name as the model (or seed, snapshot, macro, etc.)

Choose what works for your team. We have more recommendations in our guide on [structuring dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md).

---

### If I rerun dbt, will there be any downtime as models are rebuilt?

Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime). The implementation of this varies across warehouses; check out the [logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to see the SQL dbt is executing.
---

### If models can only be `select` statements, how do I insert records?

For those coming from an ETL (Extract Transform Load) paradigm, there's often a desire to write transformations as `insert` and `update` statements. In comparison, dbt will wrap your `select` query in a `create table as` statement, which can feel counter-productive.

* If you wish to use `insert` statements for performance reasons (i.e. to reduce the data that is processed), consider [incremental models](https://docs.getdbt.com/docs/build/incremental-models.md)
* If you wish to use `insert` statements because your source data is constantly changing (e.g. to create "Type 2 Slowly Changing Dimensions"), consider [snapshotting your source data](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness) and building models on top of your snapshots.

---

### My compiled SQL has a lot of spaces and new lines, how can I get rid of them?

This is known as "whitespace control". Use a minus sign (`-`, e.g. `{{- ... -}}`, `{%- ... %}`, `{#- ... -#}`) at the start or end of a block to strip whitespace before or after the block (more docs [here](https://jinja.palletsprojects.com/page/templates/#whitespace-control)).
Check out the [tutorial on using Jinja](https://docs.getdbt.com/guides/using-jinja.md#use-whitespace-control-to-tidy-up-compiled-code) for an example. Take caution: it's easy to fall down a rabbit hole when it comes to whitespace control!

---

### One of my tests failed, how can I debug it?

To debug a failing test, find the SQL that dbt ran:

* dbt: within the test output, click on the failed test, and then select "Details".
* dbt Core: open the file path returned as part of the error message, or navigate to the `target/compiled/schema_tests` directory for all compiled test queries.

Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed.

---

### Receiving a 'Could not parse dbt_project.yml' error in dbt job

The error message `Could not parse dbt_project.yml: while scanning for...` in your dbt job run or development usually occurs for one of several reasons:

* There's a parsing failure in a YAML file (such as tab indentation or Unicode characters).
* Your `dbt_project.yml` file has missing fields or incorrect formatting.
* Your `dbt_project.yml` file doesn't exist in your dbt project repository.
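Since YAML indentation must use spaces, never tabs, a quick stdlib check (a sketch, not part of dbt) can flag the most common culprit from the list above before you rerun the job:

```python
# Sketch: flag tab-indented lines, one of the parse failures listed above.
# YAML indentation must use spaces; a single tab breaks the whole parse.

def find_tab_indented_lines(text: str):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(lineno)
    return bad

sample = "name: jaffle_shop\nmodels:\n\tjaffle_shop:\n    +materialized: view\n"
print(find_tab_indented_lines(sample))  # line 3 is tab-indented
```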
To resolve this issue, consider the following:

* Use an online YAML parser or validator to check for any parsing errors in your YAML file. Known parsing errors include missing fields, incorrect formatting, and tab indentation.
* Ensure your `dbt_project.yml` file exists in your repository.

Once you've identified the issue, you can fix the error and rerun your dbt job.

---

### Receiving a `Failed to connect to DB` error when connecting to Snowflake

1. If you see the following error:

```text
Failed to connect to DB: xxxxxxx.snowflakecomputing.com:443. The role requested in the connection, or the default role if none was requested in the connection ('xxxxx'), is not listed in the Access Token or was filtered. Please specify another role, or contact your OAuth Authorization server administrator.
```

2. Edit your OAuth security integration and explicitly specify this scope mapping attribute:

```sql
ALTER INTEGRATION <integration_name> SET EXTERNAL_OAUTH_SCOPE_MAPPING_ATTRIBUTE = 'scp';
```

You can read more about this error in [Snowflake's documentation](https://community.snowflake.com/s/article/external-custom-oauth-error-the-role-requested-in-the-connection-is-not-listed-in-the-access-token).

***

1. If you see the following error:

```text
Failed to connect to DB: xxxxxxx.snowflakecomputing.com:443. Incorrect username or password was specified.
```

Check the following:

* **Unique email addresses** — Each user in Snowflake must have a unique email address. You can't have multiple users (for example, a human user and a service account) using the same email, such as `alice@acme.com`, to authenticate to Snowflake.
* **Match email addresses with identity provider** — The email address of your Snowflake user must exactly match the email address you use to authenticate with your Identity Provider (IdP). For example, if your Snowflake user's email is `alice@acme.com` but you log in to Entra or Okta with `alice_adm@acme.com`, this mismatch can cause an error.

---

### Reconnecting to Snowflake OAuth after authentication expires

When you connect Snowflake to the dbt platform using [OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md), dbt stores a refresh token. This allows your development credentials to remain usable in tools like the Studio IDE and the dbt Semantic Layer without needing to re-authenticate each time. If you see an `authentication has expired` error when you try to run queries, you must renew your connection between Snowflake and the dbt platform. To resolve the issue, complete the following steps:

1. Go to your **Profile settings** page, accessible from the navigation menu.
2. Navigate to **Credentials** and then choose the project where you're experiencing the issue.
3. Under **Development credentials**, click the **Reconnect Snowflake Account** button. This will guide you through re-authenticating using your SSO workflow.

Your Snowflake administrator can [configure the refresh token validity period](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md#create-a-security-integration), up to the maximum of 90 days. If you've tried these steps and are still getting this error, please contact the Support team for further assistance.
---

### Should I use separate files to declare resource properties, or one large file?

It's up to you:

* Some folks find it useful to have one file per model (or source, snapshot, seed, etc.)
* Some find it useful to have one per directory, documenting and testing multiple models in one file

Choose what works for your team. We have more recommendations in our guide on [structuring dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md).

---

### The columns of my seed changed, and now I get an error when running the `seed` command, what should I do?

If you changed the columns of your seed, you may get a `Database Error`. On Snowflake:

```shell
$ dbt seed
Running with dbt=1.6.0-rc2
Found 0 models, 0 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 1 seed file, 0 sources

12:12:27 | Concurrency: 8 threads (target='dev_snowflake')
12:12:27 |
12:12:27 | 1 of 1 START seed file dbt_claire.country_codes...................... [RUN]
12:12:30 | 1 of 1 ERROR loading seed file dbt_claire.country_codes.............. [ERROR in 2.78s]
12:12:31 |
12:12:31 | Finished running 1 seed in 10.05s.
Completed with 1 error and 0 warnings:

Database Error in seed country_codes (seeds/country_codes.csv)
  000904 (42000): SQL compilation error: error line 1 at position 62
  invalid identifier 'COUNTRY_NAME'

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

And on Redshift:

```shell
$ dbt seed
Running with dbt=1.6.0-rc2
Found 0 models, 0 tests, 0 snapshots, 0 analyses, 149 macros, 0 operations, 1 seed file, 0 sources

12:14:46 | Concurrency: 1 threads (target='dev_redshift')
12:14:46 |
12:14:46 | 1 of 1 START seed file dbt_claire.country_codes...................... [RUN]
12:14:46 | 1 of 1 ERROR loading seed file dbt_claire.country_codes.............. [ERROR in 0.23s]
12:14:46 |
12:14:46 | Finished running 1 seed in 1.75s.

Completed with 1 error and 0 warnings:

Database Error in seed country_codes (seeds/country_codes.csv)
  column "country_name" of relation "country_codes" does not exist

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

In this case, rerun the command with the `--full-refresh` flag, like so:

```text
dbt seed --full-refresh
```

**Why is this the case?**

When you run `dbt seed`, dbt typically truncates the existing table and reinserts the data. This pattern avoids a `drop cascade` command, which could cause downstream objects (that your BI users might be querying!) to get dropped. However, when column names change or new columns are added, these statements fail because the table structure has changed. The `--full-refresh` flag forces dbt to `drop cascade` the existing table before rebuilding it.
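The failure mode can be sketched in a few lines (illustrative only, not dbt's implementation): the truncate-and-insert path assumes the existing table's columns still match the seed's header, while `--full-refresh` drops and recreates the table with the new shape.

```python
# Illustrative only -- not dbt's implementation. Shows why truncate-and-
# insert breaks after a seed's columns change, and why --full-refresh
# (drop and recreate) resolves it.

def seed_insert(table_columns, csv_header, full_refresh=False):
    if full_refresh:
        table_columns = list(csv_header)  # drop + recreate with new columns
    missing = [c for c in csv_header if c not in table_columns]
    if missing:
        raise RuntimeError(f"invalid identifier(s): {missing}")
    return table_columns

old_table = ["country_code", "country"]
new_csv = ["country_code", "country_name"]

# Without full_refresh this raises, mirroring the database errors above.
print(seed_insert(old_table, new_csv, full_refresh=True))
```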
---

### Unable to trigger a CI job with GitLab

When you connect dbt to a GitLab repository, GitLab automatically registers a webhook in the background, viewable under the repository settings. This webhook is also used to trigger [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) when you push to the repository. If you're unable to trigger a CI job, this usually indicates that the webhook registration is missing or incorrect.

To resolve this issue, view the webhook registrations in GitLab under **Settings** > **Webhooks**. Some things to check:

* The webhook registration is enabled in GitLab.
* The webhook registration is configured with the correct URL and secret.

If you're still experiencing this issue, reach out to the Support team and we'll be happy to help!

---

### What are the best practices for installing dbt Core with pip?

info The dbt Fusion engine is a next-generation, Rust-based engine that powers dbt development across the platform and local tooling. See [dbt Fusion engine](https://docs.getdbt.com/docs/fusion.md) for more information.

#### Best practices[​](#best-practices "Direct link to Best practices")

Managing local Python environments can be challenging! You can use these best practices to improve your dbt Core installation with `pip`.
| Best practice | Recommendation | Why it matters |
| --- | --- | --- |
| [Install dbt Core with an adapter](https://docs.getdbt.com/docs/local/install-dbt.md?version=1#installing-the-adapter) and keep versions in sync | Install with `python -m pip install dbt-core dbt-ADAPTER_NAME` (for example, `python -m pip install dbt-core dbt-snowflake`), and match adapter versions to your dbt Core version | Provides a complete, compatible, and ready-to-run dbt setup; prevents runtime errors and adapter incompatibilities |
| For tooling without a warehouse connection, install dbt Core without an adapter | `python -m pip install dbt-core` | Keeps your setup lean, predictable, and easier to maintain |
| Use [virtual environments](https://docs.getdbt.com/faqs/Core/install-pip-best-practices.md#using-virtual-environments) | Install dbt in an isolated environment (for example, `venv`, `pipenv`, `poetry`) | Avoids dependency conflicts |
| Reactivate your virtual environment for each session | Reactivate your virtual environment at the start of each new session before installing dependencies or running dbt commands | Keeps your dbt setup predictable, isolated, and reproducible |
| [Create a project](https://docs.getdbt.com/docs/local/install-dbt.md#create-a-project) | Use the `dbt init` command to create and initialize your first project | Creates a standard dbt project and verifies your installation |
| Ensure you have the latest versions of `pip`, `wheel`, and `setuptools` | Before installing dbt, upgrade your Python packaging tools: `python -m pip install --upgrade pip wheel setuptools` | Helps ensure a smoother, more predictable dbt installation |
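Taken together, the practices in this table amount to a short from-scratch setup. Here's a sketch (the `dbt-postgres` adapter and `my_project` name are examples only; substitute your own adapter and project name):

```shell
# create and activate an isolated environment (macOS/Linux)
python3 -m venv dbt-env
source dbt-env/bin/activate

# upgrade packaging tools before installing dbt
python -m pip install --upgrade pip wheel setuptools

# install dbt Core together with an adapter (dbt-postgres is just an example)
python -m pip install dbt-core dbt-postgres

# verify the installation and scaffold a project
dbt --version
dbt init my_project
```

On Windows, activate the environment with `dbt-env\Scripts\activate` instead of the `source` command.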
Note that dbt adapters and dbt Core are versioned and installed independently to prevent unintended changes to an existing dbt Core installation.

##### Using virtual environments[​](#using-virtual-environments "Direct link to Using virtual environments")

We recommend using [virtual environments](https://docs.python-guide.org/dev/virtualenvs/) to namespace `pip` modules. Here's an example setup:

```shell
python3 -m venv dbt-env           # create the environment
source dbt-env/bin/activate       # activate the environment (Mac and Linux)
dbt-env\Scripts\activate          # activate the environment (Windows)
```

If you install `dbt` in a virtual environment, you need to reactivate that same virtual environment each time you open a new shell window or session.

*Tip:* You can create an alias for the `source` command in your `$HOME/.bashrc`, `$HOME/.zshrc`, or whichever rc file your shell draws from. For example, you can add a command like `alias env_dbt='source <PATH_TO_VIRTUAL_ENV>/bin/activate'`, replacing `<PATH_TO_VIRTUAL_ENV>` with the path to your virtual environment.

##### Using the latest versions[​](#using-the-latest-versions "Direct link to Using the latest versions")

dbt installations are tested using the latest versions of `pip` and `setuptools`. Newer versions have improved behavior around dependency resolution, as well as much faster install times by using precompiled "wheels" when available for your operating system. Before installing dbt, make sure you have the latest versions:

```shell
python -m pip install --upgrade pip wheel setuptools
```

---

### What data tests are available for me to use in dbt?
Out of the box, dbt ships with the following data tests:

* `unique`
* `not_null`
* `accepted_values`
* `relationships` (for example, referential integrity)

You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project.

---

### What data tests should I add to my project?

We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`. We also recommend that you test any assumptions about your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL. In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models.

---

### What happens if I add new columns to my snapshot query?
When the columns of your source query change, dbt will attempt to reconcile this change in the destination snapshot table. dbt does this by:

1. Creating new columns from the source query in the destination table
2. Expanding the size of string types where necessary (for example, `varchar`s on Redshift)

dbt *will not* delete columns in the destination snapshot table if they are removed from the source query. It will also not change the type of a column beyond expanding the size of varchar columns. That is, if a `string` column is changed to a `date` column in the snapshot source query, dbt will not attempt to change the type of the column in the destination table.

---

### What happens if one of my runs fails?

If you're using dbt, we recommend setting up email and Slack notifications (**Account Settings** > **Notifications**) for any failed runs. Then, debug these runs the same way you would debug any runs in development.

---

### What happens if the SQL in my query is bad or I get a database error?

If there's a mistake in your SQL, dbt will return the error that your database returns.
```shell
$ dbt run --select customers
Running with dbt=1.9.0
Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources

14:04:12 | Concurrency: 1 threads (target='dev')
14:04:12 |
14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN]
14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s]
14:04:13 |
14:04:13 | Finished running 1 view model in 1.68s.

Completed with 1 error and 0 warnings:

Database Error in model customers (models/customers.sql)
  Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15]
  compiled SQL at target/run/jaffle_shop/customers.sql

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors.

---

### What if my source is in a different database to my target database?

Use the [`database` property](https://docs.getdbt.com/reference/resource-properties/database.md) to define the database that the source is in.

models/<filename>.yml

```yml
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    tables:
      - name: orders
      - name: customers
```
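With that `database: raw` config in place, a `source()` reference in a downstream model resolves to the fully qualified name. Roughly:

```sql
-- models/stg_orders.sql (hypothetical model name)
select * from {{ source('jaffle_shop', 'orders') }}

-- compiles to:
select * from raw.jaffle_shop.orders
```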
---

### What if my source is in a poorly named schema or table?

By default, dbt will use the `name:` parameters to construct the source reference. If these names are a little less-than-perfect, use the [schema](https://docs.getdbt.com/reference/resource-properties/schema.md) and [identifier](https://docs.getdbt.com/reference/resource-properties/identifier.md) properties to define the names as they appear in the database, and use your `name:` property for the name that makes sense!

models/<filename>.yml

```yml
sources:
  - name: jaffle_shop
    database: raw
    schema: postgres_backend_public_schema
    tables:
      - name: orders
        identifier: api_orders
```

In a downstream model:

```sql
select * from {{ source('jaffle_shop', 'orders') }}
```

Will get compiled to:

```sql
select * from raw.postgres_backend_public_schema.api_orders
```

---

### What materializations are available in dbt?

dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt.
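As a quick illustration, a model selects one of these materializations with a `config` block at the top of its file (the model and column names below are hypothetical):

```sql
{{ config(materialized='table') }}

select
    order_id,
    sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by order_id
```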
---

### What model configurations exist?

You can also configure:

* [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection
* [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas
* [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename
* Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md)
* Warehouse-specific configurations for performance (for example, `sort` and `dist` keys on Redshift, `partitions` on BigQuery)

Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more.

---

### What parts of Jinja are dbt-specific?

There are certain expressions that are specific to dbt — these are documented in the [Jinja function reference](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md) section of these docs. Further, docs blocks, snapshots, and materializations are custom Jinja *blocks* that exist only in dbt.

---

### What privileges does my database user need to use dbt?

Your user will need to be able to:

* `select` from raw data in your warehouse (i.e. 
data to be transformed)
* `create` schemas, and therefore create tables/views within that schema¹
* read system views to generate documentation (i.e. views in `information_schema`)

On Postgres, Redshift, Databricks, and Snowflake, use a series of `grant` statements to ensure that your user has the correct privileges. Check out [example permissions](https://docs.getdbt.com/reference/database-permissions/about-database-permissions.md) for these warehouses. On BigQuery, use the "BigQuery User" role to assign these privileges.

***

¹ Alternatively, a separate user can create a schema for the dbt user, and then grant the user privileges to create within this schema. We generally recommend granting your dbt user the ability to create schemas, as it is less complicated to implement.

---

### What should I name my profile?

We typically use a company name for a profile name, and then use targets to differentiate between `dev` and `prod`. Check out the docs on [environments in dbt Core](https://docs.getdbt.com/docs/local/dbt-core-environments.md) for more information.

---

### What should I name my target?

We typically use targets to differentiate between development and production runs of dbt, naming the targets `dev` and `prod`, respectively.
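For example, a minimal `profiles.yml` sketch following that convention, assuming a hypothetical `acme` profile on Postgres (hosts, users, and schemas are placeholders):

```yml
acme:
  target: dev          # default target for local development
  outputs:
    dev:
      type: postgres
      host: localhost
      user: alice
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: dbt_alice
      threads: 4
    prod:
      type: postgres
      host: prod-db.example.com
      user: dbt_prod
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: analytics
      threads: 8
```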
Check out the docs on [managing environments in dbt Core](https://docs.getdbt.com/docs/local/dbt-core-environments.md) for more information.

---

### What should my profiles.yml file look like for my warehouse?

The structure of a profile looks different on each warehouse. Check out the [Supported Data Platforms](https://docs.getdbt.com/docs/supported-data-platforms.md) page, and navigate to the `Profile Setup` section for your warehouse.

---

### What version of Python can I use?

Use this table to match dbt Core versions with their compatible Python versions. New [dbt minor versions](https://docs.getdbt.com/docs/dbt-versions/core.md#minor-versions) will add support for new Python 3 minor versions when all dependencies can support it. In addition, dbt minor versions will withdraw support for old Python 3 minor versions before their [end of life](https://endoflife.date/python).
#### Python compatibility matrix[​](#python-compatibility-matrix "Direct link to Python compatibility matrix")

| dbt-core version | v1.11 | v1.10 | v1.9 | v1.8 | v1.7 | v1.6 | v1.5 | v1.4 | v1.3 | v1.2 | v1.1 | v1.0 |
| ---------------- | ----- | ----- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Python 3.13 | ✅ | ⚠️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Python 3.12 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Python 3.11 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Python 3.10 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |

⚠️ Python 3.13 is supported in dbt Core v1.10 for the Postgres adapter. Adapter plugins and their dependencies are not always compatible with the latest version of Python.

Note that this shouldn't be confused with [dbt Python models](https://docs.getdbt.com/docs/build/python-models.md#specific-data-platforms). If you're using a data platform that supports Snowpark, use the `python_version` config to run a Snowpark model with [Python versions](https://docs.snowflake.com/en/developer-guide/snowpark/python/setup) 3.9, 3.10, or 3.11.

---

### When should I run my data tests?

You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid).
---

### When should I use a UDF instead of a macro?

Both user-defined functions (UDFs) and macros let you reuse logic across your dbt project, but they work in fundamentally different ways. Here's when to use each:

###### Use UDFs when:[​](#use-udfs-when "Direct link to Use UDFs when:")

**You need logic accessible outside dbt.** UDFs are created in your warehouse and can be used by BI tools, data science notebooks, SQL clients, or any other tool that connects to your warehouse. Macros only work within dbt.

**You want to standardize warehouse-native functions.** UDFs let you create reusable warehouse functions for data validation, custom formatting, or business-specific calculations that need to be consistent across all your data tools. Once created, they become part of your warehouse's function catalog.

**You want dbt to manage the function lifecycle.** dbt manages UDFs as part of your DAG execution, ensuring they're created before models that reference them. You can version control UDF definitions alongside your models, test changes in development environments, and deploy them together through CI/CD pipelines.

**Jinja compiles at creation time, not on each function call.** You can use Jinja (loops, conditionals, macros, `ref`, `source`, `var`) inside a UDF configuration. dbt resolves that Jinja **when the UDF is created**, and the resulting SQL body is what gets stored in your warehouse. Jinja influences the function when it's created, whereas arguments influence it when it runs in the warehouse:

* ✅ **Allowed:** Jinja that depends on project or build-time state — for example, `var("can_do_things")`, a static `ref('orders')`, or environment-specific logic.
These are all evaluated once at creation time.
* ❌ **Not allowed:** Jinja that depends on **function arguments** passed at runtime. The compiler can't see those, so a dynamic `ref(ref_name)` or conditional Jinja based on argument values won't work.

**You need Python logic that runs in your warehouse.** A Python UDF creates a Python function directly within your data warehouse, which you can invoke using SQL.
This makes it easier to apply complex transformations, calculations, or logic that would be difficult or verbose to express in SQL. Python UDFs support conditionals and looping within the function logic itself (using Python syntax), and execute at runtime, not at compile time like macros. Python UDFs are currently supported in Snowflake and BigQuery.

###### Use macros when:[​](#use-macros-when "Direct link to Use macros when:")

**You need to generate SQL at compile time.** Macros generate SQL dynamically **before** it's sent to the warehouse (at compile time). This is essential for:

* Building different SQL for different warehouses
* Generating repetitive SQL patterns (like creating dozens of similar columns)
* Creating entire model definitions or DDL statements
* Dynamically referencing models based on project structure

UDFs execute **at query runtime** in the warehouse. While they can use Jinja templating in their definitions, they don't generate new SQL queries — they're pre-defined functions that get called by your SQL. Currently, SQL and Python UDFs are supported; Java and Scala UDFs are planned for future releases.

**You want to generate DDL or DML statements.** Macros can template arbitrary statements, such as the SQL you run in [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md), not just `select` queries.

**You need to adapt SQL across different warehouses.** Macros can use Jinja conditional logic to generate warehouse-specific SQL (see [cross-database macros](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros.md)), making your dbt project portable across platforms. UDFs are warehouse-specific objects. Even though UDFs can include Jinja templating in their definitions, each warehouse has different syntax for creating functions, different supported data types, and different SQL dialects. You would need to define separate UDF files for each warehouse you support.
**Your logic needs access to dbt context.** Both macros and UDFs can use Jinja, which means they can access dbt context variables like `{{ ref() }}`, `{{ source() }}`, environment variables, and project configurations. You can even call a macro from within a UDF (and vice versa) to combine dynamic SQL generation with runtime execution. However, the difference between the two is *when* the logic runs:

* Macros run at compile time, generating SQL before it's sent to the warehouse.
* UDFs run inside the warehouse at query time.

**You want to avoid creating warehouse objects.** Macros don't create anything in your warehouse; they just generate SQL at compile time. UDFs create actual function objects in your warehouse that need to be managed.

###### Can I use both together?[​](#can-i-use-both-together "Direct link to Can I use both together?")

Yes! You can use a macro to call a UDF or call a macro from within a UDF, combining the benefits of both. The following example shows a macro that wraps a UDF call and supplies a default value for its `scale` argument:

```sql
{% macro cents_to_dollars(column_name, scale=2) %}
    {{ function('cents_to_dollars') }}({{ column_name }}, {{ scale }})
{% endmacro %}
```

###### Related documentation[​](#related-documentation "Direct link to Related documentation")

* [User-defined functions](https://docs.getdbt.com/docs/build/udfs.md)
* [Jinja macros](https://docs.getdbt.com/docs/build/jinja-macros.md)

---

### Where can I find my user ID?

Knowing your dbt user ID can help when interacting with Support. To find your user ID in the dbt platform, follow these steps:

1.
Click your account name in the bottom left-side menu and go to **Account settings** > **Users**.
2. Select your user.
3. Go to the address bar. The number after `/users` is your user ID.
For example, if the URL is `https://YOUR_ACCESS_URL/settings/accounts/12345/users/67891` — the user ID is `67891`.
4. Copy that number and save it somewhere safe.
---

### Which docs should I use when writing Jinja or creating a macro?

If you're stuck on a Jinja issue, it can be hard to know where to look for more information. We recommend you check (in order):

1. [Jinja's Template Designer Docs](https://jinja.palletsprojects.com/page/templates/): This is the best reference for most of the Jinja you'll use.
2. [Our Jinja function reference](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md): This documents any additional functionality we've added to Jinja in dbt.
3. [Agate's table docs](https://agate.readthedocs.io/page/api/table.html): If you're operating on the result of a query, dbt will pass it back to you as an Agate table. This means that the methods you call on the table belong to the Agate library rather than Jinja or dbt.

---

### Which materialization should I use for my model?

Start out with views, and then change models to tables when required for performance reasons (that is, when downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization.
---

### Which SQL dialect should I write my models in? Or which SQL dialect does dbt use?

dbt can feel like magic, but it isn't actually magic. Under the hood, it's running SQL in your own warehouse — your data is not processed outside of your warehouse. As such, your models should use the **SQL dialect of your own database**. Then, when dbt wraps your `select` statements in the appropriate DDL or DML, it will use the correct syntax for your warehouse — all of this logic is built into dbt. You can find more information about the databases, platforms, and query engines that dbt supports in the [Supported Data Platforms](https://docs.getdbt.com/docs/supported-data-platforms.md) docs.

Want to go a little deeper on how this works? Consider a snippet of SQL that works on each warehouse:

models/test_model.sql

```sql
select 1 as my_column
```

To replace an existing table, here's an *illustrative* example of the SQL dbt will run on different warehouses (the actual SQL can get much more complicated than this!)
On Redshift:

```sql
-- you can't create or replace on Redshift, so use a transaction to do this in an atomic way
begin;
create table "dbt_alice"."test_model__dbt_tmp" as (
    select 1 as my_column
);
alter table "dbt_alice"."test_model" rename to "test_model__dbt_backup";
alter table "dbt_alice"."test_model__dbt_tmp" rename to "test_model";
commit;

begin;
drop table if exists "dbt_alice"."test_model__dbt_backup" cascade;
commit;
```

On BigQuery:

```sql
-- dbt creates the dataset through an API call (there's no DDL interface for this)
create or replace table `dbt-dev-87681`.`dbt_alice`.`test_model` as (
    select 1 as my_column
);
```

On Snowflake:

```sql
create schema if not exists analytics.dbt_alice;
create or replace table analytics.dbt_alice.test_model as (
    select 1 as my_column
);
```

---

### Why am I getting an "account in use" error?

If you're receiving an "Account in use" error when trying to integrate GitHub in your Profile page, it's because the Git integration is 1-to-1: your Git account can be linked to only one dbt user account. Here are some steps to take to get you unstuck:

* Log in to the dbt account integrated with your Git account. Go to your user profile and click on Integrations to remove the link.

If you don't remember which dbt account is integrated, please email dbt Support and we'll do our best to disassociate the integration for you.
---

### Why am I receiving a Runtime Error in my packages?

If you're receiving the runtime error below, it may be because an old version of your `dbt_utils` package isn't compatible with your current dbt version.

```shell
Running with dbt=xxx
Runtime Error
  Failed to read package: Runtime Error
    Invalid config version: 1, expected 2
  Error encountered in dbt_utils/dbt_project.yml
```

Try updating the old version of the `dbt_utils` package in your `packages.yml` to the latest version found on the [dbt hub](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/):

```yml
packages:
  - package: dbt-labs/dbt_utils
    version: xxx
```

If you've tried the workaround above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!

---

### Why are profiles stored outside of my project?

Profiles are stored separately from dbt projects to avoid checking credentials into version control. Database credentials are extremely sensitive information and should **never be checked into version control**.
---

### Why can't I just write DML in my transformations?

###### `select` statements make transformations accessible[​](#select-statements-make-transformations-accessible "Direct link to select-statements-make-transformations-accessible")

More people know how to write `select` statements than DML, which makes the transformation layer accessible to more people!

###### Writing good DML is hard[​](#writing-good-dml-is-hard "Direct link to Writing good DML is hard")

If you write the DDL / DML yourself, you can get tangled in problems like:

* What happens if the table already exists? Or if this table already exists as a view, but now I want it to be a table?
* What if the schema already exists? Or, should I check whether the schema already exists?
* How do I replace a model atomically (such that there's no downtime for someone querying the table)?
* What if I want to parameterize my schema so I can run these transformations in a development environment?
* What order do I need to run these statements in? If I run a `cascade`, does it break other things?

Each of these problems *can* be solved, but solving them is unlikely to be the best use of your time.

###### dbt does more than generate SQL[​](#dbt-does-more-than-generate-sql "Direct link to dbt does more than generate SQL")

You can test your models, generate documentation, create snapshots, and more!

###### You reduce your vendor lock-in[​](#you-reduce-your-vendor-lock-in "Direct link to You reduce your vendor lock in")

SQL dialects tend to diverge the most in DML and DDL (rather than in `select` statements) — check out the example [here](https://docs.getdbt.com/faqs/Models/sql-dialect.md).
Writing less SQL can make a migration to a new database technology easier.

If you do need to write custom DML, there are ways to do this in dbt using [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md).

---

### Why dbt compile needs a data platform connection

`dbt compile` needs a data platform connection in order to gather the info it needs (including from introspective queries) to prepare the SQL for every model in your project.

##### dbt compile

The [`dbt compile` command](https://docs.getdbt.com/reference/commands/compile.md) generates executable SQL from `source`, `model`, `test`, and `analysis` files. `dbt compile` is similar to `dbt run`, except that it doesn't materialize the model's compiled SQL into a table in your data platform.

So, up until the point of materialization, `dbt compile` and `dbt run` are similar: they both require a data platform connection, run queries, and have an [`execute` variable](https://docs.getdbt.com/reference/dbt-jinja-functions/execute.md) set to `True`.

However, here are some things to consider:

* You don't need to execute `dbt compile` before `dbt run`.
* In dbt, `compile` doesn't mean `parse`: `parse` only validates your written YAML, configured tags, and so on.

##### Introspective queries

To generate the compiled SQL for many models, dbt needs to run introspective queries (queries dbt runs in order to pull data back and act on it) against the data platform.
These introspective queries include:

* Populating the relation cache. For more information, refer to the [Create new materializations](https://docs.getdbt.com/guides/create-new-materializations.md) guide. Caching speeds up metadata checks, including whether an [incremental model](https://docs.getdbt.com/docs/build/incremental-models.md) already exists in the data platform.
* Resolving [macros](https://docs.getdbt.com/docs/build/jinja-macros.md#macros), such as `run_query` or `dbt_utils.get_column_values`, that you're using to template out your SQL. This is because dbt needs to run those queries during model SQL compilation.

Without a data platform connection, dbt can't perform these introspective queries and won't be able to generate the compiled SQL needed for the next steps in the dbt workflow.

You can [`parse`](https://docs.getdbt.com/reference/commands/parse.md) a project and [`list`](https://docs.getdbt.com/reference/commands/list.md) its resources without an internet or data platform connection. Parsing a project is enough to produce a [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md); however, keep in mind that the written-out manifest won't include compiled SQL.

To configure a project, you do need a [connection profile](https://docs.getdbt.com/docs/local/profiles.yml.md) (`profiles.yml` if using the CLI). You need this file because the project's configuration depends on its contents. For example, you may need to use [`{{ target }}`](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md) for conditional configs, or know which platform you're running against so that you can choose the right flavor of SQL.
### Why do I need to quote column names in Jinja?

In the [macro example](https://docs.getdbt.com/docs/build/jinja-macros.md#macros) we passed the column name `amount` in quotes:

```sql
{{ cents_to_dollars('amount') }} as amount_usd
```

We have to use quotes to pass the *string* `'amount'` to the macro. Without the quotes, the Jinja parser will look for a variable named `amount`. Since this doesn't exist, it will compile to nothing.

Quoting in Jinja can take a while to get used to! The rule is: when you're within a Jinja expression or statement (i.e. within `{% ... %}` or `{{ ... }}`), you need to use quotes for any arguments that are strings. Single and double quotes are equivalent in Jinja – just make sure you match them appropriately.

And if you do need to pass a variable as an argument, make sure you [don't nest your curlies](https://docs.getdbt.com/best-practices/dont-nest-your-curlies.md).

---

### Why do model and source YAML files always start with `version: 2`?

Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible.

From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported.
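For reference, a minimal resource YAML file with the optional top-level key might look like this sketch (model name and description assumed):

```yml
version: 2  # optional from dbt Core v1.5; if present, must be 2

models:
  - name: customers
    description: "One record per customer"
```

Removing the `version: 2` line from this file works identically on v1.5 and later.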
Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional.

Resource YAML files don't currently require this config; if it's specified, only `version: 2` is supported. Although we don't expect to move YAML files to `version: 3` anytime soon, keeping this config will make it easier for us to introduce new structures in the future.

---

### Why does my dbt output have so many macros in it?

The output of a `dbt run` reports over 100 macros in your project:

```shell
$ dbt run
Running with dbt=1.7.0
Found 1 model, 0 tests, 0 snapshots, 0 analyses, 138 macros, 0 operations, 0 seed files, 0 sources
```

This is because dbt ships with its own project, which also includes macros! You can learn more about this [here](https://discourse.getdbt.com/t/did-you-know-dbt-ships-with-its-own-project/764).

---

### Why does the BigQuery OAuth application require scopes to Google Drive?

BigQuery supports external tables over both personal Google Drive files and shared files. For more information, refer to [Create Google Drive external tables](https://cloud.google.com/bigquery/docs/external-data-drive).
### Why doesn't an indirectly referenced upstream public model appear in Explorer?

For [project dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md) in Mesh, [Catalog](https://docs.getdbt.com/docs/explore/explore-multiple-projects.md) only displays directly referenced [public models](https://docs.getdbt.com/docs/mesh/govern/model-access.md) from upstream projects, even if an upstream model indirectly depends on another public model.

So, for example, if:

* `project_b` adds `project_a` as a dependency
* `project_b`'s model `downstream_c` references `project_a.upstream_b`
* `project_a.upstream_b` references another public model, `project_a.upstream_a`

Then:

* In Explorer, only directly referenced public models (`upstream_b` in this case) appear.
* In the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) lineage view, however, `upstream_a` (the indirect dependency) *will* appear, because dbt dynamically resolves the full dependency graph.

This behavior ensures that Catalog only shows the immediate dependencies available to that specific project.

---

### Why is Run on Pull requests grayed out?

If you're unable to enable Run on Pull requests, make sure your existing repo was not added via the deploy key auth method.
If it was added via a deploy key method, you'll want to use the [GitHub auth method](https://docs.getdbt.com/docs/cloud/git/connect-github.md) to enable CI in dbt. To enable 'Run on Pull requests', remove dbt from the Apps & Integrations on GitHub and re-integrate it via the GitHub app method.

If you've tried the workaround above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!

---

### Why might my actual warehouse costs differ from displayed costs?

Cost Insights shows estimates based on warehouse-reported usage and your configured pricing variables. These estimates are based on a retroactive analysis of historical runs and reflect actual usage, *not* forecasts of future costs.

Adjustments and differences may occur if:

* Your warehouse has custom pricing that differs from the default compute credit unit.
* There are discounts or credits applied at the billing level that aren't reflected in usage tables.
* Costs include other charges beyond compute.

Cost Insights in the dbt platform is designed to be directionally accurate, showing you dbt-specific components rather than matching your billing exactly.

---

### Why would I want to impersonate a service account?
You may want your models to be built using a dedicated service account that has elevated access to read or write data to the specified project or dataset. Typically, this requires you to create a service account key for running under development or on your CI server. By specifying the email address of the service account you want to build models as, you can use [Application Default Credentials](https://cloud.google.com/sdk/gcloud/reference/auth/application-default) or the service's configured service account (when running in GCP) to assume the identity of the service account with elevated permissions.

This allows you to reap the advantages of using federated identity for developers (via ADC) without needing to grant individuals access to read and write data directly, and without needing to create separate service accounts and keys for each user. It also allows you to completely eliminate the need for service account keys in CI, as long as your CI is running on GCP (Cloud Build, Jenkins, GitLab/GitHub Runners, etc.).

---

## Fusion

### A new concept: static analysis

The dbt Fusion engine [fully comprehends your project's SQL](https://docs.getdbt.com/blog/the-levels-of-sql-comprehension), enabling advanced capabilities like dialect-aware validation and precise column-level lineage. It can do this because its compilation step is more comprehensive than that of the dbt Core engine. When dbt Core referred to *compilation*, it only meant *rendering* — converting Jinja-templated strings into a SQL query to send to a database.
The dbt Fusion engine can also render Jinja, but then it completes a second phase: *static analysis*, producing and validating a logical plan for every rendered query in the project. This step is the cornerstone of Fusion's new capabilities.

| Step | dbt Core engine | dbt Fusion engine |
| ------------------------------------------- | --------------- | ----------------- |
| Render Jinja into SQL | ✅ | ✅ |
| Produce and statically analyze logical plan | ❌ | ✅ |
| Run rendered SQL | ✅ | ✅ |

#### Principles of static analysis

The software engineering concept of [static analysis](https://en.wikipedia.org/wiki/Static_program_analysis) describes checks that can be done on code before it runs (static == not running). The most rigorous static analysis means you can trust that if the analysis succeeds, the code will run in production without compilation errors. Less strict static analysis also surfaces helpful information to developers as they work. There's no free lunch — what you gain in responsiveness you lose in correctness guarantees.

The dbt Fusion engine uses the [`static_analysis`](https://docs.getdbt.com/reference/resource-configs/static-analysis.md) config to help you control how it performs static analysis for your models.

The dbt Fusion engine is unique in that it can statically analyze not just a single model in isolation, but every query from one end of your DAG to the other. Even your database can only validate the query in front of it! Concepts like [information flow theory](https://roundup.getdbt.com/i/156064124/beyond-cll-information-flow-theory-and-metadata-propagation) — although not incorporated into the dbt platform [yet](https://www.getdbt.com/blog/where-we-re-headed-with-the-dbt-fusion-engine) — rely on stable inputs and the ability to trace columns DAG-wide.
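As an illustrative sketch (model and column names assumed), DAG-wide static analysis can catch a reference to a column that doesn't exist upstream before anything runs:

```sql
-- models/orders_enriched.sql (hypothetical model)
select
    o.order_id,
    o.customr_id  -- typo: the upstream model exposes customer_id, not customr_id
from {{ ref('stg_orders') }} as o
-- With static analysis, the Fusion engine can flag this unknown column at
-- compile time; with rendering alone, the typo only surfaces as a database
-- error once the query actually runs.
```

The same plan-level knowledge is what makes features like column-level lineage possible.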
##### Baseline mode: A smooth transition from dbt Core

The dbt Fusion engine defaults to `static_analysis: baseline` mode, inspired by similar type-checking and linting tools like [TypeScript's migration approach](https://www.typescriptlang.org/docs/handbook/migrating-from-javascript.html), [basedpyright's baseline feature](https://docs.basedpyright.com/latest/benefits-over-pyright/baseline/), and [Pydantic's strict/lax modes](https://docs.pydantic.dev/latest/why/#strict-lax).

The philosophy behind the above-mentioned tools and Fusion's baseline mode is:

* **Smooth transition**: Provide a familiar first-time experience for users coming from dbt Core.
* **Incremental opt-in**: Offer a clear pathway to adopt more Fusion features over time.
* **Pragmatic validation**: Catch most SQL errors without requiring a complete project overhaul.

Use this style of gradual typing to start with lightweight validation, then incrementally adopt strict guarantees as your project is ready.

###### What baseline mode changes

Baseline mode introduces several fundamental behavior changes compared to the previous binary (off/on) approach:

* **No downloading of remote schemas** — Baseline mode does not fetch schemas from the warehouse.
* **Unit tests work without strict mode** — Previously, unit tests required static analysis to be fully on. In baseline mode, they work out of the box.
* **No unsafe introspection warnings** — We no longer warn about unsafe introspection, though we'd still love to help you assess it in the future.

The following table shows how baseline mode expands what's available without requiring strict mode.
##### LSP feature comparison

Baseline mode unlocks a meaningful set of features without requiring strict mode. We're also investing in moving more features into baseline over time.

VS Code extension features by static analysis configuration: ✅ = Available | ❌ = Not available

| Feature | off | baseline | strict |
| ---------------------------------------------- | --- | -------- | ------ |
| Go-to-definition/reference (except columns) | ✅ | ✅ | ✅ |
| Table lineage | ✅ | ✅ | ✅ |
| YAML validation | ✅ | ✅ | ✅ |
| Render + preview SQL | ✅ | ✅ | ✅ |
| Unit tests | ✅ | ✅ | ✅ |
| Detect syntax errors | ❌ | ✅ | ✅ |
| Preview CTE results | ❌ | ✅ | ✅ |
| Go-to-definition/reference (columns) | ❌ | ❌ | ✅ |
| Automatic refactor column names | ❌ | ❌ | ✅ |
| Rich column lineage | ❌ | ❌ | ✅ |
| Detect data type and function signature errors | ❌ | ❌ | ✅ |

**CodeLens visibility**: The VS Code extension and Studio IDE provide CodeLens even when static analysis is off, giving you visibility into which models have static analysis disabled and why.

Ultimately, we want everyone developing in strict mode for maximum guarantees. We acknowledge this isn't a change that can happen overnight — baseline exists to smooth the transition. Many planned features (like local compute) require strict mode. We're also exploring inferring column types on your behalf, which would enable more functionality in baseline mode without requiring you to manually provide type information.

##### Introspection handling in baseline mode

In `baseline` mode, all static analysis findings are warnings, not errors — your project can continue running even when the compiler flags invalid or problematic SQL. This section is a good example of why that design exists.
Previously, with `strict` mode, the system assumed local schemas of your compiled models would be available. In `baseline` mode, we can no longer assume the full local schema is available and complete, so `baseline` uses the remote database as the source of truth — similar to dbt Core. The practical result is that the Fusion compiler may sometimes flag queries that are invalid because an introspective query came back empty. If you encounter this, you can:

1. Ignore the warning
2. Build the model locally
3. (Coming soon) Use `warn_error_options` to disable the warning

For example, consider this query using the `dbt_utils.unpivot` macro:

```sql
select * from (
  {{ dbt_utils.unpivot(
      relation=ref('example_model'),
      cast_to='integer',
      exclude=['order_id', 'customer_id'],
      field_name='product_type',
      value_name='quantity'
  ) }}
)
```

If the introspection query fails or returns no results, this renders to:

```sql
select * from (
)
```

This is invalid SQL. In `baseline` mode, Fusion displays a warning so your project can continue running while still alerting you to the issue:

```bash
dbt0101: no viable alternative at input '( )'
 --> models/example_model.sql:17:1
```

###### Migration scenarios

Migrating to Fusion can involve more than moving YAML around. Some scenarios that can make migration more involved include:

1. **Limited access to sources**: You don't have access to all the sources and models of a large dbt project.
2. **Intricate Jinja workflows**: Your project uses post-hooks and introspection extensively.
3. **Package compatibility**: Your project depends on packages that aren't yet Fusion-compatible.
4. **Unsupported SQL features**: Your models or sources use advanced data types (`STRUCT`, `ARRAY`, `GEOGRAPHY`) or built-in functions (`AI.PREDICT`, `JSON_FLATTEN`, `st_pointfromgeohash`) not yet supported by the dbt Fusion engine.
Setting `static_analysis` to `baseline` mode lets you start using Fusion immediately while you address these scenarios incrementally. As you resolve compatibility issues, you can opt specific models or your entire project into `strict` mode for maximum validation guarantees.

#### Recapping the differences between engines

dbt Core:

* Renders and runs models one at a time.
* Never runs static analysis.

The dbt Fusion engine (baseline mode — default):

* Statically analyzes all models, catching most SQL errors while providing a familiar migration experience.

The dbt Fusion engine (strict mode):

* Renders and statically analyzes all models before execution begins.
* Guarantees nothing runs until the entire project is proven valid.

#### Configuring `static_analysis`

You can modify the way static analysis is applied for specific models in your project. The static analysis configuration cascades from most strict to least strict. Going downstream in your lineage, a model can keep the same mode or relax it — it can't be stricter than its parent. For the full rules and examples, see [How static analysis modes cascade](https://docs.getdbt.com/reference/resource-configs/static-analysis.md#how-static-analysis-modes-cascade).

The [`static_analysis`](https://docs.getdbt.com/reference/resource-configs/static-analysis.md) config options are:

* `baseline` (default): Statically analyze SQL. This is the recommended starting point for users transitioning from dbt Core, providing a smooth migration experience while still catching most SQL errors.
* `strict` (previously `on`): Statically analyze all SQL before execution begins. Use this for maximum validation guarantees — nothing runs until the entire project is proven valid.
* `off`: Skip SQL analysis on this model and its descendants.
**Deprecated values**: The `on` and `unsafe` values are deprecated and will be removed in May 2026. Use `strict` instead.

When you disable static analysis, features of the VS Code extension that depend on SQL comprehension will be unavailable.

The best place to configure `static_analysis` is as a config on an individual model or group of models. As a debugging aid, you can also use the [`--static-analysis strict` or `--static-analysis off` CLI flags](https://docs.getdbt.com/reference/global-configs/static-analysis-flag.md) to override all model-level configuration.

##### Incrementally adopting strict mode

Once you're comfortable with Fusion in baseline mode, you can incrementally opt models or directories into `strict` mode:

dbt_project.yml

```yml
name: jaffle_shop

models:
  jaffle_shop:
    # Start with strict analysis on your cleanest models
    staging:
      +static_analysis: strict
    # Keep baseline for models that need more work
    marts:
      +static_analysis: baseline
```

This approach lets you gain the benefits of strict validation where possible while keeping the flexibility of baseline analysis for models that aren't yet compatible.

Refer to [CLI options](https://docs.getdbt.com/reference/global-configs/command-line-options.md) and [Configurations and properties](https://docs.getdbt.com/reference/configs-and-properties.md) to learn more about configs.
##### Example configurations

Disable static analysis for all models in a package:

dbt_project.yml

```yml
name: jaffle_shop

models:
  jaffle_shop:
    marts:
      +materialized: table
  a_package_with_introspective_queries:
    +static_analysis: off
```

Disable static analysis in YAML:

models/my_udf_using_model.yml

```yml
models:
  - name: model_with_static_analysis_off
    config:
      static_analysis: off
```

Disable static analysis for a model using a custom UDF:

models/my_udf_using_model.sql

```sql
{{ config(static_analysis='off') }}

select
    user_id,
    my_cool_udf(ip_address) as cleaned_ip
from {{ ref('my_model') }}
```

##### When should I turn static analysis `off`?

With baseline mode enabled by default, static analysis is less likely to block your runs. You should only disable it if the dbt Fusion engine cannot parse SQL that is valid for your database of choice. This is a very rare occurrence. If you encounter this situation, please [open an issue](https://github.com/dbt-labs/dbt-fusion/issues) with an example of the failing SQL so we can update our parsers.

#### More information about Fusion

Fusion marks a significant update to dbt. While many of the workflows you've grown accustomed to remain unchanged, there are a lot of new ideas, and a lot of old ones going away.
The following is a list of the full scope of our current release of the Fusion engine, including implementation, installation, deprecations, and limitations:

* [About the dbt Fusion engine](https://docs.getdbt.com/docs/fusion/about-fusion.md)
* [About the dbt extension](https://docs.getdbt.com/docs/about-dbt-extension.md)
* [New concepts in Fusion](https://docs.getdbt.com/docs/fusion/new-concepts.md)
* [Supported features matrix](https://docs.getdbt.com/docs/fusion/supported-features.md)
* [Installing Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)
* [Installing VS Code extension](https://docs.getdbt.com/docs/install-dbt-extension.md)
* [Fusion release track](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine)
* [Quickstart for Fusion](https://docs.getdbt.com/guides/fusion.md?step=1)
* [Upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md)
* [Fusion licensing](http://www.getdbt.com/licenses-faq)

---

### About profiles.yml

If you're using dbt from the command line, you need a `profiles.yml` file that contains the connection details for your data platform.

**dbt platform accounts**: dbt platform projects don't require a `profiles.yml` file unless you're developing from your local machine instead of the cloud-based UI.

#### About profiles.yml

The `profiles.yml` file stores database connection credentials and configuration for dbt projects, including:

* **Connection details** — Account identifiers, hosts, ports, and authentication credentials.
* **Target definitions** — Define different environments (dev, staging, prod) within a single profile.
* **Default target** — Set which environment to use by default.
* **Execution parameters** — Thread count, timeouts, and retry settings.
* **Credential separation** — Keep sensitive information out of version control.

The `profile` field in [`dbt_project.yml`](https://docs.getdbt.com/reference/dbt_project.yml.md) references a profile name defined in `profiles.yml`.

#### Location of profiles.yml

Only one `profiles.yml` file is required, and it can manage multiple projects and connections.

Fusion searches for the parent directory of `profiles.yml` in the following order and uses the first location it finds:

1. `--profiles-dir` flag — Override for CI/CD or testing.
2. Project root directory — Project-specific credentials.
3. `~/.dbt/` directory (Recommended location) — Shared across all projects.

dbt Core searches for the parent directory of `profiles.yml` in the following order and uses the first location it finds:

1. `--profiles-dir` flag
2. `DBT_PROFILES_DIR` environment variable
3. Current working directory
4. `~/.dbt/` directory (Recommended location)

Note: dbt Core supports using the `DBT_PROFILES_DIR` environment variable or a `profiles.yml` file in the current working directory. These options aren't currently supported in Fusion.

`~/.dbt/profiles.yml` is the recommended location for the following reasons:

* **Security** — Keeps credentials out of project directories and version control.
* **Reusability** — A single file for all dbt projects on the machine.
* **Separation** — Connection details don't travel with project code.

###### When should I use project root?

Place your `profiles.yml` file in the project root directory for:

* Self-contained demo or tutorial projects.
* Docker containers with baked-in credentials.
* CI/CD pipelines with environment-specific configs.

#### Create and configure the `profiles.yml` file

The easiest way to create and configure a `profiles.yml` file is to execute `dbt init` after you've installed dbt on your machine. This takes you through the process of configuring an adapter and places the file in the recommended `~/.dbt/` location. If your project has an existing `profiles.yml` file, running `dbt init` will prompt you to amend or overwrite it. If you select the existing adapter for configuration, dbt will automatically populate the existing values.

You can also manually create the file and add it to the proper location. To configure an adapter manually, copy and paste the fields from the adapter setup instructions for [dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/about-dbt-connections.md) or [Fusion](https://docs.getdbt.com/docs/local/profiles.yml.md) along with the appropriate values for each.

##### Example configuration

To set up your profile, copy the correct sample profile for your warehouse into your `profiles.yml` file and update the details as follows:

* Profile name: Replace the name of the profile with a sensible name – it's often a good idea to use the name of your organization. Make sure that this is the same name as the `profile` indicated in your `dbt_project.yml` file.
* `target`: This is the default target your dbt project will use. It must be one of the targets you define in your profile. Commonly it is set to `dev`.
* Populating your `outputs`:
  * `type`: The type of data warehouse you are connecting to.
  * Warehouse credentials: Get these from your database administrator if you don't already have them. Remember that user credentials are very sensitive information that should not be shared.
May include fields like `account`, `username`, and `password`.
* `schema`: The default schema that dbt will build objects in.
* `threads`: The number of threads the dbt project will run on.

The following example highlights the format of the `profiles.yml` file. Note that many of the configs are adapter-specific and their syntax varies.

\~/.dbt/profiles.yml

```yml
my_project_profile: # Profile name (matches dbt_project.yml)
  target: dev # Default target to use
  outputs:
    dev: # Development environment
      type: adapter_type # Required: snowflake, bigquery, databricks, redshift, postgres, etc
      # Connection identifiers (placeholder examples, see adapter-specific pages for supported configs)
      account: abc123
      database: docs_team
      schema: dev_schema
      # Authentication (adapter-specific)
      auth_method: username_password
      username: username
      password_credentials: password
      # Execution settings (common across adapters)
      threads: 4 # Number of parallel threads

# Multiple profiles (for multiple projects)
my_second_project_profile:
  target: dev
  outputs:
    dev:
      type: snowflake # Example adapter
      account: account
      user: user
      password: password
      database: database
      schema: schema
      warehouse: warehouse
      threads: 4
```

##### Environment variables[​](#environment-variables "Direct link to Environment variables")

Use environment variables to keep sensitive credentials out of your `profiles.yml` file. Check out the [env\_var](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md) reference for more information.
Example:

\~/.dbt/profiles.yml

```yml
my_profile:
  target: dev
  outputs:
    dev:
      type: ADAPTER_NAME
      account: "{{ env_var('ADAPTER_ACCOUNT') }}"
      user: "{{ env_var('ADAPTER_USER') }}"
      password: "{{ env_var('ADAPTER_PASSWORD') }}"
      database: "{{ env_var('ADAPTER_DATABASE') }}"
      schema: "{{ env_var('ADAPTER_SCHEMA') }}"
      warehouse: "{{ env_var('ADAPTER_WAREHOUSE') }}"
      role: "{{ env_var('ADAPTER_ROLE') }}"
      threads: 4
```

#### User config[​](#user-config "Direct link to User config")

You can set default values of global configs for all projects that you run using your local machine. Refer to [About global configs](https://docs.getdbt.com/reference/global-configs/about-global-configs.md) for details.

#### Understanding targets in profiles[​](#understanding-targets-in-profiles "Direct link to Understanding targets in profiles")

dbt supports multiple targets within one profile to encourage the use of separate development and production environments as discussed in [dbt environments](https://docs.getdbt.com/docs/local/dbt-core-environments.md). A typical profile for an analyst using dbt locally will have a target named `dev`, and have this set as the default.

You may also have a `prod` target within your profile, which creates the objects in your production schema. However, since it's often desirable to perform production runs on a schedule, we recommend deploying your dbt project to a separate machine other than your local machine. Most dbt users only have a `dev` target in their profile on their local machine.

If you do have multiple targets in your profile, and want to use a target other than the default, you can do this using the `--target` flag when running a dbt command.
For example, to run against your `prod` target instead of the default `dev` target:

```bash
dbt run --target prod
```

You can use the `--target` flag with any dbt command, such as:

```bash
dbt build --target prod
dbt test --target dev
dbt compile --target qa
```

##### Overriding profiles and targets[​](#overriding-profiles-and-targets "Direct link to Overriding profiles and targets")

When running dbt commands, you can specify which profile and target to use from the CLI using the `--profile` and `--target` [flags](https://docs.getdbt.com/reference/global-configs/about-global-configs.md#available-flags). These flags override what’s defined in your `dbt_project.yml` as long as the specified profile and target are already defined in your `profiles.yml` file.

To run your dbt project with a different profile or target than the default, use the following CLI flags:

* `--profile` flag — Overrides the profile set in `dbt_project.yml` by pointing to another profile defined in `profiles.yml`.
* `--target` flag — Specifies the target within that profile to use (as defined in `profiles.yml`).

These flags help when you're working with multiple profiles and targets and want to override defaults without changing your files.

```bash
dbt run --profile my-profile-name --target dev
```

In this example, the `dbt run` command will use the `my-profile-name` profile and the `dev` target.

#### Understanding warehouse credentials[​](#understanding-warehouse-credentials "Direct link to Understanding warehouse credentials")

We recommend that each dbt user has their own set of database credentials, including a separate user for production runs of dbt – this helps debug rogue queries, simplifies ownership of schemas, and improves security. To ensure the user credentials you use in your target allow dbt to run, you will need to ensure the user has appropriate privileges.
While the exact privileges needed vary between data warehouses, at a minimum your user must be able to:

* Read source data
* Create schemas¹
* Read system tables

**Running dbt without create schema privileges:** If your user cannot be granted the privilege to create schemas, your dbt runs should instead target an existing schema that your user has permission to create relations within.

#### Understanding target schemas[​](#understanding-target-schemas "Direct link to Understanding target schemas")

The target schema represents the default schema that dbt will build objects into, and is often used as the differentiator between separate environments within a warehouse.

**Schemas in BigQuery:** dbt uses the term "schema" in a target across all supported warehouses for consistency. Note that in the case of BigQuery, a schema is actually a dataset.

The schema used for production should be named in a way that makes it clear that it is ready for end-users to use for analysis – we often name this `analytics`. In development, a pattern we’ve found to work well is to name the schema in your `dev` target `dbt_<username>`. Suffixing your name to the schema enables multiple users to develop in dbt: each user has a separate development schema, so users won't build over the top of each other, and object ownership and permissions stay consistent across an entire schema.

Note that there’s no need to create your target schema beforehand – dbt will check if the schema already exists when it runs, and create it if it doesn’t.

While the target schema represents the default schema that dbt will use, it may make sense to split your models into separate schemas, which can be done by using [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md).

#### Understanding threads[​](#understanding-threads "Direct link to Understanding threads")

When dbt runs, it creates a directed acyclic graph (DAG) of links between models.
The number of threads represents the maximum number of paths through the graph dbt may work on at once – increasing the number of threads can minimize the run time of your project. The default value for threads in user profiles is 4 threads. For more information, check out [using threads](https://docs.getdbt.com/docs/running-a-dbt-project/using-threads.md).

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Install dbt](https://docs.getdbt.com/docs/local/install-dbt.md)
* [Connection profiles](https://docs.getdbt.com/docs/local/profiles.yml.md)

---

### About the dbt Fusion engine

dbt is the industry standard for data transformation. The dbt Fusion engine enables dbt to operate at speed and scale like never before. The dbt Fusion engine shares the same familiar framework for authoring data transformations as dbt Core, while enabling data developers to work faster and deploy transformation workloads more efficiently.

##### What is Fusion[​](#what-is-fusion "Direct link to What is Fusion")

Fusion is an entirely new piece of software, written in a different programming language (Rust) than dbt Core (Python). Fusion is significantly faster than dbt Core, and it has a native understanding of SQL across multiple engine dialects. Fusion will eventually support the full dbt Core framework, a superset of dbt Core’s capabilities, and the vast majority of existing dbt projects.

Fusion contains a mixture of source-available, proprietary, and open source code.
That means: * dbt Labs publishes much of the source code in the [`dbt-fusion` repository](https://github.com/dbt-labs/dbt-fusion), where you can read the code and participate in community discussions. * Some Fusion capabilities are exclusively available for paying customers of the cloud-based [dbt platform](https://www.getdbt.com/signup). Refer to [supported features](https://docs.getdbt.com/docs/fusion/supported-features.md#paid-features) for more information. Read more about the licensing for the dbt Fusion engine [here](http://www.getdbt.com/licenses-faq). #### Why use Fusion[​](#why-use-fusion "Direct link to Why use Fusion") As a developer, Fusion can: * Immediately catch incorrect SQL in your dbt models * Preview inline CTEs for faster debugging * Trace model and column definitions across your dbt project All of that and more is available in the [dbt extension for VSCode](https://docs.getdbt.com/docs/about-dbt-extension.md), with Fusion at the foundation. Fusion also enables more-efficient deployments of large DAGs. By tracking which columns are used where, and which source tables have fresh data, Fusion can ensure that models are rebuilt only when they need to process new data. This ["state-aware orchestration"](https://docs.getdbt.com/docs/deploy/state-aware-about.md) is a feature of the dbt platform (formerly dbt Cloud). ##### Thread management[​](#thread-management "Direct link to Thread management") The dbt Fusion engine manages parallelism differently than dbt Core. Rather than treating the `threads` setting as a strict limit on concurrent operations, Fusion optimizes parallelism based on each adapter's characteristics. * **Snowflake and Databricks**: Fusion ignores user-set threads and automatically optimizes parallelism for maximum performance. * **BigQuery and Redshift**: Fusion respects user-set threads to manage rate limits and concurrency constraints. 
For BigQuery and Redshift, setting `--threads 0` or omitting the setting allows Fusion to dynamically optimize. Low thread values can significantly slow down performance on these platforms. For more information, refer to [Using threads](https://docs.getdbt.com/docs/running-a-dbt-project/using-threads.md#fusion-engine-thread-optimization). ##### How to use Fusion[​](#how-to-use-fusion "Direct link to How to use Fusion") You can: * Select Fusion from the [dropdown/toggle in the dbt platform](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine) [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") * [Install the dbt extension for VSCode](https://docs.getdbt.com/docs/install-dbt-extension.md) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") * [Install the Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") Go straight to the [Quickstart](https://docs.getdbt.com/guides/fusion.md) to *feel the Fusion* as fast as possible. #### What's next?[​](#whats-next "Direct link to What's next?") dbt Labs launched the dbt Fusion engine as a public beta on May 28, 2025, with plans to reach full feature parity with dbt Core ahead of [Fusion's general availability](https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga). #### More information about Fusion[​](#more-information-about-fusion "Direct link to More information about Fusion") Fusion marks a significant update to dbt. While many of the workflows you've grown accustomed to remain unchanged, there are a lot of new ideas, and a lot of old ones going away. 
The following is a list of the full scope of our current release of the Fusion engine, including implementation, installation, deprecations, and limitations:

* [About the dbt Fusion engine](https://docs.getdbt.com/docs/fusion/about-fusion.md)
* [About the dbt extension](https://docs.getdbt.com/docs/about-dbt-extension.md)
* [New concepts in Fusion](https://docs.getdbt.com/docs/fusion/new-concepts.md)
* [Supported features matrix](https://docs.getdbt.com/docs/fusion/supported-features.md)
* [Installing Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)
* [Installing VS Code extension](https://docs.getdbt.com/docs/install-dbt-extension.md)
* [Fusion release track](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine)
* [Quickstart for Fusion](https://docs.getdbt.com/guides/fusion.md?step=1)
* [Upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md)
* [Fusion licensing](http://www.getdbt.com/licenses-faq)

---

### Arrow ADBC and Fusion

This document provides technical guidance for dbt partners and vendors on how to design, build, and maintain ADBC (Apache Arrow Database Connectivity) drivers for Fusion, the new dbt engine. Fusion leverages ADBC as a unified driver layer for seamless, high-performance integration with data platforms. Building an ADBC driver is the first step to connecting Fusion with a new platform.

#### Why Fusion uses ADBC[​](#why-fusion-uses-adbc "Direct link to Why Fusion uses ADBC")

The dbt Fusion engine represents a major evolution in the dbt engine with minimal changes to the authoring layer.
Built in Rust, Fusion delivers speed, language understanding, and seamless integration with numerous data warehouses. A key aspect of the new engine is its adoption of ADBC — a modern, open standard from the Apache Arrow project that simplifies columnar data interchange across platforms.

Historically, dbt Core adapters required bespoke connection logic for each data platform. Fusion improves on this model with a unified ADBC driver layer that offers several key advantages:

* **Standardization**: ADBC standardizes common platform features across a single interface.
* **Performance**: Drivers leverage Arrow's columnar memory format for efficient query execution with minimal transformations.
* **Maintainability**: ADBC drivers follow a shared specification, reducing the complexity of implementing new adapters.

#### Technical overview[​](#technical-overview "Direct link to Technical overview")

This section covers the parts of the ADBC specification that Fusion relies on. ADBC maintains backwards compatibility, so the guidance here remains valid as the spec evolves. For the latest information and detailed documentation, refer to the [ADBC documentation](https://arrow.apache.org/adbc/current/), which is the source of truth.

The ADBC API provides a powerful array of features, but you don't need to implement all of them. This section covers the API surface required for Fusion compatibility.

##### Programming language[​](#programming-language "Direct link to Programming language")

**tl;dr: Use Go.**

One distinct advantage of Arrow ADBC is portability. You can write drivers in various languages and load them via driver managers. This portability allows Fusion (written in Rust) to leverage drivers written in other languages. For Fusion compatibility, drivers must:

* Compile into shared libraries that can be loaded from any program
* Produce a platform-specific, standalone binary

We recommend **Go** as the language of choice, though Rust or C++ also work.
Go has a runtime and garbage collector, but it's engineered to compile into well-behaved shared libraries—unlike languages like C# or Java. A standalone binary allows users to download and run the driver out of the box without setting up an interpreter. Compiled languages like Go also enable Fusion and its drivers to share memory directly over FFI without external dependencies. ##### ADBC specifications[​](#adbc-specifications "Direct link to ADBC specifications") This section covers the minimum requirements for a Fusion-compatible ADBC driver. For complete details on the ADBC specification and driver development, refer to the [ADBC driver authoring guide](https://arrow.apache.org/adbc/current/driver/authoring.html). Drivers consist of several key abstractions: 1. **Driver**: Load a driver to create databases. 2. **Database**: Create databases and set configuration options (including authentication). Use databases to open connections. 3. **Connection**: Establish connections to the warehouse. Connections create statements. 4. **Statement**: Set options and SQL queries on statements, then execute them against the warehouse. Fusion achieves high performance through aggressive parallelism, so expect many simultaneous connections during project execution. ##### Authentication[​](#authentication "Direct link to Authentication") Drivers handle authentication through key-value options set on the database. Fusion translates options from user-authored `profiles.yml` files before passing them to the driver. For example, what dbt calls `client_secret` in a Snowflake profile gets set on the driver as `adbc.snowflake.sql.client_option.client_secret`. For a complete example of how Fusion translates profile options, see the [Snowflake authentication source code](https://github.com/dbt-labs/dbt-fusion/blob/main/crates/dbt-auth/src/snowflake/mod.rs). For more information on profile configuration, refer to [dbt profiles](https://docs.getdbt.com/docs/local/profiles.yml.md). 
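To make this option translation concrete, here is a minimal sketch in Go (the language recommended above for drivers). Only the `client_secret` mapping is taken from the example above; the other keys in the table, and the `translateProfileOptions` helper itself, are hypothetical placeholders — the authoritative mappings live in the `dbt-fusion` auth source linked earlier.

```go
package main

import "fmt"

// translateProfileOptions maps dbt profile keys to ADBC database option keys.
// Only the client_secret mapping comes from the dbt docs; the other entries
// are hypothetical placeholders for illustration. Unmapped keys are dropped
// in this sketch.
func translateProfileOptions(profile map[string]string) map[string]string {
	keyMap := map[string]string{
		"client_secret": "adbc.snowflake.sql.client_option.client_secret",
		"account":       "adbc.snowflake.sql.client_option.account", // hypothetical
		"user":          "adbc.snowflake.sql.client_option.user",    // hypothetical
	}
	opts := make(map[string]string, len(profile))
	for key, value := range profile {
		if adbcKey, ok := keyMap[key]; ok {
			opts[adbcKey] = value
		}
	}
	return opts
}

func main() {
	opts := translateProfileOptions(map[string]string{
		"client_secret": "example-secret",
	})
	fmt.Println(opts["adbc.snowflake.sql.client_option.client_secret"])
}
```

The translated map would then be passed to the driver as database options before opening connections.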
###### Credential caching[​](#credential-caching "Direct link to Credential caching")

Simple authentication methods (like username/password stored in `profiles.yml`) support fully parallel connection creation with no special handling required. For authentication methods that require browser interaction (user-to-machine OAuth, SSO, or MFA), implement credential caching. Due to Fusion's highly parallel execution, without caching, every new connection prompts the user for authentication repeatedly.

Your credential cache for browser-based authentication must:

* Block new connections until an initial connection establishes and stores a token in memory (avoiding the thundering herd problem).
* Handle token refresh using the same blocking principle when invalidation occurs.
* Use interprocess, file-system-based storage to support the LSP, which runs in a separate process.

This caching is critical for any browser-based or MFA authentication option, but is not needed for simple credential-based authentication.

#### Required APIs[​](#required-apis "Direct link to Required APIs")

This section covers the minimum API set for Fusion compatibility. The requirements are:

* Authentication via options.
* SQL query execution.
* Metadata queries to understand remote warehouse state (certain connection-level metadata functions can use cheaper or more performant APIs to pull table schemas).

These requirements are not exhaustive. dbt Labs encourages implementing the full ADBC specification to benefit both Fusion and the broader ADBC community.

###### Driver[​](#driver "Direct link to Driver")

| Method | Description |
| ------------- | ------------------------------- |
| `NewDatabase` | Create a new database instance. |
###### Database[​](#database "Direct link to Database")

| Method | Description |
| ------------ | ----------------------------------------------------- |
| `SetOptions` | Set configuration options (including authentication). |
| `Open` | Open a connection. |
| `Close` | Close the database. |

###### Connection[​](#connection "Direct link to Connection")

| Method | Description |
| ---------------- | --------------------------------------- |
| `GetObjects` | Pull metadata from the warehouse. |
| `GetTableSchema` | Pull schema metadata for tables. |
| `NewStatement` | Create a statement for query execution. |
| `Close` | Close the connection. |

###### Statement[​](#statement "Direct link to Statement")

| Method | Description |
| --------------- | ----------------------------------- |
| `SetOption` | Set options on queries. |
| `SetSqlQuery` | Set the SQL query text. |
| `ExecuteQuery` | Execute a query and return results. |
| `ExecuteUpdate` | Execute DML queries. |
| `Close` | Close the statement. |
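The four abstractions compose in a fixed sequence. The following self-contained Go sketch mirrors that lifecycle with hand-rolled mock types; it is not the official arrow-adbc Go API (real methods take contexts and return Arrow record batches, not strings), only an illustration of the call order Fusion drives.

```go
package main

import "fmt"

// Simplified stand-ins for the ADBC abstractions (not the official
// arrow-adbc Go interfaces).
type Driver interface {
	NewDatabase(opts map[string]string) (Database, error)
}
type Database interface {
	Open() (Connection, error)
	Close() error
}
type Connection interface {
	NewStatement() (Statement, error)
	Close() error
}
type Statement interface {
	SetSqlQuery(query string)
	ExecuteQuery() (string, error)
	Close() error
}

// A trivial in-memory driver used only to demonstrate the call sequence.
type mockDriver struct{}
type mockDatabase struct{ opts map[string]string }
type mockConnection struct{}
type mockStatement struct{ query string }

func (mockDriver) NewDatabase(opts map[string]string) (Database, error) {
	return &mockDatabase{opts: opts}, nil
}
func (d *mockDatabase) Open() (Connection, error)          { return &mockConnection{}, nil }
func (d *mockDatabase) Close() error                       { return nil }
func (c *mockConnection) NewStatement() (Statement, error) { return &mockStatement{}, nil }
func (c *mockConnection) Close() error                     { return nil }
func (s *mockStatement) SetSqlQuery(query string)          { s.query = query }
func (s *mockStatement) ExecuteQuery() (string, error)     { return "result of: " + s.query, nil }
func (s *mockStatement) Close() error                      { return nil }

func main() {
	// Driver -> Database (with auth options) -> Connection -> Statement.
	db, _ := mockDriver{}.NewDatabase(map[string]string{
		"adbc.snowflake.sql.client_option.client_secret": "example-secret",
	})
	conn, _ := db.Open()
	stmt, _ := conn.NewStatement()
	stmt.SetSqlQuery("select 1")
	result, _ := stmt.ExecuteQuery()
	fmt.Println(result)
	stmt.Close()
	conn.Close()
	db.Close()
}
```

Because Fusion opens many connections in parallel, a real driver should make `Open` and `NewStatement` cheap and thread-safe.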
---

### BigQuery setup [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

You can configure the BigQuery adapter by running `dbt init` in your CLI or manually providing the `profiles.yml` file with the fields configured for your authentication type. The BigQuery adapter for Fusion supports the following [authentication methods](#supported-authentication-types):

* Service account (JSON file)
* gcloud OAuth

#### BigQuery permissions[​](#bigquery-permissions "Direct link to BigQuery permissions")

dbt user accounts need the following permissions to read from and create tables and views in a BigQuery project:

* BigQuery Data Editor
* BigQuery User
* BigQuery Read Session User (New in Fusion. For Storage Read API access)

For BigQuery DataFrames, users need these additional permissions:

* BigQuery Job User
* BigQuery Read Session User
* Notebook Runtime User
* Code Creator
* colabEnterpriseUser

#### Configure Fusion[​](#configure-fusion "Direct link to Configure Fusion")

Executing `dbt init` in your CLI will prompt for the following fields:

* **Project ID:** The GCP BigQuery project ID
* **Dataset:** The schema name
* **Location:** The location for your GCP environment (for example, us-east1)

Alternatively, you can manually create the `profiles.yml` file and configure the fields. See examples in the [authentication](#supported-authentication-types) section for formatting. If there is an existing `profiles.yml` file, you have the option to retain the existing fields or overwrite them.

Next, select your authentication method. Follow the on-screen prompts to provide the required information.

#### Supported authentication types[​](#supported-authentication-types "Direct link to Supported authentication types")

* Service account (JSON file)
* gcloud OAuth

Selecting the **Service account (JSON file)** authentication type will prompt you for the path to your JSON file.
You can also manually define the path in your `profiles.yml` file.

###### Example service account JSON file configuration[​](#example-service-account-json-file-configuration "Direct link to Example service account JSON file configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: bigquery
      threads: 16
      database: ABC123
      schema: JAFFLE_SHOP
      method: service-account
      keyfile: /Users/mshaver/Downloads/CustomRoleDefinition.json
      location: us-east1
      dataproc_batch: null
```

Prior to selecting the **gcloud OAuth** authentication method, you must first configure local OAuth for gcloud:

###### Local OAuth gcloud setup[​](#local-oauth-gcloud-setup "Direct link to Local OAuth gcloud setup")

1. Make sure the `gcloud` command is [installed on your computer](https://cloud.google.com/sdk/downloads).
2. Activate the application-default account with:

```shell
gcloud auth application-default login \
  --scopes=https://www.googleapis.com/auth/bigquery,\
https://www.googleapis.com/auth/drive.readonly,\
https://www.googleapis.com/auth/iam.test,\
https://www.googleapis.com/auth/cloud-platform

# This command uses the `--scopes` flag to request access to Google Sheets. This makes it possible to transform data in Google Sheets using dbt. If your dbt project does not transform data in Google Sheets, then you may omit the `--scopes` flag.
```

A browser window should open, and you should be prompted to log into your Google account. Once you've done that, dbt will use your OAuth'd credentials to connect to BigQuery.
###### Example gcloud configuration[​](#example-gcloud-configuration "Direct link to Example gcloud configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: bigquery
      threads: 16
      database: ABC123
      schema: JAFFLE_SHOP
      method: oauth
      location: us-east1
      dataproc_batch: null
```

#### More information[​](#more-information "Direct link to More information")

Find BigQuery-specific configuration information in the [BigQuery adapter reference guide](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md).

---

### Connection profiles

When you invoke dbt from the command line, dbt parses your `dbt_project.yml` and obtains the `profile` name, which dbt needs to connect to your data warehouse.

dbt\_project.yml

```yaml
# Example dbt_project.yml file
name: 'jaffle_shop'
profile: 'jaffle_shop'
...
```

dbt then checks your `profiles.yml` file for a profile with the same name. A profile contains all the details required to connect to your data warehouse. dbt will search the current working directory for the `profiles.yml` file and will default to the `~/.dbt/` directory if not found. This file generally lives outside of your dbt project to avoid sensitive credentials being checked in to version control, but `profiles.yml` can be safely checked in when [using environment variables](#advanced-using-environment-variables) to load sensitive credentials.
\~/.dbt/profiles.yml ```yaml # example profiles.yml file jaffle_shop: target: dev outputs: dev: type: postgres host: localhost user: alice password: port: 5432 dbname: jaffle_shop schema: dbt_alice threads: 4 prod: # additional prod target type: postgres host: prod.db.example.com user: alice password: port: 5432 dbname: jaffle_shop schema: analytics threads: 8 ``` To add an additional target (like `prod`) to your existing `profiles.yml`, you can add another entry under the `outputs` key. #### The `env_var` function[​](#the-env_var-function "Direct link to the-env_var-function") The `env_var` function can be used to incorporate environment variables from the system into your dbt project. You can use the `env_var` function in your `profiles.yml` file, the `dbt_project.yml` file, the `sources.yml` file, your `schema.yml` files, and in model `.sql` files. Essentially, `env_var` is available anywhere dbt processes Jinja code. When used in a `profiles.yml` file (to avoid putting credentials on a server), it can be used like this: profiles.yml ```yaml profile: target: prod outputs: prod: type: postgres host: 127.0.0.1 # IMPORTANT: Make sure to quote the entire Jinja string here user: "{{ env_var('DBT_USER') }}" password: "{{ env_var('DBT_PASSWORD') }}" .... ``` #### About the `profiles.yml` file[​](#about-the-profilesyml-file "Direct link to about-the-profilesyml-file") In your `profiles.yml` file, you can store as many profiles as you need. Typically, you would have one profile for each warehouse you use. Most organizations only have one profile. #### About profiles[​](#about-profiles "Direct link to About profiles") A profile consists of *targets*, and a specified *default target*. Each *target* specifies the type of warehouse you are connecting to, the credentials to connect to the warehouse, and some dbt-specific configurations. 
The credentials you need to provide in your target vary across warehouses — sample profiles for each supported warehouse are available in the [Supported Data Platforms](https://docs.getdbt.com/docs/supported-data-platforms.md) section.

**Pro Tip:** You may need to surround your password in quotes if it contains special characters. More details [here](https://stackoverflow.com/a/37015689/10415173).

#### Setting up your profile[​](#setting-up-your-profile "Direct link to Setting up your profile")

To set up your profile, copy the correct sample profile for your warehouse into your `profiles.yml` file and update the details as follows:

* Profile name: Replace the name of the profile with a sensible name – it’s often a good idea to use the name of your organization. Make sure that this is the same name as the `profile` indicated in your `dbt_project.yml` file.
* `target`: This is the default target your dbt project will use. It must be one of the targets you define in your profile. Commonly it is set to `dev`.
* Populating your target:
  * `type`: The type of data warehouse you are connecting to
  * Warehouse credentials: Get these from your database administrator if you don’t already have them. Remember that user credentials are very sensitive information that should not be shared.
  * `schema`: The default schema that dbt will build objects in.
  * `threads`: The number of threads the dbt project will run on.

You can find more information on which values to use in your targets below. Use the [debug](https://docs.getdbt.com/reference/dbt-jinja-functions/debug-method.md) command to validate your warehouse connection. Run `dbt debug` from within a dbt project to test your connection.
#### Understanding targets in profiles[​](#understanding-targets-in-profiles "Direct link to Understanding targets in profiles") dbt supports multiple targets within one profile to encourage the use of separate development and production environments as discussed in [dbt environments](https://docs.getdbt.com/docs/local/dbt-core-environments.md). A typical profile for an analyst using dbt locally will have a target named `dev`, and have this set as the default. You may also have a `prod` target within your profile, which creates the objects in your production schema. However, since it's often desirable to perform production runs on a schedule, we recommend deploying your dbt project to a separate machine other than your local machine. Most dbt users only have a `dev` target in their profile on their local machine. If you do have multiple targets in your profile, and want to use a target other than the default, you can do this using the `--target` flag when running a dbt command. For example, to run against your `prod` target instead of the default `dev` target: ```bash dbt run --target prod ``` You can use the `--target` flag with any dbt command, such as: ```bash dbt build --target prod dbt test --target dev dbt compile --target qa ``` ##### Overriding profiles and targets[​](#overriding-profiles-and-targets "Direct link to Overriding profiles and targets") When running dbt commands, you can specify which profile and target to use from the CLI using the `--profile` and `--target` [flags](https://docs.getdbt.com/reference/global-configs/about-global-configs.md#available-flags). These flags override what’s defined in your `dbt_project.yml` as long as the specified profile and target are already defined in your `profiles.yml` file. 
To run your dbt project with a different profile or target than the default, use the following CLI flags:

* `--profile` flag — Overrides the profile set in `dbt_project.yml` by pointing to another profile defined in `profiles.yml`.
* `--target` flag — Specifies the target within that profile to use (as defined in `profiles.yml`).

These flags help when you're working with multiple profiles and targets and want to override defaults without changing your files.

```bash
dbt run --profile my-profile-name --target dev
```

In this example, the `dbt run` command will use the `my-profile-name` profile and the `dev` target.

#### Understanding warehouse credentials[​](#understanding-warehouse-credentials "Direct link to Understanding warehouse credentials")

We recommend that each dbt user has their own set of database credentials, including a separate user for production runs of dbt – this helps debug rogue queries, simplifies ownership of schemas, and improves security.

To ensure the user credentials you use in your target allow dbt to run, you will need to ensure the user has appropriate privileges. While the exact privileges needed vary between data warehouses, at a minimum your user must be able to:

* read source data
* create schemas¹
* read system tables

¹ **Running dbt without create schema privileges:** If your user can't be granted the privilege to create schemas, your dbt runs should instead target an existing schema that your user has permission to create relations within.

#### Understanding target schemas[​](#understanding-target-schemas "Direct link to Understanding target schemas")

The target schema represents the default schema that dbt will build objects into, and is often used as the differentiator between separate environments within a warehouse.

**Schemas in BigQuery:** dbt uses the term "schema" in a target across all supported warehouses for consistency. Note that in the case of BigQuery, a schema is actually a dataset.
The schema used for production should be named in a way that makes it clear that it is ready for end users to use for analysis – we often name this `analytics`. In development, a pattern we’ve found to work well is to name the schema in your `dev` target `dbt_<username>` (for example, `dbt_alice`). Suffixing your name to the schema enables multiple users to develop in dbt: each user has their own separate schema for development, so users don't build over the top of each other, and object ownership and permissions stay consistent across an entire schema.

Note that there’s no need to create your target schema beforehand – dbt will check whether the schema already exists when it runs, and create it if it doesn’t.

While the target schema represents the default schema that dbt will use, it may make sense to split your models into separate schemas, which can be done by using [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md).

#### Understanding threads[​](#understanding-threads "Direct link to Understanding threads")

When dbt runs, it creates a directed acyclic graph (DAG) of links between models. The number of threads represents the maximum number of paths through the graph dbt may work on at once – increasing the number of threads can minimize the run time of your project. The default value for threads in user profiles is 4. For more information, check out [using threads](https://docs.getdbt.com/docs/running-a-dbt-project/using-threads.md).

#### Advanced: Customizing a profile directory[​](#advanced-customizing-a-profile-directory "Direct link to Advanced: Customizing a profile directory")

**dbt Fusion**

Fusion determines the parent directory for `profiles.yml` using the following precedence:

1. `--profiles-dir` option
2. Project root directory
3. `~/.dbt/` directory

Note that Fusion doesn't currently support the `DBT_PROFILES_DIR` environment variable or setting the `profiles.yml` in the current working directory.
**dbt Core**

dbt Core determines the parent directory for `profiles.yml` using the following precedence:

1. `--profiles-dir` option
2. `DBT_PROFILES_DIR` environment variable
3. Current working directory
4. `~/.dbt/` directory

To check the expected location of your `profiles.yml` file for your installation of dbt, run the following:

```bash
$ dbt debug --config-dir
To view your profiles.yml file, run:

open /Users/alice/.dbt
```

You may want to have your `profiles.yml` file stored in a different directory than `~/.dbt/` – for example, if you are [using environment variables](#advanced-using-environment-variables) to load your credentials, you might choose to include this file in the root directory of your dbt project. Note that the file always needs to be called `profiles.yml`, regardless of which directory it is in.

There are multiple ways to direct dbt to a different location for your `profiles.yml` file:

##### 1. Use the `--profiles-dir` option when executing a dbt command[​](#1-use-the---profiles-dir-option-when-executing-a-dbt-command "Direct link to 1-use-the---profiles-dir-option-when-executing-a-dbt-command")

This option can be used as follows:

```text
$ dbt run --profiles-dir path/to/directory
```

If using this method, the `--profiles-dir` option needs to be provided every time you run a dbt command.

##### 2. Use the `DBT_PROFILES_DIR` environment variable to change the default location (dbt Core only)[​](#2-use-the-dbt_profiles_dir-environment-variable-to-change-the-default-location-dbt-core-only "Direct link to 2-use-the-dbt_profiles_dir-environment-variable-to-change-the-default-location-dbt-core-only")

Setting this environment variable tells dbt Core to look for your `profiles.yml` file in the specified directory instead of the default location. You can specify this by running:

```text
$ export DBT_PROFILES_DIR=path/to/directory
```

Note: This environment variable isn't supported in Fusion.
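If you keep `profiles.yml` in your project root (and therefore under version control), you can avoid committing secrets by reading credentials from environment variables with dbt's `env_var` function, as described in the next section. A sketch, assuming a Postgres target; the variable names `DBT_HOST`, `DBT_USER`, and `DBT_PASSWORD` are illustrative:

```yml
my_profile:
  target: dev
  outputs:
    dev:
      type: postgres
      host: "{{ env_var('DBT_HOST') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: dbt_dev
      threads: 4
```

`env_var` raises an error if a variable is unset and no default is provided, so export the variables in your shell or scheduler before invoking dbt.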
#### Advanced: Using environment variables[​](#advanced-using-environment-variables "Direct link to Advanced: Using environment variables")

Credentials can be placed directly into the `profiles.yml` file or loaded from environment variables. Using environment variables is especially useful for production deployments of dbt. You can find more information about environment variables [here](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md).

#### Related docs[​](#related-docs "Direct link to Related docs")

* [About `profiles.yml`](https://docs.getdbt.com/docs/local/profiles.yml.md)

---

### Databricks setup [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

You can configure the Databricks adapter by running `dbt init` in your CLI or manually providing the `profiles.yml` file with the fields configured for your authentication type. The Databricks adapter for Fusion supports the following [authentication methods](#supported-authentication-types):

* Personal access token (for individual users)
* Service Principal token (for service users)
* OAuth

#### Databricks configuration details[​](#databricks-configuration-details "Direct link to Databricks configuration details")

The dbt Fusion engine `dbt-databricks` adapter is the only supported connection method for Databricks. `dbt-databricks` can connect to Databricks SQL Warehouses. These warehouses are the recommended way to get started with Databricks.
Refer to the [Databricks docs](https://docs.databricks.com/dev-tools/dbt.html#) for more info on how to obtain the credentials for configuring your profile.

#### Configure Fusion[​](#configure-fusion "Direct link to Configure Fusion")

Executing `dbt init` in your CLI will prompt for the following fields:

* **Host:** Databricks instance hostname (excluding the `http` or `https` prefix)
* **HTTP Path:** Path to your SQL warehouse or cluster
* **Schema:** The development/staging/deployment schema for the project
* **Catalog (Optional):** The Databricks catalog containing your schemas and tables

Alternatively, you can manually create the `profiles.yml` file and configure the fields. See the examples in the [authentication](#supported-authentication-types) section for formatting. If there is an existing `profiles.yml` file, you are given the option to retain the existing fields or overwrite them.

Next, select your authentication method. Follow the on-screen prompts to provide the required information.

#### Supported authentication types[​](#supported-authentication-types "Direct link to Supported authentication types")

* Personal access token
* Service Principal token
* OAuth (Recommended)

Enter your personal access token (PAT) for the Databricks environment. For more information about obtaining a PAT, refer to the [Databricks documentation](https://docs.databricks.com/aws/en/dev-tools/auth/pat). Databricks considers PATs a legacy feature and recommends OAuth over them.

###### Example personal access token configuration[​](#example-personal-access-token-configuration "Direct link to Example personal access token configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: databricks
      database: TRANSFORMING
      schema: JANE_SMITH
      host: YOUR.HOST.COM
      http_path: YOUR/PATH/HERE
      token: ABC123
      auth_type: databricks_cli
      threads: 16
```

Enter your Service Principal token for the Databricks environment.
For more information about obtaining a Service Principal token, refer to the [Databricks documentation](https://docs.databricks.com/aws/en/admin/users-groups/service-principals).

###### Example Service Principal token configuration[​](#example-service-principal-token-configuration "Direct link to Example Service Principal token configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: databricks
      database: TRANSFORMING
      schema: JANE_SMITH
      host: YOUR.HOST.COM
      http_path: YOUR/PATH/HERE
      token: ABC123
      auth_type: databricks_cli
      threads: 16
```

Selecting the OAuth option will create a connection to your Databricks environment and open a web browser so you can complete the authentication. Users will be prompted to re-authenticate with each new dbt session they initiate.

###### Example OAuth configuration[​](#example-oauth-configuration "Direct link to Example OAuth configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: databricks
      database: TRANSFORMING
      schema: JANE_SMITH
      host: YOUR.HOST.COM
      http_path: YOUR/PATH/HERE
      auth_type: oauth
      threads: 16
```

#### More information[​](#more-information "Direct link to More information")

Find Databricks-specific configuration information in the [Databricks adapter reference guide](https://docs.getdbt.com/reference/resource-configs/databricks-configs.md).
---

### dbt Fusion engine [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

The dbt Fusion engine is the next-generation engine, built in Rust, that powers development across the dbt platform (formerly dbt Cloud), local development in VS Code and Cursor, and the CLI. Fusion is faster, smarter, and more cost-efficient. It brings SQL comprehension, state awareness, instant feedback, and more to every dbt workflow, plus an integrated VS Code experience through the [dbt extension](https://marketplace.visualstudio.com/items?itemName=dbtLabsInc.dbt) and [Language Server Protocol (LSP)](https://docs.getdbt.com/blog/dbt-fusion-engine-components#the-dbt-vs-code-extension-and-language-server), which enables features like live CTE previews, hover info, and error highlighting.

Choose one of the following paths to get started with the dbt Fusion engine.

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/fusion/about-fusion.md)

###### [About Fusion](https://docs.getdbt.com/docs/fusion/about-fusion.md)

[Learn about the dbt Fusion engine and how it works.](https://docs.getdbt.com/docs/fusion/about-fusion.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/fusion/fusion-availability.md)

###### [Fusion availability](https://docs.getdbt.com/docs/fusion/fusion-availability.md)

[Learn where the dbt Fusion engine is available.](https://docs.getdbt.com/docs/fusion/fusion-availability.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/fusion/fusion-readiness.md)

###### [Fusion readiness checklist](https://docs.getdbt.com/docs/fusion/fusion-readiness.md)

[Learn about the checklist to prepare your projects for the dbt Fusion engine.](https://docs.getdbt.com/docs/fusion/fusion-readiness.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/fusion/new-concepts.md)

###### [New concepts in Fusion](https://docs.getdbt.com/docs/fusion/new-concepts.md)

[Learn about the new concepts in the dbt Fusion engine.](https://docs.getdbt.com/docs/fusion/new-concepts.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/fusion/supported-features.md)

###### [Supported features](https://docs.getdbt.com/docs/fusion/supported-features.md)

[Learn about the features and capabilities of the dbt Fusion engine.](https://docs.getdbt.com/docs/fusion/supported-features.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/fusion/get-started-fusion.md)

###### [Get started with Fusion](https://docs.getdbt.com/docs/fusion/get-started-fusion.md)

[Learn about how to start using the dbt Fusion engine.](https://docs.getdbt.com/docs/fusion/get-started-fusion.md)

###### Upgrade to Fusion

Learn how to upgrade your cloud-hosted environments to the dbt Fusion engine.

---

### Fusion availability

Not sure where to start? Try out the [Fusion quickstart](https://docs.getdbt.com/guides/fusion.md) and check out the [Fusion migration guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md) to see how to migrate your project.

dbt Fusion engine powers dbt development everywhere — in the [dbt platform](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine), [VS Code/Cursor/Windsurf](https://docs.getdbt.com/docs/about-dbt-extension.md), and [locally](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#installation). Fusion in the dbt platform is available in private preview. Contact your account team for access.
[dbt platform](https://docs.getdbt.com/docs/introduction.md#the-dbt-platform-formerly-dbt-cloud) supports two engines: Fusion (Rust-based, fast, visual) and dbt Core (Python-based, traditional). dbt Core is also available as an [open-source CLI](https://docs.getdbt.com/docs/introduction.md#dbt-core) for self-hosted workflows. Features vary depending on how Fusion is implemented.

Whether you’re new to dbt or already set up, check out the following table to see what development solutions are available and where you can use them. See [dbt platform features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) for a full list of the available features for dbt platform.

| | Features you can use | Who can use it? | Solutions available |
| --- | --- | --- | --- |
| **dbt platform** with Fusion or dbt Core engine | [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md)<br/>[Insights](https://docs.getdbt.com/docs/explore/navigate-dbt-insights.md#lsp-features)<br/>[Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md)<br/>[dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md)<br/>[dbt VS Code extension](https://marketplace.visualstudio.com/items?itemName=dbtLabsInc.dbt) (VS Code/Cursor/Windsurf; Fusion only) | dbt platform licensed users<br/>Anyone getting started with dbt | **dbt Fusion engine**: Rust-based engine that delivers fast, reliable compilation, analysis, validation, state awareness, and job execution with [visual LSP features](https://docs.getdbt.com/docs/fusion/supported-features.md#features-and-capabilities) like autocomplete, inline errors, live previews, and lineage.<br/>**dbt Core**: Uses the Python-based dbt Core engine for traditional workflows. *Does not* include LSP features. |
| **Self-hosted Fusion** | [dbt VS Code extension](https://marketplace.visualstudio.com/items?itemName=dbtLabsInc.dbt) (VS Code/Cursor/Windsurf)<br/>[Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) | dbt platform users<br/>dbt Fusion users<br/>Anyone getting started with dbt | **VS Code extension:** Combines dbt Fusion engine performance with visual LSP features when developing locally.<br/>**Fusion CLI:** Provides Fusion performance benefits (faster parsing, compilation, execution) but *does not* include LSP features. |
| **Self-hosted dbt Core** | [dbt Core CLI](https://docs.getdbt.com/docs/local/install-dbt.md) | dbt Core users<br/>Anyone getting started with dbt | Uses the Python-based dbt Core engine for traditional workflows. *Does not* include LSP features. To use the Fusion features locally, install [the VS Code extension](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) or the [Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started). |

* Like dbt Core, you can install Fusion locally from the [CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) to power local workflows. For ergonomic and LSP-based intelligent development (powered by Fusion), [install the VS Code extension](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started).
* Fusion in the dbt platform is available in private preview. To use Fusion in the dbt platform, contact your account team for access and then [upgrade environments to the dbt Fusion engine](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine) to power your projects.
* If your account isn't on the dbt Fusion engine, you use the dbt platform with the traditional Python-based dbt Core engine. It doesn't come with the Fusion [features](https://docs.getdbt.com/docs/fusion/supported-features.md#features-and-capabilities), such as 30x faster compilation/parsing, autocomplete, hover info, and inline error highlights. To use Fusion, contact your account team for access.

---

### Fusion readiness checklist

The dbt Fusion engine is here! We currently offer it as a [private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles.md#the-dbt-platform) on the dbt platform.
Even if we haven't enabled it for your account, you can still start preparing your projects for upgrade. Use this checklist to ensure a smooth upgrade once Fusion becomes available. If this is all new to you, first [learn about Fusion](https://docs.getdbt.com/docs/fusion.md), its current state, and the features available.

#### Preparing for Fusion[​](#preparing-for-fusion "Direct link to Preparing for Fusion")

Use the following checklist to prepare your projects for the dbt Fusion engine.

##### Upgrade to the latest dbt version[​](#upgrade-to-the-latest-dbt-version "Direct link to Upgrade to the latest dbt version")

The **Latest** [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) has all of the most recent features to help you prepare for Fusion.

* Make sure all your projects are on the **Latest** release track across all deployment environments and jobs. This ensures the simplest, most predictable experience by allowing you to pre-validate that your project doesn't rely on deprecated behaviors.

##### Resolve all deprecation warnings[​](#resolve-all-deprecation-warnings "Direct link to Resolve all deprecation warnings")

You must resolve deprecations while your projects are on a dbt Core release track, as they produce warnings that become errors once you upgrade to Fusion. The autofix tool can automatically resolve many deprecations (such as moving arbitrary configs into the `meta` dictionary). For a full list of deprecations and how to resolve them, refer to [Deprecations](https://docs.getdbt.com/reference/deprecations.md).

Start a new branch and begin resolving deprecation warnings using one of the following methods:

* [ ] **Run autofix in the dbt platform:** You can address deprecation warnings using the [autofix tool in the Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/autofix-deprecations.md). You can run the autofix tool on the **Compatible** or **Latest** release track.
* [ ] **Run autofix locally:** Use the [VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md). The extension has a built-in ["Getting Started" workflow](https://docs.getdbt.com/docs/install-dbt-extension.md#getting-started) that will debug your dbt project in the VS Code or Cursor IDE and execute the autofix tool. This has the added benefit of installing Fusion on your computer so you can begin testing locally before implementing it in your dbt platform account.
* [ ] **Run autofix locally (without the extension):** Visit the autofix [GitHub repo](https://github.com/dbt-labs/dbt-autofix) to run the tool locally if you're not using VS Code or Cursor. This only runs the tool; it does not install Fusion.

##### Validate and upgrade your dbt packages[​](#validate-and-upgrade-your-dbt-packages "Direct link to Validate and upgrade your dbt packages")

The most commonly used dbt Labs managed packages (such as `dbt_utils` and `dbt_project_evaluator`) are already compatible with Fusion, as are a large number of external and community packages. Review [the dbt package hub](https://hub.getdbt.com) to see verified Fusion-compatible packages by checking that the `require-dbt-version` configuration includes `2.0.0` or higher. Refer to [package support](https://docs.getdbt.com/docs/fusion/supported-features.md#package-support) for more information.

* Make sure that all of your packages are upgraded to the most recent version; many recent versions contain enhancements to support Fusion.
* Check package repositories to make sure they're compatible with Fusion. If a package you use is not yet compatible, we recommend opening an issue with the maintainer, making the contribution yourself, or removing the package temporarily before you upgrade.
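As a sketch of the package-upgrade step, a `packages.yml` can pin each dependency to a recent version range so that `dbt deps` pulls releases with Fusion support. The version ranges shown are illustrative; confirm the actual Fusion-compatible versions in each package's release notes:

```yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.3.0", "<2.0.0"] # illustrative range; confirm Fusion support
  - package: dbt-labs/dbt_project_evaluator
    version: [">=1.0.0", "<2.0.0"] # illustrative range
```

Run `dbt deps` after editing the file to install the upgraded versions.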
##### Validate support for functions[​](#validate-support-for-functions "Direct link to Validate support for functions")

Check that Fusion supports all user-defined functions (UDFs) in your project. Fusion supports nearly all built-in data platform functions out of the box. However, data platforms continuously add new functions that Fusion may not yet support.

If you see the error `dbt0209: No function `, the resolution depends on whether the function is a UDF or a built-in function:

* [ ] **For custom UDFs:** Recreate it as a [native dbt UDF](https://docs.getdbt.com/docs/build/udfs.md#defining-udfs-in-dbt) to get the full Fusion experience. With `static_analysis: baseline` (the default), most UDFs will work out of the box.
* [ ] **For warehouse-native functions:** Submit a [GitHub issue](https://github.com/dbt-labs/dbt-fusion). Fusion's `baseline` mode handles most cases, but will throw warnings rather than errors. You can set `static_analysis: off` for specific models if needed.

##### Check for known Fusion limitations[​](#check-for-known-fusion-limitations "Direct link to Check for known Fusion limitations")

Your project may implement features that Fusion currently [limits](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations) or doesn't support.

* Remove unnecessary features from your project to make it Fusion compatible.
* Monitor progress for critical features, knowing we are working to bring them to Fusion. You can track them using the issues linked in the [limitations table](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations).

##### Review jobs configured in the dbt platform[​](#review-jobs-configured-in-the-dbt-platform "Direct link to Review jobs configured in the dbt platform")

We determine Fusion eligibility using data from your job runs.
* Ensure you have at least one job running in each of your projects in the dbt platform.
* Ensure all jobs are running on the [**Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md#which-release-tracks-are-available).
* Resolve any job failures — all jobs must run successfully for eligibility checks to work.
* Delete any jobs that are no longer in use to ensure accurate eligibility reporting.
* Make sure you've promoted the changes for deprecation resolution and package upgrades to the git branches that map to your deployment environments.

##### Stay informed about Fusion progress[​](#stay-informed-about-fusion-progress "Direct link to Stay informed about Fusion progress")

The dbt Fusion engine remains in private preview, and we currently offer it for eligible projects. We will notify you when all your projects are ready for Fusion based on our eligibility checks on your deployment jobs. In the meantime, keep up to date with these resources:

* Check out the [Fusion homepage](https://www.getdbt.com/product/fusion) for available resources, including supported adapters, prerequisites, installation instructions, limitations, and deprecations.
* Read the [Upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md) to learn about the new features and functionality that impact your dbt projects.
* Monitor progress and get insight into the development process by reading the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements).
* Catch up on the [cost savings potential](https://www.getdbt.com/blog/announcing-state-aware-orchestration) of Fusion-powered [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about) (hint: 30%+ reduction in warehouse spend!)

---

### Fusion releases [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

**Preview feature:** This page shows release information for preview builds of Fusion only. When Fusion becomes generally available, these channels will transition to Fusion [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).

Track current versions and full release history for the dbt Fusion engine. This data updates live from dbt release channels. For detailed information about each release, refer to the [dbt Fusion changelog](https://github.com/dbt-labs/dbt-fusion/blob/main/CHANGELOG.md).

#### Release channels[​](#release-channels "Direct link to Release channels")

The dbt Fusion engine is distributed through three release channels:

| Channel | Description | Stability |
| --- | --- | --- |
| `latest` | The known good stable version | ✅ Recommended for production |
| `canary` | The latest version to be officially released | ⚠️ Most recent stable version but still undergoing thorough testing |
| `dev` | The latest development build | ❌ May be unstable; may not have passed all internal tests |

**Updating Fusion:** The following instructions are for updating local installations of Fusion. dbt platform users automatically get the `latest` updates.
Running the system update command without a version flag installs the `latest` stable release: ```shell dbt system update ``` To install a specific channel or version, pass the `--version` flag: ```shell dbt system update --version canary # Install the canary release dbt system update --version dev # Install the dev release dbt system update --version 2.0.0-preview.126 # Install a specific version ``` ##### Current versions ###### Dev `v2.0.0-preview.155`2026-03-16 ###### Canary `v2.0.0-preview.154`2026-03-13 ###### Latest `v2.0.0-preview.154`2026-03-13 ##### All releases Search versions... Status:All (all) Channel:All (all) Showing 113 of 113 releases v2.0.0-preview.155GoodDev Released by: **github-merge-queue**Mar 16, 2026, 12:20 PM Automated promotion v2.0.0-preview.154GoodDevCanaryLatest Released by: **kylepeirce**Mar 13, 2026, 10:39 PM Bug fix for INC-5687 v2.0.0-preview.153GoodDevCanaryLatest Released by: **kylepeirce**Mar 12, 2026, 10:37 PM Bug fix for INC-5687 v2.0.0-preview.152GoodDev Released by: **github-merge-queue**Mar 12, 2026, 12:13 PM Automated promotion v2.0.0-preview.151GoodDevCanaryLatest Released by: **kylepeirce**Mar 12, 2026, 06:52 PM Planned orchestration promotion v2.0.0-preview.150GoodDev Released by: **kczimm**Mar 10, 2026, 10:53 PM Automated promotion v2.0.0-preview.149GoodDevCanaryLatest Released by: **mikaylacrawford**Mar 10, 2026, 03:29 PM Planned orchestration promotion v2.0.0-preview.148GoodDevCanaryLatestST MondayST WednesdayST Thursday Released by: **fa-assistant**Mar 12, 2026, 10:10 PM Automated ST promotion v2.0.0-preview.147GoodDevCanary Released by: **kczimm**Mar 7, 2026, 02:04 AM Automated promotion v2.0.0-preview.146Known BadDevCanary Released by: **kczimm**Mar 5, 2026, 11:58 PM Automated promotion v2.0.0-preview.145GoodDevCanaryLatest Released by: **venkaa28**Mar 6, 2026, 04:22 AM INC-5500 v2.0.0-preview.144GoodDevCanaryLatest Released by: **peter-bertuglia**Mar 4, 2026, 08:41 PM retrying scheduled promo for vscode ext 
* **v2.0.0-preview.143** · Known Bad · Dev, Canary, Latest, ST Monday · released by **akbog**, Mar 3, 2026, 03:43 PM · inc-5454-minor-table-not-found-in-schema-errors
* **v2.0.0-preview.142** · Good · Dev · released by **github-merge-queue**, Feb 27, 2026, 12:02 PM · Automated promotion
* **v2.0.0-preview.141** · Good · Dev, Canary · released by **jasonlin45**, Feb 26, 2026, 10:31 PM · Automated promotion
* **v2.0.0-preview.139** · Good · Dev · released by **jasonlin45**, Feb 26, 2026, 06:47 PM · Automated promotion
* **v2.0.0-preview.137** · Good · Dev · released by **jasonlin45**, Feb 26, 2026, 01:55 AM · Automated promotion
* **v2.0.0-preview.135** · Good · Dev · released by **chayac**, Feb 25, 2026, 12:45 AM · Automated promotion
* **v2.0.0-preview.134** · Good · Dev · released by **chayac**, Feb 24, 2026, 10:42 PM · Automated promotion
* **v2.0.0-preview.127** · Good · Dev, Canary · released by **github-merge-queue**, Feb 20, 2026, 12:17 PM · Automated promotion
* **v2.0.0-preview.126** · Known Bad · Dev, Canary, Latest · released by **mikaylacrawford**, Feb 20, 2026, 07:38 PM · Planned orchestration promotion
* **v2.0.0-preview.125** · Good · Dev · released by **github-merge-queue\[bot]**, Feb 19, 2026, 06:05 PM · Automated promotion
* **v2.0.0-preview.123** · Good · Dev · released by **github-merge-queue**, Feb 18, 2026, 12:06 PM · Automated promotion
* **v2.0.0-preview.121** · Good · Dev · released by **github-merge-queue**, Feb 16, 2026, 12:09 PM · Automated promotion
* **v2.0.0-preview.120** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **fa-assistant**, Mar 5, 2026, 10:13 PM · Automated ST promotion
* **v2.0.0-preview.119** · Good · Dev, Canary · released by **dataders**, Feb 13, 2026, 06:48 PM · Automated promotion
* **v2.0.0-preview.118** · Good · Dev, Canary · released by **github-merge-queue**, Feb 13, 2026, 12:15 PM · Automated promotion
* **v2.0.0-preview.117** · Good · Dev, Canary · released by **jzhu13**, Feb 13, 2026, 12:45 AM · Automated promotion
* **v2.0.0-preview.116** · Good · Dev · released by **jzhu13**, Feb 12, 2026, 07:36 PM · Automated promotion
* **v2.0.0-preview.114** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **fa-assistant**, Feb 26, 2026, 10:15 PM · Automated ST promotion
* **v2.0.0-preview.110** · Good · Dev, Canary, Latest · released by **peter-bertuglia**, Feb 6, 2026, 07:34 PM · INC-5104
* **v2.0.0-preview.108** · Good · Dev, Canary, Latest · released by **peter-bertuglia**, Feb 5, 2026, 04:42 PM · Planned orchestration promotion
* **v2.0.0-preview.105** · Good · Dev · released by **github-merge-queue**, Jan 30, 2026, 01:12 PM · Automated promotion
* **v2.0.0-preview.104** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **fa-assistant**, Feb 12, 2026, 10:13 PM · Automated ST promotion
* **v2.0.0-preview.103** · Good · Dev, Canary · released by **github-merge-queue**, Jan 27, 2026, 01:09 PM · Automated promotion
* **v2.0.0-preview.102** · Good · Dev, Canary, Latest · released by **ayshukla**, Jan 27, 2026, 08:06 PM · Planned orchestration promotion
* **v2.0.0-preview.101** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **fa-assistant**, Jan 29, 2026, 10:12 PM · Automated ST promotion
* **v2.0.0-preview.100** · Good · Dev, Canary · released by **github-merge-queue**, Jan 21, 2026, 01:42 PM · Automated promotion
* **v2.0.0-preview.99** · Good · Dev, Canary · released by **akbog**, Jan 21, 2026, 07:10 AM · Automated promotion
* **v2.0.0-preview.98** · Known Bad · Dev, Canary · released by **github-merge-queue**, Jan 19, 2026, 01:22 PM · Automated promotion
* **v2.0.0-preview.97** · Good · Dev, Canary · released by **github-merge-queue**, Jan 16, 2026, 01:14 PM · Automated promotion
* **2.0.0-preview.97** · Known Bad
* **v2.0.0-preview.96** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **fa-assistant**, Jan 22, 2026, 10:10 PM · Automated ST promotion
* **v2.0.0-preview.95** · Good · Dev, Canary · released by **jasonlin45**, Jan 15, 2026, 06:16 AM · Automated promotion
* **v2.0.0-preview.94** · Good · Dev, Canary, Latest · released by **mikaylacrawford**, Jan 13, 2026, 07:26 PM · Planned orchestration promotion
* **v2.0.0-preview.93** · Known Bad · Dev, Canary · released by **github-merge-queue**, Jan 9, 2026, 01:24 PM · Automated promotion
* **v2.0.0-preview.92** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **fa-assistant**, Jan 15, 2026, 10:10 PM · Automated ST promotion
* **v2.0.0-preview.91** · Good · Dev, Canary · released by **github-merge-queue**, Dec 22, 2025, 01:09 PM · Automated promotion
* **v2.0.0-preview.90** · Good · Dev, Canary · released by **github-merge-queue**, Dec 19, 2025, 01:04 PM · Automated promotion
* **v2.0.0-preview.89** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **laconc**, Dec 24, 2025, 12:06 AM · st release
* **v2.0.0-preview.88** · Good · Dev, Canary, Latest · released by **ayshukla**, Dec 18, 2025, 04:20 PM · Planned orchestration promotion
* **v2.0.0-preview.87** · Good · Dev, Canary · released by **github-merge-queue\[bot]**, Dec 16, 2025, 01:54 PM · Automated promotion
* **v2.0.0-preview.86** · Good · Dev, Canary, Latest · released by **james-durand-dbt**, Dec 15, 2025, 08:35 PM · INC-4737
* **v2.0.0-preview.85** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **fa-assistant**, Dec 18, 2025, 10:10 PM · Automated ST promotion
* **v2.0.0-preview.84** · Good · Dev, Canary, Latest · released by **james-durand-dbt**, Dec 12, 2025, 06:04 PM · Planned orchestration promotion
* **v2.0.0-preview.83** · Good · Dev, Canary, Latest · released by **mikaylacrawford**, Dec 10, 2025, 04:19 PM · Planned orchestration promotion
* **v2.0.0-preview.82** · Known Bad · Dev, Canary · released by **xuliangs**, Dec 9, 2025, 07:41 PM · Automated promotion
* **v2.0.0-preview.81** · Known Bad · Dev, Canary · released by **ddk-dbt**, Dec 9, 2025, 03:21 PM · Automated promotion
* **v2.0.0-preview.80** · Known Bad · Dev, Canary · released by **venkaa28**, Dec 8, 2025, 07:34 PM · Automated promotion
* **v2.0.0-preview.79** · Good · Dev, Canary · released by **ddk-dbt**, Dec 5, 2025, 03:18 PM · Automated promotion
* **v2.0.0-preview.78** · Known Bad · Dev, Canary · released by **ddk-dbt**, Dec 4, 2025, 03:24 PM · Automated promotion
* **v2.0.0-preview.77** · Known Bad · Dev, Canary · released by **ddk-dbt**, Dec 3, 2025, 10:26 PM · Automated promotion
* **v2.0.0-preview.76** · Known Bad · Dev, Canary, Latest · released by **venkaa28**, Dec 4, 2025, 01:36 AM · preview.77 causing issues for IA. rolling back for incident.
* **v2.0.0-preview.75** · Good · Dev, Canary · released by **jasonlin45**, Nov 22, 2025, 04:09 AM · Automated promotion
* **v2.0.0-preview.74** · Good · Dev · released by **ddk-dbt**, Nov 21, 2025, 03:05 PM · Automated promotion
* **v2.0.0-preview.73** · Good · Dev · released by **ddk-dbt**, Nov 20, 2025, 03:19 PM · Automated promotion
* **v2.0.0-preview.72** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **fa-assistant**, Dec 11, 2025, 10:11 PM · Automated ST promotion
* **v2.0.0-preview.71** · Good · Dev · released by **venkaa28**, Nov 19, 2025, 02:06 AM · Automated promotion
* **v2.0.0-preview.70** · Good · Dev · released by **ddk-dbt**, Nov 17, 2025, 03:03 PM · Automated promotion
* **v2.0.0-preview.69** · Good · Dev · released by **ChenyuLInx**, Nov 15, 2025, 06:57 AM · Automated promotion
* **v2.0.0-preview.68** · Good · Dev, Canary, Latest · released by **dbtlabs007**, Nov 18, 2025, 12:44 AM · Planned orchestration promotion
* **v2.0.0-preview.67** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **laconc**, Nov 20, 2025, 10:47 PM · Thursday ST release, manual run
* **v2.0.0-preview.66** · Good · Dev, Canary · released by **mishamsk**, Nov 12, 2025, 02:09 PM · Automated promotion
* **v2.0.0-preview.65** · Good · Dev, Canary, Latest · released by **ayshukla**, Nov 12, 2025, 10:00 PM · Planned orchestration promotion
* **v2.0.0-preview.63** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **ddk-dbt**, Nov 13, 2025, 11:05 PM · scheduled promotion
* **v2.0.0-preview.62** · Good · Dev, Canary · released by **jzhu13**, Nov 5, 2025, 11:18 PM · Automated promotion
* **v2.0.0-preview.61** · Good · Dev, Canary · released by **ddk-dbt**, Nov 5, 2025, 02:44 PM · Automated promotion
* **v2.0.0-preview.60** · Good · Dev, Canary, Latest · released by **mikaylacrawford**, Nov 5, 2025, 08:15 PM · Planned orchestration promotion
* **v2.0.0-preview.59** · Good · Dev, Canary, Latest · released by **mikaylacrawford**, Nov 5, 2025, 02:50 PM · Planned orchestration promotion
* **v2.0.0-preview.58** · Good · Dev, Canary · released by **ddk-dbt**, Nov 3, 2025, 02:39 PM · Automated promotion
* **v2.0.0-preview.57** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **fa-assistant**, Nov 6, 2025, 10:09 PM · Automated ST promotion
* **v2.0.0-preview.56** · Good · Dev, Canary, Latest · released by **mikaylacrawford**, Oct 31, 2025, 03:24 PM · Planned orchestration promotion
* **v2.0.0-preview.55** · Good · Dev · released by **dataders**, Oct 30, 2025, 07:08 PM · Automated promotion
* **v2.0.0-preview.54** · Good · Dev, Canary · released by **ddk-dbt**, Oct 30, 2025, 03:24 PM · Automated promotion
* **v2.0.0-preview.53** · Good · Dev, Canary, Latest · released by **mikaylacrawford**, Oct 30, 2025, 04:47 PM · Planned orchestration promotion
* **v2.0.0-preview.52** · Good · Dev, Canary · released by **ddk-dbt**, Oct 29, 2025, 02:37 PM · Automated promotion
* **v2.0.0-preview.51** · Good · Dev, Canary · released by **dataders**, Oct 29, 2025, 01:13 AM · Automated promotion
* **v2.0.0-preview.50** · Good · Dev, Canary, Latest · released by **peter-bertuglia**, Oct 28, 2025, 06:46 PM · Planned orchestration promotion
* **v2.0.0-preview.49** · Good · Dev · released by **jzhu13**, Oct 22, 2025, 06:12 PM · Automated promotion
* **v2.0.0-preview.48** · Good · Dev, Canary, Latest, ST Monday · released by **fa-assistant**, Oct 27, 2025, 06:13 PM · Automated promotion
* **v2.0.0-preview.47** · Good · Dev, Canary, Latest · released by **ayshukla**, Oct 22, 2025, 09:27 PM · Planned orchestration promotion
* **v2.0.0-preview.45** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday · released by **johnchappelledbt**, Oct 22, 2025, 10:16 PM · 10-22-25 Wed ST release
* **v2.0.0-preview.44** · Good · Dev, Canary, Latest · released by **dbtlabs007**, Oct 9, 2025, 07:39 PM · inc-3966-informational-releasing-dbt-lsp-hotfix-to-fix-hang-coalesce-thaw
* **v2.0.0-preview.43** · Good · Dev, Canary, Latest · released by **ayshukla**, Oct 9, 2025, 03:56 PM · fs deployment freeze promotion
* **v2.0.0-preview.42** · Good · Dev, Canary · released by **github-merge-queue**, Oct 8, 2025, 02:26 PM · Automated promotion
* **v2.0.0-preview.41** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **ddk-dbt**, Oct 8, 2025, 09:45 PM · pre-coalesce sync
* **v2.0.0-preview.40** · Good · Dev, Canary · released by **davidharting**, Oct 7, 2025, 09:56 PM · Automated promotion
* **v2.0.0-preview.39** · Good · Dev, Canary · released by **github-merge-queue**, Oct 7, 2025, 02:33 PM · Automated promotion
* **v2.0.0-preview.38** · Good · Dev, Canary, Latest · released by **ayshukla**, Oct 7, 2025, 04:19 PM · orc release
* **v2.0.0-preview.37** · Good · Dev, Canary · released by **github-merge-queue\[bot]**, Oct 6, 2025, 02:40 PM · Automated promotion
* **v2.0.0-preview.36** · Known Bad · Dev, Canary · released by **github-merge-queue**, Oct 3, 2025, 02:21 PM · Automated promotion
* **v2.0.0-preview.35** · Good · Dev, Canary, Latest, ST Monday · released by **fa-assistant**, Oct 6, 2025, 06:13 PM · Automated promotion
* **v2.0.0-preview.34** · Good · Dev, Canary, Latest · released by **ayshukla**, Oct 3, 2025, 03:58 PM · Planned orchestration promotion
* **v2.0.0-preview.33** · Good · Dev, Canary, Latest · released by **ayshukla**, Oct 2, 2025, 06:23 PM · Planned orchestration promotion
* **v2.0.0-preview.32** · Good · Dev, Canary, Latest · released by **mikaylacrawford**, Oct 1, 2025, 03:21 PM · Planned orchestration promotion
* **v2.0.0-preview.31** · Good · Dev, Canary · released by **github-merge-queue\[bot]**, Sep 30, 2025, 02:44 PM · Automated promotion
* **v2.0.0-preview.30** · Good · Dev, Canary, Latest · released by **mikaylacrawford**, Sep 30, 2025, 04:01 PM · Planned orchestration promotion
* **v2.0.0-preview.29** · Good · Dev, Canary, Latest, ST Monday, ST Wednesday, ST Thursday · released by **ddk-dbt**, Oct 3, 2025, 03:26 PM · Automated promotion
* **v2.0.0-preview.28** · Good · Dev · Sep 25, 2025, 02:08 PM
* **v2.0.0-preview.27** · Known Bad
* **v2.0.0-preview.26** · Good · Canary · Sep 25, 2025, 03:16 PM
* **v2.0.0-preview.25** · Good · ST Monday, ST Wednesday, Latest, ST Thursday · Sep 25, 2025, 10:03 PM
* **v2.0.0-preview.23** · Known Bad

---

### Get started with Fusion

##### New to dbt? Start here[​](#new-to-dbt-start-here "Direct link to New to dbt?
Start here")

Once you've caught up on everything [Fusion has to offer](https://docs.getdbt.com/docs/fusion/about-fusion.md), begin your journey with the most powerful data transformation engine available!

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

###### Install Fusion + VS Code extension

Learn how to install and configure our most robust set of tools for local development.

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

###### Install Fusion CLI only

Learn how to install and configure the dbt Fusion engine from your command line.

[![](/img/icons/dbt-bit.svg)](https://www.getdbt.com/signup)

###### [Sign up for the dbt platform](https://www.getdbt.com/signup)

[Create a cloud-based dbt platform account to unlock the full potential of the Fusion engine.](https://www.getdbt.com/signup)

##### Already using dbt? Start here[​](#already-using-dbt-start-here "Direct link to Already using dbt? Start here")

Upgrade your existing projects to the dbt Fusion engine and learn about the tools available to you once you're there!
[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/install-dbt-extension.md) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

###### dbt VS Code extension

Add the dbt VS Code extension to your existing development workflows for both the dbt platform and the CLI.

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/guides/prepare-fusion-upgrade.md) [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

###### Guide: Prepare to upgrade to Fusion

Step-by-step guide to prepare your dbt platform projects for upgrading to Fusion.

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/guides/upgrade-to-fusion.md) [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

###### Guide: Upgrade to Fusion

Learn how to upgrade your eligible projects on the dbt platform to the Fusion engine with this comprehensive guide.
---

### Networking requirements [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

Fusion requires outbound HTTPS access to several endpoints, depending on your usage. This page describes each requirement and provides guidance for enterprise environments that restrict outbound traffic.

The following table summarizes all endpoints. See each section below for details.

| Resource                                  | URL                                            | Required for                |
| ----------------------------------------- | ---------------------------------------------- | --------------------------- |
| [Adapter drivers](#adapter-drivers)       | `https://public.cdn.getdbt.com`                | All users                   |
| [Telemetry](#telemetry)                   | `https://p.vx.dbt.com`                         | All users (can be disabled) |
| [Manifest downloads](#manifest-downloads) | Cloud provider storage URLs (varies by region) | dbt platform users only     |

#### Adapter drivers[​](#adapter-drivers "Direct link to Adapter drivers")

The Fusion binary does *not* bundle database drivers. Instead, Fusion automatically downloads the correct [ADBC](https://arrow.apache.org/adbc/) driver for your data platform the first time you run a dbt command (such as `dbt run`, `dbt debug`, or `dbt compile`). Fusion detects which driver you need based on your `profiles.yml` configuration and downloads it from the dbt Labs CDN. Fusion distributes all checksums with the binary itself to guarantee the authenticity of the downloaded drivers.
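Checksum-pinned downloads of this kind follow a common pattern: hash the fetched file and compare against a value shipped out of band. The following Python sketch illustrates the general idea; the file name and checksum handling here are illustrative, not Fusion's actual internals:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, streaming in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_driver(path: Path, expected: str) -> bool:
    """Return True only if the downloaded file matches the pinned checksum."""
    return sha256_of(path) == expected.lower()
```

Because the expected digests travel with the binary rather than with the download, a tampered or corrupted driver fails verification even if the CDN response itself looks well-formed.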
Adapter driver downloads require outbound HTTPS access to the dbt CDN:

| Resource            | URL                             | Purpose                                                                                                                     |
| ------------------- | ------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| **Adapter drivers** | `https://public.cdn.getdbt.com` | Downloads ADBC adapter driver libraries (`.dylib`, `.so`, `.dll`) on first use or when running `dbt system install-drivers` |

info

Fusion handles driver download automatically on first use. The `dbt system install-drivers` command downloads **all** supported drivers (Snowflake, BigQuery, Postgres, Databricks, Redshift, DuckDB, and Salesforce) at once. This is useful if you work across multiple data platforms and want to pre-cache every driver before going offline or switching projects.

##### Enterprise proxy considerations[​](#enterprise-proxy-considerations "Direct link to Enterprise proxy considerations")

Adapter drivers are native shared libraries (`.dylib` on macOS, `.so` on Linux, `.dll` on Windows). Some enterprise proxy filters and security tools classify these file types as executables and may block the download, even if you allowlist `public.cdn.getdbt.com` at the domain level.

If your organization's proxy blocks adapter driver downloads, work with your IT team to ensure both:

1. The domain `public.cdn.getdbt.com` is allowlisted.
2. Content inspection rules permit downloading native library file types (`.dylib`, `.so`, `.dll`) from that domain.

If you cannot change your proxy configuration, see [Restricted network installation](#restricted-network-installation).
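When validating a proxy allowlist, it often helps to first confirm that a TLS connection to the required endpoints succeeds at all, before debugging content-inspection rules. A minimal Python sketch of such a reachability probe (a generic diagnostic, not a dbt tool):

```python
import socket
import ssl

def https_reachable(host: str, timeout: float = 5.0) -> bool:
    """Try to open a TLS connection to host:443; True on success."""
    try:
        with socket.create_connection((host, 443), timeout=timeout) as sock:
            ctx = ssl.create_default_context()
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        return False

# Endpoints Fusion needs, per the table above
for host in ("public.cdn.getdbt.com", "p.vx.dbt.com"):
    print(host, https_reachable(host))
```

Note that a successful TLS handshake only proves the domain is reachable; a proxy can still block the driver payload itself based on file type, which is why step 2 above matters.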
##### Restricted network installation[​](#restricted-network-installation "Direct link to Restricted network installation")

If your environment cannot access `public.cdn.getdbt.com` for adapter driver downloads, you can pre-build a bundle of the Fusion binary and the adapter drivers as a single `.tar.gz` archive or Docker image and host it on an internally approved fileshare. For supported adapters, refer to [Fusion requirements](https://docs.getdbt.com/docs/fusion/supported-features.md#requirements).

#### Telemetry[​](#telemetry "Direct link to Telemetry")

Fusion sends anonymous usage [telemetry](https://docs.getdbt.com/docs/fusion/telemetry.md) to help improve the product. If the telemetry endpoint is unreachable (for example, blocked by a firewall or proxy), Fusion logs errors on each invocation.

| Resource      | URL                    | Purpose                          |
| ------------- | ---------------------- | -------------------------------- |
| **Telemetry** | `https://p.vx.dbt.com` | Sends anonymous usage statistics |

To suppress these errors without allowlisting the URL, disable anonymous telemetry by setting the environment variable:

```shell
export DBT_SEND_ANONYMOUS_USAGE_STATS=false
```

You can also add this to the `.env` file in your project root:

```env
DBT_SEND_ANONYMOUS_USAGE_STATS=false
```

For more details on `.env` file usage, refer to [Environment variables](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#environment-variables).
#### Manifest downloads (dbt platform only) Enterprise[​](#manifest-downloads "Direct link to manifest-downloads")

For [dbt platform](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) customers using Fusion locally, Fusion downloads production manifests from the dbt platform to enable features like [deferral](https://docs.getdbt.com/reference/node-selection/defer.md) and [cross-project references](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md). The [cloud storage provider](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) hosting your dbt platform cell serves these manifests via **pre-signed URLs**. The specific hostnames depend on your dbt platform deployment region and the underlying cloud provider.

To ensure Fusion can download manifests, allowlist the appropriate storage domain for your region:

| Cloud provider           | URL pattern                               | Example                                         |
| ------------------------ | ----------------------------------------- | ----------------------------------------------- |
| **AWS (S3)**             | `https://s3.<region>.amazonaws.com`       | `https://s3.ap-northeast-1.amazonaws.com` (JP1) |
| **Azure (Blob Storage)** | `https://<account>.blob.core.windows.net` | `https://prodeu2.blob.core.windows.net` (EU2)   |
| **GCP (Cloud Storage)**  | `https://storage.googleapis.com`          | `storage.googleapis.com`                        |

Because pre-signed URLs contain region- and account-specific hostnames that may change over time, we recommend allowlisting the **base storage domain** for your cloud provider rather than individual URLs:

* **AWS** — `s3.*.amazonaws.com`
* **Azure** — `*.blob.core.windows.net`
* **GCP** — `storage.googleapis.com`
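Since the recommendation is wildcard domains rather than exact URLs, an allowlist check reduces to hostname globbing. A small Python sketch of such a check (a hypothetical helper, using the patterns listed above):

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

# Base storage domains recommended above
ALLOWED_PATTERNS = [
    "s3.*.amazonaws.com",        # AWS S3, any region
    "*.blob.core.windows.net",   # Azure Blob Storage, any account
    "storage.googleapis.com",    # GCP Cloud Storage
]

def is_allowlisted(url: str) -> bool:
    """Check a pre-signed URL's hostname against the wildcard allowlist."""
    host = urlparse(url).hostname or ""
    return any(fnmatch(host, pattern) for pattern in ALLOWED_PATTERNS)
```

Matching on the parsed hostname, rather than the full URL, is the point: the pre-signed query string changes on every request, but the storage domain stays within these patterns.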
---

### Redshift setup [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

You can configure the Redshift adapter by running `dbt init` in your CLI or by manually providing a `profiles.yml` file with the fields configured for your authentication type.

The Redshift adapter for Fusion supports the following [authentication methods](#supported-authentication-types):

* Password
* IAM profile

#### Configure Fusion[​](#configure-fusion "Direct link to Configure Fusion")

Executing `dbt init` in your CLI prompts for the following fields:

* **Host:** The hostname of your Redshift cluster
* **User:** Username of the account that connects to the database
* **Database:** The database name
* **Schema:** The schema name
* **Port (default: 5439):** Port for your Redshift environment

Alternatively, you can manually create the `profiles.yml` file and configure the fields. See the examples in the [authentication](#supported-authentication-types) section for formatting. If there is an existing `profiles.yml` file, you are given the option to retain the existing fields or overwrite them.

Next, select your authentication method. Follow the on-screen prompts to provide the required information.

#### Supported authentication types[​](#supported-authentication-types "Direct link to Supported authentication types")

* Password
* IAM profile

Use your Redshift user's password to authenticate. You can also enter it manually, in plain text, in the `profiles.yml` file.
###### Example password configuration[​](#example-password-configuration "Direct link to Example password configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: redshift
      port: 5439
      database: JAFFLE_SHOP
      schema: JAFFLE_TEST
      ra3_node: true
      method: database
      host: ABC123.COM
      user: JANE.SMITH@YOURCOMPANY.COM
      password: ABC123
      threads: 16
```

Specify the IAM profile to use to connect your Fusion sessions. You will need to provide the following information:

* **IAM Profile:** The profile name
* **Cluster ID:** The unique identifier for your AWS cluster
* **Region:** Your AWS region (for example, us-east-1)
* **Use RA3 node type (y/n):** Use the high-performance AWS RA3 node type

###### Example IAM profile configuration[​](#example-password-configuration-1 "Direct link to Example IAM profile configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: redshift
      port: 5439
      database: JAFFLE_SHOP
      schema: JAFFLE_TEST
      ra3_node: false
      method: iam
      host: YOURHOSTNAME.COM
      user: JANE.SMITH@YOURCOMPANY.COM
      iam_profile: YOUR_PROFILE_NAME
      cluster_id: ABC123
      region: us-east-1
      threads: 16
```

#### More information[​](#more-information "Direct link to More information")

Find Redshift-specific configuration information in the [Redshift adapter reference guide](https://docs.getdbt.com/reference/resource-configs/redshift-configs.md).

---

### Salesforce Data 360 setup [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

The `dbt-salesforce` adapter is available via the dbt Fusion engine CLI.
To access the adapter, [install dbt Fusion](https://docs.getdbt.com/docs/fusion/about-fusion-install.md). We recommend using the [VS Code extension](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) as the development interface. dbt platform support is coming soon.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

Before you can connect dbt to Salesforce Data 360, you need the following:

* A Data 360 instance
* [An external client app that dbt connects to for the Data 360 instance](https://help.salesforce.com/s/articleView?id=xcloud.create_a_local_external_client_app.htm\&type=5), with [OAuth configured](https://help.salesforce.com/s/articleView?id=xcloud.configure_external_client_app_oauth_settings.htm\&type=5). OAuth scopes must include:
  * `api` - To manage user data via APIs.
  * `refresh_token`, `offline_access` - To perform requests at any time, even when the user is offline or tokens have expired.
  * `cdp_query_api` - To execute ANSI SQL queries on Data 360 data.
* [A private key and the `server.key` file](https://developer.salesforce.com/docs/atlas.en-us.252.0.sfdx_dev.meta/sfdx_dev/sfdx_dev_auth_key_and_cert.htm)
* A user with the `Data Cloud Architect` permission

#### Configure Fusion[​](#configure-fusion "Direct link to Configure Fusion")

To connect dbt to Salesforce Data 360, set up your `profiles.yml`. Refer to the following configuration:

\~/.dbt/profiles.yml

```yaml
company-name:
  target: dev
  outputs:
    dev:
      type: salesforce
      method: jwt_bearer
      client_id: [Consumer Key of your Data 360 app]
      private_key_path: [local file path of your server key]
      login_url: "https://login.salesforce.com"
      username: [username on the Data 360 Instance]
```

| Profile field      | Required | Description                                                       | Example                                                       |
| ------------------ | -------- | ----------------------------------------------------------------- | ------------------------------------------------------------- |
| `method`           | Yes      | Authentication method. Currently, only `jwt_bearer` is supported. | `jwt_bearer`                                                  |
| `client_id`        | Yes      | The `Consumer Key` from your connected app secrets.               |                                                               |
| `private_key_path` | Yes      | File path of the `server.key` file on your computer.              | `/Users/dbt_user/Documents/server.key`                        |
| `login_url`        | Yes      | Login URL of the Salesforce instance.                             | [https://login.salesforce.com](https://login.salesforce.com/) |
| `username`         | Yes      | Username on the Data 360 instance.                                |                                                               |

#### More information[​](#more-information "Direct link to More information")

Find Salesforce-specific configuration information in the [Salesforce adapter reference guide](https://docs.getdbt.com/reference/resource-configs/data-cloud-configs.md).

---

### Snowflake setup [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

You can configure the Snowflake adapter by running `dbt init` in your CLI or by manually providing a `profiles.yml` file with the fields configured for your authentication type.

The Snowflake adapter for Fusion supports the following [authentication methods](#supported-authentication-types):

* Password
* Key pair
* Single sign-on (SSO)
* Password with MFA

note

[Snowflake is deprecating single-factor password login](https://docs.snowflake.com/en/user-guide/security-mfa-rollout). Individual developers should use MFA or SSO instead of password authentication. Password-based login remains supported for service users (Snowflake user type: `LEGACY_SERVICE`).
#### Snowflake configuration details[​](#snowflake-configuration-details "Direct link to Snowflake configuration details")

The information required to configure the Snowflake adapter is conveniently available in your Snowflake account menu:

1. Click your name in the Snowflake sidebar.
2. Hover over the **Account** field.
3. In the field with your account name, click **View account details**.
4. Click **Config file** and select the appropriate **Warehouse** and **Database**.

[![Sample config file in Snowflake.](/img/fusion/connect-adapters/snowflake-account-details.png?v=2 "Sample config file in Snowflake.")](#)Sample config file in Snowflake.

#### Configure Fusion[​](#configure-fusion "Direct link to Configure Fusion")

Executing `dbt init` in your CLI prompts for the following fields:

* **Account:** Snowflake account number
* **User:** Your Snowflake username
* **Database:** The database within your Snowflake account to connect to your project
* **Warehouse:** The compute warehouse that handles the tasks for your project
* **Schema:** The development/staging/deployment schema for the project
* **Role (optional):** The role dbt should assume when connecting to the warehouse

Alternatively, you can manually create the `profiles.yml` file and configure the fields. See the examples in the [authentication](#supported-authentication-types) section for formatting. If there is an existing `profiles.yml` file, you are given the option to retain the existing fields or overwrite them.

Next, select your authentication method. Follow the on-screen prompts to provide the required information.

#### Supported authentication types[​](#supported-authentication-types "Direct link to Supported authentication types")

* Password
* Key pair
* Single sign-on

Password authentication prompts for your Snowflake account password. This option is becoming less common as organizations adopt more secure authentication methods.
Selecting **Password with MFA** redirects you to the Snowflake account login to provide your passkey or authenticator password.

###### Example password configuration[​](#example-password-configuration "Direct link to Example password configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: snowflake
      threads: 16
      account: ABC123
      user: JANE.SMITH@YOURCOMPANY.COM
      database: JAFFLE_SHOP
      warehouse: TRANSFORM
      schema: JANE_SMITH
      password: THISISMYPASSWORD
```

###### Example password with MFA configuration[​](#example-password-with-mfa-configuration "Direct link to Example password with MFA configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: snowflake
      threads: 16
      authenticator: username_password_mfa
      account: ABC123
      user: JANE.SMITH@YOURCOMPANY.COM
      database: JAFFLE_SHOP
      warehouse: TRANSFORM
      schema: JANE_SMITH
```

Key pair authentication gives you the option to:

* Define the path to the key.
* Provide the plain-text PEM-format key inline.

We recommend using PKCS#8 format with AES-256 encryption for key pair authentication with Fusion. Fusion doesn't support legacy 3DES encryption or headerless key formats. Using older key formats may cause authentication failures.

If you encounter the `Key is PKCS#1 (RSA private key). Snowflake requires PKCS#8` error, your private key is in the wrong format. You have two options:

* (Recommended fix) Re-export your key with modern encryption:

```bash
# Convert an existing PEM key to PKCS#8 with AES-256 encryption
openssl pkcs8 -topk8 -v2 aes-256-cbc -inform PEM -in rsa_key.pem -out rsa_key.p8
```

* (Temporary workaround) Add the `BEGIN` header and `END` footer to your PEM body:

```text
-----BEGIN ENCRYPTED PRIVATE KEY-----
< Your existing encrypted private key contents >
-----END ENCRYPTED PRIVATE KEY-----
```

Once the key is configured, you are given the option to provide a passphrase, if required.
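If you're unsure which format a key file is in, the PEM header line is enough to tell PKCS#1 from PKCS#8. A quick diagnostic sketch in Python (an illustrative helper, not part of dbt or Snowflake tooling):

```python
def pem_key_format(pem_text: str) -> str:
    """Classify a PEM private key by its header line."""
    if "-----BEGIN RSA PRIVATE KEY-----" in pem_text:
        return "PKCS#1 (not accepted; convert to PKCS#8)"
    if "-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text:
        return "PKCS#8, encrypted (recommended)"
    if "-----BEGIN PRIVATE KEY-----" in pem_text:
        return "PKCS#8, unencrypted"
    return "unknown or headerless (add the PEM header and footer)"
```

For example, `pem_key_format(open("rsa_key.p8").read())` on a correctly converted key reports the encrypted PKCS#8 case.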
###### Example key pair configuration[​](#example-key-pair-configuration "Direct link to Example key pair configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: snowflake
      threads: 16
      account: ABC123
      user: JANE.SMITH@YOURCOMPANY.COM
      database: JAFFLE_SHOP
      warehouse: TRANSFORM
      schema: JANE_SMITH
      private_key: ''
      private_key_passphrase: YOURPASSPHRASEHERE
```

Single sign-on uses your browser to authenticate the Snowflake session. By default, every connection that dbt opens requires you to re-authenticate in a browser. The Snowflake connector package supports caching your session token, but it [currently only supports Windows and macOS](https://docs.snowflake.com/en/user-guide/admin-security-fed-auth-use.html#optional-using-connection-caching-to-minimize-the-number-of-prompts-for-authentication). Refer to the [Snowflake docs](https://docs.snowflake.com/en/sql-reference/parameters.html#label-allow-id-token) for information on enabling this feature in your account.

###### Example SSO configuration[​](#example-sso-configuration "Direct link to Example SSO configuration")

profiles.yml

```yml
default:
  target: dev
  outputs:
    dev:
      type: snowflake
      threads: 16
      authenticator: externalbrowser
      account: ABC123
      user: JANE.SMITH@YOURCOMPANY.COM
      database: JAFFLE_SHOP
      warehouse: TRANSFORM
      schema: JANE_SMITH
```

#### More information[​](#more-information "Direct link to More information")

Find Snowflake-specific configuration information in the [Snowflake adapter reference guide](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md).
---

### Supported features

Learn about the features supported by the dbt Fusion engine, including requirements and limitations.

#### Requirements[​](#requirements "Direct link to Requirements")

To use Fusion in your dbt project you must:

* Use a supported adapter and authentication method:
  * BigQuery
    * Service Account / User Token
    * Native OAuth
    * External OAuth
    * [Required permissions](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#required-permissions)
  * Databricks
    * Service Account / User Token
    * Native OAuth
  * Redshift
    * Username / Password
    * IAM profile
  * Snowflake
    * Username / Password
    * Native OAuth
    * External OAuth
    * Key pair using a modern PKCS#8 method
    * MFA
* Be able to run your project on the latest version of dbt Core with no deprecation warnings or errors.
* Migrate your Semantic Layer configurations to the [latest YAML spec](https://docs.getdbt.com/docs/build/latest-metrics-spec.md).

#### Parity with dbt Core[​](#parity-with-dbt-core "Direct link to Parity with dbt Core")

Our goal is for the dbt Fusion engine to support all capabilities of the dbt Core framework, and then some. Fusion already supports many of the capabilities in dbt Core v1.9, and we're working fast to add more. Note that we have removed some deprecated features and introduced more rigorous validation of erroneous project code. Refer to the [Upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md) for details.

#### Features and capabilities[​](#features-and-capabilities "Direct link to Features and capabilities")

The dbt Fusion engine (built on Rust) gives your team up to 30x faster performance and comes with different features depending on where you use it.

* It powers both *engine-level* improvements (like faster compilation and incremental builds) and *editor-level* features (like IntelliSense, hover info, and inline errors) through the LSP in the dbt VS Code extension.
* To learn about the LSP features supported across the dbt platform, refer to [About dbt LSP](https://docs.getdbt.com/docs/about-dbt-lsp.md).
* To stay up-to-date on the latest features and capabilities, check out the [Fusion diaries](https://github.com/dbt-labs/dbt-fusion/discussions).

dbt Core (built on Python) supports SQL rendering but lacks SQL parsing and the modern editor features powered by the dbt Fusion engine and the LSP.

tip

dbt platform customers using Fusion can [develop across multiple development surfaces](https://docs.getdbt.com/docs/fusion/fusion-availability.md), including Studio IDE and VS Code with the dbt extension. dbt platform [features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) (like [Advanced CI](https://docs.getdbt.com/docs/deploy/advanced-ci.md), [dbt Mesh](https://docs.getdbt.com/docs/mesh/about-mesh.md), [State-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md), and more) are available regardless of which surface you use, depending on your [dbt plan](https://www.getdbt.com/pricing).

If you're not sure what features are available in Fusion, the dbt VS Code extension, Fusion-CLI, or more, the following table focuses on Fusion-powered options. In this table, self-hosted means it's open-source/source-available and runs on your own infrastructure; dbt platform is hosted by dbt Labs and includes platform-level features.

> ✅ = Available | 🟡 = Partial/at compile-time only | ❌ = Not available | Coming soon = Not yet available

| **Category/Capability** | **Fusion CLI** (self-hosted) | **Fusion + VS Code extension** (self-hosted) | **dbt platform + VS Code extension**1 | **dbt platform + Studio IDE + Other dev surfaces**2 | **Requires [static analysis](https://docs.getdbt.com/docs/fusion/new-concepts.md#principles-of-static-analysis)** |
| --- | --- | --- | --- | --- | --- |
| **Engine performance** | | | | | |
| SQL rendering | ✅ | ✅ | ✅ | ✅ | ❌ |
| SQL parsing and compilation (SQL understanding) | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Editor and dev experience** | | | | | |
| IntelliSense/autocomplete/hover info | ❌ | ✅ | ✅ | ✅ | ✅ |
| Inline errors (on save/in editor) | 🟡 | ✅ | ✅ | ✅ | ✅ |
| Live CTE previews/compiled SQL view | ❌ | ✅ | ✅ | ✅ | 🟡 (Live CTE previews only) |
| Refactoring tools (rename model/column) | ❌ | ✅ | ✅ | Coming soon | 🟡 (Column refactor only) |
| Go-to definition/references/macro | ❌ | ✅ | ✅ | ✅ | 🟡 (Column go-to definition only) |
| Column-level lineage (in editor) | ❌ | ✅ | ✅ | Coming soon | ✅ |
| Developer compare changes | ❌ | ❌ | Coming soon | Coming soon | ❌ |
| **Platform and governance** | | | | | |
| Advanced CI compare changes | ❌ | ❌ | ✅ | ✅ | ❌ |
| dbt Mesh | ❌ | ❌ | ✅ | ✅ | ❌ |
| Efficient testing | ❌ | ❌ | ✅ | ✅ | ✅ |
| State-aware orchestration (SAO) | ❌ | ❌ | ✅ | ✅ | ❌ |
| Governance (PII/PHI tracking) | ❌ | ❌ | Coming soon | Coming soon | ✅ |
| CI/CD cost optimization (Slimmer CI) | ❌ | ❌ | Coming soon | Coming soon | ✅ |

1 Support for other dbt platform and LSP features, like Column-level lineage, is coming soon. See [About LSP](https://docs.getdbt.com/docs/about-dbt-lsp.md) for a more detailed comparison of dbt development environments.
2 The [dbt VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md) is usable in VS Code, Cursor, Windsurf, and other VS Code–based editors.

###### Additional considerations[​](#additional-considerations "Direct link to Additional considerations")

Here are some additional considerations when using the Fusion CLI without the VS Code extension, or the VS Code extension without the Fusion CLI:

* **Fusion CLI** ([binary](https://docs.getdbt.com/blog/dbt-fusion-engine-components))
  * Free to use and runs on the dbt Fusion engine (distinct from dbt Core).
  * Benefits from the Fusion engine's performance for `parse`, `compile`, `build`, and `run`, but *doesn't* include LSP [features](https://docs.getdbt.com/docs/dbt-extension-features.md) like autocomplete, hover insights, lineage, and more.
  * Requires `profiles.yml` only (no `dbt_cloud.yml`).
* **dbt VS Code extension**
  * Free to use and runs on the dbt Fusion engine; register your email within 14 days.
  * Benefits from the Fusion engine's performance for `parse`, `compile`, `build`, and `run`, and includes LSP [features](https://docs.getdbt.com/docs/dbt-extension-features.md) like autocomplete, hover insights, lineage, and more.
  * Capped at 15 users per organization. See the [acceptable use policy](https://www.getdbt.com/dbt-assets/vscode-plugin-aup) for more information.
  * If you already have a dbt platform user account (even if a trial expired), sign in with the same email. Unlock or reset it if locked.
  * Requires both `profiles.yml` and `dbt_cloud.yml` files.

#### Limitations[​](#limitations "Direct link to Limitations")

If your project uses any of the features listed in the following table, you can use Fusion, but you won't be able to fully migrate all your workloads because:

* Models that leverage specific materialization features may be unable to run or may be missing some desirable configurations.
* Tooling that expects dbt Core's exact log output may not work as expected.
  Fusion's logging system is currently unstable and incomplete.
* Workflows built around complementary features of the dbt platform (like model-level notifications) that Fusion does not yet support.
* When using the dbt VS Code extension in Cursor, lineage visualization works best in Editor mode and doesn't render in Agent mode. If you're working in Agent mode and need to view lineage, switch to Editor mode to access the full lineage tab functionality.

note

We have been moving quickly to implement many of these features ahead of General Availability. Read more about [the path to GA](https://docs.getdbt.com/blog/dbt-fusion-engine-path-to-ga), and track our progress in the [`dbt-fusion` milestones](https://github.com/dbt-labs/dbt-fusion/milestones).

| Feature | This will affect you if... | GitHub issue |
| --- | --- | --- |
| [--warn-error, --warn-error-options](https://docs.getdbt.com/reference/global-configs/warnings.md) | You are upgrading all/specific warnings to errors, or silencing specific warnings, by configuring the warning event names. Fusion's logging system is incomplete and unstable, so specific event names are likely to change. | [dbt-fusion#8](https://github.com/dbt-labs/dbt-fusion/issues/8) |
| Iceberg support (BigQuery) | You have configured models to be materialized as Iceberg tables, or you are defining `catalogs` in your BigQuery project to configure the external write location of Iceberg models. Fusion doesn't support these model configurations for BigQuery. | [dbt-fusion#947](https://github.com/dbt-labs/dbt-fusion/issues/947) |
| [Model-level notifications](https://docs.getdbt.com/docs/deploy/model-notifications.md) | You are leveraging the dbt platform's capabilities for model-level notifications in your workflows. Fusion currently supports job-level notifications. | [dbt-fusion#1103](https://github.com/dbt-labs/dbt-fusion/issues/1103) |
| [dbt-docs documentation site](https://docs.getdbt.com/docs/build/view-documentation.md#dbt-docs) and ["docs generate/serve" commands](https://docs.getdbt.com/reference/commands/cmd-docs.md) | Fusion does not yet support a local experience for generating, hosting, and viewing documentation, as dbt Core does via dbt-docs (static HTML site). We intend to support such an experience by GA. If you need to generate and host local documentation, you should continue generating the catalog by running `dbt docs generate` with dbt Core. | [dbt-fusion#9](https://github.com/dbt-labs/dbt-fusion/issues/9) |
| [Programmatic invocations](https://docs.getdbt.com/reference/programmatic-invocations.md) | You use dbt Core's Python API for triggering invocations and registering callbacks on events/logs. Note that Fusion's logging system is incomplete and unstable. | [dbt-fusion#10](https://github.com/dbt-labs/dbt-fusion/issues/10) |
| [Linting via SQLFluff](https://docs.getdbt.com/docs/deploy/continuous-integration.md#to-configure-sqlfluff-linting) | You use SQLFluff for linting in your development or CI workflows. Eventually, we plan to build linting support into Fusion directly, since the engine has SQL comprehension capabilities. In the meantime, you can continue using the dbt Core + SQLFluff integration. dbt Cloud will do exactly this in the Cloud IDE / Studio + CI jobs. | [dbt-fusion#11](https://github.com/dbt-labs/dbt-fusion/issues/11) |
| [`{{ graph }}`](https://docs.getdbt.com/reference/dbt-jinja-functions/graph.md) - `raw_sql` attribute (for example, specific models in [dbt\_project\_evaluator](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/)) | You access the `raw_sql` / `raw_code` attribute of the `{{ graph }}` context variable, which Fusion stubs with an empty value at runtime. If you access this attribute, your code will not fail, but it will return different results. This is used in three quality checks within the [`dbt_project_evaluator` package](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/). We intend to find a more performant mechanism for Fusion to provide this information in the future. | Coming soon |

#### Package support[​](#package-support "Direct link to Package support")

To determine if a package is compatible with the dbt Fusion engine, visit the [dbt package hub](https://hub.getdbt.com/) and look for the Fusion-compatible badge, or review the package's [`require-dbt-version` configuration](https://docs.getdbt.com/reference/project-configs/require-dbt-version.md#pin-to-a-range).

* Packages with a `require-dbt-version` range that includes `2.0.0` are compatible with Fusion. For example, `require-dbt-version: ">=1.10.0,<3.0.0"`. Even if a package doesn't reflect compatibility in the package hub, it may still work with Fusion. Work with package maintainers to track updates, and [thoroughly test packages](https://docs.getdbt.com/guides/fusion-package-compat?step=5) that aren't clearly compatible before deploying.
* Package maintainers who would like to make their package compatible with Fusion can refer to the [Fusion package upgrade guide](https://docs.getdbt.com/guides/fusion-package-compat.md) for instructions.

Fivetran package considerations:

* The Fivetran `source` and `transformation` packages have been combined into a single package.
* If you manually installed source packages like `fivetran/github_source`, you need to ensure `fivetran/github` is installed and deactivate the transformation models.

###### Package compatibility messages[​](#package-compatibility-messages "Direct link to Package compatibility messages")

Inconsistent Fusion warnings and `dbt-autofix` logs

If you use [`dbt-autofix`](https://github.com/dbt-labs/dbt-autofix) while upgrading to Fusion in the Studio IDE or dbt VS Code extension, you may see different messages about package compatibility between `dbt-autofix` logs and Fusion warnings. Here's why:

* Fusion warnings are emitted based on a package's `require-dbt-version` and whether `require-dbt-version` contains `2.0.0`.
* Some packages are already Fusion-compatible even though package maintainers haven't yet updated `require-dbt-version`.
* `dbt-autofix` knows about these compatible packages and will not try to upgrade a package that it knows is already compatible.

This means that even if you see a Fusion warning for a package that `dbt-autofix` identifies as compatible, you don't need to change the package. The message discrepancy is temporary while we implement and roll out `dbt-autofix`'s enhanced compatibility detection to Fusion warnings.

Here's an example of a Fusion warning in the Studio IDE that says a package isn't compatible with Fusion even though `dbt-autofix` indicates it is compatible:

```text
dbt1065: Package 'dbt_utils' requires dbt version [>=1.30,<2.0.0], but current version is 2.0.0-preview.72.
This package may not be compatible with your dbt version. dbt(1065) [Ln 1, Col 1]
```

#### More information about Fusion[​](#more-information-about-fusion "Direct link to More information about Fusion")

Fusion marks a significant update to dbt. While many of the workflows you've grown accustomed to remain unchanged, there are a lot of new ideas, and a lot of old ones going away. The following is a list of the full scope of our current release of the Fusion engine, including implementation, installation, deprecations, and limitations:

* [About the dbt Fusion engine](https://docs.getdbt.com/docs/fusion/about-fusion.md)
* [About the dbt extension](https://docs.getdbt.com/docs/about-dbt-extension.md)
* [New concepts in Fusion](https://docs.getdbt.com/docs/fusion/new-concepts.md)
* [Supported features matrix](https://docs.getdbt.com/docs/fusion/supported-features.md)
* [Installing Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)
* [Installing VS Code extension](https://docs.getdbt.com/docs/install-dbt-extension.md)
* [Fusion release track](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine)
* [Quickstart for Fusion](https://docs.getdbt.com/guides/fusion.md?step=1)
* [Upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md)
* [Fusion licensing](http://www.getdbt.com/licenses-faq)

---

### Telemetry and observability

The dbt Fusion engine provides a comprehensive telemetry system that replaces [dbt Core's structured logging](https://docs.getdbt.com/reference/events-logging.md#structured-logging).
Built on [OpenTelemetry](https://opentelemetry.io/) conventions and backed by a stable protobuf schema, it enables deep integration with orchestrators, observability platforms, and custom tooling. This is the same integration that the dbt platform relies on for orchestration and monitoring, providing proven, production-ready features that work at scale.

#### Available output formats[​](#available-output-formats "Direct link to Available output formats")

Fusion telemetry supports three output formats, which you can enable independently:

| Format | Use case | Availability |
| --- | --- | --- |
| **JSONL** | Real-time monitoring, streaming to downstream systems. | Written as events occur. |
| **Parquet** | Post-run analysis, querying, and long-term storage. | Written when runs complete. |
| **OTLP** | Integration with observability platforms (Datadog, Jaeger, and more). | Streamed in real-time. |

##### Enabling telemetry output[​](#enabling-telemetry-output "Direct link to Enabling telemetry output")

The following are some options for enabling telemetry output (you can combine multiple outputs in a single run).

Write JSONL to a file (saves to the `logs/` directory):

```bash
dbtf build --otel-file-name telemetry.jsonl
```

Stream JSONL to stdout:

```bash
dbtf build --log-format otel
```

Write a Parquet file (saves to the `target/metadata/` directory):

```bash
dbtf build --otel-parquet-file-name telemetry.parquet
```

Export to an OpenTelemetry collector:

```bash
OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318" dbtf build --export-to-otlp
```

#### Telemetry data[​](#telemetry-data "Direct link to Telemetry data")

Fusion telemetry contains two types of records:

* **Spans** — Operations with a start and end time (like compiling a model or running a test).
* **Log records** — Point-in-time events within a span.

##### Telemetry hierarchy[​](#telemetry-hierarchy "Direct link to Telemetry hierarchy")

Every dbt command creates a hierarchy of spans:

```text
Invocation (dbtf build)
├── Phase (Parse)
├── Phase (Compile)
│   ├── Node (model.project.customers)
│   └── Node (model.project.orders)
└── Phase (Run)
    ├── Node (model.project.customers)
    └── Node (model.project.orders)
```

The `trace_id` (also known as `invocation_id`) remains consistent across all telemetry records for a single dbt command, making it easy to correlate events.

#### Node outcome[​](#node-outcome "Direct link to Node outcome")

Every node produces a result for each phase it participates in. Some phases, such as `parse`, don't involve node-level execution, so they don't produce node spans or node outcomes. The `node_outcome` field indicates whether or not Fusion executed the node's operation.

| Outcome | Description |
| --- | --- |
| `success` | The node operation was completed with no errors. |
| `error` | The node operation failed to execute (for example, syntax error). |
| `skipped` | The node was not evaluated (see [skip reasons](#skip-reasons)). |
| `canceled` | The node was interrupted (for example, user pressed Ctrl+C). |

##### Skip reasons[​](#skip-reasons "Direct link to Skip reasons")

When Fusion skips a node, the telemetry includes a reason:

| Skip reason | Description |
| --- | --- |
| `upstream` | A dependency failed. |
| `cached` | Fusion reused results from cache (no changes detected via [state aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md)). |
| `phase_disabled` | The phase was disabled (for example, `--static-analysis off`). |
| `noop` | The node doesn't perform work in this phase (for example, ephemeral models). |

##### Test outcomes[​](#test-outcomes "Direct link to Test outcomes")

When a test executes successfully (`node_outcome: success`), it reports the test result:

| Test outcome | Description |
| --- | --- |
| `passed` | No failures detected. |
| `warned` | Failures detected, but configured as warnings. |
| `failed` | Failures detected (data quality issue). |

Test outcomes

A test with `node_outcome: success` and `test_outcome: failed` means Fusion successfully ran the test, and the test reported data quality issues. This differs from `node_outcome: error`, which means the test itself couldn't run (for example, invalid SQL).

#### Querying telemetry data[​](#querying-telemetry-data "Direct link to Querying telemetry data")

Query the telemetry data to gain deeper insights into your dbt runs.

##### JSONL examples[​](#jsonl-examples "Direct link to JSONL examples")

The following are some examples of querying the JSONL telemetry data.

Watch for errors in real-time:

```bash
tail -f telemetry.jsonl | jq 'select(.severity_text == "ERROR")'
```

List skipped nodes, reasons, and upstream details:

```bash
cat telemetry.jsonl | jq 'select(.attributes.node_outcome == "NODE_OUTCOME_SKIPPED") | {node: .attributes.unique_id, reason: .attributes.node_skip_reason, upstream: .attributes.node_skip_upstream_detail.upstream_unique_id}'
```

##### Parquet analysis with DuckDB[​](#parquet-analysis-with-duckdb "Direct link to Parquet analysis with DuckDB")

Leverage DuckDB to better understand your telemetry data stored in Parquet files.
Find the slowest nodes:

```python
import duckdb

duckdb.sql("""
    SELECT
        attributes.unique_id,
        (end_time_unix_nano - start_time_unix_nano) / 1e6 AS duration_ms
    FROM 'telemetry.parquet'
    WHERE event_type LIKE '%NodeProcessed%'
    ORDER BY duration_ms DESC
    LIMIT 10
""").show()
```

Count outcomes by type:

```python
duckdb.sql("""
    SELECT attributes.node_outcome, COUNT(*) AS count
    FROM 'telemetry.parquet'
    WHERE attributes.node_outcome IS NOT NULL
    GROUP BY attributes.node_outcome
""").show()
```

#### OpenTelemetry integration[​](#opentelemetry-integration "Direct link to OpenTelemetry integration")

Fusion's native OTLP support lets you send telemetry directly to any OpenTelemetry-compatible receiver, including Datadog, Jaeger, Google Cloud Trace, Grafana Tempo, and Honeycomb. This enables:

* Integration with existing observability tooling — no custom integrations needed.
* Custom alerts that trigger notifications on failures or slow builds.
* Correlation across systems, linking dbt traces with downstream services.
* Centralized monitoring, viewing dbt alongside your other infrastructure.

##### Setting up OTLP export[​](#setting-up-otlp-export "Direct link to Setting up OTLP export")

The following example configures the OTLP export:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
dbtf build --export-to-otlp
```

#### Mapping to dbt Core concepts[​](#mapping-to-dbt-core-concepts "Direct link to Mapping to dbt Core concepts")

If you're familiar with dbt Core's structured logging, here's how Fusion telemetry maps:

| dbt Core | Fusion telemetry |
| --- | --- |
| `invocation_id` | `trace_id` (same value, different format) |
| `run_results.json` status | `node_outcome` + `skip_reason` or `test_outcome` |
| Event `code` (for example, `Q001`) | `event_type` |
| `--log-format json` | `--log-format otel` or `--otel-file-name` |

##### Node status mapping[​](#node-status-mapping "Direct link to Node status mapping")

| dbt Core status | Fusion outcome |
| --- | --- |
| `success` | `node_outcome: success` |
| `error` | `node_outcome: error` |
| `skipped` | `node_outcome: skipped`, `skip_reason: upstream` |
| `pass` (tests) | `node_outcome: success`, `test_outcome: passed` |
| `warn` (tests) | `node_outcome: success`, `test_outcome: warned` |
| `fail` (tests) | `node_outcome: success`, `test_outcome: failed` |

Note that dbt Core's `fail` status maps to Fusion's `node_outcome: success` because Fusion distinguishes between "the test ran successfully and found data issues" and "the test couldn't run." This separation enables more precise alerting and retry logic. Fusion adds `skip_reason: cached` for nodes reused via [State Aware Orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md), which has no dbt Core equivalent.
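The status mapping above can be sketched as a small translation table. This is an illustrative helper, not an official API; the dictionary simply mirrors the documented mapping:

```python
# Minimal sketch of the dbt Core -> Fusion status mapping described above.
# The dict literal mirrors the documentation table; it is illustrative only.
CORE_TO_FUSION = {
    "success": {"node_outcome": "success"},
    "error": {"node_outcome": "error"},
    "skipped": {"node_outcome": "skipped", "skip_reason": "upstream"},
    "pass": {"node_outcome": "success", "test_outcome": "passed"},
    "warn": {"node_outcome": "success", "test_outcome": "warned"},
    "fail": {"node_outcome": "success", "test_outcome": "failed"},
}

def to_fusion(core_status: str) -> dict:
    """Translate a dbt Core run_results status into Fusion telemetry fields."""
    try:
        return CORE_TO_FUSION[core_status]
    except KeyError:
        raise ValueError(f"unknown dbt Core status: {core_status!r}")

# A failing test still has node_outcome "success": the test ran; the data failed.
print(to_fusion("fail"))  # {'node_outcome': 'success', 'test_outcome': 'failed'}
```

A helper like this is useful when porting alerting rules that key off dbt Core statuses to Fusion's `node_outcome`/`test_outcome` fields.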
#### Record structure[​](#record-structure "Direct link to Record structure")

Each telemetry record contains envelope fields plus event-specific `attributes`:

```json
{
  "record_type": "SpanEnd",
  "trace_id": "f9a0a9e64c924b878133363ba3515e50",
  "span_id": "0000000000000036",
  "span_name": "Node(model.project.customers)",
  "parent_span_id": "0000000000000017",
  "start_time_unix_nano": "1756139116981079652",
  "end_time_unix_nano": "1756139117234567890",
  "severity_text": "INFO",
  "event_type": "v1.public.events.fusion.node.NodeEvaluated",
  "attributes": {
    "unique_id": "model.project.customers",
    "phase": "Run",
    "node_outcome": "success"
  }
}
```

| Field | Description |
| --- | --- |
| `record_type` | `SpanStart`, `SpanEnd`, or `LogRecord` |
| `trace_id` | Unique identifier for the invocation (same data as `invocation_id` but in OTEL format). |
| `span_id` / `parent_span_id` | For reconstructing the span hierarchy. |
| `event_type` | Type identifier for filtering and parsing. |
| `attributes` | Event-specific data (the schema varies by event type but, unlike OTEL conventions, is strictly backed by a stable protobuf schema). |

#### Schema stability[​](#schema-stability "Direct link to Schema stability")

Unlike dbt Core's structured logging, Fusion telemetry is backed by a public protobuf schema with strict compatibility guarantees:

* **Additive only** — New fields and event types may be added, but existing fields are never removed or changed.
* **Forward compatible** — Your integrations will continue to work as the schema evolves.

This makes Fusion telemetry a reliable foundation for production integrations, orchestrators, and long-term analytics pipelines.
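Putting the record structure to work, here is a minimal sketch that computes per-node durations from a JSONL telemetry stream using only the envelope fields documented above. The sample records are fabricated for illustration; only the field names come from the documented schema:

```python
import json

# Sketch: compute node durations from Fusion's JSONL telemetry using the
# documented envelope fields (record_type, start/end times, attributes).
# The two sample records below are fabricated for illustration.
sample_jsonl = """
{"record_type": "SpanEnd", "trace_id": "f9a0...", "span_id": "0036", "parent_span_id": "0017", "start_time_unix_nano": "1756139116981079652", "end_time_unix_nano": "1756139117234567890", "event_type": "v1.public.events.fusion.node.NodeEvaluated", "attributes": {"unique_id": "model.project.customers", "phase": "Run", "node_outcome": "success"}}
{"record_type": "SpanEnd", "trace_id": "f9a0...", "span_id": "0037", "parent_span_id": "0017", "start_time_unix_nano": "1756139117000000000", "end_time_unix_nano": "1756139117100000000", "event_type": "v1.public.events.fusion.node.NodeEvaluated", "attributes": {"unique_id": "model.project.orders", "phase": "Run", "node_outcome": "success"}}
""".strip()

def node_durations_ms(jsonl_text: str) -> dict:
    """Return {unique_id: duration_ms} for completed node spans."""
    durations = {}
    for line in jsonl_text.splitlines():
        rec = json.loads(line)
        # Only closed spans carry both timestamps; skip records without a node id.
        if rec["record_type"] != "SpanEnd" or "unique_id" not in rec.get("attributes", {}):
            continue
        elapsed_ns = int(rec["end_time_unix_nano"]) - int(rec["start_time_unix_nano"])
        durations[rec["attributes"]["unique_id"]] = elapsed_ns / 1e6
    return durations

print(node_durations_ms(sample_jsonl))
```

Because the schema is additive only, a parser like this that reads known envelope fields and ignores everything else should keep working as new fields appear.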
#### Official client library (coming soon)[​](#official-client-library "Direct link to Official client library (coming soon)")

dbt Labs is developing an official open-source client library. Built in Rust for performance, it will be available as:

* A standalone Rust crate and CLI.
* A fully typed Python package wrapping the Rust core.

The library will provide type-safe, forward-compatible access to telemetry data: stream JSONL in real time, query Parquet files, and build custom integrations with confidence that schema changes won't break your code.

---

## Getting Started

### dbt platform configuration checklist

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

So, you've created a new cloud-hosted dbt platform account, and you're ready to explore its lightning-fast and intuitive features. Welcome! Before you begin, let's ensure your account is properly configured so that you can easily onboard new users and take advantage of all the integrations dbt has to offer. For most organizations, this will require some collaboration with IT and/or security teams.

Depending on the features you're using, you may need some of the following admin personas to help you get set up:

* Data warehouse (Snowflake, BigQuery, Databricks, etc.)
* Access control (Okta, Entra ID, Google, SAML 2.0)
* Git (GitHub, GitLab, Azure DevOps, etc.)
This checklist ensures you have everything in the right place, allowing you to deploy quickly and without any bottlenecks.

#### Data warehouse[​](#data-warehouse "Direct link to Data warehouse")

The dbt platform supports [global connections](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md#connection-management) for your data warehouses. This means that a single configured connection can be used across multiple projects and environments. The dbt platform supports multiple data warehouse connections, including (but not limited to) BigQuery, Databricks, Redshift, and Snowflake. One of the earliest account configuration steps you'll want to take is ensuring you have a working connection:

* Use the [connection setup documentation](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md) to configure the data warehouse connection of your choice.
* Verify that dbt developers have proper roles and access in your data warehouse(s).
* Be sure the data warehouse has real data you can reference. This can be production or development data. We have a sandbox e-commerce project called [The Jaffle Shop](https://github.com/dbt-labs/jaffle-shop) that you can use if you prefer. The Jaffle Shop includes mock data and ready-to-run models!
* Whether starting a brand new project or importing an existing dbt Core project, you'll want to make sure you have the [proper structure configured](https://docs.getdbt.com/docs/build/projects.md).
* If you are migrating from Core, there are some important things you'll need to know, so check out our [migration guide](https://docs.getdbt.com/guides/core-migration-2.md?step=1).
* Your users will need to [configure their credentials](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#get-started-with-the-cloud-ide) to connect to the development environment in the dbt Studio IDE.
* Ensure that all users who need access to work in the IDE have a [developer license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) assigned in your account.
* dbt models are primarily written as [SELECT statements](https://docs.getdbt.com/docs/build/sql-models.md), so an early step for measuring success is having a developer run a simple select statement in the IDE and validating the results.
* You can also verify the connection by running basic SQL queries using [dbt Insights](https://docs.getdbt.com/docs/explore/access-dbt-insights.md).
* Create a single model and ensure that you can [run it](https://docs.getdbt.com/reference/dbt-commands.md) successfully.
* For an easy-to-use drag-and-drop interface, try creating it with [dbt Canvas](https://docs.getdbt.com/docs/cloud/canvas.md).
* Create a service account with proper access for your [production jobs](https://docs.getdbt.com/docs/deploy/jobs.md).

#### Git configuration[​](#git-configuration "Direct link to Git configuration")

Git is, for many dbt environments, the backbone of your project. Git repositories are where your dbt files will live and where your developers will collaborate and manage version control of your project.

* Configure a [Git repository](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md) for your account. dbt supports integrations with:
  * [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md)
  * [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md)
  * [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md)
  * Other providers using [Git clone](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md)
* If you aren't ready to integrate with an existing Git solution, dbt can provide you with a [managed Git repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md).
* Ensure developers can [checkout](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#git-overview) a new branch in your repo.
* Ensure developers in the IDE can [commit changes](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md#basic-layout).

#### Environments and jobs[​](#environments-and-jobs "Direct link to Environments and jobs")

[Environments](https://docs.getdbt.com/docs/environments-in-dbt.md) separate your development data from your production data. dbt supports two environment types: Development and Deployment. There are three types of deployment environments:

* Production - One per project
* Staging - One per project
* General - Multiple per project

Additionally, you will have only one `Development` environment per project, but each developer will have their own unique access to the IDE, separate from the work of other developers.

[Jobs](https://docs.getdbt.com/docs/deploy/jobs.md) dictate which commands are run in your environments and can be triggered manually, on a schedule, by other jobs, by APIs, or when pull requests are opened or merged. Once you connect your data warehouse and complete the Git integration, you can configure environments and jobs:

* Start by creating a new [Development environment](https://docs.getdbt.com/docs/dbt-cloud-environments.md#create-a-development-environment) for your project.
* Create a [Production Deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md).
* (Optional) Create an additional Staging or General environment.
* [Create and schedule](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#create-and-schedule-jobs) a deployment job.
* Validate the job by manually running it first.
* If needed, configure different databases for your environments.
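The bullets above mention triggering jobs by APIs. As a hedged sketch of what that looks like (the access URL, account ID, job ID, and token below are placeholder values, and the path follows the v2 Administrative API's job `run` endpoint), a trigger request can be built with nothing but the Python standard library:

```python
import json
import urllib.request

# Placeholders -- substitute your own values.
ACCESS_URL = "https://cloud.getdbt.com"  # your account's access URL
ACCOUNT_ID = 16173                       # hypothetical account ID
JOB_ID = 65767                           # hypothetical job ID
API_TOKEN = "<service-token>"            # a service token with Job Admin privileges


def build_trigger_request(access_url: str, account_id: int, job_id: int, token: str) -> urllib.request.Request:
    """Build a POST request that triggers a deploy job via the v2 Administrative API."""
    url = f"{access_url}/api/v2/accounts/{account_id}/jobs/{job_id}/run/"
    body = json.dumps({"cause": "Triggered via API"}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"},
    )


req = build_trigger_request(ACCESS_URL, ACCOUNT_ID, JOB_ID, API_TOKEN)
print(req.full_url)
# Send with urllib.request.urlopen(req) -- omitted here so the sketch has no side effects.
```

Sending the request returns a run object whose `id` can be polled for status.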
#### User access[​](#user-access "Direct link to User access")

The dbt platform offers a variety of access control tools that you can leverage to grant or revoke user access, configure RBAC, and assign user licenses and permissions.

* Manually [invite users](https://docs.getdbt.com/docs/cloud/manage-access/invite-users.md) to the dbt platform, and they can authenticate using [MFA (SMS or authenticator app)](https://docs.getdbt.com/docs/cloud/manage-access/mfa.md).
* Configure [single sign-on or OAuth](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md) for advanced access control. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") accounts only.
* Create [SSO mappings](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#sso-mappings-) for groups.
* Configure [System for Cross-Domain Identity Management (SCIM)](https://docs.getdbt.com/docs/cloud/manage-access/scim.md) if available for your IdP.
* Ensure invited users are able to connect to the data warehouse from their personal profile.
* [Create groups](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#create-new-groups-) with granular permission sets assigned.
* Create [RBAC rules](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access#role-based-access-control-) to assign users to groups and permission sets upon sign-in. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") accounts only.
* Enforce SSO for all non-admin users and MFA for all password-based logins.

#### Continue the journey[​](#continue-the-journey "Direct link to Continue the journey")

Once you've completed this checklist, you're ready to start your dbt platform journey, but that journey has only just begun.
Explore these additional resources to support you along the way:

* Review the [guides](https://docs.getdbt.com/guides.md) for quickstarts to help you get started with projects and features.
* Take a [dbt Learn](https://learn.getdbt.com/catalog) hands-on course.
* Review our [best practices](https://docs.getdbt.com/best-practices.md) for practical advice on structuring and deploying your dbt projects.
* Become familiar with the [references](https://docs.getdbt.com/reference/references-overview.md), as they are the product dictionary and offer detailed implementation examples.

---

### dbt Quickstarts

Begin your dbt journey by choosing how you want to develop:

* [**dbt platform** ](#the-dbt-platform)— Develop in your browser (Studio IDE or Canvas) or use local tools (VS Code extension, dbt CLI) that connect to your platform account. The platform provides hosted CI/CD, documentation, and more. Supports both the [dbt Fusion](https://docs.getdbt.com/docs/fusion.md) and [dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md) engines.
* [**Local only**](#dbt-local-installations) — Use local tools ([VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md), [Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started), or [dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md)) to develop and run dbt on your own infrastructure. You can use local tools with or without a dbt platform account.
* **Local + dbt platform** — Use the VS Code extension or dbt CLI with a dbt platform account to develop locally while leveraging platform features like CI/CD, documentation hosting, Insights, Canvas, and more. #### The dbt platform[​](#the-dbt-platform "Direct link to The dbt platform") dbt provides a fully managed environment to develop, run, and deploy dbt projects—with CI/CD, documentation hosting, and more. Learn more about [dbt features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) and [start your free trial](https://www.getdbt.com/signup/) today. The dbt Fusion engine adds managed execution, [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md), and a unified development experience so you can focus on building rather than infrastructure. Choose your warehouse to get started with a quickstart: [![](/img/icons/athena.svg)](https://docs.getdbt.com/guides/athena) ###### [Quickstart for dbt and Amazon Athena](https://docs.getdbt.com/guides/athena) [Integrate dbt with Amazon Athena for your data transformations.](https://docs.getdbt.com/guides/athena) [![](/img/icons/azure-synapse-analytics-2.svg)](https://docs.getdbt.com/guides/azure-synapse-analytics) ###### [Quickstart for dbt and Azure Synapse Analytics](https://docs.getdbt.com/guides/azure-synapse-analytics) [Discover how to integrate dbt with Azure Synapse Analytics for your data transformations.](https://docs.getdbt.com/guides/azure-synapse-analytics) [![](/img/icons/bigquery.svg)](https://docs.getdbt.com/guides/bigquery) ###### [Quickstart for dbt and BigQuery](https://docs.getdbt.com/guides/bigquery) [Discover how to leverage dbt with BigQuery to streamline your analytics workflows.](https://docs.getdbt.com/guides/bigquery) [![](/img/icons/databricks.svg)](https://docs.getdbt.com/guides/databricks) ###### [Quickstart for dbt and Databricks](https://docs.getdbt.com/guides/databricks) [Learn how to integrate dbt with Databricks for efficient data 
processing and analysis.](https://docs.getdbt.com/guides/databricks) [![](/img/icons/fabric.svg)](https://docs.getdbt.com/guides/microsoft-fabric) ###### [Quickstart for dbt and Microsoft Fabric](https://docs.getdbt.com/guides/microsoft-fabric) [Explore the synergy between dbt and Microsoft Fabric to optimize your data transformations.](https://docs.getdbt.com/guides/microsoft-fabric) [![](/img/icons/redshift.svg)](https://docs.getdbt.com/guides/redshift) ###### [Quickstart for dbt and Redshift](https://docs.getdbt.com/guides/redshift) [Learn how to connect dbt to Redshift for more agile data transformations.](https://docs.getdbt.com/guides/redshift) [![](/img/icons/snowflake.svg)](https://docs.getdbt.com/guides/snowflake) ###### [Quickstart for dbt and Snowflake](https://docs.getdbt.com/guides/snowflake) [Unlock the full potential of using dbt with Snowflake for your data transformations.](https://docs.getdbt.com/guides/snowflake) [![](/img/icons/starburst.svg)](https://docs.getdbt.com/guides/starburst-galaxy) ###### [Quickstart for dbt and Starburst Galaxy](https://docs.getdbt.com/guides/starburst-galaxy) [Leverage dbt with Starburst Galaxy to enhance your data transformation workflows.](https://docs.getdbt.com/guides/starburst-galaxy) [![](/img/icons/teradata.svg)](https://docs.getdbt.com/guides/teradata) ###### [Quickstart for dbt and Teradata](https://docs.getdbt.com/guides/teradata) [Discover and use dbt with Teradata to enhance your data transformation workflows.](https://docs.getdbt.com/guides/teradata) #### dbt local installations[​](#dbt-local-installations "Direct link to dbt local installations") When you install dbt locally, you get command-line tools and the VS Code extension that enable you to transform data using analytics engineering best practices. You can use local tools with or without a dbt platform account. With an account, the VS Code extension and dbt CLI sync with your platform project for CI/CD, documentation, and more. 
Without an account, you run dbt entirely on your own infrastructure. Develop locally using the dbt Fusion engine or dbt Core engine.

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/guides/fusion.md?step=2) ###### [dbt Fusion engine from a manual install](https://docs.getdbt.com/guides/fusion.md?step=2) [Learn how to install dbt Fusion and set up a project.](https://docs.getdbt.com/guides/fusion.md?step=2) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/guides/manual-install.md) ###### [dbt Core from a manual install](https://docs.getdbt.com/guides/manual-install.md) [Learn how to install dbt Core and set up a project.](https://docs.getdbt.com/guides/manual-install.md) [![](/img/icons/duckdb-seeklogo.svg)](https://docs.getdbt.com/guides/duckdb.md?step=1) ###### [Quickstart for dbt with DuckDB](https://docs.getdbt.com/guides/duckdb.md?step=1) [Learn how to connect dbt to DuckDB.](https://docs.getdbt.com/guides/duckdb.md?step=1)

#### Related docs[​](#related-docs "Direct link to Related docs")

Expand your dbt knowledge and expertise with these additional resources:

* [Join the monthly demos](https://www.getdbt.com/resources/webinars/dbt-cloud-demos-with-experts) to see dbt in action and ask questions.
* [dbt AWS marketplace](https://aws.amazon.com/marketplace/pp/prodview-tjpcf42nbnhko) contains information on how to deploy dbt on AWS, user reviews, and more.
* [Best practices](https://docs.getdbt.com/best-practices.md) contains information on how dbt Labs approaches building projects through our current viewpoints on structure, style, and setup.
* [dbt Learn](https://learn.getdbt.com) offers free online courses that cover dbt fundamentals, advanced topics, and more.
* [Join the dbt Community](https://www.getdbt.com/community/join-the-community) to learn how other data practitioners globally are using dbt, share your own experiences, and get help with your dbt projects.
---

### What is dbt?

dbt transforms raw warehouse data into trusted data products. You write simple SQL select statements, and dbt handles the heavy lifting by creating modular, maintainable data models that power analytics, operations, and AI -- replacing the need for complex and fragile transformation code. dbt is the industry standard for data transformation, helping you get more work done while producing higher quality results.

You can use dbt and its [framework](#dbt-framework) to:

* Centralize and modularize your analytics code, while also providing your data team with guardrails typically found in software engineering workflows.
* Collaborate on data models to safely deploy and monitor data transformations in production.
* Apply software engineering best practices like version control, testing, modularity, CI/CD, and documentation to analytics workflows.

Backed by a 100,000+ member [community](https://docs.getdbt.com/community/join.md), dbt helps teams build high-quality, trustworthy data pipelines faster.

[![dbt works alongside your ingestion, visualization, and other data tools, so you can transform data directly in your cloud data platform.](/img/docs/cloud-overview.jpg?v=2 "dbt works alongside your ingestion, visualization, and other data tools, so you can transform data directly in your cloud data platform.")](#)

dbt works alongside your ingestion, visualization, and other data tools, so you can transform data directly in your cloud data platform. Read more about why we want to enable analysts to work more like software engineers in [The dbt Viewpoint](https://docs.getdbt.com/community/resources/viewpoint.md).
Learn how other data practitioners around the world are using dbt by [joining the dbt Community](https://www.getdbt.com/community/join-the-community).

#### dbt framework[​](#dbt-framework "Direct link to dbt framework")

Use the dbt framework to quickly and collaboratively transform data and deploy analytics code following software engineering best practices like version control, modularity, portability, CI/CD, and documentation. This means anyone on the data team familiar with SQL can safely contribute to production-grade data pipelines.

The dbt framework is composed of a *language* and an *engine*:

* The *dbt language* is the code you write in your dbt project — SQL select statements, Jinja templating, YAML configs, tests, and more. It's the standard for the data industry and the foundation of the dbt framework.
* The *dbt engine* compiles your project, executes your transformation graph, and produces metadata. dbt supports two engines which you can use depending on your needs:
  * The dbt Core engine, which renders Jinja and runs your models.
  * The dbt Fusion engine, which goes beyond Jinja rendering to statically analyze your SQL — validating syntax and logic before your SQL is sent to the database (saving compute resources) and supporting LSP features.

##### dbt Fusion engine[​](#dbt-fusion-engine "Direct link to dbt Fusion engine")

The dbt Fusion engine is a Rust-based engine that delivers a lightning-fast development experience, intelligent cost savings, and improved governance. Fusion understands SQL natively across multiple dialects, catches errors instantly, and optimizes how your models are built — bringing SQL comprehension and state awareness, instant feedback, LSP, and more to every dbt workflow.
Fusion powers dbt in the [dbt platform](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md), [VS Code / Cursor](https://docs.getdbt.com/docs/about-dbt-extension.md), and [locally from the command line](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started). You don't need to have a dbt platform project to use the dbt Fusion engine. For more information, refer to [About the dbt Fusion engine](https://docs.getdbt.com/docs/fusion.md), [supported features](https://docs.getdbt.com/docs/fusion/supported-features.md), and the [get started with Fusion](https://docs.getdbt.com/docs/fusion/get-started-fusion.md) pages.

##### dbt Core engine[​](#dbt-core-engine "Direct link to dbt Core engine")

[dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md) is the open-source, Python-based engine that enables data practitioners to transform data. dbt Core surfaces feedback when you run or build your project. It doesn't include Fusion features like the LSP, which provides instant feedback as you type. Learn more with the [quickstart for dbt Core](https://docs.getdbt.com/guides/duckdb.md?step=1).

#### How to use dbt[​](#how-to-use-dbt "Direct link to How to use dbt")

You can deploy dbt projects in different ways depending on your needs:

* Using the [dbt platform](#dbt-platform) (recommended for most users)
* [Locally from your command line or code editor](#dbt-local-development)

Both options support the dbt Fusion engine and the dbt Core engine.

##### dbt platform[​](#dbt-platform "Direct link to dbt platform")

The dbt platform offers the fastest, most reliable, and scalable way to deploy dbt. It can be powered by the dbt Fusion engine or dbt Core engine, and provides a fully managed service with scheduling, CI/CD, documentation hosting, monitoring, development, and alerting through a web-based user interface (UI).
The dbt platform offers [multiple ways](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) to develop and collaborate on dbt projects:

* [Develop in your browser using the Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md)
* [Seamless drag-and-drop development with Canvas](https://docs.getdbt.com/docs/cloud/canvas.md)
* [Run dbt commands from your local command line](#dbt-local-development) using the dbt VS Code extension or dbt CLI (both of which integrate seamlessly with your dbt platform projects).

Learn more about the [dbt platform features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) and try one of the [dbt Quickstarts](https://docs.getdbt.com/guides.md). You can learn about plans and pricing on [www.getdbt.com](https://www.getdbt.com/pricing/).

##### dbt local development[​](#dbt-local-development "Direct link to dbt local development")

Use the dbt framework and develop dbt projects from your command line or code editor:

* [Install the dbt VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md) — Combines dbt Fusion engine performance with visual features like autocomplete, inline errors, and lineage. Includes [LSP features](https://docs.getdbt.com/docs/about-dbt-lsp.md) and is suitable for users with dbt platform projects or running dbt locally without one. *Recommended for local development.*
* [Install the Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) — The dbt Fusion engine from the command line; doesn't include LSP features.
* [Install the dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) — The dbt platform CLI, which allows you to run dbt commands against your dbt platform development environment from your local command line.
* [Install dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md) — The open-source, Python-based CLI that uses the dbt Core engine.
Doesn't include LSP features.

#### Why use dbt[​](#why-use-dbt "Direct link to Why use dbt")

As a dbt user, your main focus will be on writing models (select queries) that reflect core business logic – there's no need to write boilerplate code to create tables and views, or to define the order of execution of your models. Instead, dbt handles turning these models into objects in your warehouse for you.

* **No boilerplate** — Write business logic with just a SQL `select` statement or a Python DataFrame. dbt handles materialization, transactions, DDL, and schema changes.
* **Modular and reusable** — Build data models that can be referenced in subsequent work. Change a model once and the change propagates to all its dependencies, so you can publish canonical business logic without reimplementing it.
* **Fast builds** — Use [incremental models](https://docs.getdbt.com/docs/build/incremental-models.md) and leverage metadata to optimize long-running models.
* **Tested and documented** — Write [data quality tests](https://docs.getdbt.com/docs/build/data-tests.md) on your underlying data and auto-generate [documentation](https://docs.getdbt.com/docs/build/documentation.md) alongside your code.
* **Software engineering workflows** — Version control, branching, pull requests, CI/CD, and [package management](https://docs.getdbt.com/docs/build/packages.md) for your data pipelines. Write DRYer code with [macros](https://docs.getdbt.com/docs/build/jinja-macros.md) and [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md).
* **State-aware orchestration** — Use the dbt Fusion engine to orchestrate your dbt projects and models with [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md), which automatically determines which models to build by detecting changes in code or data. This reduces runtime and costs by only building the models that have changed.
#### Related docs[​](#related-docs "Direct link to Related docs")

* [Quickstarts for dbt](https://docs.getdbt.com/guides.md)
* [Best practice guides](https://docs.getdbt.com/best-practices.md)
* [What is a dbt Project?](https://docs.getdbt.com/docs/build/projects.md)
* [dbt run](https://docs.getdbt.com/docs/running-a-dbt-project/run-your-dbt-projects.md)

---

## Guides

### Airflow and dbt

[Back to guides](https://docs.getdbt.com/guides.md)

dbt platform Orchestration Intermediate

#### Introduction[​](#introduction "Direct link to Introduction")

Many organizations already use [Airflow](https://airflow.apache.org/) to orchestrate their data workflows. dbt works great with Airflow, letting you execute your dbt transformations in the dbt platform while keeping orchestration duties with Airflow. This ensures your project's metadata (important for tools like Catalog) is available and up-to-date, while still enabling you to use Airflow for general tasks such as:

* Scheduling other processes outside of dbt runs
* Ensuring that a [dbt job](https://docs.getdbt.com/docs/deploy/job-scheduler.md) kicks off before or after another process outside of dbt
* Triggering a dbt job only after another has completed

In this guide, you'll learn how to: 1. Create a working local Airflow environment 2. Invoke a dbt job with Airflow 3.
Reuse tested and trusted Airflow code for your specific use cases You’ll also gain a better understanding of how this will: * Reduce the cognitive load when building and maintaining pipelines * Avoid dependency hell (think: `pip install` conflicts) * Define clearer handoff of workflows between data engineers and analytics engineers #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * [dbt Enterprise or Enterprise+ account](https://www.getdbt.com/pricing/) (with [admin access](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md)) in order to create a service token. Permissions for service tokens can be found [here](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md#permissions-for-service-account-tokens). * A [free Docker account](https://hub.docker.com/signup) in order to sign in to Docker Desktop, which will be installed in the initial setup. * A local digital scratchpad for temporarily copy-pasting API keys and URLs 🙌 Let’s get started! 🙌 #### Install the Astro CLI[​](#install-the-astro-cli "Direct link to Install the Astro CLI") Astro is a managed software service that includes key features for teams working with Airflow. In order to use Astro, we’ll install the Astro CLI, which will give us access to useful commands for working with Airflow locally. You can read more about Astro [here](https://docs.astronomer.io/astro/). In this example, we’re using Homebrew to install Astro CLI. Follow the instructions to install the Astro CLI for your own operating system [here](https://docs.astronomer.io/astro/install-cli). ```bash brew install astro ``` #### Install and start Docker Desktop[​](#install-and-start-docker-desktop "Direct link to Install and start Docker Desktop") Docker allows us to spin up an environment with all the apps and dependencies we need for this guide. Follow the instructions [here](https://docs.docker.com/desktop/) to install Docker desktop for your own operating system. 
Once Docker is installed, ensure you have it up and running for the next steps. #### Clone the airflow-dbt-cloud repository[​](#clone-the-airflow-dbt-cloud-repository "Direct link to Clone the airflow-dbt-cloud repository") Open your terminal and clone the [airflow-dbt-cloud repository](https://github.com/dbt-labs/airflow-dbt-cloud). This contains example Airflow DAGs that you’ll use to orchestrate your dbt job. Once cloned, navigate into the `airflow-dbt-cloud` project. ```bash git clone https://github.com/dbt-labs/airflow-dbt-cloud.git cd airflow-dbt-cloud ``` For more information about cloning GitHub repositories, refer to "[Cloning a repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository)" in the GitHub documentation. #### Start the Docker container[​](#start-the-docker-container "Direct link to Start the Docker container") 1. From the `airflow-dbt-cloud` directory you cloned and opened in the prior step, run the following command to start your local Airflow deployment: ```bash astro dev start ``` When this finishes, you should see a message similar to the following: ```bash Airflow is starting up! This might take a few minutes… Project is running! All components are now available. Airflow Webserver: http://localhost:8080 Postgres Database: localhost:5432/postgres The default Airflow UI credentials are: admin:admin The default Postgres DB credentials are: postgres:postgres ``` 2. Open the Airflow interface. Launch your web browser and navigate to the address for the **Airflow Webserver** from your output above (for us, `http://localhost:8080`). This will take you to your local instance of Airflow. 
You’ll need to log in with the **default credentials**:

* Username: admin
* Password: admin

![Airflow login screen](/assets/images/airflow-login-56d38c8b37cf6d5cfe9672e8274a2d19.png)

#### Create a dbt service token[​](#create-a-dbt-service-token "Direct link to Create a dbt service token")

[Create a service token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) with `Job Admin` privileges from within dbt. Ensure that you save a copy of the token, as you won’t be able to access this later.

#### Create a dbt job[​](#create-a-dbt-job "Direct link to Create a dbt job")

[Create a job in your dbt account](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#create-and-schedule-jobs), paying special attention to the information in the bullets below.

* Configure the job with the full commands that you want to include when this job kicks off. This sample code has Airflow triggering the dbt job and all of its commands, instead of explicitly identifying individual models to run from inside of Airflow.
* Ensure that the schedule is turned **off** since we’ll be using Airflow to kick things off.
* Once you hit `save` on the job, make sure you copy the URL and save it for referencing later. The URL will look similar to this:

```html
https://YOUR_ACCESS_URL/#/accounts/{account_id}/projects/{project_id}/jobs/{job_id}/
```

#### Connect dbt to Airflow[​](#connect-dbt-to-airflow "Direct link to Connect dbt to Airflow")

Now you have all the working pieces to get up and running with Airflow + dbt. It's time to **set up a connection** and **run a DAG in Airflow** that kicks off a dbt job.

1. From the Airflow interface, navigate to Admin and click on **Connections**. ![Airflow connections menu](/assets/images/airflow-connections-menu-71e1784305b7249eba5892141dde3d98.png)
2. Click on the `+` sign to add a new connection, then click on the drop-down to search for the dbt Connection Type. ![Connection type](/assets/images/connection-type-42874161269a9c455e01c5734c4aaf64.png)
3.
Add in your connection details and your default dbt account ID. This is found in your dbt URL after the accounts route section (`/accounts/{YOUR_ACCOUNT_ID}`), for example, the account with ID 16173 would see this in their URL: `https://YOUR_ACCESS_URL/#/accounts/16173/projects/36467/jobs/65767/` ![Connection type](/assets/images/connection-type-configured-f0be2b2ee60d8192f0cf0bbeac03528b.png)

#### Update the placeholders in the sample code[​](#update-the-placeholders-in-the-sample-code "Direct link to Update the placeholders in the sample code")

Add your `account_id` and `job_id` to the Python file [dbt\_cloud\_run\_job.py](https://github.com/dbt-labs/airflow-dbt-cloud/blob/main/dags/dbt_cloud_run_job.py). Both IDs are included inside of the dbt job URL as shown in the following snippets:

```python
# For the dbt Job URL https://YOUR_ACCESS_URL/#/accounts/16173/projects/36467/jobs/65767/
# The account_id is 16173 and the job_id is 65767
# Update lines 34 and 35
ACCOUNT_ID = "16173"
JOB_ID = "65767"
```

#### Run the Airflow DAG[​](#run-the-airflow-dag "Direct link to Run the Airflow DAG")

Turn on the DAG and trigger it to run. Verify the job succeeded after running. ![Airflow DAG](/assets/images/airflow-dag-d7d6a6fe556ac6e8a7970ae7305a5bc3.png)

Click **Monitor Job Run** to open the run details in dbt. ![Task run instance](/assets/images/task-run-instance-936ac2e4ef47727b434363656900a99d.png)

#### Cleaning up[​](#cleaning-up "Direct link to Cleaning up")

At the end of this guide, make sure you shut down your Docker container.
When you’re done using Airflow, use the following command to stop the container:

```bash
$ astro dev stop
[+] Running 3/3
 ⠿ Container airflow-dbt-cloud_e3fe3c-webserver-1  Stopped  7.5s
 ⠿ Container airflow-dbt-cloud_e3fe3c-scheduler-1  Stopped  3.3s
 ⠿ Container airflow-dbt-cloud_e3fe3c-postgres-1   Stopped  0.3s
```

To verify that the deployment has stopped, use the following command:

```bash
astro dev ps
```

This should give you an output like this:

```bash
Name                                  State   Ports
airflow-dbt-cloud_e3fe3c-webserver-1  exited
airflow-dbt-cloud_e3fe3c-scheduler-1  exited
airflow-dbt-cloud_e3fe3c-postgres-1   exited
```

#### Frequently asked questions[​](#frequently-asked-questions "Direct link to Frequently asked questions")

##### How can we run specific subsections of the dbt DAG in Airflow?[​](#how-can-we-run-specific-subsections-of-the-dbt-dag-in-airflow "Direct link to How can we run specific subsections of the dbt DAG in Airflow?")

Because the Airflow DAG references dbt jobs, your analytics engineers can take responsibility for configuring the jobs in dbt. For example, to run some models hourly and others daily, there will be jobs like `Hourly Run` or `Daily Run` using the commands `dbt run --select tag:hourly` and `dbt run --select tag:daily` respectively. Once configured in dbt, these can be added as steps in an Airflow DAG as shown in this guide. Refer to our full [node selection syntax docs here](https://docs.getdbt.com/reference/node-selection/syntax.md).

##### How can I re-run models from the point of failure?[​](#how-can-i-re-run-models-from-the-point-of-failure "Direct link to How can I re-run models from the point of failure?")

You can trigger a re-run from the point of failure with the `rerun` API endpoint. See the docs on [retrying jobs](https://docs.getdbt.com/docs/deploy/retry-jobs.md) for more information.
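To make the `rerun` answer above concrete, here is a minimal, hedged sketch. It assumes the job `rerun` endpoint path and the run status codes shown (Queued=1, Starting=2, Running=3, Success=10, Error=20, Cancelled=30) match your account's API version; the base URL, IDs, and token are placeholders you would replace with your own:

```python
import json
import time
import urllib.request

BASE = "https://cloud.getdbt.com/api/v2"  # substitute your access URL
HEADERS = {"Authorization": "Token <service-token>", "Content-Type": "application/json"}

# Run status codes (hedged: verify against your API version).
TERMINAL_STATUSES = {10: "Success", 20: "Error", 30: "Cancelled"}


def rerun_url(account_id: int, job_id: int) -> str:
    """URL that re-runs a job from its point of failure."""
    return f"{BASE}/accounts/{account_id}/jobs/{job_id}/rerun/"


def is_terminal(status: int) -> bool:
    """True once a run has finished, successfully or not."""
    return status in TERMINAL_STATUSES


def retry_and_wait(account_id: int, job_id: int, poll_seconds: int = 30) -> dict:
    """Kick off a rerun, then poll the created run until it reaches a terminal state."""
    req = urllib.request.Request(rerun_url(account_id, job_id), data=b"{}", method="POST", headers=HEADERS)
    run = json.load(urllib.request.urlopen(req))["data"]
    while not is_terminal(run["status"]):
        time.sleep(poll_seconds)
        poll = urllib.request.Request(f"{BASE}/accounts/{account_id}/runs/{run['id']}/", headers=HEADERS)
        run = json.load(urllib.request.urlopen(poll))["data"]
    return run
```

In an Airflow setting you typically would not hand-roll this loop; the `apache-airflow-providers-dbt-cloud` package used by the sample repository wraps the same API.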
##### Should Airflow run one big dbt job or many dbt jobs?

dbt jobs are most effective when a build command contains as many models at once as is practical. This is because dbt manages the dependencies between models and coordinates running them in order, which ensures that your jobs can run in a highly parallelized fashion. It also streamlines the debugging process when a model fails and enables re-running from the point of failure.

As an explicit example, it's not recommended to have a dbt job for every single node in your DAG. Try combining your steps according to desired run frequency, or grouping by department (finance, marketing, customer success, and so on) instead.

##### We want to kick off our dbt jobs after our ingestion tool (such as Fivetran) / data pipelines are done loading data. Any best practices around that?

Astronomer's DAG registry has a [sample workflow](https://registry.astronomer.io/dags/fivetran-dbt_cloud-census/versions/3.0.0) combining Fivetran, dbt, and Census.
##### How do you set up a CI/CD workflow with Airflow?

Check out these two resources for building your own CI/CD pipeline:

* [Continuous Integration with dbt](https://docs.getdbt.com/docs/deploy/continuous-integration.md)
* [Astronomer's CI/CD Example](https://docs.astronomer.io/software/ci-cd/#example-cicd-workflow)

##### Can dbt dynamically create tasks in the DAG like Airflow can?

As discussed above, we prefer to keep jobs bundled together and containing as many nodes as are necessary. If you must run nodes one at a time for some reason, review [this article](https://www.astronomer.io/blog/airflow-dbt-1/) for some pointers.

##### Can you trigger notifications if a dbt job fails with Airflow?

Yes, either through [Airflow's email/Slack](https://www.astronomer.io/guides/error-notifications-in-airflow/) functionality or [dbt's notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md), which support email and Slack. You could also create a [webhook](https://docs.getdbt.com/docs/deploy/webhooks.md).

##### How should I plan my dbt + Airflow implementation?

Check out [this recording](https://www.youtube.com/watch?v=n7IIThR8hGk) of a dbt meetup for some tips.
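If you go the webhook route, your receiving service should verify that a request really came from dbt. A hedged sketch of the verification pattern described in dbt's webhook docs, which send an HMAC-SHA256 hex digest of the raw payload computed with your webhook secret (the header name and scheme are assumptions drawn from those docs):

```python
# Hedged sketch: verify a dbt webhook by recomputing the HMAC-SHA256 hex
# digest of the raw request body with the secret shown when the webhook was
# created, and comparing it in constant time to the `authorization` header.
import hashlib
import hmac


def is_valid_webhook(raw_body: bytes, auth_header: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, auth_header)
```

Only after this check passes should the handler act on the payload (for example, paging on a failed run).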
---

### Analyze your data in dbt

Start with a stakeholder question and analyze the data to answer that question without writing any SQL.

[Back to guides](https://docs.getdbt.com/guides.md)

Analyst · dbt platform · Quickstart

#### Introduction

As a data analyst, you play a key role in transforming complex data into trusted, actionable insights for your team. With dbt, you can use built-in, AI-powered tools to build governed data models, explore how they’re built, and even run your own analysis.

In this quickstart, you’ll learn how to:

* Use Catalog to browse and understand data models across both dbt and Snowflake data assets
* Use Insights to run queries for exploring and validating your data
* Use Canvas to visually build your own data models
* Build confidence using dbt as your workspace enhanced with AI

Here's more about the tools you will use on your journey:

* **Catalog**: View your project's resources (such as models, tests, and metrics), their lineage, and query patterns to gain a better understanding of its latest production state.
* **Insights**: Explore, validate, and query data with an intuitive, context-rich interface that bridges technical and business users by combining metadata, documentation, AI-assisted tools, and powerful querying capabilities.
* **Canvas**: Quickly access and transform data through a visual, drag-and-drop experience and with built-in AI for custom code generation.
#### Prerequisites

Before you begin, make sure:

* You have access to and credentials configured for a dbt project
* Your team has already run a successful dbt job, so models are built and ready
* You have a git provider connected and authenticated

#### Analyst workflows

Kimiko, an analyst at the Jaffle Shop, notices they've been doing a lot of new sales and wants to investigate the most critical data they have in their warehouse.

**Question: A stakeholder is curious how many customers you've acquired month by month, in the last 12 months.**

Kimiko wonders, "How do I find data in our project that will help me answer their question?"

##### Explore a stakeholder question

She navigates to the data catalog, Catalog, by signing into dbt and clicking Catalog in the left panel. Because the question was about customers, Kimiko begins by searching for "customers" in Catalog:

[![Catalog search for customers](/img/guides/analyst-qs/catalog-search.png?v=2 "Catalog search for customers")](#)

She finds a "customers" model, which might be what she needs. She clicks **customers** to open the model. The description reads, “Customer overview data mart offering key details for each unique customer, one row per customer.”

Next, Kimiko selects **Columns** to see which columns this model uses.

[![Columns in customers table](/img/guides/analyst-qs/columns.png?v=2 "Columns in customers table")](#)

She notices these columns: `customer_ID`, `customer_names`, and `first_ordered_at`. The `first_ordered_at` column stands out to Kimiko, and she wonders if she might use it to see how many customers they've acquired based on when they placed their first order. But first, she decides to interact with the data to learn more.
##### Query data in Insights

From the **Customer model page** in Catalog, Kimiko selects **Analyze data** from the **Open in...** dropdown. This enables her to query data for the Customer model. Once opened, Insights contains a query poised and ready to run.

[![Open query](/img/guides/analyst-qs/query.png?v=2 "Open query")](#)

When Kimiko runs the query, she can look at the data underlying it. The same context she saw in Catalog now appears in her SQL editing experience. As she looks through the data, she sees information about each customer. She also notices the `first_ordered_at` column.

Kimiko wants to code the query, but her SQL is a little rusty, so she uses natural language in dbt Copilot: *How many new customers did we get in each month last year? I'd like to use my customer model and the first ordered at field to do this analysis.*

dbt Copilot writes SQL that Kimiko decides to use:

```sql
select
    date_trunc('month', first_ordered_at) as month,
    count(customer_id) as new_customers
from {{ ref('customers') }}
where date_part('year', first_ordered_at) = date_part('year', current_date) - 1
    and customer_type = 'new'
group by 1
order by 1;
```

Kimiko clicks **Replace** to move all of the SQL into her editor, replacing the original query. She runs the new query and reviews the data but decides to limit the dates using Copilot once again: *Can we limit the dates to 2024?*

She verifies the new filter for 2024 and reruns this query:

```sql
select
    date_trunc('month', first_ordered_at) as month,
    count(customer_id) as new_customers
from {{ ref('customers') }}
where date_part('year', first_ordered_at) = 2024
    and customer_type = 'new'
group by 1
order by 1;
```

She's happy with the results and clicks **Details** to see the AI-generated report, which includes a title and description, the supplied SQL, and the compiled SQL.
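To sanity-check what the 2024 query computes, the same filter-and-count logic can be mirrored in plain Python. The rows below are made-up illustrations, not data from the guide:

```python
# Plain-Python mirror of the query above: truncate first_ordered_at to month,
# keep only 2024 rows with customer_type = 'new', and count per month.
from collections import Counter
from datetime import date

# Illustrative rows standing in for the customers model.
rows = [
    {"customer_id": 1, "customer_type": "new", "first_ordered_at": date(2024, 1, 15)},
    {"customer_id": 2, "customer_type": "new", "first_ordered_at": date(2024, 1, 20)},
    {"customer_id": 3, "customer_type": "returning", "first_ordered_at": date(2024, 2, 2)},
    {"customer_id": 4, "customer_type": "new", "first_ordered_at": date(2024, 2, 9)},
    {"customer_id": 5, "customer_type": "new", "first_ordered_at": date(2023, 12, 31)},
]


def new_customers_by_month(rows, year=2024):
    # date_trunc('month', ...) == replace the day with 1.
    months = Counter(
        r["first_ordered_at"].replace(day=1)
        for r in rows
        if r["first_ordered_at"].year == year and r["customer_type"] == "new"
    )
    return dict(sorted(months.items()))
```

Running this on the sample rows yields two new customers in January 2024 and one in February, matching what the SQL would return over the same data.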
[![Details report tab](/img/guides/analyst-qs/details.png?v=2 "Details report tab")](#)

Once she's ready to get the insight to her stakeholder, she clicks **Chart** to view the chart prefilled with the data from the **Data** tab. She adds x- and y-axis labels, such as "Month of first order" and "Total new customers", to make the final report she'll share with her stakeholder more comprehensible. Next, she takes a screenshot to share with them.

She often comes back to this data, so Kimiko decides to bookmark the page by clicking **Bookmark** in the top right. She also exports it to a CSV file.

##### Visualize results

Kimiko has a few conversations with teammates and finds out they're running pretty similar one-off queries, so she decides to take the long-running query she previously bookmarked and turn it into a full-fledged dbt model using Canvas. She does this so she can share it with others, which de-duplicates work and helps her team become more efficient.

To do this, she opens the query in Insights and clicks **Develop**, then **Develop in Canvas**. This opens the SQL query in a visual form, represented in a DAG. When she examines the model, she notes it's selecting from customers as expected, filtering for 2024, showing dates by month, and aggregating over that month. She runs it in her development environment and clicks **Commit** to submit a pull request.

Now Kimiko's entire team, those who have the same access as her, can run this model and see the same results she does! What's more, they can help her improve the model as the stakeholder requests get more complicated, and she will benefit from their help.

##### The query becomes a model

Going forward, Kimiko can return to her project in Catalog and run the model to get the most current results.
From here, she can:

* Manually run the model, which also runs tests and is versioned so Kimiko can track changes over time
* Trigger a scheduled job to run the dbt model, like every Monday for her stakeholder report
* Set up a Slack notification in case the job fails so she can rectify any problems

---

### Browse our guides

The dbt platform is the fastest and most reliable way to deploy dbt for scalable data transformation, while dbt Core powers open-source transformation workflows. Together, they provide a seamless analytics engineering experience. Explore our step-by-step guides, quickstart tutorials, and troubleshooting resources to get started with dbt and your data platform.

##### Get started with Fusion

* [Quickstart for the dbt Fusion engine](https://docs.getdbt.com/guides/fusion.md)
* [Upgrade to Fusion part 1: Preparing to upgrade](https://docs.getdbt.com/guides/prepare-fusion-upgrade.md)
* [Upgrade to Fusion part 2: Making the move](https://docs.getdbt.com/guides/upgrade-to-fusion.md)

##### Popular

* [Quickstart for dbt and Snowflake](https://docs.getdbt.com/guides/snowflake.md)
* [Quickstart for dbt and Databricks](https://docs.getdbt.com/guides/databricks.md)
* [Quickstart for dbt and BigQuery](https://docs.getdbt.com/guides/bigquery.md)
* [Quickstart for dbt and Redshift](https://docs.getdbt.com/guides/redshift.md)

##### Troubleshooting

* [Debug schema names](https://docs.getdbt.com/guides/debug-schema-names.md)
* [Use Jinja to improve your SQL code](https://docs.getdbt.com/guides/using-jinja.md)
* [Debug errors](https://docs.getdbt.com/guides/debug-errors.md)

##### Advanced use cases

* [Fusion package upgrade guide](https://docs.getdbt.com/guides/fusion-package-compat.md)
* [Airflow and dbt](https://docs.getdbt.com/guides/airflow-and-dbt-cloud.md)
* [Build, test, document, and promote adapters](https://docs.getdbt.com/guides/adapter-creation.md)
* [Move from dbt Core to the dbt platform: Get started](https://docs.getdbt.com/guides/core-migration-1.md) (total estimated time: 3-4 hours)
* [Create new materializations](https://docs.getdbt.com/guides/create-new-materializations.md)
* [Customize dbt models database, schema, and alias](https://docs.getdbt.com/guides/customize-schema-alias.md)
* [Refactoring legacy SQL to dbt](https://docs.getdbt.com/guides/refactoring-legacy-sql.md)
---

### Build a data lakehouse with dbt Core and Dremio Cloud

[Back to guides](https://docs.getdbt.com/guides.md)

Dremio · dbt Core · Intermediate

#### Introduction

This guide demonstrates how to build a data lakehouse with dbt Core 1.5 or newer and Dremio Cloud. You can simplify and optimize your data infrastructure with dbt's robust transformation framework and Dremio’s open and easy data lakehouse. The integrated solution empowers companies to establish a strong data and analytics foundation, fostering self-service analytics and enhancing business insights while simplifying operations by eliminating the need to write complex Extract, Transform, and Load (ETL) pipelines.

##### Prerequisites

* You must have a [Dremio Cloud](https://docs.dremio.com/cloud/) account.
* You must have Python 3 installed.
* You must have dbt Core v1.5 or newer [installed](//docs/local/install-dbt).
* You must have the Dremio adapter 1.5.0 or newer [installed and configured](https://docs.getdbt.com/docs/local/connect-data-platform/dremio-setup.md) for Dremio Cloud.
* You must have basic working knowledge of Git and the command line interface (CLI).

#### Validate your environment

Validate your environment by running the following commands in your CLI and verifying the results:

```shell
$ python3 --version
Python 3.11.4 # Must be Python 3
```

```shell
$ dbt --version
Core:
  - installed: 1.5.0 # Must be 1.5 or newer
  - latest:    1.6.3 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - dremio: 1.5.0 - Up to date! # Must be 1.5 or newer
```

#### Getting started

1. Clone the Dremio dbt Core sample project from the [GitHub repo](https://github.com/dremio-brock/DremioDBTSample/tree/master/dremioSamples).

2. In your integrated development environment (IDE), open the `relation.py` file in the Dremio adapter directory: `$HOME/Library/Python/3.9/lib/python/site-packages/dbt/adapters/dremio/relation.py`

3. Find and update lines 51 and 52 to match the following syntax:

```python
PATTERN = re.compile(r"""((?:[^."']|"[^"]*"|'[^']*')+)""")
return ".".join(PATTERN.split(identifier)[1::2])
```

The complete selection should look like this:

```python
def quoted_by_component(self, identifier, componentName):
    if componentName == ComponentName.Schema:
        PATTERN = re.compile(r"""((?:[^."']|"[^"]*"|'[^']*')+)""")
        return ".".join(PATTERN.split(identifier)[1::2])
    else:
        return self.quoted(identifier)
```

You need to update this pattern because the plugin doesn’t support schema names in Dremio containing dots and spaces.

#### Build your pipeline

1. Create a `profiles.yml` file at the `$HOME/.dbt/profiles.yml` path and add the following configs:

```yaml
dremioSamples:
  outputs:
    cloud_dev:
      dremio_space: dev
      dremio_space_folder: no_schema
      object_storage_path: dev
      object_storage_source: $scratch
      pat:
      cloud_host: api.dremio.cloud
      cloud_project_id:
      threads: 1
      type: dremio
      use_ssl: true
      user:
  target: dev
```

2.
Execute the transformation pipeline:

```shell
$ dbt run -t cloud_dev
```

If the above configurations have been implemented, the output will look something like this:

```shell
17:24:16  Running with dbt=1.5.0
17:24:17  Found 5 models, 0 tests, 0 snapshots, 0 analyses, 348 macros, 0 operations, 0 seed files, 2 sources, 0 exposures, 0 metrics, 0 groups
17:24:17
17:24:29  Concurrency: 1 threads (target='cloud_dev')
17:24:29
17:24:29  1 of 5 START sql view model Preparation.trips .................................. [RUN]
17:24:31  1 of 5 OK created sql view model Preparation.trips ............................. [OK in 2.61s]
17:24:31  2 of 5 START sql view model Preparation.weather ................................ [RUN]
17:24:34  2 of 5 OK created sql view model Preparation.weather ........................... [OK in 2.15s]
17:24:34  3 of 5 START sql view model Business.Transportation.nyc_trips .................. [RUN]
17:24:36  3 of 5 OK created sql view model Business.Transportation.nyc_trips ............. [OK in 2.18s]
17:24:36  4 of 5 START sql view model Business.Weather.nyc_weather ....................... [RUN]
17:24:38  4 of 5 OK created sql view model Business.Weather.nyc_weather .................. [OK in 2.09s]
17:24:38  5 of 5 START sql view model Application.nyc_trips_with_weather ................. [RUN]
17:24:41  5 of 5 OK created sql view model Application.nyc_trips_with_weather ............ [OK in 2.74s]
17:24:41
17:24:41  Finished running 5 view models in 0 hours 0 minutes and 24.03 seconds (24.03s).
17:24:41
17:24:41  Completed successfully
17:24:41
17:24:41  Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5
```

Now that you have a running environment and a completed job, you can view the data in Dremio and expand your code.
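The regex you added to `relation.py` in the Getting started steps can be exercised on its own to see why it matters: it splits an identifier on unquoted dots only, so quoted Dremio folder names containing dots or spaces survive as a single component. A small standalone sketch:

```python
# Demonstrates the quoting pattern from relation.py: re.split with a
# capturing group returns the separators (unquoted dots) interleaved with
# the captured components, and [1::2] keeps only the components.
import re

PATTERN = re.compile(r"""((?:[^."']|"[^"]*"|'[^']*')+)""")


def components(identifier: str) -> list:
    """Split a dotted identifier, treating quoted chunks as single parts."""
    return PATTERN.split(identifier)[1::2]
```

For example, `components('"dev space"."my.folder".trips')` keeps the quoted folder names whole, and `".".join(...)` reassembles the original identifier, which is exactly what `quoted_by_component` returns.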
This is a snapshot of the project structure in an IDE:

[![Cloned repo in an IDE](/img/guides/dremio/dremio-cloned-repo.png?v=2 "Cloned repo in an IDE")](#)

#### About the schema.yml

The `schema.yml` file defines the Dremio sources and models to be used and what data models are in scope. In this guide's sample project, there are two data sources:

1. The `NYC-weather.csv` stored in the **Samples** database
2. The `sample_data` from the **Samples** database

The models correspond to the weather and trip data respectively and will be joined for analysis. The sources can be found by navigating to the **Object Storage** section of the Dremio Cloud UI.

[![NYC-weather.csv location in Dremio Cloud](/img/guides/dremio/dremio-nyc-weather.png?v=2 "NYC-weather.csv location in Dremio Cloud")](#)

#### About the models

**Preparation**: `preparation_trips.sql` and `preparation_weather.sql` build views on top of the trips and weather data.

**Business**: `business_transportation_nyc_trips.sql` applies some level of transformation on the `preparation_trips.sql` view. `Business_weather_nyc.sql` has no transformation on the `preparation_weather.sql` view.

**Application**: `application_nyc_trips_with_weather.sql` joins the outputs from the Business models. This is what your business users will consume.

#### The Job output

When you run the dbt job, it creates a **dev** space folder that contains all the data assets created. This is what you will see in the Dremio Cloud UI. Spaces in Dremio are a way to organize data assets that map to business units or data products.
[![Dremio Cloud dev space](/img/guides/dremio/dremio-dev-space.png?v=2 "Dremio Cloud dev space")](#)

Open the **Application** folder and you will see the output of the simple transformation we did using dbt.

[![Application folder transformation output](/img/guides/dremio/dremio-dev-application.png?v=2 "Application folder transformation output")](#)

#### Query the data

Now that you have run the job and completed the transformation, it's time to query your data. Click the `nyc_trips_with_weather` view. That takes you to the SQL Runner page. Click **Show SQL Pane** in the upper right corner of the page.

Run the following query:

```sql
SELECT vendor_id, AVG(tip_amount)
FROM dev.application."nyc_trips_with_weather"
GROUP BY vendor_id
```

[![Sample output from SQL query](/img/guides/dremio/dremio-test-results.png?v=2 "Sample output from SQL query")](#)

This completes the integration setup, and the data is ready for business consumption.

---

### Build, test, document, and promote adapters

[Back to guides](https://docs.getdbt.com/guides.md)

Adapter creation · Advanced

#### Introduction

Adapters are an essential component of dbt. At their most basic level, they are how dbt connects with the various supported data platforms. At a higher level, dbt Core adapters strive to give analytics engineers more transferable skills as well as standardize how analytics projects are structured.
Gone are the days when you had to learn a new language or flavor of SQL when moving to a new job with a different data platform. That is the power of adapters in dbt Core.

Navigating and developing around the nuances of different databases can be daunting, but you are not alone. Visit the [#adapter-ecosystem](https://getdbt.slack.com/archives/C030A0UF5LM) Slack channel for additional help beyond the documentation.

##### All databases are not the same

There's a tremendous amount of work that goes into creating a database. Here is a high-level list of typical database layers (from the outermost layer moving inwards):

* SQL API
* Client Library / Driver
* Server Connection Manager
* Query parser
* Query optimizer
* Runtime
* Storage Access Layer
* Storage

There's a lot more there than just SQL as a language. Databases (and data warehouses) are so popular because you can abstract away a great deal of complexity from your brain to the database itself. This enables you to focus more on the data.

dbt allows for further abstraction and standardization of the outermost layers of a database (SQL API, client library, connection manager) into a framework that both:

* Opens database technology to less technical users (a large swath of a DBA's role has been automated, similar to how the vast majority of folks with websites today no longer have to be "[webmasters](https://en.wikipedia.org/wiki/Webmaster)").
* Enables more meaningful conversations about how data warehousing should be done.

This is where dbt adapters become critical.

##### What needs to be adapted?

dbt adapters are responsible for *adapting* dbt's standard functionality to a particular database.
Our prototypical database and adapter are PostgreSQL and dbt-postgres, and most of our adapters are somewhat based on the functionality described in dbt-postgres. Connecting dbt to a new database will require a new adapter to be built or an existing adapter to be extended.

The outermost layers of a database map roughly to the areas in which the dbt adapter framework encapsulates inter-database differences.

##### SQL API

Even amongst ANSI-compliant databases, there are differences in the SQL grammar. Here are some categories and examples of SQL statements that can be constructed differently:

| Category | Area of differences | Examples |
| --- | --- | --- |
| Statement syntax | The use of `IF EXISTS` | `IF EXISTS, DROP TABLE` vs. `DROP TABLE IF EXISTS` |
| Workflow definition & semantics | Incremental updates | `MERGE` vs. `DELETE; INSERT` |
| Relation and column attributes/configuration | Database-specific materialization configs | `DIST = ROUND_ROBIN` (Synapse) vs. `DIST = EVEN` (Redshift) |
| Permissioning | Grant statements that can only take one grantee at a time vs. those that accept lists of grantees | `grant SELECT on table dinner.corn to corn_kid, everyone` vs. `grant SELECT on table dinner.corn to corn_kid; grant SELECT on table dinner.corn to everyone` |

##### Python Client Library & Connection Manager

The other big category of inter-database differences comes with how the client connects to the database and executes queries against the connection. To integrate with dbt, a data platform must have a pre-existing Python client library or support ODBC, via a generic Python library like `pyodbc`.

| Category | Area of differences | Examples |
| --- | --- | --- |
| Credentials & authentication | Authentication | Username & password; MFA with `boto3` or Okta token |
| Connection opening/closing | Create a new connection to the database | `psycopg2.connect(connection_string)`; `google.cloud.bigquery.Client(...)` |
| Inserting local data | Load seed `.csv` files into Python memory | `google.cloud.bigquery.Client.load_table_from_file(...)` (BigQuery); `INSERT ... INTO VALUES ...` prepared statement (most other databases) |
##### How dbt encapsulates and abstracts these differences

Differences between databases are encoded into discrete areas:

| Components | Code Path | Function |
| --- | --- | --- |
| Python classes | `adapters/<adapter_name>/` | Configuration (refer to [Python classes](#python-classes)) |
| Macros | `include/<adapter_name>/macros/adapters/` | SQL API & statement syntax (for example, how to create a schema or how to get table info) |
| Materializations | `include/<adapter_name>/macros/materializations/` | Table/view/snapshot workflow definitions |

###### Python classes

These classes implement all the methods responsible for:

* Connecting to a database and issuing queries.
* Providing dbt with database-specific configuration information.

| Class | Description |
| --- | --- |
| AdapterClass | High-level configuration type conversion and any database-specific Python methods needed |
| AdapterCredentials | Typed dictionary of possible profiles and associated methods |
| AdapterConnectionManager | All the methods responsible for connecting to a database and issuing queries |
| AdapterRelation | How relation names should be rendered, printed, and quoted. Do relation names use all three parts? `catalog.model_name` (two-part name) or `database.schema.model_name` (three-part name) |
| AdapterColumn | How column names should be rendered, and database-specific properties |

###### Macros

A set of *macros* responsible for generating SQL that is compliant with the target database.

###### Materializations

A set of *materializations* and their corresponding helper macros defined in dbt using Jinja and SQL. They codify for dbt how model files should be persisted into the database.

##### Adapter Architecture

Below is a flow diagram illustrating how a `dbt run` command works with the `dbt-postgres` adapter. It shows the relationship between `dbt-core`, `dbt-adapters`, and individual adapters.

![Diagram of adapter architecture](/img/adapter-guide/adapter-architecture-diagram.png?v=2 "Diagram of adapter architecture")

#### Prerequisites

It is very important that you have the right skills and understand the level of difficulty involved in making an adapter for your data platform. The more of the questions below you can answer with "yes", the easier your adapter development (and user) experience will be. See the [New Adapter Information Sheet wiki](https://github.com/dbt-labs/dbt-core/wiki/New-Adapter-Information-Sheet) for even more specific questions.

##### Training

* The developer (and any product managers) ideally will have substantial experience as an end-user of dbt. If not, it is highly advised that you at least take the [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) and [Advanced Materializations](https://learn.getdbt.com/courses/advanced-materializations) courses.

##### Database

* Does the database complete transactions fast enough for interactive development?
* Can you execute SQL against the data platform?
* Is there a concept of schemas?
* Does the data platform support ANSI SQL, or at least a subset?

##### Driver / Connection Library

* Is there a Python-based driver for interacting with the database that is DB API 2.0 compliant (e.g., `psycopg2` for Postgres, `pyodbc` for SQL Server)?
* Does it support prepared statements, multiple statements, or single sign-on token authorization to the data platform?

##### Open source software

* Does your organization have an established process for publishing open source software?

It is easiest to build an adapter for dbt when the data warehouse/platform in question has:

* a conventional ANSI-SQL interface (or as close to it as possible),
* a mature connection library/SDK that uses ODBC or the Python DB API 2.0, and
* a way to enable developers to iterate rapidly with both quick reads and writes.

##### Maintaining your new adapter

When your adapter becomes more popular, and people start using it, you may quickly become the maintainer of an increasingly popular open source project. With this new role comes some unexpected responsibilities that not only include code maintenance, but also working with a community of users and contributors.

To help people understand what to expect of your project, you should communicate your intentions early and often in your adapter documentation or README. Answer questions like: Is this experimental work that people should use at their own risk? Or is this production-grade code that you're committed to maintaining into the future?
###### Keeping the code compatible with dbt Core

An adapter is compatible with dbt Core if it has correctly implemented the interface defined in [dbt-adapters](https://github.com/dbt-labs/dbt-adapters/) and is tested by [dbt-tests-adapters](https://github.com/dbt-labs/dbt-adapters/tree/main/dbt-tests-adapter). Prior to dbt Core version 1.8, this interface was contained in `dbt-core`.

New minor version releases of `dbt-adapters` may include changes to the Python interface for adapter plugins, as well as new or updated test cases. The maintainers of `dbt-adapters` will clearly communicate these changes in documentation and release notes, and they will aim for backwards compatibility whenever possible. Patch releases of `dbt-adapters` will *not* include breaking changes or new features to adapter-facing code.

###### Versioning and releasing your adapter

dbt Labs strongly recommends that you adopt the following approach when versioning and releasing your plugin:

* Declare major version compatibility with `dbt-adapters` and only set a boundary on the minor version if there is some known reason.
* Do not import or rely on code from `dbt-core`.
* Aim to release a new minor version of your plugin as you add substantial new features. Typically, this will be triggered by adding support for new features released in `dbt-adapters` or by changes to the data platform itself.
* While your plugin is new and you're iterating on features, aim to offer backwards compatibility and deprecation notices for at least one minor version. As your plugin matures, aim to leave backwards compatibility and deprecation notices in place until the next major version (dbt Core v2).
* Release patch versions of your plugin whenever needed. These patch releases should only contain fixes.
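To make the first recommendation concrete: "major version compatibility" amounts to a pin like `dbt-adapters>=1,<2` in your package metadata. A minimal stdlib sketch of the predicate such a pin expresses (the function name is hypothetical, for illustration only):

```python
# Illustration only: what a major-version pin like "dbt-adapters>=1,<2" accepts.
def in_major(version: str, pinned_major: int) -> bool:
    """Return True if `version` falls inside the pinned major version."""
    return int(version.split(".")[0]) == pinned_major

print(in_major("1.5.0", 1))   # True: new minor releases stay in range
print(in_major("2.0.0", 1))   # False: the next major is out of range
```

Leaving the minor version open means your users automatically pick up compatible `dbt-adapters` improvements without waiting for a new release of your plugin.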
note

Prior to dbt Core version 1.8, we recommended that the minor version of your plugin should match the minor version in `dbt-core` (for example, 1.1.x).

#### Build a new adapter

This step will walk you through creating the necessary adapter classes and macros, and provide some resources to help you validate that your new adapter is working correctly. Make sure you've familiarized yourself with the previous steps in this guide.

Once the adapter is passing most of the functional tests in the previous "Testing a new adapter" step, please let the community know that it is available to use by adding the adapter to the ["Supported Data Platforms"](https://docs.getdbt.com/docs/supported-data-platforms.md) page, following the steps given in "Documenting your adapter".

For any questions you may have, don't hesitate to ask in the [#adapter-ecosystem](https://getdbt.slack.com/archives/C030A0UF5LM) Slack channel. The community is very helpful and has likely run into similar issues.

##### Scaffolding a new adapter

To create a new adapter plugin from scratch, you can use the [dbt-database-adapter-scaffold](https://github.com/dbt-labs/dbt-database-adapter-scaffold) to trigger an interactive session which will generate a scaffolding for you to build upon. Example usage:

```text
$ cookiecutter gh:dbt-labs/dbt-database-adapter-scaffold
```

The generated boilerplate starting project will include a basic adapter plugin file structure, examples of macros, high-level method descriptions, etc.

One of the most important choices you will make during the cookiecutter generation will revolve around the field for `is_sql_adapter`, which is a boolean used to correctly apply imports for either a `SQLAdapter` or `BaseAdapter`.
Knowing which you will need requires a deeper knowledge of your selected database, but a few good guides for the choice are:

* Does your database have a complete SQL API? Can it perform tasks using SQL such as creating schemas, dropping schemas, and querying an `information_schema` for metadata calls? If so, it is more likely to be a SQLAdapter, where you set `is_sql_adapter` to `True`.
* Most adapters do fall under SQL adapters, which is why `True` is the default value.
* It is very possible to build out a fully functional `BaseAdapter`. This will require a little more groundwork, as it doesn't come with some of the prebuilt methods the `SQLAdapter` class provides. See `dbt-bigquery` as a good guide.

##### Implementation details

Regardless of whether you use the cookiecutter template or manually create the plugin, this section will go over each method that is required to be implemented. The following table provides a high-level overview of the classes, methods, and macros you may have to define for your data platform.
| File | Component | Purpose |
| --- | --- | --- |
| `./setup.py` | `setup()` function | adapter meta-data (package name, version, author, homepage, etc) |
| `myadapter/dbt/adapters/myadapter/__init__.py` | `AdapterPlugin` | bundle all the information below into a dbt plugin |
| `myadapter/dbt/adapters/myadapter/connections.py` | `MyAdapterCredentials` class | parameters to connect to and configure the database, via the chosen Python driver |
| `myadapter/dbt/adapters/myadapter/connections.py` | `MyAdapterConnectionManager` class | telling dbt how to interact with the database with respect to opening/closing connections, executing queries, and fetching data. Effectively a wrapper around the DB API or driver. |
| `myadapter/dbt/include/myadapter/` | a dbt project of macro "overrides" in the format `myadapter__` | any differences in SQL syntax for regular db operations will be modified here from the `global_project` (e.g. "Create Table As Select", "Get all relations in the current schema", etc) |
| `myadapter/dbt/adapters/myadapter/impl.py` | `MyAdapterConfig` | database- and relation-level configs |
| `myadapter/dbt/adapters/myadapter/impl.py` | `MyAdapterAdapter` | for changing *how* dbt performs operations, via macros and other needed Python functionality |
| `myadapter/dbt/adapters/myadapter/column.py` | `MyAdapterColumn` | for defining database-specific column logic, such as datatype mappings |

##### Editing `setup.py`

Edit the file at `myadapter/setup.py` and fill in the missing information.
You can skip this step if you passed the arguments for `email`, `url`, `author`, and `dependencies` to the cookiecutter template script. If you plan on having nested macro folder structures, you may need to add entries to `package_data` so your macro source files get installed.

##### Editing the connection manager

Edit the connection manager at `myadapter/dbt/adapters/myadapter/connections.py`. This file is defined in the sections below.

###### The Credentials class

The credentials class defines all of the database-specific credentials (e.g. `username` and `password`) that users will need in the [connection profile](https://docs.getdbt.com/docs/supported-data-platforms.md) for your new adapter. Each credentials contract should subclass `dbt.adapters.base.Credentials` and be implemented as a Python dataclass.

Note that the base class includes required `database` and `schema` fields, as dbt uses those values internally.

For example, if your adapter requires a host, integer port, username string, and password string, but host is the only required field, you'd add definitions for those new properties to the class as types, like this:

connections.py

```python
from dataclasses import dataclass
from typing import Optional

from dbt.adapters.base import Credentials


@dataclass
class MyAdapterCredentials(Credentials):
    host: str
    port: int = 1337
    username: Optional[str] = None
    password: Optional[str] = None

    @property
    def type(self):
        return 'myadapter'

    @property
    def unique_field(self):
        """
        Hashed and included in anonymous telemetry to track adapter adoption.
        Pick a field that can uniquely identify one team/organization building
        with this adapter
        """
        return self.host

    def _connection_keys(self):
        """
        List of keys to display in the `dbt debug` output.
        """
        return ('host', 'port', 'database', 'username')
```

There are a few things you can do to make it easier for users when connecting to your database:

* Be sure to implement the Credentials' `_connection_keys` method shown above. This method will return the keys that should be displayed in the output of the `dbt debug` command. As a general rule, it's good to return all the arguments used in connecting to the actual database except the password (even optional arguments).
* Create a `profile_template.yml` to enable configuration prompts for a brand-new user setting up a connection profile via the [`dbt init` command](https://docs.getdbt.com/reference/commands/init.md). You will find more details in the following steps.
* You may also want to define an `ALIASES` mapping on your Credentials class to include any config names you want users to be able to use in place of `database` or `schema`. For example, if everyone using the MyAdapter database calls their databases "collections", you might do:

connections.py

```python
@dataclass
class MyAdapterCredentials(Credentials):
    host: str
    port: int = 1337
    username: Optional[str] = None
    password: Optional[str] = None

    ALIASES = {
        'collection': 'database',
    }
```

Then users can use `collection` OR `database` in their `profiles.yml`, `dbt_project.yml`, or `config()` calls to set the database.

###### `ConnectionManager` class methods

Once credentials are configured, you'll need to implement some connection-oriented methods. They are enumerated in the `SQLConnectionManager` docstring, but an overview is also provided here.
**Methods to implement:**

* `open`
* `get_response`
* `cancel`
* `exception_handler`
* `standardize_grants_dict`

###### `open(cls, connection)`

`open()` is a classmethod that gets a connection object (which could be in any state, but will have a `Credentials` object with the attributes you defined above) and moves it to the 'open' state.

Generally this means doing the following:

* if the connection is open already, log and return it.
  * If a database needed changes to the underlying connection before re-use, that would happen here
* create a connection handle using the underlying database library using the credentials
  * on success:
    * set connection.state to `'open'`
    * set connection.handle to the handle object
      * this is what must have a `cursor()` method that returns a cursor!
  * on error:
    * set connection.state to `'fail'`
    * set connection.handle to `None`
    * raise a `dbt.exceptions.FailedToConnectException` with the error and any other relevant information

For example:

connections.py

```python
@classmethod
def open(cls, connection):
    if connection.state == 'open':
        logger.debug('Connection is already open, skipping open.')
        return connection

    credentials = connection.credentials

    try:
        handle = myadapter_library.connect(
            host=credentials.host,
            port=credentials.port,
            username=credentials.username,
            password=credentials.password,
            catalog=credentials.database
        )
        connection.state = 'open'
        connection.handle = handle
    except myadapter_library.Error as exc:
        # on error, mark the connection as failed and surface the problem to dbt
        connection.state = 'fail'
        connection.handle = None
        raise dbt.exceptions.FailedToConnectException(str(exc))

    return connection
```

###### `get_response(cls, cursor)`

`get_response` is a classmethod that gets a cursor object and returns adapter-specific information about the last executed command. The return value should be an `AdapterResponse` object that includes items such as `code`, `rows_affected`, `bytes_processed`, and a summary `_message` for logging to stdout.
connections.py

```python
@classmethod
def get_response(cls, cursor) -> AdapterResponse:
    code = cursor.sqlstate or "OK"
    rows = cursor.rowcount
    status_message = f"{code} {rows}"
    return AdapterResponse(
        _message=status_message,
        code=code,
        rows_affected=rows
    )
```

###### `cancel(self, connection)`

`cancel` is an instance method that gets a connection object and attempts to cancel any ongoing queries, which is database dependent. Some databases don't support cancellation; in that case the method can simply `pass`, and the adapter class should implement an `is_cancelable` that returns `False` (on Ctrl+C, connections may remain running). This method must be implemented carefully, as the affected connection will likely be in use in a different thread.

connections.py

```python
def cancel(self, connection):
    tid = connection.handle.transaction_id()
    sql = 'select cancel_transaction({})'.format(tid)
    logger.debug("Cancelling query '{}' ({})".format(connection.name, tid))
    _, cursor = self.add_query(sql, 'master')
    res = cursor.fetchone()
    logger.debug("Cancelled query '{}': {}".format(connection.name, res))
```

###### `exception_handler(self, sql, connection_name='master')`

`exception_handler` is an instance method that returns a context manager that will handle exceptions raised by running queries, catch them, log appropriately, and then raise exceptions dbt knows how to handle.
If you use the (highly recommended) `@contextmanager` decorator, you only have to wrap a `yield` inside a `try` block, like so:

connections.py

```python
@contextmanager
def exception_handler(self, sql: str, connection_name='master'):
    try:
        yield
    except myadapter_library.DatabaseError as exc:
        self.release(connection_name)
        logger.debug('myadapter error: {}'.format(str(exc)))
        raise dbt.exceptions.DatabaseException(str(exc))
    except Exception as exc:
        logger.debug("Error running SQL: {}".format(sql))
        logger.debug("Rolling back transaction.")
        self.release(connection_name)
        raise dbt.exceptions.RuntimeException(str(exc))
```

###### `standardize_grants_dict(self, grants_table: agate.Table) -> dict`

`standardize_grants_dict` is a method that returns the dbt-standardized grants dictionary that matches how users configure grants in dbt. The input is the result of a `SHOW GRANTS ON {{model}}` call loaded into an agate table.

If any massaging of the agate table containing the results of `SHOW GRANTS ON {{model}}` can't easily be accomplished in SQL, it can be done here. For example, the SQL to show grants *should* filter OUT any grants TO the current user/role (e.g. OWNERSHIP). If that's not possible in SQL, it can be done in this method instead.
impl.py

```python
@available
def standardize_grants_dict(self, grants_table: agate.Table) -> dict:
    """
    :param grants_table: An agate table containing the query result of
        the SQL returned by get_show_grant_sql
    :return: A standardized dictionary matching the `grants` config
    :rtype: dict
    """
    grants_dict: Dict[str, List[str]] = {}
    for row in grants_table:
        grantee = row["grantee"]
        privilege = row["privilege_type"]
        if privilege in grants_dict.keys():
            grants_dict[privilege].append(grantee)
        else:
            grants_dict.update({privilege: [grantee]})
    return grants_dict
```

##### Editing the adapter implementation

Edit the adapter implementation at `myadapter/dbt/adapters/myadapter/impl.py`.

Very little is required to implement the adapter itself. On some adapters, you will not need to override anything. On others, you'll likely need to override some of the `convert_*` classmethods, or override the `is_cancelable` classmethod to return `False`.

###### `datenow()`

This classmethod provides the adapter's canonical date function. This is not used, but is required anyway on all adapters.

impl.py

```python
@classmethod
def date_function(cls):
    return 'datenow()'
```

##### Editing SQL logic

dbt implements specific SQL operations using Jinja macros. While reasonable defaults are provided for many such operations (like `create_schema`, `drop_schema`, `create_table`, etc), you may need to override one or more of these macros when building a new adapter.

###### Required macros

The following macros must be implemented, but you can override their behavior for your adapter using the "dispatch" pattern described below. Macros marked (required) do not have a valid default implementation, and are required for dbt to operate.
* `alter_column_type` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/columns.sql#L37-L55))
* `check_schema_exists` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/metadata.sql#L43-L55))
* `create_schema` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/schema.sql#L1-L9))
* `drop_relation` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/relation.sql#L34-L42))
* `drop_schema` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/schema.sql#L12-L20))
* `get_columns_in_relation` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/columns.sql#L1-L8)) (required)
* `list_relations_without_caching` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/metadata.sql#L58-L65)) (required)
* `list_schemas` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/metadata.sql#L29-L40))
* `rename_relation` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/relation.sql#L56-L65))
* `truncate_relation` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/relation.sql#L45-L53))
* `current_timestamp` ([source](https://github.com/dbt-labs/dbt-core/blob/f988f76fccc1878aaf8d8631c05be3e9104b3b9a/core/dbt/include/global_project/macros/adapters/freshness.sql#L1-L8)) (required)
* `copy_grants`

###### Adapter dispatch

Most modern databases support a majority of the standard SQL spec. However, some databases *do not* support critical aspects of the SQL spec, or they provide their own nonstandard mechanisms for implementing the same functionality. To account for these variations in SQL support, dbt provides a mechanism called [multiple dispatch](https://en.wikipedia.org/wiki/Multiple_dispatch) for macros. With this feature, macros can be overridden for specific adapters. This makes it possible to implement high-level methods (like "create table") in a database-specific way.

adapters.sql

```jinja2
{# dbt will call this macro by name, providing any arguments #}
{% macro create_table_as(temporary, relation, sql) -%}

  {# dbt will dispatch the macro call to the relevant macro #}
  {{ return(
      adapter.dispatch('create_table_as')(temporary, relation, sql)
  ) }}
{%- endmacro %}

{# If no macro matches the specified adapter, "default" will be used #}
{% macro default__create_table_as(temporary, relation, sql) -%}
   ...
{%- endmacro %}

{# Example which defines special logic for Redshift #}
{% macro redshift__create_table_as(temporary, relation, sql) -%}
   ...
{%- endmacro %}

{# Example which defines special logic for BigQuery #}
{% macro bigquery__create_table_as(temporary, relation, sql) -%}
   ...
{%- endmacro %}
```

The `adapter.dispatch()` macro takes a second argument, `packages`, which represents a set of "search namespaces" in which to find potential implementations of a dispatched macro. This allows users of community-supported adapters to extend or "shim" dispatched macros from common packages, such as `dbt-utils`, with adapter-specific versions in their own project or other installed packages.
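To make the lookup order concrete, here is a small Python sketch that emulates how `adapter.dispatch` resolves a macro name. This is a simplification for illustration only; the real resolution happens inside dbt and also searches across packages and namespaces:

```python
# Simplified emulation of macro dispatch: prefer "<adapter>__<name>",
# then fall back to "default__<name>".
MACROS = {
    "default__create_table_as": "create table {relation} as ({sql})",
    "redshift__create_table_as": "create table {relation} diststyle even as ({sql})",
}

def dispatch(macro_name: str, adapter_type: str) -> str:
    for candidate in (f"{adapter_type}__{macro_name}", f"default__{macro_name}"):
        if candidate in MACROS:
            return MACROS[candidate]
    raise KeyError(f"no implementation of {macro_name}")

print(dispatch("create_table_as", "redshift"))  # adapter-specific version wins
print(dispatch("create_table_as", "postgres"))  # falls back to default__
```

The key design point this illustrates: an adapter only needs to define the macros whose behavior actually differs, and everything else falls through to the `default__` implementations in the global project.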
See:

* "Shim" package examples: [`spark-utils`](https://github.com/dbt-labs/spark-utils), [`tsql-utils`](https://github.com/dbt-msft/tsql-utils)
* [`adapter.dispatch` docs](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md)

###### Overriding adapter methods

While much of dbt's adapter-specific functionality can be modified in adapter macros, it can also make sense to override adapter methods directly. In this example, assume that a database does not support a `cascade` parameter to `drop schema`. Instead, we can implement an approximation where we drop each relation and then drop the schema.

impl.py

```python
def drop_schema(self, relation: BaseRelation):
    relations = self.list_relations(
        database=relation.database,
        schema=relation.schema
    )
    # drop each relation first, then the schema itself; use a distinct loop
    # variable so the schema-level relation isn't overwritten
    for rel in relations:
        self.drop_relation(rel)
    super().drop_schema(relation)
```

###### Grants Macros

See [this GitHub discussion](https://github.com/dbt-labs/dbt-core/discussions/5468) for information on the macros required for `GRANT` statements.

##### Behavior change flags

Starting in `dbt-adapters==1.5.0` and `dbt-core==1.8.7`, adapter maintainers can implement their own behavior change flags. Refer to [Behavior changes](https://docs.getdbt.com/reference/global-configs/behavior-changes.md) for more information.

Behavior flags are not intended to be long-living feature flags. They should be implemented with the expectation that the behavior will become the default within an expected period of time.

To implement a behavior change flag, you must provide a name for the flag, a default setting (`True` / `False`), an optional source, and a description and/or a link to the flag's documentation on docs.getdbt.com. We recommend having a description and documentation link whenever possible.
The description and/or docs should provide end users context for why the flag exists, why they may see a warning, and why they may want to utilize the behavior flag.

Behavior change flags can be implemented by overwriting `_behavior_flags()` on the adapter in `impl.py`:

impl.py

```python
class ABCAdapter(BaseAdapter):
    ...

    @property
    def _behavior_flags(self) -> List[BehaviorFlag]:
        return [
            {
                "name": "enable_new_functionality_requiring_higher_permissions",
                "default": False,
                "source": "dbt-abc",
                "description": (
                    "The dbt-abc adapter is implementing a new method for sourcing metadata. "
                    "This is a more performant way for dbt to source metadata but requires higher permissions on the platform. "
                    "Enabling this without granting the requisite permissions will result in an error. "
                    "This feature is expected to be required by Spring 2025."
                ),
                "docs_url": "https://docs.getdbt.com/reference/global-configs/behavior-changes#abc-enable_new_functionality_requiring_higher_permissions",
            }
        ]
```

Once a behavior change flag has been implemented, it can be referenced on the adapter both in `impl.py` and in Jinja macros:

impl.py

```python
class ABCAdapter(BaseAdapter):
    ...

    def some_method(self, *args, **kwargs):
        if self.behavior.enable_new_functionality_requiring_higher_permissions:
            ...  # do the new thing
        else:
            ...  # do the old thing
```

adapters.sql

```sql
{% macro some_macro(**kwargs) %}
    {% if adapter.behavior.enable_new_functionality_requiring_higher_permissions %}
        {# do the new thing #}
    {% else %}
        {# do the old thing #}
    {% endif %}
{% endmacro %}
```

Every time the behavior flag evaluates to `False`, it warns the user, informing them that a change will occur in the future. This warning doesn't display when the flag evaluates to `True`, as the user is already in the new experience.

Recognizing that the warnings can be disruptive and are not always necessary, you can evaluate the flag without triggering the warning. Simply append `.no_warn` to the end of the flag.
impl.py

```python
class ABCAdapter(BaseAdapter):
    ...
    def some_method(self, *args, **kwargs):
        if self.behavior.enable_new_functionality_requiring_higher_permissions.no_warn:
            ...  # do the new thing
        else:
            ...  # do the old thing
```

adapters.sql

```sql
{% macro some_macro(**kwargs) %}
    {% if adapter.behavior.enable_new_functionality_requiring_higher_permissions.no_warn %}
        {# do the new thing #}
    {% else %}
        {# do the old thing #}
    {% endif %}
{% endmacro %}
```

It's best practice to evaluate a behavior flag as few times as possible. This will make it easier to remove once the behavior change has matured. As a result, it's best to evaluate the flag once, early in the logic flow, and then take either the old or the new path. While this may create some duplication in code, using behavior flags in this way provides a safer way to implement a change, which we are already admitting is risky or even breaking in nature.

##### Other files[​](#other-files "Direct link to Other files")

###### `profile_template.yml`[​](#profile_templateyml "Direct link to profile_templateyml")

In order to enable the [`dbt init` command](https://docs.getdbt.com/reference/commands/init.md) to prompt users when setting up a new project and connection profile, you should include a **profile template**. The filepath needs to be `dbt/include/<adapter_name>/profile_template.yml`. It's possible to provide hints, default values, and conditional prompts based on connection methods that require different supporting attributes. Users will also be able to include custom versions of this file in their own projects, with fixed values specific to their organization, to support their colleagues when using your dbt adapter for the first time.
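As a condensed sketch of the file's shape, modeled on the `dbt-postgres` template (the `fixed`, `prompts`, `hint`, `default`, `type`, and `hide_input` keys follow that example; verify them against a real template before relying on them):

```yaml
# dbt/include/<adapter_name>/profile_template.yml (condensed sketch)
fixed:
  type: postgres          # values written into every generated profile
prompts:
  host:
    hint: "hostname of your instance"
  port:
    default: 5432
    type: "int"
  user:
    hint: "dev username"
  pass:
    hint: "dev password"
    hide_input: true      # don't echo the value as the user types
  dbname:
    hint: "default database"
  schema:
    hint: "dev schema"
  threads:
    hint: "1 or more"
    type: "int"
    default: 1
```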
See examples:

* [dbt-postgres](https://github.com/dbt-labs/dbt-postgres/blob/main/dbt/include/postgres/profile_template.yml)
* [dbt-redshift](https://github.com/dbt-labs/dbt-redshift/blob/main/dbt/include/redshift/profile_template.yml)
* [dbt-snowflake](https://github.com/dbt-labs/dbt-snowflake/blob/main/dbt/include/snowflake/profile_template.yml)
* [dbt-bigquery](https://github.com/dbt-labs/dbt-bigquery/blob/main/dbt/include/bigquery/profile_template.yml)

###### `__version__.py`[​](#__version__py "Direct link to __version__py")

To ensure that `dbt --version` reports the latest dbt Core version the adapter supports, be sure to include a `__version__.py` file. The filepath will be `dbt/adapters/<adapter_name>/__version__.py`. We recommend starting with the latest dbt Core version; as the adapter is made compatible with later versions, this file will need to be updated. For a sample file, check out this [example](https://github.com/dbt-labs/dbt-snowflake/blob/main/dbt/adapters/snowflake/__version__.py).

Note that both of these files are included in the bootstrapped output of the `dbt-database-adapter-scaffold`, so when using the scaffolding they will be generated for you.

#### Test your adapter[​](#test-your-adapter "Direct link to Test your adapter")

This document has two sections:

1. Refer to "About the testing framework" for a description of the standard framework that we maintain for using pytest together with dbt. It includes an example that shows the anatomy of a simple test case.
2. Refer to "Testing your adapter" for a step-by-step guide for using our out-of-the-box suite of "basic" tests, which will validate that your adapter meets a baseline of dbt functionality.
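The file itself is a one-liner. A hypothetical example (the version string here is illustrative, not a recommendation):

```python
# dbt/adapters/<adapter_name>/__version__.py
# Single source of truth for the adapter's version string.
version = "1.8.0"  # illustrative; use the dbt Core version your adapter targets
```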
##### Testing prerequisites[​](#testing-prerequisites "Direct link to Testing prerequisites")

* Your adapter must be compatible with dbt Core **v1.1** or newer
* You should be familiar with **pytest**

##### About the testing framework[​](#about-the-testing-framework "Direct link to About the testing framework")

[`dbt-tests-adapter`](https://github.com/dbt-labs/dbt-adapters/tree/main/dbt-tests-adapter) offers a standard framework for running prebuilt functional tests, and for defining your own tests. The core testing framework is built using `pytest`, a mature and standard library for testing Python projects.

It includes basic utilities for setting up pytest + dbt. These are used by all "prebuilt" functional tests, and make it possible to quickly write your own tests. Those utilities allow you to do three basic things:

1. **Quickly set up a dbt "project."** Define project resources via methods such as `models()` and `seeds()`. Use `project_config_update()` to pass configurations into `dbt_project.yml`.
2. **Define a sequence of dbt commands.** The most important utility is `run_dbt()`, which returns the [results](https://docs.getdbt.com/reference/dbt-classes.md#result-objects) of each dbt command. It takes a list of CLI specifiers (subcommand + flags), as well as an optional second argument, `expect_pass=False`, for cases where you expect the command to fail.
3. **Validate the results of those dbt commands.** For example, `check_relations_equal()` asserts that two database objects have the same structure and content. You can also write your own `assert` statements, by inspecting the results of a dbt command, or querying arbitrary database objects with `project.run_sql()`.

You can see the full suite of utilities, with arguments and annotations, in [`util.py`](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/tests/util.py). You'll also see them crop up across a number of test cases.
While all utilities are intended to be reusable, you won't need all of them for every test. In the example below, we'll show a simple test case that uses only a few utilities. ###### Example: a simple test case[​](#example-a-simple-test-case "Direct link to Example: a simple test case") This example will show you the anatomy of a test case using dbt + pytest. We will create reusable components, combine them to form a dbt "project", and define a sequence of dbt commands. Then, we'll use Python `assert` statements to ensure those commands succeed (or fail) as we expect. In ["Getting started running basic tests,"](#getting-started-running-basic-tests) we'll offer step-by-step instructions for installing and configuring `pytest`, so that you can run it on your own machine. For now, it's more important to see how the pieces of a test case fit together. This example includes a seed, a model, and two tests—one of which will fail. 1. Define Python strings that will represent the file contents in your dbt project. Defining these in a separate file enables you to reuse the same components across different test cases. The pytest name for this type of reusable component is "fixture." tests/functional/example/fixtures.py ```python # seeds/my_seed.csv my_seed_csv = """ id,name,some_date 1,Easton,1981-05-20T06:46:51 2,Lillian,1978-09-03T18:10:33 3,Jeremiah,1982-03-11T03:59:51 4,Nolan,1976-05-06T20:21:35 """.lstrip() # models/my_model.sql my_model_sql = """ select * from {{ ref('my_seed') }} union all select null as id, null as name, null as some_date """ # models/my_model.yml my_model_yml = """ version: 2 models: - name: my_model columns: - name: id data_tests: - unique - not_null # this test will fail """ ``` 2. Use the "fixtures" to define the project for your test case. These fixtures are always scoped to the **class**, where the class represents one test case—that is, one dbt project or scenario. 
(The same test case can be used for one or more actual tests, which we'll see in step 3.) Following the default pytest configurations, the file name must begin with `test_`, and the class name must begin with `Test`.

tests/functional/example/test\_example\_failing\_test.py

```python
import pytest
from dbt.tests.util import run_dbt

# our file contents
from tests.functional.example.fixtures import (
    my_seed_csv,
    my_model_sql,
    my_model_yml,
)

# class must begin with 'Test'
class TestExample:
    """
    Methods in this class will be of two types:
    1. Fixtures defining the dbt "project" for this test case.
       These are scoped to the class, and reused for all tests in the class.
    2. Actual tests, whose names begin with 'test_'.
       These define sequences of dbt commands and 'assert' statements.
    """

    # configuration in dbt_project.yml
    @pytest.fixture(scope="class")
    def project_config_update(self):
        return {
            "name": "example",
            "models": {"+materialized": "view"}
        }

    # everything that goes in the "seeds" directory
    @pytest.fixture(scope="class")
    def seeds(self):
        return {
            "my_seed.csv": my_seed_csv,
        }

    # everything that goes in the "models" directory
    @pytest.fixture(scope="class")
    def models(self):
        return {
            "my_model.sql": my_model_sql,
            "my_model.yml": my_model_yml,
        }

    # continues below
```

3. Now that we've set up our project, it's time to define a sequence of dbt commands and assertions. We define one or more methods in the same file, on the same class (`TestExample`), whose names begin with `test_`. These methods share the same setup (project scenario) from above, but they can be run independently by pytest—so they shouldn't depend on each other in any way.

tests/functional/example/test\_example\_failing\_test.py

```python
# continued from above

    # The actual sequence of dbt commands and assertions
    # pytest will take care of all "setup" + "teardown"
    def test_run_seed_test(self, project):
        """
        Seed, then run, then test.
        We expect one of the tests to fail.
        An alternative pattern is to use pytest "xfail" (see below)
        """
        # seed seeds
        results = run_dbt(["seed"])
        assert len(results) == 1
        # run models
        results = run_dbt(["run"])
        assert len(results) == 1
        # test tests
        results = run_dbt(["test"], expect_pass=False)  # expect failing test
        assert len(results) == 2
        # validate that the results include one pass and one failure
        result_statuses = sorted(r.status for r in results)
        assert result_statuses == ["fail", "pass"]

    @pytest.mark.xfail
    def test_build(self, project):
        """Expect a failing test"""
        # do it all
        results = run_dbt(["build"])
```

4. Our test is ready to run! The last step is to invoke `pytest` from your command line. We'll walk through the actual setup and configuration of `pytest` in the next section.

terminal

```sh
$ python3 -m pytest tests/functional/test_example.py
=========================== test session starts ============================
platform ... -- Python ..., pytest-..., pluggy-...
rootdir: ...
plugins: ...

tests/functional/test_example.py .X                                  [100%]

======================= 1 passed, 1 xpassed in 1.38s =======================
```

You can find more ways to run tests, along with a full command reference, in the [pytest usage docs](https://docs.pytest.org/how-to/usage.html). We've found the `-s` flag (or `--capture=no`) helpful to print logs from the underlying dbt invocations, and to step into an interactive debugger if you've added one. You can also use environment variables to set [global dbt configs](https://docs.getdbt.com/reference/global-configs/about-global-configs.md), such as `DBT_DEBUG` (to show debug-level logs).

##### Testing your adapter[​](#testing-this-adapter "Direct link to Testing your adapter")

Anyone who installs `dbt-core`, and wishes to define their own test cases, can use the framework presented in the first section. The framework is especially useful for testing standard dbt behavior across different databases.
To that end, we have built and made available a [package of reusable adapter test cases](https://github.com/dbt-labs/dbt-adapters/tree/main/dbt-tests-adapter), for creators and maintainers of adapter plugins. These test cases cover basic expected functionality, as well as functionality that frequently requires different implementations across databases. This package lives in the `dbt-adapters` repository and is distributed separately from the `dbt-core` Python package.

##### Categories of tests[​](#categories-of-tests "Direct link to Categories of tests")

In the course of creating and maintaining your adapter, it's likely that you will end up implementing tests that fall into three broad categories:

1. **Basic tests** that every adapter plugin is expected to pass. These are defined in `tests.adapter.basic`. Given differences across data platforms, these may require slight modification or reimplementation. Significantly overriding or disabling these tests should be done only with good reason, since each represents basic functionality expected by dbt users. For example, if your adapter does not support incremental models, you should disable the test, [by marking it with `skip` or `xfail`](https://docs.pytest.org/en/latest/how-to/skipping.html), as well as noting that limitation in any documentation, READMEs, and usage guides that accompany your adapter.
2. **Optional tests**, for second-order functionality that is common across plugins, but not required for basic use. Your plugin can opt into these test cases by inheriting existing ones, or reimplementing them with adjustments. For now, this category includes all tests located outside the `basic` subdirectory. More tests will be added as we convert older tests defined on dbt-core and mature plugins to use the standard framework.
3. **Custom tests**, for behavior that is specific to your adapter / data platform. Each data warehouse has its own specialties and idiosyncrasies.
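Disabling a basic test with a marker can be sketched as follows. This example is hypothetical: it assumes an adapter named `dbt-abc` that lacks incremental support, and it substitutes a stand-in base class so the snippet stays self-contained (in a real suite you would import `BaseIncremental` from `dbt.tests.adapter.basic.test_incremental`):

```python
import pytest

# Stand-in for dbt's prebuilt test case, so this sketch runs on its own:
#   from dbt.tests.adapter.basic.test_incremental import BaseIncremental
class BaseIncremental:
    def test_incremental(self, project):
        ...

# Hypothetical: dbt-abc does not support incremental models, so we skip
# the basic test and note the limitation in the README and docs.
@pytest.mark.skip(reason="dbt-abc does not support incremental models")
class TestIncrementalABCAdapter(BaseIncremental):
    pass
```

Using `@pytest.mark.xfail` instead would still run the test but record an expected failure, which is useful when you want to notice if the platform later gains support.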
We encourage you to use the same `pytest`-based framework, utilities, and fixtures to write your own custom tests for functionality that is unique to your adapter.

If you run into an issue with the core framework, or the basic/optional test cases—or if you've written a custom test that you believe would be relevant and useful for other adapter plugin developers—please open an issue or PR in the `dbt-core` repository on GitHub.

##### Getting started running basic tests[​](#getting-started-running-basic-tests "Direct link to Getting started running basic tests")

In this section, we'll walk through the three steps to start running our basic test cases on your adapter plugin:

1. Install dependencies
2. Set up and configure pytest
3. Define test cases

##### Install dependencies[​](#install-dependencies "Direct link to Install dependencies")

You should already have a virtual environment with `dbt-core` and your adapter plugin installed. You'll also need to install:

* [`pytest`](https://pypi.org/project/pytest/)
* [`dbt-tests-adapter`](https://pypi.org/project/dbt-tests-adapter/), the set of common test cases
* (optional) [`pytest` plugins](https://docs.pytest.org/en/7.0.x/reference/plugin_list.html); we'll use `pytest-dotenv` below

Or specify all dependencies in a requirements file like:

dev\_requirements.txt

```txt
pytest
pytest-dotenv
dbt-tests-adapter
```

```sh
python -m pip install -r dev_requirements.txt
```

##### Set up and configure pytest[​](#set-up-and-configure-pytest "Direct link to Set up and configure pytest")

First, set yourself up to run `pytest` by creating a file named `pytest.ini` at the root of your repository:

pytest.ini

```ini
[pytest]
filterwarnings =
    ignore:.*'soft_unicode' has been renamed to 'soft_str'*:DeprecationWarning
    ignore:unclosed file .*:ResourceWarning
env_files =
    test.env
# uses pytest-dotenv plugin
# this allows you to store env vars for database connection in a file named test.env
# rather than passing them in every CLI command, or
setting in `PYTEST_ADDOPTS` # be sure to add "test.env" to .gitignore as well! testpaths = tests/functional # name per convention ``` Then, create a configuration file within your tests directory. In it, you'll want to define all necessary profile configuration for connecting to your data platform in local development and continuous integration. We recommend setting these values with environment variables, since this file will be checked into version control. tests/conftest.py ```python import pytest import os # Import the standard functional fixtures as a plugin # Note: fixtures with session scope need to be local pytest_plugins = ["dbt.tests.fixtures.project"] # The profile dictionary, used to write out profiles.yml # dbt will supply a unique schema per test, so we do not specify 'schema' here @pytest.fixture(scope="class") def dbt_profile_target(): return { 'type': '', 'threads': 1, 'host': os.getenv('HOST_ENV_VAR_NAME'), 'user': os.getenv('USER_ENV_VAR_NAME'), ... } ``` ##### Define test cases[​](#define-test-cases "Direct link to Define test cases") As in the example above, each test case is defined as a class, and has its own "project" setup. To get started, you can import all basic test cases and try running them without changes. 
tests/functional/adapter/test\_basic.py ```python import pytest from dbt.tests.adapter.basic.test_base import BaseSimpleMaterializations from dbt.tests.adapter.basic.test_singular_tests import BaseSingularTests from dbt.tests.adapter.basic.test_singular_tests_ephemeral import BaseSingularTestsEphemeral from dbt.tests.adapter.basic.test_empty import BaseEmpty from dbt.tests.adapter.basic.test_ephemeral import BaseEphemeral from dbt.tests.adapter.basic.test_incremental import BaseIncremental from dbt.tests.adapter.basic.test_generic_tests import BaseGenericTests from dbt.tests.adapter.basic.test_snapshot_check_cols import BaseSnapshotCheckCols from dbt.tests.adapter.basic.test_snapshot_timestamp import BaseSnapshotTimestamp from dbt.tests.adapter.basic.test_adapter_methods import BaseAdapterMethod class TestSimpleMaterializationsMyAdapter(BaseSimpleMaterializations): pass class TestSingularTestsMyAdapter(BaseSingularTests): pass class TestSingularTestsEphemeralMyAdapter(BaseSingularTestsEphemeral): pass class TestEmptyMyAdapter(BaseEmpty): pass class TestEphemeralMyAdapter(BaseEphemeral): pass class TestIncrementalMyAdapter(BaseIncremental): pass class TestGenericTestsMyAdapter(BaseGenericTests): pass class TestSnapshotCheckColsMyAdapter(BaseSnapshotCheckCols): pass class TestSnapshotTimestampMyAdapter(BaseSnapshotTimestamp): pass class TestBaseAdapterMethod(BaseAdapterMethod): pass ``` Finally, run pytest: ```sh python3 -m pytest tests/functional ``` ##### Modifying test cases[​](#modifying-test-cases "Direct link to Modifying test cases") You may need to make slight modifications in a specific test case to get it passing on your adapter. The mechanism to do this is simple: rather than simply inheriting the "base" test with `pass`, you can redefine any of its fixtures or test methods. 
For instance, on Redshift, we need to explicitly cast a column in the fixture input seed to use data type `varchar(64)`:

tests/functional/adapter/test\_basic.py

```python
import pytest
from dbt.tests.adapter.basic.files import seeds_base_csv, seeds_added_csv, seeds_newcolumns_csv
from dbt.tests.adapter.basic.test_snapshot_check_cols import BaseSnapshotCheckCols

# set the datatype of the name column in the 'added' seed so it
# can hold the '_update' that's added
schema_seed_added_yml = """
version: 2
seeds:
  - name: added
    config:
      column_types:
        name: varchar(64)
"""

class TestSnapshotCheckColsRedshift(BaseSnapshotCheckCols):
    # Redshift defines the 'name' column such that it's not big enough
    # to hold the '_update' added in the test.
    @pytest.fixture(scope="class")
    def seeds(self):
        return {
            "base.csv": seeds_base_csv,
            "added.csv": seeds_added_csv,
            "seeds.yml": schema_seed_added_yml,
        }
```

As another example, the `dbt-bigquery` adapter asks users to "authorize" replacing a table with a view by supplying the `--full-refresh` flag. The reason: in the table materialization logic, a view by the same name must first be dropped; if the table query fails, the model will be missing. Knowing this possibility, the "base" test case offers a `require_full_refresh` switch on the `test_config` fixture class. For BigQuery, we'll switch it on:

tests/functional/adapter/test\_basic.py

```python
import pytest
from dbt.tests.adapter.basic.test_base import BaseSimpleMaterializations

class TestSimpleMaterializationsBigQuery(BaseSimpleMaterializations):
    @pytest.fixture(scope="class")
    def test_config(self):
        # effect: add '--full-refresh' flag in requisite 'dbt run' step
        return {"require_full_refresh": True}
```

It's always worth asking whether the required modifications represent gaps in perceived or expected dbt functionality. Are these simple implementation details, which any user of this database would understand? Are they limitations worth documenting?
If, on the other hand, they represent poor assumptions in the "basic" test cases, which fail to account for a common pattern in other types of databases—please open an issue or PR in the `dbt-core` repository on GitHub.

##### Running with multiple profiles[​](#running-with-multiple-profiles "Direct link to Running with multiple profiles")

Some databases support multiple connection methods, which map to genuinely different functionality behind the scenes. For instance, the `dbt-spark` adapter supports connections to Apache Spark clusters *and* Databricks runtimes, which support additional functionality out of the box, enabled by the Delta file format.

tests/conftest.py

```python
def pytest_addoption(parser):
    parser.addoption("--profile", action="store", default="apache_spark", type=str)

# Using @pytest.mark.skip_profile('apache_spark') uses the 'skip_by_profile_type'
# autouse fixture below
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "skip_profile(profile): skip test for the given profile",
    )

@pytest.fixture(scope="session")
def dbt_profile_target(request):
    profile_type = request.config.getoption("--profile")
    if profile_type == "databricks_sql_endpoint":
        target = databricks_sql_endpoint_target()
    elif profile_type == "apache_spark":
        target = apache_spark_target()
    else:
        raise ValueError(f"Invalid profile type '{profile_type}'")
    return target

def apache_spark_target():
    return {
        "type": "spark",
        "host": "localhost",
        ...
    }

def databricks_sql_endpoint_target():
    return {
        "type": "spark",
        "host": os.getenv("DBT_DATABRICKS_HOST_NAME"),
        ...
    }

@pytest.fixture(autouse=True)
def skip_by_profile_type(request):
    profile_type = request.config.getoption("--profile")
    if request.node.get_closest_marker("skip_profile"):
        for skip_profile_type in request.node.get_closest_marker("skip_profile").args:
            if skip_profile_type == profile_type:
                pytest.skip(f"skipped on '{profile_type}' profile")
```

If there are tests that *shouldn't* run for a given profile:

tests/functional/adapter/basic.py

```python
# Snapshots require access to the Delta file format, available on our Databricks connection,
# so let's skip on Apache Spark
@pytest.mark.skip_profile('apache_spark')
class TestSnapshotCheckColsSpark(BaseSnapshotCheckCols):
    @pytest.fixture(scope="class")
    def project_config_update(self):
        return {
            "seeds": {
                "+file_format": "delta",
            },
            "snapshots": {
                "+file_format": "delta",
            }
        }
```

Finally:

```sh
python3 -m pytest tests/functional --profile apache_spark
python3 -m pytest tests/functional --profile databricks_sql_endpoint
```

#### Document a new adapter[​](#document-a-new-adapter "Direct link to Document a new adapter")

If you've already built and tested your adapter, it's time to document it so the dbt community will know that it exists and how to use it.

##### Making your adapter available[​](#making-your-adapter-available "Direct link to Making your adapter available")

Many community members maintain their adapter plugins under open source licenses. If you're interested in doing this, we recommend:

* Hosting on a public git provider (for example, GitHub or GitLab)
* Publishing to [PyPI](https://pypi.org/)
* Adding to the list of ["Supported Data Platforms"](https://docs.getdbt.com/docs/supported-data-platforms.md#community-supported) (more info below)

##### General Guidelines[​](#general-guidelines "Direct link to General Guidelines")

To best inform the dbt community of the new adapter, you should contribute to dbt's open-source documentation site, which uses the [Docusaurus project](https://docusaurus.io/).
This is the site you're currently on!

##### Conventions[​](#conventions "Direct link to Conventions")

Each `.md` file you create needs a header as shown below. The document id will also need to be added to the config file: `website/sidebars.js`.

```md
---
title: "Documenting a new adapter"
id: "documenting-a-new-adapter"
---
```

##### Single Source of Truth[​](#single-source-of-truth "Direct link to Single Source of Truth")

We ask our adapter maintainers to use the [docs.getdbt.com repo](https://github.com/dbt-labs/docs.getdbt.com) (i.e. this site) as the single source of truth for documentation rather than having to maintain the same set of information in three different places. The adapter repo's `README.md` and the data platform's documentation pages should simply link to the corresponding page on this docs site. Keep reading for more information on what should and shouldn't be included on the dbt docs site.

##### Assumed Knowledge[​](#assumed-knowledge "Direct link to Assumed Knowledge")

To simplify things, assume the reader of this documentation already knows how both dbt and your data platform work. There's already great material out there for learning both dbt and the data platform. The documentation we're asking you to add should be what a user who is already proficient in both dbt and your data platform would need to know in order to use both. Effectively that boils down to two things: how to connect, and how to configure.

##### Topics and Pages to Cover[​](#topics-and-pages-to-cover "Direct link to Topics and Pages to Cover")

The following subjects need to be addressed across three pages of this docs site to have your data platform be listed on our documentation. After the corresponding pull request is merged, we ask that you link to these pages from your adapter repo's `README` as well as from your product documentation. To contribute, all you will have to do is make the changes listed in the table below.

| How To... | File to change within `/website/docs/` | Action | Info to include |
| --------- | -------------------------------------- | ------ | --------------- |
| Connect | `/docs/local/connect-data-platform/{MY-DATA-PLATFORM}-setup.md` | Create | Give all information needed to define a target in `~/.dbt/profiles.yml` and get `dbt debug` to connect to the database successfully. All possible configurations should be mentioned. |
| Configure | `reference/resource-configs/{MY-DATA-PLATFORM}-configs.md` | Create | What options and configuration specific to your data platform do users need to know? e.g. table distribution and indexing options, column\_quoting policy, which incremental strategies are supported |
| Discover and Install | `docs/supported-data-platforms.md` | Modify | Is it a vendor- or community-supported adapter? How to install the Python adapter package? Ideally with pip and a PyPI-hosted package, but can also use a `git+` link to the GitHub repo |
| Add link to sidebar | `website/sidebars.js` | Modify | Add the document id to the correct location in the sidebar menu |

For example, say I want to document my new adapter: `dbt-ders`. For the "Connect" page, I will make a new Markdown file, `ders-setup.md`, and add it to the `/website/docs/local/connect-data-platform/` directory.
##### Example PRs to add new adapter documentation[​](#example-prs-to-add-new-adapter-documentation "Direct link to Example PRs to add new adapter documentation")

Below are some recent pull requests made by partners to document their data platform's adapter:

* [TiDB](https://github.com/dbt-labs/docs.getdbt.com/pull/1309)
* [SingleStore](https://github.com/dbt-labs/docs.getdbt.com/pull/1044)
* [Firebolt](https://github.com/dbt-labs/docs.getdbt.com/pull/941)

Note — Use the following re-usable component to auto-fill the frontmatter content on your new page:

```markdown
import SetUpPages from '/snippets/_setup-pages-intro.md';
```

#### Promote a new adapter[​](#promote-a-new-adapter "Direct link to Promote a new adapter")

The most important thing here is recognizing that people are successful in the community when they join, first and foremost, to engage authentically.

What does authentic engagement look like? It’s challenging to define explicit rules. One good rule of thumb is to treat people with dignity and respect. Contributors to the community should think of contribution *as the end itself,* not a means toward other business KPIs (leads, community members, etc.). [We are a mission-driven company.](https://www.getdbt.com/dbt-labs/values/) Some ways to know if you’re authentically engaging:

* Is an engagement’s *primary* purpose sharing knowledge and resources, or building brand engagement?
* Imagine you didn’t work at the org you do — can you imagine yourself still writing this?
* Is it written in formal / marketing language, or does it sound like you, the human?
##### Who should join the dbt community slack?[​](#who-should-join-the-dbt-community-slack "Direct link to Who should join the dbt community slack?")

* People who have insight into what it means to do hands-on [analytics engineering](https://www.getdbt.com/analytics-engineering/) work

  The dbt Community Slack workspace is fundamentally a place for analytics practitioners to interact with each other — the closer the users are in the community to actual data/analytics engineering work, the more natural their engagement will be (leading to better outcomes for partners and the community).

* DevRel practitioners with a strong focus

  DevRel practitioners often have a strong analytics background and a good understanding of the community. It’s essential to be sure they are focused on *contributing,* not on driving community metrics for their partner org (such as signing people up for their slack or events). The metrics will rise naturally through authentic engagement.

* Founders and executives who are interested in directly engaging with the community

  This is either incredibly successful or not at all, depending on the profile of the founder. Typically, this works best when the founder has a practitioner-level of technical understanding and is interested in joining not to promote, but to learn and hear from users.

* Software engineers at partner products that are building and supporting integrations with either dbt Core or the dbt platform

  This is successful when the engineers are familiar with dbt as a product or at least have taken our training course. The Slack is often a place where end-user questions and feedback are initially shared, so it is recommended that someone technical from the team be present. There are also a handful of channels aimed at those building integrations, which tend to be a font of knowledge.
##### Who might struggle in the dbt community[​](#who-might-struggle-in-the-dbt-community "Direct link to Who might struggle in the dbt community")

* People in marketing roles

  dbt Slack is not a marketing channel. Attempts to use it as such invariably fall flat and can even lead to people having a negative view of a product. This doesn’t mean that dbt can’t serve marketing objectives, but a long-term commitment to engagement is the only proven method to do this sustainably.

* People in product roles

  The dbt Community can be an invaluable source of feedback on a product. There are two primary ways this can happen — organically (community members proactively suggesting a new feature) and via direct calls for feedback and user research. Direct calls for feedback must be made in your dedicated #tools channel. They should be used sparingly, as they can overwhelm more organic discussions and feedback.

##### Who is the audience for an adapter release?[​](#who-is-the-audience-for-an-adapter-release "Direct link to Who is the audience for an adapter release?")

A new adapter is likely to drive huge community interest from several groups of people:

* People who are currently using the database that the adapter is supporting
* People who may be adopting the database in the near future
* People who are interested in dbt development in general

The database users will be your primary audience and the most helpful in achieving success. Engage them directly in the adapter’s dedicated Slack channel. If one does not exist already, reach out in #channel-requests, and we will get one made for you and include it in an announcement about new channels.

The final group is where non-Slack community engagement becomes important. Twitter and LinkedIn are both great places to interact with a broad audience. A well-orchestrated adapter release can generate impactful and authentic engagement.
##### How to message the initial rollout and follow-up content[​](#how-to-message-the-initial-rollout-and-follow-up-content "Direct link to How to message the initial rollout and follow-up content")

Tell a story that engages dbt users and the community. Highlight new use cases and functionality unlocked by the adapter in a way that will resonate with each segment.

* Existing users of your technology who are new to dbt
  * Provide a general overview of the value dbt will deliver to your users. This can lean on dbt's messaging and talking points, which are laid out in the [dbt viewpoint](https://docs.getdbt.com/community/resources/viewpoint.md).
  * Give examples of a rollout that speaks to the overall value of dbt and your product.
* Users who are already familiar with dbt and the community
  * Consider unique use cases or advantages your adapter provides over existing adapters. Who will be excited for this?
  * Contribute to the dbt Community and ensure that dbt users on your adapter are well supported (tutorial content, packages, documentation, etc.).
  * Example of a rollout that is compelling for those familiar with dbt: [Firebolt](https://www.linkedin.com/feed/update/urn:li:activity:6879090752459182080/)

##### Tactically manage distribution of content about new or existing adapters[​](#tactically-manage-distribution-of-content-about-new-or-existing-adapters "Direct link to Tactically manage distribution of content about new or existing adapters")

There are tactical pieces on how and where to share that help ensure success.

* On Slack:
  * The #i-made-this channel — this channel has a policy against “marketing” and “content marketing” posts, but your post should be successful if you write your content with the above guidelines in mind. Even so, it’s important to post here sparingly.
  * Your own database / tool channel — this is where the people who have opted in to receive communications from you are, and it's always a great place to share things that are relevant to them.
* On social media:
  * Twitter
  * LinkedIn
  * Social media posts *from the author* or an individual connected to the project tend to have better engagement than posts from a company or organization account.
  * Ask your partner representative about:
    * Retweets and shares from the official dbt Labs accounts.
    * Flagging posts internally at dbt Labs to get individual employees to share.

###### Measuring engagement[​](#measuring-engagement "Direct link to Measuring engagement")

You don’t need 1,000 people in a channel to succeed, but you need at least a few active participants who can make it feel lived in. If you’re comfortable working in public, this could be members of your team, or it can be a few people who you know are highly engaged and would be interested in participating. Having even 2 or 3 regulars hanging out in a channel is all that’s needed for a successful start and is, in fact, much more impactful than 250 people who never post.

##### How to announce a new adapter[​](#how-to-announce-a-new-adapter "Direct link to How to announce a new adapter")

We’d recommend *against* boilerplate announcements and encourage finding a unique voice. That being said, there are a couple of things that we’d want to include:

* A summary of the value prop of your database / technology for users who aren’t familiar.
* The personas that might be interested in this news.
* A description of what the adapter *is*. For example:

> With the release of our new dbt adapter, you’ll be able to use dbt to model and transform your data in \[name-of-your-org]

* Particular or unique use cases or functionality unlocked by the adapter.
* Plans for future / ongoing support / development.
* The link to the documentation for using the adapter on the dbt Labs docs site.
* An announcement blog.
###### Announcing new release versions of existing adapters[​](#announcing-new-release-versions-of-existing-adapters "Direct link to Announcing new release versions of existing adapters")

This can vary substantially depending on the nature of the release, but a good baseline is the types of release messages that [we put out in the #dbt-releases](https://getdbt.slack.com/archives/C37J8BQEL/p1651242161526509) channel.

![Full Release Post](/assets/images/0-full-release-notes-1cc8cb263cb178df48deda1f69875c99.png)

Breaking this down:

* Visually distinctive announcement - make it clear this is a release

  ![title](/img/adapter-guide/1-announcement.png?v=2)

* Short written description of what is in the release

  ![description](/img/adapter-guide/2-short-description.png?v=2)

* Links to additional resources

  ![more resources](/img/adapter-guide/3-additional-resources.png?v=2)

* Installation instructions

  ![installation](/img/adapter-guide/4-installation.png?v=2)

* Contributor recognition (if applicable)

  ![thank yous](/img/adapter-guide/6-thank-contribs.png?v=2)

#### Build a trusted adapter[​](#build-a-trusted-adapter "Direct link to Build a trusted adapter")

The Trusted Adapter Program exists to allow adapter maintainers to demonstrate to the dbt community that their adapter can be trusted for use in production.

The very first data platform dbt supported was Redshift, followed quickly by Postgres ([dbt-core#174](https://github.com/dbt-labs/dbt-core/pull/174)). In 2017, back when dbt Labs (née Fishtown Analytics) was still a data consultancy, we added support for Snowflake and BigQuery. We also turned dbt's database support into an adapter framework ([dbt-core#259](https://github.com/dbt-labs/dbt-core/pull/259/)), and a plugin system a few years later.
For years, dbt Labs specialized in those four data platforms and became experts in them. However, the surface area of all possible databases, their respective nuances, and keeping them up-to-date and bug-free is a Herculean and/or Sisyphean task that couldn't be done by a single person or even a single team! Enter the dbt community, which enables dbt Core to work on more than 30 different databases (32 as of Sep '22)!

Free and open-source tools for the data professional are increasingly abundant. This is by-and-large a *good thing*; however, it requires due diligence that wasn't required in a paid-license, closed-source software world. Before taking a dependency on an open-source project, it is important to determine the answer to the following questions:

1. Does it work?
2. Does it meet my team's specific use case?
3. Does anyone "own" the code, or is anyone liable for ensuring it works?
4. Do bugs get fixed quickly?
5. Does it stay up-to-date with new Core features?
6. Is the usage substantial enough to self-sustain?
7. What risks do I take on by taking a dependency on this library?

These are valid, important questions to answer—especially given that `dbt-core` itself only put out its first stable release (major version v1.0) in December 2021! Indeed, up until now, the majority of new user questions in database-specific channels are some form of:

* "How mature is `dbt-`? Any gotchas I should be aware of before I start exploring?"
* "has anyone here used `dbt-` for production models?"
* "I've been playing with `dbt-` -- I was able to install and run my initial experiments. I noticed that there are certain features mentioned on the documentation that are marked as 'not ok' or 'not tested'. What are the risks? I'd love to make a statement on my team to adopt dbt, but I'm pretty sure questions will be asked around the possible limitations of the adapter or if there are other companies out there using dbt with Oracle DB in production, etc."
There has been a tendency to trust the dbt Labs-maintained adapters over community- and vendor-supported adapters, but repo ownership is only one among many indicators of software quality. We aim to help our users feel well-informed as to the caliber of an adapter with a new program.

##### What it means to be trusted[​](#what-it-means-to-be-trusted "Direct link to What it means to be trusted")

By opting in to the program, you agree to the guidelines below, and we take you at your word. dbt Labs reserves the right to remove an adapter from the trusted adapter list at any time, should any of the below guidelines not be met.

##### Feature Completeness[​](#feature-completeness "Direct link to Feature Completeness")

To be considered for the Trusted Adapter Program, the adapter must cover the essential functionality of dbt Core given below, with best effort given to support the entire feature set.

Essential functionality includes (but is not limited to) the following features:

* table, view, and seed materializations
* dbt tests

The adapter should have the required documentation for connecting and configuring the adapter. The dbt docs site should be the single source of truth for this information, and these docs should be kept up-to-date. Proceed to the "Document a new adapter" step for more information.

##### Release cadence[​](#release-cadence "Direct link to Release cadence")

Keeping an adapter up-to-date with the latest features of dbt, as defined in [dbt-adapters](https://github.com/dbt-labs/dbt-adapters), is an integral part of being a trusted adapter. We encourage adapter maintainers to keep track of new dbt-adapters releases and support new features relevant to their platform, ensuring users have the best version of dbt.

Before [dbt Core version 1.8](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.8.md#new-dbt-core-adapter-installation-procedure), adapter versions needed to match the semantic versioning of dbt Core. After v1.8, this is no longer required.
This means users can use an adapter on v1.8+ with a different version of dbt Core v1.8+. For example, a user could use dbt-core v1.9 with dbt-postgres v1.8.

##### Community responsiveness[​](#community-responsiveness "Direct link to Community responsiveness")

On a best-effort basis, active participation and engagement with the dbt Community across the following forums:

* Being responsive to feedback and supporting user enablement in the dbt Community’s Slack workspace
* Responding with comments to issues raised in the public dbt adapter code repository
* Merging in code contributions from community members as deemed appropriate

##### Security Practices[​](#security-practices "Direct link to Security Practices")

Trusted adapters will not do any of the following:

* Output access credentials for, or data from, the underlying data platform to logs or files
* Make API calls other than those expressly required for using dbt features (adapters may not add additional logging)
* Obfuscate code and/or functionality so as to avoid detection

Additionally, to avoid supply-chain attacks, trusted adapters will:

* Use an automated service to keep Python dependencies up-to-date (such as Dependabot or similar)
* Publish directly to PyPI from the dbt adapter code repository by using a trusted CI/CD process (such as GitHub Actions)
* Restrict admin access to both the respective code (GitHub) and package (PyPI) repositories
* Identify and mitigate security vulnerabilities by use of a static code analysis tool (such as Snyk) as part of a CI/CD process

##### Other considerations[​](#other-considerations "Direct link to Other considerations")

The adapter repository is:

* open-source licensed,
* published to PyPI, and
* automatically tested against dbt Labs' provided adapter test suite.

##### How to get an adapter on the trusted list[​](#how-to-get-an-adapter-on-the-trusted-list "Direct link to How to get an adapter on the trusted list")

Open an issue on the [docs.getdbt.com GitHub
repository](https://github.com/dbt-labs/docs.getdbt.com) using the "Add adapter to Trusted list" template. In addition to contact information, it will ask you to confirm that you agree to the following:

1. My adapter meets the guidelines given above.
2. I will make a best, reasonable effort to ensure this continues to be the case.
3. I acknowledge that dbt Labs reserves the right to remove an adapter from the trusted adapter list at any time, should any of the above guidelines not be met.

The approval workflow is as follows:

1. Create and populate the template-created issue.
2. dbt Labs will respond as quickly as possible (within four weeks at most, though likely faster).
3. If approved, dbt Labs will create and merge a pull request to formally add the adapter to the list.

##### Getting help for my trusted adapter[​](#getting-help-for-my-trusted-adapter "Direct link to Getting help for my trusted adapter")

Ask your question in the #adapter-ecosystem channel of the dbt Community Slack.

---

### Building dbt packages

#### Introduction[​](#introduction "Direct link to Introduction")

Creating packages is an **advanced use of dbt**. If you're new to the tool, we recommend that you first use the product for your own analytics before attempting to create a package for others.
##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

A strong understanding of:

* [packages](https://docs.getdbt.com/docs/build/packages.md)
* administering a repository on GitHub
* [semantic versioning](https://semver.org/)

##### Assess whether a package is the right solution[​](#assess-whether-a-package-is-the-right-solution "Direct link to Assess whether a package is the right solution")

Packages typically contain either:

* macros that solve a particular analytics engineering problem — for example, [auditing the results of a query](https://hub.getdbt.com/dbt-labs/audit_helper/latest/), [generating code](https://hub.getdbt.com/dbt-labs/codegen/latest/), or [adding additional schema tests to a dbt project](https://hub.getdbt.com/calogica/dbt_expectations/latest/).
* models for a common dataset — for example, a dataset for software products like [MailChimp](https://hub.getdbt.com/fivetran/mailchimp/latest/) or [Snowplow](https://hub.getdbt.com/dbt-labs/snowplow/latest/), or even models for metadata about your data stack like [Snowflake query spend](https://hub.getdbt.com/gitlabhq/snowflake_spend/latest/) and [the artifacts produced by `dbt run`](https://hub.getdbt.com/tailsdotcom/dbt_artifacts/latest/). In general, there should be a shared set of industry-standard metrics that you can model (e.g. email open rate).

We also recommend ensuring your package is compatible with [Fusion](https://docs.getdbt.com/docs/fusion.md) and [dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md). To ensure Fusion compatibility, you can follow the steps in the [Fusion package upgrade guide](https://docs.getdbt.com/guides/fusion-package-compat.md).

Packages are *not* a good fit for sharing models that contain business-specific logic, for example, writing code for marketing attribution, or monthly recurring revenue.
Instead, consider sharing a blog post and a link to a sample repo, rather than bundling this code as a package (here's our blog post on [marketing attribution](https://blog.getdbt.com/modeling-marketing-attribution/) as an example).

#### Create your new project[​](#create-your-new-project "Direct link to Create your new project")

> **Using the command line for package development:** We tend to use the command line interface for package development. The development workflow often involves installing a local copy of your package in another dbt project — at present dbt is not designed for this workflow.

1. Use the [dbt init](https://docs.getdbt.com/reference/commands/init.md) command to create a new dbt project, which will be your package:

```shell
$ dbt init [package_name]
```

2. Create a public GitHub¹ repo, named `dbt-`, e.g. `dbt-mailchimp`. Follow the GitHub instructions to link this to the dbt project you just created.
3. Update the `name:` of the project in `dbt_project.yml` to your package name, e.g. `mailchimp`.
4. Define the allowed dbt versions by using the [`require-dbt-version` config](https://docs.getdbt.com/reference/project-configs/require-dbt-version.md).

¹Currently, our package registry only supports packages that are hosted in GitHub.

#### Develop your package[​](#develop-your-package "Direct link to Develop your package")

We recommend that first-time package authors first develop macros and models for use in their own dbt project. Once your new package is created, you can get to work on moving them across, implementing some additional package-specific design patterns along the way.

When working on your package, we often find it useful to install a local copy of the package in another dbt project — this workflow is described [here](https://discourse.getdbt.com/t/contributing-to-an-external-dbt-package/657).
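Putting the project-creation steps together, the relevant parts of a package's `dbt_project.yml` might look like this sketch (the `mailchimp` name and the version range are illustrative values, not requirements):

```yaml
# dbt_project.yml (illustrative values)
name: 'mailchimp'
version: '0.1.0'
config-version: 2

# Restrict the package to the dbt versions it has been tested against
require-dbt-version: [">=1.0.0", "<2.0.0"]
```

Pinning `require-dbt-version` means users on untested dbt versions get a clear error instead of a subtle breakage.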
##### Ensure Fusion compatibility[​](#ensure-fusion-compatibility "Direct link to Ensure Fusion compatibility")

If you're building a package, we recommend you ensure it's compatible with [Fusion](https://docs.getdbt.com/docs/fusion.md) and [dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md). To ensure Fusion compatibility, you can follow the steps in the [Fusion package upgrade guide](https://docs.getdbt.com/guides/fusion-package-compat.md). Doing so will ensure your package is compatible with the dbt Fusion engine (and dbt Core), and it will be displayed with a Fusion-compatible badge on the dbt package hub.

##### Follow best practices[​](#follow-best-practices "Direct link to Follow best practices")

*Modeling packages only*

Use our [dbt coding conventions](https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md), our article on [how we structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md), and our [best practices](https://docs.getdbt.com/best-practices.md) for all of our advice on how to build your dbt project. This is where it comes in especially handy to have worked on your own dbt project previously.

##### Make the location of raw data configurable[​](#make-the-location-of-raw-data-configurable "Direct link to Make the location of raw data configurable")

*Modeling packages only*

Not every user of your package is going to store their Mailchimp data in a schema named `mailchimp`. As such, you'll need to make the location of raw data configurable. We recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and [variables](https://docs.getdbt.com/docs/build/project-variables.md) to achieve this.
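As a sketch of this pattern (the source, table, and variable names are illustrative), the package can default to a `mailchimp` schema while letting users override it with a variable:

```yaml
# models/src_mailchimp.yml (illustrative names)
version: 2

sources:
  - name: mailchimp
    # Users can override the raw data location from their own
    # dbt_project.yml, e.g. vars: {mailchimp_schema: raw_mc}
    schema: "{{ var('mailchimp_schema', 'mailchimp') }}"
    tables:
      - name: campaigns
```

Models in the package then select from `{{ source('mailchimp', 'campaigns') }}` rather than a hard-coded table name.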
Check out [this package](https://github.com/fivetran/dbt_facebook_ads_source/blob/main/models/src_facebook_ads.yml#L5-L6) for an example — notably, the README [includes instructions](https://github.com/fivetran/dbt_facebook_ads_source#configuration) on how to override the default schema from a `dbt_project.yml` file.

##### Install upstream packages from hub.getdbt.com[​](#install-upstream-packages-from-hubgetdbtcom "Direct link to Install upstream packages from hub.getdbt.com")

If your package relies on another package (for example, you use some of the cross-database macros from [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/)), we recommend you install the package from [hub.getdbt.com](https://hub.getdbt.com), specifying a version range like so:

packages.yml

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=0.6.5", "<0.7.0"]
```

When packages are installed from hub.getdbt.com, dbt is able to handle duplicate dependencies.

##### Implement cross-database compatibility[​](#implement-cross-database-compatibility "Direct link to Implement cross-database compatibility")

Many SQL functions are specific to a particular database. For example, the function name and order of arguments to calculate the difference between two dates varies between Redshift, Snowflake and BigQuery, and no similar function exists on Postgres! If you wish to support multiple warehouses, we have a number of tricks up our sleeve:

* We've written a number of macros that compile to valid SQL snippets on each of the original four adapters. Where possible, leverage these macros.
* If you need to implement cross-database compatibility for one of your macros, use the [`adapter.dispatch` macro](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md) to achieve this. Check out the cross-database macros in dbt-utils for examples.
* If you're working on a modeling package, you may notice that you need to write different models for each warehouse (for example, if the EL tool you are working with stores data differently on each warehouse). In this case, you can write different versions of each model, and use the [`enabled` config](https://docs.getdbt.com/reference/resource-configs/enabled.md), in combination with [`target.type`](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md), to enable the correct models — check out [this package](https://github.com/fivetran/dbt_facebook_ads_creative_history/blob/main/dbt_project.yml#L11-L16) as an example.

If your package has only been written to work for one data warehouse, make sure you document this in your package README.

##### Use specific model names[​](#use-specific-model-names "Direct link to Use specific model names")

*Modeling packages only*

Many datasets have a concept of a "user" or "account" or "session". To make sure things are unambiguous in dbt, prefix all of your models with `[package_name]_`. For example, `mailchimp_campaigns.sql` is a good name for a model, whereas `campaigns.sql` is not.

##### Default to views[​](#default-to-views "Direct link to Default to views")

*Modeling packages only*

dbt makes it possible for users of your package to override your model materialization settings. In general, default to materializing models as `view`s instead of `table`s. The major exception to this is when working with data sources that benefit from incremental modeling (for example, web page views). Implementing incremental logic on behalf of your end users is likely to be helpful in this case.

##### Test and document your package[​](#test-and-document-your-package "Direct link to Test and document your package")

It's critical that you [test](https://docs.getdbt.com/docs/build/data-tests.md) your models and sources. This will give your end users confidence that your package is actually working on top of their dataset as intended.
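As a sketch of what this looks like in a package (the model and column names are illustrative), generic data tests can be declared alongside the package's models:

```yaml
# models/schema.yml (illustrative names)
version: 2

models:
  - name: mailchimp_campaigns
    description: One record per Mailchimp campaign
    columns:
      - name: campaign_id
        description: Primary key of the campaigns table
        tests:
          - unique
          - not_null
```

These tests run against the end user's own data when they execute `dbt test`, which is what makes them a meaningful signal that the package works on their dataset.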
Further, adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) via descriptions will help communicate your package to end users, and benefit their stakeholders that use the outputs of this package.

##### Include useful GitHub artifacts[​](#include-useful-github-artifacts "Direct link to Include useful GitHub artifacts")

Over time, we've developed a set of useful GitHub artifacts that make administering our packages easier for us. In particular, we ensure that we include:

* A useful README, that has:
  * Installation instructions that refer to the latest version of the package on hub.getdbt.com, and include any required configurations ([example](https://github.com/dbt-labs/segment))
  * Usage examples for any macros ([example](https://github.com/dbt-labs/dbt-audit-helper#macros))
  * Descriptions of the main models included in the package ([example](https://github.com/dbt-labs/snowplow))
* GitHub templates, including PR templates and issue templates ([example](https://github.com/dbt-labs/dbt-audit-helper/tree/master/.github))

#### Add integration tests[​](#add-integration-tests "Direct link to Add integration tests")

*Optional*

We recommend that you implement integration tests to confirm that the package works as expected — this is an even *more* advanced step, so you may find that you build up to this. This pattern can be seen in most packages, including the [`audit-helper`](https://github.com/dbt-labs/dbt-audit-helper/tree/master/integration_tests) and [`snowplow`](https://github.com/dbt-labs/snowplow/tree/master/integration_tests) packages.

As a rough guide:

1. Create a subdirectory named `integration_tests`.
2. In this subdirectory, create a new dbt project — you can use the `dbt init` command to do this.
   However, our preferred method is to copy the files from an existing `integration_tests` project, like the ones [here](https://github.com/dbt-labs/dbt-codegen/tree/HEAD/integration_tests) (removing the contents of the `macros`, `models` and `tests` folders since they are project-specific).
3. Install the package in the `integration_tests` subdirectory by using the `local` syntax, and then running `dbt deps`:

packages.yml

```yml
packages:
  - local: ../ # this means "one directory above the current directory"
```

4. Add resources to the package (seeds, models, tests) so that you can successfully run your project, and compare the output with what you expect. The exact approach here will vary depending on your package. In general you will find that you need to:
   * Add mock data via a [seed](https://docs.getdbt.com/docs/build/seeds.md) with a few sample (anonymized) records. Configure the `integration_tests` project to point to the seeds instead of raw data tables.
   * Add more seeds that represent the expected output of your models, and use the [dbt\_utils.equality](https://github.com/dbt-labs/dbt-utils#equality-source) test to confirm that the output of your package matches the expected output.
5. Confirm that you can run `dbt run` and `dbt test` from your command line successfully.
6. (Optional) Use a CI tool, like CircleCI or GitHub Actions, to automate running your dbt project when you open a new Pull Request. For inspiration, check out one of our [CircleCI configs](https://github.com/dbt-labs/snowplow/blob/main/.circleci/config.yml), which runs tests against our four main warehouses. Note: this is an advanced step — if you are going down this path, you may find it useful to say hi on [dbt Slack](https://community.getdbt.com/).

#### Deploy the docs for your package[​](#deploy-the-docs-for-your-package "Direct link to Deploy the docs for your package")

*Optional*

A dbt docs site can help a prospective user of your package understand the code you've written.
As such, we recommend that you deploy the site generated by `dbt docs generate` and link to the deployed site from your package. The easiest way we've found to do this is to use [GitHub Pages](https://pages.github.com/).

1. On a new git branch, run `dbt docs generate`. If you have integration tests set up (above), use the integration-test project to do this.
2. Move the following files into a directory named `docs` ([example](https://github.com/fivetran/dbt_ad_reporting/tree/HEAD/docs)): `catalog.json`, `index.html`, `manifest.json`, `run_results.json`.
3. Merge these changes into the main branch.
4. Enable GitHub Pages on the repo in the settings tab, and point it to the “docs” subdirectory.
5. GitHub should then deploy the docs at `.github.io/`, like so: [fivetran.github.io/dbt\_ad\_reporting](https://fivetran.github.io/dbt_ad_reporting/)

#### Release your package[​](#release-your-package "Direct link to Release your package")

Create a new [release](https://docs.github.com/en/github/administering-a-repository/managing-releases-in-a-repository) once you are ready for others to use your work! Be sure to use [semantic versioning](https://semver.org/) when naming your release. In particular, if new changes will cause errors for users of earlier versions of the package, be sure to use *at least* a minor release (e.g. go from `0.1.1` to `0.2.0`).

The release notes should contain an overview of the changes introduced in the new version. Be sure to call out any changes that break the existing interface!

#### Add the package to hub.getdbt.com[​](#add-the-package-to-hubgetdbtcom "Direct link to Add the package to hub.getdbt.com")

Our package registry, [hub.getdbt.com](https://hub.getdbt.com/), gets updated by the [hubcap script](https://github.com/dbt-labs/hubcap). To add your package to hub.getdbt.com, create a PR on the [hubcap repository](https://github.com/dbt-labs/hubcap) to include it in the `hub.json` file.
---

### Create Datadog events from dbt results

#### Introduction[​](#introduction "Direct link to Introduction")

This guide will teach you how to build and host a basic Python app which will add dbt job events to Datadog. To do this, when a dbt job completes, the app will create a log entry for each node that was run, containing all information about the node provided by the [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-job-models.md).

In this example, we will use [fly.io](https://fly.io) for hosting/running the service. fly.io is a platform for running full-stack apps without provisioning servers. This level of usage should comfortably fit inside the free tier. You can also use an alternative tool such as [AWS Lambda](https://ademoverflow.com/en/posts/tutorial-fastapi-aws-lambda-serverless/) or [Google Cloud Run](https://github.com/sekR4/FastAPI-on-Google-Cloud-Run).

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

This guide assumes some familiarity with:

* [dbt Webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md)
* CLI apps
* Deploying code to a serverless code runner like fly.io or AWS Lambda

#### Clone the `dbt-cloud-webhooks-datadog` repo[​](#clone-the-dbt-cloud-webhooks-datadog-repo "Direct link to clone-the-dbt-cloud-webhooks-datadog-repo")

[This repository](https://github.com/dpguthrie/dbt-cloud-webhooks-datadog) contains the sample code for validating a webhook and creating logs in Datadog.
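For context on what "validating a webhook" means here: dbt signs each webhook by computing an HMAC-SHA256 hex digest of the raw request body with your webhook secret and sending it in the request's `authorization` header, so the app can recompute the digest and compare. A minimal sketch of the check (the function name is illustrative; see the repo above for the actual implementation):

```python
import hashlib
import hmac


def validate_signature(raw_body: bytes, auth_header: str, secret: str) -> bool:
    # Recompute the HMAC-SHA256 hex digest of the raw request body using the
    # webhook secret, then compare it to the received header value in
    # constant time to avoid timing attacks.
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, auth_header)


# Usage sketch: a request is accepted only when the digest matches.
secret = "def456"  # the webhook's Secret Key (placeholder value)
body = b'{"eventType": "job.run.completed"}'
good_digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
assert validate_signature(body, good_digest, secret)
assert not validate_signature(body, "0" * 64, secret)
```

Validating the signature before processing a request is what stops third parties who discover your app's URL from injecting fake run events.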
#### Install `flyctl` and sign up for fly.io[​](#install-flyctl-and-sign-up-for-flyio "Direct link to install-flyctl-and-sign-up-for-flyio")

Follow the directions for your OS in the [fly.io docs](https://fly.io/docs/hands-on/install-flyctl/), then from your command line, run the following commands:

Switch to the directory containing the repo you cloned in step 1:

```shell
#example: replace with your actual path
cd ~/Documents/GitHub/dbt-cloud-webhooks-datadog
```

Sign up for fly.io:

```shell
flyctl auth signup
```

Your console should show `successfully logged in as YOUR_EMAIL` when you're done, but if it doesn't, then sign in to fly.io from your command line:

```shell
flyctl auth login
```

#### Launch your fly.io app[​](#launch-your-flyio-app "Direct link to Launch your fly.io app")

Launching your app publishes it to the web and makes it ready to catch webhook events:

```shell
flyctl launch
```

1. You will see a message saying that an existing `fly.toml` file was found. Type `y` to copy its configuration to your new app.
2. Choose an app name, such as `YOUR_COMPANY-dbt-cloud-webhook-datadog`, or leave it blank and one will be generated for you. Note that your name can only contain numbers, lowercase letters and dashes.
3. Choose a deployment region, and take note of the hostname that is generated (normally `APP_NAME.fly.dev`).
4. When asked if you would like to set up Postgresql or Redis databases, type `n` for each.
5. Type `y` when asked if you would like to deploy now.

Sample output from the setup wizard:

```shell
joel@Joel-Labes dbt-cloud-webhooks-datadog % flyctl launch
An existing fly.toml file was found for app dbt-cloud-webhooks-datadog
? Would you like to copy its configuration to the new app? Yes
Creating app in /Users/joel/Documents/GitHub/dbt-cloud-webhooks-datadog
Scanning source code
Detected a Dockerfile app
? Choose an app name (leave blank to generate one): demo-dbt-cloud-webhook-datadog
automatically selected personal organization: Joel Labes
Some regions require a paid plan (fra, maa). See https://fly.io/plans to set up a plan.
? Choose a region for deployment: Sydney, Australia (syd)
Created app dbtlabs-dbt-cloud-webhook-datadog in organization personal
Admin URL: https://fly.io/apps/demo-dbt-cloud-webhook-datadog
Hostname: demo-dbt-cloud-webhook-datadog.fly.dev
? Would you like to set up a Postgresql database now? No
? Would you like to set up an Upstash Redis database now? No
Wrote config file fly.toml
? Would you like to deploy now? Yes
```

##### 4. Create a Datadog API Key[​](#4-create-a-datadog-api-key "Direct link to 4. Create a Datadog API Key")

[Create an API Key for your Datadog account](https://docs.datadoghq.com/account_management/api-app-keys/) and make note of it and your Datadog site (e.g. `datadoghq.com`) for later.

#### Configure a new webhook in dbt[​](#configure-a-new-webhook-in-dbt "Direct link to Configure a new webhook in dbt")

1. See [Create a webhook subscription](https://docs.getdbt.com/docs/deploy/webhooks.md#create-a-webhook-subscription) for full instructions. Your event should be **Run completed**.
2. Set the webhook URL to the hostname you created earlier (`APP_NAME.fly.dev`).
3. Make note of the Webhook Secret Key for later.

*Do not test the endpoint*; it won't work until you have stored the auth keys (next step).

#### Store secrets[​](#store-secrets "Direct link to Store secrets")

The application requires four secrets to be set, using these names:

* `DBT_CLOUD_SERVICE_TOKEN`: a dbt [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) or [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) with at least the `Metadata Only` permission.
* `DBT_CLOUD_AUTH_TOKEN`: the Secret Key for the dbt webhook you created earlier. * `DD_API_KEY`: the API key you created earlier. * `DD_SITE`: the Datadog site for your organization, e.g. `datadoghq.com`. Set these secrets as follows, replacing `abc123` and so on with your actual values: ```shell flyctl secrets set DBT_CLOUD_SERVICE_TOKEN=abc123 DBT_CLOUD_AUTH_TOKEN=def456 DD_API_KEY=ghi789 DD_SITE=datadoghq.com ``` #### Deploy your app[​](#deploy-your-app "Direct link to Deploy your app") After you set your secrets, fly.io will redeploy your application. When it has completed successfully, go back to the dbt webhook settings and click **Test Endpoint**. --- ### Create new materializations [Back to guides](https://docs.getdbt.com/guides.md) Advanced #### Introduction[​](#introduction "Direct link to Introduction") The model materializations you're familiar with, `table`, `view`, and `incremental`, are implemented as macros in a package that's distributed along with dbt. You can check out the [source code for these materializations](https://github.com/dbt-labs/dbt-adapters/tree/60005a0a2bd33b61cb65a591bc1604b1b3fd25d5/dbt/include/global_project/macros/materializations). If you need to create your own materializations, reading these files is a good place to start. Continue reading below for a deep dive into dbt materializations. caution This is an advanced feature of dbt. Let us know if you need a hand! We're always happy to [chat](http://community.getdbt.com/). #### Creating a materialization[​](#creating-a-materialization "Direct link to Creating a materialization") Learn by video!
For video tutorials on materializations, go to dbt Learn and check out the [Materializations fundamentals course](https://learn.getdbt.com/courses/materializations-fundamentals). Materialization blocks make it possible for dbt to load custom materializations from packages. Materialization blocks work very much like `macro` blocks, with a couple of key exceptions. Materializations are defined as follows: ```sql {% materialization [materialization name], ["specified adapter" | default] %} ... {% endmaterialization %} ``` Materializations can be given a name, and they can be tied to a specific adapter. dbt will pick the materialization tied to the currently-in-use adapter if one exists, or it will fall back to the `default` implementation. In practice, this looks like: macros/my\_materialization.sql ```sql {% materialization my_materialization_name, default %} -- cross-adapter materialization... assume Redshift is not supported {% endmaterialization %} {% materialization my_materialization_name, adapter='redshift' %} -- override the materialization for Redshift {% endmaterialization %} ``` info dbt's ability to dynamically pick the correct materialization based on the active database target is called [multiple dispatch](https://en.wikipedia.org/wiki/Multiple_dispatch). This feature unlocks a whole world of cross-database compatibility features; if you're interested in this, please let us know on Slack! ##### Anatomy of a materialization[​](#anatomy-of-a-materialization "Direct link to Anatomy of a materialization") Materializations are responsible for taking a dbt model SQL statement and turning it into a transformed dataset in a database. As such, materializations generally take the following shape: 1. Prepare the database for the new model 2. Run pre-hooks 3. Execute any SQL required to implement the desired materialization 4. Run post-model hooks 5. Clean up the database as required 6.
Update the Relation cache Each of these tasks is explained in the sections below. ##### Prepare the database[​](#prepare-the-database "Direct link to Prepare the database") Materializations are responsible for creating new tables or views in the database, or inserting/updating/deleting data from existing tables. As such, materializations need to know about the state of the database to determine exactly what SQL they should run. Here is some pseudocode for the "setup" phase of the **table** materialization: ```sql -- Refer to the table materialization (linked above) for an example of real syntax -- This code will not work and is only intended for demonstration purposes {% set existing = adapter.get_relation(this) %} {% if existing and existing.is_view %} {% do adapter.drop_relation(existing) %} {% endif %} ``` In this example, the `get_relation` method is used to fetch the state of the currently-executing model from the database. If the model exists as a view, then the view is dropped to make room for the table that will be built later in the materialization. This is a simplified example, and the setup phase for a materialization can become quite complicated indeed! When building a materialization, be sure to consider the state of the database and any supplied [flags](https://docs.getdbt.com/reference/dbt-jinja-functions/flags.md) (e.g. `--full-refresh`) to ensure that the materialization code behaves correctly in different scenarios. ##### Run pre-hooks[​](#run-pre-hooks "Direct link to Run pre-hooks") Pre- and post-hooks can be specified for any model -- be sure that your materialization plays nicely with these settings. Two variables, `pre_hooks` and `post_hooks`, are automatically injected into the materialization context. Invoke these hooks at the appropriate time with: ```sql ... {{ run_hooks(pre_hooks) }} ....
``` ##### Executing SQL[​](#executing-sql "Direct link to Executing SQL") Construct your materialization DML to account for the different permutations of table existence, materialization flags, and so on. There are a number of [adapter functions](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter.md) and context variables that can help you here. Be sure to consult the Reference section of this site for a full list of variables and functions at your disposal. ##### Run post-hooks[​](#run-post-hooks "Direct link to Run post-hooks") See the section above on pre-hooks for more information on running post-hooks. ##### Clean up[​](#clean-up "Direct link to Clean up") The "cleanup" phase of the materialization typically renames or drops relations and commits the transaction opened in the "preparation" step above. The `table` materialization, for instance, executes the following cleanup code: ```sql {{ drop_relation_if_exists(backup_relation) }} ``` Be sure to `commit` the transaction in the cleanup phase of the materialization with `{{ adapter.commit() }}`. If you do not commit this transaction, it will be rolled back by dbt and the transformations applied in your materialization will be discarded. ##### Update the Relation cache[​](#update-the-relation-cache "Direct link to Update the Relation cache") Materializations should [return](https://docs.getdbt.com/reference/dbt-jinja-functions/return.md) the list of Relations that they have created at the end of execution. dbt will use this list of Relations to update the relation cache in order to reduce the number of queries executed against the database's `information_schema`. If a list of Relations is not returned, then dbt will raise a deprecation warning and infer the created relation from the model's configured database, schema, and alias.
macros/my\_view\_materialization.sql ```sql {%- materialization my_view, default -%} {%- set target_relation = api.Relation.create( identifier=this.identifier, schema=this.schema, database=this.database, type='view') -%} -- ... setup database ... -- ... run pre-hooks... -- build model {% call statement('main') -%} {{ create_view_as(target_relation, sql) }} {%- endcall %} -- ... run post-hooks ... -- ... clean up the database... -- Return the relations created in this materialization {{ return({'relations': [target_relation]}) }} {%- endmaterialization -%} ``` If a materialization solely creates a single relation, then returning that relation at the end of the materialization is sufficient to synchronize the dbt Relation cache. If the materialization *renames* or *drops* Relations other than the relation returned by the materialization, then additional work is required to keep the cache in sync with the database. To explicitly remove a relation from the cache, use [adapter.drop\_relation](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter.md). To explicitly rename a relation in the cache, use [adapter.rename\_relation](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter.md). Calling these methods is preferable to executing the corresponding SQL directly, as they will mutate the cache as required. If you do need to execute the SQL to drop or rename relations directly, use the `adapter.cache_dropped` and `adapter.cache_renamed` methods to synchronize the cache. #### Materialization Configuration[​](#materialization-configuration "Direct link to Materialization Configuration") Materializations support custom configuration. You might be familiar with some of these configs from materializations like `unique_key` in [incremental models](https://docs.getdbt.com/docs/build/incremental-models.md) or `strategy` in [snapshots](https://docs.getdbt.com/docs/build/snapshots.md) . 
##### Specifying configuration options[​](#specifying-configuration-options "Direct link to Specifying configuration options") Materialization configurations can either be "optional" or "required". If a user fails to provide a required configuration, then dbt will raise a compilation error. You can access these configuration options with the `config.get` and `config.require` functions: ```sql {# optional: returns the default if the config is not provided #} {% set optional_value = config.get('optional_config_name', default="the default") %} {# required: raises a compilation error if the config is not provided #} {% set required_value = config.require('required_config_name') %} ``` For more information on the `config` dbt Jinja function, see the [config](https://docs.getdbt.com/reference/dbt-jinja-functions/config.md) reference. #### Materialization precedence[​](#materialization-precedence "Direct link to Materialization precedence") dbt will pick the materialization macro in the following order (entries lower in the list take priority): 1. global project - default 2. global project - plugin specific 3. imported package - default 4. imported package - plugin specific 5. local project - default 6. local project - plugin specific In each of the stated search spaces, a materialization can only be defined once. Two different imported packages may not supply the same materialization; if they do, an error will be raised. Specific materializations can be selected by using dot notation when selecting a materialization from the context. We recommend *not* overriding materialization names directly, and instead using a prefix or suffix to denote that the materialization changes the behavior of the default implementation (e.g. `my_project_incremental`).
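Once a custom materialization and its configuration options exist, a model opts in through its config block. The following is a hypothetical sketch (the materialization and config names are illustrative):

```sql
-- models/my_model.sql
{{
  config(
    materialized='my_materialization_name',
    required_config_name='some_value',
    optional_config_name='overrides the default'
  )
}}

select 1 as id
```

dbt resolves `my_materialization_name` using the precedence rules above, and a `config.require('required_config_name')` call inside the materialization picks up `'some_value'`.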
--- ### Customize dbt models database, schema, and alias [Back to guides](https://docs.getdbt.com/guides.md) Advanced #### Introduction[​](#introduction "Direct link to Introduction") This guide explains how to customize the [schema](https://docs.getdbt.com/docs/build/custom-schemas.md), [database](https://docs.getdbt.com/docs/build/custom-databases.md), and [alias](https://docs.getdbt.com/docs/build/custom-aliases.md) naming conventions in dbt to fit your data warehouse governance and design needs. When we develop dbt models and execute certain [commands](https://docs.getdbt.com/reference/dbt-commands.md) (such as `dbt run` or `dbt build`), objects (like tables and views) get created in the data warehouse based on these naming conventions. A word on naming Different warehouses have different names for *logical databases*. The information in this document covers "databases" on Snowflake, Redshift, and Postgres; "projects" on BigQuery; and "catalogs" on Databricks Unity Catalog. The following is dbt's out-of-the-box default behavior: * The database where the object is created is defined by the database configured at the [environment level in dbt](https://docs.getdbt.com/docs/dbt-cloud-environments.md) or in the [`profiles.yml` file](https://docs.getdbt.com/docs/local/profiles.yml.md) in dbt Core. * The schema depends on whether you have defined a [custom schema](https://docs.getdbt.com/docs/build/custom-schemas.md) for the model: * If you haven't defined a custom schema, dbt creates the object in the default schema. In dbt, this is typically `dbt_username` for development and the default schema for deployment environments. In dbt Core, it uses the schema specified in the `profiles.yml` file. * If you define a custom schema, dbt concatenates the schema mentioned earlier with the custom one. * For example, if the configured schema is `dbt_myschema` and the custom one is `marketing`, the objects will be created under `dbt_myschema_marketing`.
* Note that for automated CI jobs, the schema name derives from the job number and PR number: `dbt_cloud_pr_<job_id>_<pr_id>`. * The object name depends on whether an [alias](https://docs.getdbt.com/reference/resource-configs/alias.md) has been defined on the model: * If no alias is defined, the object will be created with the same name as the model, without the `.sql` or `.py` extension. * For example, suppose that we have a model whose SQL file is titled `fct_orders_complete.sql`, the custom schema is `marketing`, and no custom alias is configured. The resulting model will be created as `dbt_myschema_marketing.fct_orders_complete` in the dev environment. * If an alias is defined, the object will be created with the configured alias. * For example, suppose that we have a model whose SQL file is titled `fct_orders_complete.sql`, the custom schema is `marketing`, and the alias is configured to be `fct_orders`. The resulting model will be created as `dbt_myschema_marketing.fct_orders`. These default rules are a great starting point, and many organizations choose to stick with them without any customization required. The defaults allow developers to work in their isolated schemas (sandboxes) without overwriting each other's work — even if they're working on the same tables. #### How to customize this behavior[​](#how-to-customize-this-behavior "Direct link to How to customize this behavior") While the default behavior will fit the needs of most organizations, there are occasions where this approach won't work. For example, dbt expects that it has permission to create schemas as needed (and we recommend that the users running dbt have this ability), but it might not be allowed at your company. Or, based on how you've designed your warehouse, you may wish to minimize the number of schemas in your dev environment (and avoid schema sprawl by not creating the combination of all developer schemas and custom schemas).
Alternatively, you may even want your dev schemas to be named after feature branches instead of the developer name. For this reason, dbt offers three macros to customize what objects are created in the data warehouse: * [`generate_database_name()`](https://docs.getdbt.com/docs/build/custom-databases.md#generate_database_name) * [`generate_schema_name()`](https://docs.getdbt.com/docs/build/custom-schemas.md#how-does-dbt-generate-a-models-schema-name) * [`generate_alias_name()`](https://docs.getdbt.com/docs/build/custom-aliases.md#generate_alias_name) By overriding one or more of these macros, we can tailor where dbt objects are created in the data warehouse and align with any existing requirements. Key concept Models run from two different contexts must result in unique objects in the data warehouse. For example, a developer named Suzie is working on enhancements to `fct_player_stats` while Darren is developing against the exact same object. To prevent overwriting each other's work, Suzie and Darren should each have their own version of `fct_player_stats` in the development environment. Further, the staging version of `fct_player_stats` should exist in a unique location apart from both the development versions and the production version. We often leverage the following when customizing these macros: * In dbt, we recommend utilizing [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) to define where the dbt invocation is occurring (dev/stg/prod). * They can be set at the environment level, and all jobs will automatically inherit the default values. We'll add Jinja logic (`if/else/endif`) to identify whether the run happens in dev, prod, CI, and more. * As an alternative to environment variables, you can use `target.name`. For more information, refer to [About target variables](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md).
To allow the database/schema/object name to depend on the current branch, you can use the out-of-the-box `DBT_CLOUD_GIT_BRANCH` environment variable, one of dbt's [special environment variables](https://docs.getdbt.com/docs/build/environment-variables.md#special-environment-variables). #### Example use cases[​](#example-use-cases "Direct link to Example use cases") Here are some typical examples we've encountered with dbt users leveraging these three macros and different logic. note The following examples are not comprehensive and do not cover all the available options. They are meant to be templates for you to develop your own behaviors. * [Use custom schema without concatenating target schema in production](https://docs.getdbt.com/guides/customize-schema-alias.md?step=3#1-custom-schemas-without-target-schema-concatenation-in-production) * [Add developer identities to tables](https://docs.getdbt.com/guides/customize-schema-alias.md?step=3#2-static-schemas-add-developer-identities-to-tables) * [Use branch name as schema prefix](https://docs.getdbt.com/guides/customize-schema-alias.md?step=3#3-use-branch-name-as-schema-prefix) * [Use a static schema for CI](https://docs.getdbt.com/guides/customize-schema-alias.md?step=3#4-use-a-static-schema-for-ci) ##### 1. Custom schemas without target schema concatenation in production[​](#1-custom-schemas-without-target-schema-concatenation-in-production "Direct link to 1. Custom schemas without target schema concatenation in production") The most common use case is using the custom schema without concatenating it with the default schema name when in production.
To do so, you can create a new file called `generate_schema_name.sql` under your macros folder with the following code: macros/generate\_schema\_name.sql ```jinja {% macro generate_schema_name(custom_schema_name, node) -%} {%- set default_schema = target.schema -%} {%- if custom_schema_name is none -%} {{ default_schema }} {%- elif env_var('DBT_ENV_TYPE','DEV') == 'PROD' -%} {{ custom_schema_name | trim }} {%- else -%} {{ default_schema }}_{{ custom_schema_name | trim }} {%- endif -%} {%- endmacro %} ``` This will generate the following outputs for a model called `my_model` with a custom schema of `marketing`, preventing any overlap of objects between dbt runs from different contexts. | Context | Target database | Target schema | Resulting object | | ----------- | --------------- | ------------- | ------------------------------------ | | Developer 1 | dev | dbt\_dev1 | dev.dbt\_dev1\_marketing.my\_model | | Developer 2 | dev | dbt\_dev2 | dev.dbt\_dev2\_marketing.my\_model | | CI PR 123 | ci | dbt\_pr\_123 | ci.dbt\_pr\_123\_marketing.my\_model | | CI PR 234 | ci | dbt\_pr\_234 | ci.dbt\_pr\_234\_marketing.my\_model | | Production | prod | analytics | prod.marketing.my\_model | note We added logic to check if the current dbt run is happening in production or not. This is important, and we explain why in the [What not to do](https://docs.getdbt.com/guides/customize-schema-alias.md?step=3#what-not-to-do) section. ##### 2. Static schemas: Add developer identities to tables[​](#2-static-schemas-add-developer-identities-to-tables "Direct link to 2. Static schemas: Add developer identities to tables") Occasionally, we run into instances where the security posture of the organization prevents developers from creating schemas and all developers have to develop in a single schema.
In this case, we can: * Change `generate_schema_name()` to use a single schema for all developers, even if a custom schema is set. * Update `generate_alias_name()` to append the developer alias and the custom schema to the front of the table name in the dev environment. * This method is not ideal, as it can cause long table names, but it will let developers see in which schema the model will be created in production. Create the following two files under your macros folder: macros/generate\_schema\_name.sql ```jinja {% macro generate_schema_name(custom_schema_name, node) -%} {%- set default_schema = target.schema -%} {%- if custom_schema_name is none -%} {{ default_schema }} {%- elif env_var('DBT_ENV_TYPE','DEV') != 'CI' -%} {{ custom_schema_name | trim }} {%- else -%} {{ default_schema }}_{{ custom_schema_name | trim }} {%- endif -%} {%- endmacro %} ``` macros/generate\_alias\_name.sql ```jinja {% macro generate_alias_name(custom_alias_name=none, node=none) -%} {%- if env_var('DBT_ENV_TYPE','DEV') == 'DEV' -%} {%- if custom_alias_name -%} {{ target.schema }}__{{ custom_alias_name | trim }} {%- elif node.version -%} {{ target.schema }}__{{ node.name ~ "_v" ~ (node.version | replace(".", "_")) }} {%- else -%} {{ target.schema }}__{{ node.name }} {%- endif -%} {%- else -%} {%- if custom_alias_name -%} {{ custom_alias_name | trim }} {%- elif node.version -%} {{ return(node.name ~ "_v" ~ (node.version | replace(".", "_"))) }} {%- else -%} {{ node.name }} {%- endif -%} {%- endif -%} {%- endmacro %} ``` This will generate the following outputs for a model called `my_model` with a custom schema of `marketing`, preventing any overlap of objects between dbt runs from different contexts.
| Context | Target database | Target schema | Resulting object | | ----------- | --------------- | ------------- | ------------------------------------ | | Developer 1 | dev | dbt\_dev1 | dev.marketing.dbt\_dev1\_my\_model | | Developer 2 | dev | dbt\_dev2 | dev.marketing.dbt\_dev2\_my\_model | | CI PR 123 | ci | dbt\_pr\_123 | ci.dbt\_pr\_123\_marketing.my\_model | | CI PR 234 | ci | dbt\_pr\_234 | ci.dbt\_pr\_234\_marketing.my\_model | | Production | prod | analytics | prod.marketing.my\_model | ##### 3. Use branch name as schema prefix[​](#3-use-branch-name-as-schema-prefix "Direct link to 3. Use branch name as schema prefix") For teams that prefer to isolate work based on the feature branch, you may want to take advantage of the `DBT_CLOUD_GIT_BRANCH` special environment variable. Please note that developers will write to the exact same schema when they are on the same feature branch. note The `DBT_CLOUD_GIT_BRANCH` variable is only available within the Studio IDE, not the dbt CLI. We’ve also seen some organizations prefer to organize their dev databases by branch name. This requires implementing similar logic in `generate_database_name()` instead of the `generate_schema_name()` macro. By default, dbt will not automatically create the databases. Refer to the [Tips and tricks](https://docs.getdbt.com/guides/customize-schema-alias.md?step=5) section to learn more.
macros/generate\_schema\_name.sql ```jinja {% macro generate_schema_name(custom_schema_name, node) -%} {%- set default_schema = target.schema -%} {%- if env_var('DBT_ENV_TYPE','DEV') == 'DEV' -%} {#- we replace characters not allowed in the schema names by "_" -#} {%- set re = modules.re -%} {%- set cleaned_branch = re.sub("\W", "_", env_var('DBT_CLOUD_GIT_BRANCH')) -%} {%- if custom_schema_name is none -%} {{ cleaned_branch }} {%- else -%} {{ cleaned_branch }}_{{ custom_schema_name | trim }} {%- endif -%} {%- else -%} {{ default_schema }}_{{ custom_schema_name | trim }} {%- endif -%} {%- endmacro %} ``` This will generate the following outputs for a model called `my_model` with a custom schema of `marketing`, preventing any overlap of objects between dbt runs from different contexts. | Context | Branch | Target database | Target schema | Resulting object | | ----------- | ------------ | --------------- | ------------- | ------------------------------------ | | Developer 1 | `featureABC` | dev | dbt\_dev1 | dev.featureABC\_marketing.my\_model | | Developer 2 | `featureABC` | dev | dbt\_dev2 | dev.featureABC\_marketing.my\_model | | Developer 1 | `feature123` | dev | dbt\_dev1 | dev.feature123\_marketing.my\_model | | CI PR 123 | | ci | dbt\_pr\_123 | ci.dbt\_pr\_123\_marketing.my\_model | | CI PR 234 | | ci | dbt\_pr\_234 | ci.dbt\_pr\_234\_marketing.my\_model | | Production | | prod | analytics | prod.marketing.my\_model | When developer 1 and developer 2 are checked out on the same branch, they will generate the same object in the data warehouse. This shouldn't be a problem as being on the same branch means the model's code will be the same for both developers. ##### 4. Use a static schema for CI[​](#4-use-a-static-schema-for-ci "Direct link to 4.
Use a static schema for CI") Some organizations prefer to write their CI jobs to a single schema, with the PR identifier prefixed to the front of the table name. Note that this will result in long table names. To do so, you can create the following two files under your macros folder: macros/generate\_schema\_name.sql ```jinja {% macro generate_schema_name(custom_schema_name=none, node=none) -%} {%- set default_schema = target.schema -%} {# If the CI Job does not exist in its own environment, use the target.name variable inside the job instead #} {# {%- if target.name == 'CI' -%} #} {%- if env_var('DBT_ENV_TYPE','DEV') == 'CI' -%} ci_schema {%- elif custom_schema_name is none -%} {{ default_schema }} {%- else -%} {{ default_schema }}_{{ custom_schema_name | trim }} {%- endif -%} {%- endmacro %} ``` macros/generate\_alias\_name.sql ```jinja {% macro generate_alias_name(custom_alias_name=none, node=none) -%} {# If the CI Job does not exist in its own environment, use the target.name variable inside the job instead #} {# {%- if target.name == 'CI' -%} #} {%- if env_var('DBT_ENV_TYPE','DEV') == 'CI' -%} {%- if custom_alias_name -%} {{ target.schema }}__{{ node.config.schema }}__{{ custom_alias_name | trim }} {%- elif node.version -%} {{ target.schema }}__{{ node.config.schema }}__{{ node.name ~ "_v" ~ (node.version | replace(".", "_")) }} {%- else -%} {{ target.schema }}__{{ node.config.schema }}__{{ node.name }} {%- endif -%} {%- else -%} {%- if custom_alias_name -%} {{ custom_alias_name | trim }} {%- elif node.version -%} {{ return(node.name ~ "_v" ~ (node.version | replace(".", "_"))) }} {%- else -%} {{ node.name }} {%- endif -%} {%- endif -%} {%- endmacro %} ``` This will generate the following outputs for a model called `my_model` with a custom schema of `marketing`, preventing any overlap of objects between dbt runs from different contexts.
| Context | Target database | Target schema | Resulting object | | ----------- | --------------- | ------------- | ------------------------------------------------ | | Developer 1 | dev | dbt\_dev1 | dev.dbt\_dev1\_marketing.my\_model | | Developer 2 | dev | dbt\_dev2 | dev.dbt\_dev2\_marketing.my\_model | | CI PR 123 | ci | dbt\_pr\_123 | ci.ci\_schema.dbt\_pr\_123\_marketing\_my\_model | | CI PR 234 | ci | dbt\_pr\_234 | ci.ci\_schema.dbt\_pr\_234\_marketing\_my\_model | | Production | prod | analytics | prod.marketing.my\_model | #### What not to do[​](#what-not-to-do "Direct link to What not to do") This section outlines what to avoid when customizing your schema and alias, and the issues that can arise. ##### Update generate\_schema\_name() to always use the custom schema[​](#update-generate_schema_name-to-always-use-the-custom-schema "Direct link to Update generate_schema_name() to always use the custom schema") Some people prefer to use only the custom schema when it is set, instead of concatenating the default schema with the custom one as happens in the out-of-the-box behavior. ##### Problem[​](#problem "Direct link to Problem") Modifying the default `generate_schema_name()` macro this way might result in the following version. macros/generate\_schema\_name.sql ```jinja {% macro generate_schema_name(custom_schema_name, node) -%} {%- set default_schema = target.schema -%} {%- if custom_schema_name is none -%} {{ default_schema }} {%- else -%} {#- The following is incorrect, as it omits the default schema before the custom schema name -#} {{ custom_schema_name | trim }} {%- endif -%} {%- endmacro %} ``` While it may provide the expected output for production, where a dedicated database is used, it will generate conflicts anywhere people share a database.
Let’s look at the example of a model called `my_model` with a custom schema of `marketing`. | Context | Target database | Target schema | Resulting object | | ----------- | --------------- | ------------- | ------------------------ | | Production | prod | analytics | prod.marketing.my\_model | | Developer 1 | dev | dbt\_dev1 | dev.marketing.my\_model | | Developer 2 | dev | dbt\_dev2 | dev.marketing.my\_model | | CI PR 123 | ci | dbt\_pr\_123 | ci.marketing.my\_model | | CI PR 234 | ci | dbt\_pr\_234 | ci.marketing.my\_model | We can see that both developer 1 and developer 2 get the same object for `my_model`. This means that if they both work on this model at the same time, it is impossible to know whether the version currently in the data warehouse comes from developer 1 or developer 2. Similarly, different PRs will result in the exact same object in the data warehouse. If different PRs are open at the same time and modify the same models, conflicts are very likely, slowing down development and code promotion. ##### Solution[​](#solution "Direct link to Solution") As described in the previous example, update the macro to check if dbt is running in production. Only in production should we remove the concatenation and use the custom schema alone. #### Tips and tricks[​](#tips-and-tricks "Direct link to Tips and tricks") This section provides some useful tips on how to properly adjust your `generate_database_name()` and `generate_alias_name()` macros. ##### Creating non-existent databases from dbt[​](#creating-non-existing-databases-from-dbt "Direct link to Creating non existing databases from dbt") dbt will automatically try to create a schema if it doesn’t exist and an object needs to be created in it, but it won’t automatically try to create a database that doesn’t exist.
So, if your `generate_database_name()` configuration points to databases that might not exist, a simple `dbt build` will fail. You can still make this work by creating macros that check whether a database exists and create it if it doesn’t. You can then call those macros either in [a `dbt run-operation ...` step](https://docs.getdbt.com/reference/commands/run-operation.md) in your jobs or as an [`on-run-start` hook](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md).

##### Assuming context using environment variables rather than `target.name`[​](#assuming-context-using-environment-variables-rather-than-targetname "Direct link to assuming-context-using-environment-variables-rather-than-targetname")

We prefer to use [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) over `target.name` to decipher the context of the dbt invocation. For further reading, have a look at [About target variables](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md).

* `target.name` cannot be set at the environment level, so every job within the environment must explicitly specify the `target.name` override. If a job does not have the appropriate `target.name` value set, the database/schema/alias may not resolve properly. Environment variable values, by contrast, are inherited by the jobs within their corresponding environment, and can still be overwritten within individual jobs if needed.

![Customize schema alias env var.](/img/docs/dbt-cloud/using-dbt-cloud/custom-schema-env-var-targetname.png?v=2 "Customize schema alias env var.")

* `target.name` requires every developer to input the same value (often ‘dev’) into the target name section of their project development credentials. If a developer doesn’t have the appropriate target name value set, their database/schema/alias may not resolve properly.
![Development credentials.](/img/docs/dbt-cloud/using-dbt-cloud/development-credentials.png?v=2 "Development credentials.")

##### Always enforce custom schemas[​](#always-enforce-custom-schemas "Direct link to Always enforce custom schemas")

Some users prefer to enforce custom schemas on all objects within their projects. This avoids writing to unintended “default” locations. You can add this logic to your `generate_schema_name()` macro to [raise a compilation error](https://docs.getdbt.com/reference/dbt-jinja-functions/exceptions.md) if a custom schema is not defined for an object.

macros/generate\_schema\_name.sql

```jinja
{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- set default_schema = target.schema -%}
    {%- set node_custom_schema = node.config.get('schema') -%}

    {%- if custom_schema_name is none and node_custom_schema is none and node.resource_type == 'model' -%}

        {{ exceptions.raise_compiler_error("Error: No Custom Schema Defined for the model " ~ node.name ) }}

    {%- elif custom_schema_name is none -%}

        {{ default_schema }}

    {%- elif env_var('DBT_ENV_TYPE','DEV') == 'PROD' -%}

        {{ custom_schema_name | trim }}

    {%- else -%}

        {{ default_schema }}_{{ custom_schema_name | trim }}

    {%- endif -%}

{%- endmacro %}
```

---

### Customizing CI/CD with custom pipelines

[Back to guides](https://docs.getdbt.com/guides.md)

dbt platform Orchestration CI Intermediate

#### Introduction[​](#introduction "Direct link to Introduction")

One of the core tenets of dbt is that analytic code should be version controlled.
This provides a ton of benefit to your organization in terms of collaboration, code consistency, stability, and the ability to roll back to a prior version. There’s an additional benefit provided by your code hosting platform that is often overlooked or underutilized. Some of you may have experience using dbt’s [webhook functionality](https://docs.getdbt.com/docs/deploy/continuous-integration.md) to run a job when a PR is created. This is a fantastic capability, and meets most use cases for testing your code before merging to production. However, there are circumstances when an organization needs additional functionality, like running workflows on every commit (linting), or running workflows after a merge is complete. In this article, we will show you how to set up custom pipelines to lint your project and trigger a dbt job via the API.

A note on parlance: since each code hosting platform uses different terms for similar concepts, the terms `pull request` (PR) and `merge request` (MR) are used interchangeably to mean the process of merging one branch into another branch.

##### What are pipelines?[​](#what-are-pipelines "Direct link to What are pipelines?")

Pipelines (known by many names, such as workflows, actions, or build steps) are a series of pre-defined jobs that are triggered by specific events in your repository (PR created, commit pushed, branch merged, and so on). Those jobs can do pretty much anything your heart desires, assuming you have the proper security access and coding chops. Jobs are executed on [runners](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions#runners), which are virtual servers. The runners come pre-configured with Ubuntu Linux, macOS, or Windows. That means the commands you execute are determined by the operating system of your runner.
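To make these pieces concrete (the event trigger, the runner's operating system, and the job steps), here is a minimal sketch of a workflow, using GitHub Actions syntax as one example among the platforms this guide covers; the file and job names are illustrative.

```yaml
# .github/workflows/example.yml (illustrative name)
name: example pipeline

# the event that triggers the pipeline
on:
  push:

jobs:
  say-hello:
    # the runner's operating system determines which commands are available
    runs-on: ubuntu-latest
    steps:
      - run: echo "Hello from a runner"
```

Every pipeline in this guide follows this same shape: a trigger, a runner, and a list of steps.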
You’ll see how this comes into play later in the setup, but for now just remember that your code is executed on virtual servers that are, typically, hosted by the code hosting platform.

![Diagram of how pipelines work](/assets/images/pipeline-diagram-25fbe103dc697bf0237ff92be4a993db.png)

Please note, runners hosted by your code hosting platform provide a certain amount of free time. After that, billing charges may apply depending on how your account is set up. You also have the ability to host your own runners. That is beyond the scope of this article, but check out the links below for more information if you’re interested in setting that up:

* Repo-hosted runner billing information:
  * [GitHub](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions)
  * [GitLab](https://docs.gitlab.com/ee/ci/pipelines/cicd_minutes.html)
  * [Bitbucket](https://bitbucket.org/product/features/pipelines#)
* Self-hosted runner information:
  * [GitHub](https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners)
  * [GitLab](https://docs.gitlab.com/runner/)
  * [Bitbucket](https://support.atlassian.com/bitbucket-cloud/docs/runners/)

Additionally, if you’re using the free tier of GitLab you can still follow this guide, but it may ask you to provide a credit card to verify your account.
You’ll see something like this the first time you try to run a pipeline:

*(Image: warning from GitLab showing payment information is required.)*

##### How to setup pipelines[​](#how-to-setup-pipelines "Direct link to How to setup pipelines")

This guide provides details for multiple code hosting platforms. Where steps are unique, they are presented without a selection option. If code is specific to a platform (e.g. GitHub, GitLab, Bitbucket) you will see a selection option for each.

Pipelines can be triggered by various events. The [dbt webhook](https://docs.getdbt.com/docs/deploy/continuous-integration.md) process already triggers a run if you want to run your jobs on a merge request, so this guide focuses on running pipelines for every push and when PRs are merged. Since pushes happen frequently in a project, we’ll keep this job super simple and fast by linting with SQLFluff. The pipeline that runs on merge requests will run less frequently, and can be used to call the dbt API to trigger a specific job. This can be helpful if you have specific requirements that need to happen when code is updated in production, like running a `--full-refresh` on all impacted incremental models.

Here’s a quick look at what this pipeline will accomplish:

![Diagram showing the pipelines to be created and the programs involved](/assets/images/pipeline-programs-diagram-c05dd62e86c2d0dfbea0241644c8afa2.png)

#### Run a dbt job on merge[​](#run-a-dbt-job-on-merge "Direct link to Run a dbt job on merge")

This job will take a bit more to set up, but is a good example of how to call the dbt API from a CI/CD pipeline. The concepts presented here can be generalized and used in whatever way best suits your use case.
Run on merge

If your Git provider has a native integration with dbt, you can take advantage of setting up [Merge jobs](https://docs.getdbt.com/docs/deploy/merge-jobs.md) in the UI. The setup below shows how to call the dbt API to run a job every time there's a push to your main branch (the branch where pull requests are typically merged; commonly referred to as the main, primary, or master branch, but it can be named differently).

##### 1. Get your dbt API key[​](#1-get-your-dbt-api-key "Direct link to 1. Get your dbt API key")

When running a CI/CD pipeline you’ll want to use a service token instead of any individual’s API key. There are [detailed docs](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) available on this, but below is a quick rundown (this must be performed by an Account Admin):

1. Log in to your dbt account.
2. Click your account name at the bottom left-hand menu and go to **Account settings**.
3. Click [**Service tokens**](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) on the left.
4. Click **+ Create service token** to create a new token specifically for CI/CD API calls.
5. Name your token something like “CICD Token”.
6. Click the **+Add permission** button under **Access**, and grant this token the **Job Admin** permission.
7. Click **Save** and you’ll see a grey box appear with your token. Copy that and save it somewhere safe (this is a password, and should be treated as such).

![View of the dbt page where service tokens are created](/img/guides/orchestration/custom-cicd-pipelines/dbt-service-token-page.png?v=2 "View of the dbt page where service tokens are created")

![Creating a new service token](/img/guides/orchestration/custom-cicd-pipelines/dbt-new-service-token-page.png?v=2 "Creating a new service token")

##### 2. Put your dbt API key into your repo[​](#2-put-your-dbt-api-key-into-your-repo "Direct link to 2. Put your dbt API key into your repo")

This next part will happen in your code hosting platform. We need to save your API key from above into a repository secret so the job we create can access it. It is **not** recommended to ever save passwords or API keys in your code, so this step ensures that your key stays secure, but is still usable for your pipelines.

* GitHub
* GitLab
* Azure DevOps
* Bitbucket

- Open up your repository where you want to run the pipeline (the same one that houses your dbt project).
- Click *Settings* to open up the repository options.
- On the left click the *Secrets and variables* dropdown in the *Security* section.
- From that list, click on *Actions*.
- Towards the middle of the screen, click the *New repository secret* button.
- It will ask you for a name, so let’s call ours `DBT_API_KEY`.
  - **It’s very important that you copy/paste this name exactly because it’s used in the scripts below.**
- In the *Secret* section, paste in the key you copied from dbt.
- Click *Add secret* and you’re all set!

**A quick note on security:** while using a repository secret is the most straightforward way to set up this secret, there are other options available to you in GitHub. They’re beyond the scope of this guide, but could be helpful if you need to create a more secure environment for running actions.
Check out GitHub’s documentation on secrets [here](https://docs.github.com/en/actions/security-guides/encrypted-secrets).

* Open up your repository where you want to run the pipeline (the same one that houses your dbt project)
* Click *Settings* > *CI/CD*
* Under the *Variables* section, click *Expand*, then click *Add variable*
* It will ask you for a name, so let’s call ours `DBT_API_KEY`
  * **It’s very important that you copy/paste this name exactly because it’s used in the scripts below.**
* In the *Value* section, paste in the key you copied from dbt
* Make sure the check box next to *Protect variable* is unchecked, and the box next to *Mask variable* is selected (see below)
  * “Protected” means that the variable is only available in pipelines that run on protected branches or protected tags; that won’t work for us because we want to run this pipeline on multiple branches. “Masked” means that it will be available to your pipeline runner, but will be masked in the logs.

![View of the GitLab window for entering DBT_API_KEY](/img/guides/orchestration/custom-cicd-pipelines/dbt-api-key-gitlab.png?v=2 "View of the GitLab window for entering DBT_API_KEY")

In Azure:

* Open up your Azure DevOps project where you want to run the pipeline (the same one that houses your dbt project)
* Click on *Pipelines* and then *Create Pipeline*
* Select where your git code is located. It should be *Azure Repos Git*
* Select your git repository from the list
* Select *Starter pipeline* (this will be updated later in Step 4)
* Click on *Variables* and then *New variable*
* In the *Name* field, enter the `DBT_API_KEY`
  * **It’s very important that you copy/paste this name exactly because it’s used in the scripts below.**
* In the *Value* section, paste in the key you copied from dbt
* Make sure the check box next to *Keep this value secret* is checked.
This will mask the value in logs, and you won't be able to see the value for the variable in the UI.

* Click *OK* and then *Save* to save the variable
* Save your new Azure pipeline

![View of the Azure pipelines window for entering DBT_API_KEY](/img/guides/orchestration/custom-cicd-pipelines/dbt-api-key-azure.png?v=2 "View of the Azure pipelines window for entering DBT_API_KEY")

In Bitbucket:

* Open up your repository where you want to run the pipeline (the same one that houses your dbt project)
* In the left menu, click *Repository Settings*
* Scroll to the bottom of the left menu, and select *Repository variables*
* In the *Name* field, input `DBT_API_KEY`
  * **It’s very important that you copy/paste this name exactly because it’s used in the scripts below.**
* In the *Value* section, paste in the key you copied from dbt
* Make sure the check box next to *Secured* is checked. This will mask the value in logs, and you won't be able to see the value for the variable in the UI.
* Click *Add* to save the variable

![View of the Bitbucket window for entering DBT_API_KEY](/assets/images/dbt-api-key-bitbucket-8b71a7b1de5da9986737d3cf494ff1d8.png)

##### 3. Create script to trigger dbt job via an API call[​](#3-create-script-to-trigger-dbt-job-via-an-api-call "Direct link to 3. Create script to trigger dbt job via an API call")

In your project, create a new folder at the root level named `python`. In that folder, create a file named `run_and_monitor_dbt_job.py`. You'll copy/paste the contents from this [gist](https://gist.github.com/b-per/f4942acb8584638e3be363cb87769b48) into that file.

```yaml
my_awesome_project
├── python
│   └── run_and_monitor_dbt_job.py
```

This Python file has everything you need to call the dbt API, but requires a few inputs (see the snippet below). Those inputs are fed to this script through environment variables that will be defined in the next step.
```python
#------------------------------------------------------------------------------
# get environment variables
#------------------------------------------------------------------------------
api_base        = os.getenv('DBT_URL', 'https://cloud.getdbt.com/') # default to multitenant url
job_cause       = os.getenv('DBT_JOB_CAUSE', 'API-triggered job') # default to generic message
git_branch      = os.getenv('DBT_JOB_BRANCH', None) # default to None
schema_override = os.getenv('DBT_JOB_SCHEMA_OVERRIDE', None) # default to None
api_key         = os.environ['DBT_API_KEY'] # no default here, just throw an error if key not provided
account_id      = os.environ['DBT_ACCOUNT_ID'] # no default here, just throw an error if id not provided
project_id      = os.environ['DBT_PROJECT_ID'] # no default here, just throw an error if id not provided
job_id          = os.environ['DBT_PR_JOB_ID'] # no default here, just throw an error if id not provided
```

**Required input:**

In order to call the dbt API, there are a few pieces of info the script needs. The easiest way to get these values is to open up the job you want to run in dbt. The URL when you’re inside the job has all the values you need:

* `DBT_ACCOUNT_ID` - this is the number just after `accounts/` in the URL
* `DBT_PROJECT_ID` - this is the number just after `projects/` in the URL
* `DBT_PR_JOB_ID` - this is the number just after `jobs/` in the URL

![Image of a dbt job URL with the pieces for account, project, and job highlighted](/assets/images/dbt-cloud-job-url-30ca274dcf77589fb60b72371b59597c.png)

##### 4. Update your project to include the new API call[​](#4-update-your-project-to-include-the-new-api-call "Direct link to 4. Update your project to include the new API call")

* GitHub
* GitLab
* Azure DevOps
* Bitbucket

For this new job, we'll add a file for the dbt API call named `dbt_run_on_merge.yml`.
```yaml
my_awesome_project
├── python
│   └── run_and_monitor_dbt_job.py
├── .github
│   ├── workflows
│   │   ├── dbt_run_on_merge.yml
│   │   └── lint_on_push.yml
```

The YAML file will look pretty similar to our earlier job, but there is a new section called `env` that we’ll use to pass in the required variables. Update the variables below to match your setup based on the comments in the file.

It’s worth noting that we changed the `on:` section to now run **only** when there are pushes to a branch named `main` (for example, a pull request is merged). Have a look through [GitHub documentation](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows) on these filters for additional use cases.

For information about `github` context property names and their use cases, refer to the [GitHub documentation](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/accessing-contextual-information-about-workflow-runs).

```yaml
name: run dbt job on push

# This filter says only run this job when there is a push to the main branch
# This works off the assumption that you've restricted this branch to only allow PRs to push to the default branch
# Update the name to match the name of your default branch

on:
  push:
    branches:
      - 'main'

jobs:
  # the job calls the dbt API to run a job
  run_dbt_cloud_job:
    name: Run dbt Job
    runs-on: ubuntu-latest

    # Set the environment variables needed for the run
    env:
      DBT_ACCOUNT_ID: 00000 # enter your account id
      DBT_PROJECT_ID: 00000 # enter your project id
      DBT_PR_JOB_ID: 00000 # enter your job id
      DBT_API_KEY: ${{ secrets.DBT_API_KEY }}
      DBT_URL: https://cloud.getdbt.com # enter a URL that matches your job
      DBT_JOB_CAUSE: 'GitHub Pipeline CI Job'
      DBT_JOB_BRANCH: ${{ github.head_ref }} # Resolves to the head_ref or source branch of the pull request in a workflow run.

    steps:
      - uses: "actions/checkout@v4"
      - uses: "actions/setup-python@v5"
        with:
          python-version: "3.9"
      - name: Run dbt job
        run: "python python/run_and_monitor_dbt_job.py"
```

For this job, we'll set it up using the `.gitlab-ci.yml` file as in the prior step (see Step 1 of the linting setup for more info). The YAML file will look pretty similar to our earlier job, but there is a new section called `variables` that we’ll use to pass in the required variables to the Python script. Update this section to match your setup based on the comments in the file.

Please note that the `rules:` section now says to run **only** when there are pushes to a branch named `main`, such as a PR being merged. Have a look through [GitLab’s docs](https://docs.gitlab.com/ee/ci/yaml/#rules) on these filters for additional use cases.

* Only dbt job
* Lint and dbt job

```yaml
image: python:3.9

variables:
  DBT_ACCOUNT_ID: 00000 # enter your account id
  DBT_PROJECT_ID: 00000 # enter your project id
  DBT_PR_JOB_ID: 00000 # enter your job id
  DBT_API_KEY: $DBT_API_KEY # secret variable in gitlab account
  DBT_URL: https://cloud.getdbt.com
  DBT_JOB_CAUSE: 'GitLab Pipeline CI Job'
  DBT_JOB_BRANCH: $CI_COMMIT_BRANCH

stages:
  - build

# this job calls the dbt API to run a job
run-dbt-cloud-job:
  stage: build
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == 'main'
  script:
    - python python/run_and_monitor_dbt_job.py
```

```yaml
image: python:3.9

variables:
  DBT_ACCOUNT_ID: 00000 # enter your account id
  DBT_PROJECT_ID: 00000 # enter your project id
  DBT_PR_JOB_ID: 00000 # enter your job id
  DBT_API_KEY: $DBT_API_KEY # secret variable in gitlab account
  DBT_URL: https://cloud.getdbt.com
  DBT_JOB_CAUSE: 'GitLab Pipeline CI Job'
  DBT_JOB_BRANCH: $CI_COMMIT_BRANCH

stages:
  - pre-build
  - build

# this job runs SQLFluff with a specific set of rules
# note the dialect is set to Snowflake, so make that specific to your setup
# details on linter rules: https://docs.sqlfluff.com/en/stable/rules.html
lint-project:
  stage: pre-build
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH != 'main'
  script:
    - python -m pip install sqlfluff==0.13.1
    - sqlfluff lint models --dialect snowflake --rules L019,L020,L021,L022

# this job calls the dbt API to run a job
run-dbt-cloud-job:
  stage: build
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == 'main'
  script:
    - python python/run_and_monitor_dbt_job.py
```

For this new job, open the existing Azure pipeline you created above and select the *Edit* button. We'll want to edit the corresponding Azure pipeline YAML file with the appropriate configuration, instead of the starter code, along with including a `variables` section to pass in the required variables.

Copy the below YAML file into your Azure pipeline and update the variables below to match your setup based on the comments in the file. It's worth noting that we changed the `trigger` section so that it will run **only** when there are pushes to a branch named `main` (like a PR merged to your main branch). Read through [Azure's docs](https://learn.microsoft.com/en-us/azure/devops/pipelines/build/triggers?view=azure-devops) on these filters for additional use cases.
```yaml
name: Run dbt Job

trigger: [ main ] # runs on pushes to main

variables:
  DBT_URL: https://cloud.getdbt.com # no trailing slash, adjust this accordingly for single-tenant deployments
  DBT_JOB_CAUSE: 'Azure Pipeline CI Job' # provide a descriptive job cause here for easier debugging down the road
  DBT_ACCOUNT_ID: 00000 # enter your account id
  DBT_PROJECT_ID: 00000 # enter your project id
  DBT_PR_JOB_ID: 00000 # enter your job id

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.7'
    displayName: 'Use Python 3.7'

  - script: |
      python -m pip install requests
    displayName: 'Install python dependencies'

  - script: |
      python -u ./python/run_and_monitor_dbt_job.py
    displayName: 'Run dbt job'
    env:
      DBT_API_KEY: $(DBT_API_KEY) # set this value as a secret in the Azure Pipelines web UI
```

For this job, we'll set it up using the `bitbucket-pipelines.yml` file as in the prior step (see Step 1 of the linting setup for more info). The YAML file will look similar to our earlier job, but we'll pass the required variables to the Python script using `export` statements. Update this section to match your setup based on the comments in the file.
* Only dbt job
* Lint and dbt job

```yaml
image: python:3.11.1

pipelines:
  branches:
    'main': # override if your default branch doesn't run on a branch named "main"
      - step:
          name: 'Run dbt Job'
          script:
            - export DBT_URL="https://cloud.getdbt.com" # if you have a single-tenant deployment, adjust this accordingly
            - export DBT_JOB_CAUSE="Bitbucket Pipeline CI Job"
            - export DBT_ACCOUNT_ID=00000 # enter your account id here
            - export DBT_PROJECT_ID=00000 # enter your project id here
            - export DBT_PR_JOB_ID=00000 # enter your job id here
            - python python/run_and_monitor_dbt_job.py
```

```yaml
image: python:3.11.1

pipelines:
  branches:
    '**': # this sets a wildcard to run on every branch unless specified by name below
      - step:
          name: Lint dbt project
          script:
            - python -m pip install sqlfluff==0.13.1
            - sqlfluff lint models --dialect snowflake --rules L019,L020,L021,L022
    'main': # override if your default branch doesn't run on a branch named "main"
      - step:
          name: 'Run dbt Job'
          script:
            - export DBT_URL="https://cloud.getdbt.com" # if you have a single-tenant deployment, adjust this accordingly
            - export DBT_JOB_CAUSE="Bitbucket Pipeline CI Job"
            - export DBT_ACCOUNT_ID=00000 # enter your account id here
            - export DBT_PROJECT_ID=00000 # enter your project id here
            - export DBT_PR_JOB_ID=00000 # enter your job id here
            - python python/run_and_monitor_dbt_job.py
```

##### 5. Test your new action[​](#5-test-your-new-action "Direct link to 5. Test your new action")

Now that you have a shiny new action, it's time to test it out! Since this change is set up to run only on merges to your default branch, you'll need to create and merge this change into your main branch. Once you do that, you'll see a new pipeline job has been triggered to run the dbt job you assigned in the variables section.

Additionally, you'll see the job in the run history of dbt. It should be fairly easy to spot because it will say it was triggered by the API, and the *INFO* section will have the branch you used for this guide.
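For readers curious what the helper script is doing under the hood: `python/run_and_monitor_dbt_job.py` essentially wraps dbt's "trigger job run" API endpoint using the environment variables set above. Here is a minimal stdlib sketch of that idea — the function names and the polling details are illustrative, not the script's actual contents:

```python
# Illustrative sketch of triggering a dbt job over the API (stdlib only).
# Function names here are hypothetical; see the real script for the full logic.
import json
import urllib.request


def run_url(base_url, account_id, job_id):
    """Build the v2 'trigger job run' endpoint for an account/job pair."""
    return f"{base_url}/api/v2/accounts/{account_id}/jobs/{job_id}/run/"


def run_payload(cause, branch=None, schema_override=None):
    """Assemble the trigger body; branch/schema overrides are optional."""
    body = {"cause": cause}
    if branch:
        body["git_branch"] = branch
    if schema_override:
        body["schema_override"] = schema_override
    return body


def trigger_job(base_url, account_id, job_id, api_key, cause):
    """Fire the POST request (network call; needs a real API key)."""
    req = urllib.request.Request(
        run_url(base_url, account_id, job_id),
        data=json.dumps(run_payload(cause)).encode(),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]["id"]  # run id to poll for status
```

The real script additionally polls the run's status endpoint until it reaches a terminal state and exits non-zero on failure, which is what lets the CI step pass or fail along with the dbt job.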
* GitHub * GitLab * Azure DevOps * Bitbucket [![dbt run on merge job in GitHub](/img/guides/orchestration/custom-cicd-pipelines/dbt-run-on-merge-github.png?v=2 "dbt run on merge job in GitHub")](#)dbt run on merge job in GitHub [![dbt job showing it was triggered by GitHub](/img/guides/orchestration/custom-cicd-pipelines/dbt-cloud-job-github-triggered.png?v=2 "dbt job showing it was triggered by GitHub")](#)dbt job showing it was triggered by GitHub [![dbt run on merge job in GitLab](/img/guides/orchestration/custom-cicd-pipelines/dbt-run-on-merge-gitlab.png?v=2 "dbt run on merge job in GitLab")](#)dbt run on merge job in GitLab [![dbt job showing it was triggered by GitLab](/img/guides/orchestration/custom-cicd-pipelines/dbt-cloud-job-gitlab-triggered.png?v=2 "dbt job showing it was triggered by GitLab")](#)dbt job showing it was triggered by GitLab [![dbt run on merge job in ADO](/img/guides/orchestration/custom-cicd-pipelines/dbt-run-on-merge-azure.png?v=2 "dbt run on merge job in ADO")](#)dbt run on merge job in ADO [![ADO-triggered job in dbt](/img/guides/orchestration/custom-cicd-pipelines/dbt-cloud-job-azure-triggered.png?v=2 "ADO-triggered job in dbt")](#)ADO-triggered job in dbt [![dbt run on merge job in Bitbucket](/img/guides/orchestration/custom-cicd-pipelines/dbt-run-on-merge-bitbucket.png?v=2 "dbt run on merge job in Bitbucket")](#)dbt run on merge job in Bitbucket [![dbt job showing it was triggered by Bitbucket](/img/guides/orchestration/custom-cicd-pipelines/dbt-cloud-job-bitbucket-triggered.png?v=2 "dbt job showing it was triggered by Bitbucket")](#)dbt job showing it was triggered by Bitbucket #### Run a dbt job on pull request[​](#run-a-dbt-job-on-pull-request "Direct link to Run a dbt job on pull request") If your git provider is not one with a native integration with dbt, but you still want to take advantage of CI builds, you've come to the right spot! 
With just a bit of work, it's possible to set up a job that will run a dbt job when a pull request (PR) is created.

**Run on PR**

If your git provider has a native integration with dbt, you can take advantage of the setup instructions [here](https://docs.getdbt.com/docs/deploy/ci-jobs.md). This section is only for those projects that connect to their git repository using an SSH key.

The setup for this pipeline will use the same steps as the prior page. Before moving on, follow steps 1-5 from the [prior page](https://docs.getdbt.com/guides/custom-cicd-pipelines.md?step=2).

##### 1. Create a pipeline job that runs when PRs are created[​](#1-create-a-pipeline-job-that-runs-when-prs-are-created "Direct link to 1. Create a pipeline job that runs when PRs are created")

* Bitbucket

For this job, we'll set it up using the `bitbucket-pipelines.yml` file as in the prior step. The YAML file will look similar to our earlier job, but we'll pass the required variables to the Python script using `export` statements. Update this section to match your setup based on the comments in the file.

**What is this pipeline going to do?**
The setup below will trigger a dbt job to run every time a PR is opened in this repository. It will also run a fresh version of the pipeline for every commit that is made on the PR until it is merged. For example: if you open a PR, it will run the pipeline. If you then decide additional changes are needed, and commit/push to the PR branch, a new pipeline will run with the updated code.

The following variables control this job:

* `DBT_JOB_BRANCH`: Tells the dbt job to run the code in the branch that created this PR
* `DBT_JOB_SCHEMA_OVERRIDE`: Tells the dbt job to run this into a custom target schema
  * The format of this will look like: `DBT_CLOUD_PR_{REPO_KEY}_{PR_NUMBER}`

```yaml
image: python:3.11.1

pipelines:
  # This job will run when pull requests are created in the repository
  pull-requests:
    '**':
      - step:
          name: 'Run dbt PR Job'
          script:
            # Check to only build if the PR destination is main (or another branch).
            # Comment out or remove the line below if you want to run on all PRs regardless of destination branch.
            - if [ "${BITBUCKET_PR_DESTINATION_BRANCH}" != "main" ]; then printf 'PR destination is not main, exiting.'; exit; fi
            - export DBT_URL="https://cloud.getdbt.com"
            - export DBT_JOB_CAUSE="Bitbucket Pipeline CI Job"
            - export DBT_JOB_BRANCH=$BITBUCKET_BRANCH
            - export DBT_JOB_SCHEMA_OVERRIDE="DBT_CLOUD_PR_"$BITBUCKET_PROJECT_KEY"_"$BITBUCKET_PR_ID
            - export DBT_ACCOUNT_ID=00000 # enter your account id here
            - export DBT_PROJECT_ID=00000 # enter your project id here
            - export DBT_PR_JOB_ID=00000 # enter your job id here
            - python python/run_and_monitor_dbt_job.py
```

##### 2. Confirm the pipeline runs[​](#2-confirm-the-pipeline-runs "Direct link to 2. Confirm the pipeline runs")

Now that you have a new pipeline, it's time to run it and make sure it works. Since this only triggers when a PR is created, you'll need to create a new PR on a branch that contains the code above.
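To make the variable wiring concrete, here's a tiny sketch (with hypothetical helper names) of how the Bitbucket-provided values combine into the schema override and the destination-branch guard used in the pipeline above:

```python
# Sketch of the two pieces of PR logic above; helper names are hypothetical.


def pr_schema_override(project_key, pr_id):
    """Mirror the export line: DBT_CLOUD_PR_{REPO_KEY}_{PR_NUMBER}."""
    return f"DBT_CLOUD_PR_{project_key}_{pr_id}"


def should_build(destination_branch, protected_branch="main"):
    """Mirror the guard: only build PRs that target the protected branch."""
    return destination_branch == protected_branch
```

For example, `pr_schema_override("JAFFLE", 17)` yields `DBT_CLOUD_PR_JAFFLE_17`, which matches the naming pattern the cleanup macro later searches for with `ilike 'DBT_CLOUD_PR%'`.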
Once you do that, you should see a pipeline that looks like this:

* Bitbucket

Bitbucket pipeline:

![dbt run on PR job in Bitbucket](/assets/images/bitbucket-run-on-pr-1887d932eaa80e51157249beef6114a3.png)

dbt job:

![dbt job showing it was triggered by Bitbucket](/assets/images/bitbucket-dbt-cloud-pr-1453e2c293a941eec4561ab9ef045a05.png)

##### 3. Handle those extra schemas in your database[​](#3-handle-those-extra-schemas-in-your-database "Direct link to 3. Handle those extra schemas in your database")

As noted above, when the PR job runs it will create a new schema based on the PR. To avoid having your database overwhelmed with PR schemas, consider adding a "cleanup" job to your dbt account. This job can run on a scheduled basis to clean up any PR schemas that haven't been updated/used recently.

Add this as a macro to your project. It takes two arguments that let you control which schemas get dropped:

* `age_in_days`: The number of days since the schema was last altered before it should be dropped (default 10 days)
* `database_to_clean`: The name of the database to remove schemas from

```sql
{#
    This macro finds PR schemas older than a set date and drops them.
    The macro defaults to 10 days old, but can be configured with the input argument age_in_days.
    Sample usage with different date:
        dbt run-operation pr_schema_cleanup --args "{'database_to_clean': 'analytics','age_in_days':'15'}"
#}
{% macro pr_schema_cleanup(database_to_clean, age_in_days=10) %}

    {% set find_old_schemas %}
        select
            'drop schema {{ database_to_clean }}.'||schema_name||';'
        from {{ database_to_clean }}.information_schema.schemata
        where
            catalog_name = '{{ database_to_clean | upper }}'
            and schema_name ilike 'DBT_CLOUD_PR%'
            and last_altered <= (current_date() - interval '{{ age_in_days }} days')
    {% endset %}

    {% if execute %}

        {{ log('Schema drop statements:', True) }}

        {% set schema_drop_list = run_query(find_old_schemas).columns[0].values() %}

        {% for schema_to_drop in schema_drop_list %}
            {% do run_query(schema_to_drop) %}
            {{ log(schema_to_drop, True) }}
        {% endfor %}

    {% endif %}

{% endmacro %}
```

This macro goes into a dbt job that is run on a schedule. The command will look like this (text below for copy/paste):

![dbt job showing the run operation command for the cleanup macro](/assets/images/dbt-macro-cleanup-pr-c053bfe70d3bc2d4aefa3211713238ce.png)

`dbt run-operation pr_schema_cleanup --args "{'database_to_clean': 'development','age_in_days': 15}"`

#### Consider risk of conflicts when using multiple orchestration tools[​](#consider-risk-of-conflicts-when-using-multiple-orchestration-tools "Direct link to Consider risk of conflicts when using multiple orchestration tools")

Running dbt jobs through a CI/CD pipeline is a form of job orchestration. If you also run jobs using dbt's built-in scheduler, you now have two orchestration tools running jobs. The risk is that you could run into conflicts: if you trigger a pipeline on certain actions while also running scheduled jobs in dbt, you may see job clashes. The more tools you have, the more you have to make sure everything talks to each other.

That being said, if **the only reason you want to use pipelines is for adding a lint check or run on merge**, you might decide the pros outweigh the cons, and as such you want to go with a hybrid approach. Just keep in mind that if two processes try to run the same job at the same time, dbt will queue the jobs and run one after the other. It's a balancing act, but it can be accomplished with diligence to ensure you're orchestrating jobs in a manner that does not conflict.
---

### Debug errors

[Back to guides](https://docs.getdbt.com/guides.md)

Troubleshooting · dbt Core · dbt platform · Beginner

#### General process of debugging[​](#general-process-of-debugging "Direct link to General process of debugging")

Learning how to debug is a skill, and one that will make you great at your role!

1. Read the error message — when writing the code behind dbt, we try our best to make error messages as useful as we can. The error message dbt produces will normally contain the type of error (more on these error types below), and the file where the error occurred.
2. Inspect the file that was known to cause the issue, and see if there's an immediate fix.
3. Isolate the problem — for example, by running one model at a time, or by undoing the code that broke things.
4. Get comfortable with compiled files and the logs.
   * The `target/compiled` directory contains `select` statements that you can run in any query editor.
   * The `target/run` directory contains the SQL dbt executes to build your models.
   * The `logs/dbt.log` file contains all the queries that dbt runs, and additional logging. Recent errors will be at the bottom of the file.
   * **dbt users**: Use the above, or the `Details` tab in the command output.
   * **dbt Core users**: Note that your code editor *may* be hiding these files from the tree view ([VSCode help](https://stackoverflow.com/questions/42891463/how-can-i-show-ignored-files-in-visual-studio-code)).
5. If you are really stuck, try [asking for help](https://docs.getdbt.com/community/resources/getting-help.md). Before doing so, take the time to write your question well so that others can diagnose the problem quickly.

#### Types of errors[​](#types-of-errors "Direct link to Types of errors")

Below, we've listed some common errors. It's useful to understand what dbt is doing behind the scenes when you execute a command like `dbt run`.
| Step | Description | Error type |
| ---------------- | ---------------------------------------------------------------------------------------- | ------------------- |
| Initialize | Check that this is a dbt project, and that dbt can connect to the warehouse | `Runtime Error` |
| Parsing | Check that the Jinja snippets in `.sql` files are valid, and that `.yml` files are valid. | `Compilation Error` |
| Graph validation | Compile the dependencies into a graph. Check that it's acyclic. | `Dependency Error` |
| SQL execution | Run the models | `Database Error` |

Let's dive into some of these errors and how to debug 👇. Note: not all errors are covered here!

#### Runtime Errors[​](#runtime-errors "Direct link to Runtime Errors")

*Note: If you're using the Studio IDE to work on your project, you're unlikely to encounter these errors.*

##### Not a dbt project[​](#not-a-dbt-project "Direct link to Not a dbt project")

```text
Running with dbt=1.7.1
Encountered an error:
Runtime Error
  fatal: Not a dbt project (or any of the parent directories). Missing dbt_project.yml file
```

Debugging

* Use `pwd` to check that you're in the right directory. If not, `cd` your way there!
* Check that you have a file named `dbt_project.yml` in the root directory of your project. You can use `ls` to list files in the directory, or open the directory in a code editor and see files in the "tree view".

##### Could not find profile[​](#could-not-find-profile "Direct link to Could not find profile")

```text
Running with dbt=1.7.1
Encountered an error:
Runtime Error
  Could not run dbt
  Could not find profile named 'jaffle_shops'
```

Debugging

* Check the `profile:` key in your `dbt_project.yml`. For example, this project uses the `jaffle_shops` (note plural) profile:

dbt_project.yml

```yml
profile: jaffle_shops # note the plural
```

* Check the profiles you have in your `profiles.yml` file.
For example, this profile is named `jaffle_shop` (note singular).

profiles.yml

```yaml
jaffle_shop: # this does not match the profile: key
  target: dev
  outputs:
    dev:
      type: postgres
      schema: dbt_alice
      ... # other connection details
```

* Update these so that they match.
* If you can't find your `profiles.yml` file, run `dbt debug --config-dir` for help:

```text
$ dbt debug --config-dir
Running with dbt=1.7.1
To view your profiles.yml file, run:

open /Users/alice/.dbt
```

* Then execute `open /Users/alice/.dbt` (adjusting accordingly), and check that you have a `profiles.yml` file. If you do not have one, set one up using [these docs](https://docs.getdbt.com/docs/local/profiles.yml.md).

##### Failed to connect[​](#failed-to-connect "Direct link to Failed to connect")

```text
Encountered an error:
Runtime Error
  Database error while listing schemas in database "analytics"
  Database Error
    250001 (08001): Failed to connect to DB: your_db.snowflakecomputing.com:443. Incorrect username or password was specified.
```

Debugging

* Open your `profiles.yml` file (if you're unsure where this is, run `dbt debug --config-dir`).
* Confirm that your credentials are correct — you may need to work with a DBA to confirm this.
* After updating the credentials, run `dbt debug` to check that you can connect:

```text
$ dbt debug
Running with dbt=1.7.1
Using profiles.yml file at /Users/alice/.dbt/profiles.yml
Using dbt_project.yml file at /Users/alice/jaffle-shop-dbt/dbt_project.yml

Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]

Required dependencies:
  - git [OK found]

Connection:
  ...
  Connection test: OK connection ok
```

##### Invalid `dbt_project.yml` file[​](#invalid-dbt_projectyml-file "Direct link to invalid-dbt_projectyml-file")

```text
Encountered an error while reading the project:
  ERROR: Runtime Error
  at path []: Additional properties are not allowed ('hello' was unexpected)
Error encountered in /Users/alice/jaffle-shop-dbt/dbt_project.yml
Encountered an error:
Runtime Error
  Could not run dbt
```

Debugging

* Open your `dbt_project.yml` file.
* Find the offending key (e.g. `hello`, as per "'hello' was unexpected"):

dbt_project.yml

```yml
name: jaffle_shop
hello: world # this is not allowed
```

* Use the reference section for [`dbt_project.yml` files](https://docs.getdbt.com/reference/dbt_project.yml.md) to correct this issue.
* If you're using a key that is valid according to the documentation, check that you're using the latest version of dbt with `dbt --version`.

#### Compilation Errors[​](#compilation-errors "Direct link to Compilation Errors")

*Note: if you're using the Studio IDE to work on your dbt project, this error often shows as a red bar in your command prompt as you work. For dbt Core users, these won't get picked up until you run `dbt run` or `dbt compile`.*

##### Invalid `ref` function[​](#invalid-ref-function "Direct link to invalid-ref-function")

```text
$ dbt run -s customers
Running with dbt=1.1.0
Encountered an error:
Compilation Error in model customers (models/customers.sql)
  Model 'model.jaffle_shop.customers' (models/customers.sql) depends on a node named 'stg_customer' which was not found
```

Debugging

* Open the `models/customers.sql` file.
* `cmd + f` (or equivalent) for `stg_customer`. There must be a file named `stg_customer.sql` for this to work.
* Replace this reference with a reference to another model (i.e. the filename for another model), in this case `stg_customers`.
OR rename your model to `stg_customer`.

##### Invalid Jinja[​](#invalid-jinja "Direct link to Invalid Jinja")

```text
$ dbt run
Running with dbt=1.7.1
Compilation Error in macro (macros/cents_to_dollars.sql)
  Reached EOF without finding a close tag for macro (searched from line 1)
```

Debugging

Here, we rely on the Jinja library to pass back an error, and then just pass it on to you. This particular example is for a forgotten `{% endmacro %}` tag, but you can also get errors like this for:

* Forgetting a closing `}`
* Closing a `for` loop before closing an `if` statement

To fix this:

* Navigate to the offending file (e.g. `macros/cents_to_dollars.sql`) as listed in the error message
* Use the error message to find your mistake

To prevent this:

* *(dbt Core only)* Use snippets to auto-complete pieces of Jinja ([atom-dbt package](https://github.com/dbt-labs/atom-dbt))

##### Invalid YAML[​](#invalid-yaml "Direct link to Invalid YAML")

dbt wasn't able to turn your YAML into a valid dictionary.

```text
$ dbt run
Running with dbt=1.7.1
Encountered an error:
Compilation Error
  Error reading jaffle_shop: schema.yml - Runtime Error
    Syntax error near line 5
    ------------------------------
    2  |
    3  | models:
    4  | - name: customers
    5  |     columns:
    6  |       - name: customer_id
    7  |         data_tests:
    8  |           - unique

    Raw Error:
    ------------------------------
    mapping values are not allowed in this context
      in "", line 5, column 12
```

Debugging

Usually, it's to do with indentation — here's the offending YAML that caused this error:

```yaml
models:
  - name: customers
      columns: # this is indented too far!
        - name: customer_id
          data_tests:
            - unique
            - not_null
```

To fix this:

* Open the offending file (e.g. `schema.yml`)
* Check the line in the error message (e.g. `line 5`)
* Find the mistake and fix it

To prevent this:

* (dbt Core users) Turn on indentation guides in your code editor to help you inspect your files
* Use a YAML validator ([example](http://www.yamllint.com/)) to debug any issues

##### Incorrect YAML spec[​](#incorrect-yaml-spec "Direct link to Incorrect YAML spec")

Slightly different error — the YAML structure is right (i.e. the YAML parser can turn it into a Python dictionary), *but* there's a key that dbt doesn't recognize.

```text
$ dbt run
Running with dbt=1.7.1
Encountered an error:
Compilation Error
  Invalid models config given in models/schema.yml @ models: {'name': 'customers', 'hello': 'world', 'columns': [{'name': 'customer_id', 'tests': ['unique', 'not_null']}], 'original_file_path': 'models/schema.yml', 'yaml_key': 'models', 'package_name': 'jaffle_shop'} - at path []: Additional properties are not allowed ('hello' was unexpected)
```

Debugging

* Open the file (e.g. `models/schema.yml`) as per the error message.
* Search for the offending key (e.g. `hello`, as per "**'hello'** was unexpected").
* Fix it. Use the [model properties](https://docs.getdbt.com/reference/model-properties.md) docs to find valid keys.
* If you are using a valid key, check that you're using the latest version of dbt with `dbt --version`.

#### Dependency Errors[​](#dependency-errors "Direct link to Dependency Errors")

```text
$ dbt run
Running with dbt=1.7.1-rc
Encountered an error:
Found a cycle: model.jaffle_shop.customers --> model.jaffle_shop.stg_customers --> model.jaffle_shop.customers
```

Your dbt DAG is not acyclic, and needs to be fixed!

* Update the `ref` functions to break the cycle.
* If you need to reference the current model, use the [`{{ this }}` variable](https://docs.getdbt.com/reference/dbt-jinja-functions/this.md) instead.

#### Database Errors[​](#database-errors "Direct link to Database Errors")

The thorniest errors of all! These errors come from your data warehouse, and dbt passes the message on.
You may need to use your warehouse docs (i.e. the Snowflake docs, or BigQuery docs) to debug these.

```text
$ dbt run
...
Completed with 1 error and 0 warnings:

Database Error in model customers (models/customers.sql)
  001003 (42000): SQL compilation error:
  syntax error line 14 at position 4 unexpected 'from'.
  compiled SQL at target/run/jaffle_shop/models/customers.sql
```

90% of the time, there's a mistake in the SQL of your model. To fix this:

1. Open the offending file:
   * **dbt:** Open the model (in this case `models/customers.sql` as per the error message).
   * **dbt Core:** Open the model as above. Also open the compiled SQL (in this case `target/run/jaffle_shop/models/customers.sql` as per the error message) — it can be useful to show these side-by-side in your code editor.
2. Try to re-execute the SQL to isolate the error:
   * **dbt:** Use the `Preview` button from the model file.
   * **dbt Core:** Copy and paste the compiled query into a query runner (e.g. the Snowflake UI, or a desktop app like DataGrip / TablePlus) and execute it.
3. Fix the mistake.
4. Rerun the failed model.

In some cases, these errors might occur as a result of queries that dbt runs "behind the scenes". These include:

* Introspective queries to list objects in your database
* Queries to `create` schemas
* `pre-hook`s, `post-hook`s, `on-run-end` hooks and `on-run-start` hooks
* For incremental models and snapshots: merge, update and insert statements

In these cases, you should check out the logs — these contain *all* the queries dbt has run.

* **dbt**: Use the `Details` in the command output to see logs, or check the `logs/dbt.log` file.
* **dbt Core**: Open the `logs/dbt.log` file.

Isolating errors in the logs

If you're hitting a strange `Database Error`, it can be a good idea to clean out your logs by opening the file and deleting the contents. Then, re-execute `dbt run` for *just* the problematic model. The logs will then have *just* the output you're looking for.
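If you'd rather not delete log contents by hand, the same isolation idea can be scripted. Here is a small sketch (the helper name and the marker string are assumptions for illustration, not part of dbt) that prints only the lines from the most recent error onward:

```python
# Print only the tail of a dbt log, starting at the last line that
# contains the marker (e.g. "Database Error"). Helper name is illustrative.


def tail_from_last_error(log_text, marker="Database Error"):
    lines = log_text.splitlines()
    hits = [i for i, line in enumerate(lines) if marker in line]
    if not hits:
        return ""
    return "\n".join(lines[hits[-1]:])


# Typical use: print(tail_from_last_error(open("logs/dbt.log").read()))
```

This keeps the full log intact while still surfacing just the most recent failure, which is handy when several models share the same log file.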
#### Common pitfalls[​](#common-pitfalls "Direct link to Common pitfalls")

##### `Preview` vs. `dbt run`[​](#preview-vs-dbt-run "Direct link to preview-vs-dbt-run")

*(Studio IDE users only)*

There are two interfaces that look similar:

* The `Preview` button executes whatever SQL statement is in the active tab. It is the equivalent of grabbing the compiled `select` statement from the `target/compiled` directory and running it in a query editor to see the results.
* The `dbt run` command builds relations in your database.

Using the `Preview` button is useful when developing models and you want to visually inspect the results of a query. However, you'll need to make sure you have executed `dbt run` for any upstream models — otherwise dbt will try to select `from` tables and views that haven't been built.

##### Forgetting to save files before running[​](#forgetting-to-save-files-before-running "Direct link to Forgetting to save files before running")

We've all been there. dbt uses the last-saved version of a file when you execute a command. In most code editors, and in the Studio IDE, a dot next to a filename indicates that a file has unsaved changes. Make sure you hit `cmd + s` (or equivalent) before running any dbt commands — over time it becomes muscle memory.

##### Editing compiled files[​](#editing-compiled-files "Direct link to Editing compiled files")

*(More likely for dbt Core users)*

If you just opened a SQL file in the `target/` directory to help debug an issue, it's not uncommon to accidentally edit that file! To avoid this, try changing your code editor settings to grey out any files in the `target/` directory — the visual cue will help you avoid the issue.

#### FAQs[​](#faqs "Direct link to FAQs")

Here are some useful FAQs to help you debug your dbt project:

* How to generate HAR files

HTTP Archive (HAR) files are used to gather data from users' browsers, which dbt Support uses to troubleshoot network or resource issues.
This information includes detailed timing information about the requests made between the browser and the server. The following sections describe how to generate HAR files using common browsers such as [Google Chrome](#google-chrome), [Mozilla Firefox](#mozilla-firefox), [Apple Safari](#apple-safari), and [Microsoft Edge](#microsoft-edge). info Remove or hide any confidential or personally identifying information before you send the HAR file to dbt Labs. You can edit the file using a text editor. ##### Google Chrome[​](#google-chrome "Direct link to Google Chrome") 1. Open Google Chrome. 2. Click on **View** --> **Developer Tools**. 3. Select the **Network** tab. 4. Ensure that Google Chrome is recording. A red button (🔴) indicates that a recording is already in progress. Otherwise, click **Record network log**. 5. Select **Preserve Log**. 6. Clear any existing logs by clicking **Clear network log** (🚫). 7. Go to the page where the issue occurred and reproduce the issue. 8. Click **Export HAR** (the down arrow icon) to export the file as HAR. The icon is located on the same row as the **Clear network log** button. 9. Save the HAR file. 10. Upload the HAR file to the dbt Support ticket thread. ##### Mozilla Firefox[​](#mozilla-firefox "Direct link to Mozilla Firefox") 1. Open Firefox. 2. Click the application menu and then **More tools** --> **Web Developer Tools**. 3. In the developer tools docked tab, select **Network**. 4. Go to the page where the issue occurred and reproduce the issue. The page automatically starts recording as you navigate. 5. When you're finished, click **Pause/Resume recording network log**. 6. Right-click anywhere in the **File** column and select **Save All as HAR**. 7. Save the HAR file. 8. Upload the HAR file to the dbt Support ticket thread. ##### Apple Safari[​](#apple-safari "Direct link to Apple Safari") 1. Open Safari. 2. In case the **Develop** menu doesn't appear in the menu bar, go to **Safari** and then **Settings**. 3. 
Click **Advanced**. 4. Select the **Show features for web developers** checkbox. 5. From the **Develop** menu, select **Show Web Inspector**. 6. Click the **Network tab**. 7. Go to the page where the issue occurred and reproduce the issue. 8. When you're finished, click **Export**. 9. Save the file. 10. Upload the HAR file to the dbt Support ticket thread. ##### Microsoft Edge[​](#microsoft-edge "Direct link to Microsoft Edge") 1. Open Microsoft Edge. 2. Click the **Settings and more** menu (...) to the right of the toolbar and then select **More tools** --> **Developer tools**. 3. Click **Network**. 4. Ensure that Microsoft Edge is recording. A red button (🔴) indicates that a recording is already in progress. Otherwise, click **Record network log**. 5. Go to the page where the issue occurred and reproduce the issue. 6. When you're finished, click **Stop recording network log**. 7. Click **Export HAR** (the down arrow icon) or press **Ctrl + S** to export the file as HAR. 8. Save the HAR file. 9. Upload the HAR file to the dbt Support ticket thread. ##### Additional resources[​](#additional-resources "Direct link to Additional resources") Check out the [How to generate a HAR file in Chrome](https://www.loom.com/share/cabdb7be338243f188eb619b4d1d79ca) video for a visual guide on how to generate HAR files in Chrome. * Reconnecting to Snowflake OAuth after authentication expires When you connect Snowflake to dbt platform using [OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md), dbt stores a refresh token. This allows your development credentials to remain usable in tools like the Studio IDE and the dbt Semantic Layer without needing to re-authenticate each time. If you see an `authentication has expired` error when you try to run queries, you must renew your connection between Snowflake and the dbt platform. To resolve the issue, complete the following steps: 1. Go to your **Profile settings** page, accessible from the navigation menu. 
2. Navigate to **Credentials** and then choose the project where you're experiencing the issue. 3. Under **Development credentials**, click the **Reconnect Snowflake Account** button. This will guide you through re-authenticating using your SSO workflow. Your Snowflake administrator can [configure the refresh token validity period](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md#create-a-security-integration), up to the maximum of 90 days. If you've tried these steps and are still getting this error, please contact the Support team at [support@getdbt.com](mailto:support@getdbt.com) for further assistance. * Receiving a 'Could not parse dbt\_project.yml' error in dbt job The error message `Could not parse dbt_project.yml: while scanning for...` in your dbt job run or development usually occurs for one of several reasons: * There's a parsing failure in a YAML file (such as tab indentation or Unicode characters). * Your `dbt_project.yml` file has missing fields or incorrect formatting. * Your `dbt_project.yml` file doesn't exist in your dbt project repository. To resolve this issue, consider the following: * Use an online YAML parser or validator to check for any parsing errors in your YAML file. Some known parsing errors include missing fields, incorrect formatting, or tab indentation. * Ensure your `dbt_project.yml` file exists. Once you've identified the issue, you can fix the error and rerun your dbt job. * How can I fix my .gitignore file? A gitignore file specifies which files Git should intentionally ignore. You can identify these files in your project by their italics formatting. If you can't revert changes, check out a branch, or click commit, this is usually due to your project missing a [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) file OR your gitignore file not containing the necessary content. To fix this, complete the following steps: 1.
In the Studio IDE, add the following [.gitignore contents](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) to your dbt project's `.gitignore` file:

```bash
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
```

2. Save your changes but *don't commit*. 3. Restart the Studio IDE by clicking on the three dots next to the **Studio IDE Status button** on the lower right of the Studio IDE. [![Restart the IDE by clicking the three dots on the lower right or click on the Status bar](/img/docs/dbt-cloud/cloud-ide/restart-ide.png?v=2 "Restart the IDE by clicking the three dots on the lower right or click on the Status bar")](#)Restart the IDE by clicking the three dots on the lower right or click on the Status bar 4. Select **Restart Studio IDE**. 5. Go back to the **File explorer** in the IDE and delete the following files or folders if you have them: * `target`, `dbt_modules`, `dbt_packages`, `logs` 6. **Save** and then **Commit and sync** your changes. 7. Restart the Studio IDE again. 8. Create a pull request (PR) under the **Version Control** menu to integrate your new changes. 9. Merge the PR on your git provider page. 10. Switch to your main branch and click on **Pull from remote** to pull in all the changes you made to your main branch. You can verify the changes by making sure the files/folders in the .gitignore file are in italics. [![A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).](/img/docs/dbt-cloud/cloud-ide/gitignore-italics.png?v=2 "A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).")](#)A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics). Refer to this [detailed video](https://www.loom.com/share/9b3b8e2b617f41a8bad76ec7e42dd014) for additional guidance.
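If you want to confirm the rules behave as expected, here's a minimal sketch that exercises the same `.gitignore` contents in a throwaway repository (it assumes `git` is installed locally and doesn't touch your real project):

```shell
# Create a scratch repo and apply the same .gitignore rules.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
printf 'target/\ndbt_packages/\nlogs/\ndbt_modules/\n' > .gitignore

# check-ignore -v prints the rule that matches each ignored path.
git check-ignore -v target/foo.sql logs/dbt.log dbt_packages/dbt_utils
```

In your actual project, running `git check-ignore -v target` from the repo root should print the matching rule from your `.gitignore`; if it prints nothing, the rules aren't being picked up.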
* I'm receiving a 'This run exceeded your account's run memory limits' error in my failed job If you're receiving a `This run exceeded your account's run memory limits` error in your failed job, it means that the job exceeded the [memory limits](https://docs.getdbt.com/docs/deploy/job-scheduler.md#job-memory) set for your account. All dbt accounts have a pod memory of 600MiB, and memory limits are applied on a per-run basis. They're typically influenced by the amount of result data that dbt has to ingest and process, which is small but can become bloated unexpectedly by project design choices. ##### Common reasons[​](#common-reasons "Direct link to Common reasons") Some common reasons for higher memory usage are: * dbt run/build: Macros that capture large result sets from `run_query` may not all be necessary and may be memory inefficient. * dbt docs generate: Source or model schemas with large numbers of tables (even if those tables aren't all used by dbt) cause the ingestion of very large results for catalog queries. ##### Resolution[​](#resolution "Direct link to Resolution") There are various reasons why you could be experiencing this error, but they are mostly the outcome of retrieving too much data back into dbt. For example, using `run_query()` operations or similar macros, or even using databases/schemas that have a lot of other non-dbt related tables/views. Try to reduce the amount of data (number of rows) retrieved back into dbt by refactoring the SQL in your `run_query()` operation using `group by`, `where`, or `limit` clauses. Additionally, you can also use a database/schema with fewer non-dbt related tables/views. Video example As an additional resource, check out [this example video](https://www.youtube.com/watch?v=sTqzNaFXiZ8), which demonstrates how to refactor the sample code by reducing the number of rows returned.
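As an illustration of that refactor, here's a hypothetical macro (the `orders` model and `status` column are made-up names) that aggregates and limits in the warehouse, so only a handful of rows come back through `run_query` instead of every raw row:

```sql
{% macro get_order_statuses() %}
    {# Aggregate and limit in the warehouse so dbt only ingests a
       small result set, rather than the full contents of the table. #}
    {% set query %}
        select status, count(*) as n_orders
        from {{ ref('orders') }}
        group by status
        limit 100
    {% endset %}

    {% set results = run_query(query) %}
    {{ return(results.columns[0].values()) }}
{% endmacro %}
```

The same pattern applies anywhere you call `run_query`: push filtering and aggregation into the SQL so the result set dbt holds in memory stays small.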
If you've tried the earlier suggestions and are still experiencing failed job runs with this error about hitting the memory limits of your account, please [reach out to support](mailto:support@getdbt.com). We're happy to help! ##### Additional resources[​](#additional-resources "Direct link to Additional resources") * [Blog post on how we shaved 90 mins off](https://docs.getdbt.com/blog/how-we-shaved-90-minutes-off-model) * Why am I receiving a Runtime Error in my packages? If you're receiving the runtime error below in your `packages.yml` file, it may be due to an old version of your dbt\_utils package that isn't compatible with your current dbt version.

```shell
Running with dbt=xxx
Runtime Error
  Failed to read package: Runtime Error
    Invalid config version: 1, expected 2
  Error encountered in dbt_utils/dbt_project.yml
```

Try updating the old version of the dbt\_utils package in your `packages.yml` to the latest version found on the [dbt hub](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/):

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: xxx
```

If you've tried the workaround above and are still experiencing this behavior, reach out to the Support team at [support@getdbt.com](mailto:support@getdbt.com) and we'll be happy to help! * \[Error] Could not find my\_project package If a package name is included in the `search_order` of a project-level `dispatch` config, dbt expects that package to contain macros which are viable candidates for dispatching. If an included package does not contain *any* macros, dbt will raise an error like:

```shell
Compilation Error
  In dispatch: Could not find package 'my_project'
```

This does not mean the package or root project is missing; it means that it contains no macros, and so it is missing from the search spaces available to `dispatch`. If you've tried the step above and are still experiencing this behavior, reach out to the Support team at [support@getdbt.com](mailto:support@getdbt.com) and we'll be happy to help! * What happens if the SQL in my query is bad or I get a database error?
If there's a mistake in your SQL, dbt will return the error that your database returns.

```shell
$ dbt run --select customers
Running with dbt=1.9.0
Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources

14:04:12 | Concurrency: 1 threads (target='dev')
14:04:12 |
14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN]
14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s]
14:04:13 |
14:04:13 | Finished running 1 view model in 1.68s.

Completed with 1 error and 0 warnings:

Database Error in model customers (models/customers.sql)
  Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15]
  compiled SQL at target/run/jaffle_shop/customers.sql

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors. --- ### Debug schema names [Back to guides](https://docs.getdbt.com/guides.md) dbt Core Troubleshooting Advanced #### Introduction[​](#introduction "Direct link to Introduction") If a model uses the [`schema` config](https://docs.getdbt.com/reference/resource-properties/schema.md) but builds under an unexpected schema, here are some steps for debugging the issue. The full explanation of custom schemas can be found [here](https://docs.getdbt.com/docs/build/custom-schemas.md).
You can also follow along via this video: #### Search for a macro named `generate_schema_name`[​](#search-for-a-macro-named-generate_schema_name "Direct link to search-for-a-macro-named-generate_schema_name") Do a file search to check if you have a macro named `generate_schema_name` in the `macros` directory of your project. ##### You do not have a macro named `generate_schema_name` in your project[​](#you-do-not-have-a-macro-named-generate_schema_name-in-your-project "Direct link to you-do-not-have-a-macro-named-generate_schema_name-in-your-project") This means that you are using dbt's default implementation of the macro, as defined [here](https://github.com/dbt-labs/dbt-adapters/blob/60005a0a2bd33b61cb65a591bc1604b1b3fd25d5/dbt/include/global_project/macros/get_custom_name/get_custom_schema.sql):

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

Note that this logic is designed so that two dbt users won't accidentally overwrite each other's work by writing to the same schema. ##### You have a `generate_schema_name` macro in a project that calls another macro[​](#you-have-a-generate_schema_name-macro-in-a-project-that-calls-another-macro "Direct link to you-have-a-generate_schema_name-macro-in-a-project-that-calls-another-macro") If your `generate_schema_name` macro looks like so:

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {{ generate_schema_name_for_env(custom_schema_name, node) }}
{%- endmacro %}
```

Your project is switching out the `generate_schema_name` macro for another macro, `generate_schema_name_for_env`.
Similar to the above example, this is a macro which is defined in dbt's global project, [here](https://github.com/dbt-labs/dbt-adapters/blob/main/dbt/include/global_project/macros/get_custom_name/get_custom_schema.sql):

```sql
{% macro generate_schema_name_for_env(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if target.name == 'prod' and custom_schema_name is not none -%}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {{ default_schema }}
    {%- endif -%}
{%- endmacro %}
```

##### You have a `generate_schema_name` macro with custom logic[​](#you-have-a-generate_schema_name-macro-with-custom-logic "Direct link to you-have-a-generate_schema_name-macro-with-custom-logic") If this is the case, it might be a great idea to reach out to the person who added this macro to your project, as they will have context here. You can use [GitHub's blame feature](https://docs.github.com/en/free-pro-team@latest/github/managing-files-in-a-repository/tracking-changes-in-a-file) to find out who that was. In all cases, take a moment to read through the Jinja to see if you can follow the logic. #### Confirm your `schema` config[​](#confirm-your-schema-config "Direct link to confirm-your-schema-config") Check if you are using the [`schema` config](https://docs.getdbt.com/reference/resource-properties/schema.md) in your model, either via a `{{ config() }}` block, or from `dbt_project.yml`. In both cases, dbt passes this value as the `custom_schema_name` parameter of the `generate_schema_name` macro. #### Confirm your target values[​](#confirm-your-target-values "Direct link to Confirm your target values") Most `generate_schema_name` macros incorporate logic from the [`target` variable](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md), in particular `target.schema` and `target.name`. Use the docs [here](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md) to help you find the values of each key in this dictionary.
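To see how those pieces combine, here's a quick sketch that mirrors the two Jinja macros above in Python (the schema and target values are made up for illustration):

```python
def generate_schema_name(custom_schema_name, target_schema):
    """Mirror of dbt's default macro: concatenate the target schema and
    the custom schema so two developers never write to the same place."""
    if custom_schema_name is None:
        return target_schema
    return f"{target_schema}_{custom_schema_name.strip()}"


def generate_schema_name_for_env(custom_schema_name, target_schema, target_name):
    """Mirror of generate_schema_name_for_env: only the 'prod' target uses
    the bare custom schema; everything else falls back to target.schema."""
    if target_name == "prod" and custom_schema_name is not None:
        return custom_schema_name.strip()
    return target_schema


# With schema config 'marketing' and target.schema 'dbt_alice':
print(generate_schema_name("marketing", "dbt_alice"))                  # dbt_alice_marketing
print(generate_schema_name_for_env("marketing", "dbt_alice", "dev"))   # dbt_alice
print(generate_schema_name_for_env("marketing", "analytics", "prod"))  # marketing
```

Plugging your own `target` and `schema` config values into this logic should reproduce exactly the schema name dbt builds your model into.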
#### Put the two together[​](#put-the-two-together "Direct link to Put the two together") Now, re-read through the logic of your `generate_schema_name` macro, and mentally plug in your `custom_schema_name` and `target` values. You should find that the schema dbt is constructing for your model matches the output of your `generate_schema_name` macro. Be careful: snapshots do not follow this behavior if `target_schema` is set. To have environment-aware snapshots in dbt v1.9+, remove the [target\_schema config](https://docs.getdbt.com/reference/resource-configs/target_schema.md) from your snapshots. If you still want a custom schema for your snapshots, use the [`schema`](https://docs.getdbt.com/reference/resource-configs/schema.md) config instead. #### Adjust as necessary[​](#adjust-as-necessary "Direct link to Adjust as necessary") Now that you understand how a model's schema is being generated, you can adjust as necessary: * You can adjust the logic in your `generate_schema_name` macro (or add this macro to your project if you don't yet have one and adjust from there) * You can also adjust your `target` details (for example, changing the name of a target) If you change the logic in `generate_schema_name`, it's important that you consider whether two users will end up writing to the same schema when developing dbt models. This consideration is the reason why the default implementation of the macro concatenates your target schema and custom schema together — we promise we were trying to be helpful by implementing this behavior, but acknowledge that the resulting schema name is unintuitive.
--- ### Fusion package upgrade guide Learn how to upgrade your packages to be compatible with the dbt Fusion engine. [Back to guides](https://docs.getdbt.com/guides.md) dbt Fusion engine Advanced #### Introduction[​](#introduction "Direct link to Introduction") Thank you for being part of [dbt's package hub community](https://hub.getdbt.com/) and maintaining [packages](https://docs.getdbt.com/docs/build/packages.md)! Your work makes dbt’s ecosystem possible and helps thousands of teams reuse trusted models and macros to build faster, more reliable analytics. This guide helps you upgrade your dbt packages to be [Fusion](https://docs.getdbt.com/docs/fusion.md)-compatible. A Fusion-compatible package: * Supports [dbt Fusion engine](https://docs.getdbt.com/docs/fusion.md) version `2.0.0` * Uses the [`require-dbt-version` config](https://docs.getdbt.com/reference/project-configs/require-dbt-version.md) to signal compatibility in the dbt package hub * Aligns with the latest JSON schema introduced in dbt Core v1.10.0 In this guide, we'll go over: * Updating your package to be compatible with Fusion * Testing your package with Fusion * Updating the `require-dbt-version` config to include `2.0.0` * Updating your README to note that the package is compatible with Fusion ##### Who is this for?[​](#who-is-this-for "Direct link to Who is this for?") This guide is for the maintainer of any dbt package, like [`dbt-utils`](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/), who's looking to upgrade their package to be compatible with Fusion. Updating your package ensures users have the latest version of your package, your package stays trusted on the dbt package hub, and users benefit from the latest features and bug fixes. A user stores their package in a `packages.yml` or `dependencies.yml` file. If a package excludes `2.0.0`, Fusion warns today and errors in a future release, matching dbt Core behavior.
This guide assumes you're using the command line and Git to make changes in your package repository. If you're interested in creating a new package from scratch, we recommend using the [dbt package guide](https://docs.getdbt.com/guides/building-packages.md) to get started. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") Before you begin, make sure you meet the following: * dbt package maintainer — You maintain a package on [dbt's package hub](https://hub.getdbt.com/) or are interested in [creating one](https://docs.getdbt.com/guides/building-packages.md?step=1). * `dbt-autofix` installed — [Install `dbt-autofix`](https://github.com/dbt-labs/dbt-autofix?tab=readme-ov-file#installation) to automatically update the package's YAML files to align with the latest dbt updates and best practices. We recommend [using/installing uv/uvx](https://docs.astral.sh/uv/getting-started/installation/) to run the tool. * Run the command `uvx dbt-autofix` for the latest version of the tool. For more installation options, see the [official `dbt-autofix` doc](https://github.com/dbt-labs/dbt-autofix?tab=readme-ov-file#installation). * Repository access — You’ll need permission to create a branch and release updates/a new version of your package. You’ll need to tag a new version of your package once it’s Fusion-compatible. * A Fusion installation or test environment — You can use Fusion locally (using the `dbtf` binary) or in your CI pipeline to validate compatibility. * CLI and Git usage — You’re comfortable using the command line and Git to update the repository. 
#### Upgrade the package[​](#upgrade-the-package "Direct link to Upgrade the package") This section covers how to upgrade your package to be compatible with Fusion by: * [Using `dbt-autofix` to automatically update your YAML files](https://docs.getdbt.com/guides/fusion-package-compat.md?step=) * [Testing your package with Fusion](https://docs.getdbt.com/guides/fusion-package-compat.md?step=5) * [Updating your `require-dbt-version` config](https://docs.getdbt.com/guides/fusion-package-compat.md?step=6) * [Publishing a new release of your package](https://docs.getdbt.com/guides/fusion-package-compat.md?step=7) If you're ready to get started, let's begin! #### Run dbt-autofix[​](#run-dbt-autofix "Direct link to Run dbt-autofix") 1. Before you begin, make sure you have `dbt-autofix` installed. If you don't have it installed, run the command `uvx dbt-autofix`. For more installation options, see the [official `dbt-autofix` doc](https://github.com/dbt-labs/dbt-autofix?tab=readme-ov-file#installation). 2. In your dbt package repository, create a branch to work in. For example:

```bash
git checkout -b fusion-compat
```

3. Run `dbt-autofix deprecations` in your package directory so it automatically updates your package code and rewrites YAML to conform to the latest JSON schema:

```bash
dbt-autofix deprecations
```

#### Test package with Fusion[​](#test-package-with-fusion "Direct link to Test package with Fusion") Now that you've run `dbt-autofix`, let's test your package with Fusion to ensure it's compatible before [updating](https://docs.getdbt.com/guides/fusion-package-compat?step=6) your `require-dbt-version` config. Refer to the [Fusion limitations documentation](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations) for more information on what to look out for.
You can test your package two ways: * [Running your integration tests with Fusion](#running-your-integration-tests-with-fusion) — Use if your package has [integration tests](https://docs.getdbt.com/guides/building-packages?step=4) using an `integration_tests/` folder. * [Manually validating your package](#manually-validating-your-package) — Use if your package doesn't have [integration tests](https://docs.getdbt.com/guides/building-packages?step=4). Consider creating one to help validate your package. ###### Running your integration tests with Fusion[​](#running-your-integration-tests-with-fusion "Direct link to Running your integration tests with Fusion") If your package includes an `integration_tests/` folder ([like `dbt-utils`](https://github.com/dbt-labs/dbt-utils/tree/main/integration_tests)), follow these steps: 1. Navigate to the folder (`cd integration_tests`) to run your tests. If you don't have an `integration_tests/` folder, you can either [create one](https://docs.getdbt.com/guides/building-packages?step=4) or navigate to the folder that contains your tests. 2. Run your tests with Fusion using the `dbtf build` command (or whatever Fusion executable is available in your environment). 3. If there are no errors, your package likely supports Fusion and you're ready to [update your `require-dbt-version`](https://docs.getdbt.com/guides/fusion-package-compat?step=5#update-your-require-dbt-version). If there are errors, you'll need to fix them first before updating your `require-dbt-version`. ###### Manually validating your package[​](#manually-validating-your-package "Direct link to Manually validating your package") If your package doesn't have integration tests, follow these steps: 1. Create a small, Fusion-compatible dbt project that installs your package and has a `packages.yml` or `dependencies.yml` file. 2. Run it with Fusion using the `dbtf run` command. 3. Confirm that models build successfully and that there are no warnings.
If there are errors/warnings, you'll need to fix them first. If you still have issues, reach out to the [#package-ecosystem channel](https://getdbt.slack.com/archives/CU4MRJ7QB) on Slack for help. #### Update `require-dbt-version`[​](#update-require-dbt-version "Direct link to update-require-dbt-version") Only update the [`require-dbt-version` config](https://docs.getdbt.com/reference/project-configs/require-dbt-version.md) after testing and confirming that your package works with Fusion. 1. Update the `require-dbt-version` in your `dbt_project.yml` to include `2.0.0`. We recommend using a range to ensure stability across releases:

```yaml
require-dbt-version: [">=1.10.0,<3.0.0"]
```

This signals that your package supports both dbt Core and Fusion. dbt Labs uses this release metadata to mark your package with a Fusion-compatible badge in the [dbt package hub](https://hub.getdbt.com/). Packages without this metadata don't display the Fusion-compatible badge. 2. Commit and push your changes to your repository. #### Publish a new release[​](#publish-a-new-release "Direct link to Publish a new release") 1. After committing and pushing your changes, publish a new release of your package by merging your branch into main (or whatever branch you're using for your package). 2. Update your `README` to note that the package is Fusion-compatible. 3. (Optional) Announce it in [#package-ecosystem on dbt Slack](https://getdbt.slack.com/archives/CU4MRJ7QB) if you’d like. CI Fusion testing When possible, add a step to your CI pipeline that runs `dbtf build` or equivalent to ensure ongoing Fusion compatibility. Your package is now Fusion-compatible and the dbt package hub reflects these changes.
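That CI step might look like the following GitHub Actions sketch (the workflow name, install command, and `integration_tests` path are assumptions; adapt them to your repository and check the official Fusion install instructions for the current method):

```yaml
# Hypothetical workflow: .github/workflows/fusion-ci.yml
name: fusion-ci
on: [pull_request]

jobs:
  fusion-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Assumed install command -- verify against the official
      # dbt Fusion installation docs before relying on it.
      - name: Install dbt Fusion
        run: curl -fsSL https://public.cdn.getdbt.com/fs/install/install.sh | sh

      - name: Build integration tests with Fusion
        run: |
          cd integration_tests
          dbtf build
```

Running this on every pull request catches Fusion regressions before they reach a tagged release.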
To summarize, you've now: * Created a Fusion-compatible branch * Run `dbt-autofix deprecations` * Reviewed, committed, and tested changes * Updated `require-dbt-version: [">=1.10.0,<3.0.0"]` to include `2.0.0` * Published a new release * Announced the update (optional) * Earned your new Fusion-compatible badge 🎉 #### Final thoughts[​](#final-thoughts "Direct link to Final thoughts") Now that you've upgraded your package to be Fusion-compatible, users can use your package with Fusion! 🎉 By upgrading now, you’re ensuring a smoother experience for users, paving the way for the next generation of dbt projects, and helping dbt Fusion reach full stability. If you have questions or run into issues: * Join the conversation in the [#package-ecosystem channel](https://getdbt.slack.com/archives/CU4MRJ7QB) on Slack. * Open an issue in the [dbt-autofix repository](https://github.com/dbt-labs/dbt-autofix/issues) on GitHub. Lastly, thank you for your help in making the dbt ecosystem stronger — one package at a time 💜. #### Frequently asked questions[​](#frequently-asked-questions "Direct link to Frequently asked questions") The following are some frequently asked questions about upgrading your package to be Fusion-compatible.  Why do we need to update our package? Fusion and dbt Core v1.10+ use the same new authoring layer. Including `2.0.0` in your `require-dbt-version` config signals that your package is compatible with both. Updating your package ensures users have the latest version of your package, your package stays trusted on the dbt package hub, and users benefit from the latest features and bug fixes. Fusion-compatible packages display a badge in the dbt package hub. If a package excludes `2.0.0`, Fusion will warn today and error in a future release, matching dbt Core behavior.  How do I test Fusion in CI? Add a separate job that installs Fusion (`dbtf`) and runs `dbtf build`.
See this [PR](https://github.com/godatadriven/dbt-date/pull/31) for a working example. You want to do this to ensure any changes to your package remain compatible with Fusion.  How will users know my package is Fusion-compatible? Users can identify your package as Fusion-compatible by checking for `2.0.0` or higher in the `require-dbt-version` range config. Fusion-compatible packages also display a badge in the dbt package hub. This is automatically determined based on your package’s metadata and version requirements. --- ### Get started with Continuous Integration tests [Back to guides](https://docs.getdbt.com/guides.md) dbt platform Orchestration CI Intermediate #### Introduction[​](#introduction "Direct link to Introduction") By validating your code *before* it goes into production, you don't need to spend your afternoon fielding messages from people whose reports are suddenly broken. A solid CI setup is critical to preventing avoidable downtime and broken trust. dbt uses **sensible defaults** to get you up and running in a performant and cost-effective way in minimal time. After that, there's time to get fancy, but let's walk before we run. In this guide, we're going to add a **CI environment**, where proposed changes can be validated in the context of the entire project without impacting production systems. We will use a single set of deployment credentials (like the Prod environment), but models are built in a separate location to avoid impacting others (like the Dev environment).
Your git flow will look like this: [![git flow diagram](/img/best-practices/environment-setup/one-branch-git.png?v=2 "git flow diagram")](#)git flow diagram ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") As part of your initial dbt setup, you should already have Development and Production environments configured. Let's recap what each does: * Your **Development environment** powers the Studio IDE. Each user has individual credentials, and builds into an individual dev schema. Nothing you do here impacts any of your colleagues. * Your **Production environment** brings the canonical version of your project to life for downstream consumers. There is a single set of deployment credentials, and everything is built into your production schema(s). #### Create a new CI environment[​](#create-a-new-ci-environment "Direct link to Create a new CI environment") See [Create a new environment](https://docs.getdbt.com/docs/dbt-cloud-environments.md#create-a-deployment-environment). The environment should be called **CI**. Just like your existing Production environment, it will be a Deployment-type environment. When setting a Schema in the **Deployment Credentials** area, remember that dbt will automatically generate a custom schema name for each PR to ensure that they don't interfere with your deployed models. This means you can safely set the same Schema name as your Production job. ##### 1. Double-check your Production environment is identified[​](#1-double-check-your-production-environment-is-identified "Direct link to 1. Double-check your Production environment is identified") Go into your existing Production environment, and ensure that the **Set as Production environment** checkbox is set. It'll make things easier later. ##### 2. Create a new job in the CI environment[​](#2-create-a-new-job-in-the-ci-environment "Direct link to 2. Create a new job in the CI environment") Use the **Continuous Integration Job** template, and call the job **CI Check**. 
In the Execution Settings, your command will be preset to `dbt build --select state:modified+`. Let's break this down: * [`dbt build`](https://docs.getdbt.com/reference/commands/build.md) runs all nodes (seeds, models, snapshots, tests) at once in DAG order. If something fails, nodes that depend on it will be skipped. * The [`state:modified+` selector](https://docs.getdbt.com/reference/node-selection/methods.md#state) means that only modified nodes and their children will be run ("Slim CI"). In addition to [not wasting time](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603) building and testing nodes that weren't changed in the first place, this significantly reduces compute costs. To be able to find modified nodes, dbt needs to have something to compare against. dbt uses the last successful run of any job in your Production environment as its [comparison state](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection). As long as you identified your Production environment in Step 1 above, you won't need to touch this. If you didn't, pick the right environment from the dropdown. Use CI to test your metrics If you've [built semantic nodes](https://docs.getdbt.com/docs/build/build-metrics-intro.md) in your dbt project, you can [validate them in a CI job](https://docs.getdbt.com/docs/deploy/ci-jobs.md#semantic-validations-in-ci) to ensure code changes made to dbt models don't break these metrics. ##### 3. Test your process[​](#3-test-your-process "Direct link to 3. Test your process") That's it! There are other steps you can take to be even more confident in your work, such as validating that your structure follows best practices and linting your code. For more information, refer to [Get started with Continuous Integration tests](https://docs.getdbt.com/guides/set-up-ci.md). To test your new flow, create a new branch in the Studio IDE, then add a new file or modify an existing one.
Commit it, then create a new Pull Request (not a draft). Within a few seconds, you’ll see a new check appear in your git provider. ##### Things to keep in mind[​](#things-to-keep-in-mind "Direct link to Things to keep in mind") * If you make a new commit while a CI run based on older code is in progress, it will be automatically canceled and replaced with a run of the fresh code. * An unlimited number of CI jobs can run at once. If 10 developers all commit code to different PRs at the same time, each person will get their own schema containing their changes. Once each PR is merged, dbt will drop that schema. * CI jobs will never block a production run. #### Enforce best practices with dbt project evaluator[​](#enforce-best-practices-with-dbt-project-evaluator "Direct link to Enforce best practices with dbt project evaluator") dbt Project Evaluator is a package designed to identify deviations from best practices common to many dbt projects, including modeling, testing, documentation, structure, and performance problems. For an introduction to the package, read its [launch blog post](https://docs.getdbt.com/blog/align-with-dbt-project-evaluator). ##### 1. Install the package[​](#1-install-the-package "Direct link to 1. Install the package") As with all packages, add a reference to `dbt-labs/dbt_project_evaluator` to your `packages.yml` file. See the [dbt Package Hub](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) for full installation instructions. ##### 2. Define test severity with an environment variable[​](#2-define-test-severity-with-an-environment-variable "Direct link to 2. Define test severity with an environment variable") As noted in the [documentation](https://dbt-labs.github.io/dbt-project-evaluator/latest/ci-check/), tests in the package are set to `warn` severity by default. To have these tests fail in CI, create a new environment variable called `DBT_PROJECT_EVALUATOR_SEVERITY`.
Set the project-wide default to `warn`, and set it to `error` in the CI environment. In your `dbt_project.yml` file, override the severity configuration:

```yaml
data_tests:
  dbt_project_evaluator:
    +severity: "{{ env_var('DBT_PROJECT_EVALUATOR_SEVERITY', 'warn') }}"
```

##### 3. Update your CI commands

Because these tests should only run after the rest of your project has been built, your existing CI command will need to be updated to exclude the `dbt_project_evaluator` package. You will then add a second step which builds *only* the package's models and tests. Update your steps to:

```bash
dbt build --select state:modified+ --exclude package:dbt_project_evaluator
dbt build --select package:dbt_project_evaluator
```

##### 4. Apply any customizations

Depending on the state of your project when you roll out the evaluator, you may need to skip some tests or allow exceptions for some areas. To do this, refer to the documentation on:

* [disabling tests](https://dbt-labs.github.io/dbt-project-evaluator/latest/customization/customization/)
* [excluding groups of models from a specific test](https://dbt-labs.github.io/dbt-project-evaluator/latest/customization/exceptions/)
* [excluding packages or sources/models based on path](https://dbt-labs.github.io/dbt-project-evaluator/latest/customization/excluding-packages-and-paths/)

If you create a seed to exclude groups of models from a specific test, remember to disable the default seed and include `dbt_project_evaluator_exceptions` in your second `dbt build` command above.
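The `env_var('DBT_PROJECT_EVALUATOR_SEVERITY', 'warn')` call in the severity override above resolves the variable at compile time and falls back to the default when it is unset. A small Python sketch of that lookup (this mimics the Jinja function's fallback behavior; it is not dbt's implementation):

```python
import os

# Mimics Jinja's env_var(name, default): use the environment value if set,
# otherwise fall back to the supplied default.
def env_var(name, default):
    return os.environ.get(name, default)

# Project-wide default: variable unset, so tests stay at "warn" severity.
os.environ.pop("DBT_PROJECT_EVALUATOR_SEVERITY", None)
print(env_var("DBT_PROJECT_EVALUATOR_SEVERITY", "warn"))  # → warn

# CI environment: variable set to "error", so failing tests fail the job.
os.environ["DBT_PROJECT_EVALUATOR_SEVERITY"] = "error"
print(env_var("DBT_PROJECT_EVALUATOR_SEVERITY", "warn"))  # → error
```

Because the default lives in `dbt_project.yml`, only the CI environment needs the variable defined; every other environment silently inherits `warn`.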
#### Run linting checks with SQLFluff

By [linting](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md#lint) your project during CI, you can ensure that code styling standards are consistently enforced, without spending human time nitpicking comma placement.

Seamlessly enable [SQL linting for your CI job](https://docs.getdbt.com/docs/deploy/continuous-integration.md#sql-linting) in dbt to invoke [SQLFluff](https://docs.sqlfluff.com/en/stable/), a modular and configurable SQL linter that warns you of complex functions, syntax, formatting, and compilation errors. SQL linting in CI lints all the changed SQL files in your project (compared to the last deferred production state). Available on dbt [Starter, Enterprise, or Enterprise+ accounts](https://www.getdbt.com/pricing) using [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).

##### Manually set up SQL linting in CI

You can run SQLFluff as part of your pipeline even if you don't have access to [SQL linting in CI](https://docs.getdbt.com/docs/deploy/continuous-integration.md#sql-linting). The following steps walk you through setting up a CI job using SQLFluff to scan your code for linting errors. If you're new to SQLFluff rules in dbt, check out [our recommended config file](https://docs.getdbt.com/best-practices/how-we-style/2-how-we-style-our-sql.md).

##### 1. Create a YAML file to define your pipeline

The YAML files defined below are what tell your code hosting platform the steps to run. In this setup, you're telling the platform to run a SQLFluff lint job every time a commit is pushed.

* GitHub
* GitLab
* Bitbucket

GitHub Actions are defined in the `.github/workflows` directory.
To define the job for your action, add a new file named `lint_on_push.yml` under the `workflows` folder. Your final folder structure will look like this:

```
my_awesome_project
├── .github
│   ├── workflows
│   │   └── lint_on_push.yml
```

**Key pieces:**

* `on:` defines when the pipeline is run. This workflow will run whenever code is pushed to any branch except `main`. For other trigger options, check out [GitHub's docs](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows).
* `runs-on: ubuntu-latest` - this defines the operating system we're using to run the job.
* `uses:` - When the Ubuntu server is created, it is completely empty. [`checkout`](https://github.com/actions/checkout#checkout-v3) and [`setup-python`](https://github.com/actions/setup-python#setup-python-v3) are public GitHub Actions that enable the server to access the code in your repo and set up Python correctly.
* `run:` - these steps are run at the command line, as though you typed them at a prompt yourself. This will install SQLFluff and lint the project. Be sure to set the correct `--dialect` for your project.

For a full breakdown of the properties in a workflow file, see [Understanding the workflow file](https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions#understanding-the-workflow-file) on GitHub's website.
```yaml
name: lint dbt project on push

on:
  push:
    branches-ignore:
      - 'main'

jobs:
  # this job runs SQLFluff with a specific set of rules
  # note the dialect is set to Snowflake, so make that specific to your setup
  # details on linter rules: https://docs.sqlfluff.com/en/stable/rules.html
  lint_project:
    name: Run SQLFluff linter
    runs-on: ubuntu-latest
    steps:
      - uses: "actions/checkout@v3"
      - uses: "actions/setup-python@v4"
        with:
          python-version: "3.9"
      - name: Install SQLFluff
        run: "python -m pip install sqlfluff"
      - name: Lint project
        run: "sqlfluff lint models --dialect snowflake"
```

Create a `.gitlab-ci.yml` file in your **root directory** to define the triggers for when to execute the script below. You'll put the code below into this file.

```
my_awesome_project
├── dbt_project.yml
├── .gitlab-ci.yml
```

**Key pieces:**

* `image: python:3.9` - this defines the virtual image we're using to run the job.
* `rules:` - defines when the pipeline is run. This workflow will run whenever code is pushed to any branch except `main`. For other rules, refer to [GitLab's documentation](https://docs.gitlab.com/ee/ci/yaml/#rules).
* `script:` - this is how we're telling the GitLab runner to execute the Python script we defined above.

```yaml
image: python:3.9

stages:
  - pre-build

# this job runs SQLFluff with a specific set of rules
# note the dialect is set to Snowflake, so make that specific to your setup
# details on linter rules: https://docs.sqlfluff.com/en/stable/rules.html
lint-project:
  stage: pre-build
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH != 'main'
  script:
    - python -m pip install sqlfluff
    - sqlfluff lint models --dialect snowflake
```

Create a `bitbucket-pipelines.yml` file in your **root directory** to define the triggers for when to execute the script below. You'll put the code below into this file.
```
my_awesome_project
├── bitbucket-pipelines.yml
├── dbt_project.yml
```

**Key pieces:**

* `image: python:3.11.1` - this defines the virtual image we're using to run the job.
* `'**':` - this is used to filter when the pipeline runs. In this case we're telling it to run on every push event, and further down we're creating a dummy pipeline for `main`. More information on filtering when a pipeline is run can be found in [Bitbucket's documentation](https://support.atlassian.com/bitbucket-cloud/docs/pipeline-triggers/).
* `script:` - this is how we're telling the Bitbucket runner to execute the Python script we defined above.

```yaml
image: python:3.11.1

pipelines:
  branches:
    '**': # this sets a wildcard to run on every branch
      - step:
          name: Lint dbt project
          script:
            - python -m pip install sqlfluff==0.13.1
            - sqlfluff lint models --dialect snowflake --rules L019,L020,L021,L022
    'main': # override if your default branch doesn't run on a branch named "main"
      - step:
          script:
            - python --version
```

##### 2. Commit and push your changes to make sure everything works

After you finish creating the YAML files, commit and push your code to trigger your pipeline for the first time. If everything goes well, you should see the pipeline in your code platform. When you click into the job you'll get a log showing that SQLFluff was run. If your code failed linting you'll get an error in the job with a description of what needs to be fixed. If everything passed the lint check, you'll see a successful job run.
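The pipelines above lint the whole `models` directory on every push. If you want to mirror what dbt's built-in SQL linting does and only lint files that changed, one option is to filter the changed paths down to SQL files under `models/` before invoking SQLFluff. A hedged Python sketch (the helper name and paths are illustrative, not part of any of these CI platforms):

```python
# Given a list of paths changed in a commit (e.g. from `git diff --name-only`),
# keep only the SQL model files worth passing to `sqlfluff lint`.
def lintable_sql_files(changed_paths):
    return [
        p for p in changed_paths
        if p.startswith("models/") and p.endswith(".sql")
    ]

changed = [
    "models/staging/stg_orders.sql",
    "models/schema.yml",
    "macros/helpers.sql",
    "README.md",
]
print(lintable_sql_files(changed))
# → ['models/staging/stg_orders.sql']
```

The filtered list can then be passed to `sqlfluff lint` in place of the blanket `models` argument, keeping lint runs proportional to the size of the change.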
* GitHub
* GitLab
* Bitbucket

In your repository, click the *Actions* tab.

[Image showing the GitHub action for lint on push]

Sample output from SQLFluff in the `Run SQLFluff linter` job:

![Image showing the logs in GitHub for the SQLFluff run](/assets/images/lint-on-push-logs-github-d1b1d9efc65a86cf416ce9fd081cc1e1.png)

In the menu option go to *CI/CD > Pipelines*.

[Image showing the GitLab action for lint on push]

Sample output from SQLFluff in the `Run SQLFluff linter` job:

[Image showing the logs in GitLab for the SQLFluff run]

In the left menu pane, click on *Pipelines*.

![Image showing the Bitbucket action for lint on push](/assets/images/lint-on-push-bitbucket-746ed122c51527e3a775d29d3506ad5f.png)

Sample output from SQLFluff in the `Run SQLFluff linter` job:

[Image showing the logs in Bitbucket for the SQLFluff run]
FuqpWg2JFMllFpiU8N2YpHzjywBKMZ2cz5XR6dKGqDyAC3jqOeFUS747L9pZAOZRjQRpd9z/qEXPslsxZR0yax6SH7j4x/N36mQv7HXov2bCMNGI3v/TQ+87lgGkllR2PvgnjmrXVotODB/eJa9FvloNqtZOzPRtX/pdenNT6n4c4BnbFYsIW8Ww+S/gkB2/emX3JWCtSgRnDSNobckRx7Mw+EEP6++8wWGYy7+eciZGRcSMz11DyCNKOeoF1SN3P2b7n89ssHPHkKyHGdGZ//17Fojgfmkw2OFMyuz7xauhcHturQwo7aBZJyPoOIn/8kDSgJnCHADDFF09L69YAU6fU4WztQDq5A8eiAFNHi+odFtonOku4umyzKjhxpIXmb0QnSacFvKu+IIKt0ctUPoqLkKn/aQHDudYaVdBakLxPyfB8mUmu5OuA3Oq7w8z7imdj2Zdsb4LmOhhGOvkEwXUPel3YI9RqTdiJSLUO87ljGR4gUM5C6T0WCSgYbaDZY0nOdm5ZlS09F0PSRbnRGVV2xOCkGXmRCJvQauDzv5ylVINiuH0y7XvveQ3KwdSq0GgWdsFmISxUjKxEKB5GwfiU0CLMIPSIJJo3VFfbNFkh5LzA00kij/IrECcYHPzAoHmrVnVsJ9uQXtI0i2uTFKGBA9uzYByZXke/adb2ZDsu3T6NuK3Cp5cztDgBtgEJGK+RIbUSTj6be+ctFAsvl4gIYUFHG4sljdNzG6qIAIBxk79KQr+i4LjN5A8sKjq830JIW/RRXXUu4RNayEjhg31LN4NB2SQ9r5PXU7D5LLQKZPd/Ls63ZDkFw6hVPrdkZ6MpCsGM11PIBkaMxf4XqU8tAGZyc2nL/s5qdcJBKLooUp9x6S73z8fY9FQQc1e2JF5fkiCr0WPmxPTC8ris07EZKjypNbmgjJzdqtK/xZSy6ZvAV9yUAEIJAcWC0llb1LmuaB5z4aBZZY+soghJGFOk3Pzgp4CF/73imPCfZ6TMhniqDyC7Uw1HaC5IQoxIensGuN+559rMMubCCUPQ+Sk4oCh5kwjiVQOV2Ae2CIYZ1plLCJbdjoBpKROLm3RT3KrbjQsvbR2XZW7TauUd9lgdEbSF5y9Ni+EH3rvktmgQIJWjUmf1wEZkvs/RDHd0OQHP8m3sOfB5JLYsrJgBzzIPn4c24KJIu8FSQTzVENoXYC3XD+2rtfimMtOFwkRddDckrKAxUJMPZztkzHb0hxDZGFDNuPOfP6nKgk/+oNlFoPyYwwLc5bJRF/F5KbtUvW+hefeC2QnHr+5BTQ5bc8XZCc9aIoF9AIFysNERJu1cJJNdUkPuCTRjJGDYnSz4BkQK4FZBbcjiCZe+ovI28Ku9a17wXJkeqiguSqvvRAD8llV9VUQ5ynXoCnA0PVD9uIqolJYr6H5GJOSky8QeRj7aMLzIRFU7osMHoDyYuNbtNjh23vT4VQ+vKmpYYSqKFwSwetHZKZ1VoEHiP0iVb9+SHZuZU22yUkR8tjeFlF8T8azgfbyPe8imvIB5wCGGsUwfi48Gbo0vWQ7Dr6N92Df5yhZrOate8UuN7XzVotNqFfKI4GktkBlZgIeT4zTwEXAmwB4HSZAcmEU0vKGLmexDXGXCA5shEesnV6dq2RGkimQLQrQMsZj8vbQ3LyheHwfAFugUG1nb7F9vgDvrSZAskhYY/y3Uv4BQZ2OXrilA7OxC4LjN5A8mKj24jktrYOkikUNrUqTZyKQxb3RZFFggYpD0EjSA6aunD+p0Oy5xEEKm2O9Uo0BfoMsWlIrtFHkKxSVHu0Xg/J+W+4RCUxnEUvZ0Cyt+V3IcLDfLenCLnnfL7P4ydRzT725T9FNddWKkHKGvM8HYex2EuLWV2MnqRU10jceq/lAuY7chNehWS/w2BWznDKvrQ4vSKBlGBUNuBpNqtZO28svycj4ceyFiaNM7qvmxV7wmv9ihAWiQN7rYU0kFx5MqE5jbRMKiWjyLLGGLuuEXmYC
Mm19sqzOghSm/mVnjJxqryLQ2mqWWnPrgUgmaWe+iz7a8LkYRWSRfItML5+mRrYpa8NJbHQd7YAx5ZioBTZghJUAVLo7rXZ08TnwnA8lLfOtUZmsap4u8b2otNqc4vojajB2aNHRANLwy72tOmyzOi2DB9iLTk7rnVfZnSFYDH0h+26b0t5V85MyHcIaa8azkTARrnknLpQBS1DwyDkTr+DU9+QcJXq84aUsUSfFiTbv11CcjN6QfIoSQZaGkimMsKZEH3XfCVSymgAyb8wfYJVrus9193z8l45jyHhfAro4g+FaKhKOZd7PdT+Bx9/aWq8ixywTXxla3VJwfpEhDJdheQQKRqyNEXXJukkY36zWf3aGZQ58yGPadzXzaqvd+phggGY66dp4P1vIfn++oawukge0+Npd1vJ4xfe/97tqIAD36YIKsGL+kupUQwanBxJJqDCxry2Z9e6CNtXIXn000u2NeWKQ31asxomaLCXEijOi1jOFuDwYUhK32MiZ4hQfXdXVXhFKR3Qa9jCVg6ri2IiJAwze/RYciNy3psuy4xeZ6GIvlpm9CQxR0Qtb1HgGssoGibMKDQhfJcs3RCS10v2LB9N/rVIUAHHdh9XAdtcNymWnTi/SkJnbEzPTI8tyyMaJcm5TZBMJ5U6fUpxjNRe9s9MX3tASIjMTHa3WbLjJ3sPALMp02cFcjbH3rik/zvi3Nns+kMOCC9/Ri9IvzkB9iqScO5VD+LMxN/YZz8xvwItq29jfTI7NjR63+VX9u4AA0AABsPopQIgdIDQFbr/GRqDgkDrh8cDYAw+ZWpy+rv8dF/veiQZgHqt0gdfpvN3kuueU5IB7td8H/2Jy3RJBgAkGQAkGQCQZACQZABAkgFAkgEASQYASQYAJBkAJHnbT3vJA0CSl/WwlzwAJLmrHHhWBgBJzgIAJBkAJBkAkGQAkGQAQJIBQJLnXOyYA4+1SwzHv8hr27Zt27Zt27a5tm3btr0x7i+Zm8XB854ne5icpElmOui00/Y/0837zt95+g1av/OU8ujY6Svvv/xl4RR0/8XPXkOnM01Q1/4TNdm8fc/R+07c/Gnr+93Ke97KXU38xev2v/vu9MchYP/JWyoXSst69sF66fqDBjfduJmrr97/oG50+sLN4vynrjzV88GOXXi4YPUe0e4zfCbGHzR2vnw3MEVSdDlZep278ZJbMzaNzl5/IVSYMm+DfiRit1VbjnbqO/6fpp6/cjeGJUK1KKt9rzGL1+7fdeTahFlrFHLR5r3neg+boXtZiqRTWVPmb4TJfCaYYdUMyaOevLMIjMpKzauz94ykq+AraQUN4YmFdh4Rbv4JQ8YtYCbdjMLGoRMWabL5X4fA9IIGJ58Yr5CUJvQFMNjBOzTVxTf29VcHlQulZSVlV7/55khDmgaOmWflHNJv5Gwdme7Iuftop250+8HLqJCQWZmQVaXnO41LL3/1xU60Zy3dxiWSy2S7gXw6f/MV+KF7BWW4nCy9UnJrrz34SMOoyNE7GhUIBO3aFkPFppcrMHm6RaeU4DDJubVYY+7ynYLPCy8sPp8zbN1/oSm4IpOKmClo9dbjWpG1bMMhLis5p4YYR9wPGx8BZg6eUXQF8/TVZ7qTBZFw8BY4X/560NWpXiQH+PEZFTDDEwr+f9yYyVy49g1LU85Z7386KzscHqYhJONwzDx05q4Cn9jGlaXXai5LgvhVsJy/rEEgWdDD178NC8ltdwPNySMoydU/XucKync59DJdSBZEitcuJFs4BuEnCkyeMsBDj8FTsST4QfiLmhbSidmWkOwekMCLGQjp3G9CUHQO8MOStssCz0ZNWSYmUEhDYsc+4y7cek2gzVqyDSZ/A5ETdCSLNhjpE5oGdjZBso5kUWZoqh2u3XGyybxmMjFI5tWsCUr9sPbBXXBlIIFSnmAu33SYWCL14Cs85dTlLMJsz7HrYCdEA6JepA4mnX1iAZ6mH3BIbC6N2cu2X7rzlpmX775jO
VENc8TkpbRFeNPYduCiXEj+be/PvxNqSk8iMN79cKaaTV5AL76nMFduPnrl3nuWX7z9RqggURi0dQtHC8xCFQ5zhcTlwafoRGmd+OTA/OkJKgUbwse8TZ/RwOhsupyN6JKAZNRx8YvjqEwmXIdPXKwtNHr73ZE75cBoLSCZd7cwl0LNc+Pus2gk3CAUZdVBlxq9SCUvP9uxVqjAr4U/ExZGHLvRgKYt2CT9nyahsxw7kKEExz8ic8/xG2yIYV98tIEp7hGlhBsjSHxNnr6zFKd98cmWBtcEDCu7nIReFA+p/aACB0A1iqItIZluVHKxtUuomExJhgTNqezcw0Upsu30xeI/9u1CR5pbiQLww1xmJsHlMDMzMzMzMzMzRxRmZlE4D5C3yCcdpTTpnvY/s7MdWK1UGnncbkO5XKfA/SxX6fwr77IuqzvvijtUUuXlbjq/Yu99SG5vTYex88IJ7klRpR9l5Az+9Pf/jfc2iRn6v+yGB1OW2/JIJGbxsSYbvPLuF3ZBgVAR18TtLE1Lx2qksVDiamIABcnjjNVNxmkpzfTDw8JVSCaUa8wq0VA2+JjTLmdaHn/mVVI+UUMONgUnu3nbA0/rZ0hniRs75xogBUTfDcEkFVC646xLbnVWFW69/6mEaPx6/ZEn31R58LHnK6uk+xTeYxPMCckWss/hZ3j94mvvq8qX3vncuqhXGJxYcazpyQkgZumQI2uZXtStQrwByWyL8tdTbNQ5U2OIh86wBnqA6DiJA8YagmT9G2j7PY+GGWdfetuv/rquysXpvsdfzuQpTRMIJINe87Sc2j6krKVdO+Kki+kUkxkSg6F1xcy312x8OU7xYcvpCExQcyox7zQDSGtvtrt3Yy3R7yrxFv5RhfoH88wglZwwi2I2qTTQvY+9xIkxN3+jEKE4TOqL3NC6eDDAz9JMngfmFQeqINkWWyn+aFbiAeyPOuXSws7FiVmW2KbrCBXVMJO9Dz1NIQKMOpDc3po+Y+eCEwctoOv4Gy6cqXxTB5KZX8+9/lHsMyZsLKFlHIsYlGfsBAUdDcoo99fvSGOFupA88lhuPFRiYpVWJiTDLXtMQU86YZQaKeFOVdC1HdmTjeM71t+5IBnRC1pWuKaIG8dzXSRw7Ux2ILl4omdrnCtwHZ37239sAHStJaqHn6QfGay0oYhTOZWHjB5PKUcjHnbChbGLhyCZ9+Mt7hHnw982UXNUdoj3P1QJvQx6xU0P9wPXuNeB5Gdf+6j2CAI1IHloXeZ/x0PPLjlwzdzRFW5vuv2BVQmSVYI9ZTKjDO/tskLJsKlCMoYXQGVPxLzT2HxYNg2R66wLEpdgAOBJSGZmYY7gUO2O7WMBaIBAgq4W369AsmknbunGA3U8CyS3t6bD2HnhRGDD6y5OwlrLZJEUTvQh+ZDjzveXbcoyyEYQ6eUaixrxlxKrfRc10b+t0VJXdnOksdqQPMZYTHOScMu9T/7AIHAVkln9pB/Fy1SgOIYaJ7ToGOciVZSU85NDvvIgWRCyIuefzQ/J8YD5HEJwUT05WrT/5EwYs1N5yG/z1I4UcdqGIBl4SD1oH4emfaGD//Hm1wQFhyolJnT47w12nAWSOZEwYBZInrou5r9KfF4klyyKgy36ITyuoQaS1dQt8eg1NuVH35ye9i6yYt2jT73F4TAlNx5IlKezQ7JsXzRpH5L1HHZVBsffxGBCVO3i+xVIZht1Xp8Rkhsi12HsXHASRGd0YsI//r81Pneym52/gF+uRPxAKscjQrj4WAX2ImrV2LtES0vGRyxCYDnSWC1IHmGsf62/I+llpv/wIHAVkqWOZGuQzXY4FZyE9ivUPZ1ry2XpgmSVHoPTywXJdV2LHSDG+N1CMuep1sWp9XpiRG1INu1A8rmX31Fesn4kKSvfrNIXMlN5yBBRsPzZr3dZVMKzrgQvLhvxmZIDg2TKDUjGfzHe7BeQGILkxrrAg+/i+tNwhT45vxkJ/HzwyZd6E3QtSE5kO5AsV
TyZYNMAIkbreYu+Y6daDviZC5JNkr9bdxi9IkmpnISIMDiEK6kjz0lsLy+BZDDTh2SKO5kjk2lAcmdrGoydKw+aiwhJOlAdHWYO3T+Sf/EIAi0+VuK9yVIX2XRPKbEyoaRpRxqrBckjjGUU6msVTVd+4BrmOZZOrNsoTn7khgDRL4AcqBCUDiTTEYK34nWzQDInUmP9R8HRiY4K58OjxSG5MVapZjEAlQLIDUhGyo6Bd+PezQXJGEItepG+dh1DsFQ/U3mocVxe3LY1bmqUetI5r9RUC1eAZQBSDdWvw8VFQv8mc8t9T/nA0fT6kCwFG8BGHFmRRl9lQFDTHhKDxrrAG/1CwGh8lmKFSQ1qdGEArG58B4ylQrgJD1x49d2GgLJTIdnWJ/9i8kCoItv2RdnuS5cqmE9T5LrrMnpiHvRm3GLoXrnkP/9r01yUNSuVjA8Sq2cT5uKIQo8HyQbyRTXvlryZVQeSIZ8lcNyHtmYqYxtwguGOWIic5EZCGKgTSiNX//DZuCi32TMH0ps7cdhFwifXgrEEDK/mHatyCmK51fjHv/tPgkDsTg3igNrHkcZKnt4a3XSRsVYQfB5prIg3wa7K3CFdpRUIyXyvqC1EaGiufFyorIYFHSu72suWpbG7xJPGKZTt9AxO0rIipZRv9Bo9Ip1ZkMynVOmWdV/vuPXQmHxjrFBWgXbe7/goXFdh+5Bc9xiRyuRB+8Qx6kMyBQdX8i0monSk/Ro8dFU41nFihrRVdZ5K21EByQChSsc++LE4UR9ZphH7getQIWJgm6dIvyhQBH0xaKwrdl4qJx19TEtaEcH7oakK8OBhZqvz+BOXXv+AdzuQHOMyvEIQqAIz2TLY4KkJNEWuuy7rDebhA3uiPnXNZav0AwKTzrBYAqZN2nPcx4NkxycDMTJMtQPJeQSYh7ZmKmOHKNcYi5gC4Ty/LfJpsXXAq1lIz24ORqqTvZ68pei+cSqXMBa57Yzlyn3ATznrJbSjjtWp5J2PNBbqVObYrtKK/S6ZAdtRT/w8R7ebS16YWHzp9ntIjFxoyqxeMg8FAKbysE/MXqbxjN9MO7rLvt02YpaWvNuhJcy+LjWSAv2Pgugg+ftZLnklXLxGMgQnNc7KMlImOSOBnLQfmxiCLuvNvTXzM7ZtCkOLRtA7p8C58B3jVEESXetnypc8FsqlP3YAMRh/rEkaf6yv2DsDTymiOIz+O4GAoiABSEgSkIJ4hCIkFS8EoRRSFsgD5YFUQCQEhYJQAIT+iE4+1s/s7LjWbHdxOLh73+zsCL7m3pnviFUhs0dyEBHhLjY7/f5WK2Iks4qbZZYZERFh65d7RH9LjGQREREjWURERIxkERERI1lERESMZBEREdmRSOYdRHphAq8nVuFgJlMqshm8dUqHbePBlHKk22H6MW+sShtcCeVTFCMw2BVERMRIpmO2On8oFqBziu6eQRc0YgPmG4VfHFahliHzNBnRztPYPkG/FRqA6WNqaWI7KTWkram2avNzDEREpBtGMj37pCaFbS16hlpVP0GtcZ+b+SM5zZdp7RYRkW4YyZSmYllIAG8vklGGRUQfx071ltA7T/F1NP7pZ6ZdOQcPfgvjGMFJRxgxHHFbIpklaDr9OcPzl2+WnZc01VFgy005i+2ZBJqHSWLm47GvPgw6ihmIiIj0ieR496iWJaKQ6s8Yydhn7zx4BhFU0AiNZ5tM5QyDEwK6AoQ8/OeAbzF59uIeB0dsx8dAcnNado7ZkEbhwjGJZK6cM7DEHTUh1xxvBAvRnCSmvOgUoxdFGIDd6NOXXzWSyW8ctAxERET6RPLB4QfuTRkgJ0FIMmMkc/PKDTFUrSzfXY1kDEjRw5GLRCmDMHAYk6/R6h07da4uXMevlxJ5xgghoipCScuTaHTok9kIDTmA226SeHThmtX7/LSIiEifSCalkJShXsHhFbX7VveSRyM51lJAGMye7rpIRhSD0jhGOYTKBPBgL5mLj4CPdexo75DUB
h5eYyk7JtHRSKadO946ERGRDpEc33WFHP3/kRyzLBy++zwRyYH3snDCsxkc5+iopp596wwyH/AMMnnz3iPGePHI6RrJON5jyBcREekQyTwMxa1kxsAKNvusNZLJ7KMnznB7WhMU4TmTLDVvHMl8nbefWyKZ2+J/F3DkJB8xz1+9vs/FRHfPc17rIpkxf2V/mujFFLv/cHHl2t1Mcv4Ll2/wyhNH1kjmxjpPgYmIiHSI5K8/frOHmjEsDt6y7RqBPMowQisgkK+RnEmCeTqSebBrNJLzdUJxecJb95+ui+QcfGnvNh+fLF7zwFdm+FOeGnv84hVr76uRzFL8959/cjCDHMwl5frff/zGInmNZPJb2XMrIiJGsmT9+fjp840HE8zcZNcZdpTpQhk9LUsC/vP+ba+OaQAAAAAE9W9tCU82OgCAkgFAyQCAkgFAyQCAkgFAyQCAkgFAyQCAkgHgoWQAUDIAoGQAUDIAoGQAUDIAoGQAIFuSF99xMhs6AAAAAElFTkSuQmCC) #### Advanced: Create a release train with additional environments[​](#advanced-create-a-release-train-with-additional-environments "Direct link to Advanced: Create a release train with additional environments") Large and complex enterprises sometimes require additional layers of validation before deployment. Learn how to add these checks with dbt. Are you sure you need this? This approach can increase release safety, but creates additional manual steps in the deployment process as well as a greater maintenance burden. As such, it may slow down the time it takes to get new features into production. The team at Sunrun maintained a SOX-compliant deployment in dbt while reducing the number of environments. Check out [their Coalesce presentation](https://www.youtube.com/watch?v=vmBAO2XN-fM) to learn more. In this section, we will add a new **QA** environment. New features will branch off from and be merged back into the associated `qa` branch, and a member of your team (the "Release Manager") will create a PR against `main` to be validated in the CI environment before going live. 
The git flow will look like this:

![git flow diagram with an intermediary branch](/img/best-practices/environment-setup/many-branch-git.png?v=2 "git flow diagram with an intermediary branch")

##### Advanced prerequisites[​](#advanced-prerequisites "Direct link to Advanced prerequisites")

* You have the **Development**, **CI**, and **Production** environments, as described in [the Baseline setup](https://docs.getdbt.com/guides/set-up-ci.md).

##### 1. Create a `qa` branch in your git repo[​](#1-create-a-release-branch-in-your-git-repo "Direct link to 1-create-a-release-branch-in-your-git-repo")

As noted above, this branch will outlive any individual feature, and will be the base of all feature development for a period of time. Your team might choose to create a new branch for each sprint (`qa/sprint-01`, `qa/sprint-02`, etc.), tie it to a version of your data product (`qa/1.0`, `qa/1.1`), or just have a single `qa` branch which remains active indefinitely.

##### 2. Update your Development environment to use the `qa` branch[​](#2-update-your-development-environment-to-use-the-qa-branch "Direct link to 2-update-your-development-environment-to-use-the-qa-branch")

See [Custom branch behavior](https://docs.getdbt.com/docs/dbt-cloud-environments.md#custom-branch-behavior). Setting `qa` as your custom branch ensures that the IDE creates new branches and PRs with the correct target, instead of using `main`.

![A demonstration of configuring a custom branch for an environment](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/dev-environment-custom-branch.png?v=2 "A demonstration of configuring a custom branch for an environment")

##### 3. Create a new QA environment[​](#3-create-a-new-qa-environment "Direct link to 3. 
Create a new QA environment")

See [Create a new environment](https://docs.getdbt.com/docs/dbt-cloud-environments.md#create-a-deployment-environment). The environment should be called **QA**. Just like your existing Production and CI environments, it will be a Deployment-type environment. Set its branch to `qa` as well.

##### 4. Create a new job[​](#4-create-a-new-job "Direct link to 4. Create a new job")

Use the **Continuous Integration Job** template, and call the job **QA Check**. In the Execution Settings, your command will be preset to `dbt build --select state:modified+`. Let's break this down:

* [`dbt build`](https://docs.getdbt.com/reference/commands/build.md) runs all nodes (seeds, models, snapshots, tests) at once in DAG order. If something fails, nodes that depend on it will be skipped.
* The [`state:modified+` selector](https://docs.getdbt.com/reference/node-selection/methods.md#state) means that only modified nodes and their children will be run ("Slim CI"). In addition to [not wasting time](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603) building and testing nodes that weren't changed in the first place, this significantly reduces compute costs.

To be able to find modified nodes, dbt needs something to compare against. Normally, we use the Production environment as the source of truth, but in this case new code will be merged into `qa` long before it hits the `main` branch and the Production environment. Because of this, we'll want to defer the QA environment to itself.

##### Optional: also add a compile-only job[​](#optional-also-add-a-compile-only-job "Direct link to Optional: also add a compile-only job")

dbt uses the last successful run of any job in that environment as its [comparison state](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection). If you have a lot of PRs in flight, the comparison state could switch around regularly. 
Adding a regularly-scheduled job inside of the QA environment whose only command is `dbt compile` can regenerate a more stable manifest for comparison purposes.

##### 5. Test your process[​](#5-test-your-process "Direct link to 5. Test your process")

When the Release Manager is ready to cut a new release, they will manually open a PR from `qa` into `main` from their git provider (e.g. GitHub, GitLab, Azure DevOps). dbt will detect the new PR, at which point the existing check in the CI environment will trigger and run.

When using the [baseline configuration](https://docs.getdbt.com/guides/set-up-ci.md), it's possible to kick off the PR creation from inside of the Studio IDE. Under this paradigm, that button will create PRs targeting your `qa` branch instead.

To test your new flow, create a new branch in the Studio IDE, then add a new file or modify an existing one. Commit it, then create a new Pull Request (not a draft) against your `qa` branch. You'll see the integration tests begin to run. Once they complete, manually create a PR against `main`, and within a few seconds you'll see the tests run again, this time incorporating all of the code that hasn't been merged to `main` yet.

---

### How to use prompts for dbt Copilot

#### Overview[​](#overview "Direct link to Overview")

Learn how to write effective prompts for dbt Copilot to generate accurate SQL, models, metrics, and macros. Each recipe is self-contained with its own realistic example. 
dbt Copilot is an AI assistant that generates SQL, YAML, documentation, tests, semantic models, and macros based on your project's context. The quality of output depends on the clarity of your prompts. This cookbook provides independent recipes for common prompting tasks. Jump to any section that matches your current need.

This cookbook covers the following topics:

* [Prompt best practices](https://docs.getdbt.com/guides/prompt-cookbook?step=2)
* [Generate SQL queries](https://docs.getdbt.com/guides/prompt-cookbook?step=3)
* [Use what you already have](https://docs.getdbt.com/guides/prompt-cookbook?step=4)
* [Create semantic models and metrics](https://docs.getdbt.com/guides/prompt-cookbook?step=5)
* [Create reusable macros](https://docs.getdbt.com/guides/prompt-cookbook?step=6)
* [Troubleshoot errors and issues](https://docs.getdbt.com/guides/prompt-cookbook?step=7)
* [Conclusion](https://docs.getdbt.com/guides/prompt-cookbook?step=8)

#### Prompt best practices[​](#prompt-best-practices "Direct link to Prompt best practices")

Writing effective prompts is about giving Copilot the right context and clear direction. Follow these principles:

* [Provide rich context](#provide-rich-context)
* [Break complex logic into smaller steps](#break-complex-logic-into-smaller-steps)
* [State the business question, not just the output](#state-the-business-question-not-just-the-output)
* [Be clear and explicit about the result](#be-clear-and-explicit-about-the-result)

##### Provide rich context[​](#provide-rich-context "Direct link to Provide rich context")

In your prompt, include table names, column types, and example values to describe how they relate to each other. 
Include the following:

* Table relationships (such as `orders` connects to `customers` on `customer_id`)
* Data types (such as `signup_date` is a timestamp)
* Sample values (such as `plan_type` can be "monthly" or "annual")

tip

The following example uses SQL terminology (like data types and joins) because it's generating a SQL query. However, the principle of providing rich context applies to all Copilot tasks, whether you're generating macros, documentation, or YAML configurations.

**Example: Santi's neighborhood café**

Let's say you run a neighborhood café and folks get a free drink after 10 visits:

**Without rich context** (vague):

```text
I need a query using customers, subscriptions, and activity tables to see weekly regulars.
```

**With rich context** (specific):

```text
Context: I run a café loyalty program where customers earn a free drink after 10 visits.

Tables and relationships:
- customers (customer_id INT or integer, name STRING, email STRING, signup_date TIMESTAMP)
- subscriptions (subscription_id INT, customer_id INT, plan_type STRING, start_date DATE, end_date DATE)
  - Joins to customers on customer_id
  - plan_type values: "monthly", "annual", null for non-subscribers
- activity (activity_id INT, customer_id INT, visit_date DATE, visit_count INT)
  - Joins to customers on customer_id
  - visit_count tracks cumulative visits (resets after redemption)

Business question: Show me which customers visit weekly (3+ times per week for 4+ weeks) and compare conversion rates: do high-frequency punch-card users convert to our 'beans of the month' subscription at a higher rate than casual visitors?
```

**Why it works:** The AI now knows exact data types, how tables relate, what values to expect, and the specific business logic (3+ visits/week defines "regulars"). 
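To ground expectations, the rich-context prompt above gives Copilot enough to produce something along these lines. This is only a sketch of a plausible response, not guaranteed output; the exact SQL, and the interpretation of "3+ visits per week for 4+ weeks", will vary:

```sql
with weekly_visits as (
    select
        customer_id,
        date_trunc('week', visit_date) as visit_week,
        count(*) as visits_in_week
    from activity
    group by customer_id, date_trunc('week', visit_date)
),

regulars as (
    -- "weekly regulars": 3+ visits per week, sustained for 4+ distinct weeks
    select customer_id
    from weekly_visits
    where visits_in_week >= 3
    group by customer_id
    having count(*) >= 4
)

select
    case when r.customer_id is not null then 'regular' else 'casual' end as segment,
    count(distinct c.customer_id) as customers,
    count(distinct s.customer_id) as subscribers,
    count(distinct s.customer_id) * 1.0 / count(distinct c.customer_id) as conversion_rate
from customers c
left join regulars r
    on c.customer_id = r.customer_id
left join subscriptions s
    on c.customer_id = s.customer_id
   and s.plan_type is not null
group by case when r.customer_id is not null then 'regular' else 'casual' end
```

Notice how every piece of the query (join keys, the null handling for `plan_type`, the definition of a regular) maps directly back to a line of context in the prompt.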
##### Break complex logic into smaller steps[​](#break-complex-logic-into-smaller-steps "Direct link to Break complex logic into smaller steps")

Common misconception

Many users try to ask for everything at once in a single prompt. Breaking your request into smaller, sequential steps consistently produces better results.

For multi-part tasks, write them as a sequence of clear instructions. Copilot handles step-by-step logic better than complex, all-in-one requests.

**Example:**

```text
1. Filter the dataset to active users in the last 90 days.
2. Calculate their average session duration.
3. Join to subscription data and group by plan tier.
```

**Why this works:** Each step is clear and actionable. You can always iterate on your prompt to refine results: start simple, then build complexity.

##### State the business question, not just the output[​](#state-the-business-question-not-just-the-output "Direct link to State the business question, not just the output")

Describe the decision or insight the query supports rather than writing purely technical prompts. Instead of "count users", say "count active users per week to analyze engagement trends."

**Example: The sneaker drop**

Let's say you run an online sneaker shop and just launched a new feature: customers can view 3D previews of sneakers before buying.

```text
We launched a 3D preview feature with our latest limited-edition sneaker drop.

Did customers who used the 3D preview convert to buyers at a higher rate than those who only saw photos?

Show me weekly conversion rates: browsers who became buyers, segmented by whether they used the 3D preview.

If preview users convert 20%+ higher, we'll add 3D to all products. If not, we'll improve the feature before expanding.
```

**Why it works:** You've described the feature, the behavior you're measuring, specific success criteria (20%+ lift), and the decision you'll make based on results. 
##### Be clear and explicit about the result[​](#be-clear-and-explicit-about-the-result "Direct link to Be clear and explicit about the result")

Define the expected output clearly. Mention the expected columns in the final result and state whether results should be sorted, limited, or filtered.

**What to specify:**

* Expected column names and formats
* Sort order and any limits (for example, "top 10 products by revenue")
* Output format examples (for example, "`conversion_rate` as a percentage")

**Example: The fitness challenge**

In this example, you run a fitness app with a 2-week challenge called Kimiko's kettlebell challenge.

```text
Give me a weekly trend with the date, active folks, and a simple 'engagement per person.' Then a summary by launch week with 'trial starts,' 'upgrades in 30 days,' and an 'upgrade rate' as a percentage.

Each week, show active challengers and total workouts. By challenge start week, show how many upgraded to paid within 30 days and what their average workouts looked like.
```

**Why it works:** It names specific metrics and formats, so the results are ready to present.

#### Generate SQL queries[​](#generate-sql-queries "Direct link to Generate SQL queries")

Let's say you want to build a query to find top-spending customers.

```text
Context: I have two tables:
- customers (customer_id, name, email)
- orders (order_id, customer_id, order_total, order_date)

Relationship: orders.customer_id connects to customers.customer_id

Business question: Show me the top 10 customers by total spending in 2024.

Output:
- customer_id
- customer_name
- total_spent
- order_count

Sort by total_spent descending, limit to 10 rows.
```

**What Copilot generates:**

```sql
select
    c.customer_id,
    c.name as customer_name,
    sum(o.order_total) as total_spent,
    count(o.order_id) as order_count
from {{ ref('customers') }} c
inner join {{ ref('orders') }} o
    on c.customer_id = o.customer_id
where year(o.order_date) = 2024
group by c.customer_id, c.name
order by total_spent desc
limit 10
```

**Why it works:**

* Clear context about tables and their relationship
* Specific business question with a defined time period
* Explicit output requirements and sorting logic

**Pro tip:** Start simple, then iterate. If Copilot's first attempt isn't perfect, no worries! Refine your prompt with more specific details and let Copilot do its magic; it usually gets there in the end ✨

#### Use what you already have[​](#use-what-you-already-have "Direct link to Use what you already have")

You don't need to write everything from scratch. Pull in documentation, definitions, and sample data you already have; it helps Copilot understand your specific business context.

dbt Insights integration

When using Copilot in [Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md), you can easily cross-reference between Copilot's generated SQL and metadata from [dbt Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md). This embedded integration makes it seamless to access documentation, definitions, and sample data while building queries. 
##### Define your business rules[​](#define-your-business-rules "Direct link to Define your business rules")

Instead of just saying "active customer," explain the rule:

```text
Active customer = at least one paid purchase in the last 90 days, excluding refunds
Net revenue = gross sales minus discounts and returns
```

**Pull from:** Metrics glossaries, KPI catalogs, product requirement docs, data dictionaries

##### Show sample values[​](#show-sample-values "Direct link to Show sample values")

Give Copilot examples of what the data actually looks like, especially edge cases:

```text
Order statuses:
- `customer_id: C-12, created_at: 2025-05-03T09:07:00Z, status: 'completed'`
- `customer_id: C-14, created_at: 2025-05-03T09:02:00Z, status: 'cancelled'`
- `customer_id: C-13, created_at: 2020-01-02T06:40:00Z, status: 'pending'`
```

**Pull from:** Data profiling reports, QA test datasets, BI dashboard filters

##### Start with a draft, refine later[​](#start-with-a-draft-refine-later "Direct link to Start with a draft, refine later")

Frame your model first, then iterate. Start with a clean outline that gets the basic structure right:

```text
From stg_orders and dim_customers, draft a minimal model with order_id, customer_id, order_date, net_revenue = gross - coalesce(discount, 0), and join to dim_customers on customer_id. Filter to the last 30 days for preview only.
```

**Pull from:** Source-to-target mapping sheets (join keys and transformations), data dictionaries (primary and foreign keys)

#### Create semantic models and metrics[​](#create-semantic-models-and-metrics "Direct link to Create semantic models and metrics")

Fast-track your semantic layer strategy with AI-generated YAML using Copilot. 
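To set expectations for what gets generated, a semantic model for a simple orders model typically looks something like the following. The model and column names (`fct_orders`, `order_date`, `order_total`) are illustrative, not from any specific project:

```yaml
semantic_models:
  - name: orders
    description: Order fact table at the order grain.
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: order_date
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_total
        agg: sum
      - name: order_count
        expr: 1
        agg: sum
```

This is the shape of YAML the generation workflow writes for you; reviewing the entity types and aggregations is usually all that's left to do.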
The dbt platform provides built-in generation buttons that automatically [generate code](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md), [documentation](https://docs.getdbt.com/docs/build/documentation.md), [data tests](https://docs.getdbt.com/docs/build/data-tests.md), [metrics](https://docs.getdbt.com/docs/build/metrics-overview.md), and [semantic models](https://docs.getdbt.com/docs/build/semantic-models.md) for you with a single click in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md), [Canvas](https://docs.getdbt.com/docs/cloud/build-canvas-copilot.md), and [Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md). These features understand your model's structure and generate YAML in the correct location.

**How to generate semantic models:**

1. Navigate to the Studio IDE and select a SQL model file in the **File explorer**
2. In the **Console** section (under the **Editor**), click the **dbt Copilot** icon to view AI options
3. Select **Semantic model** to create a semantic model based on your SQL model
4. Review and refine the generated YAML as needed

You can also use Copilot to generate documentation, tests, and metrics. These built-in features automatically understand your model's columns, data types, and relationships, which means you don't need to manually describe your schema or copy-paste between file types.

**Typical workflow:**

1. Build your SQL model using Copilot conversational prompts
2. Use built-in buttons to add documentation, tests, and semantic models
3. Refine the generated YAML as needed

For more details, check out the [dbt Copilot](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md) docs.

#### Create reusable macros[​](#create-reusable-macros "Direct link to Create reusable macros")

In this section, we'll look at how to create reusable macros using Copilot. 
* [Turn repetitive code into reusable logic](#turn-repetitive-code-into-reusable-logic)
* [Lower the barrier to entry](#lower-the-barrier-to-entry)
* [Accelerate complex logic design](#accelerate-complex-logic-design)

##### Turn repetitive code into reusable logic[​](#turn-repetitive-code-into-reusable-logic "Direct link to Turn repetitive code into reusable logic")

A junior analyst keeps copy-pasting CASE statements across models.

**What to give Copilot:**

```text
Turn this CASE pattern into a reusable macro:

CASE
    WHEN amount >= 1000 THEN 'high'
    WHEN amount >= 500 THEN 'medium'
    ELSE 'low'
END

Macro requirements:
- Name: categorize_amount
- Parameters: column name, high threshold (default 1000), medium threshold (default 500)
- Include docstring with usage example
- Handle null values by returning 'unknown'
```

**Why it works:** Clear input (the CASE statement), clear requirements, clear output expectations.

##### Lower the barrier to entry[​](#lower-the-barrier-to-entry "Direct link to Lower the barrier to entry")

You need a macro but don't know Jinja syntax well.

**What to ask Copilot:**

```text
I need a macro that calculates the number of days between two date columns, excluding weekends.

Parameters:
- start_date_column (required)
- end_date_column (required)

Include a docstring explaining how to use it.
```

**Outcome:** Copilot generates proper Jinja syntax, handles parameters, and includes documentation. You learn Jinja patterns while getting working code.

##### Accelerate complex logic design[​](#accelerate-complex-logic-design "Direct link to Accelerate complex logic design")

This is best for advanced users who are comfortable with Jinja.

**What to ask Copilot:**

```text
I need a macro that builds a grouped aggregation with optional filters.

Parameters:
- relation (the model/table to query)
- group_by (list of columns to group by)
- metrics (list of columns to aggregate)
- where (optional filter condition)

Include defaults and guardrails for empty lists.
Add a docstring with parameter descriptions and usage example.
```

**Why this works:** You've outlined the interface (parameters) and edge cases (empty lists), letting Copilot handle the Jinja boilerplate while you focus on design. This approach accelerates iteration so you can refine the structure without getting stuck in syntax details.

#### Troubleshoot errors and issues[​](#troubleshoot-errors-and-issues "Direct link to Troubleshoot errors and issues")

Copilot acts as a fast, context-aware reviewer for failing SQL and macros. It reads errors, inspects your query structure, and suggests minimal fixes. Troubleshooting with Copilot gives you:

* Faster diagnosis through plain-language translation of errors with likely root causes
* Safer fixes by biasing toward small, targeted changes
* Better learning by generating explanations you can paste into docs or PR descriptions

##### Troubleshoot errors[​](#troubleshoot-errors "Direct link to Troubleshoot errors")

When something breaks, give Copilot the error message, your code, and what you expected to happen. Here are a couple of examples that show how to use Copilot to troubleshoot errors.

**Example: SQL error**

```text
Error: "SQL compilation error: Column 'product_name' must appear in GROUP BY"

Query:
SELECT product_id, product_name, SUM(quantity) as total_quantity
FROM inventory
GROUP BY product_id

Warehouse: Snowflake

Expected: Group by product and show product name. What's wrong and how do I fix it?
```

**Example: Macro not working**

```text
This macro should calculate discount but returns wrong values:

{% macro calculate_discount(amount, rate) %}
    {{ amount }} * {{ rate }}
{% endmacro %}

When I call {{ calculate_discount(100, 0.1) }} I expect 10 but get an error.

Show me the rendered SQL from target/compiled and explain what's wrong.
```

**Tip:** Include your warehouse type (Snowflake, BigQuery, Databricks, and so on) because syntax can vary across data platforms. 
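For the SQL error example above, the fix Copilot should land on is adding the non-aggregated column to the `GROUP BY` clause:

```sql
select
    product_id,
    product_name,
    sum(quantity) as total_quantity
from inventory
group by
    product_id,
    product_name  -- every selected non-aggregated column must appear here
```

Because the prompt included the error message, the failing query, and the intent, the suggested change can stay this small.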
#### Conclusion[​](#conclusion "Direct link to Conclusion") Congrats, you've now learned some tips on how to create and use prompts for dbt Copilot 🎉! You can: * Boost your prompting skills by providing rich context and stating clear business questions. These skills apply to SQL, macros, documentation, tests, metrics, and semantic models. * Amplify your workflow by using existing documentation and project context * Generate Jinja macros to build more scalable and maintainable systems * Troubleshoot your code to diagnose issues fast and apply safe, explainable fixes ##### Quick reference checklist[​](#quick-reference-checklist "Direct link to Quick reference checklist") When writing prompts for dbt Copilot: * ✅ Provide rich context: Table names, columns, data types, relationships, sample values * ✅ Break down complex logic: Write multi-part queries as a sequence of steps * ✅ State the business question: What decision or insight you're supporting, not just "write a query" * ✅ Be clear and explicit: Expected columns, sort order, filters, and output format For troubleshooting: * ✅ Include complete error messages: Full warehouse error with line numbers * ✅ Show the failing code: Both the dbt model and compiled SQL (from `target/compiled/`) * ✅ Provide sample data: Representative rows that trigger the issue * ✅ State your warehouse: Snowflake, BigQuery, Databricks, etc. ##### Next steps[​](#next-steps "Direct link to Next steps") Start with one task—automating documentation, generating a test, or refactoring a model—and build the habit from there. The more you use Copilot, the more you'll discover ways to accelerate your analytics engineering workflow. 
Check out the following docs to learn more about how to use Copilot: * [About dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) * [Generate resources](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md#generate-resources) * [Generate and edit SQL inline](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md#generate-and-edit-sql-inline) * [Build visual models](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md#build-visual-models) * [Build queries](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md#build-queries) --- ### Integrate with dbt Semantic Layer using best practices [Back to guides](https://docs.getdbt.com/guides.md) Semantic Layer Best practices Advanced #### Introduction[​](#introduction "Direct link to Introduction") To help your tool fit into the dbt Semantic Layer ecosystem, dbt Labs offers best practice recommendations for exposing metrics and letting users interact with them seamlessly. This is an evolving guide based on our experience. If you have any feedback, we'd love to hear it! 📹 Learn about the dbt Semantic Layer with on-demand video courses! Explore our [dbt Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer) to learn how to define and query metrics in your dbt project. 
Additionally, dive into mini-courses for querying the dbt Semantic Layer in your favorite tools: [Tableau](https://courses.getdbt.com/courses/tableau-querying-the-semantic-layer), [Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel), [Hex](https://courses.getdbt.com/courses/hex-querying-the-semantic-layer), and [Mode](https://courses.getdbt.com/courses/mode-querying-the-semantic-layer). ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") To build a Semantic Layer integration: * We offer a [JDBC](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md) API and [GraphQL API](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md). Refer to the dedicated [Semantic Layer API](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) for more technical integration details. * Familiarize yourself with the [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) and [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md)'s key concepts. There are two main objects: * [Semantic models](https://docs.getdbt.com/docs/build/semantic-models.md) — Nodes in your semantic graph, connected via entities as edges. MetricFlow takes semantic models defined in YAML configuration files as inputs and creates a semantic graph that you can use to query metrics. * [Metrics](https://docs.getdbt.com/docs/build/metrics-overview.md) — Can be defined in the same YAML files as your semantic models, or split into separate YAML files in other subdirectories (provided those subdirectories are within the same dbt project repo). ##### Connection parameters[​](#connection-parameters "Direct link to Connection parameters") The dbt Semantic Layer APIs authenticate with `environmentId`, `SERVICE_TOKEN`, and `host`. We recommend you provide users with separate input fields for these components (dbt will surface these parameters for the user). 
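For reference, when assembled into a JDBC connection string, these components take roughly the following shape (the host shown is illustrative — the correct host depends on the user's account region and plan, so confirm it against the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) documentation):

```text
jdbc:arrow-flight-sql://semantic-layer.cloud.getdbt.com:443?environmentId=<ENVIRONMENT_ID>&token=<SERVICE_TOKEN>
```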
##### Exposing metadata to dbt Labs[​](#exposing-metadata-to-dbt-labs "Direct link to Exposing metadata to dbt Labs") When building an integration, we recommend you expose certain metadata in the request for analytics and troubleshooting purposes. Please send us the following header with every query: `'X-dbt-partner-source': 'Your-Application-Name'` Additionally, it would be helpful if you also included the email and username of the person generating the query from your application. #### Use best practices when exposing metrics[​](#use-best-practices-when-exposing-metrics "Direct link to Use best practices when exposing metrics") Best practices for exposing metrics are summarized into five themes: * [Governance](#governance-and-traceability) — Recommendations on how to establish guardrails for governed data work. * [Discoverability](#discoverability) — Recommendations for making data interactions user-friendly. * [Organization](#organization) — Organize metrics and dimensions for all audiences and use [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md). * [Query flexibility](#query-flexibility) — Allow users to query either one metric alone without dimensions or multiple metrics with dimensions. * [Context and interpretation](#context-and-interpretation) — Contextualize metrics for better analysis; expose definitions, metadata, lineage, and freshness. ##### Governance and traceability[​](#governance-and-traceability "Direct link to Governance and traceability") When working with more governed data, it's essential to establish clear guardrails. Here are some recommendations: * **Aggregations control** — Users shouldn't generally be allowed to modify aggregations unless they perform post-processing calculations on Semantic Layer data (such as year-over-year analysis). * **Time series alignment and using metric\_time** — Make sure users view metrics across the correct time series. 
When displaying metric graphs, using a non-default time aggregation dimension might lead to misleading interpretations. While users can still group by other time dimensions, they should be careful not to create trend lines with incorrect time axes.

When looking at one or multiple metrics, users should use `metric_time` as the main time dimension to guarantee they are looking at the right time series for the metric(s).

As such, when building an application, we recommend exposing `metric_time` as a separate, "special" time dimension on its own. This dimension always aligns with all metrics and is common across them. Users can still view and group by other time dimensions, but a clear delineation between the `metric_time` dimension and the other time dimensions prevents confusion about how metrics should be plotted.

Also, when a user requests a time granularity change for the main time series, the query that your application runs should use `metric_time` as this will always give you the correct slice. Related to this, we also strongly recommend that you have a way to expose what dimension `metric_time` actually maps to for users who may not be familiar with it. Our APIs allow you to fetch the actual underlying time dimensions that make up `metric_time` (such as `transaction_date`) so you can expose them to your users. * **Units consistency** — If units are supported, it's vital to avoid plotting data incorrectly with different units. Ensuring consistency in unit representation will prevent confusion and misinterpretation of the data. * **Traceability of metric and dimension changes** — When users change names of metrics and dimensions for reports, it's crucial to have a traceability mechanism in place to link back to the original source metric name. ##### Discoverability[​](#discoverability "Direct link to Discoverability") * Consider treating [metrics](https://docs.getdbt.com/docs/build/metrics-overview.md) as first-class objects rather than measures. Metrics offer a higher-level and more contextual way to interact with data, reducing the burden on end-users to manually aggregate data. * **Easy metric interactions** — Provide users with an intuitive approach to: * Search for Metrics — Users should be able to easily search and find relevant metrics. Metrics can serve as the starting point to lead users into exploring dimensions. * Search for Dimensions — Users should be able to query metrics with associated dimensions, allowing them to gain deeper insights into the data. * Filter by Dimension Values — Expose and enable users to filter metrics based on dimension values, encouraging data analysis and exploration. * Filter additional metadata — Allow users to filter metrics based on other available metadata, such as metric type and default time granularity. 
* **Suggested metrics** — Ideally, the system should intelligently suggest relevant metrics to users based on their team's activities. This approach increases exposure to relevant data, facilitates learning, and supports collaboration among team members. By implementing these recommendations, the data interaction process becomes more user-friendly, empowering users to gain valuable insights without the need for extensive data manipulation. ##### Organization[​](#organization "Direct link to Organization") We recommend organizing metrics and dimensions so that a non-technical user can understand the data model without needing much context: * **Organizing dimensions** — To help non-technical users understand the data model better, we recommend organizing dimensions based on the entity they originated from. For example, consider dimensions like `user__country` and `product__category`.

You can create groups by extracting `user` and `product` and nesting the respective dimensions under each group. This way, dimensions align with the entity or semantic model they belong to, making them more user-friendly and accessible. Additionally, we recommend adding a `label` parameter to dimensions in order to define the value displayed in downstream tools. * **Organizing metrics** — The goal is to organize metrics into a hierarchy in our configurations, instead of presenting them in a long list.

This hierarchy helps you organize metrics based on specific criteria, such as business unit or team. By providing this structured organization, users can find and navigate metrics more efficiently, enhancing their overall data analysis experience. * **Using saved queries** — The Semantic Layer has a concept of [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md), which lets users pre-build combinations of metrics, dimensions, and filters for easy access. You should surface these as first-class objects in your integration. Refer to the [JDBC](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md) and [GraphQL](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md) APIs for syntax. ##### Query flexibility[​](#query-flexibility "Direct link to Query flexibility") Allow users to query either one metric alone without dimensions or multiple metrics with dimensions. * Allow toggling between metrics/dimensions seamlessly. * Be clear on exposing what dimensions are queryable with what metrics and hide things that don’t apply. (Our APIs provide calls for you to get relevant dimensions for metrics, and vice versa). * Only expose time granularities (monthly, daily, yearly) that match the available metrics. * For example, if a dbt model and its resulting semantic model have a monthly granularity, make sure querying data with a 'daily' granularity isn't available to the user. Our APIs have functionality that will help you surface the correct granularities. * We recommend treating time granularity as a concept that applies to any time dimension, not just the primary aggregation (`metric_time`). Consider a situation where a user wants to look at `sales` over time by `customer signup month`; in this situation, having the ability to apply granularities to both time dimensions is crucial. Our APIs include information to fetch the granularities for the primary (`metric_time`) dimensions, as well as all time dimensions. 
You can treat each time dimension and granularity selection independently in your application. Note: As a starting point, it makes sense to only support `metric_time` or the primary time dimension, but we recommend expanding that as your solution evolves. * You should allow users to filter on date ranges and expose a calendar picker with convenient presets. * For example, last 30 days, last week, and so on. ##### Context and interpretation[​](#context-and-interpretation "Direct link to Context and interpretation") For better analysis, it's best to have the context of the metrics close to where the analysis is happening. We recommend the following: * Expose business definitions of the metrics as well as logical definitions. * Expose additional metadata from the Semantic Layer (measures, type parameters). * Use the [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) to enhance the metric and build confidence in its accuracy: * Check if the metric is fresh and when it was last updated. * Include lineage information to understand the metric's origin. * Allow for creating other metadata that’s useful for the metric. We can provide some of this information in our configuration (Display name, Default Granularity for View, Default Time range), but there may be other metadata that your tool wants to provide to make the metric richer. ##### Transparency and using compile[​](#transparency-and-using-compile "Direct link to Transparency and using compile") For transparency and additional context, we recommend you have an easy way for the user to obtain the SQL that MetricFlow generates. Depending on which API you are using, you can do this by using our `compile` parameter. This is incredibly powerful and emphasizes transparency and openness, particularly for technically inclined users. 
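To illustrate the granularity recommendation above, a JDBC API query that groups a metric by `metric_time` at a user-selected grain might look like the following sketch (the `revenue` metric and the `month` grain are placeholders — refer to the [JDBC API](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md) docs for the exact syntax your integration should emit):

```sql
-- illustrative query: metric name and grain are placeholders
select *
from {{
    semantic_layer.query(
        metrics=['revenue'],
        group_by=[Dimension('metric_time').grain('month')]
    )
}}
```

When the user changes the grain in your UI, only the `grain()` argument needs to change; the rest of the query stays the same.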
##### Where filters and optimization[​](#where-filters-and-optimization "Direct link to Where filters and optimization") Where our APIs support either a string or a filter list for the `where` clause, we recommend your application use the filter list to gain maximum pushdown benefits. The `where` string may be more intuitive for users writing queries during testing, but it will not have the performance benefits of the filter list in a production environment. #### Understand stages of an integration[​](#understand-stages-of-an-integration "Direct link to Understand stages of an integration") These are recommendations on how to evolve a Semantic Layer integration, not a strict runbook. **Stage 1 - The basics** * Supporting and using [JDBC](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md) or [GraphQL](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md) is the first step. Refer to the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) for more technical details. **Stage 2 - More discoverability and basic querying** * List metrics defined in the project * List available dimensions based on one or many metrics * Query defined metric values on their own or grouped by available dimensions * Display metadata from [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) and other context * Expose [saved queries](https://docs.getdbt.com/docs/build/saved-queries.md) — pre-built metrics, dimensions, and filters that Semantic Layer developers create for easier analysis. Refer to the [JDBC](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md) and [GraphQL](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md) APIs for syntax. 
**Stage 3 - More querying flexibility and better user experience (UX)** * More advanced filtering * Time filters with good presets/calendar UX * Filtering metrics on a pre-populated set of dimension values * Make dimension values more user-friendly by organizing them effectively * Intelligent filtering of metrics based on available dimensions and vice versa **Stage 4 - More custom user interface (UI) / Collaboration** * A place where users can see all the relevant information about a given metric * Organize metrics by hierarchy and offer more advanced search features (such as filtering on the type of metric or other metadata) * Use and expose more metadata * Querying dimensions without metrics and other more advanced querying functionality * Suggest metrics to users based on teams/identity, and so on. ##### Related docs[​](#related-docs "Direct link to Related docs") * [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md) * [Use the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) to learn about the product. * [Build your metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md) for more info about MetricFlow and its components. * [Semantic Layer integrations page](https://www.getdbt.com/product/semantic-layer-integrations) for information about the available partner integrations. 
--- ### Leverage dbt to generate analytics and ML-ready pipelines with SQL and Python with Snowflake [Back to guides](https://docs.getdbt.com/guides.md) Snowflake Intermediate #### Introduction[​](#introduction "Direct link to Introduction") The focus of this workshop will be to demonstrate how we can use both *SQL and Python together* in the same workflow to run *both analytics and machine learning models* on dbt. All code in today’s workshop can be found on [GitHub](https://github.com/dbt-labs/python-snowpark-formula1/tree/python-formula1). ##### What you'll use during the lab[​](#what-youll-use-during-the-lab "Direct link to What you'll use during the lab") * A [Snowflake account](https://trial.snowflake.com/) with ACCOUNTADMIN access * A [dbt account](https://www.getdbt.com/signup/) ##### What you'll learn[​](#what-youll-learn "Direct link to What you'll learn") * How to build scalable data transformation pipelines using dbt and Snowflake with SQL and Python * How to copy data into Snowflake from a public S3 bucket ##### What you need to know[​](#what-you-need-to-know "Direct link to What you need to know") * Basic to intermediate SQL and Python. * Basic understanding of dbt fundamentals. We recommend the [dbt Fundamentals course](https://learn.getdbt.com) if you're interested. * High-level machine learning process (encoding, training, testing) * Simple ML algorithms — we will use logistic regression to keep the focus on the *workflow*, not algorithms! ##### What you'll build[​](#what-youll-build "Direct link to What you'll build") * A set of data analytics and prediction pipelines using Formula 1 data leveraging dbt and Snowflake, making use of best practices like data quality tests and code promotion between environments * We will create insights for: 1. Finding the lap time average and rolling average through the years (is it generally trending up or down?) 2. Which constructor has the fastest pit stops in 2021? 3. 
Predicting the position of each driver using a decade of data (2010 - 2020) As inputs, we are going to leverage Formula 1 datasets hosted on a dbt Labs public S3 bucket. We will create a Snowflake Stage for our CSV files, then use Snowflake’s `COPY INTO` command to copy the data from our CSV files into tables. The Formula 1 dataset is available on [Kaggle](https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020). The data is originally compiled from the [Ergast Developer API](http://ergast.com/mrd/). Overall we are going to set up the environments, build scalable pipelines in dbt, establish data tests, and promote code to production. #### Configure Snowflake[​](#configure-snowflake "Direct link to Configure Snowflake") 1. Log in to your trial Snowflake account. You can [sign up for a Snowflake Trial Account using this form](https://signup.snowflake.com/) if you don’t have one. 2. Ensure that your account is set up using **AWS** in the **US East (N. Virginia)** region. We will be copying the data from a public AWS S3 bucket hosted by dbt Labs in the us-east-1 region. By ensuring our Snowflake environment setup matches our bucket region, we avoid any multi-region data copy and retrieval latency issues. [![Snowflake trial](/img/guides/dbt-ecosystem/dbt-python-snowpark/2-snowflake-configuration/1-snowflake-trial-AWS-setup.png?v=2 "Snowflake trial")](#)Snowflake trial 3. After creating your account and verifying it from your sign-up email, Snowflake will direct you back to the UI called Snowsight. 4. When Snowsight first opens, your window should look like the following, with you logged in as the ACCOUNTADMIN with demo worksheets open: [![Snowflake trial demo worksheets](/img/guides/dbt-ecosystem/dbt-python-snowpark/2-snowflake-configuration/2-new-snowflake-account.png?v=2 "Snowflake trial demo worksheets")](#)Snowflake trial demo worksheets 5. Navigate to **Admin > Billing & Terms**. 
Click **Enable > Acknowledge & Continue** to enable Anaconda Python Packages to run in Snowflake. [![Anaconda terms](/img/guides/dbt-ecosystem/dbt-python-snowpark/2-snowflake-configuration/3-accept-anaconda-terms.jpeg?v=2 "Anaconda terms")](#)Anaconda terms [![Enable Anaconda](/img/guides/dbt-ecosystem/dbt-python-snowpark/2-snowflake-configuration/4-enable-anaconda.png?v=2 "Enable Anaconda")](#)Enable Anaconda 6. Finally, create a new Worksheet by selecting **+ Worksheet** in the upper right corner. #### Connect to data source[​](#connect-to-data-source "Direct link to Connect to data source") We need to obtain our data source by copying our Formula 1 data into Snowflake tables from a public S3 bucket that dbt Labs hosts. 1. When a new Snowflake account is created, there should be a preconfigured warehouse in your account named `COMPUTE_WH`. 2. If for any reason your account doesn’t have this warehouse, we can create a warehouse using the following script: ```sql create or replace warehouse COMPUTE_WH with warehouse_size=XSMALL; ``` 3. Rename the worksheet to `data setup script` since we will be placing code in this worksheet to ingest the Formula 1 data. Make sure you are still logged in as the **ACCOUNTADMIN** and select the **COMPUTE\_WH** warehouse. [![Rename worksheet and select warehouse](/img/guides/dbt-ecosystem/dbt-python-snowpark/3-connect-to-data-source/1-rename-worksheet-and-select-warehouse.png?v=2 "Rename worksheet and select warehouse")](#)Rename worksheet and select warehouse 4. Copy the following code into the main body of the Snowflake worksheet. You can also find this setup script under the `setup` folder in the [Git repository](https://github.com/dbt-labs/python-snowpark-formula1/blob/main/setup/setup_script_s3_to_snowflake.sql). The script is long since it brings in all of the data we'll need today! 
```sql
-- create and define our formula1 database
create or replace database formula1;
use database formula1;
create or replace schema raw;
use schema raw;

-- define our file format for reading in the CSVs
create or replace file format CSVformat
    type = CSV
    field_delimiter = ','
    field_optionally_enclosed_by = '"'
    skip_header = 1;

-- create our stage referencing the public s3 bucket
create or replace stage formula1_stage
    file_format = CSVformat
    url = 's3://formula1-dbt-cloud-python-demo/formula1-kaggle-data/';

-- load in the 8 tables we need for our demo
-- we are first creating the table then copying our data in from s3
-- think of this as an empty container or shell that we are then filling
create or replace table formula1.raw.circuits (
    CIRCUITID NUMBER(38,0),
    CIRCUITREF VARCHAR(16777216),
    NAME VARCHAR(16777216),
    LOCATION VARCHAR(16777216),
    COUNTRY VARCHAR(16777216),
    LAT FLOAT,
    LNG FLOAT,
    ALT NUMBER(38,0),
    URL VARCHAR(16777216)
);
-- copy our data from public s3 bucket into our tables
copy into circuits
from @formula1_stage/circuits.csv
on_error='continue';

create or replace table formula1.raw.constructors (
    CONSTRUCTORID NUMBER(38,0),
    CONSTRUCTORREF VARCHAR(16777216),
    NAME VARCHAR(16777216),
    NATIONALITY VARCHAR(16777216),
    URL VARCHAR(16777216)
);
copy into constructors
from @formula1_stage/constructors.csv
on_error='continue';

create or replace table formula1.raw.drivers (
    DRIVERID NUMBER(38,0),
    DRIVERREF VARCHAR(16777216),
    NUMBER VARCHAR(16777216),
    CODE VARCHAR(16777216),
    FORENAME VARCHAR(16777216),
    SURNAME VARCHAR(16777216),
    DOB DATE,
    NATIONALITY VARCHAR(16777216),
    URL VARCHAR(16777216)
);
copy into drivers
from @formula1_stage/drivers.csv
on_error='continue';

create or replace table formula1.raw.lap_times (
    RACEID NUMBER(38,0),
    DRIVERID NUMBER(38,0),
    LAP NUMBER(38,0),
    POSITION FLOAT,
    TIME VARCHAR(16777216),
    MILLISECONDS NUMBER(38,0)
);
copy into lap_times
from @formula1_stage/lap_times.csv
on_error='continue';

create or replace table formula1.raw.pit_stops (
    RACEID NUMBER(38,0),
    DRIVERID NUMBER(38,0),
    STOP NUMBER(38,0),
    LAP NUMBER(38,0),
    TIME VARCHAR(16777216),
    DURATION VARCHAR(16777216),
    MILLISECONDS NUMBER(38,0)
);
copy into pit_stops
from @formula1_stage/pit_stops.csv
on_error='continue';

create or replace table formula1.raw.races (
    RACEID NUMBER(38,0),
    YEAR NUMBER(38,0),
    ROUND NUMBER(38,0),
    CIRCUITID NUMBER(38,0),
    NAME VARCHAR(16777216),
    DATE DATE,
    TIME VARCHAR(16777216),
    URL VARCHAR(16777216),
    FP1_DATE VARCHAR(16777216),
    FP1_TIME VARCHAR(16777216),
    FP2_DATE VARCHAR(16777216),
    FP2_TIME VARCHAR(16777216),
    FP3_DATE VARCHAR(16777216),
    FP3_TIME VARCHAR(16777216),
    QUALI_DATE VARCHAR(16777216),
    QUALI_TIME VARCHAR(16777216),
    SPRINT_DATE VARCHAR(16777216),
    SPRINT_TIME VARCHAR(16777216)
);
copy into races
from @formula1_stage/races.csv
on_error='continue';

create or replace table formula1.raw.results (
    RESULTID NUMBER(38,0),
    RACEID NUMBER(38,0),
    DRIVERID NUMBER(38,0),
    CONSTRUCTORID NUMBER(38,0),
    NUMBER NUMBER(38,0),
    GRID NUMBER(38,0),
    POSITION FLOAT,
    POSITIONTEXT VARCHAR(16777216),
    POSITIONORDER NUMBER(38,0),
    POINTS NUMBER(38,0),
    LAPS NUMBER(38,0),
    TIME VARCHAR(16777216),
    MILLISECONDS NUMBER(38,0),
    FASTESTLAP NUMBER(38,0),
    RANK NUMBER(38,0),
    FASTESTLAPTIME VARCHAR(16777216),
    FASTESTLAPSPEED FLOAT,
    STATUSID NUMBER(38,0)
);
copy into results
from @formula1_stage/results.csv
on_error='continue';

create or replace table formula1.raw.status (
    STATUSID NUMBER(38,0),
    STATUS VARCHAR(16777216)
);
copy into status
from @formula1_stage/status.csv
on_error='continue';
```
5. Ensure all the commands are selected before running the query — an easy way to do this is to use Ctrl-a to highlight all of the code in the worksheet. Select **run** (blue triangle icon). Notice how the dot next to your **COMPUTE\_WH** turns from gray to green as you run the query. The **status** table is the last of the 8 tables to load. 
[![Load data from S3 bucket](/img/guides/dbt-ecosystem/dbt-python-snowpark/3-connect-to-data-source/2-load-data-from-s3.png?v=2 "Load data from S3 bucket")](#)Load data from S3 bucket 6. Let’s unpack that long script we ran into its component parts. We ran this query to load in our 8 Formula 1 tables from a public S3 bucket. To do this, we: * Created a new database called `formula1` and a schema called `raw` to place our raw (untransformed) data into. * Defined our file format for our CSV files. Importantly, here we use a parameter called `field_optionally_enclosed_by =` since the string columns in our Formula 1 CSV files use quotes. Quotes are used around string values to avoid parsing issues where commas `,` and new lines `\n` in data values could cause data loading errors. * Created a stage pointing to the location of the data we are going to load. Snowflake Stages are locations where data files are stored. Stages are used to both load and unload data to and from Snowflake locations. Here we are using an external stage, by referencing an S3 bucket. * Created our tables for our data to be copied into. These are empty tables with the column name and data type. Think of this as creating an empty container that the data will then fill into. * Used the `copy into` statement for each of our tables. We reference the staged location we created and, upon loading errors, continue to load in the rest of the data. You should not have data loading errors, but if you do, those rows will be skipped and Snowflake will tell you which rows caused errors. 7. Now let's take a look at some of our cool Formula 1 data we just loaded up! 1. Create a new worksheet by selecting the **+** then **New Worksheet**. [![Create new worksheet to query data](/img/guides/dbt-ecosystem/dbt-python-snowpark/3-connect-to-data-source/3-create-new-worksheet-to-query-data.png?v=2 "Create new worksheet to query data")](#)Create new worksheet to query data 2. Navigate to **Database > Formula1 > RAW > Tables**. 3. 
Query the data using the following code. There are only 76 rows in the circuits table, so we don’t need to worry about limiting the amount of data we query. ```sql select * from formula1.raw.circuits ``` 4. Run the query. From here on out, we’ll use the keyboard shortcuts Command-Enter or Control-Enter to run queries and won’t explicitly call out this step. 5. Review the query results; you should see information about Formula 1 circuits, starting with Albert Park in Australia! 6. Finally, ensure you have all 8 tables starting with `CIRCUITS` and ending with `STATUS`. Now we are ready to connect into dbt! [![Query circuits data](/img/guides/dbt-ecosystem/dbt-python-snowpark/3-connect-to-data-source/4-query-circuits-data.png?v=2 "Query circuits data")](#)Query circuits data #### Configure dbt[​](#configure-dbt "Direct link to Configure dbt") 1. We are going to be using [Snowflake Partner Connect](https://docs.snowflake.com/en/user-guide/ecosystem-partner-connect.html) to set up a dbt account. Using this method will allow you to spin up a fully fledged dbt account with your [Snowflake connection](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-snowflake.md), [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md), environments, and credentials already established. 2. Navigate out of your worksheet back by selecting **home**. 3. In Snowsight, confirm that you are using the **ACCOUNTADMIN** role. 4. Navigate to **Data Products > Partner Connect**. Find **dbt** either by using the search bar or navigating to the **Data Integration** section. Select the **dbt** tile. [![Open Partner Connect](/img/guides/dbt-ecosystem/dbt-python-snowpark/4-configure-dbt/1-open-partner-connect.png?v=2 "Open Partner Connect")](#)Open Partner Connect 5. You should now see a new window that says **Connect to dbt**. Select **Optional Grant** and add the `FORMULA1` database. This will grant access for your new dbt user role to the FORMULA1 database. 
[![Partner Connect Optional Grant](/img/guides/dbt-ecosystem/dbt-python-snowpark/4-configure-dbt/2-partner-connect-optional-grant.png?v=2 "Partner Connect Optional Grant")](#)Partner Connect Optional Grant 6. Ensure the `FORMULA1` database is present in your optional grant before clicking **Connect**. This will create a dedicated dbt user, database, warehouse, and role for your dbt trial. [![Connect to dbt](/img/guides/dbt-ecosystem/dbt-python-snowpark/4-configure-dbt/3-connect-to-dbt.png?v=2 "Connect to dbt")](#)Connect to dbt 7. When you see the **Your partner account has been created** window, click **Activate**. 8. You should be redirected to a dbt registration page. Fill out the form. Make sure to save the password somewhere for future logins. [![dbt sign up](/img/guides/dbt-ecosystem/dbt-python-snowpark/4-configure-dbt/4-dbt-cloud-sign-up.png?v=2 "dbt sign up")](#)dbt sign up 9. Select **Complete Registration**. You should now be redirected to your dbt account, complete with a connection to your Snowflake account, a deployment and a development environment, and a sample job. 10. To help you version control your dbt project, we have connected it to a [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md), which means that dbt Labs will be hosting your repository for you. This will give you access to a Git workflow without you having to create and host the repository yourself. You will not need to know Git for this workshop; dbt will help guide you through the workflow. In the future, when you’re developing your own project, [feel free to use your own repository](https://docs.getdbt.com/docs/cloud/git/connect-github.md). This will allow you to learn more about features like [Slim CI](https://docs.getdbt.com/docs/deploy/continuous-integration.md) builds after this workshop.
#### Change development schema name and navigate the IDE[​](#change-development-schema-name-and-navigate-the-ide "Direct link to Change development schema name and navigate the IDE") 1. First we are going to change the name of our default schema, which is where our dbt models will build. By default, the name is `dbt_`. We will change this to `dbt_YOUR_NAME` to create your own personal development schema. To do this, click on your account name in the left side menu and select **Account settings**. [![Settings menu](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/1-settings-gear-icon.png?v=2 "Settings menu")](#)Settings menu 2. Navigate to the **Credentials** menu and select **Partner Connect Trial**, which will expand the credentials menu. [![Credentials edit schema name](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/2-credentials-edit-schema-name.png?v=2 "Credentials edit schema name")](#)Credentials edit schema name 3. Click **Edit** and change the name of your schema from `dbt_` to `dbt_YOUR_NAME`, replacing `YOUR_NAME` with your initials and name. Be sure to click **Save** for your changes! [![Save new schema name](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/3-save-new-schema-name.png?v=2 "Save new schema name")](#)Save new schema name 4. We now have our own personal development schema, amazing! When we run our first dbt models they will build into this schema. 5. Let’s open up dbt’s Integrated Development Environment (Studio IDE) and familiarize ourselves with it. Choose **Develop** at the top of the UI. 6. When the Studio IDE is done loading, click **Initialize dbt project**. The initialization process creates a collection of files and folders necessary to run your dbt project. [![Initialize dbt project](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/4-initialize-dbt-project.png?v=2 "Initialize dbt project")](#)Initialize dbt project 7.
After the initialization is finished, you can view the files and folders in the file tree menu. As we move through the workshop we'll be sure to touch on a few key files and folders that we'll work with to build out our project. 8. Next click **Commit and sync** to commit the new files and folders from the initialize step. We always want our commit messages to be relevant to the work we're committing, so be sure to provide a message like `initialize project` and select **Commit Changes**. [![First commit and push](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/5-first-commit-and-push.png?v=2 "First commit and push")](#)First commit and push [![Commit Changes button](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/6-initalize-project.png?v=2 "Commit Changes button")](#)Commit Changes button 9. [Committing](https://www.atlassian.com/git/tutorials/saving-changes/git-commit) your work here will save it to the managed git repository that was created during the Partner Connect signup. This initial commit is the only commit that will be made directly to our `main` branch and from *here on out we'll be doing all of our work on a development branch*. This allows us to keep our development work separate from our production code. 10. There are a couple of key features to point out about the Studio IDE before we get to work. It is a text editor, a SQL and Python runner, and a CLI with Git version control, all baked into one package! This allows you to focus on editing your SQL and Python files, previewing the results with the SQL runner (it even runs Jinja!), and building models at the command line without having to move between different applications. The Git workflow in dbt allows Git beginners and experts alike to easily version control all of their work with a couple of clicks.
[![IDE overview](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/7-IDE-overview.png?v=2 "IDE overview")](#)IDE overview 11. Let's run our first dbt models! Two example models are included in your dbt project in the `models/examples` folder that we can use to illustrate how to run dbt at the command line. Type `dbt run` into the command line and press **Enter** on your keyboard. When the run bar expands you'll be able to see the results of the run, where you should see the run complete successfully. [![dbt run example models](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/8-dbt-run-example-models.png?v=2 "dbt run example models")](#)dbt run example models 12. The run results allow you to see the code that dbt compiles and sends to Snowflake for execution. To view the logs for this run, select one of the model tabs using the **>** icon and then **Details**. If you scroll down a bit you'll be able to see the compiled code and how dbt interacts with Snowflake. Given that this run took place in our development environment, the models were created in your development schema. [![Details about the second model](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/9-second-model-details.png?v=2 "Details about the second model")](#)Details about the second model 13. Now let's switch over to Snowflake to confirm that the objects were actually created. Click on the three dots **…** above your database objects and then **Refresh**. Expand the **PC\_DBT\_DB** database and you should see your development schema. Select the schema, then **Tables** and **Views**. Now you should be able to see `MY_FIRST_DBT_MODEL` as a table and `MY_SECOND_DBT_MODEL` as a view.
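Step 12 mentioned that dbt compiles your models before sending SQL to Snowflake: references like `{{ ref('my_first_dbt_model') }}` are resolved into fully qualified relation names. As a rough illustration of that substitution (a toy sketch in plain Python, not dbt's actual compiler; the database and schema names come from this workshop's Partner Connect setup):

```python
import re

def compile_refs(sql: str, database: str, schema: str) -> str:
    """Replace {{ ref('model') }} with a fully qualified name, the way
    dbt resolves refs against the target database and schema."""
    pattern = r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}"
    return re.sub(pattern, lambda m: f"{database}.{schema}.{m.group(1)}", sql)

raw = "select * from {{ ref('my_first_dbt_model') }}"
print(compile_refs(raw, "PC_DBT_DB", "DBT_YOUR_NAME"))
# select * from PC_DBT_DB.DBT_YOUR_NAME.my_first_dbt_model
```

This is why the compiled code you see in the run details contains concrete `database.schema.table` names rather than `ref()` calls.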
[![Confirm example models are built in Snowflake](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/10-confirm-example-models-built-in-snowflake.png?v=2 "Confirm example models are built in Snowflake")](#)Confirm example models are built in Snowflake #### Create branch and set up project configs[​](#create-branch-and-set-up-project-configs "Direct link to Create branch and set up project configs") In this step, we’ll create a development branch and set up project-level configurations. 1. To get started with development for our project, we'll need to create a new Git branch for our work. Select **create branch** and name your development branch. We'll call our branch `snowpark_python_workshop` then click **Submit**. 2. The first piece of development we'll do on the project is to update the `dbt_project.yml` file. Every dbt project requires a `dbt_project.yml` file — this is how dbt knows a directory is a dbt project. The [dbt\_project.yml](https://docs.getdbt.com/reference/dbt_project.yml.md) file also contains important information that tells dbt how to operate on your project. 3. Select the `dbt_project.yml` file from the file tree to open it and replace all of the existing contents with the following code. When you're done, save the file by clicking **save**. You can also use the Command-S or Control-S shortcut from here on out.

```yaml
# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'snowflake_dbt_python_formula1'
version: '1.3.0'
require-dbt-version: '>=1.3.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'default'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"

models:
  snowflake_dbt_python_formula1:
    staging:
      +docs:
        node_color: "CadetBlue"
    marts:
      +materialized: table
      aggregates:
        +docs:
          node_color: "Maroon"
        +tags: "bi"
      core:
        +docs:
          node_color: "#800080"
    intermediate:
      +docs:
        node_color: "MediumSlateBlue"
    ml:
      prep:
        +docs:
          node_color: "Indigo"
      train_predict:
        +docs:
          node_color: "#36454f"
```

4. The key configurations to point out in the file, in relation to the work we're going to do, are: * `require-dbt-version` — Tells dbt which version of dbt to use for your project. We are requiring 1.3.0 and any newer version to run Python models and node colors. * `materialized` — Tells dbt how to materialize models when compiling the code before it pushes it down to Snowflake. All models in the `marts` folder will be built as tables. * `tags` — Applies tags at a directory level to all models. All models in the `aggregates` folder will be tagged as `bi` (abbreviation for business intelligence). * `docs` — Specifies the `node_color` either by the plain color name or a hex value. 5. [Materializations](https://docs.getdbt.com/docs/build/materializations.md) are strategies for persisting dbt models in a warehouse, with `tables` and `views` being the most commonly utilized types. By default, all dbt models are materialized as views; other materialization types can be configured in the `dbt_project.yml` file or in a model itself.
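These configurations cascade: a project-wide default (views) applies everywhere, a folder-level key like `marts: +materialized: table` overrides it for that subtree, and an in-model `config()` call overrides both. A minimal sketch of that precedence in plain Python (illustrative only, not dbt's resolution code):

```python
def resolve_materialization(project_default, folder_config=None, model_config=None):
    """Most specific setting wins: model config > folder config > project default."""
    for setting in (model_config, folder_config, project_default):
        if setting is not None:
            return setting

# dbt's global default is `view`; our dbt_project.yml sets the marts folder to `table`
print(resolve_materialization("view"))                    # a staging model builds as a view
print(resolve_materialization("view", "table"))           # a marts model builds as a table
print(resolve_materialization("view", "table", "incremental"))  # an in-model config wins
```

Keep this precedence in mind when reading the `models:` block above: the `+materialized: table` line only affects models under `marts`.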
It’s very important to note that *Python models can only be materialized as tables or incremental models.* Since all our Python models exist under `marts`, the following portion of our `dbt_project.yml` ensures no errors will occur when we run our Python models. Starting with dbt version 1.4, Python files will automatically get materialized as tables even if not explicitly specified.

```yaml
marts:
  +materialized: table
```

#### Create folders and organize files[​](#create-folders-and-organize-files "Direct link to Create folders and organize files") dbt Labs has developed a [project structure guide](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md) that contains a number of recommendations for how to build the folder structure for your project. Do check out that guide if you want to learn more. Right now we are going to create some folders to organize our files: * Sources — This is our Formula 1 dataset and it will be defined in a source properties YAML file. * Staging models — These models have a 1:1 relationship with their source table. * Intermediate — This is where we will be joining some Formula 1 staging models. * Marts models — Here is where we perform our major transformations. It contains these subfolders: * aggregates * core * ml 1. In your file tree, use your cursor and hover over the `models` subdirectory, click the three dots **…** that appear to the right of the folder name, then select **Create Folder**. We're going to add two new folders to the file path, `staging` and `formula1` (in that order), by typing `staging/formula1` into the file path.
[![Create folder](/img/guides/dbt-ecosystem/dbt-python-snowpark/7-folder-structure/1-create-folder.png?v=2 "Create folder")](#)Create folder [![Set file path](/img/guides/dbt-ecosystem/dbt-python-snowpark/7-folder-structure/2-file-path.png?v=2 "Set file path")](#)Set file path * If you click into your `models` directory now, you should see the new `staging` folder nested within `models` and the `formula1` folder nested within `staging`. 2. Create two additional folders the same way as in the last step. Within the `models` subdirectory, create the new directories `marts/core`. 3. We will need to create a few more folders and subfolders using the UI. After you create all the necessary folders, your folder tree should look like this when it's all done: [![File tree of new folders](/img/guides/dbt-ecosystem/dbt-python-snowpark/7-folder-structure/3-tree-of-new-folders.png?v=2 "File tree of new folders")](#)File tree of new folders Remember you can always reference the entire project in [GitHub](https://github.com/dbt-labs/python-snowpark-formula1/tree/python-formula1) to view the complete folder and file structure. #### Create source and staging models[​](#create-source-and-staging-models "Direct link to Create source and staging models") In this section, we are going to create our source and staging models. Sources allow us to create a dependency between our source database object and our staging models, which will help us when we look at data lineage later. Also, if your source changes database or schema, you only have to update it in your `f1_sources.yml` file rather than updating all of the models it might be used in. Staging models are the base of our project, where we bring all the individual components we're going to use to build our more complex and useful models into the project.
Since we want to focus on dbt and Python in this workshop, check out our [sources](https://docs.getdbt.com/docs/build/sources.md) and [staging](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md) docs if you want to learn more (or take our [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) course which covers all of our core functionality). ##### 1. Create sources[​](#1-create-sources "Direct link to 1. Create sources") We're going to be using each of our 8 Formula 1 tables from our `formula1` database under the `raw` schema for our transformations, and we want to create those tables as sources in our project. 1. Create a new file called `f1_sources.yml` with the following file path: `models/staging/formula1/f1_sources.yml`. 2. Then, paste the following code into the file before saving it:

```yaml
version: 2

sources:
  - name: formula1
    description: formula 1 datasets with normalized tables
    database: formula1
    schema: raw
    tables:
      - name: circuits
        description: One record per circuit, which is the specific race course.
        columns:
          - name: circuitid
            data_tests:
              - unique
              - not_null
      - name: constructors
        description: One record per constructor. Constructors are the teams that build their formula 1 cars.
        columns:
          - name: constructorid
            data_tests:
              - unique
              - not_null
      - name: drivers
        description: One record per driver. This table gives details about the driver.
        columns:
          - name: driverid
            data_tests:
              - unique
              - not_null
      - name: lap_times
        description: One row per lap in each race. Lap times started being recorded in this dataset in 1984 and are joined through driver_id.
      - name: pit_stops
        description: One row per pit stop. Pit stops do not have their own id column, the combination of the race_id and driver_id identify the pit stop.
        columns:
          - name: stop
            data_tests:
              - accepted_values:
                  arguments: # available in v1.10.5 and higher. Older versions can set `values` as the top-level property.
                    values: [1,2,3,4,5,6,7,8]
                    quote: false
      - name: races
        description: One race per row. Importantly this table contains the race year to understand trends.
        columns:
          - name: raceid
            data_tests:
              - unique
              - not_null
      - name: results
        columns:
          - name: resultid
            data_tests:
              - unique
              - not_null
        description: One row per result. The main table that we join out for grid and position variables.
      - name: status
        description: One status per row. The status contextualizes whether the race was finished or what issues arose e.g. collisions, engine, etc.
        columns:
          - name: statusid
            data_tests:
              - unique
              - not_null
```

##### 2. Create staging models[​](#2-create-staging-models "Direct link to 2. Create staging models") The next step is to set up the staging models for each of the 8 source tables. Given the one-to-one relationship between staging models and their corresponding source tables, we'll build 8 staging models here. We know it’s a lot and in the future, we will seek to update the workshop to make this step less repetitive and more efficient. This step is also a good representation of the real world of data, where you have multiple hierarchical tables that you will need to join together! 1. Let's go in alphabetical order to easily keep track of all our staging models! Create a new file called `stg_f1_circuits.sql` with this file path `models/staging/formula1/stg_f1_circuits.sql`. Then, paste the following code into the file before saving it:

```sql
with source as (
    select * from {{ source('formula1','circuits') }}
),
renamed as (
    select
        circuitid as circuit_id,
        circuitref as circuit_ref,
        name as circuit_name,
        location,
        country,
        lat as latitude,
        lng as longitude,
        alt as altitude
        -- omit the url
    from source
)
select * from renamed
```

All we're doing here is pulling the source data into the model using the `source` function, renaming some columns, and omitting the column `url` with a commented note since we don’t need it for our analysis. 2.
Create `stg_f1_constructors.sql` with this file path `models/staging/formula1/stg_f1_constructors.sql`. Paste the following code into it before saving the file:

```sql
with source as (
    select * from {{ source('formula1','constructors') }}
),
renamed as (
    select
        constructorid as constructor_id,
        constructorref as constructor_ref,
        name as constructor_name,
        nationality as constructor_nationality
        -- omit the url
    from source
)
select * from renamed
```

We have 6 other staging models to create. We can do this by creating new files, then copying and pasting the code into our `staging` folder. 3. Create `stg_f1_drivers.sql` with this file path `models/staging/formula1/stg_f1_drivers.sql`:

```sql
with source as (
    select * from {{ source('formula1','drivers') }}
),
renamed as (
    select
        driverid as driver_id,
        driverref as driver_ref,
        number as driver_number,
        code as driver_code,
        forename,
        surname,
        dob as date_of_birth,
        nationality as driver_nationality
        -- omit the url
    from source
)
select * from renamed
```

4. Create `stg_f1_lap_times.sql` with this file path `models/staging/formula1/stg_f1_lap_times.sql`:

```sql
with source as (
    select * from {{ source('formula1','lap_times') }}
),
renamed as (
    select
        raceid as race_id,
        driverid as driver_id,
        lap,
        position,
        time as lap_time_formatted,
        milliseconds as lap_time_milliseconds
    from source
)
select * from renamed
```

5. Create `stg_f1_pit_stops.sql` with this file path `models/staging/formula1/stg_f1_pit_stops.sql`:

```sql
with source as (
    select * from {{ source('formula1','pit_stops') }}
),
renamed as (
    select
        raceid as race_id,
        driverid as driver_id,
        stop as stop_number,
        lap,
        time as lap_time_formatted,
        duration as pit_stop_duration_seconds,
        milliseconds as pit_stop_milliseconds
    from source
)
select * from renamed
order by pit_stop_duration_seconds desc
```

6.
Create `stg_f1_races.sql` with this file path `models/staging/formula1/stg_f1_races.sql`:

```sql
with source as (
    select * from {{ source('formula1','races') }}
),
renamed as (
    select
        raceid as race_id,
        year as race_year,
        round as race_round,
        circuitid as circuit_id,
        name as circuit_name,
        date as race_date,
        to_time(time) as race_time,
        -- omit the url
        fp1_date as free_practice_1_date,
        fp1_time as free_practice_1_time,
        fp2_date as free_practice_2_date,
        fp2_time as free_practice_2_time,
        fp3_date as free_practice_3_date,
        fp3_time as free_practice_3_time,
        quali_date as qualifying_date,
        quali_time as qualifying_time,
        sprint_date,
        sprint_time
    from source
)
select * from renamed
```

7. Create `stg_f1_results.sql` with this file path `models/staging/formula1/stg_f1_results.sql`:

```sql
with source as (
    select * from {{ source('formula1','results') }}
),
renamed as (
    select
        resultid as result_id,
        raceid as race_id,
        driverid as driver_id,
        constructorid as constructor_id,
        number as driver_number,
        grid,
        position::int as position,
        positiontext as position_text,
        positionorder as position_order,
        points,
        laps,
        time as results_time_formatted,
        milliseconds as results_milliseconds,
        fastestlap as fastest_lap,
        rank as results_rank,
        fastestlaptime as fastest_lap_time_formatted,
        fastestlapspeed::decimal(6,3) as fastest_lap_speed,
        statusid as status_id
    from source
)
select * from renamed
```

8. Last one! Create `stg_f1_status.sql` with this file path: `models/staging/formula1/stg_f1_status.sql`:

```sql
with source as (
    select * from {{ source('formula1','status') }}
),
renamed as (
    select
        statusid as status_id,
        status
    from source
)
select * from renamed
```

After the source and all the staging models are complete for each of the 8 tables, your staging folder should look like this: [![Staging folder](/img/guides/dbt-ecosystem/dbt-python-snowpark/8-sources-and-staging/1-staging-folder.png?v=2 "Staging folder")](#)Staging folder 9.
It’s a good time to delete our example folder since these two models are extraneous to our formula1 pipeline and `my_first_model` fails a `not_null` test that we won’t spend time investigating. dbt will warn us that this folder will be permanently deleted, and we are okay with that so select **Delete**. [![Delete example folder](/img/guides/dbt-ecosystem/dbt-python-snowpark/8-sources-and-staging/2-delete-example.png?v=2 "Delete example folder")](#)Delete example folder 10. Now that the staging models are built and saved, it's time to create the models in our development schema in Snowflake. To do this we're going to enter into the command line `dbt build` to run all of the models in our project, which includes the 8 new staging models and the existing example models. Your run should complete successfully and you should see green checkmarks next to all of your models in the run results. We built our 8 staging models as views and ran 13 source tests that we configured in the `f1_sources.yml` file with not that much code, pretty cool! [![Successful dbt build in Snowflake](/img/guides/dbt-ecosystem/dbt-python-snowpark/8-sources-and-staging/3-successful-run-in-snowflake.png?v=2 "Successful dbt build in Snowflake")](#)Successful dbt build in Snowflake Let's take a quick look in Snowflake, refresh database objects, open our development schema, and confirm that the new models are there. If you can see them, then we're good to go! [![Confirm models](/img/guides/dbt-ecosystem/dbt-python-snowpark/8-sources-and-staging/4-confirm-models.png?v=2 "Confirm models")](#)Confirm models Before we move onto the next section, be sure to commit your new models to your Git branch. Click **Commit and push** and give your commit a message like `profile, sources, and staging setup` before moving on. #### Transform SQL[​](#transform-sql "Direct link to Transform SQL") Now that we have all our sources and staging models done, it's time to move into where dbt shines — transformation! 
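Before we dive into transformation, it's worth pausing on what those 13 source tests from `dbt build` actually check. Conceptually, `unique` and `not_null` boil down to simple predicates over a column; here is a plain-Python sketch of those checks on toy data (illustrative only; dbt itself compiles each test to SQL that runs in the warehouse):

```python
def not_null(values):
    """Passes when no value in the column is null."""
    return all(v is not None for v in values)

def unique(values):
    """Passes when every non-null value appears exactly once."""
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))

circuit_ids = [1, 2, 3, 4]       # like circuitid in our circuits source
print(not_null(circuit_ids))     # True
print(unique(circuit_ids))       # True
print(unique([1, 1, 2]))         # False: a duplicate key would fail the test
```

A test "passes" in dbt when the compiled query that searches for violating rows returns zero rows, which is the same logic these predicates express.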
We need to: * Create some intermediate tables to join tables that aren’t hierarchical * Create core tables for business intelligence (BI) tool ingestion * Answer our two questions about fastest pit stops and lap time trends in our Formula 1 data by creating aggregate models using Python! ##### Intermediate models[​](#intermediate-models "Direct link to Intermediate models") We need to join lots of reference tables to our results table to create a human-readable dataframe. What does this mean? For example, we don’t want only the numeric `status_id` in our table; we want to be able to read from a row of data that a driver could not finish a race due to engine failure (`status_id=5`). By now, we are pretty good at creating new files in the correct directories so we won’t cover this in detail. All intermediate models should be created in the path `models/intermediate`. 1. Create a new file called `int_lap_times_years.sql`. In this model, we are joining our lap time and race information so we can look at lap times over years. In earlier Formula 1 eras, lap times were not recorded (only final results), so we filter out records where lap times are null.

```sql
with lap_times as (
    select * from {{ ref('stg_f1_lap_times') }}
),
races as (
    select * from {{ ref('stg_f1_races') }}
),
expanded_lap_times_by_year as (
    select
        lap_times.race_id,
        driver_id,
        race_year,
        lap,
        lap_time_milliseconds
    from lap_times
    left join races
        on lap_times.race_id = races.race_id
    where lap_time_milliseconds is not null
)
select * from expanded_lap_times_by_year
```

2. Create a file called `int_pit_stops.sql`. Pit stops have a many-to-one (M:1) relationship with our races. We are creating a feature called `total_pit_stops_per_race` by partitioning over our `race_id` and `driver_id`, while preserving individual-level pit stops for a rolling average in our next section.
```sql
with stg_f1__pit_stops as (
    select * from {{ ref('stg_f1_pit_stops') }}
),
pit_stops_per_race as (
    select
        race_id,
        driver_id,
        stop_number,
        lap,
        lap_time_formatted,
        pit_stop_duration_seconds,
        pit_stop_milliseconds,
        max(stop_number) over (partition by race_id, driver_id) as total_pit_stops_per_race
    from stg_f1__pit_stops
)
select * from pit_stops_per_race
```

3. Create a file called `int_results.sql`. Here we are using 4 of our tables — `races`, `drivers`, `constructors`, and `status` — to give context to our `results` table. We are now able to calculate a new feature `drivers_age_years` by bringing the `date_of_birth` and `race_year` into the same table. We are also creating a column called `dnf_flag` to indicate whether the driver did not finish (dnf) the race, based upon whether their `position` was null.

```sql
with results as (
    select * from {{ ref('stg_f1_results') }}
),
races as (
    select * from {{ ref('stg_f1_races') }}
),
drivers as (
    select * from {{ ref('stg_f1_drivers') }}
),
constructors as (
    select * from {{ ref('stg_f1_constructors') }}
),
status as (
    select * from {{ ref('stg_f1_status') }}
),
int_results as (
    select
        result_id,
        results.race_id,
        race_year,
        race_round,
        circuit_id,
        circuit_name,
        race_date,
        race_time,
        results.driver_id,
        results.driver_number,
        forename ||' '|| surname as driver,
        cast(datediff('year', date_of_birth, race_date) as int) as drivers_age_years,
        driver_nationality,
        results.constructor_id,
        constructor_name,
        constructor_nationality,
        grid,
        position,
        position_text,
        position_order,
        points,
        laps,
        results_time_formatted,
        results_milliseconds,
        fastest_lap,
        results_rank,
        fastest_lap_time_formatted,
        fastest_lap_speed,
        results.status_id,
        status,
        case when position is null then 1 else 0 end as dnf_flag
    from results
    left join races on results.race_id = races.race_id
    left join drivers on results.driver_id = drivers.driver_id
    left join constructors on results.constructor_id = constructors.constructor_id
    left join status on results.status_id = status.status_id
)
select * from int_results
```

4. Create a *Markdown* file `intermediate.md` that we will go over in depth in the Test and Document sections of the [Leverage dbt to generate analytics and ML-ready pipelines with SQL and Python with Snowflake](https://docs.getdbt.com/guides/dbt-python-snowpark.md) guide.

```markdown
# the intent of this .md is to allow for multi-line long form explanations for our intermediate transformations

# below are descriptions
{% docs int_results %} In this query we want to join out other important information about the race results to have a human readable table about results, races, drivers, constructors, and status. We will have 4 left joins onto our results table. {% enddocs %}

{% docs int_pit_stops %} There are many pit stops within one race, aka a M:1 relationship. We want to aggregate this so we can properly join pit stop information without creating a fanout. {% enddocs %}

{% docs int_lap_times_years %} Lap times are done per lap. We need to join them out to the race year to understand yearly lap time trends. {% enddocs %}
```

5. Create a *YAML* file `intermediate.yml` that we will go over in depth during the Test and Document sections of the [Leverage dbt to generate analytics and ML-ready pipelines with SQL and Python with Snowflake](https://docs.getdbt.com/guides/dbt-python-snowpark.md) guide.

```yaml
version: 2

models:
  - name: int_results
    description: '{{ doc("int_results") }}'
  - name: int_pit_stops
    description: '{{ doc("int_pit_stops") }}'
  - name: int_lap_times_years
    description: '{{ doc("int_lap_times_years") }}'
```

That wraps up the intermediate models we need to create our core models! ##### Core models[​](#core-models "Direct link to Core models") 1. Create a file `fct_results.sql`. This is what I like to refer to as the “mega table” — a really large denormalized table with all our context added in at row level for human readability.
Importantly, we have a table `circuits` that is linked through the table `races`. When we joined `races` to `results` in `int_results.sql`, we allowed our tables to make the connection from `circuits` to `results` in `fct_results.sql`. We are only taking information about pit stops at the result level, so our join will not cause a [fanout](https://community.looker.com/technical-tips-tricks-1021/what-is-a-fanout-23327).

```sql
with int_results as (
    select * from {{ ref('int_results') }}
),
int_pit_stops as (
    select
        race_id,
        driver_id,
        max(total_pit_stops_per_race) as total_pit_stops_per_race
    from {{ ref('int_pit_stops') }}
    group by 1, 2
),
circuits as (
    select * from {{ ref('stg_f1_circuits') }}
),
base_results as (
    select
        result_id,
        int_results.race_id,
        race_year,
        race_round,
        int_results.circuit_id,
        int_results.circuit_name,
        circuit_ref,
        location,
        country,
        latitude,
        longitude,
        altitude,
        total_pit_stops_per_race,
        race_date,
        race_time,
        int_results.driver_id,
        driver,
        driver_number,
        drivers_age_years,
        driver_nationality,
        constructor_id,
        constructor_name,
        constructor_nationality,
        grid,
        position,
        position_text,
        position_order,
        points,
        laps,
        results_time_formatted,
        results_milliseconds,
        fastest_lap,
        results_rank,
        fastest_lap_time_formatted,
        fastest_lap_speed,
        status_id,
        status,
        dnf_flag
    from int_results
    left join circuits
        on int_results.circuit_id = circuits.circuit_id
    left join int_pit_stops
        on int_results.driver_id = int_pit_stops.driver_id
        and int_results.race_id = int_pit_stops.race_id
)
select * from base_results
```

2. Create the file `pit_stops_joined.sql`. Our results and pit stops are at different levels of dimensionality (also called grain). Simply put, we have multiple pit stops per result. Since we are interested in understanding information at the pit stop level with information about race year and constructor, we will create a new table `pit_stops_joined.sql` where each row is per pit stop. Our new table tees up our aggregation in Python.
```sql
with base_results as (
    select * from {{ ref('fct_results') }}
),
pit_stops as (
    select * from {{ ref('int_pit_stops') }}
),
pit_stops_joined as (
    select
        base_results.race_id,
        race_year,
        base_results.driver_id,
        constructor_id,
        constructor_name,
        stop_number,
        lap,
        lap_time_formatted,
        pit_stop_duration_seconds,
        pit_stop_milliseconds
    from base_results
    left join pit_stops
        on base_results.race_id = pit_stops.race_id
        and base_results.driver_id = pit_stops.driver_id
)
select * from pit_stops_joined
```

3. In the command line, execute `dbt build` to build out our entire pipeline up to this point. Don’t worry about “overriding” your previous models – dbt workflows are designed to be idempotent, so we can run them again and expect the same results. 4. Let’s talk about our lineage so far. It’s looking good 😎. We’ve shown how SQL can be used to handle data type changes, column renaming, and hierarchical joins really well; all while building out our automated lineage! [![The DAG](/img/guides/dbt-ecosystem/dbt-python-snowpark/9-sql-transformations/1-dag.png?v=2 "The DAG")](#)The DAG 5. Time to **Commit and push** our changes. Give your commit a message like `intermediate and fact models` before moving on. #### Running dbt Python models[​](#running-dbt-python-models "Direct link to Running dbt Python models") Up until now, SQL has been driving the project (car pun intended) for data cleaning and hierarchical joining. Now it’s time for Python to take the wheel (car pun still intended) for the rest of our lab! For more information about running Python models on dbt, check out our [docs](https://docs.getdbt.com/docs/build/python-models.md). To learn more about how dbt Python models work under the hood, check out [Snowpark for Python](https://docs.snowflake.com/en/developer-guide/snowpark/python/index.html), which makes running dbt Python models possible.
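The idempotency point from step 3 is worth making concrete: each `dbt run` rebuilds a model from its upstream sources rather than stacking results on top of the previous run, so applying the pipeline twice gives the same state as applying it once. A toy illustration of the property (not dbt code; the model names are just examples from this project):

```python
def rebuild(models):
    """Rebuilding replaces prior state instead of accumulating on top of it."""
    return sorted(set(models))

once = rebuild(["fct_results", "pit_stops_joined"])
twice = rebuild(rebuild(["fct_results", "pit_stops_joined"]))
print(once == twice)  # True: running the operation again gives the same result
```

This is why "overriding" previous models is nothing to worry about: the warehouse objects simply converge to whatever the current project code defines.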
There are quite a few differences between SQL and Python in terms of the dbt syntax and DDL, so we’ll be breaking our code and model runs down further for our Python models.

##### Pit stop analysis[​](#pit-stop-analysis "Direct link to Pit stop analysis")

First, we want to find out: which constructor had the fastest pit stops in 2021? (A constructor is a Formula 1 team that builds or “constructs” the car.)

1. Create a new file called `fastest_pit_stops_by_constructor.py` in our `aggregates` folder (this is the first time we are using the `.py` extension!).
2. Copy the following code into the file:

```python
import numpy as np
import pandas as pd

def model(dbt, session):
    # dbt configuration
    dbt.config(packages=["pandas", "numpy"])

    # get upstream data
    pit_stops_joined = dbt.ref("pit_stops_joined").to_pandas()

    # provide year so we do not hardcode dates
    year = 2021

    # describe the data
    pit_stops_joined["PIT_STOP_SECONDS"] = pit_stops_joined["PIT_STOP_MILLISECONDS"] / 1000
    fastest_pit_stops = (
        pit_stops_joined[pit_stops_joined["RACE_YEAR"] == year]
        .groupby(by="CONSTRUCTOR_NAME")["PIT_STOP_SECONDS"]
        .describe()
        .sort_values(by="mean")
    )
    fastest_pit_stops.reset_index(inplace=True)
    fastest_pit_stops.columns = fastest_pit_stops.columns.str.upper()

    return fastest_pit_stops.round(2)
```

3. Let’s break down what this code is doing step by step:
   * First, we are importing the Python libraries that we are using. A *library* is a reusable chunk of code that someone else wrote that you may want to include in your programs/projects. We are using `numpy` and `pandas` in this Python model. This is similar to a dbt *package*, but our Python libraries do *not* persist across the entire project.
   * Defining a function called `model` with the parameters `dbt` and `session`. The parameter `dbt` is a class compiled by dbt, which enables you to run your Python code in the context of your dbt project and DAG. The parameter `session` is a class representing your Snowflake connection to the Python backend.
     The `model` function *must return a single DataFrame*. You can see that all of the data transformation happens within the body of the `model` function that the `return` statement is tied to.
   * Then, within the context of our dbt model library, we are passing in a configuration of which packages we need using `dbt.config(packages=["pandas","numpy"])`.
   * Use the `.ref()` function to retrieve the data frame `pit_stops_joined` that we created in our last step using SQL. We cast this to a pandas dataframe (by default it's a Snowpark dataframe).
   * Create a variable named `year` so we aren’t passing a hardcoded value.
   * Generate a new column called `PIT_STOP_SECONDS` by dividing the value of `PIT_STOP_MILLISECONDS` by 1000.
   * Create our final data frame `fastest_pit_stops` that holds the records where year is equal to our year variable (2021 in this case), then group the data frame by `CONSTRUCTOR_NAME` and use `describe()` and `sort_values()` to sort by the mean in ascending order. This makes the first row in the new aggregated data frame the team with the fastest (smallest) average pit stop time over an entire competition year.
   * Finally, it resets the index of the `fastest_pit_stops` data frame. The `reset_index()` method allows you to reset the index back to the default 0, 1, 2, etc. indexes. By default, this method will keep the "old" indexes in a column named "index"; to avoid this, use the drop parameter. Think of this as keeping your data “flat and square” as opposed to “tiered”. If you are new to Python, now might be a good time to [learn about indexes for 5 minutes](https://towardsdatascience.com/the-basics-of-indexing-and-slicing-python-lists-2d12c90a94cf) since it's the foundation of how Python retrieves, slices, and dices data. The `inplace` argument means we overwrite the existing data frame permanently. Not to fear! This is what we want to do to avoid dealing with multi-indexed dataframes!
   * Convert our Python column names to all uppercase using `.upper()`, so Snowflake recognizes them.
   * Finally, we are returning our dataframe with 2 decimal places for all the columns using the `round()` method.

4. Zooming out a bit, what are we doing differently here in Python from our typical SQL code:
   * Method chaining is a technique in which multiple methods are called on an object in a single statement, with each method call modifying the result of the previous one. The methods are called in a chain, with the output of one method being used as the input for the next one. The technique is used to simplify the code and make it more readable by eliminating the need for intermediate variables to store intermediate results.
   * The way you see method chaining in Python is the syntax `.().()`. For example, `.describe().sort_values(by='mean')`, where the `.describe()` method is chained to `.sort_values()`.
   * The `.describe()` method is used to generate various summary statistics of the dataset. It's used on a pandas dataframe. It gives a quick and easy way to get the summary statistics of your dataset without writing multiple lines of code.
   * The `.sort_values()` method is used to sort a pandas dataframe or a series by one or multiple columns. The method sorts the data by the specified column(s) in ascending or descending order. It is the pandas equivalent of `order by` in SQL.

   We won’t go as in depth for our subsequent scripts, but will continue to explain at a high level what new libraries, functions, and methods are doing.

5. Build the model using the UI, which will execute:

```bash
dbt run --select fastest_pit_stops_by_constructor
```

   in the command bar.

   Let’s look at some details of our first Python model to see what our model executed. There are two major differences we can see while running a Python model compared to a SQL model:
   * Our Python model was executed as a stored procedure. Snowflake needs a way to know that it's meant to execute this code in a Python runtime, instead of interpreting it in a SQL runtime.
     We do this by creating a Python stored procedure, called by a SQL command.
   * The `snowflake-snowpark-python` library has been picked up to execute our Python code. Even though this wasn’t explicitly stated, it's picked up by the dbt class object because we need our Snowpark package to run Python!

   Python models take a bit longer to run than SQL models; however, we could always speed this up by using [Snowpark-optimized Warehouses](https://docs.snowflake.com/en/user-guide/warehouses-snowpark-optimized.html) if we wanted to. Our data is sufficiently small, so we won’t worry about creating a separate warehouse for Python versus SQL files today.

[![We can see our python model is run as a stored procedure in our personal development schema](/img/guides/dbt-ecosystem/dbt-python-snowpark/10-python-transformations/1-python-model-details-output.png?v=2 "We can see our python model is run as a stored procedure in our personal development schema")](#)We can see our python model is run as a stored procedure in our personal development schema

   The rest of our **Details** output gives us information about how dbt and Snowpark for Python are working together to define class objects and apply a specific set of methods to run our models.

   So which constructor had the fastest pit stops in 2021? Let’s look at our data to find out!

6. We can't preview Python models directly, so let’s create a new file using the **+** button or the Control-n shortcut to create a new scratchpad.
7. Reference our Python model:

```sql
select * from {{ ref('fastest_pit_stops_by_constructor') }}
```

   and preview the output:

[![Looking at our new python data model we can see that Red Bull had the fastest pit stops!](/img/guides/dbt-ecosystem/dbt-python-snowpark/10-python-transformations/2-fastest-pit-stops-preview.png?v=2 "Looking at our new python data model we can see that Red Bull had the fastest pit stops!")](#)Looking at our new python data model we can see that Red Bull had the fastest pit stops!
Not only did Red Bull have the fastest average pit stops by nearly 40 seconds, they also had the smallest standard deviation, meaning they were both the fastest and most consistent team in pit stops. By using the `.describe()` method we were able to avoid verbose SQL that would have required a line of code per column and repetitive use of the `PERCENTILE_CONT()` function.

Now we want to find the average lap time and its rolling average through the years (is it generally trending up or down?).

8. Create a new file called `lap_times_moving_avg.py` in our `aggregates` folder.
9. Copy the following code into the file:

```python
import pandas as pd

def model(dbt, session):
    # dbt configuration
    dbt.config(packages=["pandas"])

    # get upstream data
    lap_times = dbt.ref("int_lap_times_years").to_pandas()

    # describe the data
    lap_times["LAP_TIME_SECONDS"] = lap_times["LAP_TIME_MILLISECONDS"] / 1000
    lap_time_trends = lap_times.groupby(by="RACE_YEAR")["LAP_TIME_SECONDS"].mean().to_frame()
    lap_time_trends.reset_index(inplace=True)
    lap_time_trends["LAP_MOVING_AVG_5_YEARS"] = lap_time_trends["LAP_TIME_SECONDS"].rolling(5).mean()
    lap_time_trends.columns = lap_time_trends.columns.str.upper()

    return lap_time_trends.round(1)
```

10. Breaking down our code a bit:
    * We’re only using the `pandas` library for this model, casting the upstream data to a pandas data frame with `.to_pandas()`.
    * Generate a new column called `LAP_TIME_SECONDS` by dividing the value of `LAP_TIME_MILLISECONDS` by 1000.
    * Create the final dataframe: get the lap time per year, calculate the mean series, and convert it to a data frame.
    * Reset the index.
    * Calculate the rolling 5-year mean.
    * Round our numeric columns to one decimal place.
11. Now, run this model by using the UI **Run model** or

```bash
dbt run --select lap_times_moving_avg
```

    in the command bar.
12. Once again, preview the output of our data using the same steps as for our `fastest_pit_stops_by_constructor` model.
[![Viewing our lap trends and 5 year rolling trends](/img/guides/dbt-ecosystem/dbt-python-snowpark/10-python-transformations/3-lap-times-trends-preview.png?v=2 "Viewing our lap trends and 5 year rolling trends")](#)Viewing our lap trends and 5 year rolling trends

We can see that lap times were getting consistently faster over time. Then in 2010 we see an increase occur! Using outside subject matter context, we know that significant rule changes were introduced to Formula 1 in 2010 and 2011, causing slower lap times.

13. Now is a good time to checkpoint and commit our work to Git. Click **Commit and push** and give your commit a message like `aggregate python models` before moving on.

##### The dbt model, .source(), .ref() and .config() functions[​](#the-dbt-model-source-ref-and-config-functions "Direct link to The dbt model, .source(), .ref() and .config() functions")

Let’s take a step back before starting machine learning to both review and go more in-depth on the methods that make running dbt Python models possible. If you want to know more beyond this lab’s explanation, read the documentation [here](https://docs.getdbt.com/docs/build/python-models.md?version=1).

* `model(dbt, session)`. For starters, each Python model lives in a `.py` file in your `models/` folder. It defines a function named `model()`, which takes two parameters:
  * `dbt` — A class compiled by dbt Core, unique to each model, that enables you to run your Python code in the context of your dbt project and DAG.
  * `session` — A class representing your data platform’s connection to the Python backend. The session is needed to read in tables as DataFrames and to write DataFrames back to tables. In PySpark, by convention, the SparkSession is named `spark` and is available globally. For consistency across platforms, we always pass it into the `model` function as an explicit argument called `session`.
* The `model()` function must return a single DataFrame.
  On Snowpark (Snowflake), this can be a Snowpark or pandas DataFrame.
* `.source()` and `.ref()` functions. Python models participate fully in dbt's directed acyclic graph (DAG) of transformations. If you want to read directly from a raw source table, use `dbt.source()`. We saw this in our earlier section using SQL with the source function. These functions have the same execution, just different syntax. Use the `dbt.ref()` method within a Python model to read data from other models (SQL or Python). These methods return DataFrames pointing to the upstream source, model, seed, or snapshot.
* `.config()`. Just like SQL models, there are three ways to configure Python models:
  * In your `dbt_project.yml` file
  * In a dedicated `.yml` file, within the `models/` directory
  * Within the model's `.py` file, using the `dbt.config()` method

  Calling the `dbt.config()` method will set configurations for your model within your `.py` file, similar to the `{{ config() }}` macro in `.sql` model files:

```python
def model(dbt, session):
    # setting configuration
    dbt.config(materialized="table")
```

  There's a limit to how complex you can get with the `dbt.config()` method. It accepts only literal values (strings, booleans, and numeric types). Passing another function or a more complex data structure is not possible. The reason is that dbt statically analyzes the arguments to `.config()` while parsing your model, without executing your Python code. If you need to set a more complex configuration, we recommend you define it using the config property in a [properties YAML file](https://docs.getdbt.com/reference/resource-properties/config.md). Learn more about configurations [here](https://docs.getdbt.com/reference/model-configs.md).
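Putting these pieces together, a minimal Python model file has the following shape. This is a sketch rather than a model from this lab: it only runs inside dbt (the `dbt` and `session` objects are injected at runtime), and `my_upstream_model` is a placeholder name:

```python
import pandas as pd

def model(dbt, session):
    # configs passed here must be literal values (strings, booleans, numbers)
    dbt.config(materialized="table", packages=["pandas"])

    # dbt.ref() returns a DataFrame pointing at the upstream model;
    # .to_pandas() casts the default Snowpark DataFrame to pandas
    df = dbt.ref("my_upstream_model").to_pandas()  # placeholder model name

    # ...any transformations go here...

    # the function must return a single (Snowpark or pandas) DataFrame
    return df
```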
#### Prepare for machine learning: cleaning, encoding, and splits[​](#prepare-for-machine-learning-cleaning-encoding-and-splits "Direct link to Prepare for machine learning: cleaning, encoding, and splits")

Now that we’ve gained insights and business intelligence about Formula 1 at a descriptive level, we want to extend our capabilities into prediction. We’re going to take the scenario where we censor the data: we’ll train a model using only earlier data and then apply it to “future” data the model hasn’t seen. In practice, this means we’ll take data from 2010-2019 to train our model and then predict 2020 data.

In this section, we’ll be preparing our data to predict the final race position of a driver. At a high level we’ll be:

* Creating new prediction features and filtering our dataset to active drivers
* Encoding our data (algorithms like numbers) and simplifying our target variable called `position`
* Splitting our dataset into training, testing, and validation sets

##### ML data prep[​](#ml-data-prep "Direct link to ML data prep")

1. To keep our project organized, we’ll need to create two new subfolders in our `ml` directory. Under the `ml` folder, make the subfolders `prep` and `train_predict`.
2. Create a new file under `ml/prep` called `ml_data_prep.py`. Copy the following code into the file and **Save**.
```python
import pandas as pd

def model(dbt, session):
    # dbt configuration
    dbt.config(packages=["pandas"])

    # get upstream data
    fct_results = dbt.ref("fct_results").to_pandas()

    # provide years so we do not hardcode dates in filter command
    start_year = 2010
    end_year = 2020

    # describe the data for a full decade
    data = fct_results.loc[fct_results['RACE_YEAR'].between(start_year, end_year)]

    # convert position to a numeric type
    data['POSITION'] = data['POSITION'].astype(float)

    # we cannot have nulls if we want to use total pit stops
    data['TOTAL_PIT_STOPS_PER_RACE'] = data['TOTAL_PIT_STOPS_PER_RACE'].fillna(0)

    # some of the constructors changed their name over the years, so replace old names with the current name
    mapping = {'Force India': 'Racing Point', 'Sauber': 'Alfa Romeo', 'Lotus F1': 'Renault', 'Toro Rosso': 'AlphaTauri'}
    data['CONSTRUCTOR_NAME'].replace(mapping, inplace=True)

    # create confidence metrics for drivers and constructors
    dnf_by_driver = data.groupby('DRIVER').sum(numeric_only=True)['DNF_FLAG']
    driver_race_entered = data.groupby('DRIVER').count()['DNF_FLAG']
    driver_dnf_ratio = (dnf_by_driver / driver_race_entered)
    driver_confidence = 1 - driver_dnf_ratio
    driver_confidence_dict = dict(zip(driver_confidence.index, driver_confidence))

    dnf_by_constructor = data.groupby('CONSTRUCTOR_NAME').sum(numeric_only=True)['DNF_FLAG']
    constructor_race_entered = data.groupby('CONSTRUCTOR_NAME').count()['DNF_FLAG']
    constructor_dnf_ratio = (dnf_by_constructor / constructor_race_entered)
    constructor_relaiblity = 1 - constructor_dnf_ratio
    constructor_relaiblity_dict = dict(zip(constructor_relaiblity.index, constructor_relaiblity))

    data['DRIVER_CONFIDENCE'] = data['DRIVER'].apply(lambda x: driver_confidence_dict[x])
    data['CONSTRUCTOR_RELAIBLITY'] = data['CONSTRUCTOR_NAME'].apply(lambda x: constructor_relaiblity_dict[x])

    # removing retired drivers and constructors
    active_constructors = ['Renault', 'Williams', 'McLaren', 'Ferrari', 'Mercedes',
                           'AlphaTauri', 'Racing Point', 'Alfa Romeo', 'Red Bull',
                           'Haas F1 Team']
    active_drivers = ['Daniel Ricciardo', 'Kevin Magnussen', 'Carlos Sainz',
                      'Valtteri Bottas', 'Lance Stroll', 'George Russell',
                      'Lando Norris', 'Sebastian Vettel', 'Kimi Räikkönen',
                      'Charles Leclerc', 'Lewis Hamilton', 'Daniil Kvyat',
                      'Max Verstappen', 'Pierre Gasly', 'Alexander Albon',
                      'Sergio Pérez', 'Esteban Ocon', 'Antonio Giovinazzi',
                      'Romain Grosjean', 'Nicholas Latifi']

    # create flags for active drivers and constructors so we can filter downstream
    data['ACTIVE_DRIVER'] = data['DRIVER'].apply(lambda x: int(x in active_drivers))
    data['ACTIVE_CONSTRUCTOR'] = data['CONSTRUCTOR_NAME'].apply(lambda x: int(x in active_constructors))

    return data
```

3. As usual, let’s break down what we are doing in this Python model:
   * We’re first referencing our upstream `fct_results` table and casting it to a pandas dataframe.
   * Filtering on years 2010-2020, since we’ll need to clean all the data we are using for prediction (both training and testing).
   * Filling in empty data for `total_pit_stops` and making a mapping of active constructors and drivers to avoid erroneous predictions.
     * ⚠️ You might be wondering why we didn’t do this upstream in our `fct_results` table! The reason is that we want our machine learning cleanup to reflect the year 2020 for our predictions and give us an up-to-date team name. However, for business intelligence purposes we can keep the historical data at that point in time. Instead of thinking of one table as “one source of truth”, we are creating different datasets fit for purpose: one for historical descriptions and reporting, and another for relevant predictions.
   * Create new confidence features for drivers and constructors.
   * Generate flags for the constructors and drivers that were active in 2020.
4. Execute the following in the command bar:

```bash
dbt run --select ml_data_prep
```

5. There are more aspects we could consider for this project, such as normalizing the driver confidence by the number of races entered.
Including this would help account for a driver’s history and consider whether they are a new or long-time driver. We’re going to keep it simple for now, but these are some of the ways we can expand and improve our machine learning dbt projects.

Breaking down our machine learning prep model:

* Lambda functions — We use some lambda functions to transform our data without having to create a fully-fledged function using the `def` notation. So what exactly are lambda functions?
  * In Python, a lambda function is a small, anonymous function defined using the keyword `lambda`. Lambda functions are used to perform a quick operation, such as a mathematical calculation or a transformation on a list of elements. They are often used in conjunction with higher-order functions, such as `apply`, `map`, `filter`, and `reduce`.
* `.apply()` method — We used `.apply()` to apply our lambda expressions to columns, and we did this multiple times in our code. Let’s explain apply a little more:
  * The `.apply()` function in the pandas library is used to apply a function along an axis of a DataFrame, or element-wise to a Series. In our case, the function we applied was our lambda function!
  * When called on a DataFrame, `.apply()` takes an `axis` argument: 0 applies the function to each column, and 1 applies it to each row. In our code we call `.apply()` on single columns (Series), so the lambda function is applied to each *element* of the column.

6.
Let’s look at the preview of our clean dataframe after running our `ml_data_prep` model:

[![What our clean dataframe fit for machine learning looks like](/img/guides/dbt-ecosystem/dbt-python-snowpark/11-machine-learning-prep/1-completed-ml-data-prep.png?v=2 "What our clean dataframe fit for machine learning looks like")](#)What our clean dataframe fit for machine learning looks like

##### Covariate encoding[​](#covariate-encoding "Direct link to Covariate encoding")

In this next part, we’ll be performing covariate encoding. Breaking down this phrase a bit, a *covariate* is a variable that is relevant to the outcome of a study or experiment, and *encoding* refers to the process of converting data (such as text or categorical variables) into a numerical format that can be used as input for a model. This is necessary because most machine learning algorithms can only work with numerical data. Algorithms don’t speak languages, have eyes to see images, etc., so we encode our data into numbers so algorithms can perform tasks using calculations they otherwise couldn’t. 🧠 We’ll think about this as: “algorithms like numbers”.

1. Create a new file under `ml/prep` called `covariate_encoding.py`, copy the code below, and save.
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.linear_model import LogisticRegression

def model(dbt, session):
    # dbt configuration
    dbt.config(packages=["pandas", "numpy", "scikit-learn"])

    # get upstream data
    data = dbt.ref("ml_data_prep").to_pandas()

    # list out covariates we want to use in addition to the outcome variable we are modeling - position
    covariates = data[['RACE_YEAR', 'CIRCUIT_NAME', 'GRID', 'CONSTRUCTOR_NAME', 'DRIVER',
                       'DRIVERS_AGE_YEARS', 'DRIVER_CONFIDENCE', 'CONSTRUCTOR_RELAIBLITY',
                       'TOTAL_PIT_STOPS_PER_RACE', 'ACTIVE_DRIVER', 'ACTIVE_CONSTRUCTOR', 'POSITION']]

    # filter covariates on active drivers and constructors
    # use fil_cov as short for "filtered_covariates"
    fil_cov = covariates[(covariates['ACTIVE_DRIVER'] == 1) & (covariates['ACTIVE_CONSTRUCTOR'] == 1)]

    # encode categorical variables using LabelEncoder
    # TODO: we'll update this to one-hot encoding in the future for non-ordinal variables!
    le = LabelEncoder()
    fil_cov['CIRCUIT_NAME'] = le.fit_transform(fil_cov['CIRCUIT_NAME'])
    fil_cov['CONSTRUCTOR_NAME'] = le.fit_transform(fil_cov['CONSTRUCTOR_NAME'])
    fil_cov['DRIVER'] = le.fit_transform(fil_cov['DRIVER'])
    fil_cov['TOTAL_PIT_STOPS_PER_RACE'] = le.fit_transform(fil_cov['TOTAL_PIT_STOPS_PER_RACE'])

    # simplify target variable "position" to represent 3 meaningful categories in Formula 1:
    # 1. podium position 2. points for team 3. nothing - no podium or points!
    def position_index(x):
        if x < 4:
            return 1
        if x > 10:
            return 3
        else:
            return 2

    # we are dropping the columns that we filtered on in addition to our training variable
    encoded_data = fil_cov.drop(['ACTIVE_DRIVER', 'ACTIVE_CONSTRUCTOR'], axis=1)
    encoded_data['POSITION_LABEL'] = encoded_data['POSITION'].apply(lambda x: position_index(x))
    encoded_data_grouped_target = encoded_data.drop(['POSITION'], axis=1)

    return encoded_data_grouped_target
```

2. Execute the following in the command bar:

```bash
dbt run --select covariate_encoding
```

3.
In this code, we are using a ton of functions from libraries! This is really cool, because we can utilize code other people have developed and bring it into our project simply by using an `import` statement. [Scikit-learn](https://scikit-learn.org/stable/), “sklearn” for short, is an extremely popular data science library. Sklearn contains a wide range of machine learning techniques, including supervised and unsupervised learning algorithms, feature scaling and imputation, as well as tools for model evaluation and selection. We’ll be using sklearn both for preparing our covariates and for creating models (our next section).

4. Our dataset is pretty small, so we are good using pandas and `sklearn`. If you have larger data in mind for your own project, consider `dask` or `category_encoders`.
5. Breaking it down a bit more:
   * We’re selecting a subset of variables that will be used as predictors for a driver’s position.
   * Filter the dataset to only include rows using the active driver and constructor flags we created in the last step.
   * The next step is to use the `LabelEncoder` from scikit-learn to convert the categorical variables `CIRCUIT_NAME`, `CONSTRUCTOR_NAME`, `DRIVER`, and `TOTAL_PIT_STOPS_PER_RACE` into numerical values.
   * Create a new variable called `POSITION_LABEL`, which is derived from our position variable.
     * 💭 Why are we changing our position variable? There are 20 total positions in Formula 1, and we are grouping them together to simplify the classification and improve performance. We also want to demonstrate that you can create a new function within your dbt model!
     * Our new `position_label` variable has meaning. In Formula 1, if you finish in the:
       * Top 3, you get a “podium” position
       * Top 10, you gain points that add to your overall season total
       * Below top 10, you get no points!
     * We map our original `position` variable to `position_label`, with the categories above corresponding to 1, 2, and 3 respectively.
   * Drop the active driver and constructor flags, since they were filter criteria, and additionally drop our original position variable.

##### Splitting into training and testing datasets[​](#splitting-into-training-and-testing-datasets "Direct link to Splitting into training and testing datasets")

Now that we’ve cleaned and encoded our data, we are going to further split it by time. In this step, we will create dataframes to use for training and prediction. We’ll be creating two dataframes: 1) one using data from 2010-2019 for training, and 2) one using data from 2020 for new prediction inferences. We’ll create variables called `start_year` and `end_year` so we aren’t filtering on hardcoded values (and can more easily swap them out in the future if we want to retrain our model on different timeframes).

1. Create a file called `train_test_dataset.py`, then copy and save the following code:

```python
import pandas as pd

def model(dbt, session):
    # dbt configuration
    dbt.config(packages=["pandas"], tags="train")

    # get upstream data
    encoding = dbt.ref("covariate_encoding").to_pandas()

    # provide years so we do not hardcode dates in filter command
    start_year = 2010
    end_year = 2019

    # describe the data for a full decade
    train_test_dataset = encoding.loc[encoding['RACE_YEAR'].between(start_year, end_year)]

    return train_test_dataset
```

2. Create a file called `hold_out_dataset_for_prediction.py`, then copy and save the following code. Now we’ll have a dataset with only the year 2020 that we’ll keep as a hold-out set, which we are going to use similarly to a deployment use case.

```python
import pandas as pd

def model(dbt, session):
    # dbt configuration
    dbt.config(packages=["pandas"], tags="predict")

    # get upstream data
    encoding = dbt.ref("covariate_encoding").to_pandas()

    # variable for year instead of hardcoding it
    year = 2020

    # filter the data based on the specified year
    hold_out_dataset = encoding.loc[encoding['RACE_YEAR'] == year]

    return hold_out_dataset
```

3.
Execute the following in the command bar:

```bash
dbt run --select train_test_dataset hold_out_dataset_for_prediction
```

To run our temporal data split models, we can use this syntax in the command line to run them both at once. Make sure you use a *space* in the [selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md) between the model names to indicate you want to run both!

4. **Commit and push** our changes to keep saving our work as we go, using a commit message like `ml data prep and splits` before moving on.

👏 Now that we’ve finished our machine learning prep work we can move onto the fun part — training and prediction!

#### Training a model to predict in machine learning[​](#training-a-model-to-predict-in-machine-learning "Direct link to Training a model to predict in machine learning")

We’re ready to start training a model to predict the driver’s position. Now is a good time to pause, take a step back, and note that usually in ML projects you’ll try multiple algorithms during development and use an evaluation method such as cross validation to determine which algorithm to use. You can definitely do this in your dbt project, but for this lab we’ve decided on using a logistic regression to predict position (we actually tried some other algorithms using cross validation outside of this lab, such as k-nearest neighbors and a support vector classifier, but they didn’t perform as well as the logistic regression, and a decision tree overfit).

There are 3 areas to break down as we go, since we are working at the intersection of all three within one model file:

1. Machine Learning
2. Snowflake and Snowpark
3. dbt Python models

If you haven’t seen code like this before or used joblib files to save machine learning models, we’ll be going over them at a high level, and you can explore the links for more technical depth along the way!
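The algorithm comparison described above (candidate models judged by cross validation) can be sketched outside of dbt with scikit-learn's `cross_val_score`. The dataset here is synthetic and the scores are illustrative, not the lab's actual results:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# synthetic stand-in for the encoded F1 dataset
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# compare candidate algorithms with 5-fold cross validation
results = {}
for name, estimator in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("k-nearest neighbors", KNeighborsClassifier()),
]:
    scores = cross_val_score(estimator, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Whichever estimator scores best on held-out folds is the one you would then train on the full training set, which is what the lab does with logistic regression below.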
Because Snowflake and dbt have abstracted away a lot of the nitty-gritty about serialization and storing our model object to be called again, we won’t go into too much detail here. There’s *a lot* going on here, so take it at your own pace!

##### Training and saving a machine learning model[​](#training-and-saving-a-machine-learning-model "Direct link to Training and saving a machine learning model")

1. Project organization remains key, so let’s make a new subfolder called `train_predict` under the `ml` folder.
2. Now create a new file called `train_test_position.py` and copy and save the following code:

```python
import io
import logging

import joblib
import pandas as pd
import snowflake.snowpark.functions as F
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

logger = logging.getLogger("mylog")

def save_file(session, model, path, dest_filename):
    input_stream = io.BytesIO()
    joblib.dump(model, input_stream)
    session._conn.upload_stream(input_stream, path, dest_filename)
    return "successfully created file: " + path

def model(dbt, session):
    dbt.config(
        packages=['numpy', 'scikit-learn', 'pandas', 'joblib', 'cachetools'],
        materialized="table",
        tags="train"
    )
    # create a stage in Snowflake to save our model file
    session.sql('create or replace stage MODELSTAGE').collect()

    # session._use_scoped_temp_objects = False
    version = "1.0"
    logger.info('Model training version: ' + version)

    # read in our training and testing upstream dataset
    test_train_df = dbt.ref("train_test_dataset")

    # cast snowpark df to pandas df
    test_train_pd_df = test_train_df.to_pandas()
    target_col = "POSITION_LABEL"

    # split out covariate predictors, x, from our target column position_label, y
    split_X = test_train_pd_df.drop([target_col], axis=1)
    split_y = test_train_pd_df[target_col]

    # split out our training and test data into proportions
    X_train, X_test, y_train, y_test = train_test_split(split_X, split_y, train_size=0.7, random_state=42)
    train = [X_train, y_train]
    test = [X_test, y_test]

    # now we are only training our one model to deploy
    # we are keeping the focus on the workflows and not algorithms for this lab!
    model = LogisticRegression()

    # fit the preprocessing pipeline and the model together
    model.fit(X_train, y_train)
    y_pred = model.predict_proba(X_test)[:, 1]
    predictions = [round(value) for value in y_pred]
    balanced_accuracy = balanced_accuracy_score(y_test, predictions)

    # save the model to a stage
    save_file(session, model, "@MODELSTAGE/driver_position_" + version, "driver_position_" + version + ".joblib")
    logger.info('Model artifact:' + "@MODELSTAGE/driver_position_" + version + ".joblib")

    # take our pandas training and testing dataframes and put them back into snowpark dataframes
    snowpark_train_df = session.write_pandas(pd.concat(train, axis=1, join='inner'), "train_table", auto_create_table=True, create_temp_table=True)
    snowpark_test_df = session.write_pandas(pd.concat(test, axis=1, join='inner'), "test_table", auto_create_table=True, create_temp_table=True)

    # union our training and testing data together and add a column indicating train vs test rows
    return snowpark_train_df.with_column("DATASET_TYPE", F.lit("train")).union(
        snowpark_test_df.with_column("DATASET_TYPE", F.lit("test"))
    )
```

3. Execute the following in the command bar:

```bash
dbt run --select train_test_position
```

4. Breaking down our Python script here:
   * We’re importing some helpful libraries.
   * Defining a function called `save_file()` that takes four parameters: `session`, `model`, `path` and `dest_filename`, and saves our logistic regression model file.
     * `session` — an object representing a connection to Snowflake.
     * `model` — an object that needs to be saved.
In this case, it's a scikit-learn Python object that can be serialized with joblib.
     * `path` — a string representing the directory or bucket location where the file should be saved.
     * `dest_filename` — a string representing the desired name of the file.
   * Creating our dbt model:
     * Within this model we are creating a stage called `MODELSTAGE` to place our logistic regression `joblib` model file. This is really important since we need a place to keep our model so we can reuse it, and we want to ensure it's there. When using Snowpark commands, it's common to see the `.collect()` method used to ensure the action is performed. Think of the session as our “start” and collect as our “end” when [working with Snowpark](https://docs.snowflake.com/en/developer-guide/snowpark/python/working-with-dataframes.html) (you can use ending methods other than collect).
     * Using `.ref()` to connect to our `train_test_dataset` model.
   * Now we see the machine learning part of our analysis:
     * Create new dataframes separating our prediction features from our target variable `position_label`.
     * Split our dataset into 70% training (and 30% testing) with `train_size=0.7` and a `random_state` specified to have repeatable results.
     * Specify that our model is a logistic regression.
     * Fit our model. In a logistic regression, this means finding the coefficients that give the least classification error.
     * Round our predictions to the nearest integer, since logistic regression creates a probability between 0 and 1 for each class, and calculate a balanced accuracy to account for imbalances in the target variable.
   * Right now our model is only in memory, so we need to use our nifty function `save_file` to save our model file to our Snowflake stage. We save our model as a joblib file so Snowpark can easily call this model object back to create predictions. We really don’t need to know much else as a data practitioner unless we want to. It’s worth noting that joblib files aren’t able to be queried directly by SQL. To do that, we would need to transform the joblib file to a SQL-queryable format such as JSON or CSV (out of scope for this workshop).
   * Finally, we want to return our dataframe, but with a new column indicating which rows were used for training and which for testing.

5. Viewing our output of this model:

[![Preview which rows of our model were used for training and testing](/img/guides/dbt-ecosystem/dbt-python-snowpark/12-machine-learning-training-prediction/1-preview-train-test-position.png?v=2 "Preview which rows of our model were used for training and testing")](#)Preview which rows of our model were used for training and testing

6. Let’s pop back over to Snowflake and check that our logistic regression model has been stored in our `MODELSTAGE` using the command:

```sql
list @modelstage
```

[![List the objects in our Snowflake stage to check for our logistic regression to predict driver position](/img/guides/dbt-ecosystem/dbt-python-snowpark/12-machine-learning-training-prediction/2-list-snowflake-stage.png?v=2 "List the objects in our Snowflake stage to check for our logistic regression to predict driver position")](#)List the objects in our Snowflake stage to check for our logistic regression to predict driver position

7. To investigate the commands run as part of the `train_test_position` script, navigate to the Snowflake query history under **Activity > Query History**. We can view the portions of the query that we wrote, such as `create or replace stage MODELSTAGE`, but we also see additional queries that Snowflake uses to interpret Python code.
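The stage upload in `save_file` relies on the model being serialized to an in-memory byte stream before it is handed to Snowflake. A minimal sketch of that round trip, using the standard library's `pickle` (joblib exposes the same `dump`/`load` interface; the coefficient dict here is a stand-in for the real scikit-learn model object):

```python
import io
import pickle

# Hypothetical stand-in for the trained scikit-learn model
model = {"coef": [0.4, -1.2], "intercept": 0.1}

# Serialize into an in-memory stream, as save_file() does with joblib.dump
input_stream = io.BytesIO()
pickle.dump(model, input_stream)

# In the dbt model this stream is passed to session._conn.upload_stream();
# downloading and deserializing it later recovers an identical object
input_stream.seek(0)
restored = pickle.load(input_stream)
assert restored == model
```

This is exactly why the stage matters: the byte stream only lives for the duration of the run, while the staged `.joblib` file persists for the prediction model to download later.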
[![View Snowflake query history to see how python models are run under the hood](/img/guides/dbt-ecosystem/dbt-python-snowpark/12-machine-learning-training-prediction/3-view-snowflake-query-history.png?v=2 "View Snowflake query history to see how python models are run under the hood")](#)View Snowflake query history to see how python models are run under the hood

##### Predicting on new data[​](#predicting-on-new-data "Direct link to Predicting on new data")

1. Create a new file called `predict_position.py` and copy and save the following code:

```python
import logging
import os

import joblib
import pandas as pd
from snowflake.snowpark import types as T

DB_STAGE = 'MODELSTAGE'
version = '1.0'

# The name of the model file
model_file_path = 'driver_position_'+version
model_file_packaged = 'driver_position_'+version+'.joblib'

# This is a local directory, used for storing the various artifacts locally
LOCAL_TEMP_DIR = '/tmp/driver_position'
DOWNLOAD_DIR = os.path.join(LOCAL_TEMP_DIR, 'download')
TARGET_MODEL_DIR_PATH = os.path.join(LOCAL_TEMP_DIR, 'ml_model')
TARGET_LIB_PATH = os.path.join(LOCAL_TEMP_DIR, 'lib')

# The feature columns that were used during model training
# and that will be used during prediction
FEATURE_COLS = [
    "RACE_YEAR",
    "CIRCUIT_NAME",
    "GRID",
    "CONSTRUCTOR_NAME",
    "DRIVER",
    "DRIVERS_AGE_YEARS",
    "DRIVER_CONFIDENCE",
    "CONSTRUCTOR_RELAIBLITY",
    "TOTAL_PIT_STOPS_PER_RACE",
]

def register_udf_for_prediction(p_predictor, p_session, p_dbt):
    # The prediction udf
    def predict_position(p_df: T.PandasDataFrame[int, int, int, int, int, int, int, int, int]) -> T.PandasSeries[int]:
        # Snowpark currently does not set the column names in the input dataframe.
        # The default col names are like 0,1,2,... Hence we need to reset the column
        # names to the features that we initially used for training.
        p_df.columns = [*FEATURE_COLS]

        # Perform prediction; this returns an array object
        pred_array = p_predictor.predict(p_df)
        # Convert to series
        df_predicted = pd.Series(pred_array)
        return df_predicted

    # The list of packages that will be used by the UDF
    udf_packages = p_dbt.config.get('packages')

    predict_position_udf = p_session.udf.register(
        predict_position,
        name='predict_position',
        packages=udf_packages
    )
    return predict_position_udf

def download_models_and_libs_from_stage(p_session):
    p_session.file.get(f'@{DB_STAGE}/{model_file_path}/{model_file_packaged}', DOWNLOAD_DIR)

def load_model(p_session):
    # Load the model and initialize the predictor
    model_fl_path = os.path.join(DOWNLOAD_DIR, model_file_packaged)
    predictor = joblib.load(model_fl_path)
    return predictor

# -------------------------------
def model(dbt, session):
    dbt.config(
        packages = ['snowflake-snowpark-python', 'scipy', 'scikit-learn', 'pandas', 'numpy'],
        materialized = "table",
        tags = "predict"
    )
    session._use_scoped_temp_objects = False
    download_models_and_libs_from_stage(session)
    predictor = load_model(session)
    predict_position_udf = register_udf_for_prediction(predictor, session, dbt)

    # Retrieve the data, and perform the prediction
    hold_out_df = (dbt.ref("hold_out_dataset_for_prediction")
                      .select(*FEATURE_COLS))

    # Perform prediction.
    new_predictions_df = hold_out_df.withColumn("position_predicted",
                                                predict_position_udf(*FEATURE_COLS))

    return new_predictions_df
```

2. Execute the following in the command bar:

```bash
dbt run --select predict_position
```

3. **Commit and push** our changes to keep saving our work as we go, using the commit message `logistic regression model training and application`, before moving on.

4. At a high level in this script, we are:
   * Retrieving our staged logistic regression model
   * Loading the model in
   * Placing the model within a user defined function (UDF) to call inline predictions on our driver’s position

5. At a more detailed level:
   * Import our libraries.
* Create variables to reference back to the `MODELSTAGE` we just created and stored our model to.
   * The temporary file paths we created might look intimidating, but all we’re doing here is programmatically taking an initial file path and adding to it to create the following directories:
     * LOCAL\_TEMP\_DIR ➡️ /tmp/driver\_position
     * DOWNLOAD\_DIR ➡️ /tmp/driver\_position/download
     * TARGET\_MODEL\_DIR\_PATH ➡️ /tmp/driver\_position/ml\_model
     * TARGET\_LIB\_PATH ➡️ /tmp/driver\_position/lib
   * Provide a list of the feature columns that we used for model training and that will now be used on new data for prediction.
   * Next, we are creating our main function `register_udf_for_prediction(p_predictor, p_session, p_dbt)`. This function is used to register a user-defined function (UDF) that performs the machine learning prediction. It takes three parameters: `p_predictor` is an instance of the machine learning model, `p_session` is an instance of the Snowflake session, and `p_dbt` is an instance of the dbt library. The function creates a UDF named `predict_position`, which takes a pandas dataframe with the input features and returns a pandas series with the predictions.
   * ⚠️ Pay close attention to the whitespace here. We are using a function within a function for this script.
   * We have two simple functions that programmatically build our file paths: `download_models_and_libs_from_stage` first gets our stored model out of our `MODELSTAGE` and downloads it into the session, and `load_model` then loads the model's contents (its parameters) to use for prediction.
   * Take the model we loaded in, call it `predictor`, and wrap it in a UDF.
   * Return our dataframe with both the features used to predict and the new label.

🧠 Another way to read this script is from the bottom up. This can help us progressively see what is going into our final dbt model and work backwards to see how the other functions are being referenced.

6.
Let’s take a look at our predicted position alongside our feature variables. Open a new scratchpad and use the following query. I chose to order by the prediction of who would obtain a podium position:

```sql
select *
from {{ ref('predict_position') }}
order by position_predicted
```

7. Now that we can see predictions in our final dataset, we are ready to move on to testing!

#### Test your data models[​](#test-your-data-models "Direct link to Test your data models")

We have now completed building all the models for today’s lab, but how do we know if they meet our assertions? Put another way, how do we know whether the quality of our data models is any good? This brings us to testing!

We test data models for two main reasons:

* Ensure that our source data is clean on ingestion before we start data modeling/transformation (aka avoid the garbage in, garbage out problem).
* Make sure we don’t introduce bugs in the transformation code we wrote (stop ourselves from creating bad joins/fanouts).

Testing in dbt comes in two flavors: [generic](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests) and [singular](https://docs.getdbt.com/docs/build/data-tests.md#singular-data-tests). You define them in a test block (similar to a macro) and once defined, you can reference them by name in your `.yml` files (applying them to models, columns, sources, snapshots, and seeds).

You might be wondering: *what about testing Python models?* Since the output of our Python models are tables, we can test SQL and Python models the same way! We don’t have to worry about any syntax differences when testing SQL versus Python data models. This means we use `.yml` and `.sql` files to test our entities (tables, views, etc.).

Under the hood, dbt runs a SQL query against our tables to see if they meet our assertions. If no rows are returned, dbt will surface a passed test.
Conversely, if a test results in returned rows, it will fail or warn depending on the configuration (more on that later).

##### Generic tests[​](#generic-tests "Direct link to Generic tests")

1. To implement the generic out-of-the-box tests dbt comes with, we can use YAML files to specify information about our models. To add generic tests to our aggregates models, create a file called `aggregates.yml`, copy the code block below into the file, and save.

[![The aggregates.yml file in our file tree](/img/guides/dbt-ecosystem/dbt-python-snowpark/13-testing/1-generic-testing-file-tree.png?v=2 "The aggregates.yml file in our file tree")](#)The aggregates.yml file in our file tree

```yaml
models:
  - name: fastest_pit_stops_by_constructor
    description: Use the Python .describe() method to retrieve summary statistics table about pit stops by constructor. Sort by average stop time ascending so the first row returns the fastest constructor.
    columns:
      - name: constructor_name
        description: team that makes the car
        data_tests:
          - unique
  - name: lap_times_moving_avg
    description: Use the Python .rolling() method to calculate the 5 year rolling average of pit stop times alongside the average for each year.
    columns:
      - name: race_year
        description: year of the race
        data_tests:
          - relationships:
              arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties.
                to: ref('int_lap_times_years')
                field: race_year
```

2. Let’s unpack the code we have here. We have both of our aggregates models with the model name (so we know the object we are referencing) and the description of the model that we’ll populate in our documentation. At the column level (a level below our model), we are providing the column name followed by our tests. We want to ensure our `constructor_name` is unique, since we used a pandas `groupby` on `constructor_name` in the model `fastest_pit_stops_by_constructor`.
Next, we want to ensure that `race_year` in our subsequent `lap_times_moving_avg` model has referential integrity back to the `int_lap_times_years` model we selected it from.

3. Finally, if we want to see how tests were deployed on sources and SQL models, we can look at other files in our project, such as the `f1_sources.yml` we created in our Sources and staging section.

##### Using macros for testing[​](#using-macros-for-testing "Direct link to Using macros for testing")

1. Under your `macros` folder, create a new file and name it `test_all_values_gte_zero.sql`. Copy the code block below and save the file. For clarity, “gte” is an abbreviation for greater than or equal to.

[![macro file for reusable testing code](/img/guides/dbt-ecosystem/dbt-python-snowpark/13-testing/2-macro-testing.png?v=2 "macro file for reusable testing code")](#)macro file for reusable testing code

```sql
{% macro test_all_values_gte_zero(table, column) %}

select * from {{ ref(table) }} where {{ column }} < 0

{% endmacro %}
```

2. Macros in Jinja are pieces of code that can be reused multiple times in our SQL models — they are analogous to "functions" in other programming languages, and are extremely useful if you find yourself repeating code across multiple models.

3. We use `{% macro %}` to indicate the start of the macro and `{% endmacro %}` for the end. The text after the start of the macro block is the name we are giving the macro so we can call it later. In this case, our macro is called `test_all_values_gte_zero`. Macros take in *arguments* to pass through, in this case the `table` and the `column`. In the body of the macro, we see a SQL statement that uses the `ref` function to dynamically select the table and then the column. You can always invoke macros on their own, without attaching them to a model, by using `dbt run-operation`. You can learn more [here](https://docs.getdbt.com/reference/commands/run-operation.md).

4. Great, now we want to reference this macro as a test!
Let’s create a new test file called `macro_pit_stops_mean_is_positive.sql` in our `tests` folder.

[![creating a test on our pit stops model referencing the macro](/img/guides/dbt-ecosystem/dbt-python-snowpark/13-testing/3-gte-macro-applied-to-pit-stops.png?v=2 "creating a test on our pit stops model referencing the macro")](#)creating a test on our pit stops model referencing the macro

5. Copy the following code into the file and save:

```sql
{{
  config(
    enabled=true,
    severity='warn',
    tags = ['bi']
  )
}}

{{ test_all_values_gte_zero('fastest_pit_stops_by_constructor', 'mean') }}
```

6. In our testing file, we are applying some configurations to the test, including `enabled`, which is an optional configuration for disabling models, seeds, snapshots, and tests. Our severity is set to `warn` instead of `error`, which means our pipeline will still continue to run. We have tagged our test with `bi` since we are applying this test to one of our bi models. Then, in our final line, we are calling the `test_all_values_gte_zero` macro that takes in our table and column arguments, inputting our table `'fastest_pit_stops_by_constructor'` and the column `'mean'`.

##### Custom singular tests to validate Python models[​](#custom-singular-tests-to-validate-python-models "Direct link to Custom singular tests to validate Python models")

The simplest way to define a test is by writing the exact SQL that will return failing records. We call these "singular" tests, because they're one-off assertions usable for a single purpose. These tests are defined in `.sql` files, typically in your `tests` directory (as defined by your test-paths config). You can use Jinja in SQL models (including ref and source) in the test definition, just like you can when creating models. Each `.sql` file contains one select statement, and it defines one test.
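dbt's pass/fail logic for these tests can be sketched in a few lines of plain Python: the test query returns its failing rows, and the configured severity decides whether any returned rows produce a warning or an error. The function below is a hypothetical illustration of that behavior, not dbt internals:

```python
def evaluate_test(failing_rows, severity="error"):
    """Mimic dbt's test semantics: zero failing rows is a pass; otherwise
    the configured severity ('warn' or 'error') decides the status."""
    if len(failing_rows) == 0:
        return "pass"
    return "warn" if severity == "warn" else "error"

# A query like "select * from ... where mean < 0" returned no rows: test passes
print(evaluate_test([]))  # pass
# Two negative means surfaced, but severity='warn' keeps the pipeline running
print(evaluate_test([(-1.2,), (-0.4,)], severity="warn"))  # warn
```

This is why the `severity='warn'` config above matters: the same failing rows that would halt a job under `error` only get logged under `warn`.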
Let’s add a custom test that asserts that the moving average of the lap time over the last 5 years is greater than zero (it’s impossible to have a time less than 0!). If that's not the case, it's safe to assume the data has been corrupted.

1. Create a file `lap_times_moving_avg_assert_positive_or_null.sql` under the `tests` folder.

[![custom singular test for testing lap times are positive values](/img/guides/dbt-ecosystem/dbt-python-snowpark/13-testing/4-custom-singular-test.png?v=2 "custom singular test for testing lap times are positive values")](#)custom singular test for testing lap times are positive values

2. Copy the following code and save the file:

```sql
{{
  config(
    enabled=true,
    severity='error',
    tags = ['bi']
  )
}}

with lap_times_moving_avg as (

    select * from {{ ref('lap_times_moving_avg') }}

)

select *
from lap_times_moving_avg
where lap_moving_avg_5_years < 0 and lap_moving_avg_5_years is not null
```

##### Putting all our tests together[​](#putting-all-our-tests-together "Direct link to Putting all our tests together")

1. Time to run our tests! Altogether, we have created 4 tests for our 2 Python models:
   * `fastest_pit_stops_by_constructor`
     * Unique `constructor_name`
     * Mean pit stop times are greater than or equal to 0 (no negative time values)
   * `lap_times_moving_avg`
     * Referential test on `race_year`
     * Lap times are greater than 0 or null (to allow for the first leading values in a rolling calculation)
2. To run the tests on both our models, we can use this syntax in the command line to run them both at once, similar to how we did our data splits earlier. Execute the following in the command bar:

```bash
dbt test --select fastest_pit_stops_by_constructor lap_times_moving_avg
```

[![running tests on our python models](/img/guides/dbt-ecosystem/dbt-python-snowpark/13-testing/5-running-tests-on-python-models.png?v=2 "running tests on our python models")](#)running tests on our python models

3. All 4 of our tests passed (yay for clean data)!
To understand the SQL being run against each of our tables, we can click into the details of the test.

4. Navigating into the **Details** of the `unique_fastest_pit_stops_by_constructor_name` test, we can see that each `constructor_name` should only have one row.

[![view details of testing our python model that used SQL to test data assertions](/img/guides/dbt-ecosystem/dbt-python-snowpark/13-testing/6-testing-output-details.png?v=2 "view details of testing our python model that used SQL to test data assertions")](#)view details of testing our python model that used SQL to test data assertions

#### Document your dbt project[​](#document-your-dbt-project "Direct link to Document your dbt project")

When it comes to documentation, dbt brings together the column- and model-level descriptions that you provide, along with details from your Snowflake information schema, in a static site for consumption by other data team members and stakeholders.

We are going to revisit 2 areas of our project to understand our documentation:

* `intermediate.md` file
* `dbt_project.yml` file

To start, let’s look back at our `intermediate.md` file. We can see that we provided multi-line descriptions for the models in our intermediate models using [docs blocks](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks). Then we reference these docs blocks in our `.yml` file. Building descriptions with docs blocks in Markdown files lets you format your descriptions with Markdown, which is particularly helpful when building long descriptions, either at the column or model level. In our `dbt_project.yml`, we added `node_colors` at folder levels.

1. To see all these pieces come together, execute this in the command bar:

```bash
dbt docs generate
```

This will generate the documentation for your project. Click the book button, as shown in the screenshot below, to access the docs.
[![dbt docs book icon](/img/guides/dbt-ecosystem/dbt-python-snowpark/14-documentation/1-docs-icon.png?v=2 "dbt docs book icon")](#)dbt docs book icon

2. Go to our project area and view `int_results`. View the description that we created in our docs block.

[![Docblock description within docs site](/img/guides/dbt-ecosystem/dbt-python-snowpark/14-documentation/2-view-docblock-description.png?v=2 "Docblock description within docs site")](#)Docblock description within docs site

3. View the mini-lineage that looks at the model we are currently selected on (`int_results` in this case).

[![Mini lineage view on docs site](/img/guides/dbt-ecosystem/dbt-python-snowpark/14-documentation/3-mini-lineage-docs.png?v=2 "Mini lineage view on docs site")](#)Mini lineage view on docs site

4. In our `dbt_project.yml`, we configured `node_colors` depending on the file directory. Color coding your project helps you cluster together similar models or steps and more easily troubleshoot when viewing lineage in your docs.

[![Full project DAG on docs site](/img/guides/dbt-ecosystem/dbt-python-snowpark/14-documentation/4-full-dag-docs.png?v=2 "Full project DAG on docs site")](#)Full project DAG on docs site

#### Deploy your code[​](#deploy-your-code "Direct link to Deploy your code")

Before we jump into deploying our code, let's have a quick primer on environments. Up to this point, all of the work we've done in the Studio IDE has been in our development environment, with code committed to a feature branch and the models we've built created in our development schema in Snowflake, as defined in our Development environment connection. Doing this work on a feature branch allows us to separate our code from what other coworkers are building and from code that is already deemed production ready.
Building models in a development schema in Snowflake allows us to separate the database objects we might still be modifying and testing from the database objects running production dashboards or other downstream dependencies. Together, the combination of a Git branch and Snowflake database objects forms our environment.

Now that we've completed testing and documenting our work, we're ready to deploy our code from our development environment to our production environment, and this involves two steps:

* Promoting code from our feature branch to the production branch in our repository.
  * Generally, the production branch is going to be your main branch, and there's a review process to go through before merging code to the main branch of a repository. Here we are going to merge without review for ease of this workshop.
* Deploying code to our production environment.
  * Once our code is merged to the main branch, we'll need to run dbt in our production environment to build all of our models and run all of our tests. This will allow us to build production-ready objects into our production environment in Snowflake. Luckily for us, the Partner Connect flow has already created our deployment environment and job to facilitate this step.

1. Before getting started, let's make sure that we've committed all of our work to our feature branch. If you still have work to commit, you'll be able to select **Commit and push**, provide a message, and then select **Commit** again.
2. Once all of your work is committed, the git workflow button will now appear as **Merge this branch to main**. Click **Merge this branch to main** and the merge process will automatically run in the background.

[![Merge this branch to main](/img/guides/dbt-ecosystem/dbt-python-snowpark/15-deployment/1-merge-to-main-branch.png?v=2 "Merge this branch to main")](#)Merge this branch to main

3.
When it's completed, you should see the git button read **Create branch**, and the branch you're currently looking at will become **main**.

4. Now that all of our development work has been merged to the main branch, we can build our deployment job. Given that our production environment and production job were created automatically for us through Partner Connect, all we need to do here is update some default configurations to meet our needs.
5. In the left-hand menu, go to **Orchestration** > **Environments**.
6. You should see two environments listed; select the **Deployment** environment, then **Settings** to modify it.
7. Before making any changes, let's touch on what is defined within this environment. The Snowflake connection shows the credentials that dbt is using for this environment, and in our case they are the same as what was created for us through Partner Connect. Our deployment job will build in our `PC_DBT_DB` database and use the default Partner Connect role and warehouse to do so. The deployment credentials section also uses the info that was created in our Partner Connect job to create the credential connection. However, it is using the same default schema that we've been using as the schema for our development environment.
8. Let's update the schema to create a new schema specifically for our production environment. Click **Edit** so you can modify the existing field values. Navigate to **Deployment Credentials > schema**.
9. Update the schema name to **production**. Remember to select **Save** after you've made the change.

[![Update the deployment credentials schema to production](/img/guides/dbt-ecosystem/dbt-python-snowpark/15-deployment/3-update-deployment-credentials-production.png?v=2 "Update the deployment credentials schema to production")](#)Update the deployment credentials schema to production

10.
By updating the schema for our production environment to **production**, we ensure that our deployment job for this environment will build our dbt models in the **production** schema within the `PC_DBT_DB` database, as defined in the Snowflake Connection section.

11. Now let's switch over to our production job. Go to **Orchestration** again and then select **Jobs**. You should see an existing and preconfigured **Partner Connect Trial Job**. Similar to the environment, click on the job, then select **Settings** to modify it. Let's take a look at the job to understand it before making changes.
    * The Environment section is what connects this job with the environment we want it to run in. This job is already defaulted to use the Deployment environment that we just updated, and the rest of the settings we can keep as is.
    * The Execution settings section gives us the option to generate docs, run source freshness, and defer to a previous run state. For the purposes of our lab, we're going to keep these settings as is as well and stick with just generating docs.
    * The Commands section is where we specify exactly which commands we want to run during this job, and we also want to keep this as is. We want our seed to be uploaded first, then our models run, and finally our tests. The order is important here, considering that we need our seed to be created before we can run our incremental model, and we need our models to be created before we can test them.
    * Finally, we have the Triggers section, where we have a number of different options for scheduling our job. Given that our data isn't updating regularly here and we're running this job manually for now, we're also going to leave this section alone.

So, what are we changing then? Just the name! Click **Edit** to make changes, then update the name of the job to **Production Job** to denote this as our production deployment job. After that's done, click **Save**.

12. Now let's go run our job.
Clicking on the job name in the path at the top of the screen will take you back to the job run history page, where you'll be able to click **Run now** to kick off the job. If you encounter any job failures, try running the job again before further troubleshooting.

[![Run production job](/img/guides/dbt-ecosystem/dbt-python-snowpark/15-deployment/4-run-production-job.png?v=2 "Run production job")](#)Run production job

[![View production job details](/img/guides/dbt-ecosystem/dbt-python-snowpark/15-deployment/5-job-details.png?v=2 "View production job details")](#)View production job details

13. Let's go over to Snowflake to confirm that everything built as expected in our production schema. Refresh the database objects in your Snowflake account and you should see the production schema now within our default Partner Connect database. If you click into the schema and everything ran successfully, you should be able to see all of the models we developed.

[![Check all our models in our pipeline are in Snowflake](/img/guides/dbt-ecosystem/dbt-python-snowpark/15-deployment/6-all-models-generated.png?v=2 "Check all our models in our pipeline are in Snowflake")](#)Check all our models in our pipeline are in Snowflake

##### Conclusion[​](#conclusion "Direct link to Conclusion")

Fantastic! You’ve finished the workshop! We hope you feel empowered using both SQL and Python in your dbt workflows with Snowflake. Having a reliable pipeline to surface both analytics and machine learning is crucial to creating tangible business value from your data.

For more help and information, join our [dbt community Slack](https://www.getdbt.com/community/), which contains more than 50,000 data practitioners today. We have a dedicated Slack channel, #db-snowflake, for Snowflake-related content. Happy dbt'ing!
---

### Migrate from dbt-spark to dbt-databricks

[Back to guides](https://docs.getdbt.com/guides.md) Migration dbt Core dbt platform Intermediate

#### Introduction[​](#introduction "Direct link to Introduction")

You can migrate your projects from using the `dbt-spark` adapter to using the [dbt-databricks adapter](https://github.com/databricks/dbt-databricks). In collaboration with dbt Labs, Databricks built this adapter using dbt-spark as the foundation and added some critical improvements. With it, you get an easier set up — requiring only three inputs for authentication — and more features such as support for [Unity Catalog](https://www.databricks.com/product/unity-catalog).

##### Prerequisite[​](#prerequisite "Direct link to Prerequisite")

* For dbt, you need administrative (admin) privileges to migrate dbt projects.

##### Simpler authentication[​](#simpler-authentication "Direct link to Simpler authentication")

Previously, you had to provide a `cluster` or `endpoint` ID which was hard to parse from the `http_path` that you were given. Now, it doesn't matter if you're using a cluster or an SQL endpoint because the [dbt-databricks setup](https://docs.getdbt.com/docs/local/connect-data-platform/databricks-setup.md) requires the *same* inputs for both. All you need to provide is:

* hostname of the Databricks workspace
* HTTP path of the Databricks SQL warehouse or cluster
* appropriate credentials

##### Better defaults[​](#better-defaults "Direct link to Better defaults")

The `dbt-databricks` adapter provides better defaults than `dbt-spark` does.
The defaults help optimize your workflow so you can get the fast performance and cost-effectiveness of Databricks. They are: * The dbt models use the [Delta](https://docs.databricks.com/delta/index.html) table format. You can remove any declared configurations of `file_format = 'delta'` since they're now redundant. * Expensive queries are accelerated with the [Photon engine](https://docs.databricks.com/runtime/photon.html). * The `incremental_strategy` config is set to `merge`, whereas the dbt-spark default is `append`. If you want to continue using `incremental_strategy=append`, you must set this config explicitly on your incremental models. If you already specified `incremental_strategy=merge` on your incremental models, you don't need to change anything when moving to dbt-databricks, but you can keep your models tidy by removing the config since it's now redundant. Read [About incremental\_strategy](https://docs.getdbt.com/docs/build/incremental-strategy.md) to learn more. For more information on defaults, see [Caveats](https://docs.getdbt.com/docs/local/connect-data-platform/databricks-setup.md#caveats). ##### Pure Python[​](#pure-python "Direct link to Pure Python") If you use dbt Core, you no longer have to download an independent driver to interact with Databricks. The connection information is all embedded in a pure-Python library called `databricks-sql-connector`. #### Migrate your dbt projects in dbt[​](#migrate-your-dbt-projects-in-dbt "Direct link to Migrate your dbt projects in dbt") You can migrate your projects to the Databricks-specific adapter from the generic Apache Spark adapter. If you're using dbt Core, skip to Step 4. The migration to the `dbt-databricks` adapter from `dbt-spark` shouldn't cause any downtime for production jobs. dbt Labs recommends that you schedule the connection change when usage of the IDE is light to avoid disrupting your team. To update your Databricks connection in dbt: 1.
Select **Account Settings** in the main navigation bar. 2. On the **Projects** tab, find the project you want to migrate to the dbt-databricks adapter. 3. Click the hyperlinked Connection for the project. 4. Click **Edit** in the top right corner. 5. Select **Databricks** for the warehouse. 6. Enter the: 1. `hostname` 2. `http_path` 3. (optional) catalog name 7. Click **Save**. Everyone in your organization who uses dbt must refresh the Studio IDE before starting work again. It should refresh in less than a minute. #### Configure your credentials[​](#configure-your-credentials "Direct link to Configure your credentials") When you update the Databricks connection in dbt, your team will not lose their credentials. This makes migrating easier, since it only requires you to delete the Databricks connection and re-add the cluster or endpoint information. The following credentials will not get lost when there's a successful connection to Databricks using the `dbt-spark` ODBC method: * The credentials you supplied to dbt to connect to your Databricks workspace. * The personal access tokens your team added in their dbt profile so they can develop in the Studio IDE for a given project. * The access token you added for each deployment environment so dbt can connect to Databricks during production jobs. #### Migrate dbt projects in dbt Core[​](#migrate-dbt-projects-in-dbt-core "Direct link to Migrate dbt projects in dbt Core") To migrate your dbt Core projects to the `dbt-databricks` adapter from `dbt-spark`: 1. Install the [dbt-databricks adapter](https://github.com/databricks/dbt-databricks) in your environment. 2. Update your Databricks connection by modifying the `target` in your `~/.dbt/profiles.yml` file. Anyone who's using your project must also make these changes in their environment.
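For step 1 of the dbt Core migration, the adapter swap is a single package change. A minimal sketch, assuming a pip-based environment (your virtual environment tooling may differ):

```shell
# Swap the adapter package: dbt-databricks replaces dbt-spark
pip install dbt-databricks

# Confirm dbt picks up the new adapter plugin
dbt --version
```

After installing, update the `target` in `~/.dbt/profiles.yml` as shown in the examples that follow.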
#### Try these examples[​](#try-these-examples "Direct link to Try these examples") You can use the following examples of the `profiles.yml` file to compare the authentication setup with `dbt-spark` against the simpler setup with `dbt-databricks` when connecting to a SQL endpoint. A cluster example would look similar. An example of what authentication looks like with `dbt-spark`:

~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: spark
      method: odbc
      driver: '/opt/simba/spark/lib/64/libsparkodbc_sb64.so'
      schema: my_schema
      host: dbc-l33t-nwb.cloud.databricks.com
      endpoint: 8657cad335ae63e3
      token: [my_secret_token]
```

An example of how much simpler authentication is with `dbt-databricks`:

~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: my_schema
      host: dbc-l33t-nwb.cloud.databricks.com
      http_path: /sql/1.0/endpoints/8657cad335ae63e3
      token: [my_secret_token]
```

--- ### Migrate from DDL, DML, and stored procedures [Back to guides](https://docs.getdbt.com/guides.md) Migration dbt Core Beginner #### Introduction[​](#introduction "Direct link to Introduction") One of the more common situations that new dbt adopters encounter is a historical codebase of transformations written as a hodgepodge of DDL and DML statements, or stored procedures. Going from DML statements to dbt models is often a challenging hump for new users to get over, because the process involves a significant paradigm shift from a procedural flow of building a dataset (e.g. a series of DDL and DML statements) to a declarative approach to defining a dataset (e.g.
how dbt uses SELECT statements to express data models). This guide aims to provide tips, tricks, and common patterns for converting DML statements to dbt models. ##### Preparing to migrate[​](#preparing-to-migrate "Direct link to Preparing to migrate") Before getting into the meat of conversion, it’s worth noting that DML statements will not always illustrate a comprehensive set of columns and column types that an original table might contain. Without knowing the DDL to create the table, it’s impossible to know precisely if your conversion effort is apples-to-apples, but you can generally get close. If your data warehouse supports `SHOW CREATE TABLE`, that can be a quick way to get a comprehensive set of columns you’ll want to recreate. If you don’t have the DDL, but are working on a substantial stored procedure, one approach that can work is to pull column lists out of any DML statements that modify the table, and build up a full set of the columns that appear. As for ensuring that you have the right column types, since models materialized by dbt generally use `CREATE TABLE AS SELECT` or `CREATE VIEW AS SELECT` as the driver for object creation, tables can end up with unintended column types if the queries aren’t explicit. For example, if you care about `INT` versus `DECIMAL` versus `NUMERIC`, it’s generally going to be best to be explicit. The good news is that this is easy with dbt: you just cast the column to the type you intend. We also generally recommend that column renaming and type casting happen as close to the source tables as possible, typically in a layer of staging transformations, which helps ensure that future dbt modelers will know where to look for those transformations! See [How we structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md) for more guidance on overall project structure. 
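To make the casting advice concrete, here is a sketch of a staging model that renames and casts as close to the source as possible; the source, file, and column names are hypothetical:

```sql
-- models/staging/stg_orders.sql (hypothetical file and source names)
-- Explicit casts so CREATE TABLE AS SELECT doesn't infer unintended types
SELECT
    CAST(id AS INTEGER)            AS order_id,
    CAST(ordered_at AS DATE)       AS order_date,
    CAST(amount AS DECIMAL(18, 2)) AS total
FROM {{ source('shop', 'raw_orders') }}
```

Downstream models can then `ref('stg_orders')` and trust the column names and types without repeating the casts.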
##### Operations we need to map[​](#operations-we-need-to-map "Direct link to Operations we need to map") There are four primary DML statements that you are likely to have to convert to dbt operations while migrating a procedure: * `INSERT` * `UPDATE` * `DELETE` * `MERGE` Each of these can be addressed using various techniques in dbt. Handling `MERGE`s is a bit more involved than the rest, but can be handled effectively via dbt. The first three, however, are fairly simple to convert. #### Map INSERTs[​](#map-inserts "Direct link to Map INSERTs") An `INSERT` statement is functionally the same as using dbt to `SELECT` from an existing source or other dbt model. If you are faced with an `INSERT`-`SELECT` statement, the easiest way to convert the statement is to create a new dbt model and pull the `SELECT` portion of the `INSERT` statement out of the procedure and into the model. That's basically it! To really break it down, let's consider a simple example:

```sql
INSERT INTO returned_orders (order_id, order_date, total_return)

SELECT order_id, order_date, total
FROM orders
WHERE type = 'return'
```

Converting this with a first pass to a [dbt model](https://docs.getdbt.com/guides/bigquery.md?step=8) (in a file called `returned_orders.sql`) might look something like:

```sql
SELECT
    order_id AS order_id,
    order_date AS order_date,
    total AS total_return
FROM {{ ref('orders') }}
WHERE type = 'return'
```

Functionally, this would create a model (which could be materialized as a table or view depending on needs) called `returned_orders` that contains three columns (`order_id`, `order_date`, and `total_return`), filtered on the `type` column. It achieves the same end as the `INSERT`, just in a declarative fashion, using dbt. ##### **A note on `FROM` clauses**[​](#a-note-on-from-clauses "Direct link to a-note-on-from-clauses") In dbt, using a hard-coded table or view name in a `FROM` clause is one of the most serious mistakes new users make.
dbt uses the `ref` and `source` macros to discover the order in which transformations need to execute, and if you don't use them, you'll be unable to benefit from dbt's built-in lineage generation and pipeline execution. In the sample code throughout the remainder of this article, we'll use `ref` statements in the dbt-converted versions of SQL statements, but it is an exercise for the reader to ensure that those models exist in their dbt projects. ##### **Sequential `INSERT`s to an existing table can be `UNION ALL`'ed together**[​](#sequential-inserts-to-an-existing-table-can-be-union-alled-together "Direct link to sequential-inserts-to-an-existing-table-can-be-union-alled-together") Since dbt models effectively perform a single `CREATE TABLE AS SELECT` (or, if you break it down into steps, a `CREATE` followed by an `INSERT`), you may run into complexities if there are multiple `INSERT` statements in your transformation that all insert data into the same table. Fortunately, this is a simple thing to handle in dbt: effectively, the logic performs a `UNION ALL` between the `INSERT` queries. If I have a transformation flow that looks something like this (ignore the contrived nature of the scenario):

```sql
CREATE TABLE all_customers

INSERT INTO all_customers SELECT * FROM us_customers

INSERT INTO all_customers SELECT * FROM eu_customers
```

The dbt-ified version of this would end up looking something like:

```sql
SELECT * FROM {{ ref('us_customers') }}

UNION ALL

SELECT * FROM {{ ref('eu_customers') }}
```

The logic is functionally equivalent, so if there's another statement that `INSERT`s into a model that I've already created, I can just add that logic into a second `SELECT` statement that is `UNION ALL`'ed with the first. Easy! #### Map UPDATEs[​](#map-updates "Direct link to Map UPDATEs") `UPDATE`s start to increase the complexity of your transformations, but fortunately, they're pretty darn simple to migrate as well.
The thought process that you go through when translating an `UPDATE` is quite similar to how an `INSERT` works, but the logic for the `SELECT` list in the dbt model is primarily sourced from the content in the `SET` section of the `UPDATE` statement. Let's look at a simple example:

```sql
UPDATE orders
SET type = 'return'
WHERE total < 0
```

The way to look at this is similar to an `INSERT`-`SELECT` statement. The table being updated is the model you want to modify, and since this is an `UPDATE`, that model has likely already been created, so you can either: * add to it with subsequent transformations * create an intermediate model that builds off of the original model – perhaps naming it something like `int_[entity]_[verb].sql`. The `SELECT` list should contain all of the columns for the table, but for the specific columns being updated by the DML, you'll use the computation on the right side of the equals sign as the `SELECT`ed value. Then, you can use the target column name on the left of the equals sign as the column alias. If I were building an intermediate transformation, the above query would translate to something along the lines of:

```sql
SELECT
    CASE WHEN total < 0 THEN 'return' ELSE type END AS type,
    order_id,
    order_date
FROM {{ ref('stg_orders') }}
```

Since the `UPDATE` statement doesn't modify every value of the `type` column, we use a `CASE` statement to apply the `UPDATE`'s `WHERE` condition only to the relevant rows. We still want to select all of the columns that should end up in the target table: if we left one of the columns out, it wouldn't be passed through to the target table at all, due to dbt's declarative approach. Sometimes, you may not be sure what all the columns are in a table, or, as in the situation above, you may be modifying only a small number of columns relative to the total number of columns in the table.
It can be cumbersome to list out every column in the table, but fortunately dbt contains some useful utility macros that can help list out the full column list of a table. Another way I could have written the model a bit more dynamically might be:

```sql
SELECT
    {{ dbt_utils.star(from=ref('stg_orders'), except=['type']) }},
    CASE WHEN total < 0 THEN 'return' ELSE type END AS type
FROM {{ ref('stg_orders') }}
```

The `dbt_utils.star()` macro will print out the full list of columns in the table, but skip the ones listed in the `except` list, which allows me to perform the same logic while writing fewer lines of code. This is a simple example of using dbt macros to simplify and shorten your code, and dbt can get a lot more sophisticated as you learn more techniques. Read more about the [dbt\_utils package](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) and the [star macro](https://github.com/dbt-labs/dbt-utils/tree/0.8.6/#star-source). #### Map DELETEs[​](#map-deletes "Direct link to Map DELETEs") One of the biggest differences between a procedural transformation and how dbt models data is that dbt, in general, will never destroy data. While there are ways to execute hard `DELETE`s in dbt that are outside the scope of this article, the general best practice for handling deleted data is to use soft deletes and filter out soft-deleted data in a final transformation. Let's consider a simple example query:

```sql
DELETE FROM stg_orders
WHERE order_status IS NULL
```

In a dbt model, you'll need to first identify the records that should be deleted and then filter them out. There are really two primary ways you might translate this query:

```sql
SELECT *
FROM {{ ref('stg_orders') }}
WHERE order_status IS NOT NULL
```

This first approach simply inverts the logic of the `DELETE` to describe the set of records that should remain, instead of the set of records that should be removed. This ties back to the way dbt declaratively describes datasets.
You reference the data that should be in a dataset, and the table or view gets created with that set of data. Another way you could achieve this is by marking the deleted records and then filtering them out. For example:

```sql
WITH soft_deletes AS (

    SELECT
        *,
        CASE WHEN order_status IS NULL THEN true ELSE false END AS to_delete
    FROM {{ ref('stg_orders') }}

)

SELECT *
FROM soft_deletes
WHERE to_delete = false
```

This approach flags all of the deleted records, and the final `SELECT` filters out any deleted data, so the resulting table contains only the remaining records. It's a lot more verbose than just inverting the `DELETE` logic, but for complex `DELETE` logic, this ends up being a very effective way of performing the `DELETE` that retains historical context. It's worth calling out that while this doesn't enable a hard delete, hard deletes can be executed in a number of ways, the most common being to execute a dbt [macro](https://docs.getdbt.com/docs/build/jinja-macros.md) via a [run-operation](https://docs.getdbt.com/reference/commands/run-operation.md), or to use a [post-hook](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md) to perform a `DELETE` statement after the records to be deleted have been marked. These are advanced approaches outside the scope of this guide. #### Map MERGEs[​](#map-merges "Direct link to Map MERGEs") dbt has a concept called [materialization](https://docs.getdbt.com/docs/build/materializations.md), which determines how a model is physically or logically represented in the warehouse. `INSERT`s, `UPDATE`s, and `DELETE`s will typically be accomplished using table or view materializations. For incremental workloads accomplished via commands like `MERGE` or `UPSERT`, dbt has a particular materialization called [incremental](https://docs.getdbt.com/docs/build/incremental-models.md).
The incremental materialization is specifically used to handle incremental loads and updates to a table without recreating the entire table from scratch on every run. ##### Step 1: Map the MERGE like an INSERT/UPDATE to start[​](#step-1-map-the-merge-like-an-insertupdate-to-start "Direct link to Step 1: Map the MERGE like an INSERT/UPDATE to start") Before we get into the exact details of how to implement an incremental materialization, let's talk about logic conversion. Extracting the logic of the `MERGE` and handling it as you would an `INSERT` or an `UPDATE` is the easiest way to get started migrating a `MERGE` command. To see how the logic conversion works, we'll start with an example `MERGE`. In this scenario, imagine a ride-sharing app where rides are loaded into a details table daily, and tips may be updated at some later date and need to be kept up to date:

```sql
MERGE INTO ride_details
USING (
    SELECT ride_id, subtotal, tip
    FROM rides_to_load
) AS rtl
ON ride_details.ride_id = rtl.ride_id

WHEN MATCHED THEN UPDATE
SET ride_details.tip = rtl.tip

WHEN NOT MATCHED THEN INSERT (ride_id, subtotal, tip)
VALUES (rtl.ride_id, rtl.subtotal, NVL(rtl.tip, 0));
```

The content of the `USING` clause is a useful piece of code, because it can easily be placed in a CTE as a starting point for handling the match statements. I find that the easiest way to break this apart is to treat each match statement as a separate CTE that builds on the previous match statements. We can ignore the `ON` clause for now, as that will only come into play once we're ready to turn this into an incremental model. As with `UPDATE`s and `INSERT`s, you can use the `SELECT` list and aliases to name columns appropriately for the target table, and `UNION` together the `INSERT` statements (taking care to use `UNION`, rather than `UNION ALL`, to avoid duplicates).
The `MERGE` would end up translating to something like this:

```sql
WITH using_clause AS (

    SELECT ride_id, subtotal, tip
    FROM {{ ref('rides_to_load') }}

),

updates AS (

    SELECT ride_id, subtotal, tip
    FROM using_clause

),

inserts AS (

    SELECT ride_id, subtotal, NVL(tip, 0) AS tip
    FROM using_clause

)

SELECT * FROM updates
UNION
SELECT * FROM inserts
```

To be clear, this transformation isn't complete. The logic here is similar to the `MERGE`, but it will not actually do the same thing, since the `updates` and `inserts` CTEs are both selecting from the same source query. We'll need to ensure we grab the separate sets of data as we transition to the incremental materialization. One important caveat is that dbt does not natively support `DELETE` as a `MATCH` action. If you have a line in your `MERGE` statement that uses `WHEN MATCHED THEN DELETE`, you'll want to treat it like an update, adding a soft-delete flag that is then filtered out in a follow-on transformation. ##### Step 2: Convert to incremental materialization[​](#step-2-convert-to-incremental-materialization "Direct link to Step 2: Convert to incremental materialization") As mentioned above, incremental materializations are a little special: when the target table does not exist, the materialization functions in nearly the same way as a standard table materialization and executes a `CREATE TABLE AS SELECT` statement. If the target table does exist, however, the materialization instead executes a `MERGE` statement. Since a `MERGE` requires a `JOIN` condition between the `USING` clause and the target table, we need a way to specify how dbt determines whether a record triggers a match or not. That particular piece of information is specified in the dbt model configuration.
We can add the following `config()` block to the top of our model to specify how it should build incrementally:

```sql
{{
    config(
        materialized='incremental',
        unique_key='ride_id',
        incremental_strategy='merge'
    )
}}
```

The three configuration fields in this example are the most important ones. * Setting `materialized='incremental'` tells dbt to apply UPSERT logic to the target table. * The `unique_key` should be a primary key of the target table. This is used to match incoming records with the existing table. * Setting `incremental_strategy='merge'` tells dbt to `MERGE` incoming rows with any existing rows in the target table that have a matching value for the `unique_key`. There are [various incremental strategies](https://docs.getdbt.com/docs/build/incremental-strategy.md) for different situations and warehouses. The bulk of the work in converting a model to an incremental materialization comes in determining how the logic should change for incremental loads versus full backfills or initial loads. dbt offers a special macro, `is_incremental()`, which evaluates to false for initial loads or for backfills (called full refreshes in dbt parlance), but true for incremental loads. This macro can be used to augment the model code to adjust how data is loaded for subsequent loads. How that logic should be added will depend a little bit on how data is received. Some common ways might be: 1. The source table is truncated ahead of incremental loads, and only contains the data to be loaded in that increment. 2. The source table contains all historical data, and there is a load timestamp column that identifies new data to be loaded. In the first case, the work is essentially done already. Since the source table always contains only the new data to be loaded, the query doesn't have to change for incremental loads. The second case, however, requires the use of the `is_incremental()` macro to correctly handle the logic.
Taking the converted `MERGE` statement that we'd put together previously, we'd augment it to add this additional logic:

```sql
WITH using_clause AS (

    SELECT ride_id, subtotal, tip, load_timestamp
    FROM {{ ref('rides_to_load') }}

    {% if is_incremental() %}
    WHERE load_timestamp > (SELECT max(load_timestamp) FROM {{ this }})
    {% endif %}

),

updates AS (

    SELECT ride_id, subtotal, tip, load_timestamp
    FROM using_clause

    {% if is_incremental() %}
    WHERE ride_id IN (SELECT ride_id FROM {{ this }})
    {% endif %}

),

inserts AS (

    SELECT ride_id, subtotal, NVL(tip, 0) AS tip, load_timestamp
    FROM using_clause
    WHERE ride_id NOT IN (SELECT ride_id FROM updates)

)

SELECT * FROM updates
UNION
SELECT * FROM inserts
```

There are a couple of important concepts to understand here: 1. The code in the `is_incremental()` conditional block only executes for incremental runs of this model. If the target table doesn't exist, or if the `--full-refresh` option is used, that code will not execute. 2. `{{ this }}` is a special keyword in dbt that, when used in a Jinja block, refers to the model in which the code is executing. So if you have a model in a file called `my_incremental_model.sql`, `{{ this }}` will refer to `my_incremental_model` (fully qualified with database and schema name if necessary). By using that keyword, we can leverage the current state of the target table to inform the source query. #### Migrate stored procedures[​](#migrate-stores-procedures "Direct link to Migrate stored procedures") The techniques shared above are useful ways to get started converting the individual DML statements that are often found in stored procedures. Using these types of patterns, legacy procedural code can be rapidly transitioned to dbt models that are much more readable, maintainable, and able to benefit from software engineering best practices like DRY principles.
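The incremental lifecycle described above can be exercised from the command line; a sketch, assuming a model file named `my_incremental_model.sql` (a hypothetical name):

```shell
# First run: the target table doesn't exist, so dbt executes a
# CREATE TABLE AS SELECT and is_incremental() evaluates to false
dbt run --select my_incremental_model

# Subsequent runs: the table exists, is_incremental() evaluates to true,
# and dbt executes a MERGE keyed on unique_key
dbt run --select my_incremental_model

# Rebuild from scratch: is_incremental() evaluates to false again
dbt run --select my_incremental_model --full-refresh
```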
Additionally, once transformations are rewritten as dbt models, it becomes much easier to test the transformations to ensure that the data being used downstream is high-quality and trustworthy. --- ### Move from dbt Core to the dbt platform: Get started [Back to guides](https://docs.getdbt.com/guides.md) Total estimated time: 3-4 hours Migration dbt Core dbt platform Intermediate #### Introduction[​](#introduction "Direct link to Introduction") Moving from dbt Core to dbt streamlines analytics engineering workflows by allowing teams to develop, test, deploy, and explore data products using a single, fully managed software service. The data layer is the foundation for trusted analytics and AI; the dbt platform gives you the governance, shared definitions, and reliability to scale both — without the hidden cost of self-hosting in engineer hours and wasted compute. Explore our three-part guide series on moving from dbt Core to dbt. This series is ideal for users aiming for streamlined workflows and enhanced analytics:

| Guide | Information | Audience |
| ----- | ----------- | -------- |
| [Move from dbt Core to dbt platform: What you need to know](https://docs.getdbt.com/guides/core-migration-2.md) | Understand the considerations and methods needed in your move from dbt Core to dbt platform. | Team leads, Admins |
| [Move from dbt Core to dbt platform: Get started](https://docs.getdbt.com/guides/core-migration-1.md?step=1) | Learn the steps needed to move from dbt Core to dbt platform. | Developers, Data engineers, Data analysts |
| [Move from dbt Core to dbt platform: Optimization tips](https://docs.getdbt.com/guides/core-migration-3.md) | Learn how to optimize your dbt experience with common scenarios and useful tips. | Everyone |

##### Why move to the dbt platform?[​](#why-move-to-the-dbt-platform "Direct link to Why move to the dbt platform?") If your team is using dbt Core today, you could be reading this guide because: * You've realized the burden of maintaining that deployment. * The person who set it up has since left. * You're interested in what dbt could do to better manage the complexity of your dbt deployment, democratize access to more contributors, or improve security and governance practices. * You need a governed data foundation for AI — shared definitions, lineage, and testing so analytics and AI give answers the business can trust. Self-hosting hides its true cost in engineer hours and wasted compute. The dbt platform eliminates that overhead with managed infrastructure, state-aware orchestration, and browser-based development, so more people can contribute without you being the bottleneck. The data layer is the AI layer — make sure it's tested, defined, and trusted end to end. Moving from dbt Core to dbt simplifies workflows by providing a fully managed environment that improves collaboration, security, and orchestration. With dbt, you gain access to features like cross-team collaboration ([dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md)), version management, streamlined CI/CD, [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) for comprehensive insights, and more — making it easier to manage complex dbt deployments and scale your data workflows efficiently. It's ideal for teams looking to reduce the burden of maintaining their own infrastructure while enhancing governance and productivity.  What are dbt and dbt Core?
* dbt is the fastest and most reliable way to deploy dbt. It enables you to develop, test, deploy, and explore data products using a single, fully managed service. Infrastructure is managed for you — no custom scripts or fragile orchestration. State-aware orchestration only builds what's changed, so you waste less compute and time. Browser-based development and Copilot open up development to analysts, so you're no longer the bottleneck for every change. With end-to-end lineage, shared metric definitions, and CI that catches regressions before production, you spend less time debugging and more time building. dbt also supports: * Development experiences tailored to multiple personas ([Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) or [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md)) * Out-of-the-box [CI/CD workflows](https://docs.getdbt.com/docs/deploy/ci-jobs.md) * The [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) for consistent metrics * Domain ownership of data with multi-project [dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) setups * [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) for easier data discovery and understanding Learn more about [dbt features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md). * dbt Core is an open-source tool that enables data teams to define and execute data transformations in a cloud data warehouse following analytics engineering best practices. While this can work well for 'single players' and small technical teams, all development happens on a command-line interface (CLI), and production deployments must be self-hosted and maintained. You absorb the cost of every upgrade, every broken CI run, and every request that pulls you away from real work: maintaining infrastructure, debugging the CI pipeline, and fielding every change that requires CLI access. 
Compute runs unchecked, upgrades are risky, and there's no easy way to trace what broke or why. Maintaining and scaling this requires significant, costly work that adds up over time — all without governance, shared definitions, or reliable testing. #### What you'll learn[​](#what-youll-learn "Direct link to What you'll learn") This guide outlines the steps you need to take to move from dbt Core to dbt and highlights the necessary technical changes: * [Account setup](https://docs.getdbt.com/guides/core-migration-1.md?step=4): Learn how to create a dbt account, invite team members, and configure it for your team. * [Data platform setup](https://docs.getdbt.com/guides/core-migration-1.md?step=5): Find out about connecting your data platform to dbt. * [Git setup](https://docs.getdbt.com/guides/core-migration-1.md?step=6): Learn to link your dbt project's Git repository with dbt. * [Developer setup](https://docs.getdbt.com/guides/core-migration-1.md?step=7): Understand the setup needed for developing in dbt. * [Environment variables](https://docs.getdbt.com/guides/core-migration-1.md?step=8): Discover how to manage environment variables in dbt, including their priority. * [Orchestration setup](https://docs.getdbt.com/guides/core-migration-1.md?step=9): Learn how to prepare your dbt environment and jobs for orchestration. * [Models configuration](https://docs.getdbt.com/guides/core-migration-1.md?step=10): Get insights on validating and running your models in dbt, using either the Studio IDE or dbt CLI. * [What's next?](https://docs.getdbt.com/guides/core-migration-1.md?step=11): Summarizes key takeaways and introduces what to expect in the following guides. ##### Related docs[​](#related-docs "Direct link to Related docs") * [Learn dbt](https://learn.getdbt.com) for on-demand video learning.
* Book [expert-led demos](https://www.getdbt.com/resources/dbt-cloud-demos-with-experts) and insights * Work with the [dbt Labs' Professional Services](https://www.getdbt.com/dbt-labs/services) team to support your data organization and migration. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * You have an existing dbt Core project connected to a Git repository and data platform supported in [dbt](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md). * You have a dbt account. **[Don't have one? Start your free trial today](https://www.getdbt.com/signup)**! #### Account setup[​](#account-setup "Direct link to Account setup") This section outlines the steps to set up your dbt account and configure it for your team. 1. [Create your dbt account](https://www.getdbt.com/signup). 2. Provide user [access](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) and [invite users](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) to your dbt account and project. 3. Configure [Single Sign-On (SSO)](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md) or [Role-based access control (RBAC)](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control) for easy and secure access. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") * This removes the need to save passwords and secret environment variables locally. ##### Additional configuration[​](#additional-configuration "Direct link to Additional configuration") Explore these additional configurations for performance and reliability improvements: 1. In **Account settings**, enable [partial parsing](https://docs.getdbt.com/docs/cloud/account-settings.md#partial-parsing) to only reparse changed files, saving time. 2. 
In **Account settings**, enable [Git repo caching](https://docs.getdbt.com/docs/cloud/account-settings.md#git-repository-caching) for job reliability & third-party outage protection. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") #### Data platform setup[​](#data-platform-setup "Direct link to Data platform setup") This section outlines the considerations and methods to connect your data platform to dbt. 1. In dbt, set up your [data platform connections](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md) and [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md). dbt can connect with a variety of data platform providers including: * [AlloyDB](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-postgresql-alloydb.md) * [Amazon Athena](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-amazon-athena.md) * [Amazon Redshift](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-redshift.md) * [Apache Spark](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-apache-spark.md) * [Azure Synapse Analytics](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-azure-synapse-analytics.md) * [Databricks](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-databricks.md) * [Google BigQuery](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-bigquery.md) * [Microsoft Fabric](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-microsoft-fabric.md) * [PostgreSQL](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-postgresql-alloydb.md) * [Snowflake](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-snowflake.md) * [Starburst or Trino](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-starburst-trino.md) * 
[Teradata](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-teradata.md) 2. You can verify your data platform connections by clicking the **Test connection** button in your deployment and development credentials settings. ##### Additional configuration[​](#additional-configuration-1 "Direct link to Additional configuration") Explore these additional configurations to optimize your data platform setup further: 1. Use [OAuth connections](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md), which enables secure authentication using your data platform's SSO. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") #### Git setup[​](#git-setup "Direct link to Git setup") Your existing dbt project source code should live in a Git repository. In this section, you will connect your existing dbt project source code from Git to dbt. 1. Ensure your dbt project is in a Git repository. 2. In **Account settings**, select **Integrations** to [connect your Git repository](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md) to dbt: * (**Recommended**) Connect with one of the [native integrations](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md) in dbt (such as GitHub, GitLab, and Azure DevOps). This method is preferred for its simplicity, security features (including secure OAuth logins and automated workflows like CI builds on pull requests), and overall ease of use. * [Import a Git repository](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) from any valid Git URL that points to a dbt project. #### Developer setup[​](#developer-setup "Direct link to Developer setup") This section highlights the development configurations you'll need for your dbt project. 
The following categories are covered in this section: * [dbt environments](https://docs.getdbt.com/guides/core-migration-1.md?step=7#dbt-cloud-environments) * [Initial setup steps](https://docs.getdbt.com/guides/core-migration-1.md?step=7#initial-setup-steps) * [Additional configuration](https://docs.getdbt.com/guides/core-migration-1.md?step=7#additional-configuration-2) * [dbt commands](https://docs.getdbt.com/guides/core-migration-1.md?step=7#dbt-cloud-commands) ##### dbt environments[​](#dbt-environments "Direct link to dbt environments") The most common data environments are production, staging, and development. dbt Core manages [environments](https://docs.getdbt.com/docs/environments-in-dbt.md) through targets, which are different sets of connection details defined in `profiles.yml`. [dbt environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md) go further by: * Integrating with features such as job scheduling or version control, making it easier to manage the full lifecycle of your dbt projects within a single platform. * Streamlining the process of switching between development, staging, and production contexts. * Making it easy to configure environments through the dbt UI instead of manually editing the `profiles.yml` file. You can also [set up](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md) or [customize](https://docs.getdbt.com/docs/build/custom-target-names.md) target names in dbt. * Adding `profiles.yml` attributes to dbt environment settings with [Extended Attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes). * Using [Git repo caching](https://docs.getdbt.com/docs/cloud/account-settings.md#git-repository-caching) to protect you from third-party outages, Git auth failures, and more. 
[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") ##### Initial setup steps[​](#initial-setup-steps "Direct link to Initial setup steps") 1. **Set up development environment** — Set up your [development](https://docs.getdbt.com/docs/dbt-cloud-environments.md#create-a-development-environment) environment and [development credentials](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#access-the-cloud-ide). You'll need this to access your dbt project and start developing. 2. **dbt Core version** — In your dbt environment, select a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) for ongoing dbt version upgrades. If your team plans to use both dbt Core and dbt for developing or deploying your dbt project, you can run `dbt --version` in the command line to find out which version of dbt Core you're using. * When using dbt Core, you need to think about which version you're using and manage your own upgrades. When using dbt, leverage [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) so you don't have to. 3. **Connect to your data platform** — When using dbt, you can [connect to your data platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md) directly in the UI. * Each environment is roughly equivalent to an entry in your `profiles.yml` file. This means you don't need a `profiles.yml` file in your project. 4. **Development tools** — Set up your development workspace with the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) (command line interface or code editor) or [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) (browser-based) to build, test, run, and version control your dbt code in your tool of choice. 
* If you've previously installed dbt Core, the [dbt CLI installation doc](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md?install=pip#install-dbt-cloud-cli) has more information on how to install the dbt CLI, create aliases, or uninstall dbt Core for a smooth transition. ##### Additional configuration[​](#additional-configuration-2 "Direct link to Additional configuration") Explore these additional configurations to optimize your developer setup further: 1. **Custom target names** — Using [custom target names](https://docs.getdbt.com/docs/build/custom-target-names.md) in your dbt projects helps identify different environments (like development, staging, and production). While you can specify custom target name values in your developer credentials or orchestration setup, we recommend using [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) as the preferred method. They offer a clearer way to handle different environments and are better supported by dbt's partial parsing feature, unlike [`{{ target }}` logic](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md), which is meant for defining the data warehouse connection. ##### dbt commands[​](#dbt-commands "Direct link to dbt commands") 1. Review the [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md) supported for dbt development. For example, `dbt init` isn't needed because you can create a new project directly in the dbt UI. #### Environment variables[​](#environment-variables "Direct link to Environment variables") This section will help you understand how to set up and manage dbt environment variables for your project. 
The following categories are covered: * [Environment variables in dbt](https://docs.getdbt.com/guides/core-migration-1.md?step=7#environment-variables-in-dbt-cloud) * [dbt environment variables order of precedence](https://docs.getdbt.com/guides/core-migration-1.md?step=7#dbt-cloud-environment-variables-order-of-precedence) * [Set environment variables in dbt](https://docs.getdbt.com/guides/core-migration-1.md?step=7#set-environment-variables-in-dbt-cloud) In dbt, you can set [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) in the dbt user interface (UI). Read [Set up environment variables](#set-environment-variables-in-dbt-cloud) for more info. In dbt Core, environment variables, or the [`env_var` function](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md), are defined manually by the developer or within the external application running dbt. ##### Environment variables in dbt[​](#environment-variables-in-dbt "Direct link to Environment variables in dbt") * dbt environment variables must be prefixed with `DBT_` (including `DBT_ENV_CUSTOM_ENV_` or `DBT_ENV_SECRET`). * If your dbt Core environment variables don't follow this naming convention, perform a ["find and replace"](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#dbt-cloud-ide-features) in your project to make sure all references to these environment variables follow the proper naming convention. * dbt secures the environment variables used to configure data warehouse connections and Git provider integrations, and offers additional measures for sensitive values: prefixing a key with `DBT_ENV_SECRET` obscures its value in logs and the UI. 
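To make the `DBT_` naming convention and the `env_var` function concrete, here is a minimal, hypothetical snippet. The project name, variable name, and default value are illustrative, not from this guide:

```yaml
# dbt_project.yml (illustrative): read a DBT_-prefixed environment variable.
# The second argument to env_var is a fallback default, which sits at the
# lowest level of dbt's order of precedence.
models:
  my_project:
    +schema: "{{ env_var('DBT_SCHEMA_SUFFIX', 'dev') }}"
```

A variable set at the environment, job, or personal-override level would override the `'dev'` default shown here.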
*Setting project level and environment level values*

##### dbt environment variables order of precedence[​](#dbt-environment-variables-order-of-precedence "Direct link to dbt environment variables order of precedence") Environment variables in dbt are managed with a clear [order of precedence](https://docs.getdbt.com/docs/build/environment-variables.md#setting-and-overriding-environment-variables), allowing users to define values at four levels (highest to lowest order of precedence): * The job level (job override) or in the Studio IDE for an individual developer (personal override). *Highest precedence* * The environment level, which can be overridden by the job level or personal override. * A project-wide default value, which can be overridden by the environment level, job level, or personal override. * The optional default argument supplied to the `env_var` Jinja function in the code. *Lowest precedence*

*Environment variables order of precedence*

##### Set environment variables in dbt[​](#set-environment-variables-in-dbt "Direct link to Set environment variables in dbt") * To set these variables for an entire project or specific environments, navigate to **Deploy** > **Environments** > **Environment variables** tab. * To set these variables at the job level, navigate to **Deploy** > **Jobs** > **Select your job** > **Settings** > **Advanced settings**. * To set these variables at the personal override level, navigate to **Profile Settings** > **Credentials** > **Select your project** > **Environment variables**. #### Orchestration setup[​](#orchestration-setup "Direct link to Orchestration setup") This section outlines the considerations and methods to set up your dbt environments and jobs for orchestration. 
The following categories are covered in this section: * [dbt environments](https://docs.getdbt.com/guides/core-migration-1.md?step=8#dbt-cloud-environments-1) * [Initial setup steps](https://docs.getdbt.com/guides/core-migration-1.md?step=8#initial-setup-steps-1) * [Additional configuration](https://docs.getdbt.com/guides/core-migration-1.md?step=8#additional-configuration-3) * [CI/CD setup](https://docs.getdbt.com/guides/core-migration-1.md?step=8#cicd-setup) ##### dbt environments[​](#dbt-environments-1 "Direct link to dbt environments") To use [dbt's job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md), set up one environment as the production environment. This is the [deployment](https://docs.getdbt.com/docs/deploy/deploy-environments.md) environment. You can set up multiple environments for different stages of your deployment pipeline, such as development, staging/QA, and production. ##### Initial setup steps[​](#initial-setup-steps-1 "Direct link to Initial setup steps") 1. **dbt Core version** — In your environment settings, configure your dbt environment to use the same dbt Core version as your existing project. * Once your full migration is complete, we recommend upgrading your environments to [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to always get the latest features and more. You only need to do this once. 2. **Configure your jobs** — [Create jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#create-and-schedule-jobs) for scheduled or event-driven dbt jobs. Jobs can be triggered on a cron schedule, manually, by pull requests, or on the completion of another job. * Alongside [jobs in dbt](https://docs.getdbt.com/docs/deploy/jobs.md), you can also schedule and run your dbt jobs with other tools. Refer to [Integrate with other tools](https://docs.getdbt.com/docs/deploy/deployment-tools.md) for more information. 
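Jobs are configured in the dbt UI rather than in a file. Purely as a sketch of the settings involved (the job name, environment name, commands, and cron expression below are all illustrative, and this is not a real config-file format):

```yaml
# Not an actual config file: an illustrative summary of a scheduled
# deploy job as you'd configure it in the dbt UI.
job:
  name: Nightly production build
  environment: Production
  commands:
    - dbt build            # run and test models
    - dbt docs generate    # refresh metadata for Catalog
  triggers:
    schedule: "0 6 * * *"  # cron: daily at 06:00 UTC
```

A CI job would instead be triggered by pull requests, and an event-driven job by the completion of another job.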
##### Additional configuration[​](#additional-configuration-3 "Direct link to Additional configuration") Explore these additional configurations to optimize your dbt orchestration setup further: 1. **Custom target names** — Use environment variables to set a [custom target name](https://docs.getdbt.com/docs/build/custom-target-names.md) for every corresponding dbt job at the environment level. 2. **dbt commands** — Add any relevant [dbt commands](https://docs.getdbt.com/docs/deploy/job-commands.md) to execute in your dbt job runs. 3. **Notifications** — Set up [notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) by configuring email and Slack alerts to monitor your jobs. 4. **Monitoring tools** — Use [monitoring tools](https://docs.getdbt.com/docs/deploy/monitor-jobs.md) like run history, job retries, job chaining, dashboard status tiles, and more for a seamless orchestration experience. 5. **API access** — Create [API auth tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/authentication.md) and access to [dbt APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/overview.md) as needed. [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") 6. **Catalog** — If you use [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) and run production jobs with an external orchestrator, ensure your production jobs run `dbt run` or `dbt build` to update and view models and their [metadata](https://docs.getdbt.com/docs/explore/explore-projects.md#generate-metadata) in Catalog. Running `dbt compile` alone will not update model metadata. In addition, features like column-level lineage also require catalog metadata produced by running `dbt docs generate`. 
[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") ##### CI/CD setup[​](#cicd-setup "Direct link to CI/CD setup") Building a custom solution to efficiently check code upon pull requests is complicated. With dbt, you can enable [continuous integration / continuous deployment (CI/CD)](https://docs.getdbt.com/docs/deploy/continuous-integration.md) and configure dbt to run your dbt projects in a temporary schema when new commits are pushed to open pull requests. [![Workflow of continuous integration in dbt](/img/docs/dbt-cloud/using-dbt-cloud/ci-workflow.png?v=2 "Workflow of continuous integration in dbt")](#)Workflow of continuous integration in dbt This build-on-PR functionality is a great way to catch bugs before deploying to production, and an essential tool for data practitioners. 1. Set up an integration with a native Git application (such as Azure DevOps, GitHub, GitLab) and a CI environment in dbt. 2. Create [a CI/CD job](https://docs.getdbt.com/docs/deploy/ci-jobs.md) to automate quality checks before code is deployed to production. 3. Run your jobs in a production environment to fully implement CI/CD. Future pull requests will also leverage the last production runs to compare against. #### Model development and discovery[​](#model-development-and-discovery "Direct link to Model development and discovery") In this section, you'll be able to validate whether your models run or compile correctly in your development tool of choice: The [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) or [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md). You'll want to make sure you set up your [development environment and credentials](https://docs.getdbt.com/docs/dbt-cloud-environments.md#set-developer-credentials). 1. 
In your [development tool](https://docs.getdbt.com/docs/cloud/about-develop-dbt.md) of choice, you can review your dbt project, ensure it's set up correctly, and run some [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md): * Run `dbt compile` to make sure your project compiles correctly. * Run a few models in the Studio IDE or dbt CLI to ensure you're seeing accurate results in development. 2. Once your first job has successfully run in your production environment, use [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their data lineage to gain a better understanding of its latest production state. [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") #### What's next?[​](#whats-next "Direct link to What's next?") Congratulations on completing the first part of your move to dbt 🎉! You have learned: * How to set up your dbt account * How to connect your data platform and Git repository * How to configure your development, orchestration, and CI/CD environments * How to set up environment variables and validate your models For the next steps, you can continue exploring our 3-part guide series on moving from dbt Core to dbt:

| Guide | Information | Audience |
| ----- | ----------- | -------- |
| [Move from dbt Core to dbt platform: What you need to know](https://docs.getdbt.com/guides/core-migration-2.md) | Understand the considerations and methods needed in your move from dbt Core to dbt platform. | Team leads, Admins |
| [Move from dbt Core to dbt platform: Get started](https://docs.getdbt.com/guides/core-migration-1.md?step=1) | Learn the steps needed to move from dbt Core to dbt platform. | Developers, Data engineers, Data analysts |
| [Move from dbt Core to dbt platform: Optimization tips](https://docs.getdbt.com/guides/core-migration-3.md) | Learn how to optimize your dbt experience with common scenarios and useful tips. | Everyone |

##### Why move to the dbt platform?[​](#why-move-to-the-dbt-platform "Direct link to Why move to the dbt platform?") If your team is using dbt Core today, you could be reading this guide because: * You've realized the burden of maintaining that deployment. * The person who set it up has since left. * You're interested in what dbt could do to better manage the complexity of your dbt deployment, democratize access to more contributors, or improve security and governance practices. * You need a governed data foundation for AI: shared definitions, lineage, and testing so analytics and AI give answers the business can trust. Self-hosting hides its true cost in engineer hours and wasted compute. The dbt platform eliminates that overhead with managed infrastructure, state-aware orchestration, and browser-based development so more people can contribute without you being the bottleneck. The data layer is the AI layer: make sure it's tested, defined, and trusted end to end. Moving from dbt Core to dbt simplifies workflows by providing a fully managed environment that improves collaboration, security, and orchestration. With dbt, you gain access to features like cross-team collaboration ([dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md)), version management, streamlined CI/CD, [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) for comprehensive insights, and more — making it easier to manage complex dbt deployments and scale your data workflows efficiently. It's ideal for teams looking to reduce the burden of maintaining their own infrastructure while enhancing governance and productivity. 
##### Related docs[​](#related-docs-1 "Direct link to Related docs") * [Learn dbt](https://learn.getdbt.com) video courses for on-demand learning. * Book [expert-led demos](https://www.getdbt.com/resources/dbt-cloud-demos-with-experts) and insights. * Work with the [dbt Labs' Professional Services](https://www.getdbt.com/dbt-labs/services) team to support your data organization and migration. * [How dbt compares with dbt Core](https://www.getdbt.com/product/dbt-core-vs-dbt-cloud) for a detailed comparison of dbt Core and dbt. * Subscribe to the [dbt RSS alerts](https://status.getdbt.com/). --- ### Move from dbt Core to the dbt platform: Optimization tips #### Introduction[​](#introduction "Direct link to Introduction") Moving from dbt Core to dbt streamlines analytics engineering workflows by allowing teams to develop, test, deploy, and explore data products using a single, fully managed software service. It's not just better tooling: it's about lowering total cost of ownership, powering AI with trusted data, and scaling with governed self-service. Explore our 3-part guide series on moving from dbt Core to dbt. 
The series is ideal for users aiming for streamlined workflows and enhanced analytics:

| Guide | Information | Audience |
| ----- | ----------- | -------- |
| [Move from dbt Core to dbt platform: What you need to know](https://docs.getdbt.com/guides/core-migration-2.md) | Understand the considerations and methods needed in your move from dbt Core to dbt platform. | Team leads, Admins |
| [Move from dbt Core to dbt platform: Get started](https://docs.getdbt.com/guides/core-migration-1.md?step=1) | Learn the steps needed to move from dbt Core to dbt platform. | Developers, Data engineers, Data analysts |
| [Move from dbt Core to dbt platform: Optimization tips](https://docs.getdbt.com/guides/core-migration-3.md) | Learn how to optimize your dbt experience with common scenarios and useful tips. | Everyone |

#### What you'll learn[​](#what-youll-learn "Direct link to What you'll learn") You may have already started your move to dbt and are looking for tips to help you optimize your dbt experience. This guide includes tips and caveats for the following areas: * [Adapters and connections](https://docs.getdbt.com/guides/core-migration-3.md?step=3) * [Development tools](https://docs.getdbt.com/guides/core-migration-3.md?step=4) * [Orchestration](https://docs.getdbt.com/guides/core-migration-3.md?step=5) * [Mesh](https://docs.getdbt.com/guides/core-migration-3.md?step=6) * [Semantic Layer](https://docs.getdbt.com/guides/core-migration-3.md?step=7) * [Catalog](https://docs.getdbt.com/guides/core-migration-3.md?step=8) #### Adapters and connections[​](#adapters-and-connections "Direct link to Adapters and connections") In dbt, you can natively connect to your data platform and test its [connection](https://docs.getdbt.com/docs/connect-adapters.md) with a click of a button. This is especially useful for users who are new to dbt or are looking to streamline their connection setup. Here are some tips and caveats to consider: ##### Tips[​](#tips "Direct link to Tips") * Manage [dbt versions](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) and ensure team collaboration with dbt's one-click feature, eliminating the need for manual updates and version discrepancies. Select a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) for ongoing updates, to always stay up to date with fixes and (optionally) get early access to new functionality for your dbt project. * dbt supports a whole host of [cloud providers](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md), including Snowflake, Databricks, BigQuery, Fabric, and Redshift (to name a few). 
* Use [Extended Attributes](https://docs.getdbt.com/docs/deploy/deploy-environments.md#extended-attributes) to set a flexible [profiles.yml](https://docs.getdbt.com/docs/local/profiles.yml.md) snippet in your dbt environment settings. It gives you more control over environments (both deployment and development) and extends how dbt connects to the data platform within a given environment. * For example, if you have a field in your `profiles.yml` that you’d like to add to the dbt adapter user interface, you can use Extended Attributes to set it. ##### Caveats[​](#caveats "Direct link to Caveats") * Not all parameters are available for adapters. * A project can only use one warehouse type. #### Development tools[​](#development-tools "Direct link to Development tools") dbt empowers data practitioners to develop in the tool of their choice. It ships with a [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) (local) or [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) (browser-based) to build, test, run, and version control your dbt projects. Both development tools are tailored to suit different audiences and preferences within your team. To streamline your team's workflow, it's important to know who will prefer the Studio IDE and who might lean towards the dbt CLI. This section aims to clarify these preferences. ##### Studio IDE[​](#studio-ide "Direct link to Studio IDE") A web-based interface for building, testing, running, and version-controlling dbt projects. It compiles dbt code into SQL and executes it directly on your database. The Studio IDE makes developing fast and easy for new and seasoned data practitioners to build and test changes. **Who might prefer the Studio IDE?** * New dbt users or those transitioning from other tools who appreciate a more guided experience through a browser-based interface. * Team members focused on speed and convenience for getting started with a new or existing project. 
* Individuals who prioritize direct feedback from the Studio IDE, such as seeing unsaved changes.

**Key features**

* The Studio IDE has simplified Git functionality:
  * Create feature branches from the branch configured in the development environment.
  * View saved but uncommitted code changes directly in the Studio IDE.
* [Format or lint](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md) your code with `sqlfluff` or `sqlfmt`, including support for adding your own custom linting rules.
* Natively [defer to production](https://docs.getdbt.com/docs/cloud/about-cloud-develop-defer.md#defer-in-dbt-cloud-cli) metadata directly in your development workflow, reducing the number of objects you need to build in development.
* Run multiple dbt commands at the same time through [safe parallel execution](https://docs.getdbt.com/reference/dbt-commands.md#parallel-execution), a [feature](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) available in dbt's infrastructure. In contrast, `dbt-core` *doesn't support* safe parallel execution for multiple invocations in the same process.

The Studio IDE provides a simplified interface that's accessible to all users, regardless of their technical background. However, some capabilities are intentionally not available in the Studio IDE due to its focus on simplicity and ease of use:

* Pre-commit hooks for automated checks before *committing* code are not available (yet).
* Mass-generating files and interacting with the file system are not available.
* Combining or piping commands, such as `dbt run -s (bash command)`, is not available.

##### dbt CLI

The dbt CLI allows you to run dbt [commands](https://docs.getdbt.com/reference/dbt-commands.md#available-commands) against your dbt development environment from your local command line. It suits users who want full control over their development environment and is ideal for those comfortable with the command line.
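As a concrete illustration of CLI-driven workflows, here is a minimal Python sketch that composes `dbt` invocations for use with `subprocess`. The command and flag names (`run`, `--select`, `--defer`) are real dbt CLI options; the helper function itself is hypothetical:

```python
def build_dbt_command(command, select=None, defer=False):
    """Compose a dbt CLI invocation as an argument list (hypothetical helper)."""
    args = ["dbt", command]
    if select:
        # --select narrows the run to a subset of the DAG
        args += ["--select", select]
    if defer:
        # --defer reuses production metadata instead of rebuilding upstream objects
        args.append("--defer")
    return args

# Example: run one model and its children, deferring unbuilt upstream refs
cmd = build_dbt_command("run", select="my_model+", defer=True)
print(" ".join(cmd))
# A real invocation would then be: subprocess.run(cmd, check=True)
```

Wrapping command construction like this is handy when scripting dbt from CI or an external scheduler, since the argument list can be unit-tested without running dbt.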
When moving from dbt Core to dbt, check that your `.gitignore` file contains the [necessary folders](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-gitignore-file). dbt Core doesn't interact with Git, so dbt doesn't automatically add or verify entries in the `.gitignore` file. Additionally, if the repository already contains dbt code and doesn't require initialization, dbt won't add any missing entries to the `.gitignore` file.

**Who might prefer the dbt CLI?**

* Data practitioners accustomed to working with a specific set of development tooling.
* Users looking for granular control over their Git workflows (such as pre-commit hooks for automated checks before committing code).
* Data practitioners who need to perform complex operations, like mass file generation or specific command combinations.

**Key features**

* Run dbt commands against your dbt development environment from your local command line with minimal configuration.
* Natively [defer to production](https://docs.getdbt.com/docs/cloud/about-cloud-develop-defer.md#defer-in-dbt-cloud-cli) metadata directly in your development workflow, reducing the number of objects you need to build in development.
* Run multiple dbt commands at the same time through [safe parallel execution](https://docs.getdbt.com/reference/dbt-commands.md#parallel-execution), a [feature](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) available in dbt's infrastructure. In contrast, `dbt-core` *doesn't support* safe parallel execution for multiple invocations in the same process.
* Use Visual Studio (VS) Code extensions.

#### Orchestration

dbt provides robust orchestration that enables you to schedule, run, and monitor dbt jobs with ease.
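Jobs can also be triggered programmatically, which is useful when integrating dbt execution with an external orchestrator. The sketch below prepares (but does not send) a trigger request, assuming the administrative API's v2 `jobs/{id}/run/` endpoint and a service token; the host, IDs, and helper name are illustrative:

```python
import json
import urllib.request

API_BASE = "https://cloud.getdbt.com/api/v2"  # host may differ by region/plan

def build_trigger_request(account_id, job_id, token, cause="Triggered via API"):
    """Prepare (but do not send) a job-trigger request for the dbt admin API."""
    url = f"{API_BASE}/accounts/{account_id}/jobs/{job_id}/run/"
    payload = json.dumps({"cause": cause}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Token {token}",  # a service token with job permissions
            "Content-Type": "application/json",
        },
    )

req = build_trigger_request(account_id=1, job_id=42, token="<your-token>")
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Check the API reference for your account's exact base URL before relying on this shape.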
Here are some tips and caveats to consider when using dbt's orchestration features:

##### Tips

* Enable [partial parsing](https://docs.getdbt.com/docs/cloud/account-settings.md#partial-parsing) between jobs in dbt to significantly speed up project parsing by only processing changed files, optimizing performance for large projects.
* [Run multiple CI/CD jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md) at the same time without blocking production runs, because each PR runs in its own schema. The job scheduler automatically cancels stale runs when a newer commit is pushed.
* dbt automatically [cancels](https://docs.getdbt.com/docs/deploy/job-scheduler.md#run-cancellation-for-over-scheduled-jobs) a scheduled run if the previous run is still executing. This prevents unnecessary, duplicative executions.
* Protect your data freshness from third-party outages by enabling dbt's [Git repository caching](https://docs.getdbt.com/docs/cloud/account-settings.md#git-repository-caching), which keeps a cache of the project's Git repository. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")
* [Link deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#trigger-on-job-completion) across dbt projects by configuring your job, or use the [Create Job API](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/Create%20Job) to do the same.
[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

* [Rerun your jobs](https://docs.getdbt.com/docs/deploy/retry-jobs.md) from the start or from the point of failure if your dbt job run completed with a status of **`Error`**.

##### Caveats

* To automate the setup and configuration of your dbt platform, you can store your job configurations as code within a repository:
  * Check out our [Terraform provider](https://registry.terraform.io/providers/dbt-labs/dbtcloud/latest/docs/resources/job).
  * Alternatively, check out our [jobs-as-code](https://github.com/dbt-labs/dbt-jobs-as-code) repository, a tool built to handle dbt jobs as well-defined YAML files.
* dbt users and external emails can receive notifications if a job fails, succeeds, or is cancelled. To get notifications for warnings, you can create a [webhook subscription](https://docs.getdbt.com/guides/zapier-slack.md) and post to Slack.

#### dbt Mesh

[Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) helps organizations with mature, complex transformation workflows in dbt increase the flexibility and performance of their dbt projects. It allows you to use multiple interconnected dbt projects instead of a single large, monolithic project, and lets you interface and navigate between different projects and models with [cross-project dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref), enhancing collaboration and data governance.
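Cross-project dependencies must form a directed acyclic graph; dbt raises an error if projects depend on each other in a cycle. Conceptually that check is a depth-first cycle detection over the project dependency mapping. A hedged, illustrative sketch (not dbt's actual implementation):

```python
def find_project_cycle(dependencies):
    """Detect a cycle in a project-dependency mapping via depth-first search.

    dependencies maps each project to the upstream projects it depends on.
    Returns one cyclic path if found, otherwise None.
    """
    DOING, DONE = 1, 2
    state = {}

    def visit(node, path):
        state[node] = DOING
        for upstream in dependencies.get(node, []):
            if state.get(upstream) == DOING:
                return path + [upstream]  # back-edge: cycle found
            if state.get(upstream) is None:
                cycle = visit(upstream, path + [upstream])
                if cycle:
                    return cycle
        state[node] = DONE
        return None

    for project in dependencies:
        if state.get(project) is None:
            cycle = visit(project, [project])
            if cycle:
                return cycle
    return None

# 'finance' depends on 'core', and 'core' depends on 'finance': a cycle
print(find_project_cycle({"finance": ["core"], "core": ["finance"]}))
# → ['finance', 'core', 'finance']
```

The same idea extends to node-level cycle checks within a single project's DAG.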
Here are some tips and caveats to consider when using Mesh:

##### Tips

* To dynamically resolve [cross-project references](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref), all developers need to develop with dbt (either with the dbt CLI or the Studio IDE). Cross-project references aren't natively supported in dbt Core, except by installing the source code from upstream projects [as packages](https://docs.getdbt.com/docs/build/packages.md#how-do-i-add-a-package-to-my-project).
* Link models across projects for a modular and scalable approach for your project and teams.
* Manage access to your dbt models both within and across projects using:
  * **[Groups](https://docs.getdbt.com/docs/mesh/govern/model-access.md#groups)** — Organize nodes in your dbt DAG that share a logical connection and assign an owner to the entire group.
  * **[Model access](https://docs.getdbt.com/docs/mesh/govern/model-access.md#access-modifiers)** — Control which other models or projects can reference this model.
  * **[Model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md)** — Enable adoption and deprecation of models as they evolve.
  * **[Model contracts](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md)** — Set clear expectations on the shape of the data to ensure data changes upstream of dbt or within a project's logic don't break downstream consumers' data products.

##### Caveats

* To use cross-project references in dbt, each dbt platform project must correspond to just one dbt Core project (codebase). We strongly discourage defining multiple projects for the same codebase, even if you're trying to manage access permissions, connect to different data warehouses, or separate production and non-production data.
  While this was required historically, features like [Staging environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md#types-of-environments), environment-level RBAC (*coming soon*), and [Extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) make it unnecessary.
* Project dependencies are uni-directional: dbt checks for cycles across projects (circular dependencies) and raises errors if any are detected. However, we are considering future support for projects that depend on each other in both directions, with dbt still checking for node-level cycles while allowing cycles at the project level.
* Everyone in the account can view public model metadata, which helps users find data products more easily. This is separate from who can access the actual data, which is controlled by permissions in the data warehouse. For use cases where even metadata about a reusable data asset is sensitive, we are [considering](https://github.com/dbt-labs/dbt-core/issues/9340) an optional extension of protected models.

Refer to the [Mesh FAQs](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-5-faqs.md) for more questions.

#### dbt Semantic Layer

Leverage the [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), powered by MetricFlow, to create a unified view of your business metrics, ensuring consistency across all analytics tools. The data layer is the foundation for AI as well as analytics: shared definitions and lineage give AI and BI the same trusted context, so answers are consistent and actionable.

Here are some tips and caveats to consider when using the Semantic Layer:

##### Tips

* Define semantic models and metrics once in dbt with the [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) (powered by MetricFlow).
  Reuse them across various analytics platforms, reducing redundancy and errors.
* Use the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) to query metrics in downstream tools for consistent, reliable data metrics.
* Connect to several data applications, from business intelligence tools to notebooks, spreadsheets, data catalogs, and more, to query your metrics. [Available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) include Tableau, Google Sheets, Hex, and more.
* Use [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md) to write commonly used queries directly within your data platform, on a schedule.

##### Caveats

* The Semantic Layer currently supports querying from deployment environments only; a development querying experience is coming soon.
* You can run queries and Semantic Layer commands in the dbt CLI; however, running them in the Studio IDE isn't supported *yet*.
* The Semantic Layer doesn't support [single sign-on (SSO)](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md) for Semantic Layer [production credentials](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md#permissions-for-service-account-tokens); however, SSO is supported for development user accounts.

Refer to the [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md) for more information.

#### dbt Catalog

[Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) enhances your ability to discover, understand, and troubleshoot your data assets through rich metadata and lineage visualization. Lineage and discovery are essential for governance and for feeding reliable context to AI workflows.
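The same metadata that powers Catalog can be queried programmatically through the Discovery API (described earlier in this reference). A hedged Python sketch of the kind of GraphQL payload such a query uses; the endpoint URL and field names follow the documented environment schema but should be treated as illustrative:

```python
import json

DISCOVERY_URL = "https://metadata.cloud.getdbt.com/graphql"  # host may vary by region

# GraphQL query against the environment -> applied schema (illustrative shape)
QUERY = """
query Models($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first) {
        edges { node { name uniqueId executionInfo { lastRunStatus } } }
      }
    }
  }
}
"""

def build_payload(environment_id, first=10):
    """Return the JSON body for a Discovery API request (illustrative helper)."""
    return json.dumps({
        "query": QUERY,
        "variables": {"environmentId": environment_id, "first": first},
    })

body = build_payload(environment_id=123)
print(json.loads(body)["variables"])
```

POSTing this body with a service token in the `Authorization` header would return the applied state of models in that environment.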
Here are some tips and caveats to consider when using Catalog:

##### Tips

* Use the search and filter capabilities in Catalog to quickly locate models, sources, and tests, streamlining your workflow.
* View all the [different projects](https://docs.getdbt.com/docs/explore/explore-multiple-projects.md) and public models in the account, where the public models are defined, and how they are used, to gain a better understanding of your cross-project resources.
* Use the [Lenses](https://docs.getdbt.com/docs/explore/explore-projects.md#lenses) feature, which provides map-like layers for your DAG, available from your project's lineage graph. Lenses help you understand your project's contextual metadata at scale, especially when distinguishing a particular model or subset of models.
* Access column-level lineage (CLL) for the resources in your dbt project. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

##### Caveats

* There must be at least one successful job run in the production deployment environment for Catalog to populate information.

Familiarize yourself with Catalog's features to fully leverage its capabilities and avoid missed opportunities for efficiency gains. Refer to the [Catalog FAQs](https://docs.getdbt.com/docs/explore/dbt-explorer-faqs.md) for more information.

#### What's next?

Congratulations on making it through the guide 🎉! We hope you're equipped with useful insights and tips to help you with your move. Moving from dbt Core to dbt isn't just about evolving your data projects; it's about exploring new levels of collaboration, governance, efficiency, and innovation within your team, and building a data layer that's ready for AI.
For the next steps, continue exploring our three-part guide series on moving from dbt Core to dbt:

| Guide | Information | Audience |
| --- | --- | --- |
| [Move from dbt Core to dbt platform: What you need to know](https://docs.getdbt.com/guides/core-migration-2.md) | Understand the considerations and methods needed in your move from dbt Core to dbt platform. | Team leads, Admins |
| [Move from dbt Core to dbt platform: Get started](https://docs.getdbt.com/guides/core-migration-1.md?step=1) | Learn the steps needed to move from dbt Core to dbt platform. | Developers, Data engineers, Data analysts |
| [Move from dbt Core to dbt platform: Optimization tips](https://docs.getdbt.com/guides/core-migration-3.md) | Learn how to optimize your dbt experience with common scenarios and useful tips. | Everyone |
##### Resources

If you need any additional help or have some questions, use the following resources:

* [dbt Learn courses](https://learn.getdbt.com) for on-demand video learning.
* Our [Support team](https://docs.getdbt.com/docs/dbt-support.md) is always available to help you troubleshoot your dbt issues.
* Join the [dbt Community](https://community.getdbt.com/) to connect with other dbt users, ask questions, and share best practices.
* Subscribe to [dbt RSS alerts](https://status.getdbt.com/).
* Enterprise accounts have an account management team available to help with troubleshooting and account management. [Book a demo](https://www.getdbt.com/contact) to learn more.
* [How dbt compares with dbt Core](https://www.getdbt.com/product/dbt-core-vs-dbt-cloud) for a detailed comparison of dbt Core and dbt.

For tailored assistance, you can use the following resources:

* Book [expert-led demos](https://www.getdbt.com/resources/dbt-cloud-demos-with-experts) and insights.
* Work with the [dbt Labs Professional Services](https://www.getdbt.com/dbt-labs/services) team to support your data organization and move.

---

### Move from dbt Core to the dbt platform: What you need to know

#### Introduction

Moving from dbt Core to dbt streamlines analytics engineering workflows by allowing teams to develop, test, deploy, and explore data products using a single, fully managed software service.
It's not just better tooling: it's about lowering total cost of ownership, powering AI with trusted data, and scaling with governed self-service.

Explore our three-part guide series on moving from dbt Core to dbt. The series is ideal for users aiming for streamlined workflows and enhanced analytics:

| Guide | Information | Audience |
| --- | --- | --- |
| [Move from dbt Core to dbt platform: What you need to know](https://docs.getdbt.com/guides/core-migration-2.md) | Understand the considerations and methods needed in your move from dbt Core to dbt platform. | Team leads, Admins |
| [Move from dbt Core to dbt platform: Get started](https://docs.getdbt.com/guides/core-migration-1.md?step=1) | Learn the steps needed to move from dbt Core to dbt platform. | Developers, Data engineers, Data analysts |
| [Move from dbt Core to dbt platform: Optimization tips](https://docs.getdbt.com/guides/core-migration-3.md) | Learn how to optimize your dbt experience with common scenarios and useful tips. | Everyone |

##### Why move to the dbt platform?

If your team is using dbt Core today, you could be reading this guide because:

* You've realized the burden of maintaining that deployment.
* The person who set it up has since left.
* You're interested in what dbt could do to better manage the complexity of your dbt deployment, democratize access to more contributors, or improve security and governance practices.
* You need a governed data foundation for AI: shared definitions, lineage, and testing so analytics and AI give answers the business can trust.

Self-hosting hides its true cost in engineer hours and wasted compute. dbt platform eliminates that overhead with managed infrastructure, state-aware orchestration, and browser-based development so more people can contribute without you being the bottleneck. The data layer is the AI layer: make sure it's tested, defined, and trusted end to end.

Moving from dbt Core to dbt simplifies workflows by providing a fully managed environment that improves collaboration, security, and orchestration. With dbt, you gain access to features like cross-team collaboration ([dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md)), version management, streamlined CI/CD, [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) for comprehensive insights, and more, making it easier to manage complex dbt deployments and scale your data workflows efficiently. It's ideal for teams looking to reduce the burden of maintaining their own infrastructure while enhancing governance and productivity.

**What are dbt and dbt Core?**
* dbt is the fastest and most reliable way to deploy dbt. It enables you to develop, test, deploy, and explore data products using a single, fully managed service. It also supports:
  * Development experiences tailored to multiple personas ([Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) or [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md))
  * Out-of-the-box [CI/CD workflows](https://docs.getdbt.com/docs/deploy/ci-jobs.md)
  * The [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) for consistent metrics
  * Domain ownership of data with multi-project [Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) setups
  * [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) for easier data discovery and understanding

  Learn more about [dbt features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md).
* dbt Core is an open-source tool that enables data teams to define and execute data transformations in a cloud data warehouse following analytics engineering best practices. While this can work well for 'single players' and small technical teams, all development happens on a command-line interface, and production deployments must be self-hosted. Maintaining and scaling that infrastructure requires significant, costly work that adds up over time.

#### What you'll learn

Today, thousands of companies, with data teams ranging in size from 2 to 2,000, rely on dbt to accelerate data work, increase collaboration, and win the trust of the business. Understanding what you'll need to do to move from your current dbt Core deployment to dbt will help you strategize and plan your move.

The guide outlines the following steps:

* [Considerations](https://docs.getdbt.com/guides/core-migration-2.md?step=3): Learn about the most important things you need to think about when moving from dbt Core to the dbt platform.
* [Plan your move](https://docs.getdbt.com/guides/core-migration-2.md?step=4): Considerations you need to make, such as user roles and permissions, onboarding order, current workflows, and more.
* [Move to dbt](https://docs.getdbt.com/guides/core-migration-2.md?step=5): Review the steps to move your dbt Core project to dbt, including setting up your account, data platform, and Git repository.
* [Test and validate](https://docs.getdbt.com/guides/core-migration-2.md?step=6): Discover how to ensure model accuracy and performance post-move.
* [Transition and training](https://docs.getdbt.com/guides/core-migration-2.md?step=7): Learn how to fully transition to dbt and what training and support you may need.
* [Summary](https://docs.getdbt.com/guides/core-migration-2.md?step=8): Summarizes key takeaways and what you've learned in this guide.
* [What's next?](https://docs.getdbt.com/guides/core-migration-2.md?step=9): Introduces what to expect in the following guides.

#### Considerations

If your team is using dbt Core today, you could be reading this guide because:

* You've realized the burden of maintaining that deployment.
* The person who set it up has since left.
* You're interested in what dbt could do to better manage the complexity of your dbt deployment, democratize access to more contributors, or improve security and governance practices.

This guide shares the technical adjustments and team collaboration strategies you'll need to know to move your project from dbt Core to dbt. Each "build your own" deployment of dbt Core looks a little different, but having seen hundreds of teams make the migration, we know they have much in common.

The most important things to think about when moving from dbt Core to dbt are:

* How is your team structured? Are there natural divisions of domain?
* Should you have one project or multiple? Which dbt resources do you want to standardize and keep central?
* Who should have permission to view, develop, and administer?
* How are you scheduling your dbt models to run in production?
* How are you currently managing continuous integration/continuous deployment (CI/CD) of logical changes (if at all)?
* How do your data developers prefer to work?
* How do you manage different data environments and the different behaviors in those environments?

dbt provides standard mechanisms for tackling these considerations, all of which deliver long-term benefits to your organization:

* Cross-team collaboration
* Access control
* Orchestration
* Isolated data environments

If you have rolled out your own dbt Core deployment, you have probably come up with different answers.

#### Plan your move

As you plan your move, consider your workflow and team layout to ensure a smooth transition. Here are some key considerations to keep in mind:

**Start small to minimize risk and maximize learning**

You don't need to move every team and every developer's workflow all at once. Many customers with large dbt deployments start by moving one team and one project. Once the benefits of a consolidated platform are clear, move the rest of your teams and workflows. While long-term 'hybrid' deployments can be challenging, one may make sense as a temporary on-ramp.

**User roles and responsibilities**

Assess the users or personas involved before, during, and after the move.

* **Administrators**: Plan for new [access controls](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) in dbt, such as deciding which teams can manage themselves and what should be standardized. Determine who will be responsible for setting up and maintaining projects, data platform connections, and environments.
* **Data developers** (data analysts, data engineers, analytics engineers, business analysts): Determine onboarding order, workflow adaptation in dbt, training on [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) or [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) usage, and role changes.
* **Data consumers**: Discover data insights by using [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) to view your project's resources (such as models, tests, and metrics) and their lineage to gain a better understanding of the latest production state. [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

**Onboarding order**

If you have multiple teams of dbt developers, think about how to sequence onboarding to dbt:

* Start with downstream teams (like business-embedded teams), whose less technical users may benefit from the Studio IDE as a development experience and from sharing features (like auto-deferral and Catalog) for their stakeholders; move more technical teams later.
* Consider setting up a [CI job](https://docs.getdbt.com/docs/deploy/ci-jobs.md) in dbt (even before development or production jobs) to streamline development workflows. This is especially beneficial if there's no existing CI process.

**Analyze current workflows, review processes, and team structures**

Discover how dbt can help simplify development, orchestration, and testing:

* **Development**: Develop dbt models using the dbt CLI (command line or code editor) or the Studio IDE (browser-based), allowing you to build, test, run, and version control your dbt projects.
* **Orchestration**: Create custom schedules to run your production jobs. Schedule jobs by day of the week, time of day, or a recurring interval.
  * Set up [a CI job](https://docs.getdbt.com/docs/deploy/ci-jobs.md) to ensure developer effectiveness, and CD jobs to deploy changes as soon as they're merged.
  * Link deploy jobs together by [triggering a job](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#trigger-on-job-completion) when another one completes.
  * For the most flexibility, use the [dbt API](https://docs.getdbt.com/dbt-cloud/api-v2#/) to trigger jobs. This makes sense when you want to integrate dbt execution with other data workflows.
* **Continuous integration (CI)**: Use [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) to run your dbt projects in a temporary schema when new commits are pushed to open pull requests. This build-on-PR functionality is a great way to catch bugs before deploying to production. For many teams, dbt CI represents a major improvement over their previous development workflows.
* **How are you defining tests today?** While testing production data is important, it's not the most efficient way to catch logical errors introduced by developers. You can use [unit testing](https://docs.getdbt.com/docs/build/unit-tests.md) to validate your SQL modeling logic on a small set of static inputs *before* you materialize your full model in production.

**Understand access control**

Transition to dbt's [access control](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) mechanisms to ensure security and proper access management. dbt administrators can use dbt's permission model to control user-level access in a dbt account:

* **License-based access controls**: Users are configured with account-wide license types. These licenses control what a user can do within the application: view project metadata, develop changes within those projects, or administer access to those projects.
* **Role-based access control (RBAC)**: Users are assigned to *groups* with specific permissions on specific projects or all projects in the account.
A user may be a member of multiple groups, and those groups may have permissions on multiple projects. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")  Manage environments If you require isolation between production and pre-production data environments due to sensitive data, dbt can support Development, Staging, and Production data [environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md). This provides developers with the benefits of an enhanced workflow while ensuring isolation between Staging and Production data, and locking down permissions on Prod. #### Move to dbt[​](#move-to-dbt "Direct link to Move to dbt") This guide is your roadmap to help you think about migration strategies and what moving from dbt Core to dbt could look like. After reviewing the considerations and planning your move, you may want to start moving your dbt Core project to dbt: * Check out the detailed [Move to dbt: Get started](https://docs.getdbt.com/guides/core-migration-1.md?step=1) guide for useful tasks and insights for a smooth transition from dbt Core to dbt. For a more detailed comparison of dbt Core and dbt, check out [How dbt compares with dbt Core](https://www.getdbt.com/product/dbt-core-vs-dbt-cloud). #### Test and validate[​](#test-and-validate "Direct link to Test and validate") After [setting the foundations of dbt](https://docs.getdbt.com/guides/core-migration-1.md?step=1), it's important to validate your migration to ensure seamless functionality and data integrity: * **Review your dbt project:** Ensure your project compiles correctly and that you can run commands. Make sure your models are accurate and monitor performance post-move. * **Start cutover:** You can start the cutover to dbt by creating a dbt job with commands that only run a small subset of the DAG. 
Validate the tables are being populated in the proper database/schemas as expected. Then continue to expand the scope of the job to include more sections of the DAG as you gain confidence in the results. * **Precision testing:** Use [unit testing](https://docs.getdbt.com/docs/build/unit-tests.md) to allow you to validate your SQL modeling logic on a small set of static inputs *before* you materialize your full model in production. * **Access and permissions**: Review and adjust [access controls and permissions](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) within dbt to maintain security protocols and safeguard your data. #### Transition and training[​](#transition-and-training "Direct link to Transition and training") Once you've confirmed that dbt orchestration and CI/CD are working as expected, you should pause your current orchestration tool and stop or update your current CI/CD process. This is not relevant if you're still using an external orchestrator (such as Airflow), and you've swapped out `dbt-core` execution for dbt execution (through the [API](https://docs.getdbt.com/docs/dbt-cloud-apis/overview.md)). Familiarize your team with dbt's [features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) and optimize development and deployment processes. Some key features to consider include: * **Release tracks:** Choose a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) for automatic dbt version upgrades, at the cadence appropriate for your team — removing the hassle of manual updates and the risk of version discrepancies. You can also get early access to new functionality, ahead of dbt Core. * **Development tools**: Use the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) or [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) to build, test, run, and version control your dbt projects. 
* **Documentation and Source freshness:** Automate storage of [documentation](https://docs.getdbt.com/docs/build/documentation.md) and track [source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) in dbt, which streamlines project maintenance. * **Notifications and logs:** Receive immediate [notifications](https://docs.getdbt.com/docs/deploy/monitor-jobs.md) for job failures, with direct links to the job details. Access comprehensive logs for all job runs to help with troubleshooting. * **CI/CD:** Use dbt's [CI/CD](https://docs.getdbt.com/docs/deploy/ci-jobs.md) feature to run your dbt projects in a temporary schema whenever new commits are pushed to open pull requests. This helps with catching bugs before deploying to production. ##### Beyond your move[​](#beyond-your-move "Direct link to Beyond your move") Now that you've chosen dbt as your platform, you've unlocked the power of streamlining collaboration, enhancing workflow efficiency, and leveraging powerful [features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) for analytics engineering teams. Here are some additional features you can use to unlock the full potential of dbt: * **Audit logs:** Use [audit logs](https://docs.getdbt.com/docs/cloud/manage-access/audit-log.md) to review actions performed by people in your organization. Audit logs contain audited user and system events in real time. You can even [export](https://docs.getdbt.com/docs/cloud/manage-access/audit-log.md#exporting-logs) *all* the activity (beyond the 90 days you can view in dbt). [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") * **dbt APIs:** Use dbt's robust [APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/overview.md) to create, read, update, and delete (CRUD) projects, jobs, and environments. 
The [dbt Administrative API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) and [Terraform provider](https://registry.terraform.io/providers/dbt-labs/dbtcloud/latest/docs/resources/job) facilitate programmatic access and configuration storage, while the [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) offers extensive metadata querying capabilities, such as job data, model configurations, usage, and overall project health. [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") * **Catalog**: Use [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their [lineage](https://docs.getdbt.com/terms/data-lineage) to gain a better understanding of its latest production state (available once you have a successful job in a Production environment). [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") * **dbt Semantic Layer:** The [dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) allows you to define universal metrics on top of your models that can then be queried in your [business intelligence (BI) tool](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md). This means no more inconsistent metrics: there's now a centralized way to define these metrics and create visibility in every component of the data flow. 
[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") * **dbt Mesh:** Use [dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) to share data models across organizations, enabling data teams to collaborate on shared data models and leverage the work of other teams. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") ##### Additional help[​](#additional-help "Direct link to Additional help") * **dbt Learn courses**: Access our free [Learn dbt](https://learn.getdbt.com) video courses for on-demand training. * **dbt Community:** Join the [dbt Community](https://community.getdbt.com/) to connect with other dbt users, ask questions, and share best practices. * **dbt Support team:** Our [dbt Support team](https://docs.getdbt.com/docs/dbt-support.md) is always available to help you troubleshoot your dbt issues. Create a support ticket in dbt and we'll be happy to help! * **Account management:** Enterprise accounts have an account management team available to help troubleshoot solutions and provide account management assistance. [Book a demo](https://www.getdbt.com/contact) to learn more. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") #### Summary[​](#summary "Direct link to Summary") This guide has equipped you with a framework for moving from dbt Core to dbt. It covered the following key areas: * **Considerations:** Understanding the foundational steps required for a successful migration, including evaluating your current setup and identifying key considerations unique to your team's structure and workflow needs. 
* **Plan your move**: Highlighting the importance of workflow redesign, role-specific responsibilities, and the adoption of new processes to harness dbt's collaborative and efficient environment. * **Move to dbt**: Linking to [the guide](https://docs.getdbt.com/guides/core-migration-1.md?step=1) that outlines the technical steps required to transition your dbt Core project to dbt, including setting up your account, data platform, and Git repository. * **Test and validate**: Emphasizing technical transitions, including testing and validating your dbt projects within the dbt ecosystem to ensure data integrity and performance. * **Transition and training**: Sharing useful transition, training, and onboarding information for your team so you can fully leverage dbt's capabilities, from development tools (dbt CLI and Studio IDE) to advanced features such as Catalog, the Semantic Layer, and Mesh. #### What's next?[​](#whats-next "Direct link to What's next?") Congratulations on finishing this guide. We hope it's given you insight into the considerations you need to best plan your move to dbt. For the next steps, you can continue exploring our three-part guide series on moving from dbt Core to dbt:

| Guide | Information | Audience |
| ----- | ----------- | -------- |
| [Move from dbt Core to dbt platform: What you need to know](https://docs.getdbt.com/guides/core-migration-2.md) | Understand the considerations and methods needed in your move from dbt Core to dbt platform. | Team leads, Admins |
| [Move from dbt Core to dbt platform: Get started](https://docs.getdbt.com/guides/core-migration-1.md?step=1) | Learn the steps needed to move from dbt Core to dbt platform. | Developers, Data engineers, Data analysts |
| [Move from dbt Core to dbt platform: Optimization tips](https://docs.getdbt.com/guides/core-migration-3.md) | Learn how to optimize your dbt experience with common scenarios and useful tips. | Everyone |

##### Why move to the dbt platform?[​](#why-move-to-the-dbt-platform "Direct link to Why move to the dbt platform?") If your team is using dbt Core today, you could be reading this guide because: * You've realized the burden of maintaining that deployment. * The person who set it up has since left. * You're interested in what dbt could do to better manage the complexity of your dbt deployment, democratize access to more contributors, or improve security and governance practices. * You need a governed data foundation for AI: shared definitions, lineage, and testing so analytics and AI give answers the business can trust. Self-hosting hides its true cost in engineer hours and wasted compute. dbt platform eliminates that overhead with managed infrastructure, state-aware orchestration, and browser-based development so more people can contribute without you being the bottleneck. The data layer is the AI layer: make sure it's tested, defined, and trusted end to end. Moving from dbt Core to dbt simplifies workflows by providing a fully managed environment that improves collaboration, security, and orchestration. With dbt, you gain access to features like cross-team collaboration ([dbt Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md)), version management, streamlined CI/CD, [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) for comprehensive insights, and more, making it easier to manage complex dbt deployments and scale your data workflows efficiently. It's ideal for teams looking to reduce the burden of maintaining their own infrastructure while enhancing governance and productivity. 
##### Related content[​](#related-content "Direct link to Related content") * [Learn dbt](https://learn.getdbt.com) courses * Book [expert-led demos](https://www.getdbt.com/resources/dbt-cloud-demos-with-experts) and insights * Work with the [dbt Labs Professional Services](https://www.getdbt.com/dbt-labs/services) team to support your data organization and migration. * [How dbt compares with dbt Core](https://www.getdbt.com/product/dbt-core-vs-dbt-cloud) for a detailed comparison of dbt Core and dbt. * Subscribe to the [dbt RSS alerts](https://status.getdbt.com/) --- ### Optimize and troubleshoot dbt models on Databricks [Back to guides](https://docs.getdbt.com/guides.md) Databricks dbt Core dbt platform Intermediate #### Introduction[​](#introduction "Direct link to Introduction") Building on the [Set up your dbt project with Databricks](https://docs.getdbt.com/guides/set-up-your-databricks-dbt-project.md) guide, we'd like to discuss performance optimization. In this follow-up post, we outline simple strategies to optimize for cost, performance, and simplicity when you architect data pipelines. We've grouped these strategies into three areas: * Platform Components * Patterns & Best Practices * Performance Troubleshooting #### Platform Components[​](#platform-components "Direct link to Platform Components") As you start to develop your dbt projects, one of the first decisions you will make is what kind of backend infrastructure to run your models against. Databricks offers SQL warehouses, All-Purpose Compute, and Jobs Compute, each optimized for the workloads it serves. 
Our recommendation is to use Databricks SQL warehouses for all your SQL workloads. SQL warehouses are optimized for SQL workloads when compared to other compute options. Additionally, they can scale both vertically to support larger workloads and horizontally to support concurrency. SQL warehouses are also easier to manage and provide out-of-the-box features such as query history to help audit and optimize your SQL workloads. Of the Serverless, Pro, and Classic SQL warehouse types that Databricks offers, our standard recommendation is to leverage serverless warehouses. You can explore features of these warehouse types in the [Compare features section](https://www.databricks.com/product/pricing/databricks-sql) on the Databricks pricing page. With serverless warehouses, you greatly decrease spin-up time waiting for the cluster to warm up and scale time when your cluster needs to horizontally scale. This mitigates the need to keep clusters idle, as serverless warehouses spin up quickly when the workload begins and then spin down when the workload is complete. Plus, serverless warehouses leverage our Photon engine out of the box for optimal performance in both ELT and serving workloads. The next step is to decide how big to make your serverless SQL warehouse. This is not an exact science, but these subsections provide some quick tips that will drive huge improvements in performance. ##### Sizing your SQL warehouses[​](#sizing-your-sql-warehouses "Direct link to Sizing your SQL warehouses") To select the appropriate size of your SQL warehouse, consider the use case and workload you are running and its corresponding latency requirements. 
You can select a T-shirt size based on the amount of data and auto-scaling based on concurrency needs. A good rule of thumb to follow is to start with a Medium warehouse and work from there. For large and complex workloads, bigger warehouses are the way to go, and that won't necessarily mean higher costs. This is because larger warehouses take a shorter time to complete a unit of work. For example, if a Small warehouse takes an hour to complete a pipeline, it will only take half an hour with a Medium. This linear trend continues as long as there's enough work for the warehouse to perform. ##### Provision warehouses by workload[​](#provision-warehouses-by-workload "Direct link to Provision warehouses by workload") Another technique worth implementing is to provision separate SQL warehouses for building dbt pipelines versus ad hoc, interactive SQL analysis. This is because the query design patterns and compute usage are different for these two types of workloads. Choose T-shirt sizes based on data volumes and SLAs (scale-up principle), and choose auto-scaling based on concurrency requirements (scale-out principle). For larger deployments, this approach could be expanded to map different workload sizes to multiple “pipeline” warehouses, if needed. On the dbt side, take into account the [number of threads you have](https://docs.getdbt.com/docs/local/profiles.yml.md#understanding-threads), meaning how many dbt models you can run in parallel. The higher the thread count, the more compute you will require. ##### Configure auto-stop[​](#configure-auto-stop "Direct link to Configure auto-stop") Because serverless warehouses can spin up in a matter of seconds, setting your auto-stop configuration to a lower threshold will not impact SLAs or the end-user experience. The default value is 10 minutes, and you can lower it to 5 minutes from the SQL Workspace UI. 
If you would like more custom settings, you can set the threshold to as low as 1 minute with the [API](https://docs.databricks.com/sql/api/sql-endpoints.html#). #### Patterns & Best Practices[​](#patterns--best-practices "Direct link to Patterns & Best Practices") Now that we have a solid sense of the infrastructure components, we can shift our focus to best practices and design patterns for pipeline development. We recommend the staging/intermediate/mart approach, which is analogous to the medallion architecture's bronze/silver/gold approach recommended by Databricks. Let's dissect each stage further. dbt has guidelines on how you can [structure your dbt project](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md), which you can learn more about. ##### Bronze / Staging Layer:[​](#bronze--staging-layer "Direct link to Bronze / Staging Layer:") There are a few different options for materializing bronze Delta tables on Databricks. In the recommended dbt workflow, you should load your flat files into a table before using dbt to transform them. To do so, you can use an EL tool to handle this ingestion. However, we know this isn't always possible, so for data sets in cloud storage, we recommend that you either leverage our `COPY INTO` functionality or stage the external table. In terms of the `COPY INTO` approach, you have a few different options. The first option is to run the `COPY INTO` logic as a pre-hook before building your silver/intermediate models. The second option is to invoke the Databricks `COPY INTO` macro with `dbt run-operation` and then subsequently execute your model runs. You can see an example implementation of the [COPY INTO macro](https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-copy-into-macro-aws.md) in the dbt-databricks docs. 
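As a rough illustration of the pre-hook option, a hedged sketch of a `COPY INTO` pre-hook on a model might look like this (the table name, storage path, and file format here are hypothetical, not from this guide):

```sql
-- Hypothetical example: incrementally load raw files into a bronze
-- Delta table as a pre-hook, before this model is built.
{{
  config(
    materialized='table',
    pre_hook="COPY INTO bronze.raw_orders
              FROM 's3://my-bucket/raw/orders/'
              FILEFORMAT = JSON
              COPY_OPTIONS ('mergeSchema' = 'true')"
  )
}}

select * from bronze.raw_orders
```

Because `COPY INTO` is idempotent (it skips files it has already loaded), rerunning the model doesn't duplicate data in the bronze table.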
The main benefit of leveraging `COPY INTO` is that it's an incremental operation and it ensures that data is written in Delta format (when we refer to Delta, we are simply referring to open Parquet tables with a transaction log). If you instead opt to stage an external table, the bronze table retains its raw structure (whether it is CSV, Parquet, JSON, etc.). This prevents you from leveraging the performance, reliability, and governance advantages inherent in Delta. Further, external Parquet tables require additional manual work, such as running repair operations to ensure new partition metadata is accounted for. Nevertheless, staging external tables could be a feasible option if you are migrating to Databricks from another cloud warehouse system where you heavily leveraged this functionality. ##### Silver / Intermediate Layer[​](#silver--intermediate-layer "Direct link to Silver / Intermediate Layer") Now that we have our bronze table taken care of, we can proceed with the silver layer. For cost and performance reasons, many customers opt to implement an incremental pipeline approach. The main benefit of this approach is that you process a lot less data when you insert new records into the silver layer, rather than re-creating the table each time with all the data from the bronze layer. However, it should be noted that by default, [dbt recommends using views and tables](https://docs.getdbt.com/best-practices/materializations/1-guide-overview.md) to start out with, and then moving to incremental as you require more performance optimization. dbt has an [incremental model materialization](https://docs.getdbt.com/reference/resource-configs/spark-configs.md#the-merge-strategy) to facilitate this framework. At a high level, Databricks creates a temp view with a snapshot of data and then merges that snapshot into the silver table. 
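As a minimal sketch of this pattern (the model, source, and column names are hypothetical), an incremental silver model using the merge strategy might look like:

```sql
-- Hypothetical silver model: merge new bronze records by timestamp.
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id'
  )
}}

select *
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- Only pull rows newer than what's already in this table.
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run the `is_incremental()` block is skipped and the full table is built; on subsequent runs only the filtered snapshot is merged.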
You can customize the time range of the snapshot to suit your specific use case by configuring the `where` conditional in your `is_incremental` logic. The most straightforward implementation is to merge data using a timestamp that's later than the current max timestamp in the silver table, but there are certainly valid use cases for increasing the temporal range of the source snapshot. While merge should be fairly performant out of the box, if you have particularly tight SLAs there are some more advanced tuning techniques that you can incorporate into your logic. Let us discuss several examples in further detail. ##### File Compaction[​](#file-compaction "Direct link to File Compaction") Most compute engines work best when file sizes are between 32 MB and 256 MB. In Databricks, we take care of optimal file sizing under the hood with our [auto optimize](https://docs.databricks.com/optimizations/auto-optimize.html) features. Auto optimize consists of two distinct features: auto compaction and optimized writes. In Databricks SQL warehouses, optimized writes are enabled by default. We recommend that you [opt in to auto compaction](https://docs.databricks.com/optimizations/auto-optimize.html#when-to-opt-in-to-auto-compaction). ##### Data skipping[​](#data-skipping "Direct link to Data skipping") Under the hood, Databricks will naturally [cluster data based on when it was ingested](https://www.databricks.com/blog/2022/11/18/introducing-ingestion-time-clustering-dbr-112.html). Since many queries include timestamps in `where` conditionals, this naturally leads to a large amount of file skipping for enhanced performance. Nevertheless, if you have other high cardinality columns (columns with a large number of distinct values, such as id columns) that are frequently used in `join` keys or `where` conditionals, performance can typically be augmented further by leveraging Z-order. 
The SQL syntax for the Z-Order command is `OPTIMIZE table_name ZORDER BY (col1, col2, col3)`. One caveat to be aware of is that you will rarely want to Z-Order by more than three columns. You will likely want to either run Z-Order on run end after your model builds or run Z-Order as a separate scheduled job on a consistent cadence, whether it is daily, weekly, or monthly. You can also set the `zorder` config on an incremental model, passing either a single column or a list of columns:

```sql
{{
  config(
    materialized='incremental',
    zorder="column_A"  -- or a list: zorder=["column_A", "column_B"]
  )
}}
```

##### Analyze Table[​](#analyze-table "Direct link to Analyze Table") The `ANALYZE TABLE` command ensures that our system has the most up-to-date statistics to select the optimal join plan. You will likely want to either run `ANALYZE TABLE` as a post-hook after your model builds or run it as a separate scheduled dbt job on a consistent cadence, whether it is daily, weekly, or monthly. The SQL syntax for this is:

```sql
ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS col1, col2, col3
```

An important item to clarify is that you will want to prioritize statistics for columns that are frequently used in joins. ##### Vacuum[​](#vacuum "Direct link to Vacuum") When you delete a record from a Delta table, it is a soft delete. What this means is that the record is deleted from the transaction log and is not included in subsequent queries, but the underlying file still remains in cloud storage. If you want to delete the underlying files as well (whether to reduce storage cost or improve merge performance), you can run a vacuum command. The factor you will want to be very cognizant of is restoring older versions of the table. Let's say you vacuum a table to delete all unused files older than 7 days. You won't be able to restore versions of the table from over 7 days ago that rely on those deleted files, so use with caution. 
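If you do adopt vacuum, a hedged sketch of a macro you could schedule with `dbt run-operation` might look like this (the macro name, table name, and default retention period here are hypothetical):

```sql
-- macros/vacuum_table.sql (hypothetical macro)
-- Invoke with: dbt run-operation vacuum_table --args '{table_name: silver.orders}'
{% macro vacuum_table(table_name, retain_hours=168) %}
  {# Remove unused data files older than the retention window (default 7 days). #}
  {% do run_query("VACUUM " ~ table_name ~ " RETAIN " ~ retain_hours ~ " HOURS") %}
{% endmacro %}
```

Keeping the retention period as an argument makes it easy to apply different windows per table while reusing one macro.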
If/when you choose to leverage vacuum, you will likely want to run it using the dbt [on-run-end](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md) functionality after your model builds, or run it as a separate scheduled dbt job on a consistent cadence (whether it is daily, weekly, or monthly) using the dbt [run-operation](https://docs.getdbt.com/reference/commands/run-operation.md) command (with the vacuum statement in a macro). ##### Gold / Marts Layer[​](#gold--marts-layer "Direct link to Gold / Marts Layer") Now onto the final layer: the gold marts that business stakeholders typically interact with from their preferred BI tool. The considerations here will be fairly similar to the silver layer, except that these marts are more likely to handle aggregations. Further, you will likely want to be even more intentional about Z-Ordering these tables, as SLAs tend to be tighter for these direct stakeholder-facing tables. In addition, these tables are well suited for defining [metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md) on, to ensure simplicity and consistency across your key business KPIs! Using [MetricFlow](https://github.com/dbt-labs/metricflow), you can even query the metrics inside your own dbt project. With the upcoming Semantic Layer integration, you can then query the metrics in any of the partner-integrated tools. 
##### Filter rows in target and/or source[​](#filter-rows-in-target-andor-source "Direct link to Filter rows in target and/or source") You can limit the rows scanned in the target and/or source tables using `incremental_predicates`, as in this example:

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy = 'merge',
    unique_key = 'id',
    incremental_predicates = [
      "dbt_internal_target.create_at >= '2023-01-01'",
      "dbt_internal_source.create_at >= '2023-01-01'"]
  )
}}
```

#### Performance Troubleshooting[​](#performance-troubleshooting "Direct link to Performance Troubleshooting") Performance troubleshooting refers to the process of identifying and resolving issues that impact the performance of your dbt models and overall data pipelines. By improving the speed and performance of your Lakehouse platform, you will be able to process data faster, handle large and complex queries more effectively, and provide faster time to market. Let's go into detail on three effective strategies you can implement. ##### SQL warehouse query profile[​](#sql-warehouse-query-profile "Direct link to SQL warehouse query profile") The SQL warehouse query profile is an effective tool found inside the Databricks SQL workspace. It's used to troubleshoot slow-running queries, optimize query execution plans, and analyze granular metrics to see where compute resources are being spent. The query profile includes these high-level capability areas: * Detailed information about the three main components of query execution, which are time spent in tasks, number of rows processed, and memory consumption. * Two types of graphical representations: a tree view to easily spot slow operations at a glance, and a graph view that breaks down how data is transformed across tasks. * Ability to understand mistakes and performance bottlenecks in queries. 
The three common examples of performance bottlenecks that can be surfaced by the query profile are: ##### Inefficient file pruning[​](#inefficient-file-pruning "Direct link to Inefficient file pruning") By default, Databricks Delta tables collect statistics on the *first 32 columns* defined in your table schema. When transforming data from the Bronze/staging layer to the Silver/intermediate layer, it is advised to reorder your columns to account for these file-level stats and improve overall performance. Move numerical keys and high cardinality query predicates to the left of the 32nd ordinal position, and move strings and complex data types after the 32nd ordinal position of the table. It is worth mentioning that while you can change the default table property to collect statistics on more columns, doing so adds more overhead as you write files. You may change this default value by using the [table property](https://docs.databricks.com/delta/table-properties.html) `delta.dataSkippingNumIndexedCols`. ##### Full Table Scans[​](#full-table-scans "Direct link to Full Table Scans") The Query Profile provides metrics that allow you to identify the presence of full table scans. A full table scan is a query operation that involves scanning the entire table to retrieve records. It can be a performance issue, especially for large tables with billions or trillions of rows, because scanning an entire table can be time-consuming and resource-intensive, leading to high memory and CPU usage and slower response times. Table layout techniques such as file compaction and Z-Ordering, described in the earlier sections of this article, will help alleviate this problem. ##### Exploding Joins[​](#exploding-joins "Direct link to Exploding Joins") The concept of *exploding joins* refers to a `join` operation that produces a much larger result set than either of the input tables used, resulting in a Cartesian product. 
You can detect this performance issue by enabling the verbose mode setting in the Query Profile and looking at the number of records produced by a join operator. There are several steps you can take to prevent exploding joins. As a first step, make the join conditions more specific to reduce the number of rows that are being matched. Another step is to utilize data preprocessing techniques such as aggregating, filtering, and performing data sampling before the join operation. These techniques can reduce the size of the input tables and help prevent exploding joins. ##### Materialization Best Practices[​](#materialization-best-practices- "Direct link to Materialization Best Practices  ") Remember that data is stored as files, so the unit of I/O work is a file, not a row. That's a lot of work if we're dealing with TBs of data. Therefore, we recommend the merge strategy for the majority of incremental models. Databricks is committed to continuously improving its performance. For example, in Delta and DBSQL, we've greatly improved the performance of MERGE operations recently with [low-shuffle merge and Photon](https://www.databricks.com/blog/2022/10/17/faster-merge-performance-low-shuffle-merge-and-photon.html), and many future improvements, such as deletion vectors for efficient deletes and upserts, are in the pipeline. Here are the basic strategies to speed up merges: 1. Only read partitions that matter by pushing down filters to scan source and target, using filters in the *model* and *incremental\_predicates* 2. Only update important rows 3. Improve key lookup by defining only *one* materialized key 4. Only update important columns ##### dbt Discovery API[​](#dbt-discovery-api "Direct link to dbt Discovery API") Now you might be wondering, how do you identify opportunities for performance improvement inside of dbt? Well, with each job run, dbt generates metadata on the timing, configuration, and freshness of models in your dbt project. 
The [dbt Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) is a GraphQL service that supports queries on this metadata, using the [graphical explorer](https://metadata.cloud.getdbt.com/graphiql) or the endpoint itself. Teams can pipe this data into their data warehouse and analyze it like any other data source in a business intelligence platform. dbt users can also use the data from the [Model Timing tab](https://docs.getdbt.com/docs/deploy/run-visibility.md#model-timing) to visually identify models that take the most time and may require refactoring.

##### dbt Admin API[​](#dbt-admin-api "Direct link to dbt Admin API")

With the [dbt Admin API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md), you can pull the dbt artifacts from your dbt run, put the generated `manifest.json` into an S3 bucket, stage it, and model the data using the [dbt artifacts package](https://hub.getdbt.com/brooklyn-data/dbt_artifacts/latest/). That package can help you identify inefficiencies in your dbt models and pinpoint opportunities for improvement.

##### Conclusion[​](#conclusion "Direct link to Conclusion")

This builds on the content in [Set up your dbt project with Databricks](https://docs.getdbt.com/guides/set-up-your-databricks-dbt-project.md). We welcome you to try these strategies on our example open source TPC-H implementation and to provide us with thoughts/feedback as you start to incorporate these features into production. We look forward to your feedback in the [#db-databricks-and-spark](https://getdbt.slack.com/archives/CNGCW8HKL) Slack channel!
---

### Post to Microsoft Teams when a job finishes

[Back to guides](https://docs.getdbt.com/guides.md) Webhooks Advanced

#### Introduction[​](#introduction "Direct link to Introduction")

This guide will show you how to set up an integration between dbt jobs and Microsoft Teams using [dbt Webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md) and Zapier, similar to the [native Slack integration](https://docs.getdbt.com/docs/deploy/job-notifications.md#slack-notifications).

Want Microsoft Teams notifications without Zapier? If you only need job status alerts in Microsoft Teams (for example, “job succeeded/failed”) and *don’t* need to process webhook payloads, you can use **Job notifications** instead by sending notifications to a Teams channel email address (External Email). See [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications).

When a dbt job finishes running, the integration will:

* Receive a webhook notification in Zapier,
* Extract the results from the dbt admin API, and
* Post a summary to a Microsoft Teams channel.

![Screenshot of a message in MS Teams showing a summary of a run which failed](/assets/images/ms-teams-ui-ab48d824ddaa34c88daeeddbf0291616.png)

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

To set up the integration, you should be familiar with:

* [dbt Webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md)
* Zapier

#### Set up the connection between Zapier and Microsoft Teams[​](#set-up-the-connection-between-zapier-and-microsoft-teams "Direct link to Set up the connection between Zapier and Microsoft Teams")

Install the [Zapier app in Microsoft Teams](https://appsource.microsoft.com/en-us/product/office/WA200002044) and [grant Zapier access to your account](https://zapier.com/blog/how-to-automate-microsoft-teams/). **Note**: To receive the message, add the Zapier app to the team's channel during installation.
#### Create a new Zap in Zapier[​](#create-a-new-zap-in-zapier "Direct link to Create a new Zap in Zapier")

Use **Webhooks by Zapier** as the Trigger, and **Catch Raw Hook** as the Event. If you don't intend to [validate the authenticity of your webhook](https://docs.getdbt.com/docs/deploy/webhooks.md#validate-a-webhook) (not recommended!) then you can choose **Catch Hook** instead. Press **Continue**, then copy the webhook URL.

![Screenshot of the Zapier UI, showing the webhook URL ready to be copied](/assets/images/catch-raw-hook-16dd72d8a6bc26284c5fad897f3da646.png)

##### 3. Configure a new webhook in dbt[​](#3-configure-a-new-webhook-in-dbt "Direct link to 3. Configure a new webhook in dbt")

See [Create a webhook subscription](https://docs.getdbt.com/docs/deploy/webhooks.md#create-a-webhook-subscription) for full instructions. Choose either **Run completed** or **Run errored**, but not both, or you'll get double messages when a run fails. Make note of the Webhook Secret Key for later.

Once you've tested the endpoint in dbt, go back to Zapier and click **Test Trigger**, which will create a sample webhook body based on the test event dbt sent. The sample body's values are hardcoded and not reflective of your project, but they give Zapier a correctly shaped object during development.

#### Store secrets[​](#store-secrets "Direct link to Store secrets")

In the next step, you will need the Webhook Secret Key from the prior step, and a dbt [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) or [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md). Zapier allows you to [store secrets](https://help.zapier.com/hc/en-us/articles/8496293271053-Save-and-retrieve-data-from-Zaps), which prevents your keys from being displayed in plaintext in the Zap code.
You will be able to access them via the [StoreClient utility](https://help.zapier.com/hc/en-us/articles/8496293969549-Store-data-from-code-steps-with-StoreClient). This guide assumes the names for the secret keys are `DBT_CLOUD_SERVICE_TOKEN` and `DBT_WEBHOOK_KEY`. If you're using different names, make sure you update all references to them in the sample code.

This guide uses a short-lived code action to store the secrets, but you can also use a tool like Postman to interact with the [REST API](https://store.zapier.com/) or create a separate Zap and call the [Set Value Action](https://help.zapier.com/hc/en-us/articles/8496293271053-Save-and-retrieve-data-from-Zaps#3-set-a-value-in-your-store-0-3).

###### a. Create a Storage by Zapier connection[​](#a-create-a-storage-by-zapier-connection "Direct link to a. Create a Storage by Zapier connection")

If you don't already have one, create a new Storage by Zapier connection. Remember the UUID secret you generate for later.

###### b. Add a temporary code step[​](#b-add-a-temporary-code-step "Direct link to b. Add a temporary code step")

Choose **Run Python** as the Event. Run the following code:

```python
store = StoreClient('abc123')  # replace with your UUID secret
store.set('DBT_WEBHOOK_KEY', 'abc123')  # replace with your webhook secret
store.set('DBT_CLOUD_SERVICE_TOKEN', 'abc123')  # replace with your dbt API token
```

Test the step. You can delete this Action when the test succeeds. The key will remain stored as long as it is accessed at least once every three months.

#### Add a code action[​](#add-a-code-action "Direct link to Add a code action")

Select **Code by Zapier** as the App, and **Run Python** as the Event. In the **Set up action** area, add two items to **Input Data**: `raw_body` and `auth_header`. Map those to the `1. Raw Body` and `1. Headers Http Authorization` fields from the **Catch Raw Hook** step above.
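For orientation before pasting the code: the `raw_body` input mapped above arrives as a JSON string. The sketch below shows only the fields the guide's code step reads — real payloads contain more fields, and these values are made up:

```python
import json

# Illustrative webhook body: only the fields the code step reads are shown,
# and all values are made up. Real dbt webhook payloads contain more fields.
sample_raw_body = json.dumps({
    "accountId": 1,
    "data": {
        "runId": 12345,
        "runStatus": "Errored",
        "jobName": "Daily refresh",
        "environmentName": "Production",
        "runReason": "Triggered via API",
    },
})

# The same unpacking the code step performs:
full_body = json.loads(sample_raw_body)
hook_data = full_body["data"]
run_id = hook_data["runId"]
account_id = full_body["accountId"]
print(run_id, account_id)
```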
![Screenshot of the Zapier UI, showing the mappings of raw\_body and auth\_header](/assets/images/run-python-40333883c6a20727c02d25224d0e40a4.png)

In the **Code** field, paste the following code, replacing `YOUR_SECRET_HERE` with the secret you created when setting up the Storage by Zapier integration. Remember that this is not your dbt secret. The code below validates the authenticity of the request, extracts the run logs for the completed job from the Admin API, and then builds a summary message that pulls out any error messages from the end-of-invocation logs created by dbt Core.

````python
import hashlib
import hmac
import json
import re

auth_header = input_data['auth_header']
raw_body = input_data['raw_body']

# Access secret credentials
secret_store = StoreClient('YOUR_SECRET_HERE')
hook_secret = secret_store.get('DBT_WEBHOOK_KEY')
api_token = secret_store.get('DBT_CLOUD_SERVICE_TOKEN')

# Validate the webhook came from dbt
signature = hmac.new(hook_secret.encode('utf-8'), raw_body.encode('utf-8'), hashlib.sha256).hexdigest()

if signature != auth_header:
    raise Exception("Calculated signature doesn't match contents of the Authorization header. This webhook may not have been sent from dbt.")

full_body = json.loads(raw_body)
hook_data = full_body['data']

# Steps derived from these commands won't have their error details shown inline, as they're messy
commands_to_skip_logs = ['dbt source', 'dbt docs']

# When testing, you will want to hardcode run_id and account_id to IDs that exist; the sample webhook won't work.
run_id = hook_data['runId']
account_id = full_body['accountId']

# Fetch run info from the dbt Admin API
url = f'https://YOUR_ACCESS_URL/api/v2/accounts/{account_id}/runs/{run_id}/?include_related=["run_steps"]'
headers = {'Authorization': f'Token {api_token}'}
run_data_response = requests.get(url, headers=headers)
run_data_response.raise_for_status()
run_data_results = run_data_response.json()['data']

# Overall run summary
outcome_message = f"""
**[{hook_data['runStatus']} for Run #{run_id} on Job \"{hook_data['jobName']}\"]({run_data_results['href']})**

**Environment:** {hook_data['environmentName']} | **Trigger:** {hook_data['runReason']} | **Duration:** {run_data_results['duration_humanized']}
"""

# Step-specific summaries
for step in run_data_results['run_steps']:
    if step['status_humanized'] == 'Success':
        outcome_message += f"""
✅ {step['name']} ({step['status_humanized']} in {step['duration_humanized']})
"""
    else:
        outcome_message += f"""
❌ {step['name']} ({step['status_humanized']} in {step['duration_humanized']})
"""
        show_logs = not any(cmd in step['name'] for cmd in commands_to_skip_logs)
        if show_logs:
            full_log = step['logs']
            # Remove timestamp and any colour tags
            full_log = re.sub(r'\x1b?\[[0-9]+m[0-9:]*', '', full_log)

            summary_start = re.search(r'(?:Completed with \d+ error.* and \d+ warnings?:|Database Error|Compilation Error|Runtime Error)', full_log)

            line_items = re.findall(r'(^.*(?:Failure|Error) in .*\n.*\n.*)', full_log, re.MULTILINE)

            if len(line_items) == 0:
                relevant_log = f'```{full_log[summary_start.start() if summary_start else 0:]}```'
            else:
                # Guard against logs that match node errors but no summary line
                relevant_log = summary_start[0] if summary_start else ''
                for item in line_items:
                    relevant_log += f'\n```\n{item.strip()}\n```\n'

            outcome_message += f"""
{relevant_log}
"""

# Zapier looks for the `output` dictionary for use in subsequent steps
output = {'outcome_message': outcome_message}
````

#### Add the Microsoft Teams action[​](#add-the-microsoft-teams-action "Direct link to Add the Microsoft Teams action")

Select **Microsoft
Teams** as the App, and **Send Channel Message** as the Action. In the **Set up action** area, choose the team and channel. Set the **Message Text Format** to **markdown**, then put **2. Outcome Message** from the Run Python in Code by Zapier output into the **Message Text** field.

![Screenshot of the Zapier UI, showing the mappings of prior steps to an MS Teams message](/assets/images/ms-teams-zap-config-998b96ebd7b3535473f5641dac7b4243.png)

#### Test and deploy[​](#test-and-deploy "Direct link to Test and deploy")

Because you tested the outputs as you went through each step, you can now try posting a message into your Teams channel. When you're happy with it, make sure that your `run_id` and `account_id` are no longer hardcoded, then publish your Zap.

##### Other notes[​](#other-notes "Direct link to Other notes")

* If you post to a chat instead of a team channel, you don't need to add the Zapier app to Microsoft Teams.
* If you post to a chat instead of a team channel, note that markdown is not supported, so you will need to remove the markdown formatting.
* If you chose the **Catch Hook** trigger instead of **Catch Raw Hook**, you will need to pass each required property from the webhook as an input instead of running `json.loads()` against the raw body. You will also need to remove the validation code.
---

### Post to Slack with error context when a job fails

[Back to guides](https://docs.getdbt.com/guides.md) Webhooks Advanced

#### Introduction[​](#introduction "Direct link to Introduction")

This guide will show you how to set up an integration between dbt jobs and Slack using [dbt webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md) and Zapier. It builds on the [native Slack integration](https://docs.getdbt.com/docs/deploy/job-notifications.md#slack-notifications) by attaching error message details of models and tests in a thread.

Note: Because there is no webhook for Run Cancelled, you may want to keep the standard Slack integration installed to receive those notifications. You could also use the [alternative integration](#alternate-approach) that augments the native integration without replacing it.

When a dbt job finishes running, the integration will:

* Receive a webhook notification in Zapier
* Extract the results from the dbt admin API
* Post a brief summary of the run to a Slack channel
* Create a threaded message attached to that post which contains any reasons that the job failed

![Screenshot of a message in Slack showing a summary of a run which failed](/assets/images/slack-thread-example-f5cdad43397338d5be7a32eb4117228a.png)

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

To set up the integration, you should be familiar with:

* [dbt webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md)
* Zapier

#### Create a new Zap in Zapier[​](#create-a-new-zap-in-zapier "Direct link to Create a new Zap in Zapier")

1. Use **Webhooks by Zapier** as the Trigger, and **Catch Raw Hook** as the Event. If you don't intend to [validate the authenticity of your webhook](https://docs.getdbt.com/docs/deploy/webhooks.md#validate-a-webhook) (not recommended!) then you can choose **Catch Hook** instead.
2. Click **Continue**, then copy the webhook URL.
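**Catch Raw Hook** matters because signature validation needs the exact raw bytes dbt signed. In isolation, the check that the code step later in this guide performs looks like this (secret and body are made up):

```python
import hashlib
import hmac

# Made-up values; in the Zap these come from the stored webhook secret and
# from the webhook's raw body and Authorization header.
hook_secret = "my-webhook-secret"
raw_body = '{"accountId": 1, "data": {"runId": 12345}}'

# dbt signs the raw request body with HMAC-SHA256 using your webhook secret key.
expected = hmac.new(hook_secret.encode("utf-8"),
                    raw_body.encode("utf-8"),
                    hashlib.sha256).hexdigest()

def is_authentic(auth_header: str) -> bool:
    # compare_digest is a constant-time comparison; the guide's code step
    # uses a plain != check, which is equivalent in effect.
    return hmac.compare_digest(expected, auth_header)
```

Any re-serialization of the body (for example, parsing and re-dumping the JSON) changes the bytes and breaks the signature, which is why the raw hook is required.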
![Screenshot of the Zapier UI, showing the webhook URL ready to be copied](/assets/images/catch-raw-hook-16dd72d8a6bc26284c5fad897f3da646.png)

#### Configure a new webhook in dbt[​](#configure-a-new-webhook-in-dbt "Direct link to Configure a new webhook in dbt")

See [Create a webhook subscription](https://docs.getdbt.com/docs/deploy/webhooks.md#create-a-webhook-subscription) for full instructions. Choose **Run completed** as the Event. You can alternatively choose **Run errored**, but you will need to account for the fact that the necessary metadata [might not be available immediately](https://docs.getdbt.com/docs/deploy/webhooks.md#completed-errored-event-difference). Remember the Webhook Secret Key for later.

Once you've tested the endpoint in dbt, go back to Zapier and click **Test Trigger**. This creates a sample webhook body based on the test event dbt sent. The sample body's values are hardcoded and not reflective of your project, but they give Zapier a correctly shaped object during development.

#### Store secrets[​](#store-secrets "Direct link to Store secrets")

In the next step, you will need the Webhook Secret Key from the prior step, and a dbt [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) or [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md).

Zapier allows you to [store secrets](https://help.zapier.com/hc/en-us/articles/8496293271053-Save-and-retrieve-data-from-Zaps). This prevents your keys from being displayed as plaintext in the Zap code. You can access them with the [StoreClient utility](https://help.zapier.com/hc/en-us/articles/8496293969549-Store-data-from-code-steps-with-StoreClient).

This guide assumes the names for the secret keys are `DBT_CLOUD_SERVICE_TOKEN` and `DBT_WEBHOOK_KEY`. If you're using different names, make sure you update all references to them in the sample code.
This guide uses a short-lived code action to store the secrets, but you can also use a tool like Postman to interact with the [REST API](https://store.zapier.com/) or create a separate Zap and call the [Set Value Action](https://help.zapier.com/hc/en-us/articles/8496293271053-Save-and-retrieve-data-from-Zaps#3-set-a-value-in-your-store-0-3).

###### a. Create a Storage by Zapier connection[​](#a-create-a-storage-by-zapier-connection "Direct link to a. Create a Storage by Zapier connection")

If you don't already have one, create a new Storage by Zapier connection. Remember the UUID secret you generate for later.

###### b. Add a temporary code step[​](#b-add-a-temporary-code-step "Direct link to b. Add a temporary code step")

Choose **Run Python** as the Event. Run the following code:

```python
store = StoreClient('abc123')  # replace with your UUID secret
store.set('DBT_WEBHOOK_KEY', 'abc123')  # replace with your webhook secret
store.set('DBT_CLOUD_SERVICE_TOKEN', 'abc123')  # replace with your dbt API token
```

Test the step. You can delete this Action when the test succeeds. The key will remain stored as long as it is accessed at least once every three months.

#### Add a code action[​](#add-a-code-action "Direct link to Add a code action")

Select **Code by Zapier** as the App, and **Run Python** as the Event. In the **Set up action** section, add two items to **Input Data**: `raw_body` and `auth_header`. Map those to the `1. Raw Body` and `1. Headers Http Authorization` fields from the previous **Catch Raw Hook** step.

![Screenshot of the Zapier UI, showing the mappings of raw\_body and auth\_header](/assets/images/run-python-40333883c6a20727c02d25224d0e40a4.png)

In the **Code** field, paste the following code, replacing `YOUR_SECRET_HERE` with the secret you created when setting up the Storage by Zapier integration. Remember that this is not your dbt secret.
This example code validates the authenticity of the request, extracts the run logs for the completed job from the Admin API, and then builds two messages: a summary message containing the outcome of each step and its duration, and a message for inclusion in a thread displaying any error messages extracted from the end-of-invocation logs created by dbt Core.

````python
import hashlib
import hmac
import json
import re

auth_header = input_data['auth_header']
raw_body = input_data['raw_body']

# Access secret credentials
secret_store = StoreClient('YOUR_SECRET_HERE')
hook_secret = secret_store.get('DBT_WEBHOOK_KEY')
api_token = secret_store.get('DBT_CLOUD_SERVICE_TOKEN')

# Validate the webhook came from dbt
signature = hmac.new(hook_secret.encode('utf-8'), raw_body.encode('utf-8'), hashlib.sha256).hexdigest()

if signature != auth_header:
    raise Exception("Calculated signature doesn't match contents of the Authorization header. This webhook may not have been sent from dbt.")

full_body = json.loads(raw_body)
hook_data = full_body['data']

# Steps derived from these commands won't have their error details shown inline, as they're messy
commands_to_skip_logs = ['dbt source', 'dbt docs']

# When testing, you will want to hardcode run_id and account_id to IDs that exist; the sample webhook won't work.
run_id = hook_data['runId']
account_id = full_body['accountId']

# Fetch run info from the dbt Admin API
url = f'https://YOUR_ACCESS_URL/api/v2/accounts/{account_id}/runs/{run_id}/?include_related=["run_steps"]'
headers = {'Authorization': f'Token {api_token}'}
run_data_response = requests.get(url, headers=headers)
run_data_response.raise_for_status()
run_data_results = run_data_response.json()['data']

# Overall run summary
step_summary_post = f"""
*<{run_data_results['href']}|{hook_data['runStatus']} for Run #{run_id} on Job \"{hook_data['jobName']}\">*

*Environment:* {hook_data['environmentName']} | *Trigger:* {hook_data['runReason']} | *Duration:* {run_data_results['duration_humanized']}
"""

threaded_errors_post = ""

# Step-specific summaries
for step in run_data_results['run_steps']:
    if step['status_humanized'] == 'Success':
        step_summary_post += f"""
✅ {step['name']} ({step['status_humanized']} in {step['duration_humanized']})
"""
    else:
        step_summary_post += f"""
❌ {step['name']} ({step['status_humanized']} in {step['duration_humanized']})
"""
        # Don't try to extract info from steps that don't have well-formed logs
        show_logs = not any(cmd in step['name'] for cmd in commands_to_skip_logs)
        if show_logs:
            full_log = step['logs']
            # Remove timestamp and any colour tags
            full_log = re.sub(r'\x1b?\[[0-9]+m[0-9:]*', '', full_log)

            summary_start = re.search(r'(?:Completed with \d+ error.* and \d+ warnings?:|Database Error|Compilation Error|Runtime Error)', full_log)

            line_items = re.findall(r'(^.*(?:Failure|Error) in .*\n.*\n.*)', full_log, re.MULTILINE)

            if not summary_start:
                continue

            threaded_errors_post += f"""
*{step['name']}*
"""

            # If there are no line items, the failure wasn't related to dbt nodes, and we want the whole rest of the message.
            # If there are, then we just want the summary line and then to log out each individual node's error.
            if len(line_items) == 0:
                relevant_log = f'```{full_log[summary_start.start():]}```'
            else:
                relevant_log = summary_start[0]
                for item in line_items:
                    relevant_log += f'\n```\n{item.strip()}\n```\n'

            threaded_errors_post += f"""
{relevant_log}
"""

send_error_thread = len(threaded_errors_post) > 0

# Zapier looks for the `output` dictionary for use in subsequent steps
output = {'step_summary_post': step_summary_post, 'send_error_thread': send_error_thread, 'threaded_errors_post': threaded_errors_post}
````

#### Add Slack actions in Zapier[​](#add-slack-actions-in-zapier "Direct link to Add Slack actions in Zapier")

Select **Slack** as the App, and **Send Channel Message** as the Action. In the **Action** section, choose which **Channel** to post to. Set the **Message Text** field to **2. Step Summary Post** from the Run Python in Code by Zapier output. Configure the other options as you prefer (for example, **Bot Name** and **Bot Icon**).

![Screenshot of the Zapier UI, showing the mappings of prior steps to a Slack message](/assets/images/parent-slack-config-39e85487efcfb04136c351992ed08cb9.png)

Add another step, **Filter**. In the **Filter setup and testing** section, set the **Field** to **2. Send Error Thread** and the **condition** to **(Boolean) Is true**. This prevents the Zap from failing if the job succeeded and you try to send an empty Slack message in the next step.

![Screenshot of the Zapier UI, showing the correctly configured Filter step](/assets/images/filter-config-5a7f7eca78c49d24fd5b8674f23337e3.png)

Add another **Send Channel Message in Slack** action. In the **Action** section, choose the same channel as last time, but set the **Message Text** to **2. Threaded Errors Post** from the same Run Python step. Set the **Thread** value to **3. Message Ts**, which is the timestamp of the post created by the first Slack action.
This tells Zapier to add this post as a threaded reply to the main message, which prevents the full (potentially long) output from cluttering your channel.

![Screenshot of the Zapier UI, showing the mappings of prior steps to a Slack message](/assets/images/thread-slack-config-9ebe2df87964d97e82c18d80d9ff9ac2.png)

#### Test and deploy[​](#test-and-deploy "Direct link to Test and deploy")

When you're done testing your Zap, make sure that your `run_id` and `account_id` are no longer hardcoded in the Code step, then publish your Zap.

#### Alternately, use a dbt app Slack message to trigger Zapier[​](#alternately-use-a-dbt-app-slack-message-to-trigger-zapier "Direct link to Alternately, use a dbt app Slack message to trigger Zapier")

Instead of using a webhook as your trigger, you can keep the existing dbt app installed in your Slack workspace and use the messages it posts to your channel as the trigger. In this case, you can skip validating the webhook and only need to load the context from the thread.

##### 1. Create a new Zap in Zapier[​](#1-create-a-new-zap-in-zapier "Direct link to 1. Create a new Zap in Zapier")

Use **Slack** as the initiating app, and **New Message Posted to Channel** as the Trigger. In the **Trigger** section, select the channel where your Slack alerts are being posted, and set **Trigger for Bot Messages?** to **Yes**.

![Screenshot of the Zapier UI, showing the correctly configured Message trigger step](/assets/images/message-trigger-config-432c82983008423e7914d0c59eab38cd.png)

Test your Zap to find an example record. Depending on whether you post all job events to Slack, you might need to load additional samples until you get one that relates to a failed job.

##### 2. Add a Filter step[​](#2-add-a-filter-step "Direct link to 2. Add a Filter step")

Add a **Filter** step with the following conditions:

* **1. Text contains failed on Job**
* **1. User Is Bot Is true**
* **1.
User Name Exactly matches dbt**

![Screenshot of the Zapier UI, showing the correctly configured Filter step](/assets/images/message-trigger-filter-57c4f8c530e21a72704481619b040a51.png)

##### 3. Extract the run ID[​](#3-extract-the-run-id "Direct link to 3. Extract the run ID")

Add a **Format** step with the **Event** of **Text**, and the Action **Extract Number**. For the **Input**, select **1. Text**.

![Screenshot of the Zapier UI, showing the Transform step configured to extract a number from the Slack message\'s Text property](/assets/images/extract-number-e9674c26f01614ccfd93b7fdefaab3ed.png)

Test your step and validate that the run ID has been correctly extracted.

##### 4. Add a Delay[​](#4-add-a-delay "Direct link to 4. Add a Delay")

Sometimes dbt posts the message about the run failing before the run's artifacts are available through the API. For this reason, it's recommended to add a brief delay to increase the likelihood that the data is available. On certain plans, Zapier will automatically retry a job that fails due to a 404 error, but its stand-down period is longer than normally necessary, so the context would be missing from your thread for longer. A one-minute delay is generally sufficient.

##### 5. Store secrets[​](#5-store-secrets "Direct link to 5. Store secrets")

In the next step, you will need either a dbt [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) or [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md).

Zapier allows you to [store secrets](https://help.zapier.com/hc/en-us/articles/8496293271053-Save-and-retrieve-data-from-Zaps). This prevents your keys from being displayed as plaintext in the Zap code. You can access them with the [StoreClient utility](https://help.zapier.com/hc/en-us/articles/8496293969549-Store-data-from-code-steps-with-StoreClient).

This guide assumes the name for the secret key is `DBT_CLOUD_SERVICE_TOKEN`.
If you're using a different name, make sure you update all references to it in the sample code.

This guide uses a short-lived code action to store the secrets, but you can also use a tool like Postman to interact with the [REST API](https://store.zapier.com/) or create a separate Zap and call the [Set Value Action](https://help.zapier.com/hc/en-us/articles/8496293271053-Save-and-retrieve-data-from-Zaps#3-set-a-value-in-your-store-0-3).

###### a. Create a Storage by Zapier connection[​](#a-create-a-storage-by-zapier-connection "Direct link to a. Create a Storage by Zapier connection")

If you don't already have one, create a new Storage by Zapier connection. Remember the UUID secret you generate for later.

###### b. Add a temporary code step[​](#b-add-a-temporary-code-step "Direct link to b. Add a temporary code step")

Choose **Run Python** as the Event. Run the following code:

```python
store = StoreClient('abc123')  # replace with your UUID secret
store.set('DBT_CLOUD_SERVICE_TOKEN', 'abc123')  # replace with your API token
```

Test the step. You can delete this Action when the test succeeds. The key will remain stored as long as it is accessed at least once every three months.

##### 6. Add a Code action[​](#6-add-a-code-action "Direct link to 6. Add a Code action")

Select **Code by Zapier** as the App, and **Run Python** as the Event. This step is very similar to the one described in the main example, but you can skip a lot of the initial validation work. In the **Action** section, add two items to **Input Data**: `run_id` and `account_id`. Map those to the `3. Output` property and your hardcoded dbt Account ID, respectively.

![Screenshot of the Zapier UI, showing the mappings of raw\_body and auth\_header](/assets/images/code-example-alternate-bbb2b5028df008f1b6832e8453215ff4.png)

In the **Code** field, paste the following code, replacing `YOUR_SECRET_HERE` with the secret you created when setting up the Storage by Zapier integration.
Remember that this is not your dbt secret. This example code extracts the run logs for the completed job from the Admin API, and then builds a message displaying any error messages extracted from the end-of-invocation logs created by dbt Core (which will be posted in a thread).

````python
import re

# Access secret credentials
secret_store = StoreClient('YOUR_SECRET_HERE')
api_token = secret_store.get('DBT_CLOUD_SERVICE_TOKEN')

# Steps derived from these commands won't have their error details shown inline, as they're messy
commands_to_skip_logs = ['dbt source', 'dbt docs']

run_id = input_data['run_id']
account_id = input_data['account_id']

url = f'https://YOUR_ACCESS_URL/api/v2/accounts/{account_id}/runs/{run_id}/?include_related=["run_steps"]'
headers = {'Authorization': f'Token {api_token}'}

response = requests.get(url, headers=headers)
response.raise_for_status()
results = response.json()['data']

threaded_errors_post = ""

for step in results['run_steps']:
    show_logs = not any(cmd in step['name'] for cmd in commands_to_skip_logs)
    if not show_logs:
        continue
    if step['status_humanized'] != 'Success':
        full_log = step['logs']
        # Remove timestamp and any colour tags
        full_log = re.sub(r'\x1b?\[[0-9]+m[0-9:]*', '', full_log)

        summary_start = re.search(r'(?:Completed with \d+ error.* and \d+ warnings?:|Database Error|Compilation Error|Runtime Error)', full_log)

        line_items = re.findall(r'(^.*(?:Failure|Error) in .*\n.*\n.*)', full_log, re.MULTILINE)

        if not summary_start:
            continue

        threaded_errors_post += f"""
*{step['name']}*
"""

        # If there are no line items, the failure wasn't related to dbt nodes, and we want the whole rest of the message.
        # If there are, then we just want the summary line and then to log out each individual node's error.
        if len(line_items) == 0:
            relevant_log = f'```{full_log[summary_start.start():]}```'
        else:
            relevant_log = summary_start[0]
            for item in line_items:
                relevant_log += f'\n```\n{item.strip()}\n```\n'

        threaded_errors_post += f"""
{relevant_log}
"""

output = {'threaded_errors_post': threaded_errors_post}
````

##### 7. Add Slack action in Zapier[​](#7-add-slack-action-in-zapier "Direct link to 7. Add Slack action in Zapier")

Add a **Send Channel Message in Slack** action. In the **Action** section, set the channel to **1. Channel Id**, which is the channel that the triggering message was posted in. Set the **Message Text** to **5. Threaded Errors Post** from the Run Python step. Set the **Thread** value to **1. Ts**, which is the timestamp of the triggering Slack post. This tells Zapier to add this post as a threaded reply to the main message, which prevents the full (potentially long) output from cluttering your channel.

![Screenshot of the Zapier UI, showing the mappings of prior steps to a Slack message](/assets/images/thread-slack-config-alternate-36df7dedc6e8e5688edd5bfe1439ef2c.png)

##### 8. Test and deploy[​](#8-test-and-deploy "Direct link to 8. Test and deploy")

When you're done testing your Zap, publish it.

---

### Productionize your dbt Databricks project

[Back to guides](https://docs.getdbt.com/guides.md) Databricks dbt Core dbt platform Intermediate

#### Introduction[​](#introduction "Direct link to Introduction")

Welcome to the third installment of our comprehensive series on optimizing and deploying your data pipelines using Databricks and dbt.
In this guide, we'll dive into delivering these models to end users while incorporating best practices to ensure that your production data remains reliable and timely.

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

If you don't meet any of the following requirements, refer to the instructions in the [Set up your dbt project with Databricks](https://docs.getdbt.com/guides/set-up-your-databricks-dbt-project.md) guide for help:

* You have [set up your dbt project with Databricks](https://docs.getdbt.com/guides/set-up-your-databricks-dbt-project.md).
* You have [optimized your dbt models for peak performance](https://docs.getdbt.com/guides/optimize-dbt-models-on-databricks.md).
* You have created two catalogs in Databricks: *dev* and *prod*.
* You have created a Databricks Service Principal to run your production jobs.
* You have at least one [deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md) in dbt.

To get started, let's revisit the deployment environment created for your production data.

##### Deployment environments[​](#deployment-environments "Direct link to Deployment environments")

In software engineering, environments play a crucial role in allowing engineers to develop and test code without affecting the end users of their software. Similarly, you can design [data lakehouses](https://www.databricks.com/product/data-lakehouse) with separate environments. The *production* environment includes the relations (schemas, tables, and views) that end users query or use, typically in a BI tool or ML model.

In dbt, [environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md) come in two flavors:

* Deployment — Defines the settings used for executing jobs created within that environment.
* Development — Defines the settings used in the Studio IDE for a particular dbt project.

Each dbt project can have multiple deployment environments, but only one development environment per user.
#### Create and schedule a production job[​](#create-and-schedule-a-production-job "Direct link to Create and schedule a production job")

With your deployment environment set up, it's time to create a production job to run in your *prod* environment.

To deploy our data transformation workflows, we will utilize [dbt’s built-in job scheduler](https://docs.getdbt.com/docs/deploy/deploy-jobs.md). The job scheduler is designed specifically to streamline your dbt project deployments and runs, ensuring that your data pipelines are easy to create, monitor, and modify efficiently.

Leveraging dbt's job scheduler allows data teams to own the entire transformation workflow. You don't need to learn and maintain additional tools for orchestration or rely on another team to schedule code written by your team. This end-to-end ownership simplifies the deployment process and accelerates the delivery of new data products.

Let’s [create a job](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#create-and-schedule-jobs) in dbt that will transform data in our Databricks *prod* catalog.

1. Create a new job by clicking **Deploy** in the header, then clicking **Jobs** and **Create job**.
2. **Name** the job “Daily refresh”.
3. Set the **Environment** to your *production* environment.
   * This allows the job to inherit the catalog, schema, credentials, and environment variables defined in [Set up your dbt project with Databricks](https://docs.getdbt.com/guides/set-up-your-databricks-dbt-project.md).
4. Under **Execution Settings**:
   * Check the **Generate docs on run** checkbox to configure the job to automatically generate project docs each time this job runs. This ensures your documentation stays evergreen as models are added and modified.
   * Select the **Run on source freshness** checkbox to configure dbt [source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) as the first step of this job.
Your sources will need to be configured to [snapshot freshness information](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness) for this to drive meaningful insights. Add the following three **Commands**:

* `dbt source freshness`
  * This checks whether any sources are stale. We don’t want to recompute models with data that hasn’t changed since our last run.
* `dbt test --models source:*`
  * This tests the data quality of our source data, such as making sure ID fields are unique and not null. We don’t want bad data getting into production models.
* `dbt build --exclude source:* --fail-fast`
  * `dbt build` is more efficient than issuing `dbt run` and `dbt test` separately because it runs, then tests, each model before continuing.
  * We are excluding source data because we already tested it in the previous command.
  * The `--fail-fast` flag makes dbt exit immediately if a single resource fails to build. If other models are in progress when the first model fails, dbt will terminate the connections for those still-running models.

5. Under **Triggers**, use the toggle to configure your job to [run on a schedule](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#schedule-days). You can enter specific days and timing or create a custom cron schedule.
   * If you want your dbt job scheduled by another orchestrator, like Databricks Workflows, see the [Advanced considerations](#advanced-considerations) section below.

This is just one example of an all-or-nothing command list designed to minimize wasted computing. The [job command list](https://docs.getdbt.com/docs/deploy/job-commands.md) and [selectors](https://docs.getdbt.com/reference/node-selection/syntax.md) provide a lot of flexibility in how your DAG will execute. You may want to design yours to continue running certain models if others fail. You may want to set up multiple jobs to refresh models at different frequencies.
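As an illustration, a custom cron schedule that runs this job at 7:00 AM UTC on weekdays (the values here are arbitrary examples, not a recommendation) would look like:

```
0 7 * * 1-5
```

The five fields are minute, hour, day of month, month, and day of week.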
See our [Job Creation Best Practices discourse](https://discourse.getdbt.com/t/job-creation-best-practices-in-dbt-cloud-feat-my-moms-lasagna/2980) for more job design suggestions.

After your job is set up and runs successfully, configure your **[project artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md)** so that this job powers the production docs site and data sources dashboard available from the UI.

This will be our main production job to refresh data that will be used by end users. Another job everyone should include in their dbt project is a continuous integration job.

#### Add a CI job[​](#add-a-ci-job "Direct link to Add a CI job")

CI/CD, or continuous integration and continuous deployment/delivery, has become a standard practice in software development for rapidly delivering new features and bug fixes while maintaining high quality and stability. dbt enables you to apply these practices to your data transformations. The steps below show how to create a CI test for your dbt project.

CD in dbt requires no additional steps, as your jobs automatically pick up the latest changes from the branch assigned to the environment your job is running in. You may choose to add steps depending on your deployment strategy. If you want to dive deeper into CD options, check out [this blog on adopting CI/CD with dbt](https://www.getdbt.com/blog/adopting-ci-cd-with-dbt-cloud/).

dbt allows you to write [data tests](https://docs.getdbt.com/docs/build/data-tests.md) for your data pipeline, which can be run at every step of the process to ensure the stability and correctness of your data transformations. The main places you’ll use your dbt tests are:

1. **Daily runs:** Regularly running tests on your data pipeline helps catch issues caused by bad source data, ensuring the quality of data that reaches your users.
2.
**Development**: Running tests during development ensures that your code changes do not break existing assumptions, enabling developers to iterate faster by catching problems immediately after writing code.
3. **CI checks**: Automated CI jobs run and test your pipeline end-to-end when a pull request is created, providing confidence to developers, code reviewers, and end users that the proposed changes are reliable and will not cause disruptions or data quality issues.

Your CI job will ensure that the models build properly and pass any tests applied to them. We recommend creating a separate *test* environment with a dedicated service principal. This ensures the temporary schemas created during CI tests live in their own catalog and cannot unintentionally expose data to other users.

Repeat the steps you took in [Set up your dbt project with Databricks](https://docs.getdbt.com/guides/set-up-your-databricks-dbt-project.md) to create your *prod* environment, this time creating a *test* environment. After setup, you should have:

* A catalog called *test*
* A service principal called *dbt\_test\_sp*
* A new dbt environment called *test* that defaults to the *test* catalog and uses the *dbt\_test\_sp* token in the deployment credentials

We recommend setting up a dbt CI job. This decreases the job’s runtime by running and testing only modified models, which also reduces compute spend on the lakehouse. To create a CI job, refer to [Set up CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) for details.

With dbt tests and Slim CI, you can feel confident that your production data will be timely and accurate even while delivering at high velocity.

#### Monitor your jobs[​](#monitor-your-jobs "Direct link to Monitor your jobs")

Keeping a close eye on your dbt jobs is crucial for maintaining a robust and efficient data pipeline. By monitoring job performance and quickly identifying potential issues, you can ensure that your data transformations run smoothly.
dbt provides three entry points to monitor the health of your project: run history, the deployment monitor, and status tiles.

The [run history](https://docs.getdbt.com/docs/deploy/run-visibility.md#run-history) dashboard in dbt provides a detailed view of all your project's job runs, offering various filters to help you focus on specific aspects. This is an excellent tool for developers who want to check recent runs, verify overnight results, or track the progress of running jobs. To access it, select **Run History** from the **Deploy** menu.

The deployment monitor in dbt offers a higher-level view of your run history, enabling you to gauge the health of your data pipeline over an extended period of time. This feature includes information on run durations and success rates, allowing you to identify trends in job performance, such as increasing run times or more frequent failures. The deployment monitor also highlights jobs that are in progress or queued, as well as recent failures. To access the deployment monitor, click the dbt logo in the top left corner of the dbt UI.

[![The Deployment Monitor Shows Job Status Over Time Across Environments](/img/guides/databricks-guides/deployment_monitor_dbx.png?v=2 "The Deployment Monitor Shows Job Status Over Time Across Environments")](#)The Deployment Monitor Shows Job Status Over Time Across Environments

By adding [data health tiles](https://docs.getdbt.com/docs/explore/data-tile.md) to your BI dashboards, you can give stakeholders visibility into the health of your data pipeline without leaving their preferred interface. Data tiles instill confidence in your data and help prevent unnecessary inquiries or context switching. To implement dashboard status tiles, you'll need dbt docs with [exposures](https://docs.getdbt.com/docs/build/exposures.md) defined.
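Exposures are defined in YAML within your dbt project. As a minimal sketch (the exposure name, URL, owner, and referenced model here are all hypothetical):

```yaml
exposures:
  - name: orders_dashboard        # hypothetical exposure name
    type: dashboard
    url: https://bi.example.com/dashboards/orders
    owner:
      name: Data Team
      email: data-team@example.com
    depends_on:
      - ref('customers')
```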
#### Set up notifications[​](#set-up-notifications "Direct link to Set up notifications")

Setting up [notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) in dbt allows you to receive alerts via email or a Slack channel whenever a run ends. This ensures that the appropriate teams are notified and can take action promptly when jobs fail or are canceled. To set up notifications:

1. Navigate to your dbt project settings.
2. Select the **Notifications** tab.
3. Choose the desired notification type (email or Slack) and configure the relevant settings.

If you require notifications through means other than email or Slack, you can use dbt's outbound [webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md) feature to relay job events to other tools. Webhooks enable you to integrate dbt with a wide range of SaaS applications, extending your pipeline’s automation into other systems.

#### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting")

When a disruption occurs in your production pipeline, it's essential to know how to troubleshoot issues effectively to minimize downtime and maintain a high degree of trust with your stakeholders. The key steps for troubleshooting dbt issues are:

1. Read the error message: dbt error messages usually indicate the error type and the file where the issue occurred.
2. Inspect the problematic file and look for an immediate fix.
3. Isolate the problem by running one model at a time in the Studio IDE or undoing the code that caused the issue.
4. Check for problems in compiled files and logs.

Consult the [Debugging errors documentation](https://docs.getdbt.com/guides/debug-errors.md) for a comprehensive list of error types and diagnostic methods.

To troubleshoot issues with a dbt job, navigate to the **Deploy > Run History** tab in your dbt project and select the failed run.
Then, expand the run steps to view [console and debug logs](https://docs.getdbt.com/docs/deploy/run-visibility.md#access-logs) to review the detailed log messages. To obtain additional information, open the Artifacts tab and download the compiled files associated with the run. If your jobs are taking longer than expected, use the [model timing](https://docs.getdbt.com/docs/deploy/run-visibility.md#model-timing) dashboard to identify bottlenecks in your pipeline. Analyzing the time taken for each model execution helps you pinpoint the slowest components and optimize them for better performance. The Databricks [Query History](https://docs.databricks.com/sql/admin/query-history.html) lets you inspect granular details such as time spent in each task, rows returned, I/O performance, and execution plan. For more on performance tuning, see our guide on [How to Optimize and Troubleshoot dbt Models on Databricks](https://docs.getdbt.com/guides/optimize-dbt-models-on-databricks.md). #### Advanced considerations[​](#advanced-considerations "Direct link to Advanced considerations") As you become more experienced with dbt and Databricks, you might want to explore advanced techniques to further enhance your data pipeline and improve the way you manage your data transformations. The topics in this section are not requirements but will help you harden your production environment for greater security, efficiency, and accessibility. ##### Refreshing your data with Databricks Workflows[​](#refreshing-your-data-with-databricks-workflows "Direct link to Refreshing your data with Databricks Workflows") The dbt job scheduler offers several ways to trigger your jobs. If your dbt transformations are just one step of a larger orchestration workflow, use the dbt API to trigger your job from Databricks Workflows. 
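As a rough sketch of what such an API trigger involves (the access URL, account ID, job ID, and token below are placeholders; this assembles a request for the Administrative API v2 job-trigger endpoint, where `cause` is a required field):

```python
# Sketch: assemble a dbt platform job-trigger request. The IDs and token
# are placeholders -- substitute values from your own account.

def build_trigger_request(access_url, account_id, job_id, token, cause):
    """Build the URL, headers, and JSON body for a job-trigger POST."""
    url = f"https://{access_url}/api/v2/accounts/{account_id}/jobs/{job_id}/run/"
    headers = {"Authorization": f"Token {token}"}
    body = {"cause": cause}  # a human-readable reason for the run
    return url, headers, body

url, headers, body = build_trigger_request(
    "cloud.getdbt.com", 123, 456, "YOUR_SERVICE_TOKEN",
    "Triggered from a Databricks Workflow",
)
# A Databricks notebook task would then POST this request and poll the
# returned run ID until it completes, e.g.:
# response = requests.post(url, headers=headers, json=body)
```

The linked guide below covers the full pattern, including polling for run completion.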
This is a common pattern for analytics use cases that want to minimize latency between ingesting bronze data into the lakehouse with a notebook, transforming that data into gold tables with dbt, and refreshing a dashboard. It is also useful for data science teams who use dbt for feature extraction before using the updated feature store to train and register machine learning models with MLflow. The API enables integration between your dbt jobs and Databricks Workflows, ensuring that your data transformations are effectively managed within the broader context of your data processing pipeline.

Inserting dbt jobs into a Databricks Workflow allows you to chain together external tasks while still leveraging these benefits of dbt:

* UI context: The dbt UI enables you to define the job within the context of your dbt environments, making it easier to create and manage relevant configs.
* Logs and run history: Accessing logs and run history becomes more convenient when using dbt.
* Monitoring and notification features: dbt comes equipped with monitoring and notification features like the ones described above that can help you stay informed about the status and performance of your jobs.

To trigger your dbt job from Databricks, follow the instructions in our [Databricks Workflows to run dbt jobs guide](https://docs.getdbt.com/guides/how-to-use-databricks-workflows-to-run-dbt-cloud-jobs.md).

#### Data masking[​](#data-masking "Direct link to Data masking")

Our [Best Practices for dbt and Unity Catalog](https://docs.getdbt.com/best-practices/dbt-unity-catalog-best-practices.md) guide recommends using separate *dev* and *prod* catalogs for development and deployment environments, with Unity Catalog and dbt handling configurations and permissions for environment isolation. Ensuring security while maintaining efficiency in your development and deployment environments is crucial.
Additional security measures may be necessary to protect sensitive data, such as personally identifiable information (PII). Databricks leverages [Dynamic Views](https://docs.databricks.com/data-governance/unity-catalog/create-views.html#create-a-dynamic-view) to enable data masking based on group membership. Because views in Unity Catalog use Spark SQL, you can implement advanced data masking by using more complex SQL expressions and regular expressions. You can now also apply fine-grained access controls, such as row filters and column masks (both in preview), to tables in Databricks Unity Catalog; these will be the recommended approach to protect sensitive data once they become generally available. Additionally, in the near term, Databricks Unity Catalog will also natively support attribute-based access control, which will make protecting sensitive data at scale simpler.

To implement data masking in a dbt model, ensure the model's materialization configuration is set to view. Next, add a case statement using the `is_account_group_member` function to identify groups permitted to view plain-text values. Then, use a regular expression to mask data for all other users. For example:

```sql
CASE
  WHEN is_account_group_member('auditors') THEN email
  ELSE regexp_extract(email, '^.*@(.*)$', 1)
END
```

We recommend not granting users the ability to read the tables and views referenced in the dynamic view. Instead, point your dbt sources at the dynamic views rather than the raw data, allowing developers to run end-to-end builds and source freshness commands securely.

Using the same sources for development and deployment environments enables testing with the same volumes and frequency you will see in production. However, this may cause development runs to take longer than necessary. To address this issue, consider using the Jinja variable `target.name` to [limit data when working in the development environment](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md#use-targetname-to-limit-data-in-dev).
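A sketch of that pattern inside a model (the source and date column names are hypothetical; the interval syntax is Spark SQL):

```sql
select * from {{ source('sensitive_data', 'users') }}

{% if target.name == 'dev' %}
-- In development targets, only process the most recent three days of data
where event_date >= current_date() - interval 3 days
{% endif %}
```

Because `target.name` resolves per environment, production runs of the same model process the full history while development runs stay fast.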
#### Pairing dbt Docs and Unity Catalog[​](#pairing-dbt-docs-and-unity-catalog "Direct link to Pairing dbt Docs and Unity Catalog")

Though there are similarities between dbt docs and Databricks Unity Catalog, they are ultimately used for different purposes and complement each other well. By combining their strengths, you can provide your organization with a robust and user-friendly data management ecosystem.

dbt docs is a documentation site generated from your dbt project that provides an interface for developers and non-technical stakeholders to understand the data lineage and business logic applied to transformations without requiring full access to dbt or Databricks. It gives you additional options for organizing and searching your data. You can automatically [build and view your dbt docs using dbt](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) to keep the documentation evergreen.

Unity Catalog is a unified governance solution for your lakehouse. It provides a data explorer that can be used to discover datasets that have not been defined in dbt. The data explorer also captures [column-level lineage](https://docs.databricks.com/data-governance/unity-catalog/data-lineage.html#capture-and-explore-lineage), which is useful when you need to trace the lineage of a specific column.

To get the most out of both tools, you can use the [persist docs config](https://docs.getdbt.com/reference/resource-configs/persist_docs.md) to push table and column descriptions written in dbt into Unity Catalog, making the information easily accessible to users of both tools. Keeping the descriptions in dbt ensures they are version controlled and can be reproduced after a table is dropped.
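For example, to persist descriptions for every model in the project, you could add the following to `dbt_project.yml` (replace `your_project` with your project's name):

```yaml
models:
  your_project:
    +persist_docs:
      relation: true   # write model descriptions to the table/view comment
      columns: true    # write column descriptions to column comments
```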
##### Related docs[​](#related-docs "Direct link to Related docs")

* [Advanced Deployment course](https://learn.getdbt.com/courses/advanced-deployment) if you want a deeper dive into these topics
* [Autoscaling CI: The intelligent Slim CI](https://docs.getdbt.com/docs/deploy/continuous-integration.md)
* [Trigger a dbt Job in your automated workflow with Python](https://discourse.getdbt.com/t/triggering-a-dbt-cloud-job-in-your-automated-workflow-with-python/2573)
* [Databricks + dbt Quickstart Guide](https://docs.getdbt.com/guides/databricks.md)
* Reach out to your Databricks account team to get access to preview features on Databricks.

---

### Quickstart for dbt and Amazon Athena

[Back to guides](https://docs.getdbt.com/guides.md)

Amazon Athena dbt platform Quickstart Beginner

#### Introduction[​](#introduction "Direct link to Introduction")

In this quickstart guide, you'll learn how to use dbt with Amazon Athena. It will show you how to:

* Create an S3 bucket for Athena query results.
* Create an Athena database.
* Access sample data in a public dataset.
* Connect dbt to Amazon Athena.
* Take a sample query and turn it into a model in your dbt project. A model in dbt is a select statement.
* Add tests to your models.
* Document your models.
* Schedule a job to run.

If you're interested in course-based learning with videos, you can check out [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) for free.

##### Prerequisites​[​](#prerequisites "Direct link to Prerequisites​")

* You have a [dbt account](https://www.getdbt.com/signup/).
* You have an [AWS account](https://aws.amazon.com/).
* You have set up [Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html).

##### Related content[​](#related-content "Direct link to Related content")

* Learn more with [dbt Learn courses](https://learn.getdbt.com)
* [CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md)
* [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md)
* [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md)
* [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md)

#### Getting started[​](#getting-started "Direct link to Getting started")

For the following guide, you can use an existing S3 bucket or [create a new one](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html).

Download the following CSV files (the Jaffle Shop sample data) and upload them to your S3 bucket:

* [jaffle\_shop\_customers.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/jaffle_shop_customers.csv)
* [jaffle\_shop\_orders.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/jaffle_shop_orders.csv)
* [stripe\_payments.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/stripe_payments.csv)

#### Configure Amazon Athena[​](#configure-amazon-athena "Direct link to Configure Amazon Athena")

1. Log into your AWS account and navigate to the **Athena console**.
   * If this is your first time in the Athena console (in your current AWS Region), click **Explore the query editor** to open the query editor. Otherwise, Athena opens automatically in the query editor.
2. Open **Settings** and find the **Location of query result** box.
   1. Enter the path of the S3 bucket (prefix it with `s3://`).
   2. Navigate to **Browse S3**, select the S3 bucket you created, and click **Choose**.
3. **Save** these settings.
4. In the **query editor**, create a database by running `create database YOUR_DATABASE_NAME`.
5.
To make the database you created the one you write into, select it from the **Database** list in the left side menu.
6. Access the Jaffle Shop data in the S3 bucket using one of these options:
   1. Manually create the tables.
   2. Create a Glue crawler to recreate the data as external tables (recommended).
7. Once the tables have been created, you will be able to `SELECT` from them.

#### Set up security access to Athena[​](#set-up-security-access-to-athena "Direct link to Set up security access to Athena")

To set up security access for Athena, determine which access method you want to use:

* Obtain an `aws_access_key_id` and `aws_secret_access_key` (recommended)
* Obtain an **AWS credentials** file.

##### AWS access key (recommended)[​](#aws-access-key-recommended "Direct link to AWS access key (recommended)")

To obtain your `aws_access_key_id` and `aws_secret_access_key`:

1. Open the **AWS Console**.
2. Click on your **username** near the top right and click **Security Credentials**.
3. Click on **Users** in the sidebar.
4. Click on your **username** (or the name of the user for whom to create the key).
5. Click on the **Security Credentials** tab.
6. Click **Create Access Key**.
7. Click **Show User Security Credentials** and save the `aws_access_key_id` and `aws_secret_access_key` for a future step.

##### AWS credentials file[​](#aws-credentials-file "Direct link to AWS credentials file")

To obtain your AWS credentials file:

1. Follow the instructions for [configuring the credentials file](https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-files.html) using the AWS CLI.
2. Locate the `~/.aws/credentials` file on your computer:
   1. Windows: `%USERPROFILE%\.aws\credentials`
   2. Mac/Linux: `~/.aws/credentials`

Retrieve the `aws_access_key_id` and `aws_secret_access_key` from the `~/.aws/credentials` file for a future step.
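If you opted to create the tables manually in the earlier Athena configuration step, the DDL might look like the following sketch for the customers file (the column types are inferred from the sample data, the bucket path is a placeholder, and each CSV should sit in its own S3 prefix):

```sql
create external table jaffle_shop_customers (
    id int,
    first_name string,
    last_name string
)
row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
with serdeproperties ('field.delim' = ',')
location 's3://YOUR_BUCKET/jaffle_shop_customers/'
tblproperties ('skip.header.line.count' = '1');
```

Repeat the pattern for the orders and payments files, adjusting the column lists accordingly.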
#### Configure the connection in dbt[​](#configure-the-connection-in-dbt "Direct link to Configure the connection in dbt")

To configure the Athena connection in dbt:

1. Click your **account name** in the left-side menu and click **Account settings**.
2. Click **Connections** and click **New connection**.
3. Click **Athena** and fill out the required fields (and any optional fields).
   1. **AWS region name** — The AWS region of your environment.
   2. **Database (catalog)** — Enter the database name created in earlier steps (lowercase only).
   3. **AWS S3 staging directory** — Enter the S3 bucket created in earlier steps.
4. Click **Save**.

##### Configure your environment[​](#configure-your-environment "Direct link to Configure your environment")

To configure the Athena credentials in your environment:

1. Click **Deploy** on the left-side menu and click **Environments**.
2. Click **Create environment** and fill out the **General settings**.
   * Your **dbt version** must be on a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to use the Athena connection.
3. Select the Athena connection from the **Connection** dropdown.
4. Fill out the `aws_access_key_id` and `aws_secret_access_key` recorded in previous steps, as well as the **Schema** to write to.
5. Click **Test connection** and once it succeeds, **Save** the environment.

Repeat the process to create a [development environment](https://docs.getdbt.com/docs/dbt-cloud-environments.md#types-of-environments).

#### Set up a dbt managed repository[​](#set-up-a-dbt-managed-repository "Direct link to Set up a dbt managed repository")

When you develop in dbt, you can leverage [Git](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) to version control your code.
To connect to a repository, you can either set up a dbt-hosted [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or directly connect to a [supported git provider](https://docs.getdbt.com/docs/cloud/git/connect-github.md). Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md).

To set up a managed repository:

1. Under "Setup a repository", select **Managed**.
2. Type a name for your repo, such as `bbaggins-dbt-quickstart`.
3. Click **Create**. It will take a few seconds for your repository to be created and imported.
4. Once you see the "Successfully imported repository" message, click **Continue**.

#### Initialize your dbt project and start developing[​](#initialize-your-dbt-project-and-start-developing "Direct link to Initialize your dbt project​ and start developing")

Now that you have a repository configured, you can initialize your project and start development in dbt:

1. Click **Start developing in the Studio IDE**. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.
2. Above the file tree to the left, click **Initialize dbt project**. This builds out your folder structure with example models.
3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit` and click **Commit**. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
4. You can now directly query data from your warehouse and execute `dbt run`.
You can try this out now:

* Click **+ Create new file**, add this query to the new file, and click **Save as** to save the new file:

  ```sql
  select * from jaffle_shop.customers
  ```

* In the command line bar at the bottom, enter `dbt run` and click **Enter**. You should see a `dbt run succeeded` message.

#### Build your first model[​](#build-your-first-model "Direct link to Build your first model")

You have two options for working with files in the Studio IDE:

* Create a new branch (recommended) — Create a new branch to edit and commit your changes. Navigate to **Version Control** on the left sidebar and click **Create branch**.
* Edit in the protected primary branch — You can edit, format, or lint files and execute dbt commands directly in your primary git branch. Because the Studio IDE prevents commits to the protected branch, you will be prompted to commit your changes to a new branch.

Name the new branch `add-customers-model`.

1. Click the **...** next to the `models` directory, then select **Create file**.
2. Name the file `customers.sql`, then click **Create**.
3. Copy the following query into the file and click **Save**.

```sql
with customers as (

    select
        id as customer_id,
        first_name,
        last_name

    from jaffle_shop.customers

),

orders as (

    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status

    from jaffle_shop.orders

),

customer_orders as (

    select
        customer_id,
        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

)

select * from final
```

4. Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run and see the three models.
Later, you can connect your business intelligence (BI) tools to these views and tables so they read cleaned-up data rather than raw data. ###### FAQs[​](#faqs "Direct link to FAQs") How can I see the SQL that dbt is running? To check out the SQL that dbt is running, you can look in: * dbt: * Within the run output, click on a model name, and then select "Details" * dbt Core: * The `target/compiled/` directory for compiled `select` statements * The `target/run/` directory for compiled `create` statements * The `logs/dbt.log` file for verbose logging. How did dbt choose which schema to build my models in? By default, dbt builds models in your target schema. To change your target schema: * If you're developing in **dbt**, these are set for each user when you first use a development environment. * If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file. If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md). Note: on BigQuery, `dataset` is used interchangeably with `schema`. Do I need to create my target schema before running dbt? Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you. If I rerun dbt, will there be any downtime as models are rebuilt? Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime). The implementation varies by warehouse; check out the [logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to see the SQL dbt is executing. What happens if the SQL in my query is bad or I get a database error? If there's a mistake in your SQL, dbt will return the error that your database returns. 
```shell $ dbt run --select customers Running with dbt=1.9.0 Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources 14:04:12 | Concurrency: 1 threads (target='dev') 14:04:12 | 14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN] 14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s] 14:04:13 | 14:04:13 | Finished running 1 view model in 1.68s. Completed with 1 error and 0 warnings: Database Error in model customers (models/customers.sql) Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15] compiled SQL at target/run/jaffle_shop/customers.sql Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1 ``` Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors. #### Change the way your model is materialized[​](#change-the-way-your-model-is-materialized "Direct link to Change the way your model is materialized") One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse simply by changing a configuration value. You can switch between tables and views by changing a keyword rather than writing the data definition language (DDL) yourself; dbt generates it behind the scenes. By default, everything gets created as a view. You can override that at the directory level so that everything in that directory uses a different materialization. 1. Edit your `dbt_project.yml` file. * Update your project `name` to: dbt\_project.yml ```yaml name: 'jaffle_shop' ``` * Configure `jaffle_shop` so everything in it will be materialized as a table; and configure `example` so everything in it will be materialized as a view. 
Update your `models` config in the project YAML file to: dbt\_project.yml ```yaml models: jaffle_shop: +materialized: table example: +materialized: view ``` * Click **Save**. 2. Enter the `dbt run` command. Your `customers` model should now be built as a table! info To do this, dbt had to first run a `drop view` statement (or API call on BigQuery), then a `create table as` statement. 3. Edit `models/customers.sql` to override the `dbt_project.yml` for the `customers` model only by adding the following snippet to the top, and click **Save**: models/customers.sql ```sql {{ config( materialized='view' ) }} with customers as ( select id as customer_id ... ) ``` 4. Enter the `dbt run` command. Your model, `customers`, should now build as a view. * BigQuery users need to run `dbt run --full-refresh` instead of `dbt run` to fully apply materialization changes. 5. Enter the `dbt run --full-refresh` command for this to take effect in your warehouse. ##### FAQs[​](#faqs "Direct link to FAQs") What materializations are available in dbt? dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt. Which materialization should I use for my model? Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization. What model configurations exist? 
You can also configure: * [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection * [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas * [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename * Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md) * Warehouse-specific configurations for performance (e.g. `sort` and `dist` keys on Redshift, `partitions` on BigQuery) Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more. #### Delete the example models[​](#delete-the-example-models "Direct link to Delete the example models") You can now delete the files that dbt created when you initialized the project: 1. Delete the `models/example/` directory. 2. Delete the `example:` key from your `dbt_project.yml` file, and any configurations that are listed under it. dbt\_project.yml ```yaml # before models: jaffle_shop: +materialized: table example: +materialized: view ``` dbt\_project.yml ```yaml # after models: jaffle_shop: +materialized: table ``` 3. Save your changes. ###### FAQs[​](#faqs "Direct link to FAQs") How do I remove deleted models from my data warehouse? If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from being a view or table to ephemeral.) When you remove models from your dbt project, you should manually drop the related relations from your schema. I got an "unused model configurations" error message, what does this mean? 
You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more. #### Build models on top of other models[​](#build-models-on-top-of-other-models "Direct link to Build models on top of other models") As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs). Now you can experiment by separating the logic out into separate models and using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function to build models on top of other models: [![The DAG we want for our dbt project](/img/dbt-dag.png?v=2 "The DAG we want for our dbt project")](#)The DAG we want for our dbt project 1. Create a new SQL file, `models/stg_customers.sql`, with the SQL from the `customers` CTE in our original query. 2. Create a second new SQL file, `models/stg_orders.sql`, with the SQL from the `orders` CTE in our original query. models/stg\_customers.sql ```sql select id as customer_id, first_name, last_name from jaffle_shop.customers ``` models/stg\_orders.sql ```sql select id as order_id, user_id as customer_id, order_date, status from jaffle_shop.orders ``` 3. 
Edit the SQL in your `models/customers.sql` file as follows: models/customers.sql ```sql with customers as ( select * from {{ ref('stg_customers') }} ), orders as ( select * from {{ ref('stg_orders') }} ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` 4. Execute `dbt run`. This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders` and `customers`. dbt inferred the order to run these models. Because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies. ###### FAQs[​](#faq-2 "Direct link to FAQs") How do I run one model at a time? To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell $ dbt run --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples. Do ref-able resource names need to be unique? Within one project: yes! To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function, and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*. A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. 
We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace. Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this. As I create more models, how should I keep my project organized? What should I name my models? There's no one best way to structure a project! Every organization is unique. If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md). #### Add tests to your models[​](#add-tests-to-your-models "Direct link to Add tests to your models") Adding [data tests](https://docs.getdbt.com/docs/build/data-tests.md) to a project helps validate that your models are working correctly. To add data tests to your project: 1. Create a new YAML file in the `models` directory, named `models/schema.yml` 2. Add the following contents to the file: models/schema.yml ```yaml version: 2 models: - name: customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_orders columns: - name: order_id data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties. 
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` 3. Run `dbt test`, and confirm that all your tests passed. When you run `dbt test`, dbt iterates through your YAML files, and constructs a query for each test. Each query will return the number of records that fail the test. If this number is 0, then the test is successful. ###### FAQs[​](#faqs "Direct link to FAQs") What tests are available for me to use in dbt? Can I add my own custom tests? Out of the box, dbt ships with the following data tests: * `unique` * `not_null` * `accepted_values` * `relationships` (for example, referential integrity) You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project. How do I test one model at a time? Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell dbt test --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular. One of my tests failed, how can I debug it? To debug a failing test, find the SQL that dbt ran by: * dbt: * Within the test output, click on the failed test, and then select "Details". * dbt Core: * Open the file path returned as part of the error message. * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries. 
Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed. Does my test file need to be named \`schema.yml\`? No! You can name this file whatever you want (including `whatever_you_want.yml`), so long as: * The file is in your `models/` directory¹ * The file has a `.yml` extension Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information. ¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — `seeds/`, `snapshots/` and `macros/` respectively. Why do model and source YAML files always start with \`version: 2\`? Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible. From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported. Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional. Resource YAML files do not currently require this config, and we only support `version: 2` if it's specified. Although we do not expect to update YAML files to `version: 3` soon, having this config will make it easier for us to introduce new structures in the future. What data tests should I add to my project? We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`. We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL. 
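To make the payment-method example concrete, such a test could look like the following sketch. Note that the `payments` model, its `payment_method` column, and the accepted values here are all hypothetical; this quickstart project doesn't build a payments model, so adapt the names to your own data:

```yml
models:
  - name: payments            # hypothetical model
    columns:
      - name: payment_method  # hypothetical column
        data_tests:
          - accepted_values:
              arguments:      # nest under `arguments` on v1.10.5 and higher
                values: ['credit_card', 'bank_transfer', 'coupon']
```

If a fourth payment method ever appears in the source data, `dbt test` fails this test and surfaces the new value before it can skew downstream logic.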
In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models. When should I run my data tests? You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid). #### Document your models[​](#document-your-models "Direct link to Document your models") Adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project. Update your `models/schema.yml` file to include some descriptions, such as those below. models/schema.yml ```yaml version: 2 models: - name: customers description: One record per customer columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: first_order_date description: NULL when a customer has not yet placed an order. - name: stg_customers description: This model cleans up customer data columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: stg_orders description: This model cleans up order data columns: - name: order_id description: Primary key data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties. 
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` * View in Catalog * View in Studio IDE [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) provides powerful tools to interact with your dbt projects, including documentation: 1. From the IDE, run one of the following commands: * `dbt docs generate` if you're on dbt Core * `dbt build` if you're on the dbt Fusion engine 2. Click **Catalog** in the navigation menu to launch Catalog. 3. In the Catalog pane, click the environment selection dropdown menu at the top of the file tree and change it from **Production** to **Development**. [![View your development environment information.](/img/docs/collaborate/dbt-explorer/catalog-nav-dropdown.png?v=2 "View your development environment information.")](#)View your development environment information. 4. Select your project from the file tree. 5. Use the search bar or browse the resource list to find the `customers` model. 6. Click the model to view its details, including the descriptions you added. [![View your model's documentation and lineage in Catalog.](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2 "View your model's documentation and lineage in Catalog.")](#)View your model's documentation and lineage in Catalog. Catalog displays your model's description, column documentation, data tests, and lineage graph. You can also see which columns are missing documentation and track test coverage across your project. You can view docs directly from the IDE if you're on `Latest` or another version of dbt Core. Keep in mind that this is a legacy view and doesn't offer the same level of interactivity as Catalog. 1. In the IDE, run `dbt docs generate`. 2. From the navigation bar, click the **View docs** icon located to the right of the **branch name**. 
[![The View docs icon in the Studio IDE.](/img/docs/collaborate/dbt-explorer/docs-icon.png?v=2 "The View docs icon in the Studio IDE.")](#)The View docs icon in the Studio IDE. 3. From **Projects**, select your project name and expand the folders. 4. Click **models** > **marts** > **customers**. [![View your model's documentation in the legacy docs view.](/img/docs/collaborate/dbt-explorer/legacy-docs-view.png?v=2 "View your model's documentation in the legacy docs view.")](#)View your model's documentation in the legacy docs view. ###### FAQs[​](#faqs "Direct link to FAQs") How do I write long-form explanations in my descriptions? If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions: ```yml models: - name: customers description: > Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ``` 2. Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions: ```yml models: - name: customers description: | ### Lorem ipsum * dolor sit amet, consectetur adipisicing elit, sed do eiusmod * tempor incididunt ut labore et dolore magna aliqua. ``` 3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. How do I access documentation in dbt Catalog? 
If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state. Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project. dbt developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but it doesn't offer the same speed, metadata, or visibility as Catalog. #### Commit your changes[​](#commit-your-changes "Direct link to Commit your changes") Now that you've built your customer model, you need to commit the changes you made to the project so that the repository has your latest code. **If you edited directly in the protected primary branch:**
1. Click the **Commit and sync git** button. This action prepares your changes for commit. 2. A modal titled **Commit to a new branch** will appear. 3. In the modal window, name your new branch `add-customers-model`. This branches off from your primary branch with your new changes. 4. Add a commit message, such as "Add customers model, tests, docs" and commit your changes. 5. Click **Merge this branch to main** to add these changes to the main branch on your repo. **If you created a new branch before editing:**
1. Since you already branched out of the primary protected branch, go to **Version Control** on the left. 2. Click **Commit and sync** to add a message. 3. Add a commit message, such as "Add customers model, tests, docs." 4. Click **Merge this branch to main** to add these changes to the main branch on your repo. #### Deploy dbt[​](#deploy-dbt "Direct link to Deploy dbt") Use dbt's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps. ##### Create a deployment environment[​](#create-a-deployment-environment "Direct link to Create a deployment environment") 1. From the main menu, go to **Orchestration** > **Environments**. 2. Click **Create environment**. 3. In the **Name** field, write the name of your deployment environment. For example, "Production." 4. The **dbt version** will default to the latest available. We recommend all new projects run on the latest version of dbt. 5. Under **Deployment connection**, enter the name of the dataset you want to use as the target, such as "Analytics". This will allow dbt to build and work with that dataset. For some data warehouses, the target dataset may be referred to as a "schema". 6. Click **Save**. ##### Create and run a job[​](#create-and-run-a-job "Direct link to Create and run a job") Jobs are a set of dbt commands that you want to run on a schedule. For example, `dbt build`. As the `jaffle_shop` business gains more customers, and those customers create more orders, you will see more records added to your source data. Because you materialized the `customers` model as a table, you'll need to periodically rebuild your table to ensure that the data stays up-to-date. This update will happen when you run a job. 1. After creating your deployment environment, you should be directed to the page for a new environment. If not, select **Orchestration** from the main menu, then click **Jobs**. 2. 
Click **Create job** > **Deploy job**. 3. Provide a job name (for example, "Production run") and select the environment you just created. 4. Scroll down to the **Execution settings** section. 5. Under **Commands**, add this command as part of your job if you don't see it: * `dbt build` 6. Select the **Generate docs on run** option to automatically [generate updated project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) each time your job runs. 7. For this exercise, do *not* set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as *deploying a project*. 8. Click **Save**, then click **Run now** to run your job. 9. Click the run and watch its progress under **Run summary**. 10. Once the run is complete, click **View Documentation** to see the docs for your project. Congratulations 🎉! You've just deployed your first dbt project! ###### FAQs[​](#faqs "Direct link to FAQs") What happens if one of my runs fails? If you're using dbt, we recommend setting up email and Slack notifications (`Account Settings > Notifications`) for any failed runs. Then, debug these runs the same way you would debug any runs in development. 
--- ### Quickstart for dbt and Azure Synapse Analytics [Back to guides](https://docs.getdbt.com/guides.md) dbt platform Quickstart Beginner #### Introduction[​](#introduction "Direct link to Introduction") In this quickstart guide, you'll learn how to use dbt with [Azure Synapse Analytics](https://azure.microsoft.com/en-us/products/synapse-analytics/). It will show you how to: * Load the Jaffle Shop sample data (provided by dbt Labs) into your Azure Synapse Analytics warehouse. * Connect dbt to Azure Synapse Analytics. * Turn a sample query into a model in your dbt project. A model in dbt is a SELECT statement. * Add tests to your models. * Document your models. * Schedule a job to run. ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * You have a [dbt](https://www.getdbt.com/signup/) account. * You have an Azure Synapse Analytics account. For a free trial, refer to [Synapse Analytics](https://azure.microsoft.com/en-us/free/synapse-analytics/) in the Microsoft docs. * As a Microsoft admin, you’ve enabled service principal authentication. You must add the service principal to the Synapse workspace with either a Member (recommended) or Admin permission set. For details, refer to [Create a service principal using the Azure portal](https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-service-principal-portal) in the Microsoft docs. dbt needs these authentication credentials to connect to Azure Synapse Analytics. 
##### Related content[​](#related-content "Direct link to Related content") * [dbt Learn courses](https://learn.getdbt.com) * [About continuous integration jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md) * [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) * [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) * [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) #### Load data into your Azure Synapse Analytics[​](#load-data-into-your-azure-synapse-analytics "Direct link to Load data into your Azure Synapse Analytics") 1. Log in to your [Azure portal account](https://portal.azure.com/#home). 2. On the home page, select the **SQL databases** tile. 3. From the **SQL databases** page, navigate to your organization’s workspace or create a new workspace; refer to [Create a Synapse workspace](https://learn.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-workspace) in the Microsoft docs for more details. 4. From the workspace's sidebar, select **Data**. Click the three dot menu on your database and select **New SQL script** to open the SQL editor. 5. 
Copy these statements into the SQL editor to load the Jaffle Shop example data: ```sql CREATE TABLE dbo.customers ( [ID] [bigint], [FIRST_NAME] [varchar](8000), [LAST_NAME] [varchar](8000) ); COPY INTO [dbo].[customers] FROM 'https://dbtlabsynapsedatalake.blob.core.windows.net/dbt-quickstart-public/jaffle_shop_customers.parquet' WITH ( FILE_TYPE = 'PARQUET' ); CREATE TABLE dbo.orders ( [ID] [bigint], [USER_ID] [bigint], [ORDER_DATE] [date], [STATUS] [varchar](8000) ); COPY INTO [dbo].[orders] FROM 'https://dbtlabsynapsedatalake.blob.core.windows.net/dbt-quickstart-public/jaffle_shop_orders.parquet' WITH ( FILE_TYPE = 'PARQUET' ); CREATE TABLE dbo.payments ( [ID] [bigint], [ORDERID] [bigint], [PAYMENTMETHOD] [varchar](8000), [STATUS] [varchar](8000), [AMOUNT] [bigint], [CREATED] [date] ); COPY INTO [dbo].[payments] FROM 'https://dbtlabsynapsedatalake.blob.core.windows.net/dbt-quickstart-public/stripe_payments.parquet' WITH ( FILE_TYPE = 'PARQUET' ); ``` [![Example of loading data](/img/quickstarts/dbt-cloud/example-load-data-azure-syn-analytics.png?v=2 "Example of loading data")](#)Example of loading data #### Connect dbt to Azure Synapse Analytics[​](#connect-dbt-to-azure-synapse-analytics "Direct link to Connect dbt to Azure Synapse Analytics") 1. Create a new project in dbt. Click on your account name in the left side menu, select **Account settings**, and click **+ New Project**. 2. Enter a project name and click **Continue**. 3. Choose **Synapse** as your connection and click **Next**. 4. In the **Configure your environment** section, enter the **Settings** for your new project: * **Server** — Use the service principal's **Synapse host name** value (without the trailing `, 1433` string) for the Synapse test endpoint. * **Port** — 1433 (which is the default). * **Database** — Use the service principal's **database** value for the Synapse test endpoint. 5. 
Enter the **Development credentials** for your new project: * **Authentication** — Choose **Service Principal** from the dropdown. * **Tenant ID** — Use the service principal’s **Directory (tenant) ID** as the value. * **Client ID** — Use the service principal’s **Application (client) ID** as the value. * **Client secret** — Use the service principal’s **client secret** (not the **client secret ID**) as the value. 6. Click **Test connection**. This verifies that dbt can access your Azure Synapse Analytics account. 7. Click **Next** when the test succeeds. If it failed, you might need to check your Microsoft service principal. #### Set up a dbt managed repository[​](#set-up-a-dbt-managed-repository "Direct link to Set up a dbt managed repository") When you develop in dbt, you can leverage [Git](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) to version control your code. To connect to a repository, you can either set up a dbt-hosted [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or directly connect to a [supported git provider](https://docs.getdbt.com/docs/cloud/git/connect-github.md). Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md). To set up a managed repository: 1. Under "Setup a repository", select **Managed**. 2. Type a name for your repo, such as `bbaggins-dbt-quickstart`. 3. Click **Create**. It will take a few seconds for your repository to be created and imported. 4. Once you see the "Successfully imported repository" message, click **Continue**. 
#### Initialize your dbt project and start developing[​](#initialize-your-dbt-project-and-start-developing "Direct link to Initialize your dbt project and start developing") Now that you have a repository configured, you can initialize your project and start development in dbt: 1. Click **Start developing in the Studio IDE**. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse. 2. Above the file tree to the left, click **Initialize dbt project**. This builds out your folder structure with example models. 3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit` and click **Commit Changes**. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code. 4. You can now directly query data from your warehouse and execute `dbt run`. You can try this out now: * In the command line bar at the bottom, enter `dbt run` and press **Enter**. You should see a `dbt run succeeded` message. #### Build your first model[​](#build-your-first-model "Direct link to Build your first model") 1. Under **Version Control** on the left, click **Create branch**. You can name it `add-customers-model`. You need to create a new branch since the main branch is set to read-only mode. 2. Click the three-dot menu (**...**) next to the `models` directory, then select **Create file**. 3. Name the file `customers.sql`, then click **Create**. 4. Copy the following query into the file and click **Save**. 
customers.sql ```sql with customers as ( select ID as customer_id, FIRST_NAME as first_name, LAST_NAME as last_name from dbo.customers ), orders as ( select ID as order_id, USER_ID as customer_id, ORDER_DATE as order_date, STATUS as status from dbo.orders ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by customer_id ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders on customers.customer_id = customer_orders.customer_id ) select * from final ``` 5. Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run and see the three models. Later, you can connect your business intelligence (BI) tools to these views and tables so they read cleaned-up data rather than raw data. ###### FAQs[​](#faqs "Direct link to FAQs") How can I see the SQL that dbt is running? To check out the SQL that dbt is running, you can look in: * dbt: * Within the run output, click on a model name, and then select "Details" * dbt Core: * The `target/compiled/` directory for compiled `select` statements * The `target/run/` directory for compiled `create` statements * The `logs/dbt.log` file for verbose logging. How did dbt choose which schema to build my models in? By default, dbt builds models in your target schema. To change your target schema: * If you're developing in **dbt**, your target schema is set for each user when you first use a development environment. * If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file. 
If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md). Note: on BigQuery, `dataset` is used interchangeably with `schema`. Do I need to create my target schema before running dbt? Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you. If I rerun dbt, will there be any downtime as models are rebuilt? Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime). The implementation of this varies on each warehouse, check out the [logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to see the SQL dbt is executing. What happens if the SQL in my query is bad or I get a database error? If there's a mistake in your SQL, dbt will return the error that your database returns. ```shell $ dbt run --select customers Running with dbt=1.9.0 Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources 14:04:12 | Concurrency: 1 threads (target='dev') 14:04:12 | 14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN] 14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s] 14:04:13 | 14:04:13 | Finished running 1 view model in 1.68s. Completed with 1 error and 0 warnings: Database Error in model customers (models/customers.sql) Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15] compiled SQL at target/run/jaffle_shop/customers.sql Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1 ``` Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors. 
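Expanding on the custom-schemas FAQ above, splitting models across schemas comes down to a one-line config per folder. Here is a minimal sketch, assuming a hypothetical `marketing` subdirectory under `models/`; note that, by default, dbt appends the custom schema to your target schema rather than replacing it:

```yaml
# dbt_project.yml -- illustrative only; `marketing` is a made-up folder.
models:
  jaffle_shop:
    marketing:
      +schema: marketing   # builds into <target_schema>_marketing,
                           # e.g. dbt_alice_marketing in development
```

The appending behavior comes from dbt's built-in `generate_schema_name` macro, which you can override in your project if you want models to land in the literal schema name instead.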
#### Change the way your model is materialized[​](#change-the-way-your-model-is-materialized "Direct link to Change the way your model is materialized") One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value. You can switch between tables and views by changing a keyword, rather than writing the data definition language (DDL) yourself; dbt handles this behind the scenes. By default, everything gets created as a view. You can override that at the directory level so everything in that directory uses a different materialization. 1. Edit your `dbt_project.yml` file. * Update your project `name` to: dbt\_project.yml ```yaml name: 'jaffle_shop' ``` * Configure `jaffle_shop` so everything in it will be materialized as a table, and configure `example` so everything in it will be materialized as a view. Update your `models` config in the project YAML file to: dbt\_project.yml ```yaml models: jaffle_shop: +materialized: table example: +materialized: view ``` * Click **Save**. 2. Enter the `dbt run` command. Your `customers` model should now be built as a table! info To do this, dbt had to first run a `drop view` statement (or API call on BigQuery), then a `create table as` statement. 3. Edit `models/customers.sql` to override the `dbt_project.yml` for the `customers` model only by adding the following snippet to the top, and click **Save**: models/customers.sql ```sql {{ config( materialized='view' ) }} with customers as ( select id as customer_id ... ) ``` 4. Enter the `dbt run` command. Your model, `customers`, should now build as a view. * BigQuery users need to run `dbt run --full-refresh` instead of `dbt run` to fully apply materialization changes. 5. Enter the `dbt run --full-refresh` command for this to take effect in your warehouse. ##### FAQs[​](#faqs "Direct link to FAQs") What materializations are available in dbt? 
dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt. Which materialization should I use for my model? Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization. What model configurations exist? You can also configure: * [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection * [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas * [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename * Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md) * Warehouse-specific configurations for performance (e.g. `sort` and `dist` keys on Redshift, `partitions` on BigQuery) Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more. #### Delete the example models[​](#delete-the-example-models "Direct link to Delete the example models") You can now delete the files that dbt created when you initialized the project: 1. Delete the `models/example/` directory. 2. Delete the `example:` key from your `dbt_project.yml` file, and any configurations that are listed under it. 
dbt\_project.yml ```yaml # before models: jaffle_shop: +materialized: table example: +materialized: view ``` dbt\_project.yml ```yaml # after models: jaffle_shop: +materialized: table ``` 3. Save your changes. ###### FAQs[​](#faqs "Direct link to FAQs") How do I remove deleted models from my data warehouse? If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from being a view or table to being ephemeral.) When you remove models from your dbt project, you should manually drop the related relations from your schema. I got an "unused model configurations" error message, what does this mean? You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more. #### Build models on top of other models[​](#build-models-on-top-of-other-models "Direct link to Build models on top of other models") As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs). Now you can experiment by separating the logic out into separate models and using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function to build models on top of other models: [![The DAG we want for our dbt project](/img/dbt-dag.png?v=2 "The DAG we want for our dbt project")](#)The DAG we want for our dbt project 1. Create a new SQL file, `models/stg_customers.sql`, with the SQL from the `customers` CTE in our original query. 2. 
Create a second new SQL file, `models/stg_orders.sql`, with the SQL from the `orders` CTE in our original query. models/stg\_customers.sql ```sql select ID as customer_id, FIRST_NAME as first_name, LAST_NAME as last_name from dbo.customers ``` models/stg\_orders.sql ```sql select ID as order_id, USER_ID as customer_id, ORDER_DATE as order_date, STATUS as status from dbo.orders ``` 3. Edit the SQL in your `models/customers.sql` file as follows: models/customers.sql ```sql with customers as ( select * from {{ ref('stg_customers') }} ), orders as ( select * from {{ ref('stg_orders') }} ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by customer_id ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders on customers.customer_id = customer_orders.customer_id ) select * from final ``` 4. Execute `dbt run`. This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders` and `customers`. dbt inferred the order to run these models. Because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies. ###### FAQs[​](#faq-2 "Direct link to FAQs") How do I run one model at a time? To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell $ dbt run --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples. Do ref-able resource names need to be unique? Within one project: yes! 
To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function, and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*. A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace. Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this. As I create more models, how should I keep my project organized? What should I name my models? There's no one best way to structure a project! Every organization is unique. If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md). #### Add tests to your models[​](#add-tests-to-your-models "Direct link to Add tests to your models") Adding [data tests](https://docs.getdbt.com/docs/build/data-tests.md) to a project helps validate that your models are working correctly. To add data tests to your project: 1. Create a new YAML file in the `models` directory, named `models/schema.yml`. 2. 
Add the following contents to the file: models/schema.yml ```yaml version: 2 models: - name: customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_orders columns: - name: order_id data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # `arguments` is available in v1.10.5 and higher. Older versions can set `values` as a top-level property. values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` 3. Run `dbt test`, and confirm that all your tests passed. When you run `dbt test`, dbt iterates through your YAML files, and constructs a query for each test. Each query will return the number of records that fail the test. If this number is 0, then the test is successful. ###### FAQs[​](#faqs "Direct link to FAQs") What tests are available for me to use in dbt? Can I add my own custom tests? Out of the box, dbt ships with the following data tests: * `unique` * `not_null` * `accepted_values` * `relationships` (for example, referential integrity) You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project. How do I test one model at a time? 
Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell dbt test --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular. One of my tests failed, how can I debug it? To debug a failing test, find the SQL that dbt ran by: * dbt: * Within the test output, click on the failed test, and then select "Details". * dbt Core: * Open the file path returned as part of the error message. * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries. Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed. Does my test file need to be named \`schema.yml\`? No! You can name this file whatever you want (including `whatever_you_want.yml`), so long as: * The file is in your `models/` directory¹ * The file has a `.yml` extension Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information. ¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — `seeds/`, `snapshots/` and `macros/` respectively. Why do model and source YAML files always start with \`version: 2\`? Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible. From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported. 
Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional. Resource YAML files do not currently require this config. We only support `version: 2` if it's specified. Although we do not expect to update YAML files to `version: 3` soon, having this config will make it easier for us to introduce new structures in the future. What data tests should I add to my project? We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`. We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL. In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models. When should I run my data tests? You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid). #### Document your models[​](#document-your-models "Direct link to Document your models") Adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project. Update your `models/schema.yml` file to include some descriptions, such as those below. 
models/schema.yml ```yaml version: 2 models: - name: customers description: One record per customer columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: first_order_date description: NULL when a customer has not yet placed an order. - name: stg_customers description: This model cleans up customer data columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: stg_orders description: This model cleans up order data columns: - name: order_id description: Primary key data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # `arguments` is available in v1.10.5 and higher. Older versions can set `values` as a top-level property. values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` * View in Catalog * View in Studio IDE [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) provides powerful tools to interact with your dbt projects, including documentation: 1. From the IDE, run one of the following commands: * `dbt docs generate` if you're on dbt Core * `dbt build` if you're on the dbt Fusion engine 2. Click **Catalog** in the navigation menu to launch Catalog. 3. In the Catalog pane, click the environment selection dropdown menu at the top of the file tree and change it from **Production** to **Development**. [![View your development environment information.](/img/docs/collaborate/dbt-explorer/catalog-nav-dropdown.png?v=2 "View your development environment information.")](#)View your development environment information. 4. Select your project from the file tree. 5. Use the search bar or browse the resource list to find the `customers` model. 6. Click the model to view its details, including the descriptions you added. 
[![View your model's documentation and lineage in Catalog.](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2 "View your model's documentation and lineage in Catalog.")](#)View your model's documentation and lineage in Catalog. Catalog displays your model's description, column documentation, data tests, and lineage graph. You can also see which columns are missing documentation and track test coverage across your project. You can view docs directly from the IDE if you're on `Latest` or another version of dbt Core. Keep in mind that this is a legacy view and doesn't offer the same level of interactivity as Catalog. 1. In the IDE, run `dbt docs generate`. 2. From the navigation bar, click the **View docs** icon located to the right of the **branch name**. [![The View docs icon in the Studio IDE.](/img/docs/collaborate/dbt-explorer/docs-icon.png?v=2 "The View docs icon in the Studio IDE.")](#)The View docs icon in the Studio IDE. 3. From **Projects**, select your project name and expand the folders. 4. Click **models** > **marts** > **customers**. [![View your model's documentation in the legacy docs view.](/img/docs/collaborate/dbt-explorer/legacy-docs-view.png?v=2 "View your model's documentation in the legacy docs view.")](#)View your model's documentation in the legacy docs view. ###### FAQs[​](#faqs "Direct link to FAQs") How do I write long-form explanations in my descriptions? If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions: ```yml models: - name: customers description: > Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ``` 2. 
Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions: ```yml models: - name: customers description: | ### Lorem ipsum * dolor sit amet, consectetur adipisicing elit, sed do eiusmod * tempor incididunt ut labore et dolore magna aliqua. ``` 3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. How do I access documentation in dbt Catalog? If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state. Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project. dbt developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but doesn't offer the same speed, metadata, or visibility as Catalog. #### Commit your changes[​](#commit-your-changes "Direct link to Commit your changes") Now that you've built your customer model, you need to commit the changes you made to the project so that the repository has your latest code. **If you edited directly in the protected primary branch:**
1. Click the **Commit and sync git** button. This action prepares your changes for commit. 2. A modal titled **Commit to a new branch** will appear. 3. In the modal window, name your new branch `add-customers-model`. This branches off from your primary branch with your new changes. 4. Add a commit message, such as "Add customers model, tests, docs", and commit your changes. 5. Click **Merge this branch to main** to add these changes to the main branch on your repo. **If you created a new branch before editing:**
1. Since you already branched out of the primary protected branch, go to **Version Control** on the left. 2. Click **Commit and sync** to add a message. 3. Add a commit message, such as "Add customers model, tests, docs." 4. Click **Merge this branch to main** to add these changes to the main branch on your repo. #### Deploy dbt[​](#deploy-dbt "Direct link to Deploy dbt") Use dbt's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps. ##### Create a deployment environment[​](#create-a-deployment-environment "Direct link to Create a deployment environment") 1. From the main menu, go to **Orchestration** > **Environments**. 2. Click **Create environment**. 3. In the **Name** field, write the name of your deployment environment. For example, "Production." 4. The **dbt version** will default to the latest available. We recommend all new projects run on the latest version of dbt. 5. Under **Deployment connection**, enter the name of the dataset you want to use as the target, such as "Analytics". This will allow dbt to build and work with that dataset. For some data warehouses, the target dataset may be referred to as a "schema". 6. Click **Save**. ##### Create and run a job[​](#create-and-run-a-job "Direct link to Create and run a job") Jobs are a set of dbt commands that you want to run on a schedule. For example, `dbt build`. As the `jaffle_shop` business gains more customers, and those customers create more orders, you will see more records added to your source data. Because you materialized the `customers` model as a table, you'll need to periodically rebuild your table to ensure that the data stays up-to-date. This update will happen when you run a job. 1. After creating your deployment environment, you should be directed to the page for a new environment. If not, select **Orchestration** from the main menu, then click **Jobs**. 2. 
Click **Create job** > **Deploy job**. 3. Provide a job name (for example, "Production run") and select the environment you just created. 4. Scroll down to the **Execution settings** section. 5. Under **Commands**, add this command as part of your job if you don't see it: * `dbt build` 6. Select the **Generate docs on run** option to automatically [generate updated project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) each time your job runs. 7. For this exercise, do *not* set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as *deploying a project*. 8. Click **Save**, then click **Run now** to run your job. 9. Click the run and watch its progress under **Run summary**. 10. Once the run is complete, click **View Documentation** to see the docs for your project. Congratulations 🎉! You've just deployed your first dbt project! ###### FAQs[​](#faqs "Direct link to FAQs") What happens if one of my runs fails? If you're using dbt, we recommend setting up email and Slack notifications (`Account Settings > Notifications`) for any failed runs. Then, debug these runs the same way you would debug any runs in development. --- ### Quickstart for dbt and BigQuery [Back to guides](https://docs.getdbt.com/guides.md) BigQuery Platform Quickstart Beginner #### Introduction[​](#introduction "Direct link to Introduction") In this quickstart guide, you'll learn how to use dbt with BigQuery. It will show you how to: * Create a Google Cloud Platform (GCP) project. 
* Access sample data in a public dataset. * Connect dbt to BigQuery. * Take a sample query and turn it into a model in your dbt project. A model in dbt is a select statement. * Add tests to your models. * Document your models. * Schedule a job to run. Videos for you You can check out [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) for free if you're interested in course learning with videos. ##### Prerequisites​[​](#prerequisites "Direct link to Prerequisites​") * You have a [dbt account](https://www.getdbt.com/signup/). * You have a [Google account](https://support.google.com/accounts/answer/27441?hl=en). * You can use a personal or work account to set up BigQuery through [Google Cloud Platform (GCP)](https://cloud.google.com/free). ##### Related content[​](#related-content "Direct link to Related content") * Learn more with [dbt Learn courses](https://learn.getdbt.com) * [CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md) * [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) * [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) * [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) #### Create a new GCP project​[​](#create-a-new-gcp-project "Direct link to Create a new GCP project​") 1. Go to the [BigQuery Console](https://console.cloud.google.com/bigquery) after you log in to your Google account. If you have multiple Google accounts, make sure you’re using the correct one. 2. Create a new project from the [Manage resources page](https://console.cloud.google.com/projectcreate?previousPage=%2Fcloud-resource-manager%3Fwalkthrough_id%3Dresource-manager--create-project%26project%3D%26folder%3D%26organizationId%3D%23step_index%3D1\&walkthrough_id=resource-manager--create-project). For more information, refer to [Creating a project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#creating_a_project) in the Google Cloud docs. 
GCP automatically populates the Project name field for you. You can change it to be more descriptive for your use. For example, `dbt Learn - BigQuery Setup`.

#### Create BigQuery datasets

1. From the [BigQuery Console](https://console.cloud.google.com/bigquery), click **Editor**. Make sure to select your newly created project, which is available at the top of the page. 2. Verify that you can run SQL queries. Copy and paste these queries into the Query Editor:

```sql
select * from `dbt-tutorial.jaffle_shop.customers`;
select * from `dbt-tutorial.jaffle_shop.orders`;
select * from `dbt-tutorial.stripe.payment`;
```

Click **Run**, then check for results from the queries. For example: ![Bigquery Query Results](/img/bigquery/query-results.png?v=2 "Bigquery Query Results") 3. Create new datasets from the [BigQuery Console](https://console.cloud.google.com/bigquery). For more information, refer to [Create datasets](https://cloud.google.com/bigquery/docs/datasets#create-dataset) in the Google Cloud docs. Datasets in BigQuery are equivalent to schemas in a traditional database. On the **Create dataset** page: * **Dataset ID** — Enter a name that fits the purpose. This name is used like a schema in fully qualified references to your database objects, such as `database.schema.table`. As an example for this guide, create one for `jaffle_shop` and another one for `stripe` afterward. * **Data location** — Leave it blank (the default). It determines the GCP location of where your data is stored. The current default location is the US multi-region. All tables within this dataset will share this location. * **Enable table expiration** — Leave it unselected (the default). The default for the billing table expiration is 60 days. Because billing isn’t enabled for this project, GCP defaults to deprecating tables.
* **Google-managed encryption key** — This option is available under **Advanced options**. Allow Google to manage encryption (the default). [![Bigquery Create Dataset ID](/img/bigquery/create-dataset-id.png?v=2 "Bigquery Create Dataset ID")](#)Bigquery Create Dataset ID 4. After you create the `jaffle_shop` dataset, create one for `stripe` with all the same values except for **Dataset ID**. #### Generate BigQuery credentials[​](#generate-bigquery-credentials "Direct link to Generate BigQuery credentials") In order to let dbt connect to your warehouse, you'll need to generate a keyfile. This is analogous to using a database username and password with most other data warehouses. 1. Start the [GCP credentials wizard](https://console.cloud.google.com/apis/credentials/wizard). Make sure your new project is selected in the header. If you do not see your account or project, click your profile picture to the right and verify you are using the correct email account. For **Credential Type**: * From the **Select an API** dropdown, choose **BigQuery API** * Select **Application data** for the type of data you will be accessing * Click **Next** to create a new service account. 2. Create a service account for your new project from the [Service accounts page](https://console.cloud.google.com/projectselector2/iam-admin/serviceaccounts?supportedpurview=project). For more information, refer to [Create a service account](https://developers.google.com/workspace/guides/create-credentials#create_a_service_account) in the Google Cloud docs. As an example for this guide, you can: * Type `dbt-user` as the **Service account name** * From the **Select a role** dropdown, choose **BigQuery Job User** and **BigQuery Data Editor** roles and click **Continue** * Leave the **Grant users access to this service account** fields blank * Click **Done** 3. 
Create a service account key for your new project from the [Service accounts page](https://console.cloud.google.com/iam-admin/serviceaccounts?walkthrough_id=iam--create-service-account-keys\&start_index=1#step_index=1). For more information, refer to [Create a service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating) in the Google Cloud docs. When downloading the JSON file, make sure to use a filename you can easily remember. For example, `dbt-user-creds.json`. For security reasons, dbt Labs recommends that you protect this JSON file like you would your identity credentials; for example, don't check the JSON file into your version control software. #### Connect dbt to BigQuery​[​](#connect-dbt-to-bigquery "Direct link to Connect dbt to BigQuery​") 1. Create a new project in [dbt](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md). Navigate to **Account settings** (by clicking on your account name in the left side menu), and click **+ New project**. 2. Enter a project name and click **Continue**. 3. For the warehouse, click **BigQuery** then **Next** to set up your connection. 4. Click **Upload a Service Account JSON File** in settings. 5. Select the JSON file you downloaded in [Generate BigQuery credentials](#generate-bigquery-credentials) and dbt will fill in all the necessary fields. 6. Optional — dbt Enterprise plans can configure developer OAuth with BigQuery, providing an additional layer of security. For more information, refer to [Set up BigQuery OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-bigquery-oauth.md). 7. Click **Test Connection**. This verifies that dbt can access your BigQuery account. 8. Click **Next** if the test succeeded. If it failed, you might need to go back and regenerate your BigQuery credentials. 
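If you develop with dbt Core locally rather than in the cloud IDE, you can point dbt at the same service account keyfile from a `profiles.yml`. The following is a minimal sketch, not the steps above; the profile name, GCP project ID, and keyfile path are hypothetical placeholders you would replace with your own values:

```yaml
# ~/.dbt/profiles.yml — sketch for dbt Core users; all values below are assumptions
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account            # authenticate with the JSON keyfile
      project: your-gcp-project-id       # hypothetical: your GCP project ID
      dataset: jaffle_shop               # the BigQuery dataset dbt builds into
      keyfile: /path/to/dbt-user-creds.json
      threads: 4
```

With this in place, `dbt debug` from your project directory should report a successful connection.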
#### Set up a dbt managed repository

When you develop in dbt, you can leverage [Git](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) to version control your code. To connect to a repository, you can either set up a dbt-hosted [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or directly connect to a [supported git provider](https://docs.getdbt.com/docs/cloud/git/connect-github.md). Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md). To set up a managed repository: 1. Under "Setup a repository", select **Managed**. 2. Type a name for your repo, such as `bbaggins-dbt-quickstart`. 3. Click **Create**. It will take a few seconds for your repository to be created and imported. 4. Once you see the "Successfully imported repository" message, click **Continue**.

#### Initialize your dbt project and start developing

Now that you have a repository configured, you can initialize your project and start development in dbt: 1. Click **Start developing in the Studio IDE**. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse. 2. Above the file tree to the left, click **Initialize dbt project**. This builds out your folder structure with example models. 3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit` and click **Commit**. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code. 4.
You can now directly query data from your warehouse and execute `dbt run`. You can try this out now: * Click **+ Create new file**, add this query to the new file, and click **Save as** to save the new file:

```sql
select * from `dbt-tutorial.jaffle_shop.customers`
```

* In the command line bar at the bottom, enter `dbt run` and press **Enter**. You should see a `dbt run succeeded` message.

#### Build your first model

You have two options for working with files in the Studio IDE: * Create a new branch (recommended) — Create a new branch to edit and commit your changes. Navigate to **Version Control** on the left sidebar and click **Create branch**. * Edit in the protected primary branch — Edit, format, or lint files and execute dbt commands directly in your primary git branch. Because the Studio IDE prevents commits to the protected branch, you will be prompted to commit your changes to a new branch. Name the new branch `add-customers-model`. 1. Click the **...** next to the `models` directory, then select **Create file**. 2. Name the file `customers.sql`, then click **Create**. 3. Copy the following query into the file and click **Save**.

```sql
with customers as (

    select
        id as customer_id,
        first_name,
        last_name

    from `dbt-tutorial`.jaffle_shop.customers

),

orders as (

    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status

    from `dbt-tutorial`.jaffle_shop.orders

),

customer_orders as (

    select
        customer_id,
        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

)

select * from final
```

4.
Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run and see the three models. Later, you can connect your business intelligence (BI) tools to these views and tables so they read cleaned-up data rather than raw data.

###### FAQs

How can I see the SQL that dbt is running? To check out the SQL that dbt is running, you can look in: * dbt: * Within the run output, click on a model name, and then select "Details" * dbt Core: * The `target/compiled/` directory for compiled `select` statements * The `target/run/` directory for compiled `create` statements * The `logs/dbt.log` file for verbose logging. How did dbt choose which schema to build my models in? By default, dbt builds models in your target schema. To change your target schema: * If you're developing in **dbt**, these are set for each user when you first use a development environment. * If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file. If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md). Note: on BigQuery, `dataset` is used interchangeably with `schema`. Do I need to create my target schema before running dbt? Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you. If I rerun dbt, will there be any downtime as models are rebuilt? Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime). The implementation of this varies by warehouse; check out the [logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to see the SQL dbt is executing. What happens if the SQL in my query is bad or I get a database error? If there's a mistake in your SQL, dbt will return the error that your database returns.
```shell
$ dbt run --select customers
Running with dbt=1.9.0
Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources

14:04:12 | Concurrency: 1 threads (target='dev')
14:04:12 |
14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN]
14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s]
14:04:13 |
14:04:13 | Finished running 1 view model in 1.68s.

Completed with 1 error and 0 warnings:

Database Error in model customers (models/customers.sql)
  Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15]
  compiled SQL at target/run/jaffle_shop/customers.sql

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors.

#### Change the way your model is materialized

One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value. You can change things between tables and views by changing a keyword rather than writing the data definition language (DDL) to do this behind the scenes. By default, everything gets created as a view. You can override that at the directory level so everything in that directory will materialize to a different materialization. 1. Edit your `dbt_project.yml` file. * Update your project `name` to:

dbt\_project.yml

```yaml
name: 'jaffle_shop'
```

* Configure `jaffle_shop` so everything in it will be materialized as a table; and configure `example` so everything in it will be materialized as a view.
Update your `models` config in the project YAML file to:

dbt\_project.yml

```yaml
models:
  jaffle_shop:
    +materialized: table
    example:
      +materialized: view
```

* Click **Save**. 2. Enter the `dbt run` command. Your `customers` model should now be built as a table! Info: To do this, dbt had to first run a `drop view` statement (or API call on BigQuery), then a `create table as` statement. 3. Edit `models/customers.sql` to override the `dbt_project.yml` for the `customers` model only by adding the following snippet to the top, and click **Save**:

models/customers.sql

```sql
{{
  config(
    materialized='view'
  )
}}

with customers as (

    select id as customer_id
    ...

)
```

4. Enter the `dbt run` command. Your model, `customers`, should now build as a view. * BigQuery users need to run `dbt run --full-refresh` instead of `dbt run` to fully apply materialization changes. 5. Enter the `dbt run --full-refresh` command for this to take effect in your warehouse.

##### FAQs

What materializations are available in dbt? dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt. Which materialization should I use for my model? Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization. What model configurations exist?
You can also configure: * [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection * [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas * [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename * Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md) * Warehouse-specific configurations for performance (e.g. `sort` and `dist` keys on Redshift, `partitions` on BigQuery) Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more.

#### Delete the example models

You can now delete the files that dbt created when you initialized the project: 1. Delete the `models/example/` directory. 2. Delete the `example:` key from your `dbt_project.yml` file, and any configurations that are listed under it.

dbt\_project.yml

```yaml
# before
models:
  jaffle_shop:
    +materialized: table
    example:
      +materialized: view
```

dbt\_project.yml

```yaml
# after
models:
  jaffle_shop:
    +materialized: table
```

3. Save your changes.

###### FAQs

How do I remove deleted models from my data warehouse? If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from being a view or table to being ephemeral.) When you remove models from your dbt project, you should manually drop the related relations from your schema. I got an "unused model configurations" error message, what does this mean?
You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more.

#### Build models on top of other models

As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs). Now you can experiment by separating the logic out into separate models and using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function to build models on top of other models: ![The DAG we want for our dbt project](/img/dbt-dag.png?v=2 "The DAG we want for our dbt project") 1. Create a new SQL file, `models/stg_customers.sql`, with the SQL from the `customers` CTE in our original query. 2. Create a second new SQL file, `models/stg_orders.sql`, with the SQL from the `orders` CTE in our original query.

models/stg\_customers.sql

```sql
select
    id as customer_id,
    first_name,
    last_name

from `dbt-tutorial`.jaffle_shop.customers
```

models/stg\_orders.sql

```sql
select
    id as order_id,
    user_id as customer_id,
    order_date,
    status

from `dbt-tutorial`.jaffle_shop.orders
```

3.
Edit the SQL in your `models/customers.sql` file as follows:

models/customers.sql

```sql
with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

customer_orders as (

    select
        customer_id,
        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

)

select * from final
```

4. Execute `dbt run`. This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders` and `customers`. dbt inferred the order to run these models. Because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies.

#### Build models on top of sources

Sources make it possible to name and describe the data loaded into your warehouse by your extract and load tools. By declaring these tables as sources in dbt, you can: * select from source tables in your models using the `{{ source() }}` function, helping define the lineage of your data * test your assumptions about your source data * calculate the freshness of your source data 1. Create a new YML file `models/sources.yml`. 2. Declare the sources by copying the following into the file and clicking **Save**.

models/sources.yml

```yml
sources:
  - name: jaffle_shop
    description: This is a replica of the Postgres database used by our app
    database: dbt-tutorial
    schema: jaffle_shop
    tables:
      - name: customers
        description: One record per customer.
      - name: orders
        description: One record per order. Includes cancelled and deleted orders.
```

3. Edit the `models/stg_customers.sql` file to select from the `customers` table in the `jaffle_shop` source.

models/stg\_customers.sql

```sql
select
    id as customer_id,
    first_name,
    last_name

from {{ source('jaffle_shop', 'customers') }}
```

4. Edit the `models/stg_orders.sql` file to select from the `orders` table in the `jaffle_shop` source.

models/stg\_orders.sql

```sql
select
    id as order_id,
    user_id as customer_id,
    order_date,
    status

from {{ source('jaffle_shop', 'orders') }}
```

5. Execute `dbt run`. The results of your `dbt run` will be exactly the same as the previous step. Your `stg_customers` and `stg_orders` models will still query from the same raw data source in BigQuery. By using `source`, you can test and document your raw data and also understand the lineage of your sources.

###### FAQs

How do I run one model at a time? To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model:

```shell
$ dbt run --select customers
```

Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples. Do ref-able resource names need to be unique? Within one project: yes! To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function, and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*. A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference.
Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace. Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this. As I create more models, how should I keep my project organized? What should I name my models? There's no one best way to structure a project! Every organization is unique. If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md).

#### Add tests to your models

Adding [data tests](https://docs.getdbt.com/docs/build/data-tests.md) to a project helps validate that your models are working correctly. To add data tests to your project: 1. Create a new YAML file in the `models` directory, named `models/schema.yml`. 2. Add the following contents to the file:

models/schema.yml

```yaml
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        data_tests:
          - unique
          - not_null
  - name: stg_customers
    columns:
      - name: customer_id
        data_tests:
          - unique
          - not_null
  - name: stg_orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: status
        data_tests:
          - accepted_values:
              arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties.
                values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              arguments:
                to: ref('stg_customers')
                field: customer_id
```

3. Run `dbt test`, and confirm that all your tests passed. When you run `dbt test`, dbt iterates through your YAML files, and constructs a query for each test.
Each query will return the number of records that fail the test. If this number is 0, then the test is successful.

###### FAQs

What tests are available for me to use in dbt? Can I add my own custom tests? Out of the box, dbt ships with the following data tests: * `unique` * `not_null` * `accepted_values` * `relationships` (for example, referential integrity) You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project. How do I test one model at a time? Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model:

```shell
dbt test --select customers
```

Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular. One of my tests failed, how can I debug it? To debug a failing test, find the SQL that dbt ran by: * dbt: * Within the test output, click on the failed test, and then select "Details". * dbt Core: * Open the file path returned as part of the error message. * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries. Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed. Does my test file need to be named `schema.yml`? No!
You can name this file whatever you want (including `whatever_you_want.yml`), so long as: * The file is in your `models/` directory¹ * The file has a `.yml` extension Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information. ¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — `seeds/`, `snapshots/` and `macros/` respectively. Why do model and source YAML files always start with `version: 2`? Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible. From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported. Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional. Resource YAML files do not currently require this config. We only support `version: 2` if it's specified. Although we do not expect to update YAML files to `version: 3` soon, having this config will make it easier for us to introduce new structures in the future. What data tests should I add to my project? We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`. We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL. In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models. When should I run my data tests?
You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid).

#### Document your models

Adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project. Update your `models/schema.yml` file to include some descriptions, such as those below.

models/schema.yml

```yaml
version: 2

models:
  - name: customers
    description: One record per customer
    columns:
      - name: customer_id
        description: Primary key
        data_tests:
          - unique
          - not_null
      - name: first_order_date
        description: NULL when a customer has not yet placed an order.
  - name: stg_customers
    description: This model cleans up customer data
    columns:
      - name: customer_id
        description: Primary key
        data_tests:
          - unique
          - not_null
  - name: stg_orders
    description: This model cleans up order data
    columns:
      - name: order_id
        description: Primary key
        data_tests:
          - unique
          - not_null
      - name: status
        data_tests:
          - accepted_values:
              arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties.
                values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              arguments:
                to: ref('stg_customers')
                field: customer_id
```

* View in Catalog * View in Studio IDE [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) provides powerful tools to interact with your dbt projects, including documentation: 1. From the IDE, run one of the following commands: * `dbt docs generate` if you're on dbt Core * `dbt build` if you're on the dbt Fusion engine 2.
Click **Catalog** in the navigation menu to launch Catalog. 3. In the Catalog pane, click the environment selection dropdown menu at the top of the file tree and change it from **Production** to **Development**. [![View your development environment information.](/img/docs/collaborate/dbt-explorer/catalog-nav-dropdown.png?v=2 "View your development environment information.")](#)View your development environment information. 4. Select your project from the file tree. 5. Use the search bar or browse the resource list to find the `customers` model. 6. Click the model to view its details, including the descriptions you added. [![View your model's documentation and lineage in Catalog.](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2 "View your model's documentation and lineage in Catalog.")](#)View your model's documentation and lineage in Catalog. Catalog displays your model's description, column documentation, data tests, and lineage graph. You can also see which columns are missing documentation and track test coverage across your project. You can view docs directly from the IDE if you're on `Latest` or another version of dbt Core. Keep in mind that this is a legacy view and doesn't offer the same level of interactivity as Catalog. 1. In the IDE, run `dbt docs generate`. 2. From the navigation bar, click the **View docs** icon located to the right of the **branch name**. [![The View docs icon in the Studio IDE.](/img/docs/collaborate/dbt-explorer/docs-icon.png?v=2 "The View docs icon in the Studio IDE.")](#)The View docs icon in the Studio IDE. 3. From **Projects**, select your project name and expand the folders. 4. Click **models** > **marts** > **customers**. [![View your model's documentation in the legacy docs view.](/img/docs/collaborate/dbt-explorer/legacy-docs-view.png?v=2 "View your model's documentation in the legacy docs view.")](#)View your model's documentation in the legacy docs view. 
###### FAQs[​](#faqs "Direct link to FAQs") How do I write long-form explanations in my descriptions? If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions: ```yml models: - name: customers description: > Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ``` 2. Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions: ```yml models: - name: customers description: | ### Lorem ipsum * dolor sit amet, consectetur adipisicing elit, sed do eiusmod * tempor incididunt ut labore et dolore magna aliqua. ``` 3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. How do I access documentation in dbt Catalog? If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state. Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project. dbt Developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but doesn't offer the same speed, metadata, or visibility as Catalog. 
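The docs block option above keeps long descriptions out of your YAML. A minimal sketch, assuming an illustrative file `models/docs.md` and block name `customers_description` (both names are examples, not part of this guide):

```markdown
{% docs customers_description %}

One record per customer.

A customer may have placed many orders, so `number_of_orders`
counts **all** orders attributed to that customer.

{% enddocs %}
```

You would then reference the block from `models/schema.yml` with `description: '{{ doc("customers_description") }}'`.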
#### Commit your changes[​](#commit-your-changes "Direct link to Commit your changes") Now that you've built your customer model, you need to commit the changes you made to the project so that the repository has your latest code. **If you edited directly in the protected primary branch:**
1. Click the **Commit and sync git** button. This action prepares your changes for commit. 2. A modal titled **Commit to a new branch** will appear. 3. In the modal window, name your new branch `add-customers-model`. This branches off from your primary branch with your new changes. 4. Add a commit message, such as "Add customers model, tests, docs" and commit your changes. 5. Click **Merge this branch to main** to add these changes to the main branch on your repo. **If you created a new branch before editing:**
1. Since you already branched out of the primary protected branch, go to **Version Control** on the left. 2. Click **Commit and sync** to add a message. 3. Add a commit message, such as "Add customers model, tests, docs." 4. Click **Merge this branch to main** to add these changes to the main branch on your repo. #### Deploy dbt[​](#deploy-dbt "Direct link to Deploy dbt") Use dbt's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps. ##### Create a deployment environment[​](#create-a-deployment-environment "Direct link to Create a deployment environment") 1. From the main menu, go to **Orchestration** > **Environments**. 2. Click **Create environment**. 3. In the **Name** field, write the name of your deployment environment. For example, "Production." 4. The **dbt version** will default to the latest available. We recommend all new projects run on the latest version of dbt. 5. Under **Deployment connection**, enter the name of the dataset you want to use as the target, such as "Analytics". This will allow dbt to build and work with that dataset. For some data warehouses, the target dataset may be referred to as a "schema". 6. Click **Save**. ##### Create and run a job[​](#create-and-run-a-job "Direct link to Create and run a job") Jobs are a set of dbt commands that you want to run on a schedule. For example, `dbt build`. As the `jaffle_shop` business gains more customers, and those customers create more orders, you will see more records added to your source data. Because you materialized the `customers` model as a table, you'll need to periodically rebuild your table to ensure that the data stays up-to-date. This update will happen when you run a job. 1. After creating your deployment environment, you should be directed to the page for a new environment. If not, select **Orchestration** from the main menu, then click **Jobs**. 2. 
Click **Create job** > **Deploy job**. 3. Provide a job name (for example, "Production run") and select the environment you just created. 4. Scroll down to the **Execution settings** section. 5. Under **Commands**, add this command as part of your job if you don't see it: * `dbt build` 6. Select the **Generate docs on run** option to automatically [generate updated project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) each time your job runs. 7. For this exercise, do *not* set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as *deploying a project*. 8. Click **Save**, then click **Run now** to run your job. 9. Click the run and watch its progress under **Run summary**. 10. Once the run is complete, click **View Documentation** to see the docs for your project. Congratulations 🎉! You've just deployed your first dbt project! ###### FAQs[​](#faqs "Direct link to FAQs") What happens if one of my runs fails? If you're using dbt, we recommend setting up email and Slack notifications (`Account Settings > Notifications`) for any failed runs. Then, debug these runs the same way you would debug any runs in development. --- ### Quickstart for dbt and Databricks Platform Quickstart Databricks Beginner #### Introduction[​](#introduction "Direct link to Introduction") In this quickstart guide, you'll learn how to use dbt with Databricks. It will show you how to: * Create a Databricks workspace. 
* Load sample data into your Databricks account. * Connect dbt to Databricks. * Take a sample query and turn it into a model in your dbt project. A model in dbt is a select statement. * Add tests to your models. * Document your models. * Schedule a job to run. Videos for you You can check out [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) for free if you're interested in course learning with videos. ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * You have a [dbt account](https://www.getdbt.com/signup/). * You have an account with a cloud service provider (such as AWS, GCP, or Azure) and have permissions to create an S3 bucket with this account. For demonstration purposes, this guide uses AWS as the cloud service provider. ##### Related content[​](#related-content "Direct link to Related content") * Learn more with [dbt Learn courses](https://learn.getdbt.com) * [CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md) * [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) * [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) * [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) #### Create a Databricks workspace[​](#create-a-databricks-workspace "Direct link to Create a Databricks workspace") 1. Use your existing account or [sign up for a Databricks account](https://databricks.com/). Complete the form with your user information and click **Continue**. [![Sign up for Databricks](/img/databricks_tutorial/images/signup_form.png?v=2 "Sign up for Databricks")](#)Sign up for Databricks 2. On the next screen, select your cloud provider. This tutorial uses AWS as the cloud provider, but if you use Azure or GCP internally, please select your platform. The setup process will be similar. Do not select the **Get started with Community Edition** option, as this will not provide the required compute for this guide. 
[![Choose cloud provider](/img/databricks_tutorial/images/choose_provider.png?v=2 "Choose cloud provider")](#)Choose cloud provider 3. Check your email and complete the verification process. 4. After completing the verification process, you will be brought to the first setup screen. Databricks defaults to the `Premium` plan and you can change the trial to `Enterprise` on this page. [![Choose Databricks Plan](/img/databricks_tutorial/images/choose_plan.png?v=2 "Choose Databricks Plan")](#)Choose Databricks Plan 5. Now, it's time to create your first workspace. A Databricks workspace is an environment for accessing all of your Databricks assets. The workspace organizes objects like notebooks, SQL warehouses, clusters, and more into one place. Provide the name of your workspace, choose the appropriate AWS region, and click **Start Quickstart**. You might see a checkbox labeled **I have data in S3 that I want to query with Databricks**. You do not need to select it for this tutorial. [![Create AWS resources](/img/databricks_tutorial/images/start_quickstart.png?v=2 "Create AWS resources")](#)Create AWS resources 6. By clicking on `Start Quickstart`, you will be redirected to AWS and asked to log in if you haven’t already. After logging in, you should see a page similar to this. [![Create AWS resources](/img/databricks_tutorial/images/quick_create_stack.png?v=2 "Create AWS resources")](#)Create AWS resources tip If you get a session error and don’t get redirected to this page, you can go back to the Databricks UI and create a workspace from the interface. All you have to do is click **create workspaces**, choose the quickstart, fill out the form and click **Start Quickstart**. 7. There is no need to change any of the pre-filled fields in the Parameters. Just add in your Databricks password under **Databricks Account Credentials**. Check the acknowledgement box and click **Create stack**. 
[![Parameters](/img/databricks_tutorial/images/parameters.png?v=2 "Parameters")](#)Parameters [![Capabilities](/img/databricks_tutorial/images/create_stack.png?v=2 "Capabilities")](#)Capabilities 8. Go back to the Databricks tab. You should see that your workspace is ready to use. [![A Databricks Workspace](/img/databricks_tutorial/images/workspaces.png?v=2 "A Databricks Workspace")](#)A Databricks Workspace 9. Now let’s jump into the workspace. Click **Open** and log into the workspace using the same login as you used to log into the account. #### Load data[​](#load-data "Direct link to Load data") 1. Download these CSV files (the Jaffle Shop sample data) that you will need for this guide: * [jaffle\_shop\_customers.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/jaffle_shop_customers.csv) * [jaffle\_shop\_orders.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/jaffle_shop_orders.csv) * [stripe\_payments.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/stripe_payments.csv) 2. First, we need a SQL warehouse. Find the drop-down menu and toggle into the SQL space. [![SQL space](/img/databricks_tutorial/images/go_to_sql.png?v=2 "SQL space")](#)SQL space 3. We will be setting up a SQL warehouse now. Select **SQL Warehouses** from the left-hand console. You will see that a default SQL Warehouse exists. 4. Click **Start** on the Starter Warehouse. This will take a few minutes to get the necessary resources spun up. 5. Once the SQL Warehouse is up, click **New** and then **File upload** on the drop-down menu. [![New File Upload Using Databricks SQL](/img/databricks_tutorial/images/new_file_upload_using_databricks_SQL.png?v=2 "New File Upload Using Databricks SQL")](#)New File Upload Using Databricks SQL 6. Let's load the Jaffle Shop Customers data first. Drop the `jaffle_shop_customers.csv` file into the UI. 
[![Databricks Table Loader](/img/databricks_tutorial/images/databricks_table_loader.png?v=2 "Databricks Table Loader")](#)Databricks Table Loader 7. Update the Table Attributes at the top: * **data\_catalog** = hive\_metastore * **database** = default * **table** = jaffle\_shop\_customers * Make sure that the column data types are correct. You can do this by hovering over the data type icon next to the column name. * **ID** = bigint * **FIRST\_NAME** = string * **LAST\_NAME** = string [![Load jaffle shop customers](/img/databricks_tutorial/images/jaffle_shop_customers_upload.png?v=2 "Load jaffle shop customers")](#)Load jaffle shop customers 8. Click **Create** at the bottom once you’re done. 9. Now let’s do the same for `Jaffle Shop Orders` and `Stripe Payments`. [![Load jaffle shop orders](/img/databricks_tutorial/images/jaffle_shop_orders_upload.png?v=2 "Load jaffle shop orders")](#)Load jaffle shop orders [![Load stripe payments](/img/databricks_tutorial/images/stripe_payments_upload.png?v=2 "Load stripe payments")](#)Load stripe payments 10. Once that's done, make sure you can query the training data. Navigate to the `SQL Editor` through the left-hand menu. This will bring you to a query editor. 11. Ensure that you can run a `select *` from each of the tables with the following code snippets. ```sql select * from default.jaffle_shop_customers select * from default.jaffle_shop_orders select * from default.stripe_payments ``` [![Query Check](/img/databricks_tutorial/images/query_check.png?v=2 "Query Check")](#)Query Check 12. To ensure that any users working on your dbt project have access to these objects, run this command. ```sql grant all privileges on schema default to users; ``` #### Connect dbt to Databricks[​](#connect-dbt-to-databricks "Direct link to Connect dbt to Databricks") There are two ways to connect dbt to Databricks. 
The first option is Partner Connect, which provides a streamlined setup to create your dbt account from within your new Databricks trial account. The second option is to create your dbt account separately and build the Databricks connection yourself (connect manually). If you want to get started quickly, dbt Labs recommends using Partner Connect. If you want to customize your setup from the very beginning and gain familiarity with the dbt setup flow, dbt Labs recommends connecting manually. #### Set up the integration from Partner Connect[​](#set-up-the-integration-from-partner-connect "Direct link to Set up the integration from Partner Connect") note Partner Connect is intended for trial partner accounts. If your organization already has a dbt account, connect manually. Refer to [Connect to dbt manually](https://docs.databricks.com/partners/prep/dbt-cloud.html#connect-to-dbt-cloud-manually) in the Databricks docs for instructions. To connect dbt to Databricks using Partner Connect, do the following: 1. In the sidebar of your Databricks account, click **Partner Connect**. 2. Click the **dbt tile**. 3. Select a catalog from the drop-down list, and then click **Next**. The drop-down list displays catalogs you have read and write access to. If your workspace isn't Unity Catalog-enabled, the legacy Hive metastore (`hive_metastore`) is used. 4. If there are SQL warehouses in your workspace, select a SQL warehouse from the drop-down list. If your SQL warehouse is stopped, click **Start**. 5. If there are no SQL warehouses in your workspace: 1. Click **Create warehouse**. A new tab opens in your browser that displays the **New SQL Warehouse** page in the Databricks SQL UI. 2. Follow the steps in [Create a SQL warehouse](https://docs.databricks.com/en/sql/admin/create-sql-warehouse.html#create-a-sql-warehouse) in the Databricks docs. 3. Return to the Partner Connect tab in your browser, and then close the **dbt tile**. 4. Re-open the **dbt tile**. 5. 
Select the SQL warehouse you just created from the drop-down list. 6. Select a schema from the drop-down list, and then click **Add**. The drop-down list displays schemas you have read and write access to. You can repeat this step to add multiple schemas. Partner Connect creates the following resources in your workspace: * A Databricks service principal named **DBT\_CLOUD\_USER**. * A Databricks personal access token that is associated with the **DBT\_CLOUD\_USER** service principal. Partner Connect also grants the following privileges to the **DBT\_CLOUD\_USER** service principal: * (Unity Catalog) **USE CATALOG**: Required to interact with objects within the selected catalog. * (Unity Catalog) **USE SCHEMA**: Required to interact with objects within the selected schema. * (Unity Catalog) **CREATE SCHEMA**: Grants the ability to create schemas in the selected catalog. * (Hive metastore) **USAGE**: Required to grant the **SELECT** and **READ\_METADATA** privileges for the schemas you selected. * **SELECT**: Grants the ability to read the schemas you selected. * (Hive metastore) **READ\_METADATA**: Grants the ability to read metadata for the schemas you selected. * **CAN\_USE**: Grants permissions to use the SQL warehouse you selected. 7. Click **Next**. The **Email** box displays the email address for your Databricks account. dbt Labs uses this email address to prompt you to create a trial dbt account. 8. Click **Connect to dbt**. A new tab opens in your web browser, which displays the getdbt.com website. 9. Complete the on-screen instructions on the getdbt.com website to create your trial dbt account. #### Set up a dbt managed repository[​](#set-up-a-dbt-managed-repository "Direct link to Set up a dbt managed repository") When you develop in dbt, you can leverage [Git](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) to version control your code. 
To connect to a repository, you can either set up a dbt-hosted [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or directly connect to a [supported git provider](https://docs.getdbt.com/docs/cloud/git/connect-github.md). Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md). To set up a managed repository: 1. Under "Setup a repository", select **Managed**. 2. Type a name for your repo, such as `bbaggins-dbt-quickstart`. 3. Click **Create**. It will take a few seconds for your repository to be created and imported. 4. Once you see the "Successfully imported repository" message, click **Continue**. #### Initialize your dbt project and start developing[​](#initialize-your-dbt-project-and-start-developing "Direct link to Initialize your dbt project and start developing") Now that you have a repository configured, you can initialize your project and start development in dbt: 1. Click **Start developing in the Studio IDE**. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse. 2. Above the file tree to the left, click **Initialize dbt project**. This builds out your folder structure with example models. 3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit` and click **Commit**. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code. 4. You can now directly query data from your warehouse and execute `dbt run`. 
You can try this out now: * Click **+ Create new file**, add this query to the new file, and click **Save as** to save the new file: ```sql select * from default.jaffle_shop_customers ``` * In the command line bar at the bottom, enter `dbt run` and press **Enter**. You should see a `dbt run succeeded` message. #### Build your first model[​](#build-your-first-model "Direct link to Build your first model") You have two options for working with files in the Studio IDE: * Create a new branch (recommended) — Create a new branch to edit and commit your changes. Navigate to **Version Control** on the left sidebar and click **Create branch**. * Edit in the protected primary branch — Edit, format, or lint files and execute dbt commands directly in your primary git branch. The Studio IDE prevents commits to the protected branch, so you will be prompted to commit your changes to a new branch. Name the new branch `add-customers-model`. 1. Click the **...** next to the `models` directory, then select **Create file**. 2. Name the file `customers.sql`, then click **Create**. 3. Copy the following query into the file and click **Save**. ```sql with customers as ( select id as customer_id, first_name, last_name from jaffle_shop_customers ), orders as ( select id as order_id, user_id as customer_id, order_date, status from jaffle_shop_orders ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` 4. Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run and see the three models. 
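If you want to spot-check the result, you can query the new model from the Databricks SQL editor. The schema dbt built into depends on your development credentials; `your_dev_schema` below is a placeholder, not a name from this guide:

```sql
-- your_dev_schema is a placeholder; check the dbt run output
-- for the schema your models were built in.
select
    customer_id,
    first_order_date,
    number_of_orders
from your_dev_schema.customers
order by number_of_orders desc
limit 10
```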
Later, you can connect your business intelligence (BI) tools to these views and tables so that they read cleaned-up data rather than raw data. ###### FAQs[​](#faqs "Direct link to FAQs") How can I see the SQL that dbt is running? To check out the SQL that dbt is running, you can look in: * dbt: * Within the run output, click on a model name, and then select "Details" * dbt Core: * The `target/compiled/` directory for compiled `select` statements * The `target/run/` directory for compiled `create` statements * The `logs/dbt.log` file for verbose logging. How did dbt choose which schema to build my models in? By default, dbt builds models in your target schema. To change your target schema: * If you're developing in **dbt**, these are set for each user when you first use a development environment. * If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file. If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md). Note: on BigQuery, `dataset` is used interchangeably with `schema`. Do I need to create my target schema before running dbt? Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you. If I rerun dbt, will there be any downtime as models are rebuilt? Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime). The implementation of this varies by warehouse; check out the [logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to see the SQL dbt is executing. What happens if the SQL in my query is bad or I get a database error? If there's a mistake in your SQL, dbt will return the error that your database returns. 
```shell $ dbt run --select customers Running with dbt=1.9.0 Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources 14:04:12 | Concurrency: 1 threads (target='dev') 14:04:12 | 14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN] 14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s] 14:04:13 | 14:04:13 | Finished running 1 view model in 1.68s. Completed with 1 error and 0 warnings: Database Error in model customers (models/customers.sql) Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15] compiled SQL at target/run/jaffle_shop/customers.sql Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1 ``` Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors. #### Change the way your model is materialized[​](#change-the-way-your-model-is-materialized "Direct link to Change the way your model is materialized") One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value. You can switch between tables and views by changing a keyword; dbt writes the data definition language (DDL) to do this behind the scenes. By default, everything gets created as a view. You can override that at the directory level so that everything in that directory uses a different materialization. 1. Edit your `dbt_project.yml` file. * Update your project `name` to: dbt\_project.yml ```yaml name: 'jaffle_shop' ``` * Configure `jaffle_shop` so everything in it will be materialized as a table; and configure `example` so everything in it will be materialized as a view. 
Update your `models` config in the project YAML file to: dbt\_project.yml ```yaml models: jaffle_shop: +materialized: table example: +materialized: view ``` * Click **Save**. 2. Enter the `dbt run` command. Your `customers` model should now be built as a table! info To do this, dbt had to first run a `drop view` statement (or API call on BigQuery), then a `create table as` statement. 3. Edit `models/customers.sql` to override the `dbt_project.yml` for the `customers` model only by adding the following snippet to the top, and click **Save**: models/customers.sql ```sql {{ config( materialized='view' ) }} with customers as ( select id as customer_id ... ) ``` 4. Enter the `dbt run` command. Your model, `customers`, should now build as a view. * BigQuery users need to run `dbt run --full-refresh` instead of `dbt run` to fully apply materialization changes. 5. Enter the `dbt run --full-refresh` command for this to take effect in your warehouse. ##### FAQs[​](#faqs "Direct link to FAQs") What materializations are available in dbt? dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt. Which materialization should I use for my model? Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization. What model configurations exist? 
You can also configure: * [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection * [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas * [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename * Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md) * Warehouse-specific configurations for performance (e.g. `sort` and `dist` keys on Redshift, `partitions` on BigQuery) Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more. #### Delete the example models[​](#delete-the-example-models "Direct link to Delete the example models") You can now delete the files that dbt created when you initialized the project: 1. Delete the `models/example/` directory. 2. Delete the `example:` key from your `dbt_project.yml` file, and any configurations that are listed under it. dbt\_project.yml ```yaml # before models: jaffle_shop: +materialized: table example: +materialized: view ``` dbt\_project.yml ```yaml # after models: jaffle_shop: +materialized: table ``` 3. Save your changes. ###### FAQs[​](#faqs "Direct link to FAQs") How do I remove deleted models from my data warehouse? If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from a view or table to ephemeral.) When you remove models from your dbt project, you should manually drop the related relations from your schema. I got an "unused model configurations" error message, what does this mean? 
You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more. #### Build models on top of other models[​](#build-models-on-top-of-other-models "Direct link to Build models on top of other models") As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs). Now you can experiment by separating the logic out into separate models and using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function to build models on top of other models: [![The DAG we want for our dbt project](/img/dbt-dag.png?v=2 "The DAG we want for our dbt project")](#)The DAG we want for our dbt project 1. Create a new SQL file, `models/stg_customers.sql`, with the SQL from the `customers` CTE in our original query. 2. Create a second new SQL file, `models/stg_orders.sql`, with the SQL from the `orders` CTE in our original query. models/stg\_customers.sql ```sql select id as customer_id, first_name, last_name from jaffle_shop_customers ``` models/stg\_orders.sql ```sql select id as order_id, user_id as customer_id, order_date, status from jaffle_shop_orders ``` 3. 
Edit the SQL in your `models/customers.sql` file as follows: models/customers.sql ```sql with customers as ( select * from {{ ref('stg_customers') }} ), orders as ( select * from {{ ref('stg_orders') }} ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` 4. Execute `dbt run`. This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders` and `customers`. dbt inferred the order to run these models. Because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies. ###### FAQs[​](#faq-2 "Direct link to FAQs") How do I run one model at a time? To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell $ dbt run --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples. Do ref-able resource names need to be unique? Within one project: yes! To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function, and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*. A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. 
We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace. Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this. As I create more models, how should I keep my project organized? What should I name my models? There's no one best way to structure a project! Every organization is unique. If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md). #### Add tests to your models[​](#add-tests-to-your-models "Direct link to Add tests to your models") Adding [data tests](https://docs.getdbt.com/docs/build/data-tests.md) to a project helps validate that your models are working correctly. To add data tests to your project: 1. Create a new YAML file in the `models` directory, named `models/schema.yml`. 2. Add the following contents to the file: models/schema.yml ```yaml version: 2 models: - name: customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_orders columns: - name: order_id data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties.
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` 3. Run `dbt test`, and confirm that all your tests passed. When you run `dbt test`, dbt iterates through your YAML files, and constructs a query for each test. Each query will return the number of records that fail the test. If this number is 0, then the test is successful. ###### FAQs[​](#faqs "Direct link to FAQs") What tests are available for me to use in dbt? Can I add my own custom tests? Out of the box, dbt ships with the following data tests: * `unique` * `not_null` * `accepted_values` * `relationships` (for example, referential integrity) You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project. How do I test one model at a time? Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell dbt test --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular. One of my tests failed, how can I debug it? To debug a failing test, find the SQL that dbt ran by: * dbt: * Within the test output, click on the failed test, and then select "Details". * dbt Core: * Open the file path returned as part of the error message. * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries. 
Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed. Does my test file need to be named \`schema.yml\`? No! You can name this file whatever you want (including `whatever_you_want.yml`), so long as: * The file is in your `models/` directory¹ * The file has a `.yml` extension Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information. ¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — `seeds/`, `snapshots/` and `macros/` respectively. Why do model and source YAML files always start with \`version: 2\`? Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible. From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported. Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional. Although we do not expect to update YAML files to `version: 3` soon, having this key will make it easier for us to introduce new structures in the future. What data tests should I add to my project? We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`. We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL.
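The payment-method assumption above can be written as an `accepted_values` test. The following is a sketch only: the `payments` model, its `payment_method` column, and the three method names are illustrative, not part of this project.

```yaml
version: 2

models:
  - name: payments               # hypothetical model name
    columns:
      - name: payment_method
        data_tests:
          - not_null
          - accepted_values:
              arguments:         # available in v1.10.5 and higher
                values: ['credit_card', 'bank_transfer', 'gift_card']
```

If a fourth payment method ever appears in the source data, `dbt test` will report the rows containing the unexpected value, surfacing the broken assumption before it causes logic errors downstream.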
In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models. When should I run my data tests? You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid). #### Document your models[​](#document-your-models "Direct link to Document your models") Adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project. Update your `models/schema.yml` file to include some descriptions, such as those below. models/schema.yml ```yaml version: 2 models: - name: customers description: One record per customer columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: first_order_date description: NULL when a customer has not yet placed an order. - name: stg_customers description: This model cleans up customer data columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: stg_orders description: This model cleans up order data columns: - name: order_id description: Primary key data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties.
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` * View in Catalog * View in Studio IDE [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) provides powerful tools to interact with your dbt projects, including documentation: 1. From the IDE, run one of the following commands: * `dbt docs generate` if you're on dbt Core * `dbt build` if you're on the dbt Fusion engine 2. Click **Catalog** in the navigation menu to launch Catalog. 3. In the Catalog pane, click the environment selection dropdown menu at the top of the file tree and change it from **Production** to **Development**. [![View your development environment information.](/img/docs/collaborate/dbt-explorer/catalog-nav-dropdown.png?v=2 "View your development environment information.")](#)View your development environment information. 4. Select your project from the file tree. 5. Use the search bar or browse the resource list to find the `customers` model. 6. Click the model to view its details, including the descriptions you added. [![View your model's documentation and lineage in Catalog.](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2 "View your model's documentation and lineage in Catalog.")](#)View your model's documentation and lineage in Catalog. Catalog displays your model's description, column documentation, data tests, and lineage graph. You can also see which columns are missing documentation and track test coverage across your project. You can view docs directly from the IDE if you're on `Latest` or another version of dbt Core. Keep in mind that this is a legacy view and doesn't offer the same level of interactivity as Catalog. 1. In the IDE, run `dbt docs generate`. 2. From the navigation bar, click the **View docs** icon located to the right of the **branch name**. 
[![The View docs icon in the Studio IDE.](/img/docs/collaborate/dbt-explorer/docs-icon.png?v=2 "The View docs icon in the Studio IDE.")](#)The View docs icon in the Studio IDE. 3. From **Projects**, select your project name and expand the folders. 4. Click **models** > **marts** > **customers**. [![View your model's documentation in the legacy docs view.](/img/docs/collaborate/dbt-explorer/legacy-docs-view.png?v=2 "View your model's documentation in the legacy docs view.")](#)View your model's documentation in the legacy docs view. ###### FAQs[​](#faqs "Direct link to FAQs") How do I write long-form explanations in my descriptions? If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions: ```yml models: - name: customers description: > Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ``` 2. Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions: ```yml models: - name: customers description: | ### Lorem ipsum * dolor sit amet, consectetur adipisicing elit, sed do eiusmod * tempor incididunt ut labore et dolore magna aliqua. ``` 3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. How do I access documentation in dbt Catalog? 
If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state. Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project. dbt Developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but doesn't offer the same speed, metadata, or visibility as Catalog. #### Commit your changes[​](#commit-your-changes "Direct link to Commit your changes") Now that you've built your customer model, you need to commit the changes you made to the project so that the repository has your latest code. **If you edited directly in the protected primary branch:**
1. Click the **Commit and sync git** button. This action prepares your changes for commit. 2. A modal titled **Commit to a new branch** will appear. 3. In the modal window, name your new branch `add-customers-model`. This branches off from your primary branch with your new changes. 4. Add a commit message, such as "Add customers model, tests, docs" and commit your changes. 5. Click **Merge this branch to main** to add these changes to the main branch on your repo. **If you created a new branch before editing:**
1. Since you already branched out of the primary protected branch, go to **Version Control** on the left. 2. Click **Commit and sync** to add a message. 3. Add a commit message, such as "Add customers model, tests, docs." 4. Click **Merge this branch to main** to add these changes to the main branch on your repo. #### Deploy dbt[​](#deploy-dbt "Direct link to Deploy dbt") Use dbt's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps. ##### Create a deployment environment[​](#create-a-deployment-environment "Direct link to Create a deployment environment") 1. From the main menu, go to **Orchestration** > **Environments**. 2. Click **Create environment**. 3. In the **Name** field, write the name of your deployment environment. For example, "Production." 4. The **dbt version** will default to the latest available. We recommend all new projects run on the latest version of dbt. 5. Under **Deployment connection**, enter the name of the dataset you want to use as the target, such as "Analytics". This will allow dbt to build and work with that dataset. For some data warehouses, the target dataset may be referred to as a "schema". 6. Click **Save**. ##### Create and run a job[​](#create-and-run-a-job "Direct link to Create and run a job") Jobs are a set of dbt commands that you want to run on a schedule. For example, `dbt build`. As the `jaffle_shop` business gains more customers, and those customers create more orders, you will see more records added to your source data. Because you materialized the `customers` model as a table, you'll need to periodically rebuild your table to ensure that the data stays up-to-date. This update will happen when you run a job. 1. After creating your deployment environment, you should be directed to the page for a new environment. If not, select **Orchestration** from the main menu, then click **Jobs**. 2. 
Click **Create job** > **Deploy job**. 3. Provide a job name (for example, "Production run") and select the environment you just created. 4. Scroll down to the **Execution settings** section. 5. Under **Commands**, add this command as part of your job if you don't see it: * `dbt build` 6. Select the **Generate docs on run** option to automatically [generate updated project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) each time your job runs. 7. For this exercise, do *not* set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as *deploying a project*. 8. Click **Save**, then click **Run now** to run your job. 9. Click the run and watch its progress under **Run summary**. 10. Once the run is complete, click **View Documentation** to see the docs for your project. Congratulations 🎉! You've just deployed your first dbt project! ###### FAQs[​](#faqs "Direct link to FAQs") What happens if one of my runs fails? If you're using dbt, we recommend setting up email and Slack notifications (`Account Settings > Notifications`) for any failed runs. Then, debug these runs the same way you would debug any runs in development. --- ### Quickstart for dbt and Microsoft Fabric Platform Quickstart Beginner #### Introduction[​](#introduction "Direct link to Introduction") In this quickstart guide, you'll learn how to use dbt with [Microsoft Fabric](https://www.microsoft.com/en-us/microsoft-fabric).
It will show you how to: * Load the Jaffle Shop sample data (provided by dbt Labs) into your Microsoft Fabric warehouse. * Connect dbt to Microsoft Fabric. * Turn a sample query into a model in your dbt project. A model in dbt is a SELECT statement. * Add tests to your models. * Document your models. * Schedule a job to run. ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * You have a [dbt](https://www.getdbt.com/signup/) account. * You have started the Microsoft Fabric (Preview) trial. For details, refer to [Microsoft Fabric (Preview) trial](https://learn.microsoft.com/en-us/fabric/get-started/fabric-trial) in the Microsoft docs. * As a Microsoft admin, you’ve enabled service principal authentication. You must add the service principal to the Microsoft Fabric workspace with either a Member (recommended) or Admin permission set. The service principal must also have `CONNECT` privileges to the database in the warehouse. For details, refer to [Enable service principal authentication](https://learn.microsoft.com/en-us/fabric/admin/metadata-scanning-enable-read-only-apis) in the Microsoft docs. dbt needs these authentication credentials to connect to Microsoft Fabric. ##### Related content[​](#related-content "Direct link to Related content") * [dbt Learn courses](https://learn.getdbt.com) * [About continuous integration jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md) * [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) * [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) * [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) #### Load data into your Microsoft Fabric warehouse[​](#load-data-into-your-microsoft-fabric-warehouse "Direct link to Load data into your Microsoft Fabric warehouse") 1. Log in to your [Microsoft Fabric](http://app.fabric.microsoft.com) account. 2. On the home page, select the **Synapse Data Warehouse** tile.
[![Example of the Synapse Data Warehouse tile](/img/quickstarts/dbt-cloud/example-start-fabric.png?v=2 "Example of the Synapse Data Warehouse tile")](#)Example of the Synapse Data Warehouse tile 3. From **Workspaces** on the left sidebar, navigate to your organization’s workspace. Or, you can create a new workspace; refer to [Create a workspace](https://learn.microsoft.com/en-us/fabric/get-started/create-workspaces) in the Microsoft docs for more details. 4. Choose your warehouse from the table. Or, you can create a new warehouse; refer to [Create a warehouse](https://learn.microsoft.com/en-us/fabric/data-warehouse/tutorial-create-warehouse) in the Microsoft docs for more details. 5. Open the SQL editor by selecting **New SQL query** from the top bar. 6. Copy these statements into the SQL editor to load the Jaffle Shop example data: ```sql DROP TABLE dbo.customers; CREATE TABLE dbo.customers ( [ID] [int], [FIRST_NAME] [varchar](8000), [LAST_NAME] [varchar](8000) ); COPY INTO [dbo].[customers] FROM 'https://dbtlabsynapsedatalake.blob.core.windows.net/dbt-quickstart-public/jaffle_shop_customers.parquet' WITH ( FILE_TYPE = 'PARQUET' ); DROP TABLE dbo.orders; CREATE TABLE dbo.orders ( [ID] [int], [USER_ID] [int], -- [ORDER_DATE] [int], [ORDER_DATE] [date], [STATUS] [varchar](8000) ); COPY INTO [dbo].[orders] FROM 'https://dbtlabsynapsedatalake.blob.core.windows.net/dbt-quickstart-public/jaffle_shop_orders.parquet' WITH ( FILE_TYPE = 'PARQUET' ); DROP TABLE dbo.payments; CREATE TABLE dbo.payments ( [ID] [int], [ORDERID] [int], [PAYMENTMETHOD] [varchar](8000), [STATUS] [varchar](8000), [AMOUNT] [int], [CREATED] [date] ); COPY INTO [dbo].[payments] FROM 'https://dbtlabsynapsedatalake.blob.core.windows.net/dbt-quickstart-public/stripe_payments.parquet' WITH ( FILE_TYPE = 'PARQUET' ); ``` [![Example of loading data](/img/quickstarts/dbt-cloud/example-load-data-ms-fabric.png?v=2 "Example of loading data")](#)Example of loading data #### Connect dbt to Microsoft Fabric[​](#connect-dbt-to-microsoft-fabric "Direct link to Connect dbt to Microsoft Fabric") 1. Create a new project in dbt. Navigate to **Account settings** (by clicking on your account name in the left side menu), and click **+ New Project**. 2. Enter a project name and click **Continue**. 3. Choose **Fabric** as your connection and click **Next**. 4. In the **Configure your environment** section, enter the **Settings** for your new project: * **Server** — Use the service principal's **host** value for the Fabric test endpoint. * **Port** — 1433 (which is the default). * **Database** — Use the service principal's **database** value for the Fabric test endpoint. 5. Enter the **Development credentials** for your new project: * **Authentication** — Choose **Service Principal** from the dropdown. * **Tenant ID** — Use the service principal’s **Directory (tenant) ID** as the value. * **Client ID** — Use the service principal’s **application (client) ID** as the value. * **Client secret** — Use the service principal’s **client secret** (not the **client secret ID**) as the value. 6. Click **Test connection**. This verifies that dbt can access your Microsoft Fabric account. 7. Click **Next** when the test succeeds. If it failed, you might need to check your Microsoft service principal. Ensure that the principal has `CONNECT` privileges to the database in the warehouse. #### Set up a dbt managed repository[​](#set-up-a-dbt-managed-repository "Direct link to Set up a dbt managed repository") When you develop in dbt, you can leverage [Git](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) to version control your code. To connect to a repository, you can either set up a dbt-hosted [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or directly connect to a [supported git provider](https://docs.getdbt.com/docs/cloud/git/connect-github.md).
Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md). To set up a managed repository: 1. Under "Setup a repository", select **Managed**. 2. Type a name for your repo such as `bbaggins-dbt-quickstart` 3. Click **Create**. It will take a few seconds for your repository to be created and imported. 4. Once you see the "Successfully imported repository," click **Continue**. #### Initialize your dbt project​ and start developing[​](#initialize-your-dbt-project-and-start-developing "Direct link to Initialize your dbt project​ and start developing") Now that you have a repository configured, you can initialize your project and start development in dbt: 1. Click **Start developing in the Studio IDE**. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse. 2. Above the file tree to the left, click **Initialize dbt project**. This builds out your folder structure with example models. 3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit` and click **Commit**. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code. 4. You can now directly query data from your warehouse and execute `dbt run`. You can try this out now: * In the command line bar at the bottom, enter `dbt run` and click **Enter**. You should see a `dbt run succeeded` message. #### Build your first model[​](#build-your-first-model "Direct link to Build your first model") You have two options for working with files in the Studio IDE: * Create a new branch (recommended) — Create a new branch to edit and commit your changes. 
Navigate to **Version Control** on the left sidebar and click **Create branch**. * Edit in the protected primary branch — Use this option if you prefer to edit, format, or lint files and execute dbt commands directly in your primary git branch. The Studio IDE prevents commits to the protected branch, so you will be prompted to commit your changes to a new branch. Name the new branch `add-customers-model`. 1. Click the **...** next to the `models` directory, then select **Create file**. 2. Name the file `dim_customers.sql`, then click **Create**. 3. Copy the following query into the file and click **Save**. dim\_customers.sql ```sql with customers as ( select ID as customer_id, FIRST_NAME as first_name, LAST_NAME as last_name from dbo.customers ), orders as ( select ID as order_id, USER_ID as customer_id, ORDER_DATE as order_date, STATUS as status from dbo.orders ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by customer_id ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders on customers.customer_id = customer_orders.customer_id ) select * from final ``` 4. Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run and see the three models. Later, you can connect your business intelligence (BI) tools to these views and tables so they only read cleaned up data rather than raw data. ###### FAQs[​](#faqs "Direct link to FAQs") How can I see the SQL that dbt is running?
To check out the SQL that dbt is running, you can look in: * dbt: * Within the run output, click on a model name, and then select "Details" * dbt Core: * The `target/compiled/` directory for compiled `select` statements * The `target/run/` directory for compiled `create` statements * The `logs/dbt.log` file for verbose logging. How did dbt choose which schema to build my models in? By default, dbt builds models in your target schema. To change your target schema: * If you're developing in **dbt**, these are set for each user when you first use a development environment. * If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file. If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md). Note: on BigQuery, `dataset` is used interchangeably with `schema`. Do I need to create my target schema before running dbt? Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you. If I rerun dbt, will there be any downtime as models are rebuilt? Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime). The implementation of this varies by warehouse; check out the [logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to see the SQL dbt is executing. What happens if the SQL in my query is bad or I get a database error? If there's a mistake in your SQL, dbt will return the error that your database returns. ```shell $ dbt run --select customers Running with dbt=1.9.0 Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources 14:04:12 | Concurrency: 1 threads (target='dev') 14:04:12 | 14:04:12 | 1 of 1 START view model dbt_alice.customers..........................
[RUN] 14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s] 14:04:13 | 14:04:13 | Finished running 1 view model in 1.68s. Completed with 1 error and 0 warnings: Database Error in model customers (models/customers.sql) Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15] compiled SQL at target/run/jaffle_shop/customers.sql Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1 ``` Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors. #### Change the way your model is materialized[​](#change-the-way-your-model-is-materialized "Direct link to Change the way your model is materialized") One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value. You can switch a model between a table and a view by changing a keyword, rather than writing the data definition language (DDL) yourself; dbt handles that behind the scenes. By default, everything gets created as a view. You can override that default at the directory level, so that everything in a given directory uses a different materialization. 1. Edit your `dbt_project.yml` file. * Update your project `name` to: dbt\_project.yml ```yaml name: 'jaffle_shop' ``` * Configure `jaffle_shop` so everything in it will be materialized as a table; and configure `example` so everything in it will be materialized as a view. Update your `models` config in the project YAML file to: dbt\_project.yml ```yaml models: jaffle_shop: +materialized: table example: +materialized: view ``` * Click **Save**. 2. Enter the `dbt run` command. Your `customers` model should now be built as a table! info To do this, dbt had to first run a `drop view` statement (or API call on BigQuery), then a `create table as` statement. 3.
Edit `models/customers.sql` to override the `dbt_project.yml` for the `customers` model only by adding the following snippet to the top, and click **Save**: models/customers.sql ```sql {{ config( materialized='view' ) }} with customers as ( select id as customer_id ... ) ``` 4. Enter the `dbt run` command. Your model, `customers`, should now build as a view. * BigQuery users need to run `dbt run --full-refresh` instead of `dbt run` to fully apply materialization changes. 5. Enter the `dbt run --full-refresh` command for this to take effect in your warehouse. ##### FAQs[​](#faqs "Direct link to FAQs") What materializations are available in dbt? dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt. Which materialization should I use for my model? Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization. What model configurations exist?
You can also configure: * [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection * [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas * [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename * Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md) * Warehouse-specific configurations for performance (e.g. `sort` and `dist` keys on Redshift, `partitions` on BigQuery) Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more. #### Delete the example models[​](#delete-the-example-models "Direct link to Delete the example models") You can now delete the files that dbt created when you initialized the project: 1. Delete the `models/example/` directory. 2. Delete the `example:` key from your `dbt_project.yml` file, and any configurations that are listed under it. dbt\_project.yml ```yaml # before models: jaffle_shop: +materialized: table example: +materialized: view ``` dbt\_project.yml ```yaml # after models: jaffle_shop: +materialized: table ``` 3. Save your changes. ###### FAQs[​](#faqs "Direct link to FAQs") How do I remove deleted models from my data warehouse? If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from being a view or table to being ephemeral.) When you remove models from your dbt project, you should manually drop the related relations from your schema. I got an "unused model configurations" error message, what does this mean?
You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more. #### Build models on top of other models[​](#build-models-on-top-of-other-models "Direct link to Build models on top of other models") As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs). Now you can experiment by separating the logic out into separate models and using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function to build models on top of other models: [![The DAG we want for our dbt project](/img/dbt-dag.png?v=2 "The DAG we want for our dbt project")](#)The DAG we want for our dbt project 1. Create a new SQL file, `models/stg_customers.sql`, with the SQL from the `customers` CTE in our original query. 2. Create a second new SQL file, `models/stg_orders.sql`, with the SQL from the `orders` CTE in our original query. models/stg\_customers.sql ```sql select ID as customer_id, FIRST_NAME as first_name, LAST_NAME as last_name from dbo.customers ``` models/stg\_orders.sql ```sql select ID as order_id, USER_ID as customer_id, ORDER_DATE as order_date, STATUS as status from dbo.orders ``` 3. 
Edit the SQL in your `models/customers.sql` file as follows: models/customers.sql ```sql with customers as ( select * from {{ ref('stg_customers') }} ), orders as ( select * from {{ ref('stg_orders') }} ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by customer_id ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders on customers.customer_id = customer_orders.customer_id ) select * from final ``` 4. Execute `dbt run`. This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders` and `customers`. dbt inferred the order to run these models. Because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies. ###### FAQs[​](#faq-2 "Direct link to FAQs") How do I run one model at a time? To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell $ dbt run --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples. Do ref-able resource names need to be unique? Within one project: yes! To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function, and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*. A resource in one project can have the same name as a resource in another project (installed as a dependency). 
dbt uses the project name to uniquely identify each resource. We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace. Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this. As I create more models, how should I keep my project organized? What should I name my models? There's no one best way to structure a project! Every organization is unique. If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md). #### Add tests to your models[](#add-tests-to-your-models "Direct link to Add tests to your models") Adding [data tests](https://docs.getdbt.com/docs/build/data-tests.md) to a project helps validate that your models are working correctly. To add data tests to your project: 1. Create a new YAML file in the `models` directory, named `models/schema.yml`. 2. Add the following contents to the file: models/schema.yml ```yaml version: 2 models: - name: customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_orders columns: - name: order_id data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties of the test.
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` 3. Run `dbt test`, and confirm that all your tests passed. When you run `dbt test`, dbt iterates through your YAML files, and constructs a query for each test. Each query will return the number of records that fail the test. If this number is 0, then the test is successful. ###### FAQs[​](#faqs "Direct link to FAQs") What tests are available for me to use in dbt? Can I add my own custom tests? Out of the box, dbt ships with the following data tests: * `unique` * `not_null` * `accepted_values` * `relationships` (for example, referential integrity) You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project. How do I test one model at a time? Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell dbt test --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular. One of my tests failed, how can I debug it? To debug a failing test, find the SQL that dbt ran by: * dbt: * Within the test output, click on the failed test, and then select "Details". * dbt Core: * Open the file path returned as part of the error message. * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries. 
Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed. Does my test file need to be named \`schema.yml\`? No! You can name this file whatever you want (including `whatever_you_want.yml`), so long as: * The file is in your `models/` directory¹ * The file has a `.yml` extension Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information. ¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — `seeds/`, `snapshots/`, and `macros/`, respectively. Why do model and source YAML files always start with \`version: 2\`? Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible. From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported. Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional. Although we do not expect to update YAML files to `version: 3` soon, having this config will make it easier for us to introduce new structures in the future. What data tests should I add to my project? We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`. We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL.
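For instance, the payment-method assumption described above could be captured as an `accepted_values` test. The `stg_payments` model, `payment_method` column, and accepted values here are hypothetical placeholders:

```yaml
models:
  - name: stg_payments
    columns:
      - name: payment_method
        data_tests:
          - accepted_values:
              arguments:
                # fails if the source ever contains any other value
                values: ['credit_card', 'coupon', 'bank_transfer']
```

If a fourth payment method appears in the source data, this test fails on the next run instead of silently skewing downstream models.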
In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models. When should I run my data tests? You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid). #### Document your models[](#document-your-models "Direct link to Document your models") Adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project. Update your `models/schema.yml` file to include some descriptions, such as those below. models/schema.yml ```yaml version: 2 models: - name: customers description: One record per customer columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: first_order_date description: NULL when a customer has not yet placed an order. - name: stg_customers description: This model cleans up customer data columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: stg_orders description: This model cleans up order data columns: - name: order_id description: Primary key data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties of the test.
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` * View in Catalog * View in Studio IDE [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) provides powerful tools to interact with your dbt projects, including documentation: 1. From the IDE, run one of the following commands: * `dbt docs generate` if you're on dbt Core * `dbt build` if you're on the dbt Fusion engine 2. Click **Catalog** in the navigation menu to launch Catalog. 3. In the Catalog pane, click the environment selection dropdown menu at the top of the file tree and change it from **Production** to **Development**. [![View your development environment information.](/img/docs/collaborate/dbt-explorer/catalog-nav-dropdown.png?v=2 "View your development environment information.")](#)View your development environment information. 4. Select your project from the file tree. 5. Use the search bar or browse the resource list to find the `customers` model. 6. Click the model to view its details, including the descriptions you added. [![View your model's documentation and lineage in Catalog.](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2 "View your model's documentation and lineage in Catalog.")](#)View your model's documentation and lineage in Catalog. Catalog displays your model's description, column documentation, data tests, and lineage graph. You can also see which columns are missing documentation and track test coverage across your project. You can view docs directly from the IDE if you're on `Latest` or another version of dbt Core. Keep in mind that this is a legacy view and doesn't offer the same level of interactivity as Catalog. 1. In the IDE, run `dbt docs generate`. 2. From the navigation bar, click the **View docs** icon located to the right of the **branch name**. 
[![The View docs icon in the Studio IDE.](/img/docs/collaborate/dbt-explorer/docs-icon.png?v=2 "The View docs icon in the Studio IDE.")](#)The View docs icon in the Studio IDE. 3. From **Projects**, select your project name and expand the folders. 4. Click **models** > **marts** > **customers**. [![View your model's documentation in the legacy docs view.](/img/docs/collaborate/dbt-explorer/legacy-docs-view.png?v=2 "View your model's documentation in the legacy docs view.")](#)View your model's documentation in the legacy docs view. ###### FAQs[​](#faqs "Direct link to FAQs") How do I write long-form explanations in my descriptions? If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions: ```yml models: - name: customers description: > Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ``` 2. Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions: ```yml models: - name: customers description: | ### Lorem ipsum * dolor sit amet, consectetur adipisicing elit, sed do eiusmod * tempor incididunt ut labore et dolore magna aliqua. ``` 3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. How do I access documentation in dbt Catalog? 
If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state. Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project. dbt Developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but doesn't offer the same speed, metadata, or visibility as Catalog. #### Commit your changes[](#commit-your-changes "Direct link to Commit your changes") Now that you've built your customer model, you need to commit the changes you made to the project so that the repository has your latest code. **If you edited directly in the protected primary branch:**
1. Click the **Commit and sync git** button. This action prepares your changes for commit. 2. A modal titled **Commit to a new branch** will appear. 3. In the modal window, name your new branch `add-customers-model`. This branches off from your primary branch with your new changes. 4. Add a commit message, such as "Add customers model, tests, docs", and commit your changes. 5. Click **Merge this branch to main** to add these changes to the main branch on your repo. **If you created a new branch before editing:**
1. Since you already branched out of the primary protected branch, go to **Version Control** on the left. 2. Click **Commit and sync**. 3. Add a commit message, such as "Add customers model, tests, docs." 4. Click **Merge this branch to main** to add these changes to the main branch on your repo. #### Deploy dbt[](#deploy-dbt "Direct link to Deploy dbt") Use dbt's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps. ##### Create a deployment environment[](#create-a-deployment-environment "Direct link to Create a deployment environment") 1. From the main menu, go to **Orchestration** > **Environments**. 2. Click **Create environment**. 3. In the **Name** field, write the name of your deployment environment. For example, "Production." 4. The **dbt version** will default to the latest available. We recommend all new projects run on the latest version of dbt. 5. Under **Deployment connection**, enter the name of the dataset you want to use as the target, such as "Analytics". This will allow dbt to build and work with that dataset. For some data warehouses, the target dataset may be referred to as a "schema". 6. Click **Save**. ##### Create and run a job[](#create-and-run-a-job "Direct link to Create and run a job") A job is a set of dbt commands that you want to run on a schedule, such as `dbt build`. As the `jaffle_shop` business gains more customers, and those customers create more orders, you will see more records added to your source data. Because you materialized the `customers` model as a table, you'll need to periodically rebuild your table to ensure that the data stays up-to-date. This update will happen when you run a job. 1. After creating your deployment environment, you should be directed to the page for a new environment. If not, select **Orchestration** from the main menu, then click **Jobs**. 2.
Click **Create job** > **Deploy job**. 3. Provide a job name (for example, "Production run") and select the environment you just created. 4. Scroll down to the **Execution settings** section. 5. Under **Commands**, add this command as part of your job if you don't see it: * `dbt build` 6. Select the **Generate docs on run** option to automatically [generate updated project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) each time your job runs. 7. For this exercise, do *not* set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as *deploying a project*. 8. Click **Save**, then click **Run now** to run your job. 9. Click the run and watch its progress under **Run summary**. 10. Once the run is complete, click **View Documentation** to see the docs for your project. Congratulations 🎉! You've just deployed your first dbt project! ###### FAQs[​](#faqs "Direct link to FAQs") What happens if one of my runs fails? If you're using dbt, we recommend setting up email and Slack notifications (`Account Settings > Notifications`) for any failed runs. Then, debug these runs the same way you would debug any runs in development. #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### Quickstart for dbt and Redshift [Back to guides](https://docs.getdbt.com/guides.md) Redshift dbt platform Quickstart Beginner [Menu ]() #### Introduction[​](#introduction "Direct link to Introduction") In this quickstart guide, you'll learn how to use dbt with Redshift. It will show you how to: * Set up a Redshift cluster. 
* Load sample data into your Redshift account. * Connect dbt to Redshift. * Take a sample query and turn it into a model in your dbt project. A model in dbt is a select statement. * Add tests to your models. * Document your models. * Schedule a job to run. Videos for you Check out [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) for free if you're interested in course learning with videos. ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * You have a [dbt account](https://www.getdbt.com/signup/). * You have an AWS account with permissions to execute a CloudFormation template to create appropriate roles and a Redshift cluster. ##### Related content[​](#related-content "Direct link to Related content") * Learn more with [dbt Learn courses](https://learn.getdbt.com) * [CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md) * [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) * [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) * [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) #### Create a Redshift cluster[​](#create-a-redshift-cluster "Direct link to Create a Redshift cluster") 1. Sign in to your [AWS account](https://signin.aws.amazon.com/console) as a root user or an IAM user depending on your level of access. 2. Use a CloudFormation template to quickly set up a Redshift cluster. A CloudFormation template is a configuration file that automatically spins up the necessary resources in AWS. [Start a CloudFormation stack](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=dbt-workshop\&templateURL=https://tpch-sample-data.s3.amazonaws.com/create-dbtworkshop-infr) and you can refer to the [create-dbtworkshop-infr JSON file](https://github.com/aws-samples/aws-modernization-with-dbtlabs/blob/main/resources/cloudformation/create-dbtworkshop-infr) for more template details. 
tip To avoid connectivity issues with dbt, make sure to allow inbound traffic on port 5439 from [dbt's IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) in your Redshift security groups and Network Access Control Lists (NACLs) settings. 3. Click **Next** for each page until you reach the **Select acknowledgement** checkbox. Select **I acknowledge that AWS CloudFormation might create IAM resources with custom names** and click **Create Stack**. You should land on the stack page with a CREATE\_IN\_PROGRESS status. [![Cloud Formation in Progress](/img/redshift_tutorial/images/cloud_formation_in_progress.png?v=2 "Cloud Formation in Progress")](#)Cloud Formation in Progress 4. When the stack status changes to CREATE\_COMPLETE, click the **Outputs** tab on the top to view information that you will use throughout the rest of this guide. Save those credentials for later by keeping this open in a tab. 5. Type `Redshift` in the search bar at the top and click **Amazon Redshift**. [![Click on Redshift](/img/redshift_tutorial/images/go_to_redshift.png?v=2 "Click on Redshift")](#)Click on Redshift 6. Confirm that your new Redshift cluster is listed in **Cluster overview**. Select your new cluster. The cluster name should begin with `dbtredshiftcluster-`. Then, click **Query Data**. You can choose the classic query editor or v2. We will use the v2 editor in this guide. [![Available Redshift Cluster](/img/redshift_tutorial/images/cluster_overview.png?v=2 "Available Redshift Cluster")](#)Available Redshift Cluster 7. You might be asked to configure your account. For this sandbox environment, select **Configure account**. 8. Select your cluster from the list. In the **Connect to** popup, fill out the credentials from the output of the stack: * **Authentication** — Use the default, which is **Database user name and password**.
* **Database** — `dbtworkshop` * **User name** — `dbtadmin` * **Password** — Use the autogenerated `RSadminpassword` from the output of the stack and save it for later. [![Redshift Query Editor v2](/img/redshift_tutorial/images/redshift_query_editor.png?v=2 "Redshift Query Editor v2")](#)Redshift Query Editor v2 [![Connect to Redshift Cluster](/img/redshift_tutorial/images/connect_to_redshift_cluster.png?v=2 "Connect to Redshift Cluster")](#)Connect to Redshift Cluster 9. Click **Create connection**. #### Load data[](#load-data "Direct link to Load data") Now we are going to load our sample data into the S3 bucket that our CloudFormation template created. S3 buckets are a simple and inexpensive way to store data outside of Redshift. 1. The data used in this course is stored as CSVs in a public S3 bucket. You can use the following URLs to download these files. Download these to your computer to use in the following steps. * [jaffle\_shop\_customers.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/jaffle_shop_customers.csv) * [jaffle\_shop\_orders.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/jaffle_shop_orders.csv) * [stripe\_payments.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/stripe_payments.csv) 2. Now we are going to use the S3 bucket that you created with CloudFormation and upload the files. Go to the search bar at the top and type in `S3` and click on S3. There will be sample data in the bucket already; feel free to ignore it or use it for other modeling exploration. The bucket will be prefixed with `dbt-data-lake`. [![Go to S3](/img/redshift_tutorial/images/go_to_S3.png?v=2 "Go to S3")](#)Go to S3 3. Click the name of your S3 bucket. If you have multiple S3 buckets, this will be the bucket that was listed under “Workshopbucket” on the Outputs page. [![Go to your S3 Bucket](/img/redshift_tutorial/images/s3_bucket.png?v=2 "Go to your S3 Bucket")](#)Go to your S3 Bucket 4. Click **Upload**.
Drag the three files into the UI and click the **Upload** button. [![Upload your CSVs](/img/redshift_tutorial/images/upload_csv.png?v=2 "Upload your CSVs")](#)Upload your CSVs 5. Remember the name of the S3 bucket for later. It should look like this: `s3://dbt-data-lake-xxxx`. You will need it for the next section. 6. Now let’s go back to the Redshift query editor. Search for Redshift in the search bar, choose your cluster, and select **Query data**. 7. In your query editor, execute the queries below to create the schemas that your raw data will be placed into. You can highlight each statement and click **Run** to run them individually. If you are on the Classic Query Editor, you might need to input them separately into the UI. You should see these schemas listed under `dbtworkshop`. ```sql create schema if not exists jaffle_shop; create schema if not exists stripe; ``` 8. Now create the tables in your schemas using the statements below. These will be populated as tables in the respective schemas. ```sql create table jaffle_shop.customers( id integer, first_name varchar(50), last_name varchar(50) ); create table jaffle_shop.orders( id integer, user_id integer, order_date date, status varchar(50) ); create table stripe.payment( id integer, orderid integer, paymentmethod varchar(50), status varchar(50), amount integer, created date ); ``` 9. Now we need to copy the data from S3. This enables you to run queries in this guide for demonstrative purposes; it's not an example of how you would do this for a real project. Make sure to update the S3 location, IAM role, and region. You can find the S3 location and IAM role in your outputs from the CloudFormation stack. Find the stack by searching for `CloudFormation` in the search bar, then clicking **Stacks** in the CloudFormation tile.
```sql copy jaffle_shop.customers( id, first_name, last_name) from 's3://dbt-data-lake-xxxx/jaffle_shop_customers.csv' iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName' region 'us-east-1' delimiter ',' ignoreheader 1 acceptinvchars; copy jaffle_shop.orders(id, user_id, order_date, status) from 's3://dbt-data-lake-xxxx/jaffle_shop_orders.csv' iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName' region 'us-east-1' delimiter ',' ignoreheader 1 acceptinvchars; copy stripe.payment(id, orderid, paymentmethod, status, amount, created) from 's3://dbt-data-lake-xxxx/stripe_payments.csv' iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName' region 'us-east-1' delimiter ',' ignoreheader 1 acceptinvchars; ``` Ensure that you can run a `select *` from each of the tables with the following code snippets. ```sql select * from jaffle_shop.customers; select * from jaffle_shop.orders; select * from stripe.payment; ``` #### Connect dbt to Redshift[](#connect-dbt-to-redshift "Direct link to Connect dbt to Redshift") 1. Create a new project in [dbt](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md). Navigate to **Account settings** (by clicking on your account name in the left side menu), and click **+ New Project**. 2. Enter a project name and click **Continue**. 3. In the **Configure your development environment** section, click the **Connection** dropdown menu and select **Add new connection**. This directs you to the connection configuration settings. 4. In the **Type** section, select **Redshift**. 5. Enter your Redshift settings. Refer to the credentials you saved from the CloudFormation template. * **Hostname** — Your entire hostname.
* **Port** — `5439` * **Database** (under **Optional settings**) — `dbtworkshop` [![dbt - Redshift Cluster Settings](/img/redshift_tutorial/images/dbt_cloud_redshift_account_settings.png?v=2 "dbt - Redshift Cluster Settings")](#)dbt - Redshift Cluster Settings Avoid connection issues To avoid connection issues with dbt, ensure you follow these minimal but essential AWS network setup steps because Redshift network access isn't configured automatically: * Allow inbound traffic on port `5439` from [dbt's IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) in your Redshift security groups and Network Access Control Lists settings. * Configure your Virtual Private Cloud with the necessary route tables, IP gateways (like an internet or NAT gateway), and inbound rules. For more information, see [AWS documentation on configuring Redshift security group communication](https://docs.aws.amazon.com/redshift/latest/mgmt/rs-security-group-public-private.html). 6. Click **Save**. 7. Set up your personal development credentials by going to **Your profile** > **Credentials**. 8. Select your project that uses the Redshift connection. 9. Click the **configure your development environment and add a connection** link. This directs you to a page where you can enter your personal development credentials. 10. Set your development credentials. These credentials will be used by dbt to connect to Redshift. Those credentials (as provided in your CloudFormation output) will be: * **Username** — `dbtadmin` * **Password** — This is the autogenerated password that you used earlier in the guide * **Schema** — dbt automatically generates a schema name for you. By convention, this is `dbt_`. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Studio IDE. 
[![dbt - Redshift Development Credentials](/img/redshift_tutorial/images/dbt_cloud_redshift_development_credentials.png?v=2 "dbt - Redshift Development Credentials")](#)dbt - Redshift Development Credentials 11. Click **Test connection**. This verifies that dbt can access your Redshift cluster. 12. If the test succeeded, click **Save** to complete the configuration. If it failed, you might need to check your Redshift settings and credentials. #### Set up a dbt managed repository[](#set-up-a-dbt-managed-repository "Direct link to Set up a dbt managed repository") When you develop in dbt, you can leverage [Git](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) to version control your code. To connect to a repository, you can either set up a dbt-hosted [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or directly connect to a [supported git provider](https://docs.getdbt.com/docs/cloud/git/connect-github.md). Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md). To set up a managed repository: 1. Under "Setup a repository", select **Managed**. 2. Type a name for your repo, such as `bbaggins-dbt-quickstart`. 3. Click **Create**. It will take a few seconds for your repository to be created and imported. 4. Once you see the "Successfully imported repository" message, click **Continue**. #### Initialize your dbt project and start developing[](#initialize-your-dbt-project-and-start-developing "Direct link to Initialize your dbt project​ and start developing") Now that you have a repository configured, you can initialize your project and start development in dbt: 1. Click **Start developing in the Studio IDE**.
It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.

2. Above the file tree to the left, click **Initialize dbt project**. This builds out your folder structure with example models.
3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit` and click **Commit**. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
4. You can now directly query data from your warehouse and execute `dbt run`. You can try this out now:

   * Click **+ Create new file**, add this query to the new file, and click **Save as** to save the new file:

     ```sql
     select * from jaffle_shop.customers
     ```

   * In the command line bar at the bottom, enter `dbt run` and press **Enter**. You should see a `dbt run succeeded` message.

#### Build your first model[​](#build-your-first-model "Direct link to Build your first model")

You have two options for working with files in the Studio IDE:

* Create a new branch (recommended) — Create a new branch to edit and commit your changes. Navigate to **Version Control** on the left sidebar and click **Create branch**.
* Edit in the protected primary branch — Edit, format, or lint files and execute dbt commands directly in your primary git branch. The Studio IDE prevents commits to the protected branch, so you will be prompted to commit your changes to a new branch.

Name the new branch `add-customers-model`.

1. Click the **...** next to the `models` directory, then select **Create file**.
2. Name the file `customers.sql`, then click **Create**.
3. Copy the following query into the file and click **Save**.
```sql
with customers as (

    select
        id as customer_id,
        first_name,
        last_name

    from jaffle_shop.customers

),

orders as (

    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status

    from jaffle_shop.orders

),

customer_orders as (

    select
        customer_id,

        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

)

select * from final
```

4. Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run and see the three models.

Later, you can connect your business intelligence (BI) tools to these views and tables so they read cleaned-up data rather than raw data.

###### FAQs[​](#faqs "Direct link to FAQs")

How can I see the SQL that dbt is running?

To check out the SQL that dbt is running, you can look in:

* dbt:
  * Within the run output, click on a model name, and then select "Details"
* dbt Core:
  * The `target/compiled/` directory for compiled `select` statements
  * The `target/run/` directory for compiled `create` statements
  * The `logs/dbt.log` file for verbose logging.

How did dbt choose which schema to build my models in?

By default, dbt builds models in your target schema. To change your target schema:

* If you're developing in **dbt**, this is set for each user when you first use a development environment.
* If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file.

If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md). Note: on BigQuery, `dataset` is used interchangeably with `schema`.
Do I need to create my target schema before running dbt?

Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you.

If I rerun dbt, will there be any downtime as models are rebuilt?

Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime). The implementation of this varies by warehouse; check out the [logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to see the SQL dbt is executing.

What happens if the SQL in my query is bad or I get a database error?

If there's a mistake in your SQL, dbt will return the error that your database returns.

```shell
$ dbt run --select customers
Running with dbt=1.9.0
Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources

14:04:12 | Concurrency: 1 threads (target='dev')
14:04:12 |
14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN]
14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s]
14:04:13 |
14:04:13 | Finished running 1 view model in 1.68s.

Completed with 1 error and 0 warnings:

Database Error in model customers (models/customers.sql)
  Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15]
  compiled SQL at target/run/jaffle_shop/customers.sql

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors.

#### Change the way your model is materialized[​](#change-the-way-your-model-is-materialized "Direct link to Change the way your model is materialized")

One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value.
You can switch between tables and views by changing a keyword, rather than writing the data definition language (DDL) yourself; dbt handles that behind the scenes. By default, everything gets created as a view. You can override that at the directory level so that everything in a directory uses a different materialization.

1. Edit your `dbt_project.yml` file.

   * Update your project `name` to:

     dbt\_project.yml

     ```yaml
     name: 'jaffle_shop'
     ```

   * Configure `jaffle_shop` so everything in it will be materialized as a table, and configure `example` so everything in it will be materialized as a view. Update your `models` config in the project YAML file to:

     dbt\_project.yml

     ```yaml
     models:
       jaffle_shop:
         +materialized: table
         example:
           +materialized: view
     ```

   * Click **Save**.

2. Enter the `dbt run` command. Your `customers` model should now be built as a table!

   info

   To do this, dbt had to first run a `drop view` statement (or API call on BigQuery), then a `create table as` statement.

3. Edit `models/customers.sql` to override the `dbt_project.yml` for the `customers` model only by adding the following snippet to the top, and click **Save**:

   models/customers.sql

   ```sql
   {{
     config(
       materialized='view'
     )
   }}

   with customers as (

       select
           id as customer_id
           ...

   )
   ```

4. Enter the `dbt run` command. Your model, `customers`, should now build as a view.

   * BigQuery users need to run `dbt run --full-refresh` instead of `dbt run` to fully apply materialization changes.

5. Enter the `dbt run --full-refresh` command for this to take effect in your warehouse.

##### FAQs[​](#faqs "Direct link to FAQs")

What materializations are available in dbt?

dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options.
You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt. Which materialization should I use for my model? Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization. What model configurations exist? You can also configure: * [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection * [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas * [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename * Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md) * Warehouse-specific configurations for performance (e.g. `sort` and `dist` keys on Redshift, `partitions` on BigQuery) Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more. #### Delete the example models[​](#delete-the-example-models "Direct link to Delete the example models") You can now delete the files that dbt created when you initialized the project: 1. Delete the `models/example/` directory. 2. Delete the `example:` key from your `dbt_project.yml` file, and any configurations that are listed under it. dbt\_project.yml ```yaml # before models: jaffle_shop: +materialized: table example: +materialized: view ``` dbt\_project.yml ```yaml # after models: jaffle_shop: +materialized: table ``` 3. Save your changes. ###### FAQs[​](#faqs "Direct link to FAQs") How do I remove deleted models from my data warehouse? 
If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from being a view or table to ephemeral.) When you remove models from your dbt project, you should manually drop the related relations from your schema.

I got an "unused model configurations" error message, what does this mean?

You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more.

#### Build models on top of other models[​](#build-models-on-top-of-other-models "Direct link to Build models on top of other models")

As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs). Now you can experiment by separating the logic out into separate models and using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function to build models on top of other models:

[![The DAG we want for our dbt project](/img/dbt-dag.png?v=2 "The DAG we want for our dbt project")](#)The DAG we want for our dbt project

1. Create a new SQL file, `models/stg_customers.sql`, with the SQL from the `customers` CTE in our original query.
2. Create a second new SQL file, `models/stg_orders.sql`, with the SQL from the `orders` CTE in our original query.

   models/stg\_customers.sql

   ```sql
   select
       id as customer_id,
       first_name,
       last_name

   from jaffle_shop.customers
   ```

   models/stg\_orders.sql

   ```sql
   select
       id as order_id,
       user_id as customer_id,
       order_date,
       status

   from jaffle_shop.orders
   ```

3.
Edit the SQL in your `models/customers.sql` file as follows: models/customers.sql ```sql with customers as ( select * from {{ ref('stg_customers') }} ), orders as ( select * from {{ ref('stg_orders') }} ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` 4. Execute `dbt run`. This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders` and `customers`. dbt inferred the order to run these models. Because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies. ###### FAQs[​](#faq-2 "Direct link to FAQs") How do I run one model at a time? To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell $ dbt run --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples. Do ref-able resource names need to be unique? Within one project: yes! To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function, and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*. A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. 
We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace.

Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this.

As I create more models, how should I keep my project organized? What should I name my models?

There's no one best way to structure a project! Every organization is unique. If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md).

#### Add tests to your models[​](#add-tests-to-your-models "Direct link to Add tests to your models")

Adding [data tests](https://docs.getdbt.com/docs/build/data-tests.md) to a project helps validate that your models are working correctly. To add data tests to your project:

1. Create a new YAML file in the `models` directory, named `models/schema.yml`.
2. Add the following contents to the file:

models/schema.yml

```yaml
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        data_tests:
          - unique
          - not_null
  - name: stg_customers
    columns:
      - name: customer_id
        data_tests:
          - unique
          - not_null
  - name: stg_orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: status
        data_tests:
          - accepted_values:
              arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties.
                values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              arguments:
                to: ref('stg_customers')
                field: customer_id
```

3. Run `dbt test`, and confirm that all your tests passed.

When you run `dbt test`, dbt iterates through your YAML files and constructs a query for each test. Each query will return the number of records that fail the test. If this number is 0, then the test is successful.

###### FAQs[​](#faqs "Direct link to FAQs")

What tests are available for me to use in dbt? Can I add my own custom tests?

Out of the box, dbt ships with the following data tests:

* `unique`
* `not_null`
* `accepted_values`
* `relationships` (for example, referential integrity)

You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project.

How do I test one model at a time?

Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model:

```shell
dbt test --select customers
```

Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular.

One of my tests failed, how can I debug it?

To debug a failing test, find the SQL that dbt ran by:

* dbt:
  * Within the test output, click on the failed test, and then select "Details".
* dbt Core:
  * Open the file path returned as part of the error message.
  * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries.
Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed.

Does my test file need to be named \`schema.yml\`?

No! You can name this file whatever you want (including `whatever_you_want.yml`), so long as:

* The file is in your `models/` directory¹
* The file has a `.yml` extension

Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information.

¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — `seeds/`, `snapshots/` and `macros/` respectively.

Why do model and source YAML files always start with \`version: 2\`?

Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible.

From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported. Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional. Although we do not expect to update YAML files to `version: 3` soon, having this config will make it easier for us to introduce new structures in the future.

What data tests should I add to my project?

We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`.

We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL.
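Conceptually, a test like `accepted_values` counts the rows that fall outside the allowed set; zero failing rows means the test passes. A minimal Python sketch of that idea (the helper name and sample payment rows are illustrative only; dbt's actual test compiles to SQL):

```python
def accepted_values_failures(rows, column, accepted):
    """Return the rows whose value in `column` falls outside the accepted
    set. A test in the spirit of dbt's accepted_values passes when this
    list is empty."""
    allowed = set(accepted)
    return [row for row in rows if row[column] not in allowed]

# Illustrative source data: one record uses a payment method we never
# anticipated, which is exactly the kind of drift the test should catch.
payments = [
    {"id": 1, "paymentmethod": "credit_card"},
    {"id": 2, "paymentmethod": "bank_transfer"},
    {"id": 3, "paymentmethod": "gift_card"},  # new, untested payment method
]

failures = accepted_values_failures(
    payments, "paymentmethod", ["credit_card", "bank_transfer", "coupon"]
)
# failures contains the gift_card row, so this test would fail
```

Running the assumption check regularly in production surfaces the new payment method as a failing row instead of letting it silently break downstream logic.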
In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models.

When should I run my data tests?

You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid).

#### Document your models[​](#document-your-models "Direct link to Document your models")

Adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project.

Update your `models/schema.yml` file to include some descriptions, such as those below.

models/schema.yml

```yaml
version: 2

models:
  - name: customers
    description: One record per customer
    columns:
      - name: customer_id
        description: Primary key
        data_tests:
          - unique
          - not_null
      - name: first_order_date
        description: NULL when a customer has not yet placed an order.
  - name: stg_customers
    description: This model cleans up customer data
    columns:
      - name: customer_id
        description: Primary key
        data_tests:
          - unique
          - not_null
  - name: stg_orders
    description: This model cleans up order data
    columns:
      - name: order_id
        description: Primary key
        data_tests:
          - unique
          - not_null
      - name: status
        data_tests:
          - accepted_values:
              arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties.
                values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              arguments:
                to: ref('stg_customers')
                field: customer_id
```

* View in Catalog
* View in Studio IDE

[Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) provides powerful tools to interact with your dbt projects, including documentation:

1. From the IDE, run one of the following commands:
   * `dbt docs generate` if you're on dbt Core
   * `dbt build` if you're on the dbt Fusion engine
2. Click **Catalog** in the navigation menu to launch Catalog.
3. In the Catalog pane, click the environment selection dropdown menu at the top of the file tree and change it from **Production** to **Development**.

   [![View your development environment information.](/img/docs/collaborate/dbt-explorer/catalog-nav-dropdown.png?v=2 "View your development environment information.")](#)View your development environment information.
4. Select your project from the file tree.
5. Use the search bar or browse the resource list to find the `customers` model.
6. Click the model to view its details, including the descriptions you added.

   [![View your model's documentation and lineage in Catalog.](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2 "View your model's documentation and lineage in Catalog.")](#)View your model's documentation and lineage in Catalog.

Catalog displays your model's description, column documentation, data tests, and lineage graph. You can also see which columns are missing documentation and track test coverage across your project.

You can view docs directly from the IDE if you're on `Latest` or another version of dbt Core. Keep in mind that this is a legacy view and doesn't offer the same level of interactivity as Catalog.

1. In the IDE, run `dbt docs generate`.
2. From the navigation bar, click the **View docs** icon located to the right of the **branch name**.
[![The View docs icon in the Studio IDE.](/img/docs/collaborate/dbt-explorer/docs-icon.png?v=2 "The View docs icon in the Studio IDE.")](#)The View docs icon in the Studio IDE. 3. From **Projects**, select your project name and expand the folders. 4. Click **models** > **marts** > **customers**. [![View your model's documentation in the legacy docs view.](/img/docs/collaborate/dbt-explorer/legacy-docs-view.png?v=2 "View your model's documentation in the legacy docs view.")](#)View your model's documentation in the legacy docs view. ###### FAQs[​](#faqs "Direct link to FAQs") How do I write long-form explanations in my descriptions? If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions: ```yml models: - name: customers description: > Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ``` 2. Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions: ```yml models: - name: customers description: | ### Lorem ipsum * dolor sit amet, consectetur adipisicing elit, sed do eiusmod * tempor incididunt ut labore et dolore magna aliqua. ``` 3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. How do I access documentation in dbt Catalog? 
If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state. Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project. dbt developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but it doesn't offer the same speed, metadata, or visibility as Catalog. #### Commit your changes[​](#commit-your-changes "Direct link to Commit your changes") Now that you've built your customer model, you need to commit the changes you made to the project so that the repository has your latest code. **If you edited directly in the protected primary branch:**
1. Click the **Commit and sync git** button. This action prepares your changes for commit.
2. A modal titled **Commit to a new branch** will appear.
3. In the modal window, name your new branch `add-customers-model`. This branches off from your primary branch with your new changes.
4. Add a commit message, such as "Add customers model, tests, docs" and commit your changes.
5. Click **Merge this branch to main** to add these changes to the main branch on your repo.

**If you created a new branch before editing:**
1. Since you already branched out of the primary protected branch, go to **Version Control** on the left. 2. Click **Commit and sync** to add a message. 3. Add a commit message, such as "Add customers model, tests, docs." 4. Click **Merge this branch to main** to add these changes to the main branch on your repo. #### Deploy dbt[​](#deploy-dbt "Direct link to Deploy dbt") Use dbt's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps. ##### Create a deployment environment[​](#create-a-deployment-environment "Direct link to Create a deployment environment") 1. From the main menu, go to **Orchestration** > **Environments**. 2. Click **Create environment**. 3. In the **Name** field, write the name of your deployment environment. For example, "Production." 4. The **dbt version** will default to the latest available. We recommend all new projects run on the latest version of dbt. 5. Under **Deployment connection**, enter the name of the dataset you want to use as the target, such as "Analytics". This will allow dbt to build and work with that dataset. For some data warehouses, the target dataset may be referred to as a "schema". 6. Click **Save**. ##### Create and run a job[​](#create-and-run-a-job "Direct link to Create and run a job") Jobs are a set of dbt commands that you want to run on a schedule. For example, `dbt build`. As the `jaffle_shop` business gains more customers, and those customers create more orders, you will see more records added to your source data. Because you materialized the `customers` model as a table, you'll need to periodically rebuild your table to ensure that the data stays up-to-date. This update will happen when you run a job. 1. After creating your deployment environment, you should be directed to the page for a new environment. If not, select **Orchestration** from the main menu, then click **Jobs**. 2. 
Click **Create job** > **Deploy job**.

3. Provide a job name (for example, "Production run") and select the environment you just created.
4. Scroll down to the **Execution settings** section.
5. Under **Commands**, add this command as part of your job if you don't see it:
   * `dbt build`
6. Select the **Generate docs on run** option to automatically [generate updated project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) each time your job runs.
7. For this exercise, do *not* set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as *deploying a project*.
8. Click **Save**, then click **Run now** to run your job.
9. Click the run and watch its progress under **Run summary**.
10. Once the run is complete, click **View Documentation** to see the docs for your project.

Congratulations 🎉! You've just deployed your first dbt project!

###### FAQs[​](#faqs "Direct link to FAQs")

What happens if one of my runs fails?

If you're using dbt, we recommend setting up email and Slack notifications (`Account Settings > Notifications`) for any failed runs. Then, debug these runs the same way you would debug any runs in development.

---

### Quickstart for dbt and Snowflake

[Back to guides](https://docs.getdbt.com/guides.md)

dbt platform Quickstart Snowflake Beginner

#### Introduction[​](#introduction "Direct link to Introduction")

In this quickstart guide, you'll learn how to use dbt with Snowflake. It will show you how to:

* Create a new Snowflake worksheet.
* Load sample data into your Snowflake account. * Connect dbt to Snowflake. * Take a sample query and turn it into a model in your dbt project. A model in dbt is a select statement. * Add sources to your dbt project. Sources allow you to name and describe the raw data already loaded into Snowflake. * Add tests to your models. * Document your models. * Schedule a job to run. Snowflake also provides a quickstart for you to learn how to use dbt. It makes use of a different public dataset (Knoema Economy Data Atlas) than what's shown in this guide. For more information, refer to [Accelerating Data Teams with dbt & Snowflake](https://quickstarts.snowflake.com/guide/accelerating_data_teams_with_snowflake_and_dbt_cloud_hands_on_lab/) in the Snowflake docs. Videos for you You can check out [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) for free if you're interested in course learning with videos. You can also watch the [YouTube video on dbt and Snowflake](https://www.youtube.com/watch?v=kbCkwhySV_I\&list=PL0QYlrC86xQm7CoOH6RS7hcgLnd3OQioG). ##### Prerequisites​[​](#prerequisites "Direct link to Prerequisites​") * You have a [dbt account](https://www.getdbt.com/signup/). * You have a [trial Snowflake account](https://signup.snowflake.com/). During trial account creation, make sure to choose the **Enterprise** Snowflake edition so you have `ACCOUNTADMIN` access. For a full implementation, you should consider organizational questions when choosing a cloud provider. For more information, see [Introduction to Cloud Platforms](https://docs.snowflake.com/en/user-guide/intro-cloud-platforms.html) in the Snowflake docs. For the purposes of this setup, all cloud providers and regions will work so choose whichever you’d like. 
##### Related content[​](#related-content "Direct link to Related content")

* Learn more with [dbt Learn courses](https://learn.getdbt.com)
* [How we configure Snowflake](https://blog.getdbt.com/how-we-configure-snowflake/)
* [CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md)
* [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md)
* [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md)
* [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md)

#### Create a new Snowflake worksheet[​](#create-a-new-snowflake-worksheet "Direct link to Create a new Snowflake worksheet")

1. Log in to your trial Snowflake account.
2. In the Snowflake UI, click **+ Create** in the left-hand corner, underneath the Snowflake logo, which opens a dropdown. Select the first option, **SQL Worksheet**.

#### Load data[​](#load-data "Direct link to Load data")

The data used here is stored as CSV files in a public S3 bucket and the following steps will guide you through how to prepare your Snowflake account for that data and upload it.

1. Create a new virtual warehouse, two new databases (one for raw data, the other for future dbt development), and two new schemas (one for `jaffle_shop` data, the other for `stripe` data). To do this, run these SQL commands by typing them into the Editor of your new Snowflake worksheet and clicking **Run** in the upper right corner of the UI:

   ```sql
   create warehouse transforming;
   create database raw;
   create database analytics;
   create schema raw.jaffle_shop;
   create schema raw.stripe;
   ```

2. In the `raw` database and `jaffle_shop` and `stripe` schemas, create three tables and load relevant data into them:

   * First, delete all contents in the Editor of the Snowflake worksheet.
Then, run this SQL command to create the `customers` table: ```sql create table raw.jaffle_shop.customers ( id integer, first_name varchar, last_name varchar ); ``` * Delete all contents in the Editor, then run this command to load data into the `customers` table: ```sql copy into raw.jaffle_shop.customers (id, first_name, last_name) from 's3://dbt-tutorial-public/jaffle_shop_customers.csv' file_format = ( type = 'CSV' field_delimiter = ',' skip_header = 1 ); ``` * Delete all contents in the Editor, then run this command to create the `orders` table: ```sql create table raw.jaffle_shop.orders ( id integer, user_id integer, order_date date, status varchar, _etl_loaded_at timestamp default current_timestamp ); ``` * Delete all contents in the Editor, then run this command to load data into the `orders` table: ```sql copy into raw.jaffle_shop.orders (id, user_id, order_date, status) from 's3://dbt-tutorial-public/jaffle_shop_orders.csv' file_format = ( type = 'CSV' field_delimiter = ',' skip_header = 1 ); ``` * Delete all contents in the Editor, then run this command to create the `payment` table: ```sql create table raw.stripe.payment ( id integer, orderid integer, paymentmethod varchar, status varchar, amount integer, created date, _batched_at timestamp default current_timestamp ); ``` * Delete all contents in the Editor, then run this command to load data into the `payment` table: ```sql copy into raw.stripe.payment (id, orderid, paymentmethod, status, amount, created) from 's3://dbt-tutorial-public/stripe_payments.csv' file_format = ( type = 'CSV' field_delimiter = ',' skip_header = 1 ); ``` 3. Verify that the data is loaded by running these SQL queries. Confirm that you can see output for each one. 
```sql select * from raw.jaffle_shop.customers; select * from raw.jaffle_shop.orders; select * from raw.stripe.payment; ``` #### Connect dbt to Snowflake[​](#connect-dbt-to-snowflake "Direct link to Connect dbt to Snowflake") There are two ways to connect dbt to Snowflake. The first option is Partner Connect, which provides a streamlined setup to create your dbt account from within your new Snowflake trial account. The second option is to create your dbt account separately and build the Snowflake connection yourself (connect manually). If you want to get started quickly, dbt Labs recommends using Partner Connect. If you want to customize your setup from the very beginning and gain familiarity with the dbt setup flow, dbt Labs recommends connecting manually. * Use Partner Connect * Connect manually Using Partner Connect allows you to create a complete dbt account with your [Snowflake connection](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-snowflake.md), [a managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md), [environments](https://docs.getdbt.com/docs/build/custom-schemas.md#managing-environments), and credentials. 1. On the left sidebar of the Snowflake UI, go to **Admin > Partner Connect**. Find the dbt tile under the **Data Integration** section or search for dbt in the search bar. Click the tile to connect to dbt. [![Snowflake Partner Connect Box](/img/snowflake_tutorial/snowflake_partner_connect_box.png?v=2 "Snowflake Partner Connect Box")](#)Snowflake Partner Connect Box If you’re using the classic version of the Snowflake UI, you can click the **Partner Connect** button in the top bar of your account. From there, click on the dbt tile to open up the connect box. [![Snowflake Classic UI - Partner Connect](/img/snowflake_tutorial/snowflake_classic_ui_partner_connect.png?v=2 "Snowflake Classic UI - Partner Connect")](#)Snowflake Classic UI - Partner Connect 2. 
In the **Connect to dbt** popup, find the **Optional Grant** option and select the **RAW** and **ANALYTICS** databases. This will grant access for your new dbt user role to each database. Then, click **Connect**. [![Snowflake Classic UI - Connection Box](/img/snowflake_tutorial/snowflake_classic_ui_connection_box.png?v=2 "Snowflake Classic UI - Connection Box")](#)Snowflake Classic UI - Connection Box [![Snowflake New UI - Connection Box](/img/snowflake_tutorial/snowflake_new_ui_connection_box.png?v=2 "Snowflake New UI - Connection Box")](#)Snowflake New UI - Connection Box 3. Click **Activate** when a popup appears: [![Snowflake Classic UI - Activation Window](/img/snowflake_tutorial/snowflake_classic_ui_activation_window.png?v=2 "Snowflake Classic UI - Activation Window")](#)Snowflake Classic UI - Activation Window [![Snowflake New UI - Activation Window](/img/snowflake_tutorial/snowflake_new_ui_activation_window.png?v=2 "Snowflake New UI - Activation Window")](#)Snowflake New UI - Activation Window 4. After the new tab loads, you will see a form. If you already created a dbt account, you will be asked to provide an account name. If you haven't created an account, you will be asked to provide an account name and password. 5. After you have filled out the form and clicked **Complete Registration**, you will be logged into dbt automatically. 6. Go to the left side menu and click your account name, then select **Account settings**, choose the "Partner Connect Trial" project, and select **snowflake** in the overview table. Select **Edit** and update the **Database** and **Warehouse** fields to be `analytics` and `transforming`, respectively. 
[![dbt - Snowflake Project Overview](/img/snowflake_tutorial/dbt_cloud_snowflake_project_overview.png?v=2 "dbt - Snowflake Project Overview")](#)dbt - Snowflake Project Overview [![dbt - Update Database and Warehouse](/img/snowflake_tutorial/dbt_cloud_update_database_and_warehouse.png?v=2 "dbt - Update Database and Warehouse")](#)dbt - Update Database and Warehouse 1. Create a new project in dbt. Navigate to **Account settings** (by clicking on your account name in the left side menu), and click **+ New Project**. 2. Enter a project name and click **Continue**. 3. In the **Configure your development environment** section, click the **Connection** dropdown menu and select **Add new connection**. This directs you to the connection configuration settings. 4. In the **Type** section, select **Snowflake**. [![dbt - Choose Snowflake Connection](/img/snowflake_tutorial/dbt_cloud_setup_snowflake_connection_start.png?v=2 "dbt - Choose Snowflake Connection")](#)dbt - Choose Snowflake Connection 5. Enter your **Settings** for Snowflake with: * **Account** — Find your account by using the Snowflake trial account URL and removing `snowflakecomputing.com`. The order of your account information will vary by Snowflake version. For example, Snowflake's Classic console URL might look like: `oq65696.west-us-2.azure.snowflakecomputing.com`. The AppUI or Snowsight URL might look more like: `snowflakecomputing.com/west-us-2.azure/oq65696`. In both examples, your account will be: `oq65696.west-us-2.azure`. For more information, see [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html) in the Snowflake docs.   ✅ `db5261993` or `db5261993.east-us-2.azure`
  ❌ `db5261993.eu-central-1.snowflakecomputing.com` * **Role** — Leave blank for now. You can update this to a default Snowflake role later. * **Database** — `analytics`. This tells dbt to create new models in the analytics database. * **Warehouse** — `transforming`. This tells dbt to use the transforming warehouse that was created earlier. [![dbt - Snowflake Account Settings](/img/snowflake_tutorial/dbt_cloud_snowflake_account_settings.png?v=2 "dbt - Snowflake Account Settings")](#)dbt - Snowflake Account Settings 6. Click **Save**. 7. Set up your personal development credentials by going to **Your profile** > **Credentials**. 8. Select your project that uses the Snowflake connection. 9. Click the **configure your development environment and add a connection** link. This directs you to a page where you can enter your personal development credentials. 10. Enter your **Development credentials** for Snowflake with: * **Username** — The username you created for Snowflake. The username is not your email address and is usually your first and last name together in one word. * **Password** — The password you set when creating your Snowflake account. * **Schema** — You’ll notice that the schema name has been auto created for you. By convention, this is `dbt_`. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Studio IDE. * **Target name** — Leave as the default. * **Threads** — Leave as 4. This is the number of simultaneous connections that dbt will make to build models concurrently. [![dbt - Snowflake Development Credentials](/img/snowflake_tutorial/dbt_cloud_snowflake_development_credentials.png?v=2 "dbt - Snowflake Development Credentials")](#)dbt - Snowflake Development Credentials 11. Click **Test connection**. This verifies that dbt can access your Snowflake account. 12. If the test succeeded, click **Save** to complete the configuration. 
If it failed, you may need to check your Snowflake settings and credentials. #### Set up a dbt managed repository[​](#set-up-a-dbt-managed-repository "Direct link to Set up a dbt managed repository") If you used Partner Connect, you can skip to [initializing your dbt project](#initialize-your-dbt-project-and-start-developing) because Partner Connect provides you with a managed repository. Otherwise, you will need to create your repository connection. When you develop in dbt, you can leverage [Git](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) to version control your code. To connect to a repository, you can either set up a dbt-hosted [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or directly connect to a [supported git provider](https://docs.getdbt.com/docs/cloud/git/connect-github.md). Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md). To set up a managed repository: 1. Under "Setup a repository", select **Managed**. 2. Type a name for your repo, such as `bbaggins-dbt-quickstart`. 3. Click **Create**. It will take a few seconds for your repository to be created and imported. 4. Once you see the "Successfully imported repository" message, click **Continue**. #### Initialize your dbt project and start developing[​](#initialize-your-dbt-project-and-start-developing "Direct link to Initialize your dbt project and start developing") Now that you have a repository configured, you can initialize your project and start development in dbt: 1. Click **Start developing in the Studio IDE**. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse. 2. 
Above the file tree to the left, click **Initialize your project**. This builds out your folder structure with example models. 3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit`. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code. 4. You can now directly query data from your warehouse and execute `dbt run`. You can try this out now: * Click **+ Create new file**, add this query to the new file, and click **Save as** to save the new file: ```sql select * from raw.jaffle_shop.customers ``` * In the command line bar at the bottom, enter `dbt run` and press **Enter**. You should see a `dbt run succeeded` message. info If you receive an insufficient privileges error on Snowflake at this point, it may be because your Snowflake role doesn't have permission to access the raw source data, to build target tables and views, or both. To troubleshoot, use a role with sufficient privileges (like `ACCOUNTADMIN`) and run the following commands in Snowflake. **Note**: Replace `snowflake_role_name` with the role you intend to use. If you launched dbt with Snowflake Partner Connect, use `pc_dbt_role` as the role. ```sql grant all on database raw to role snowflake_role_name; grant all on database analytics to role snowflake_role_name; grant all on schema raw.jaffle_shop to role snowflake_role_name; grant all on schema raw.stripe to role snowflake_role_name; grant all on all tables in database raw to role snowflake_role_name; grant all on future tables in database raw to role snowflake_role_name; ``` #### Build your first model[​](#build-your-first-model "Direct link to Build your first model") You have two options for working with files in the Studio IDE: * Create a new branch (recommended) — Create a new branch to edit and commit your changes. Navigate to **Version Control** on the left sidebar and click **Create branch**. 
* Edit in the protected primary branch — Edit, format, or lint files and execute dbt commands directly in your primary git branch. Because the Studio IDE prevents commits to the protected branch, you will be prompted to commit your changes to a new branch. Name the new branch `add-customers-model`. 1. Click the **...** next to the `models` directory, then select **Create file**. 2. Name the file `customers.sql`, then click **Create**. 3. Copy the following query into the file and click **Save**. ```sql with customers as ( select id as customer_id, first_name, last_name from raw.jaffle_shop.customers ), orders as ( select id as order_id, user_id as customer_id, order_date, status from raw.jaffle_shop.orders ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` 4. Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run and see the three models. Later, you can connect your business intelligence (BI) tools to these views and tables so they only read cleaned-up data rather than raw data. #### Change the way your model is materialized[​](#change-the-way-your-model-is-materialized "Direct link to Change the way your model is materialized") One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse simply by changing a configuration value. You can switch between tables and views by changing a keyword rather than writing the data definition language (DDL) to do this behind the scenes. 
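The idea can be sketched in a few lines of Python. This is a hypothetical illustration, not dbt's actual implementation: the materialization keyword simply selects which DDL statement wraps your model's `select`.

```python
# Hypothetical sketch (not dbt's real code): the materialization config
# decides which DDL template wraps a model's select statement.
def ddl_for(model_name: str, select_sql: str, materialized: str) -> str:
    templates = {
        "view": "create or replace view {name} as {sql}",
        "table": "create or replace table {name} as {sql}",
    }
    return templates[materialized].format(name=model_name, sql=select_sql)

# Switching materialized from 'view' to 'table' changes only the DDL,
# never the model's SQL itself.
print(ddl_for("analytics.customers", "select * from raw.jaffle_shop.customers", "table"))
```

The point of the sketch: your model file stays a plain `select`, and flipping the keyword swaps the surrounding DDL for you.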
By default, everything gets created as a view. You can override that at the directory level so that everything in a given directory uses a different materialization. 1. Edit your `dbt_project.yml` file. * Update your project `name` to: dbt\_project.yml ```yaml name: 'jaffle_shop' ``` * Configure `jaffle_shop` so everything in it will be materialized as a table; and configure `example` so everything in it will be materialized as a view. Update your `models` config in the project YAML file to: dbt\_project.yml ```yaml models: jaffle_shop: +materialized: table example: +materialized: view ``` * Click **Save**. 2. Enter the `dbt run` command. Your `customers` model should now be built as a table! info To do this, dbt had to first run a `drop view` statement (or API call on BigQuery), then a `create table as` statement. 3. Edit `models/customers.sql` to override the `dbt_project.yml` for the `customers` model only by adding the following snippet to the top, and click **Save**: models/customers.sql ```sql {{ config( materialized='view' ) }} with customers as ( select id as customer_id ... ) ``` 4. Enter the `dbt run` command. Your model, `customers`, should now build as a view. * BigQuery users need to run `dbt run --full-refresh` instead of `dbt run` to fully apply materialization changes. 5. Enter the `dbt run --full-refresh` command for this to take effect in your warehouse. ##### FAQs[​](#faqs "Direct link to FAQs") What materializations are available in dbt? dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt. Which materialization should I use for my model? 
Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization. What model configurations exist? You can also configure: * [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection * [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas * [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename * Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md) * Warehouse-specific configurations for performance (e.g. `sort` and `dist` keys on Redshift, `partitions` on BigQuery) Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more. #### Delete the example models[​](#delete-the-example-models "Direct link to Delete the example models") You can now delete the files that dbt created when you initialized the project: 1. Delete the `models/example/` directory. 2. Delete the `example:` key from your `dbt_project.yml` file, and any configurations that are listed under it. dbt\_project.yml ```yaml # before models: jaffle_shop: +materialized: table example: +materialized: view ``` dbt\_project.yml ```yaml # after models: jaffle_shop: +materialized: table ``` 3. Save your changes. ###### FAQs[​](#faqs "Direct link to FAQs") How do I remove deleted models from my data warehouse? If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. 
(This can also happen when you switch a model from being a view or table, to ephemeral) When you remove models from your dbt project, you should manually drop the related relations from your schema. I got an "unused model configurations" error message, what does this mean? You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more. #### Build models on top of other models[​](#build-models-on-top-of-other-models "Direct link to Build models on top of other models") As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs). Now you can experiment by separating the logic out into separate models and using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function to build models on top of other models: [![The DAG we want for our dbt project](/img/dbt-dag.png?v=2 "The DAG we want for our dbt project")](#)The DAG we want for our dbt project 1. Create a new SQL file, `models/stg_customers.sql`, with the SQL from the `customers` CTE in our original query. 2. Create a second new SQL file, `models/stg_orders.sql`, with the SQL from the `orders` CTE in our original query. models/stg\_customers.sql ```sql select id as customer_id, first_name, last_name from raw.jaffle_shop.customers ``` models/stg\_orders.sql ```sql select id as order_id, user_id as customer_id, order_date, status from raw.jaffle_shop.orders ``` 3. 
Edit the SQL in your `models/customers.sql` file as follows: models/customers.sql ```sql with customers as ( select * from {{ ref('stg_customers') }} ), orders as ( select * from {{ ref('stg_orders') }} ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` 4. Execute `dbt run`. This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders` and `customers`. dbt inferred the order to run these models. Because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies. ###### FAQs[​](#faq-2 "Direct link to FAQs") How do I run one model at a time? To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell $ dbt run --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples. Do ref-able resource names need to be unique? Within one project: yes! To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function, and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*. A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. 
We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace. Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this. As I create more models, how should I keep my project organized? What should I name my models? There's no one best way to structure a project! Every organization is unique. If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md). #### Build models on top of sources[​](#build-models-on-top-of-sources "Direct link to Build models on top of sources") Sources make it possible to name and describe the data loaded into your warehouse by your extract and load tools. By declaring these tables as sources in dbt, you can: * select from source tables in your models using the `{{ source() }}` function, helping define the lineage of your data * test your assumptions about your source data * calculate the freshness of your source data 1. Create a new YML file `models/sources.yml`. 2. Declare the sources by copying the following into the file and clicking **Save**. models/sources.yml ```yml sources: - name: jaffle_shop description: This is a replica of the Postgres database used by our app database: raw schema: jaffle_shop tables: - name: customers description: One record per customer. - name: orders description: One record per order. Includes cancelled and deleted orders. ``` 3. 
Edit the `models/stg_customers.sql` file to select from the `customers` table in the `jaffle_shop` source. models/stg\_customers.sql ```sql select id as customer_id, first_name, last_name from {{ source('jaffle_shop', 'customers') }} ``` 4. Edit the `models/stg_orders.sql` file to select from the `orders` table in the `jaffle_shop` source. models/stg\_orders.sql ```sql select id as order_id, user_id as customer_id, order_date, status from {{ source('jaffle_shop', 'orders') }} ``` 5. Execute `dbt run`. The results of your `dbt run` will be exactly the same as the previous step. Your `stg_customers` and `stg_orders` models will still query from the same raw data source in Snowflake. By using `source`, you can test and document your raw data and also understand the lineage of your sources. #### Add tests to your models[​](#add-tests-to-your-models "Direct link to Add tests to your models") Adding [data tests](https://docs.getdbt.com/docs/build/data-tests.md) to a project helps validate that your models are working correctly. To add data tests to your project: 1. Create a new YAML file in the `models` directory, named `models/schema.yml`. 2. Add the following contents to the file: models/schema.yml ```yaml version: 2 models: - name: customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_orders columns: - name: order_id data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set `values` as a top-level property. values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` 3. Run `dbt test`, and confirm that all your tests passed. When you run `dbt test`, dbt iterates through your YAML files, and constructs a query for each test. 
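In spirit, every generic test follows the same contract: collect the records that violate the assumption. A hypothetical Python sketch (not dbt's compiled SQL) of a `not_null` test:

```python
# Hypothetical illustration of the generic-test contract: a test returns
# the records that fail it, and an empty result means the test passes.
def not_null(rows, column):
    return [row for row in rows if row[column] is None]

rows = [{"customer_id": 1}, {"customer_id": None}, {"customer_id": 3}]
failures = not_null(rows, "customer_id")
print(len(failures))  # one failing record, so this test fails
```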
Each query will return the number of records that fail the test. If this number is 0, then the test is successful. ###### FAQs[​](#faqs "Direct link to FAQs") What tests are available for me to use in dbt? Can I add my own custom tests? Out of the box, dbt ships with the following data tests: * `unique` * `not_null` * `accepted_values` * `relationships` (for example, referential integrity) You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project. How do I test one model at a time? Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell dbt test --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular. One of my tests failed, how can I debug it? To debug a failing test, find the SQL that dbt ran by: * dbt: * Within the test output, click on the failed test, and then select "Details". * dbt Core: * Open the file path returned as part of the error message. * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries. Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed. Does my test file need to be named `schema.yml`? No! 
You can name this file whatever you want (including `whatever_you_want.yml`), so long as: * The file is in your `models/` directory¹ * The file has a `.yml` extension Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information. ¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — `seeds/`, `snapshots/` and `macros/` respectively. Why do model and source YAML files always start with `version: 2`? Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible. From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported. Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional. Resource YAML files do not currently require this config. Although we do not expect to update YAML files to `version: 3` soon, having this config will make it easier for us to introduce new structures in the future. What data tests should I add to my project? We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`. We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL. In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models. When should I run my data tests? 
You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid). #### Document your models[​](#document-your-models "Direct link to Document your models") Adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project. Update your `models/schema.yml` file to include some descriptions, such as those below. models/schema.yml ```yaml version: 2 models: - name: customers description: One record per customer columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: first_order_date description: NULL when a customer has not yet placed an order. - name: stg_customers description: This model cleans up customer data columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: stg_orders description: This model cleans up order data columns: - name: order_id description: Primary key data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set `values` as a top-level property. values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` * View in Catalog * View in Studio IDE [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) provides powerful tools to interact with your dbt projects, including documentation: 1. From the IDE, run one of the following commands: * `dbt docs generate` if you're on dbt Core * `dbt build` if you're on the dbt Fusion engine 2. 
Click **Catalog** in the navigation menu to launch Catalog. 3. In the Catalog pane, click the environment selection dropdown menu at the top of the file tree and change it from **Production** to **Development**. [![View your development environment information.](/img/docs/collaborate/dbt-explorer/catalog-nav-dropdown.png?v=2 "View your development environment information.")](#)View your development environment information. 4. Select your project from the file tree. 5. Use the search bar or browse the resource list to find the `customers` model. 6. Click the model to view its details, including the descriptions you added. [![View your model's documentation and lineage in Catalog.](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2 "View your model's documentation and lineage in Catalog.")](#)View your model's documentation and lineage in Catalog. Catalog displays your model's description, column documentation, data tests, and lineage graph. You can also see which columns are missing documentation and track test coverage across your project. You can view docs directly from the IDE if you're on `Latest` or another version of dbt Core. Keep in mind that this is a legacy view and doesn't offer the same level of interactivity as Catalog. 1. In the IDE, run `dbt docs generate`. 2. From the navigation bar, click the **View docs** icon located to the right of the **branch name**. [![The View docs icon in the Studio IDE.](/img/docs/collaborate/dbt-explorer/docs-icon.png?v=2 "The View docs icon in the Studio IDE.")](#)The View docs icon in the Studio IDE. 3. From **Projects**, select your project name and expand the folders. 4. Click **models** > **marts** > **customers**. [![View your model's documentation in the legacy docs view.](/img/docs/collaborate/dbt-explorer/legacy-docs-view.png?v=2 "View your model's documentation in the legacy docs view.")](#)View your model's documentation in the legacy docs view. 
###### FAQs[​](#faqs "Direct link to FAQs") How do I write long-form explanations in my descriptions? If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions: ```yml models: - name: customers description: > Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ``` 2. Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions: ```yml models: - name: customers description: | ### Lorem ipsum * dolor sit amet, consectetur adipisicing elit, sed do eiusmod * tempor incididunt ut labore et dolore magna aliqua. ``` 3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. How do I access documentation in dbt Catalog? If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state. Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project. dbt developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but it doesn't offer the same speed, metadata, or visibility as Catalog. 
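Option 3 above, the docs block, keeps long descriptions out of your YAML entirely. As a hedged sketch (the block name `customers` and the file contents are illustrative), you would create a Markdown file anywhere in your `models/` directory:

```md
{% docs customers %}
### Customers

One record per customer. A customer may have zero or more orders;
customers with no orders have a NULL `first_order_date`.
{% enddocs %}
```

Then reference the block from the model's `description` property using the `doc` function, for example `description: '{{ doc("customers") }}'`.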
#### Commit your changes[​](#commit-your-changes "Direct link to Commit your changes")

Now that you've built your `customers` model, you need to commit the changes you made to the project so that the repository has your latest code.

**If you edited directly in the protected primary branch:**
1. Click the **Commit and sync git** button. This action prepares your changes for commit.
2. A modal titled **Commit to a new branch** will appear.
3. In the modal window, name your new branch `add-customers-model`. This branches off from your primary branch with your new changes.
4. Add a commit message, such as "Add customers model, tests, docs" and commit your changes.
5. Click **Merge this branch to main** to add these changes to the main branch on your repo.

**If you created a new branch before editing:**
1. Since you already branched out of the primary protected branch, go to **Version Control** on the left. 2. Click **Commit and sync** to add a message. 3. Add a commit message, such as "Add customers model, tests, docs." 4. Click **Merge this branch to main** to add these changes to the main branch on your repo. #### Deploy dbt[​](#deploy-dbt "Direct link to Deploy dbt") Use dbt's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps. ##### Create a deployment environment[​](#create-a-deployment-environment "Direct link to Create a deployment environment") 1. From the main menu, go to **Orchestration** > **Environments**. 2. Click **Create environment**. 3. In the **Name** field, write the name of your deployment environment. For example, "Production." 4. The **dbt version** will default to the latest available. We recommend all new projects run on the latest version of dbt. 5. Under **Deployment connection**, enter the name of the dataset you want to use as the target, such as "Analytics". This will allow dbt to build and work with that dataset. For some data warehouses, the target dataset may be referred to as a "schema". 6. Click **Save**. ##### Create and run a job[​](#create-and-run-a-job "Direct link to Create and run a job") Jobs are a set of dbt commands that you want to run on a schedule. For example, `dbt build`. As the `jaffle_shop` business gains more customers, and those customers create more orders, you will see more records added to your source data. Because you materialized the `customers` model as a table, you'll need to periodically rebuild your table to ensure that the data stays up-to-date. This update will happen when you run a job. 1. After creating your deployment environment, you should be directed to the page for a new environment. If not, select **Orchestration** from the main menu, then click **Jobs**. 2. 
Click **Create job** > **Deploy job**.
3. Provide a job name (for example, "Production run") and select the environment you just created.
4. Scroll down to the **Execution settings** section.
5. Under **Commands**, add this command as part of your job if you don't see it:
   * `dbt build`
6. Select the **Generate docs on run** option to automatically [generate updated project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) each time your job runs.
7. For this exercise, do *not* set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as *deploying a project*.
8. Click **Save**, then click **Run now** to run your job.
9. Click the run and watch its progress under **Run summary**.
10. Once the run is complete, click **View Documentation** to see the docs for your project.

Congratulations 🎉! You've just deployed your first dbt project!

###### FAQs[​](#faqs "Direct link to FAQs")

What happens if one of my runs fails?

If you're using dbt, we recommend setting up email and Slack notifications (`Account Settings > Notifications`) for any failed runs. Then, debug these runs the same way you would debug any runs in development.

---

### Quickstart for dbt and Starburst Galaxy

[Back to guides](https://docs.getdbt.com/guides.md)

dbt platform Quickstart Beginner

#### Introduction[​](#introduction "Direct link to Introduction")

In this quickstart guide, you'll learn how to use dbt with [Starburst Galaxy](https://www.starburst.io/platform/starburst-galaxy/).
It will show you how to:

* Load data into an Amazon S3 bucket. This guide uses AWS as the cloud service provider for demonstration purposes. Starburst Galaxy also [supports other data sources](https://docs.starburst.io/starburst-galaxy/catalogs/index.html) such as Google Cloud, Microsoft Azure, and more.
* Connect Starburst Galaxy to the Amazon S3 bucket.
* Create tables with Starburst Galaxy.
* Connect dbt to Starburst Galaxy.
* Take a sample query and turn it into a model in your dbt project. A model in dbt is a select statement.
* Add tests to your models.
* Document your models.
* Schedule a job to run.
* Connect to multiple data sources in addition to your S3 bucket.

Videos for you

You can check out [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) for free if you're interested in course learning with videos. You can also watch the [Build Better Data Pipelines with dbt and Starburst](https://www.youtube.com/watch?v=tfWm4dWgwRg) YouTube video produced by Starburst Data, Inc.

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You have a [multi-tenant](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) deployment in [dbt](https://www.getdbt.com/signup/). For more information, refer to [Tenancy](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md).
* You have a [Starburst Galaxy account](https://www.starburst.io/platform/starburst-galaxy/). If you don't, you can start a free trial. Refer to the [getting started guide](https://docs.starburst.io/starburst-galaxy/get-started.html) in the Starburst Galaxy docs for further setup details.
* You have an AWS account with permissions to upload data to an S3 bucket.
* For Amazon S3 authentication, you will need either an AWS access key and AWS secret key with access to the bucket, or a cross-account IAM role with access to the bucket.
For details, refer to these Starburst Galaxy docs: * [AWS access and secret key instructions](https://docs.starburst.io/starburst-galaxy/security/external-aws.html#aws-access-and-secret-key) * [Cross account IAM role](https://docs.starburst.io/starburst-galaxy/security/external-aws.html#role) ##### Related content[​](#related-content "Direct link to Related content") * [dbt Learn courses](https://learn.getdbt.com) * [dbt CI job](https://docs.getdbt.com/docs/deploy/continuous-integration.md) * [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) * [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) * [SQL overview for Starburst Galaxy](https://docs.starburst.io/starburst-galaxy/sql/index.html) #### Load data to an Amazon S3 bucket[​](#load-data-to-s3 "Direct link to Load data to an Amazon S3 bucket") Using Starburst Galaxy, you can create tables and also transform them with dbt. Start by loading the Jaffle Shop data (provided by dbt Labs) to your Amazon S3 bucket. Jaffle Shop is a fictional cafe selling food and beverages in several US cities. 1. Download these CSV files to your local machine: * [jaffle\_shop\_customers.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/jaffle_shop_customers.csv) * [jaffle\_shop\_orders.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/jaffle_shop_orders.csv) * [stripe\_payments.csv](https://dbt-tutorial-public.s3-us-west-2.amazonaws.com/stripe_payments.csv) 2. Upload these files to S3. For details, refer to [Upload objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html) in the Amazon S3 docs. 
When uploading these files, you must create the following folder structure and upload the appropriate file to each folder:

```text
dbt-quickstart (folder)
    jaffle-shop-customers (folder)
        jaffle_shop_customers.csv (file)
    jaffle-shop-orders (folder)
        jaffle_shop_orders.csv (file)
    stripe-payments (folder)
        stripe_payments.csv (file)
```

#### Connect Starburst Galaxy to the Amazon S3 bucket[​](#connect-to-s3-bucket "Direct link to Connect Starburst Galaxy to the Amazon S3 bucket")

If your Starburst Galaxy instance is not already connected to your S3 bucket, you need to create a cluster, configure a catalog that allows Starburst Galaxy to connect to the S3 bucket, add the catalog to your new cluster, and configure privilege settings.

In addition to Amazon S3, Starburst Galaxy supports many other data sources. To learn more about them, you can refer to the [Catalogs overview](https://docs.starburst.io/starburst-galaxy/catalogs/index.html) in the Starburst Galaxy docs.

1. Create a cluster. Click **Clusters** on the left sidebar of the Starburst Galaxy UI, then click **Create cluster** in the main body of the page.
2. In the **Create a new cluster** modal, you only need to set the following options. You can use the defaults for the other options.
   * **Cluster name** — Type a name for your cluster.
   * **Cloud provider region** — Select the AWS region.

   When done, click **Create cluster**.
3. Create a catalog. Click **Catalogs** on the left sidebar of the Starburst Galaxy UI, then click **Create catalog** in the main body of the page.
4. On the **Create a data source** page, select the Amazon S3 tile.
5. In the **Name and description** section of the **Amazon S3** page, fill out the fields.
6. In the **Authentication to S3** section of the **Amazon S3** page, select the [AWS (S3) authentication mechanism](#prerequisites) you chose to connect with.
7.
In the **Metastore configuration** section, set these options:

* **Default S3 bucket name** — Enter the name of the S3 bucket you want to access.
* **Default directory name** — Enter the folder name of where the Jaffle Shop data lives in the S3 bucket. This is the same folder name you used in [Load data to an Amazon S3 bucket](#load-data-to-s3).
* **Allow creating external tables** — Enable this option.
* **Allow writing to external tables** — Enable this option.

The **Amazon S3** page should look similar to this, except for the **Authentication to S3** section, which is dependent on your setup:

[![Amazon S3 connection settings in Starburst Galaxy](/img/quickstarts/dbt-cloud/starburst-galaxy-config-s3.png?v=2 "Amazon S3 connection settings in Starburst Galaxy")](#)Amazon S3 connection settings in Starburst Galaxy

8. Click **Test connection**. This verifies that Starburst Galaxy can access your S3 bucket.
9. Click **Connect catalog** if the connection test passes.

[![Successful connection test](/img/quickstarts/dbt-cloud/test-connection-success.png?v=2 "Successful connection test")](#)Successful connection test

10. On the **Set permissions** page, click **Skip**. You can add permissions later if you want.
11. On the **Add to cluster** page, choose the cluster you want to add the catalog to from the dropdown and click **Add to cluster**.
12. Add the location privilege for your S3 bucket to your role in Starburst Galaxy. Click **Access control > Roles and privileges** on the left sidebar of the Starburst Galaxy UI. Then, in the **Roles** table, click the role name **accountadmin**.

    If you're using an existing Starburst Galaxy cluster and don't have access to the accountadmin role, then select a role that you do have access to. To learn more about access control, refer to [Access control](https://docs.starburst.io/starburst-galaxy/security/access-control.html) in the Starburst Galaxy docs.
13.
On the **Roles** page, click the **Privileges** tab and click **Add privilege**.
14. On the **Add privilege** page, set these options:
    * **What would you like to modify privileges for?** — Choose **Location**.
    * **Enter a storage location** — Enter the storage location of *your S3 bucket* and the folder where the Jaffle Shop data lives. Make sure to include the `/*` at the end of the location.
    * **Create SQL** — Enable the option.

    When done, click **Add privileges**.

[![Add privilege to accountadmin role](/img/quickstarts/dbt-cloud/add-privilege.png?v=2 "Add privilege to accountadmin role")](#)Add privilege to accountadmin role

#### Create tables with Starburst Galaxy[​](#create-tables-with-starburst-galaxy "Direct link to Create tables with Starburst Galaxy")

To query the Jaffle Shop data with Starburst Galaxy, you need to create tables using the Jaffle Shop data that you [loaded to your S3 bucket](#load-data-to-s3). You can do this (and run any SQL statement) from the [query editor](https://docs.starburst.io/starburst-galaxy/query/query-editor.html).

1. Click **Query > Query editor** on the left sidebar of the Starburst Galaxy UI. The main body of the page is now the query editor.
2. Configure the query editor so it queries your S3 bucket. In the upper right corner of the query editor, select your cluster in the first gray box and select your catalog in the second gray box:

[![Set the cluster and catalog in query editor](/img/quickstarts/dbt-cloud/starburst-galaxy-editor.png?v=2 "Set the cluster and catalog in query editor")](#)Set the cluster and catalog in query editor

3. Copy and paste these queries into the query editor. Then **Run** each query individually. Replace `YOUR_S3_BUCKET_NAME` with the name of your S3 bucket.
These queries create a schema named `jaffle_shop` and also create the `jaffle_shop_customers`, `jaffle_shop_orders`, and `stripe_payments` tables:

```sql
CREATE SCHEMA jaffle_shop WITH (location='s3://YOUR_S3_BUCKET_NAME/dbt-quickstart/');

CREATE TABLE jaffle_shop.jaffle_shop_customers (
    id VARCHAR,
    first_name VARCHAR,
    last_name VARCHAR
)
WITH (
    external_location = 's3://YOUR_S3_BUCKET_NAME/dbt-quickstart/jaffle-shop-customers/',
    format = 'csv',
    type = 'hive',
    skip_header_line_count=1
);

CREATE TABLE jaffle_shop.jaffle_shop_orders (
    id VARCHAR,
    user_id VARCHAR,
    order_date VARCHAR,
    status VARCHAR
)
WITH (
    external_location = 's3://YOUR_S3_BUCKET_NAME/dbt-quickstart/jaffle-shop-orders/',
    format = 'csv',
    type = 'hive',
    skip_header_line_count=1
);

CREATE TABLE jaffle_shop.stripe_payments (
    id VARCHAR,
    order_id VARCHAR,
    paymentmethod VARCHAR,
    status VARCHAR,
    amount VARCHAR,
    created VARCHAR
)
WITH (
    external_location = 's3://YOUR_S3_BUCKET_NAME/dbt-quickstart/stripe-payments/',
    format = 'csv',
    type = 'hive',
    skip_header_line_count=1
);
```

4. When the queries are done, you can see the following hierarchy on the query editor's left sidebar:

[![Hierarchy of data in query editor](/img/quickstarts/dbt-cloud/starburst-data-hierarchy.png?v=2 "Hierarchy of data in query editor")](#)Hierarchy of data in query editor

5. Verify that the tables were created successfully. In the query editor, run the following queries:

```sql
select * from jaffle_shop.jaffle_shop_customers;
select * from jaffle_shop.jaffle_shop_orders;
select * from jaffle_shop.stripe_payments;
```

#### Connect dbt to Starburst Galaxy[​](#connect-dbt-to-starburst-galaxy "Direct link to Connect dbt to Starburst Galaxy")

1. Make sure you are still logged in to [Starburst Galaxy](https://galaxy.starburst.io/login).
2. If you haven’t already, set your account’s role to accountadmin. Click your email address in the upper right corner, choose **Switch role** and select **accountadmin**.
If this role is not listed for you, choose the role you selected in [Connect Starburst Galaxy to the Amazon S3 bucket](#connect-to-s3-bucket) when you added the location privilege for your S3 bucket.
3. Click **Clusters** on the left sidebar.
4. Find your cluster in the **View clusters** table and click **Connection info**. Choose **dbt** from the **Select client** dropdown. Keep the **Connection information** modal open. You will use details from that modal in dbt.
5. In another browser tab, log in to [dbt](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md).
6. Create a new project in dbt. Click on your account name in the left side menu, select **Account settings**, and click **+ New Project**.
7. Enter a project name and click **Continue**.
8. Choose **Starburst** as your connection and click **Next**.
9. Enter the **Settings** for your new project:
   * **Host** – The **Host** value from the **Connection information** modal in your Starburst Galaxy tab.
   * **Port** – 443 (which is the default)
10. Enter the **Development Credentials** for your new project:
    * **User** – The **User** value from the **Connection information** modal in your Starburst Galaxy tab. Make sure to use the entire string, including the account's role, which is the `/` and all the characters that follow it. If you don’t include it, your default role is used and that might not have the correct permissions for project development.
    * **Password** – The password you use to log in to your Starburst Galaxy account.
    * **Database** – The Starburst catalog you want to save your data to (for example, when writing new tables). For future reference, database in dbt is synonymous with catalog in Starburst Galaxy.
    * Leave the remaining options as is. You can use their default values.
11. Click **Test Connection**. This verifies that dbt can access your Starburst Galaxy cluster.
12. Click **Next** if the test succeeded.
If it failed, you might need to check your Starburst Galaxy settings and credentials.

#### Set up a dbt managed repository[​](#set-up-a-dbt-managed-repository "Direct link to Set up a dbt managed repository")

When you develop in dbt, you can leverage [Git](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) to version control your code. To connect to a repository, you can either set up a dbt-hosted [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or directly connect to a [supported git provider](https://docs.getdbt.com/docs/cloud/git/connect-github.md). Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md).

To set up a managed repository:

1. Under "Setup a repository", select **Managed**.
2. Type a name for your repo, such as `bbaggins-dbt-quickstart`.
3. Click **Create**. It will take a few seconds for your repository to be created and imported.
4. Once you see the "Successfully imported repository" message, click **Continue**.

#### Initialize your dbt project and start developing[​](#initialize-your-dbt-project-and-start-developing "Direct link to Initialize your dbt project and start developing")

Now that you have a repository configured, you can initialize your project and start development in dbt:

1. Click **Start developing in the Studio IDE**. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.
2. Above the file tree to the left, click **Initialize dbt project**. This builds out your folder structure with example models.
3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit` and click **Commit**.
This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
4. You can now directly query data from your warehouse and execute `dbt run`. You can try this out now:
   * Click **+ Create new file**, add this query to the new file, and click **Save as** to save the new file:

   ```sql
   select * from dbt_quickstart.jaffle_shop.jaffle_shop_customers
   ```

   * In the command line bar at the bottom, enter `dbt run` and press **Enter**. You should see a `dbt run succeeded` message.

#### Build your first model[​](#build-your-first-model "Direct link to Build your first model")

You have two options for working with files in the Studio IDE:

* Create a new branch (recommended) — Create a new branch to edit and commit your changes. Navigate to **Version Control** on the left sidebar and click **Create branch**.
* Edit in the protected primary branch — Edit, format, or lint files and execute dbt commands directly in your primary git branch. Because the Studio IDE prevents commits to the protected branch, you will be prompted to commit your changes to a new branch. Name the new branch `add-customers-model`.

1. Click the **...** next to the `models` directory, then select **Create file**.
2. Name the file `customers.sql`, then click **Create**.
3. Copy the following query into the file and click **Save**.
```sql
with customers as (

    select
        id as customer_id,
        first_name,
        last_name

    from dbt_quickstart.jaffle_shop.jaffle_shop_customers

),

orders as (

    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status

    from dbt_quickstart.jaffle_shop.jaffle_shop_orders

),

customer_orders as (

    select
        customer_id,
        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders on customers.customer_id = customer_orders.customer_id

)

select * from final
```

4. Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run and see the three models.

Later, you can connect your business intelligence (BI) tools to these views and tables so they read cleaned-up data rather than raw data.

###### FAQs[​](#faqs "Direct link to FAQs")

How can I see the SQL that dbt is running?

To check out the SQL that dbt is running, you can look in:

* dbt:
  * Within the run output, click on a model name, and then select "Details"
* dbt Core:
  * The `target/compiled/` directory for compiled `select` statements
  * The `target/run/` directory for compiled `create` statements
  * The `logs/dbt.log` file for verbose logging.

How did dbt choose which schema to build my models in?

By default, dbt builds models in your target schema. To change your target schema:

* If you're developing in **dbt**, these are set for each user when you first use a development environment.
* If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file.
If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md). Note: on BigQuery, `dataset` is used interchangeably with `schema`.

Do I need to create my target schema before running dbt?

Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you.

If I rerun dbt, will there be any downtime as models are rebuilt?

Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime). The implementation of this varies by warehouse; check out the [logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to see the SQL dbt is executing.

What happens if the SQL in my query is bad or I get a database error?

If there's a mistake in your SQL, dbt will return the error that your database returns.

```shell
$ dbt run --select customers
Running with dbt=1.9.0
Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources

14:04:12 | Concurrency: 1 threads (target='dev')
14:04:12 |
14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN]
14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s]
14:04:13 |
14:04:13 | Finished running 1 view model in 1.68s.

Completed with 1 error and 0 warnings:

Database Error in model customers (models/customers.sql)
  Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15]
  compiled SQL at target/run/jaffle_shop/customers.sql

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```

Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors.
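The atomic replacement described in the rerun FAQ above can be sketched in SQL. This is an illustration only, not dbt's exact DDL: the statements dbt actually runs vary by adapter (some warehouses use a single `create or replace` instead of a rename swap), and the `analytics` schema and `__dbt_tmp`/`__dbt_backup` names here are illustrative.

```sql
-- Build the new results under a temporary name, leaving the current
-- table in place for readers.
create table analytics.customers__dbt_tmp as
select * from analytics.stg_customers;

-- Swap via renames; these are metadata-only operations on most
-- warehouses, so queries see either the old table or the new one.
alter table analytics.customers rename to customers__dbt_backup;
alter table analytics.customers__dbt_tmp rename to customers;

-- Drop the old relation once the swap is complete.
drop table analytics.customers__dbt_backup;
```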
#### Change the way your model is materialized[​](#change-the-way-your-model-is-materialized "Direct link to Change the way your model is materialized")

One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse simply by changing a configuration value. You can switch between tables and views by changing a keyword; dbt writes the necessary data definition language (DDL) behind the scenes. By default, everything gets created as a view. You can override that at the directory level so that everything in a directory is built with a different materialization.

1. Edit your `dbt_project.yml` file.
   * Update your project `name` to:

   dbt\_project.yml

   ```yaml
   name: 'jaffle_shop'
   ```

   * Configure `jaffle_shop` so everything in it will be materialized as a table, and configure `example` so everything in it will be materialized as a view. Update your `models` config in the project YAML file to:

   dbt\_project.yml

   ```yaml
   models:
     jaffle_shop:
       +materialized: table
       example:
         +materialized: view
   ```

   * Click **Save**.
2. Enter the `dbt run` command. Your `customers` model should now be built as a table!

   info: To do this, dbt had to first run a `drop view` statement (or API call on BigQuery), then a `create table as` statement.
3. Edit `models/customers.sql` to override the `dbt_project.yml` for the `customers` model only by adding the following snippet to the top, and click **Save**:

   models/customers.sql

   ```sql
   {{
     config(
       materialized='view'
     )
   }}

   with customers as (

       select
           id as customer_id
           ...

   )
   ```

4. Enter the `dbt run` command. Your model, `customers`, should now build as a view.
   * BigQuery users need to run `dbt run --full-refresh` instead of `dbt run` to fully apply materialization changes.
5. Enter the `dbt run --full-refresh` command for this to take effect in your warehouse.

##### FAQs[​](#faqs "Direct link to FAQs")

What materializations are available in dbt?
dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt. Which materialization should I use for my model? Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization. What model configurations exist? You can also configure: * [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection * [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas * [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename * Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md) * Warehouse-specific configurations for performance (e.g. `sort` and `dist` keys on Redshift, `partitions` on BigQuery) Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more. #### Delete the example models[​](#delete-the-example-models "Direct link to Delete the example models") You can now delete the files that dbt created when you initialized the project: 1. Delete the `models/example/` directory. 2. Delete the `example:` key from your `dbt_project.yml` file, and any configurations that are listed under it. 
dbt\_project.yml ```yaml # before models: jaffle_shop: +materialized: table example: +materialized: view ``` dbt\_project.yml ```yaml # after models: jaffle_shop: +materialized: table ``` 3. Save your changes. ###### FAQs[​](#faqs "Direct link to FAQs") How do I remove deleted models from my data warehouse? If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from being a view or table to being ephemeral.) When you remove models from your dbt project, you should manually drop the related relations from your schema. I got an "unused model configurations" error message, what does this mean? You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more. #### Build models on top of other models[​](#build-models-on-top-of-other-models "Direct link to Build models on top of other models") As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs). Now you can experiment by separating the logic out into separate models and using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function to build models on top of other models: [![The DAG we want for our dbt project](/img/dbt-dag.png?v=2 "The DAG we want for our dbt project")](#)The DAG we want for our dbt project 1. Create a new SQL file, `models/stg_customers.sql`, with the SQL from the `customers` CTE in our original query. 2. 
Create a second new SQL file, `models/stg_orders.sql`, with the SQL from the `orders` CTE in our original query. models/stg\_customers.sql ```sql select id as customer_id, first_name, last_name from dbt_quickstart.jaffle_shop.jaffle_shop_customers ``` models/stg\_orders.sql ```sql select id as order_id, user_id as customer_id, order_date, status from dbt_quickstart.jaffle_shop.jaffle_shop_orders ``` 3. Edit the SQL in your `models/customers.sql` file as follows: models/customers.sql ```sql with customers as ( select * from {{ ref('stg_customers') }} ), orders as ( select * from {{ ref('stg_orders') }} ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders on customers.customer_id = customer_orders.customer_id ) select * from final ``` 4. Execute `dbt run`. This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders` and `customers`. dbt inferred the order to run these models. Because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies. ###### FAQs[​](#faq-2 "Direct link to FAQs") How do I run one model at a time? To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell $ dbt run --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples. Do ref-able resource names need to be unique? Within one project: yes! 
To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function, and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*. A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace. Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this. As I create more models, how should I keep my project organized? What should I name my models? There's no one best way to structure a project! Every organization is unique. If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md). #### Add tests to your models[​](#add-tests-to-your-models "Direct link to Add tests to your models") Adding [data tests](https://docs.getdbt.com/docs/build/data-tests.md) to a project helps validate that your models are working correctly. To add data tests to your project: 1. Create a new YAML file in the `models` directory, named `models/schema.yml` 2. 
Add the following contents to the file: models/schema.yml ```yaml version: 2 models: - name: customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_orders columns: - name: order_id data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties. values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` 3. Run `dbt test`, and confirm that all your tests passed. When you run `dbt test`, dbt iterates through your YAML files, and constructs a query for each test. Each query will return the number of records that fail the test. If this number is 0, then the test is successful. ###### FAQs[​](#faqs "Direct link to FAQs") What tests are available for me to use in dbt? Can I add my own custom tests? Out of the box, dbt ships with the following data tests: * `unique` * `not_null` * `accepted_values` * `relationships` (for example, referential integrity) You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project. How do I test one model at a time? 
Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell dbt test --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular. One of my tests failed, how can I debug it? To debug a failing test, find the SQL that dbt ran by: * dbt: * Within the test output, click on the failed test, and then select "Details". * dbt Core: * Open the file path returned as part of the error message. * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries. Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed. Does my test file need to be named `schema.yml`? No! You can name this file whatever you want (including `whatever_you_want.yml`), so long as: * The file is in your `models/` directory¹ * The file has a `.yml` extension Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information. ¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — `seeds/`, `snapshots/` and `macros/` respectively. Why do model and source YAML files always start with `version: 2`? Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible. From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported. 
Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional. Resource YAML files do not currently require this config. We only support `version: 2` if it's specified. Although we do not expect to update YAML files to `version: 3` soon, having this config will make it easier for us to introduce new structures in the future. What data tests should I add to my project? We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`. We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL. In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models. When should I run my data tests? You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid). #### Document your models[​](#document-your-models "Direct link to Document your models") Adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project. Update your `models/schema.yml` file to include some descriptions, such as those below. 
models/schema.yml ```yaml version: 2 models: - name: customers description: One record per customer columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: first_order_date description: NULL when a customer has not yet placed an order. - name: stg_customers description: This model cleans up customer data columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: stg_orders description: This model cleans up order data columns: - name: order_id description: Primary key data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set these as top-level properties. values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` * View in Catalog * View in Studio IDE [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) provides powerful tools to interact with your dbt projects, including documentation: 1. From the IDE, run one of the following commands: * `dbt docs generate` if you're on dbt Core * `dbt build` if you're on the dbt Fusion engine 2. Click **Catalog** in the navigation menu to launch Catalog. 3. In the Catalog pane, click the environment selection dropdown menu at the top of the file tree and change it from **Production** to **Development**. [![View your development environment information.](/img/docs/collaborate/dbt-explorer/catalog-nav-dropdown.png?v=2 "View your development environment information.")](#)View your development environment information. 4. Select your project from the file tree. 5. Use the search bar or browse the resource list to find the `customers` model. 6. Click the model to view its details, including the descriptions you added. 
[![View your model's documentation and lineage in Catalog.](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2 "View your model's documentation and lineage in Catalog.")](#)View your model's documentation and lineage in Catalog. Catalog displays your model's description, column documentation, data tests, and lineage graph. You can also see which columns are missing documentation and track test coverage across your project. You can view docs directly from the IDE if you're on `Latest` or another version of dbt Core. Keep in mind that this is a legacy view and doesn't offer the same level of interactivity as Catalog. 1. In the IDE, run `dbt docs generate`. 2. From the navigation bar, click the **View docs** icon located to the right of the **branch name**. [![The View docs icon in the Studio IDE.](/img/docs/collaborate/dbt-explorer/docs-icon.png?v=2 "The View docs icon in the Studio IDE.")](#)The View docs icon in the Studio IDE. 3. From **Projects**, select your project name and expand the folders. 4. Click **models** > **marts** > **customers**. [![View your model's documentation in the legacy docs view.](/img/docs/collaborate/dbt-explorer/legacy-docs-view.png?v=2 "View your model's documentation in the legacy docs view.")](#)View your model's documentation in the legacy docs view. ###### FAQs[​](#faqs "Direct link to FAQs") How do I write long-form explanations in my descriptions? If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions: ```yml models: - name: customers description: > Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ``` 2. 
Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions: ```yml models: - name: customers description: | ### Lorem ipsum * dolor sit amet, consectetur adipisicing elit, sed do eiusmod * tempor incididunt ut labore et dolore magna aliqua. ``` 3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. How do I access documentation in dbt Catalog? If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state. Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project. dbt developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but it doesn't offer the same speed, metadata, or visibility as Catalog. #### Commit your changes[​](#commit-your-changes "Direct link to Commit your changes") Now that you've built your customer model, you need to commit the changes you made to the project so that the repository has your latest code. **If you edited directly in the protected primary branch:**
1. Click the **Commit and sync git** button. This action prepares your changes for commit. 2. A modal titled **Commit to a new branch** will appear. 3. In the modal window, name your new branch `add-customers-model`. This branches off from your primary branch with your new changes. 4. Add a commit message, such as "Add customers model, tests, docs" and commit your changes. 5. Click **Merge this branch to main** to add these changes to the main branch on your repo. **If you created a new branch before editing:**
1. Since you already branched out of the primary protected branch, go to **Version Control** on the left. 2. Click **Commit and sync** to add a message. 3. Add a commit message, such as "Add customers model, tests, docs." 4. Click **Merge this branch to main** to add these changes to the main branch on your repo. #### Deploy dbt[​](#deploy-dbt "Direct link to Deploy dbt") Use dbt's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps. ##### Create a deployment environment[​](#create-a-deployment-environment "Direct link to Create a deployment environment") 1. From the main menu, go to **Orchestration** > **Environments**. 2. Click **Create environment**. 3. In the **Name** field, write the name of your deployment environment. For example, "Production." 4. The **dbt version** will default to the latest available. We recommend all new projects run on the latest version of dbt. 5. Under **Deployment connection**, enter the name of the dataset you want to use as the target, such as "Analytics". This will allow dbt to build and work with that dataset. For some data warehouses, the target dataset may be referred to as a "schema". 6. Click **Save**. ##### Create and run a job[​](#create-and-run-a-job "Direct link to Create and run a job") Jobs are a set of dbt commands that you want to run on a schedule. For example, `dbt build`. As the `jaffle_shop` business gains more customers, and those customers create more orders, you will see more records added to your source data. Because you materialized the `customers` model as a table, you'll need to periodically rebuild your table to ensure that the data stays up-to-date. This update will happen when you run a job. 1. After creating your deployment environment, you should be directed to the page for a new environment. If not, select **Orchestration** from the main menu, then click **Jobs**. 2. 
Click **Create job** > **Deploy job**. 3. Provide a job name (for example, "Production run") and select the environment you just created. 4. Scroll down to the **Execution settings** section. 5. Under **Commands**, add this command as part of your job if you don't see it: * `dbt build` 6. Select the **Generate docs on run** option to automatically [generate updated project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) each time your job runs. 7. For this exercise, do *not* set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as *deploying a project*. 8. Click **Save**, then click **Run now** to run your job. 9. Click the run and watch its progress under **Run summary**. 10. Once the run is complete, click **View Documentation** to see the docs for your project. Congratulations 🎉! You've just deployed your first dbt project! ###### FAQs[​](#faqs "Direct link to FAQs") What happens if one of my runs fails? If you're using dbt, we recommend setting up email and Slack notifications (`Account Settings > Notifications`) for any failed runs. Then, debug these runs the same way you would debug any runs in development. #### Connect to multiple data sources[​](#connect-to-multiple-data-sources "Direct link to Connect to multiple data sources") This quickstart focuses on using dbt to run models against a data lake (S3) by using Starburst Galaxy as the query engine. In most real-world scenarios, the data that is needed for running models is actually spread across multiple data sources and is stored in a variety of formats. With Starburst Galaxy, Starburst Enterprise, and Trino, you can run your models on any of the data you need, no matter where it is stored. 
If you want to try this out, you can refer to the [Starburst Galaxy docs](https://docs.starburst.io/starburst-galaxy/catalogs/) to add more data sources and load the Jaffle Shop data into the source you select. Then, extend your models to query the new data source and the data source you created in this quickstart. --- ### Quickstart for dbt and Teradata [Back to guides](https://docs.getdbt.com/guides.md) dbt platform Quickstart Teradata Beginner #### Introduction[​](#introduction "Direct link to Introduction") In this quickstart guide, you'll learn how to use dbt with Teradata Vantage. It will show you how to: * Create a new Teradata Clearscape instance. * Load sample data into your Teradata database. * Connect dbt to Teradata. * Take a sample query and turn it into a model in your dbt project. A model in dbt is a select statement. * Add tests to your models. * Document your models. * Schedule a job to run. Videos for you You can check out [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) for free if you're interested in course learning with videos. ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * You have a [dbt account](https://www.getdbt.com/signup/). * You have access to a Teradata Vantage instance. You can provision one for free at . See [the ClearScape Analytics Experience guide](https://developers.teradata.com/quickstarts/get-access-to-vantage/clearscape-analytics-experience/getting-started-with-csae/) for details. 
##### Related content[​](#related-content "Direct link to Related content") * Learn more with [dbt Learn courses](https://learn.getdbt.com) * [How we provision Teradata Clearscape Vantage instance](https://developers.teradata.com/quickstarts/get-access-to-vantage/clearscape-analytics-experience/getting-started-with-csae/) * [CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md) * [Deploy jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) * [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) * [Source freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) #### Load data[​](#load-data "Direct link to Load data") The following steps will guide you through how to get the data stored as CSV files in a public S3 bucket and insert it into the tables. SQL Studio IDE If you created your Teradata Vantage database instance at and you don't have an SQL Studio IDE handy, use the JupyterLab bundled with your database to execute SQL: 1. Navigate to [ClearScape Analytics Experience dashboard](https://clearscape.teradata.com/dashboard) and click the **Run Demos** button. The demo will launch JupyterLab. 2. In JupyterLab, go to **Launcher** by clicking the blue **+** icon in the top left corner. Find the **Notebooks** section and click **Teradata SQL**. 3. In the notebook's first cell, connect to the database using `connect` magic. You will be prompted to enter your database password when you execute it: ```ipynb %connect local ``` 4. Use additional cells to type and run SQL statements. 
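Before moving on to the load steps, you can confirm the notebook is actually connected by running a simple query in a cell. The sketch below uses `DBC.DBCInfoV`, a standard Teradata system view, so it works before any of this guide's objects exist:

```sql
-- Sanity check: confirm connectivity and show the Vantage version
SELECT InfoKey, InfoData
FROM DBC.DBCInfoV
WHERE InfoKey = 'VERSION';
```

If this returns a row, your session is connected and you can run the statements in the next step.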
1. Use your preferred SQL IDE editor to create the database, `jaffle_shop`: ```sql CREATE DATABASE jaffle_shop AS PERM = 1e9; ``` 2. In the `jaffle_shop` database, create three foreign tables that reference the respective CSV files located in object storage: ```sql CREATE FOREIGN TABLE jaffle_shop.customers ( id integer, first_name varchar (100), last_name varchar (100), email varchar (100) ) USING ( LOCATION ('/gs/storage.googleapis.com/clearscape_analytics_demo_data/dbt/raw_customers.csv') ) NO PRIMARY INDEX; CREATE FOREIGN TABLE jaffle_shop.orders ( id integer, user_id integer, order_date date, status varchar(100) ) USING ( LOCATION ('/gs/storage.googleapis.com/clearscape_analytics_demo_data/dbt/raw_orders.csv') ) NO PRIMARY INDEX; CREATE FOREIGN TABLE jaffle_shop.payments ( id integer, orderid integer, paymentmethod varchar (100), amount integer ) USING ( LOCATION ('/gs/storage.googleapis.com/clearscape_analytics_demo_data/dbt/raw_payments.csv') ) NO PRIMARY INDEX; ``` #### Connect dbt to Teradata[​](#connect-dbt-to-teradata "Direct link to Connect dbt to Teradata") 1. Create a new project in dbt. Click on your account name in the left side menu, select **Account settings**, and click **+ New Project**. 2. Enter a project name and click **Continue**. 3. In the **Configure your development environment** section, click the **Connection** dropdown menu and select **Add new connection**. 4. In the **Type** section, select **Teradata**. 5. Enter your Teradata settings and click **Save**. [![dbt - Choose Teradata Connection](/img/teradata/dbt_cloud_teradata_setup_connection_start.png?v=2 "dbt - Choose Teradata Connection")](#)dbt - Choose Teradata Connection [![dbt - Teradata Account Settings](/img/teradata/dbt_cloud_teradata_account_settings.png?v=2 "dbt - Teradata Account Settings")](#)dbt - Teradata Account Settings 6. Set up your personal development credentials by going to **Your profile** > **Credentials**. 7. Select your project that uses the Teradata connection. 
8. Click the **configure your development environment and add a connection** link. This directs you to a page where you can enter your personal development credentials. 9. Enter your **Development credentials** for Teradata with: * **Username** — The username for your Teradata database. * **Password** — The password for your Teradata database. * **Schema** — The default database to use. [![dbt - Teradata Development Credentials](/img/teradata/dbt_cloud_teradata_development_credentials.png?v=2 "dbt - Teradata Development Credentials")](#)dbt - Teradata Development Credentials 10. Click **Test Connection** to verify that dbt can access your Teradata Vantage instance. 11. If the test succeeded, click **Save** to complete the configuration. If it failed, you may need to check your Teradata settings and credentials. #### Set up a dbt managed repository[​](#set-up-a-dbt-managed-repository "Direct link to Set up a dbt managed repository") When you develop in dbt, you can leverage [Git](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) to version control your code. To connect to a repository, you can either set up a dbt-hosted [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or directly connect to a [supported git provider](https://docs.getdbt.com/docs/cloud/git/connect-github.md). Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md). To set up a managed repository: 1. Under "Setup a repository", select **Managed**. 2. Type a name for your repo such as `bbaggins-dbt-quickstart` 3. Click **Create**. It will take a few seconds for your repository to be created and imported. 4. Once you see the "Successfully imported repository" message, click **Continue**. 
#### Initialize your dbt project​ and start developing[​](#initialize-your-dbt-project-and-start-developing "Direct link to Initialize your dbt project​ and start developing") Now that you have a repository configured, you can initialize your project and start development in dbt: 1. Click **Start developing in the Studio IDE**. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse. 2. Above the file tree to the left, click **Initialize your project** to build out your folder structure with example models. 3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit` to create the first commit to your managed repo. Once you’ve created the commit, you can open a branch to add new dbt code. #### Delete the example models[​](#delete-the-example-models "Direct link to Delete the example models") You can now delete the files that dbt created when you initialized the project: 1. Delete the `models/example/` directory. 2. Delete the `example:` key from your `dbt_project.yml` file, and any configurations that are listed under it. dbt\_project.yml ```yaml # before models: my_new_project: +materialized: table example: +materialized: view ``` dbt\_project.yml ```yaml # after models: my_new_project: +materialized: table ``` 3. Save your changes. 4. Commit your changes and merge to the main branch. ###### FAQs[​](#faqs "Direct link to FAQs") How do I remove deleted models from my data warehouse? If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from being a view or table, to ephemeral) When you remove models from your dbt project, you should manually drop the related relations from your schema. 
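As the FAQ above notes, dbt won't drop the example relations for you. A minimal sketch of the manual cleanup, assuming your development schema is `dbt_alice` (a placeholder) and the starter project's defaults, where `my_first_dbt_model` is built as a table and `my_second_dbt_model` as a view:

```sql
-- Sketch: manually drop the leftover example objects after deleting the models.
-- dbt_alice is a placeholder; adjust the schema and object types to match
-- what dbt actually built in your warehouse.
drop table dbt_alice.my_first_dbt_model;
drop view dbt_alice.my_second_dbt_model;
```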
I got an "unused model configurations" error message, what does this mean? You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more. #### Build your first model[​](#build-your-first-model "Direct link to Build your first model") You have two options for working with files in the Studio IDE: * Create a new branch (recommended) — Create a new branch to edit and commit your changes. Navigate to **Version Control** on the left sidebar and click **Create branch**. * Edit in the protected primary branch — Use this option if you prefer to edit, format, or lint files and execute dbt commands directly in your primary git branch. The Studio IDE prevents commits to the protected branch, so you will be prompted to commit your changes to a new branch. Name the new branch `add-customers-model`. 1. Click the **...** next to the `models` directory, then select **Create file**. 2. Name the file `bi_customers.sql`, then click **Create**. 3. Copy the following query into the file and click **Save**. ```sql with customers as ( select id as customer_id, first_name, last_name from jaffle_shop.customers ), orders as ( select id as order_id, user_id as customer_id, order_date, status from jaffle_shop.orders ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders on customers.customer_id = customer_orders.customer_id ) select * from final ``` 4. 
Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run.

You can connect your business intelligence (BI) tools to these views and tables so they only read cleaned-up data rather than raw data.

#### Change the way your model is materialized[​](#change-the-way-your-model-is-materialized "Direct link to Change the way your model is materialized")

One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value. You can switch between tables and views by changing a keyword rather than writing the data definition language (DDL) to do this behind the scenes.

By default, everything gets created as a view. You can override that at the directory level so that everything in that directory uses a different materialization.

1. Edit your `dbt_project.yml` file.

   * Update your project `name` to:

     dbt_project.yml

     ```yaml
     name: 'jaffle_shop'
     ```

   * Configure `jaffle_shop` so everything in it will be materialized as a table. Update your `models` config in the project YAML file to:

     dbt_project.yml

     ```yaml
     models:
       jaffle_shop:
         +materialized: table
     ```

   * Click **Save**.

2. Enter the `dbt run` command. Your `bi_customers` model should now be built as a table!

   info

   To do this, dbt had to first run a `drop view` statement (or API call on BigQuery), then a `create table as` statement.

3. Edit `models/bi_customers.sql` to override the `dbt_project.yml` for the `bi_customers` model only by adding the following snippet to the top, and click **Save**:

   models/bi_customers.sql

   ```sql
   {{
     config(
       materialized='view'
     )
   }}

   with customers as (

       select
           id as customer_id
       ...

   )
   ```

4. Enter the `dbt run` command. Your model, `bi_customers`, should now build as a view.
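To make the info note above concrete: when a model flips from view to table, the statements dbt issues look roughly like this sketch. The exact syntax is warehouse-specific, and `analytics` is a stand-in for your target schema:

```sql
-- Rough sketch of the DDL dbt runs when a model switches from view to table.
drop view if exists analytics.bi_customers;
create table analytics.bi_customers as (
    -- the compiled SQL of models/bi_customers.sql goes here
    select ...
);
```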
##### FAQs[​](#faqs-1 "Direct link to FAQs")

What materializations are available in dbt?

dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt.

Which materialization should I use for my model?

Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization.

What model configurations exist?

You can also configure:

* [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection
* [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas
* [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename
* Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md)
* Warehouse-specific configurations for performance (e.g. `sort` and `dist` keys on Redshift, `partitions` on BigQuery)

Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more.

#### Build models on top of other models[​](#build-models-on-top-of-other-models "Direct link to Build models on top of other models")

As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs).
Now you can experiment by separating the logic out into separate models and using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function to build models on top of other models:

[![The DAG we want for our dbt project](/img/dbt-dag.png?v=2 "The DAG we want for our dbt project")](#)The DAG we want for our dbt project

1. Create a new SQL file, `models/stg_customers.sql`, with the SQL from the `customers` CTE in your original query.

   models/stg_customers.sql

   ```sql
   select
       id as customer_id,
       first_name,
       last_name

   from jaffle_shop.customers
   ```

2. Create a second new SQL file, `models/stg_orders.sql`, with the SQL from the `orders` CTE in your original query.

   models/stg_orders.sql

   ```sql
   select
       id as order_id,
       user_id as customer_id,
       order_date,
       status

   from jaffle_shop.orders
   ```

3. Edit the SQL in your `models/bi_customers.sql` file as follows:

   models/bi_customers.sql

   ```sql
   with customers as (

       select * from {{ ref('stg_customers') }}

   ),

   orders as (

       select * from {{ ref('stg_orders') }}

   ),

   customer_orders as (

       select
           customer_id,

           min(order_date) as first_order_date,
           max(order_date) as most_recent_order_date,
           count(order_id) as number_of_orders

       from orders

       group by 1

   ),

   final as (

       select
           customers.customer_id,
           customers.first_name,
           customers.last_name,
           customer_orders.first_order_date,
           customer_orders.most_recent_order_date,
           coalesce(customer_orders.number_of_orders, 0) as number_of_orders

       from customers

       left join customer_orders on customers.customer_id = customer_orders.customer_id

   )

   select * from final
   ```

4. Execute `dbt run`.

   This time, when you performed a `dbt run`, it created separate views/tables for `stg_customers`, `stg_orders`, and `bi_customers`. dbt inferred the order in which these models should run. Because `bi_customers` depends on `stg_customers` and `stg_orders`, dbt builds `bi_customers` last. You don’t need to define these dependencies explicitly.

###### FAQs[​](#faq-2 "Direct link to FAQs")

How do I run one model at a time?
To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model:

```shell
$ dbt run --select customers
```

Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples.

Do ref-able resource names need to be unique?

Within one project: yes! To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function, and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*.

A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace.

Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this.

As I create more models, how should I keep my project organized? What should I name my models?

There's no one best way to structure a project! Every organization is unique. If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md).
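Returning to the namespacing FAQ above: if an installed package also defined a model named `stg_orders`, a two-argument `ref` could disambiguate the reference. A minimal sketch, where `some_package` is a hypothetical package name:

```sql
-- Two-argument ref: namespace (package or project) first, resource name second.
select * from {{ ref('some_package', 'stg_orders') }}
```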
#### Build models on top of sources[​](#build-models-on-top-of-sources "Direct link to Build models on top of sources")

Sources make it possible to name and describe the data loaded into your warehouse by your extract and load tools. By declaring these tables as sources in dbt, you can:

* Select from source tables in your models using the `{{ source() }}` function, helping define the lineage of your data
* Test your assumptions about your source data
* Calculate the freshness of your source data

1. Create a new YML file, `models/sources.yml`.
2. Declare the sources by copying the following into the file and clicking **Save**.

   models/sources.yml

   ```yml
   version: 2

   sources:
     - name: jaffle_shop
       description: This is a replica of the Postgres database used by the app
       database: raw
       schema: jaffle_shop
       tables:
         - name: customers
           description: One record per customer.
         - name: orders
           description: One record per order. Includes canceled and deleted orders.
   ```

3. Edit the `models/stg_customers.sql` file to select from the `customers` table in the `jaffle_shop` source.

   models/stg_customers.sql

   ```sql
   select
       id as customer_id,
       first_name,
       last_name

   from {{ source('jaffle_shop', 'customers') }}
   ```

4. Edit the `models/stg_orders.sql` file to select from the `orders` table in the `jaffle_shop` source.

   models/stg_orders.sql

   ```sql
   select
       id as order_id,
       user_id as customer_id,
       order_date,
       status

   from {{ source('jaffle_shop', 'orders') }}
   ```

5. Execute `dbt run`.

   Your `dbt run` results will be the same as those in the previous step. Your `stg_customers` and `stg_orders` models will still query from the same raw data source in Teradata. By using `source`, you can test and document your raw data and also understand the lineage of your sources.
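To illustrate the freshness point above, the `jaffle_shop` source in `models/sources.yml` could be extended with a `freshness` block, checked by running `dbt source freshness`. This is a sketch that assumes your raw tables carry a load timestamp column; the column name `_etl_loaded_at` here is hypothetical:

```yml
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _etl_loaded_at   # hypothetical timestamp column in your raw tables
    tables:
      - name: orders
```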
#### Add data tests to your models[​](#add-data-tests-to-your-models "Direct link to Add data tests to your models")

Adding [data tests](https://docs.getdbt.com/docs/build/data-tests.md) to a project helps validate that your models are working correctly.

To add data tests to your project:

1. Create a new properties YAML file in the `models` directory, named `models/schema.yml`
2. Add the following contents to the file:

   models/schema.yml

   ```yaml
   version: 2

   models:
     - name: bi_customers
       columns:
         - name: customer_id
           data_tests:
             - unique
             - not_null

     - name: stg_customers
       columns:
         - name: customer_id
           data_tests:
             - unique
             - not_null

     - name: stg_orders
       columns:
         - name: order_id
           data_tests:
             - unique
             - not_null
         - name: status
           data_tests:
             - accepted_values:
                 arguments: # available in v1.10.5 and higher. Older versions can set `values` as a top-level property.
                   values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
         - name: customer_id
           data_tests:
             - not_null
             - relationships:
                 arguments:
                   to: ref('stg_customers')
                   field: customer_id
   ```

3. Run `dbt test`, and confirm that all your tests passed.

When you run `dbt test`, dbt iterates through your YAML files, and constructs a query for each data test. Each query will return the number of records that fail the test. If this number is 0, then the data test is successful.

###### FAQs[​](#faqs-2 "Direct link to FAQs")

What data tests are available for me to use in dbt? Can I add my own custom tests?

Out of the box, dbt ships with the following data tests:

* `unique`
* `not_null`
* `accepted_values`
* `relationships` (for example, referential integrity)

You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests).

Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests).
Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project.

How do I test one model at a time?

Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model:

```shell
dbt test --select customers
```

Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular.

One of my tests failed, how can I debug it?

To debug a failing test, find the SQL that dbt ran by:

* dbt:
  * Within the test output, click on the failed test, and then select "Details".
* dbt Core:
  * Open the file path returned as part of the error message.
  * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries.

Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed.

Does my test file need to be named \`schema.yml\`?

No! You can name this file whatever you want (including `whatever_you_want.yml`), so long as:

* The file is in your `models/` directory¹
* The file has a `.yml` extension

Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information.

¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — `seeds/`, `snapshots/` and `macros/` respectively.

Why do model and source YAML files always start with \`version: 2\`?

Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible.

From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported.
Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional.

Resource YAML files do not currently require this config. We only support `version: 2` if it's specified. Although we do not expect to update YAML files to `version: 3` soon, having this config will make it easier for us to introduce new structures in the future.

What data tests should I add to my project?

We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`.

We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL.

In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models.

When should I run my data tests?

You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid).

#### Document your models[​](#document-your-models "Direct link to Document your models")

Adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project.

1. Update your `models/schema.yml` file to include some descriptions, such as those below.
   models/schema.yml

   ```yaml
   models:
     - name: bi_customers
       description: One record per customer
       columns:
         - name: customer_id
           description: Primary key
           data_tests:
             - unique
             - not_null
         - name: first_order_date
           description: NULL when a customer has not yet placed an order.

     - name: stg_customers
       description: This model cleans up customer data
       columns:
         - name: customer_id
           description: Primary key
           data_tests:
             - unique
             - not_null

     - name: stg_orders
       description: This model cleans up order data
       columns:
         - name: order_id
           description: Primary key
           data_tests:
             - unique
             - not_null
         - name: status
           data_tests:
             - accepted_values:
                 arguments: # available in v1.10.5 and higher. Older versions can set `values` as a top-level property.
                   values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
         - name: customer_id
           data_tests:
             - not_null
             - relationships:
                 arguments:
                   to: ref('stg_customers')
                   field: customer_id
   ```

2. Run `dbt docs generate` to generate the documentation for your project. dbt introspects your project and your warehouse to generate a JSON file with rich documentation about your project.
3. Click the book icon in the Develop interface to launch documentation in a new tab.

###### FAQs[​](#faqs-3 "Direct link to FAQs")

How do I write long-form explanations in my descriptions?

If you need more than a sentence to explain a model, you can:

1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions:

   ```yml
   models:
     - name: customers
       description: >
         Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod
         tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
         quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
   ```

2. Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used.
   This method is recommended for more complex descriptions:

   ```yml
   models:
     - name: customers
       description: |
         ### Lorem ipsum

         * dolor sit amet, consectetur adipisicing elit, sed do eiusmod
         * tempor incididunt ut labore et dolore magna aliqua.
   ```

3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file.

How do I access documentation in dbt Catalog?

If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state.

Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project.

dbt developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but it doesn't offer the same speed, metadata, or visibility as Catalog.

#### Commit your changes[​](#commit-your-changes "Direct link to Commit your changes")

Now that you've built your customer model, you need to commit the changes you made to the project so that the repository has your latest code.

**If you edited directly in the protected primary branch:**
1. Click the **Commit and sync git** button. This action prepares your changes for commit.
2. A modal titled **Commit to a new branch** will appear.
3. In the modal window, name your new branch `add-customers-model`. This branches off from your primary branch with your new changes.
4. Add a commit message, such as "Add customers model, tests, docs" and commit your changes.
5. Click **Merge this branch to main** to add these changes to the main branch on your repo.

**If you created a new branch before editing:**
1. Since you already branched out of the primary protected branch, go to **Version Control** on the left.
2. Click **Commit and sync** to add a message.
3. Add a commit message, such as "Add customers model, tests, docs."
4. Click **Merge this branch to main** to add these changes to the main branch on your repo.

#### Deploy dbt[​](#deploy-dbt "Direct link to Deploy dbt")

Use dbt's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps.

##### Create a deployment environment[​](#create-a-deployment-environment "Direct link to Create a deployment environment")

1. In the upper left, select **Deploy**, then click **Environments**.
2. Click **Create Environment**.
3. In the **Name** field, write the name of your deployment environment. For example, "Production."
4. In the **dbt Version** field, select the latest version from the dropdown.
5. Under **Deployment connection**, enter the name of the dataset you want to use as the target, such as `jaffle_shop_prod`. This will allow dbt to build and work with that dataset.
6. Click **Save**.

##### Create and run a job[​](#create-and-run-a-job "Direct link to Create and run a job")

Jobs are a set of dbt commands that you want to run on a schedule. For example, `dbt build`.

As the `jaffle_shop` business gains more customers, and those customers create more orders, you will see more records added to your source data. Because you materialized the `bi_customers` model as a table, you'll need to periodically rebuild your table to ensure that the data stays up-to-date. This update will happen when you run a job.

1. After creating your deployment environment, you should be directed to the page for a new environment. If not, select **Deploy** in the upper left, then click **Jobs**.
2. Click **+ Create job** and then select **Deploy job**.
   Provide a name, for example, "Production run", and link it to the Environment you just created.

3. Scroll down to the **Execution Settings** section.
4. Under **Commands**, add this command as part of your job if you don't see it:
   * `dbt build`
5. Select the **Generate docs on run** checkbox to automatically [generate updated project docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) each time your job runs.
6. For this exercise, do *not* set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as *deploying a project*.
7. Select **Save**, then click **Run now** to run your job.
8. Click the run and watch its progress under "Run history."
9. Once the run is complete, click **View Documentation** to see the docs for your project.

Congratulations 🎉! You've just deployed your first dbt project!

###### FAQs[​](#faqs-4 "Direct link to FAQs")

What happens if one of my runs fails?

If you're using dbt, we recommend setting up email and Slack notifications (`Account Settings > Notifications`) for any failed runs. Then, debug these runs the same way you would debug any runs in development.

---

### Quickstart for dbt Canvas

[Back to guides](https://docs.getdbt.com/guides.md)

Canvas · Analyst · dbt platform · Model · Beginner

#### Introduction[​](#introduction "Direct link to Introduction")

Canvas offers a quick and straightforward way for anyone to build analytics models; no background in analytics engineering is required!
In this guide, you will learn about:

* Accessing Canvas and creating a new model
* Navigating the interface
* Building a model using operators
* Committing your changes to Git
* Locating your Canvas model and data

#### Canvas prerequisites[​](#canvas-prerequisites "Direct link to Canvas prerequisites")

Before using Canvas, you should:

* Have a [dbt Enterprise or Enterprise+](https://www.getdbt.com/pricing) account.
* Have a [developer license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) with developer credentials set up.
* Be using one of the following adapters:
  * BigQuery
  * Databricks
  * Redshift
  * Snowflake
  * Trino
  * You can access the Canvas with adapters not listed, but some features may be missing at this time.
* Use [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md), [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md), or [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) as your Git provider, connected to dbt via HTTPS.
  * SSH connections aren't supported at this time.
  * Self-hosted or on-premises deployments of any Git provider aren't supported for Canvas at this time.
* Have an existing dbt project already created with a Staging or Production run completed.
* Verify your Development environment is on a supported [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to receive ongoing updates.
* Have read-only access to the [Staging environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#staging-environment) with the data to be able to execute `run` in the Canvas. To customize the required access for the Canvas user group, refer to [Set up environment-level permissions](https://docs.getdbt.com/docs/cloud/manage-access/environment-permissions-setup.md) for more information.
* Have the AI-powered features toggle enabled (for [Copilot integration](https://docs.getdbt.com/docs/cloud/dbt-copilot.md)).
Prerequisite for using the Jaffle Shop

The examples in this guide use the [Jaffle Shop](https://github.com/dbt-labs/jaffle-shop) GitHub repo sample project. You can use your own data, but the Jaffle Shop offers a full-featured project useful for testing dbt features. Ask your dbt administrator about importing it to a project in your environment. They must also execute `dbt run` on the Jaffle Shop project in your `Production` environment before you begin, or you will be unable to reference the source models.

#### Access Canvas[​](#access-canvas "Direct link to Access Canvas")

To access Canvas:

1. Click **Canvas** on the left-side menu.
2. From the right side, click **Create new workspace**. This will open a new workspace with a blank untitled model.

You don't need to take any additional action to continue with this guide, but in scenarios where you want to create a new model, click **+Add** on the top navigation bar and click **Create new model**.

[![Create a new model from the Canvas landing page.](/img/docs/dbt-cloud/canvas/canvas-create-new-model.png?v=2 "Create a new model from the Canvas landing page.")](#)Create a new model from the Canvas landing page.

#### Navigating the interface[​](#navigating-the-interface "Direct link to Navigating the interface")

Canvas comprises a series of menus activated by clicking icons surrounding the border of the canvas workspace area. With none of the menu items activated, the workspace looks like this:

[![The Canvas workspace screen. The number of items is defined in this section.](/img/docs/dbt-cloud/canvas/canvas-screen.png?v=2 "The Canvas workspace screen. The number of items is defined in this section.")](#)The Canvas workspace screen. The number of items is defined in this section.

Click on an icon to expand its section or execute an action depending on its purpose. The options are as follows:

1. The main menu (click on the **dbt logo**) and the workspace's title.
   The default title is random but can be edited anytime by clicking on it.

2. The **current model tab** and name. The name for the model is set with the **Output** operator.
3. The **model icon** button. Manage your models in the workspace.
4. The **Runs** pane, which displays run data, including warnings and errors.
5. The **Previews** pane, which displays preview data for individual operators.
6. The **Add** option for creating new models, editing existing ones, or adding seed files.
7. The **Operators** toolbar (`Input`, `Transform`, and `Output`) contains the building blocks for creating a model with the editor.
8. The [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) icon (where available). Use natural language to build your Canvas models.
9. The **SQL code** area displays the SQL that compiles your model.
10. The **Run** command executes `dbt run` for the model.
11. This button is initially a **Commit** command for your integrated Git provider. It will change to "Open pull request" once your first commit is made. The button will not appear until you begin working in the canvas area.
12. The navigation tab that has icons for (from top to bottom):
    * Zoom in
    * Zoom out
    * Center the model to fit the screen
    * Zoom to selection (the operator selected on or off screen will be zoomed and centered)
    * Auto-layout option for the individual operator tiles

#### Create a model[​](#create-a-model "Direct link to Create a model")

This section will walk you through creating a model with operators using sample data from the [Jaffle Shop](https://github.com/dbt-labs/jaffle-shop) project. With this guide, you will create a basic model that transforms two datasets to build a view of repeat customer purchases while you consider a loyalty program for your shop.

The operators are the heart of your model. They determine what data will be transformed and how. Click the **+** icon to open the operator menu.
Operators are divided into three types:

* **Input:** Input operators configure the source data.
* **Transform:** Transform operators change and shape your data.
* **Output:** Output operators define your model name and location.

[![The operator’s menu on the side of the Canvas workspace.](/img/docs/dbt-cloud/canvas/operators.png?v=2 "The operator’s menu on the side of the Canvas workspace.")](#)The operator’s menu on the side of the Canvas workspace.

Read more about the [individual operators](https://docs.getdbt.com/docs/cloud/canvas-interface.md#operators) to understand the basic purpose of each. The dbt model created by the Canvas builds off of existing models. In this guide, there will be input (source) models and an output model (what you are building), which will be *your model*.

More about operator tiles

Drag operators from their menu onto the canvas; dropping one creates a tile. The tiles have the same basic setup with different fields depending on their function. All operators except for **Model** must be connected to another tile before configuring. Once configured, they’ll have the same basic layout.

[![An operator tile with configurations filled out.](/img/docs/dbt-cloud/canvas/operator-tile.png?v=2 "An operator tile with configurations filled out.")](#)An operator tile with configurations filled out.

1. **The connectors:** Click and drag to the connector on another operator to link them. Some connectors have L and R markers. When implementing joins, they designate the left and right sides of the join, respectively.
2. **The title:** Click to change. The examples and images in this guide will use the default names.
3. **Play icon and menu:** Preview the data at any point in its transformation by clicking the tile's **play icon**. The dropdown menu contains the option to **Delete** a tile.
4. **Column icon:** The number next to it represents the number of columns in the data at that point in its transformation.
tip

Make operator tile titles unique compared to your column names to avoid confusion, and the same applies to any aliases you create.

##### Create your model from pre-existing models[​](#create-your-model-from-pre-existing-models "Direct link to Create your model from pre-existing models")

To get started:

1. Click the **Input** menu and drag the **Input Model** operator over to the canvas.
2. Click **Choose a model** and then select the source `stg_orders` from the dropdown.
3. Click the **Select model** option in the window that lists the columns.

   [![A single model operator.](/img/docs/dbt-cloud/canvas/one-model-operator.png?v=2 "A single model operator.")](#)A single model operator.

   You now have your first input model in Canvas!

4. Drag a new **Input Model** operator to the canvas below the first and repeat the previous steps, but this time set the source model to `stg_order_items`.

   [![Two model operators in the canvas.](/img/docs/dbt-cloud/canvas/two-model-operators.png?v=2 "Two model operators in the canvas.")](#)Two model operators in the canvas.

Now, you have two input models and are ready to transform the data!

tip

Don't see a pre-existing model you're looking for? Ask your dbt admins to ensure it's been run in your Production environment recently and hasn't gone stale.

##### Create a join[​](#create-a-join "Direct link to Create a join")

1. From the **Operators** menu, click **Transform** and drag the **Join** operator onto the canvas to the right of the source models.

   [![A join that has not been connected to the models](/img/docs/dbt-cloud/canvas/join-not-connected.png?v=2 "A join that has not been connected to the models")](#)A join that has not been connected to the models

2. Click and drag a line from the **+** connector below the `L` on the join border to the **+** on the `stg_orders` model. Do the same for the `R` connector to the `stg_order_items` model.
[![The join is connected to two model operators.](/img/docs/dbt-cloud/canvas/join-connected.png?v=2 "The join is connected to two model operators.")](#)The join is connected to two model operators. 3. In the **Join** tile, click **Configure inputs**. 4. Set the **Join type** to `Inner`. 5. In the pair of dropdowns, set both `stg_orders` and `stg_order_items` to `ORDER_ID`. 6. Click **Select and rename columns**, then click **Configure columns** and select the following columns: * From `stg_orders` click `ORDER_ID` and `CUSTOMER_ID`. * From `stg_order_items` click `PRODUCT_ID`. * Note: These will appear in the order they are clicked. 7. You've now built your join! Test it by clicking the **Play icon** in the top right corner of the join tile. Your data will populate in the **Runs and previews** pane. [![A completed join with the sample data.](/img/docs/dbt-cloud/canvas/preview-join.png?v=2 "A completed join with the sample data.")](#)A completed join with the sample data. tip Your work in the Canvas is automatically saved as you progress, so if you need a break, you can always come back to a session later. Just be sure to give it a unique title! #### Enhance your model[​](#enhance-your-model "Direct link to Enhance your model") You've got the basics going with your Canvas model! It has successfully joined two pre-existing input models, but you want to transform the data further to get what you need: a list of customers who buy repeat items as you consider a loyalty club rewards program. ##### Aggregate data[​](#aggregate-data "Direct link to Aggregate data") Multiple options for transforming your data include custom formulas, filters, and unions. Keep it simple and add an aggregation operator to tell you which customers buy the most repeat products. 1. From **Transform**, drag the **Aggregation** operator over to the right of the join. 2. Connect the aggregation operator to the join operator. 3. Click **Configure aggregation** in the **Aggregation tile**. 4. 
Click in the **Group by** field and first select `CUSTOMER_ID` then `PRODUCT_ID`. 5. Configure the next three fields with the following: * **Function:** Count * **Column:** PRODUCT\_ID * **Alias:** count\_PRODUCT\_ID [![The configured aggregation operator tile.](/img/docs/dbt-cloud/canvas/aggregation.png?v=2 "The configured aggregation operator tile.")](#)The configured aggregation operator tile. 6. Click the **Play icon** to preview the data. You're starting to see the results you're looking for, but the data is scattered. Let's clean it up a bit more. tip As your model grows, you can zoom in and out as needed. Click and hold in empty canvas space to drag your setup across the screen. Click the **Fit view** icon to see your entire model on the screen. Click the **Auto layout** icon to auto-arrange the tiles efficiently. ##### Add some order[​](#add-some-order "Direct link to Add some order") There's a lot of data there. Dozens of customers are buying hundreds of products. Sort it so that customers are listed in ascending order by CUSTOMER\_ID, with their most purchased products listed in descending order. 1. From **Transform**, drag the **Order** operator over to the right of the **Aggregation** tile and connect them. 2. Click the **pencil edit icon**. 3. In the **Sort order** field, click **Select column** and click `Aggregation1.CUSTOMER_ID` from the dropdown. Set it to `Asc`. 4. Click **Add sorting** and in the new **Select column** field select `Aggregation1.count_PRODUCT_ID`. Set it to `Desc`. 5. Press the **Play icon** to preview the new data. [![The ordered data operator tile config.](/img/docs/dbt-cloud/canvas/order.png?v=2 "The ordered data operator tile config.")](#)The ordered data operator tile config. tip Want to practice on your own? Try adding a **Filter** operator that removes items with fewer than 10 sales for any customer ID. Be sure to run the preview and verify the data is correct. 
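For reference, the tiles configured so far correspond to ordinary SQL. Here is a minimal sketch of an equivalent query, using the column and model names from the steps above (the exact SQL the Canvas generates for you may differ):

```sql
with joined as (

    -- Inner join of the two input models on ORDER_ID
    select
        stg_orders.order_id,
        stg_orders.customer_id,
        stg_order_items.product_id
    from stg_orders
    inner join stg_order_items
        on stg_orders.order_id = stg_order_items.order_id

),

aggregated as (

    -- Count how many times each customer bought each product
    select
        customer_id,
        product_id,
        count(product_id) as count_PRODUCT_ID
    from joined
    group by customer_id, product_id

)

select *
from aggregated
order by customer_id asc, count_PRODUCT_ID desc
```

You never have to write this yourself; the Canvas generates and manages the model's SQL as you configure the tiles.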
#### Configure your output model[​](#configure-your-output-model "Direct link to Configure your output model") Now that you've built your model, you need to customize the output name and location: 1. From **Output**, drag the **Output Model** operator to the right of your **Order** operator. 2. Connect the **Order** and **Output Model** operators. 3. The **Output Model** configuration will default to the name of your Canvas project and the default models directory. Click the **pencil edit icon** to configure the optional fields: * Edit the **Model name** field if you want the name to be different than that of your project. * Edit the **File path** if you have a custom directory for your Canvas models. * Hover over a column name and click the **-** icon to remove it from the output model. 4. Click the **play icon** to preview your final model. [![The output model configures your final model's name and location.](/img/docs/dbt-cloud/canvas/output-model.png?v=2 "The output model configures your final model's name and location.")](#)The output model configures your final model's name and location. Model locations You can customize the location for Canvas models to keep them separate from other dbt models. Check with your dbt admins for best practices and ideas for Canvas model locations and naming conventions. #### Run and share your model[​](#run-and-share-your-model "Direct link to Run and share your model") Now that you've built a model that results in the data you want, it's time to run it and push it to your Git repo. Before you run your model, keep a few items in mind: * When you run previews (at any stage in the process), it does not affect the state of your warehouse. So, you can test and develop in the Canvas without impacting anything outside of the dbt Development environment. * When you're ready to use this model in a downstream tool, you can run it to materialize it in your data warehouse development schema. 
* Once your model is ready for production and ready to be used by others or orchestrated, commit it and open a pull request. ##### Run[​](#run "Direct link to Run") To run your model, you only need to click the big **Run** button. With the Canvas, there is no command line and no need to memorize a list of commands; there is only **Run**. Click it to see the results populate in the **Runs and previews** pane. [![The results of a successful run in the 'Runs and previews' pane.](/img/docs/dbt-cloud/canvas/run-results.png?v=2 "The results of a successful run in the 'Runs and previews' pane.")](#)The results of a successful run in the 'Runs and previews' pane. This will [materialize](https://docs.getdbt.com/docs/build/materializations.md) the data as a `view` in your developer schema in the database. Once the model has been merged with your project and `dbt run` is executed in your Staging or Production environments, it will be materialized as a view in related schemas. [![Preview of the transformed data in Snowflake.](/img/docs/dbt-cloud/canvas/preview-data.png?v=2 "Preview of the transformed data in Snowflake.")](#)Preview of the transformed data in Snowflake. tip Have dbt [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) enabled for your dbt Enterprise account? Clear the board and try using natural language to build the model in this guide without manually configuring any operators. ##### Git commit[​](#git-commit "Direct link to Git commit") The models built in the Canvas are a part of your larger dbt project. They are stored in the `visual_editor` folder of your `/models` directory. This is all done automatically; you don't have to configure any paths or directories. [![Example of the Canvas model path in GitHub.](/img/docs/dbt-cloud/canvas/ve-model-folder.png?v=2 "Example of the Canvas model path in GitHub.")](#)Example of the Canvas model path in GitHub. However, it won't be created in your Git repo until you commit your first model. 
So, back in the model's view: 1. Click **Commit** in the top right. * If you've already created a commit and wish to make more, click the arrow next to **Create a pull request** to see the **Commit** option. 2. Fill out the **Description** field with information about your model. If it's long, part of it will be included in the pull request title, and the rest will be in the body. That's okay! You can correct it during the PR creation process. 3. Click **Commit**. 4. The **Commit** button will change to **Create a pull request**. You can add more commits, but click the **Create a pull request** button for now. You will then be redirected to your Git provider in a new tab. The following example uses GitHub as the provider: [![Example of the screen you're taken to in GitHub when you create a pull request from Canvas.](/img/docs/dbt-cloud/canvas/demo-model-github.png?v=2 "Example of the screen you're taken to in GitHub when you create a pull request from Canvas.")](#)Example of the screen you're taken to in GitHub when you create a pull request from Canvas. 5. Click **Create pull request** in the GitHub window. 6. Complete the **Add a title** and **Add a description** fields. If your description is split between both, copy all the contents to the description field and give it a shorter title. 7. Click **Create pull request**. You've just submitted your first model from the Canvas for review. Once approved and merged, the model will be included in your organization’s project and run whenever `dbt run` is executed in any environment your model is in. You're now on your way to becoming an expert in data transformation! tip Want to take your skills to the next level? Try taking the SQL output from your Canvas model and using it to create a model in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md). Want to learn more? 
Be sure to check out our [Canvas fundamentals course](https://learn.getdbt.com/learn/course/canvas-fundamentals) on [dbt Learn](https://learn.getdbt.com/catalog). --- ### Quickstart for dbt Core from a manual install [Back to guides](https://docs.getdbt.com/guides.md) dbt Core Quickstart Beginner #### Introduction[​](#introduction "Direct link to Introduction") When you use dbt Core to work with dbt, you will be editing files locally using a code editor, and running projects using a command line interface (CLI). If you want to edit files and run projects using the web-based dbt Integrated Development Environment (Studio IDE), refer to the [dbt quickstarts](https://docs.getdbt.com/guides.md). You can also develop and run dbt commands using the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) — a dbt powered command line. ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * To use dbt Core, it's important that you know some basics of the Terminal. In particular, you should understand `cd`, `ls` and `pwd` to navigate through the directory structure of your computer easily. * Install dbt Core using the [installation instructions](https://docs.getdbt.com/docs/local/install-dbt.md) for your operating system. * Complete the appropriate Setting up and Loading data steps in the Quickstart for dbt series. For example, for BigQuery, complete [Setting up (in BigQuery)](https://docs.getdbt.com/guides/bigquery.md?step=2) and [Loading data (BigQuery)](https://docs.getdbt.com/guides/bigquery.md?step=3). * [Create a GitHub account](https://github.com/join) if you don't already have one. 
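If you want a quick refresher on the Terminal basics listed above, the three commands work like this (the `dbt-tutorial` directory here is only a hypothetical example, not one the guide has created yet):

```shell
mkdir -p ~/dbt-tutorial   # create a hypothetical working directory for this guide
cd ~/dbt-tutorial         # change into that directory
pwd                       # print the full path of the current directory
ls                        # list the directory's contents (empty for a new directory)
```

Once you're comfortable moving around directories like this, the rest of the quickstart's commands will feel familiar.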
##### Create a starter project[​](#create-a-starter-project "Direct link to Create a starter project") After setting up BigQuery to work with dbt, you are ready to create a starter project with example models, before building your own models. #### Create a repository[​](#create-a-repository "Direct link to Create a repository") The following steps use [GitHub](https://github.com/) as the Git provider for this guide, but you can use any Git provider. You should have already [created a GitHub account](https://github.com/join). 1. [Create a new GitHub repository](https://github.com/new) named `dbt-tutorial`. 2. Select one of the following (you can always change this setting later): * **Private (recommended):** To secure your environment and prevent private information (like credentials) from being public. * **Public:** If you need to easily collaborate and share with others, especially outside of your organization. 3. Leave the default values for all other settings. 4. Click **Create repository**. 5. Save the commands from "…or create a new repository on the command line" to use later in [Commit your changes](https://docs.getdbt.com/guides/manual-install.md?step=6). #### Create a project[​](#create-a-project "Direct link to Create a project") Learn how to use a series of commands using the command line of the Terminal to create your project. dbt Core includes an `init` command that helps scaffold a dbt project. To create your dbt project: 1. Make sure you have dbt Core installed and check the version using the `dbt --version` command: ```shell dbt --version ``` 2. Initialize the `jaffle_shop` project using the `init` command: ```shell dbt init jaffle_shop ``` 3. Navigate into your project's directory: ```shell cd jaffle_shop ``` 4. Use `pwd` to confirm that you are in the right spot: ```shell $ pwd > /Users/BBaggins/dbt-tutorial/jaffle_shop ``` 5. 
Use a code editor like Atom or VSCode to open the project directory you created in the previous steps, which we named jaffle\_shop. The content includes folders and `.sql` and `.yml` files generated by the `init` command. [![The starter project in a code editor](/img/starter-project-dbt-cli.png?v=2 "The starter project in a code editor")](#)The starter project in a code editor 6. dbt provides the following values in the `dbt_project.yml` file: dbt\_project.yml ```yaml name: jaffle_shop # Change from the default, `my_new_project` ... profile: jaffle_shop # Change from the default profile name, `default` ... models: jaffle_shop: # Change from `my_new_project` to match the previous value for `name:` ... ``` #### Connect to BigQuery[​](#connect-to-bigquery "Direct link to Connect to BigQuery") When developing locally, dbt connects to your data warehouse using a [profile](https://docs.getdbt.com/docs/local/profiles.yml.md), which is a YAML file with all the connection details to your warehouse. 1. Create a file in the `~/.dbt/` directory named `profiles.yml`. 2. Move your BigQuery keyfile into this directory. 3. Copy the following and paste into the new profiles.yml file. Make sure you update the values where noted. profiles.yml ```yaml jaffle_shop: # this needs to match the profile in your dbt_project.yml file target: dev outputs: dev: type: bigquery method: service-account keyfile: /Users/BBaggins/.dbt/dbt-tutorial-project-331118.json # replace this with the full path to your keyfile project: grand-highway-265418 # Replace this with your project id dataset: dbt_bbagins # Replace this with dbt_your_name, e.g. dbt_bilbo threads: 1 timeout_seconds: 300 location: US priority: interactive ``` 4. 
Run the `debug` command from your project to confirm that you can successfully connect: ```shell $ dbt debug > Connection test: OK connection ok ``` [![A successful dbt debug command](/img/successful-dbt-debug.png?v=2 "A successful dbt debug command")](#)A successful dbt debug command ##### FAQs[​](#faqs "Direct link to FAQs") My data team uses a different data warehouse. What should my profiles.yml file look like for my warehouse? The structure of a profile looks different on each warehouse. Check out the [Supported Data Platforms](https://docs.getdbt.com/docs/supported-data-platforms.md) page, and navigate to the `Profile Setup` section for your warehouse. Why are profiles stored outside of my project? Profiles are stored separately to dbt projects to avoid checking credentials into version control. Database credentials are extremely sensitive information and should **never be checked into version control**. What should I name my profile? We typically use a company name for a profile name, and then use targets to differentiate between `dev` and `prod`. Check out the docs on [environments in dbt Core](https://docs.getdbt.com/docs/local/dbt-core-environments.md) for more information. What should I name my target? We typically use targets to differentiate between development and production runs of dbt, naming the targets `dev` and `prod`, respectively. Check out the docs on [managing environments in dbt Core](https://docs.getdbt.com/docs/local/dbt-core-environments.md) for more information. Can I use environment variables in my profile? Yes! Check out the docs on [environment variables](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md) for more information. #### Perform your first dbt run[​](#perform-your-first-dbt-run "Direct link to Perform your first dbt run") Our sample project has some example models in it. We're going to check that we can run them to confirm everything is in order. 1. 
Enter the `run` command to build example models: ```shell dbt run ``` You should have an output that looks like this: [![A successful dbt run command](/img/successful-dbt-run.png?v=2 "A successful dbt run command")](#)A successful dbt run command #### Commit your changes[​](#commit-your-changes "Direct link to Commit your changes") Commit your changes so that the repository contains the latest code. 1. Link the GitHub repository you created to your dbt project by running the following commands in Terminal. Make sure you use the correct git URL for your repository, which you should have saved from step 5 in [Create a repository](https://docs.getdbt.com/guides/manual-install.md?step=2). ```shell git init git branch -M main git add . git commit -m "Create a dbt project" git remote add origin https://github.com/USERNAME/dbt-tutorial.git git push -u origin main ``` 2. Return to your GitHub repository to verify your new files have been added. ##### Build your first models[​](#build-your-first-models "Direct link to Build your first models") Now that you set up your sample project, you can get to the fun part — [building models](https://docs.getdbt.com/docs/build/sql-models.md)! In the next steps, you will take a sample query and turn it into a model in your dbt project. #### Checkout a new git branch[​](#checkout-a-new-git-branch "Direct link to Checkout a new git branch") Check out a new git branch to work on new code: 1. Create a new branch by using the `checkout` command and passing the `-b` flag: ```shell $ git checkout -b add-customers-model > Switched to a new branch `add-customers-model` ``` #### Build your first model[​](#build-your-first-model "Direct link to Build your first model") 1. Open your project in your favorite code editor. 2. Create a new SQL file in the `models` directory, named `models/customers.sql`. 3. Paste the following query into the `models/customers.sql` file. 
* BigQuery * Databricks * Redshift * Snowflake ```sql with customers as ( select id as customer_id, first_name, last_name from `dbt-tutorial`.jaffle_shop.customers ), orders as ( select id as order_id, user_id as customer_id, order_date, status from `dbt-tutorial`.jaffle_shop.orders ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` ```sql with customers as ( select id as customer_id, first_name, last_name from jaffle_shop_customers ), orders as ( select id as order_id, user_id as customer_id, order_date, status from jaffle_shop_orders ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` ```sql with customers as ( select id as customer_id, first_name, last_name from jaffle_shop.customers ), orders as ( select id as order_id, user_id as customer_id, order_date, status from jaffle_shop.orders ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, 
customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` ```sql with customers as ( select id as customer_id, first_name, last_name from raw.jaffle_shop.customers ), orders as ( select id as order_id, user_id as customer_id, order_date, status from raw.jaffle_shop.orders ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` 4. From the command line, enter `dbt run`. [![A successful run with the dbt Core CLI](/img/first-model-dbt-cli.png?v=2 "A successful run with the dbt Core CLI")](#)A successful run with the dbt Core CLI When you return to the BigQuery console, you can `select` from this model. ##### FAQs[​](#faqs-1 "Direct link to FAQs") How can I see the SQL that dbt is running? To check out the SQL that dbt is running, you can look in: * dbt: * Within the run output, click on a model name, and then select "Details" * dbt Core: * The `target/compiled/` directory for compiled `select` statements * The `target/run/` directory for compiled `create` statements * The `logs/dbt.log` file for verbose logging. How did dbt choose which schema to build my models in? By default, dbt builds models in your target schema. To change your target schema: * If you're developing in **dbt**, these are set for each user when you first use a development environment. * If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file. 
If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md). Note: on BigQuery, `dataset` is used interchangeably with `schema`. Do I need to create my target schema before running dbt? Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you. If I rerun dbt, will there be any downtime as models are rebuilt? Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime). The implementation of this varies on each warehouse, check out the [logs](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to see the SQL dbt is executing. What happens if the SQL in my query is bad or I get a database error? If there's a mistake in your SQL, dbt will return the error that your database returns. ```shell $ dbt run --select customers Running with dbt=1.9.0 Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources 14:04:12 | Concurrency: 1 threads (target='dev') 14:04:12 | 14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN] 14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s] 14:04:13 | 14:04:13 | Finished running 1 view model in 1.68s. Completed with 1 error and 0 warnings: Database Error in model customers (models/customers.sql) Syntax error: Expected ")" but got identifier `your-info-12345` at [13:15] compiled SQL at target/run/jaffle_shop/customers.sql Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1 ``` Any models downstream of this model will also be skipped. Use the error message and the [compiled SQL](https://docs.getdbt.com/faqs/Runs/checking-logs.md) to debug any errors. 
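One of the FAQs above points to custom schemas for splitting models across schemas. As a hedged sketch (the `marketing` subdirectory and schema name are illustrative, not part of this project), the config in `dbt_project.yml` would look like:

```yaml
models:
  jaffle_shop:
    marketing:            # applies to models in models/marketing/
      +schema: marketing  # custom schema appended to your target schema
```

By default, dbt builds these models into `<target_schema>_marketing` (for example, `dbt_alice_marketing`); see the custom schemas docs linked above for how to customize that behavior.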
#### Change the way your model is materialized[​](#change-the-way-your-model-is-materialized "Direct link to Change the way your model is materialized") One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse simply by changing a configuration value. You can switch between tables and views by changing a keyword rather than writing the data definition language (DDL) to do this behind the scenes. By default, everything gets created as a view. You can override that at the directory level so everything in that directory will use a different materialization. 1. Edit your `dbt_project.yml` file. * Update your project `name` to: dbt\_project.yml ```yaml name: 'jaffle_shop' ``` * Configure `jaffle_shop` so everything in it will be materialized as a table; and configure `example` so everything in it will be materialized as a view. Update your `models` config in the project YAML file to: dbt\_project.yml ```yaml models: jaffle_shop: +materialized: table example: +materialized: view ``` * Click **Save**. 2. Enter the `dbt run` command. Your `customers` model should now be built as a table! info To do this, dbt had to first run a `drop view` statement (or API call on BigQuery), then a `create table as` statement. 3. Edit `models/customers.sql` to override the `dbt_project.yml` for the `customers` model only by adding the following snippet to the top, and click **Save**: models/customers.sql ```sql {{ config( materialized='view' ) }} with customers as ( select id as customer_id ... ) ``` 4. Enter the `dbt run` command. Your model, `customers`, should now build as a view. * BigQuery users need to run `dbt run --full-refresh` instead of `dbt run` to fully apply materialization changes. 5. If you're on BigQuery, enter the `dbt run --full-refresh` command for this to take effect in your warehouse. ##### FAQs[​](#faqs "Direct link to FAQs") What materializations are available in dbt? 
dbt ships with five built-in materializations: `view`, `table`, `incremental`, `ephemeral`, and `materialized_view`. Check out the documentation on [materializations](https://docs.getdbt.com/docs/build/materializations.md) for more information on each of these options. You can also create your own [custom materializations](https://docs.getdbt.com/guides/create-new-materializations.md). This is an advanced feature of dbt. Which materialization should I use for my model? Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed). Check out the [docs on materializations](https://docs.getdbt.com/docs/build/materializations.md) for advice on when to use each materialization. What model configurations exist? You can also configure: * [tags](https://docs.getdbt.com/reference/resource-configs/tags.md) to support easy categorization and graph selection * [custom schemas](https://docs.getdbt.com/reference/resource-properties/schema.md) to split your models across multiple schemas * [aliases](https://docs.getdbt.com/reference/resource-configs/alias.md) if your view/table name should differ from the filename * Snippets of SQL to run at the start or end of a model, known as [hooks](https://docs.getdbt.com/docs/build/hooks-operations.md) * Warehouse-specific configurations for performance (e.g. `sort` and `dist` keys on Redshift, `partitions` on BigQuery) Check out the docs on [model configurations](https://docs.getdbt.com/reference/model-configs.md) to learn more. #### Delete the example models[​](#delete-the-example-models "Direct link to Delete the example models") You can now delete the files that dbt created when you initialized the project: 1. Delete the `models/example/` directory. 2. Delete the `example:` key from your `dbt_project.yml` file, and any configurations that are listed under it. 
dbt\_project.yml ```yaml # before models: jaffle_shop: +materialized: table example: +materialized: view ``` dbt\_project.yml ```yaml # after models: jaffle_shop: +materialized: table ``` 3. Save your changes. ##### FAQs[​](#faqs "Direct link to FAQs") How do I remove deleted models from my data warehouse? If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users. (This can also happen when you switch a model from being a view or table to ephemeral.) When you remove models from your dbt project, you should manually drop the related relations from your schema. I got an "unused model configurations" error message, what does this mean? You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this [article](https://discourse.getdbt.com/t/faq-i-got-an-unused-model-configurations-error-message-what-does-this-mean/112) to understand more. #### Build models on top of other models[​](#build-models-on-top-of-other-models "Direct link to Build models on top of other models") As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs). Now you can experiment by separating the logic out into separate models and using the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function to build models on top of other models: [![The DAG we want for our dbt project](/img/dbt-dag.png?v=2 "The DAG we want for our dbt project")](#)The DAG we want for our dbt project 1. Create a new SQL file, `models/stg_customers.sql`, with the SQL from the `customers` CTE in our original query. 2. 
Create a second new SQL file, `models/stg_orders.sql`, with the SQL from the `orders` CTE in our original query. * BigQuery * Databricks * Redshift * Snowflake models/stg\_customers.sql ```sql select id as customer_id, first_name, last_name from `dbt-tutorial`.jaffle_shop.customers ``` models/stg\_orders.sql ```sql select id as order_id, user_id as customer_id, order_date, status from `dbt-tutorial`.jaffle_shop.orders ``` models/stg\_customers.sql ```sql select id as customer_id, first_name, last_name from jaffle_shop_customers ``` models/stg\_orders.sql ```sql select id as order_id, user_id as customer_id, order_date, status from jaffle_shop_orders ``` models/stg\_customers.sql ```sql select id as customer_id, first_name, last_name from jaffle_shop.customers ``` models/stg\_orders.sql ```sql select id as order_id, user_id as customer_id, order_date, status from jaffle_shop.orders ``` models/stg\_customers.sql ```sql select id as customer_id, first_name, last_name from raw.jaffle_shop.customers ``` models/stg\_orders.sql ```sql select id as order_id, user_id as customer_id, order_date, status from raw.jaffle_shop.orders ``` 3. Edit the SQL in your `models/customers.sql` file as follows: models/customers.sql ```sql with customers as ( select * from {{ ref('stg_customers') }} ), orders as ( select * from {{ ref('stg_orders') }} ), customer_orders as ( select customer_id, min(order_date) as first_order_date, max(order_date) as most_recent_order_date, count(order_id) as number_of_orders from orders group by 1 ), final as ( select customers.customer_id, customers.first_name, customers.last_name, customer_orders.first_order_date, customer_orders.most_recent_order_date, coalesce(customer_orders.number_of_orders, 0) as number_of_orders from customers left join customer_orders using (customer_id) ) select * from final ``` 4. Execute `dbt run`. This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders` and `customers`. 
dbt inferred the order to run these models. Because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies. ##### FAQs[​](#faq-2 "Direct link to FAQs") How do I run one model at a time? To run one model, use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell $ dbt run --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for more operators and examples. Do ref-able resource names need to be unique? Within one project: yes! To build dependencies between resources (such as models, seeds, and snapshots), you need to use the `ref` function, and pass in the resource name as an argument. dbt uses that resource name to uniquely resolve the `ref` to a specific resource. As a result, these resource names need to be unique, *even if they are in distinct folders*. A resource in one project can have the same name as a resource in another project (installed as a dependency). dbt uses the project name to uniquely identify each resource. We call this "namespacing." If you `ref` a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use [two-argument `ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#ref-project-specific-models) to disambiguate references by specifying the namespace. Those resources will still need to land in distinct locations in the data warehouse. Read the docs on [custom aliases](https://docs.getdbt.com/docs/build/custom-aliases.md) and [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md) for details on how to achieve this. As I create more models, how should I keep my project organized? What should I name my models? There's no one best way to structure a project! Every organization is unique.
If you're just getting started, check out how we (dbt Labs) [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md). ##### Next steps[​](#next-steps "Direct link to Next steps") Before moving on from building your first models, make a change and see how it affects your results: * Write some bad SQL to cause an error — can you debug the error? * Run only a single model at a time. For more information, see [Syntax overview](https://docs.getdbt.com/reference/node-selection/syntax.md). * Group your models with a `stg_` prefix into a `staging` subdirectory. For example, `models/staging/stg_customers.sql`. * Configure your `staging` models to be views. * Run only the `staging` models. You can also explore: * The `target` directory to see all of the compiled SQL. The `run` directory shows the create or replace table statements that are running, which are the select statements wrapped in the correct DDL. * The `logs` file to see how dbt Core logs all of the action happening within your project. It shows the select statements that are running and the Python logging happening when dbt runs. #### Add tests to your models[​](#add-tests-to-your-models "Direct link to Add tests to your models") Adding [data tests](https://docs.getdbt.com/docs/build/data-tests.md) to a project helps validate that your models are working correctly. To add data tests to your project: 1. Create a new YAML file in the `models` directory, named `models/schema.yml` 2. Add the following contents to the file: models/schema.yml ```yaml version: 2 models: - name: customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_customers columns: - name: customer_id data_tests: - unique - not_null - name: stg_orders columns: - name: order_id data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set this as a top-level property.
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` 3. Run `dbt test`, and confirm that all your tests passed. When you run `dbt test`, dbt iterates through your YAML files, and constructs a query for each test. Each query will return the number of records that fail the test. If this number is 0, then the test is successful. ###### FAQs[​](#faqs "Direct link to FAQs") What tests are available for me to use in dbt? Can I add my own custom tests? Out of the box, dbt ships with the following data tests: * `unique` * `not_null` * `accepted_values` * `relationships` (for example, referential integrity) You can also write your own [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests). Some additional generic tests have been open-sourced in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils#generic-tests). Check out the docs on [packages](https://docs.getdbt.com/docs/build/packages.md) to learn how to make these tests available in your project. How do I test one model at a time? Running tests on one model looks very similar to running a model: use the `--select` flag (or `-s` flag), followed by the name of the model: ```shell dbt test --select customers ``` Check out the [model selection syntax documentation](https://docs.getdbt.com/reference/node-selection/syntax.md) for full syntax, and [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) in particular. One of my tests failed, how can I debug it? To debug a failing test, find the SQL that dbt ran by: * dbt: * Within the test output, click on the failed test, and then select "Details". * dbt Core: * Open the file path returned as part of the error message. * Navigate to the `target/compiled/schema_tests` directory for all compiled test queries. 
Copy the SQL into a query editor (in dbt, you can paste it into a new `Statement`), and run the query to find the records that failed. Does my test file need to be named \`schema.yml\`? No! You can name this file whatever you want (including `whatever_you_want.yml`), so long as: * The file is in your `models/` directory¹ * The file has a `.yml` extension Check out the [docs](https://docs.getdbt.com/reference/configs-and-properties.md) for more information. ¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — `seeds/`, `snapshots/` and `macros/` respectively. Why do model and source YAML files always start with \`version: 2\`? Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible. From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported. Also starting in v1.5, both the [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional. Resource YAML files do not currently require this config. We only support `version: 2` if it's specified. Although we do not expect to update YAML files to `version: 3` soon, having this config will make it easier for us to introduce new structures in the future. What data tests should I add to my project? We recommend that every model has a data test on a primary key, that is, a column that is `unique` and `not_null`. We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL.
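To make that payment-method assumption concrete: an `accepted_values` test compiles to a query that counts rows falling outside the allowed set, and passes when that count is zero. A minimal Python sketch of the same assertion (the method names and sample rows are hypothetical; dbt actually runs this as SQL against your warehouse):

```python
# A test passes when zero rows fall outside the accepted set.
accepted_methods = {"credit_card", "bank_transfer", "gift_card"}

# Hypothetical sample of a payment_method column.
rows = ["credit_card", "gift_card", "bank_transfer", "credit_card"]

failures = [r for r in rows if r not in accepted_methods]
assert not failures  # a new payment method appearing here would fail the test
```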
In advanced dbt projects, we recommend using [sources](https://docs.getdbt.com/docs/build/sources.md) and running these source data-integrity tests against the sources rather than models. When should I run my data tests? You should run your data tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid). #### Document your models[​](#document-your-models "Direct link to Document your models") Adding [documentation](https://docs.getdbt.com/docs/build/documentation.md) to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project. Update your `models/schema.yml` file to include some descriptions, such as those below. models/schema.yml ```yaml version: 2 models: - name: customers description: One record per customer columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: first_order_date description: NULL when a customer has not yet placed an order. - name: stg_customers description: This model cleans up customer data columns: - name: customer_id description: Primary key data_tests: - unique - not_null - name: stg_orders description: This model cleans up order data columns: - name: order_id description: Primary key data_tests: - unique - not_null - name: status data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set this as a top-level property.
values: ['placed', 'shipped', 'completed', 'return_pending', 'returned'] - name: customer_id data_tests: - not_null - relationships: arguments: to: ref('stg_customers') field: customer_id ``` * View in Catalog * View in Studio IDE [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) provides powerful tools to interact with your dbt projects, including documentation: 1. From the IDE, run one of the following commands: * `dbt docs generate` if you're on dbt Core * `dbt build` if you're on the dbt Fusion engine 2. Click **Catalog** in the navigation menu to launch Catalog. 3. In the Catalog pane, click the environment selection dropdown menu at the top of the file tree and change it from **Production** to **Development**. [![View your development environment information.](/img/docs/collaborate/dbt-explorer/catalog-nav-dropdown.png?v=2 "View your development environment information.")](#)View your development environment information. 4. Select your project from the file tree. 5. Use the search bar or browse the resource list to find the `customers` model. 6. Click the model to view its details, including the descriptions you added. [![View your model's documentation and lineage in Catalog.](/img/docs/collaborate/dbt-explorer/example-model-details.png?v=2 "View your model's documentation and lineage in Catalog.")](#)View your model's documentation and lineage in Catalog. Catalog displays your model's description, column documentation, data tests, and lineage graph. You can also see which columns are missing documentation and track test coverage across your project. You can view docs directly from the IDE if you're on `Latest` or another version of dbt Core. Keep in mind that this is a legacy view and doesn't offer the same level of interactivity as Catalog. 1. In the IDE, run `dbt docs generate`. 2. From the navigation bar, click the **View docs** icon located to the right of the **branch name**. 
[![The View docs icon in the Studio IDE.](/img/docs/collaborate/dbt-explorer/docs-icon.png?v=2 "The View docs icon in the Studio IDE.")](#)The View docs icon in the Studio IDE. 3. From **Projects**, select your project name and expand the folders. 4. Click **models** > **marts** > **customers**. [![View your model's documentation in the legacy docs view.](/img/docs/collaborate/dbt-explorer/legacy-docs-view.png?v=2 "View your model's documentation in the legacy docs view.")](#)View your model's documentation in the legacy docs view. 5. Run the `dbt docs serve` command to launch the documentation as a local website. ###### FAQs[​](#faqs-2 "Direct link to FAQs") How do I write long-form explanations in my descriptions? If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines using `>`. Interior line breaks are removed and Markdown can be used. This method is recommended for simple, single-paragraph descriptions: ```yml models: - name: customers description: > Lorem ipsum **dolor** sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. ``` 2. Split your description over multiple lines using `|`. Interior line breaks are maintained and Markdown can be used. This method is recommended for more complex descriptions: ```yml models: - name: customers description: | ### Lorem ipsum * dolor sit amet, consectetur adipisicing elit, sed do eiusmod * tempor incididunt ut labore et dolore magna aliqua. ``` 3. Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to write the description in a separate Markdown file. How do I access documentation in dbt Catalog?
If you're using dbt to deploy your project and have a [Starter, Enterprise, or Enterprise+ plan](https://www.getdbt.com/pricing/), you can use Catalog to view your project's [resources](https://docs.getdbt.com/docs/build/projects.md) (such as models, tests, and metrics) and their lineage to gain a better understanding of its latest production state. Access Catalog in dbt by clicking the **Catalog** link in the navigation. You can have up to 5 read-only users access the documentation for your project. dbt developer plan and dbt Core users can use [dbt Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md#dbt-docs), which generates basic documentation but it doesn't offer the same speed, metadata, or visibility as Catalog. ###### Next steps[​](#next-steps-1 "Direct link to Next steps") Before moving on from testing, make a change and see how it affects your results: * Write a test that fails, for example, omit one of the order statuses in the `accepted_values` list. What does a failing test look like? Can you debug the failure? * Run the tests for one model only. If you grouped your `stg_` models into a directory, try running the tests for all the models in that directory. * Use a [docs block](https://docs.getdbt.com/docs/build/documentation.md#using-docs-blocks) to add a Markdown description to a model. #### Commit updated changes[​](#commit-updated-changes "Direct link to Commit updated changes") You need to commit the changes you made to the project so that the repository has your latest code. 1. Add all your changes to git: `git add -A` 2. Commit your changes: `git commit -m "Add customers model, tests, docs"` 3. Push your changes to your repository: `git push -u origin add-customers-model` 4. Navigate to your repository, and open a pull request to merge the code into your master branch. 
#### Schedule a job[​](#schedule-a-job "Direct link to Schedule a job") We recommend using dbt as the easiest and most reliable way to [deploy jobs](https://docs.getdbt.com/docs/deploy/deployments.md) and automate your dbt project in production. For more info on how to get started, refer to [create and schedule jobs](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#create-and-schedule-jobs). [![Overview of a dbt job run, which includes the job run details, trigger type, commit SHA, environment name, detailed run steps, logs, and more.](/img/docs/dbt-cloud/deployment/run-overview.png?v=2 "Overview of a dbt job run, which includes the job run details, trigger type, commit SHA, environment name, detailed run steps, logs, and more.")](#)Overview of a dbt job run, which includes the job run details, trigger type, commit SHA, environment name, detailed run steps, logs, and more. For more information about using dbt Core to schedule a job, refer to the [dbt airflow](https://docs.getdbt.com/blog/dbt-airflow-spiritual-alignment) blog post. --- ### Quickstart for dbt Core using DuckDB [Back to guides](https://docs.getdbt.com/guides.md) dbt Core Quickstart Beginner #### Introduction[​](#introduction "Direct link to Introduction") In this quickstart guide, you'll learn how to use dbt Core with DuckDB, enabling you to get set up quickly and efficiently. [DuckDB](https://duckdb.org/) is an open-source database management system designed for analytical workloads. It provides fast and easy access to large datasets, making it well-suited for data analytics tasks.
This guide will demonstrate how to: * [Create a virtual development environment](https://docs.getdbt.com/docs/local/install-dbt.md#using-virtual-environments) using a template provided by dbt Labs. * Set up a fully functional dbt environment with an operational and executable project. The codespace automatically connects to the DuckDB database and loads a year's worth of data from our fictional Jaffle Shop café, which sells food and beverages in several US cities. * Run through the steps outlined in the `jaffle_shop_duckdb` repository, but if you want to dig into the underlying code further, refer to the [README](https://github.com/dbt-labs/jaffle_shop_duckdb/blob/duckdb/README.md) for the Jaffle Shop template. * Run any dbt command from the environment’s terminal. * Generate a larger dataset for the Jaffle Shop café (for example, five years of data instead of just one). You can learn more through high-quality [dbt Learn courses and workshops](https://learn.getdbt.com). ##### Related content[​](#related-content "Direct link to Related content") * [DuckDB setup](https://docs.getdbt.com/docs/local/connect-data-platform/duckdb-setup.md) * [Create a GitHub repository](https://docs.getdbt.com/guides/manual-install.md?step=2) * [Build your first models](https://docs.getdbt.com/guides/manual-install.md?step=3) * [Test and document your project](https://docs.getdbt.com/guides/manual-install.md?step=4) #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * When using DuckDB with dbt Core, you'll need to use the dbt command-line interface (CLI). Currently, DuckDB is not supported in the dbt platform. * It's important that you know some basics of the terminal. In particular, you should understand `cd`, `ls`, and `pwd` to navigate through the directory structure of your computer easily. * You have a [GitHub account](https://github.com/join).
#### Set up DuckDB for dbt Core[​](#set-up-duckdb-for-dbt-core "Direct link to Set up DuckDB for dbt Core") This section will provide a step-by-step guide for setting up DuckDB for use in local (Mac and Windows) environments and web browsers. In the repository, there's a [`requirements.txt`](https://github.com/dbt-labs/jaffle_shop_duckdb/blob/duckdb/requirements.txt) file which is used to install dbt Core, DuckDB, and all other necessary dependencies. You can check this file to see what will be installed on your machine. It's typically located in the root directory of your project alongside other key files like `dbt_project.yml`, as shown below: ```shell /my_dbt_project/ ├── dbt_project.yml ├── models/ │ ├── my_model.sql ├── tests/ │ ├── my_test.sql └── requirements.txt ``` For more information, refer to the [DuckDB setup](https://docs.getdbt.com/docs/local/connect-data-platform/duckdb-setup.md). * Local * Web browser 1. First, [clone](https://git-scm.com/docs/git-clone) the Jaffle Shop git repository by running the following command in your terminal: ```bash git clone https://github.com/dbt-labs/jaffle_shop_duckdb.git ``` 2. Change into the `jaffle_shop_duckdb` directory from the command line: ```shell cd jaffle_shop_duckdb ``` 3. Install dbt Core and DuckDB in a virtual environment.  Example for Mac ```shell python3 -m venv venv source venv/bin/activate python3 -m pip install --upgrade pip python3 -m pip install -r requirements.txt ```  Example for Windows ```shell python -m venv venv venv\Scripts\activate.bat python -m pip install --upgrade pip python -m pip install -r requirements.txt ```  Example for Windows PowerShell ```shell python -m venv venv venv\Scripts\Activate.ps1 python -m pip install --upgrade pip python -m pip install -r requirements.txt ``` 4.
Ensure your profile is set up correctly from the command line by running the following [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md). * [dbt seed](https://docs.getdbt.com/reference/commands/seed.md) — loads CSV files located in the seed-paths directory of your project into your data warehouse * [dbt compile](https://docs.getdbt.com/reference/commands/compile.md) — generates executable SQL from your project source files * [dbt run](https://docs.getdbt.com/reference/commands/run.md) — compiles and runs your project * [dbt test](https://docs.getdbt.com/reference/commands/test.md) — compiles and tests your project * [dbt build](https://docs.getdbt.com/reference/commands/build.md) — compiles, runs, and tests your project * [dbt docs generate](https://docs.getdbt.com/reference/commands/cmd-docs.md#dbt-docs-generate) — generates your project's documentation * [dbt docs serve](https://docs.getdbt.com/reference/commands/cmd-docs.md#dbt-docs-serve) — starts a webserver on port 8080 to serve your documentation locally and opens the documentation site in your default browser For complete details, refer to the [dbt command reference](https://docs.getdbt.com/reference/dbt-commands.md). Here's what a successful output will look like: ```jinja (venv) ➜ jaffle_shop_duckdb git:(duckdb) dbt build 15:10:12 Running with dbt=1.8.1 15:10:13 Registered adapter: duckdb=1.8.1 15:10:13 Found 5 models, 3 seeds, 20 data tests, 416 macros 15:10:13 15:10:14 Concurrency: 24 threads (target='dev') 15:10:14 15:10:14 1 of 28 START seed file main.raw_customers ..................................... [RUN] 15:10:14 2 of 28 START seed file main.raw_orders ........................................ [RUN] 15:10:14 3 of 28 START seed file main.raw_payments ...................................... [RUN] .... 15:10:15 27 of 28 PASS relationships_orders_customer_id__customer_id__ref_customers_ ....
[PASS in 0.32s] 15:10:15 15:10:15 Finished running 3 seeds, 3 view models, 20 data tests, 2 table models in 0 hours 0 minutes and 1.52 seconds (1.52s). 15:10:15 15:10:15 Completed successfully 15:10:15 15:10:15 Done. PASS=28 WARN=0 ERROR=0 SKIP=0 TOTAL=28 ``` To query data, here are some useful commands you can run from the command line: * `dbt show --select "raw_orders"` — run a query against the data warehouse and preview the results in the terminal. * [`dbt source`](https://docs.getdbt.com/reference/commands/source.md) — provides subcommands such as [`dbt source freshness`](https://docs.getdbt.com/reference/commands/source.md#dbt-source-freshness) that are useful when working with source data. * `dbt source freshness` — checks how up to date a specific source table is. note The steps will fail if you decide to run this project in your data warehouse (outside of this DuckDB demo). You will need to reconfigure the project files for your warehouse. Keep this in mind, especially if you are using a community-contributed adapter. ##### Troubleshoot[​](#troubleshoot "Direct link to Troubleshoot")  Could not set lock on file error ```jinja IO Error: Could not set lock on file "jaffle_shop.duckdb": Resource temporarily unavailable ``` This is a known issue in DuckDB. Try disconnecting from any sessions that are locking the database. If you are using DBeaver, this means shutting down DBeaver (disconnecting doesn't always work). As a last resort, deleting the database file will get you back in action (*but* you will lose all your data). 1. Go to the `jaffle-shop-template` [repository](https://github.com/dbt-labs/jaffle_shop_duckdb) after you log in to your GitHub account. 2. Click **Use this template** at the top of the page and choose **Create new repository**. 3. Click **Create repository from template** when you’re done setting the options for your new repository. 4. Click **Code** (at the top of the new repository’s page).
Under the **Codespaces** tab, choose **Create codespace on main**. Depending on how you've configured your computer's settings, this either opens a new browser tab with the Codespace development environment with VSCode running in it or opens a new VSCode window with the codespace in it. 5. Wait for the codespace to finish building by waiting for the `postCreateCommand` command to complete; this can take several minutes: [![Wait for postCreateCommand to complete](/img/codespace-quickstart/postCreateCommand.png?v=2 "Wait for postCreateCommand to complete")](#)Wait for postCreateCommand to complete When this command completes, you can start using the codespace development environment. The terminal the command ran in will close and you will get a prompt in a brand new terminal. 6. At the terminal's prompt, you can execute any dbt command you want. For example: ```shell /workspaces/test (main) $ dbt build ``` You can also use the [duckcli](https://duckdb.org/docs/api/cli/overview.html) to write SQL against the warehouse from the command line or build reports in the [Evidence](https://evidence.dev/) project provided in the `reports` directory. For complete information, refer to the [dbt command reference](https://docs.getdbt.com/reference/dbt-commands.md). Common commands are: * [dbt compile](https://docs.getdbt.com/reference/commands/compile.md) — generates executable SQL from your project source files * [dbt run](https://docs.getdbt.com/reference/commands/run.md) — compiles and runs your project * [dbt test](https://docs.getdbt.com/reference/commands/test.md) — compiles and tests your project * [dbt build](https://docs.getdbt.com/reference/commands/build.md) — compiles, runs, and tests your project #### Generate a larger data set[​](#generate-a-larger-data-set "Direct link to Generate a larger data set") If you'd like to work with a larger selection of Jaffle Shop data, you can generate an arbitrary number of years of fictitious data from within your codespace. 1. 
Install the Python package called [jafgen](https://pypi.org/project/jafgen/). At the terminal's prompt, run: ```shell python -m pip install jafgen ``` 2. When installation is done, run: ```shell jafgen [number of years to generate] # e.g. jafgen 6 ``` Replace `[number of years to generate]` with the number of years you want to simulate. For example, to generate data for 6 years, you would run `jafgen 6`. This command builds the CSV files and stores them in the `jaffle-data` folder; they are automatically picked up as sources based on the `sources.yml` file and the [dbt-duckdb](https://docs.getdbt.com/docs/local/connect-data-platform/duckdb-setup.md) adapter. As you increase the number of years, it takes exponentially more time to generate the data because the Jaffle Shop stores grow in size and number. For a good balance of data size and time to build, dbt Labs suggests a maximum of 6 years. #### Next steps[​](#next-steps "Direct link to Next steps") Now that you have dbt Core, DuckDB, and the Jaffle Shop data up and running, you can explore dbt's capabilities. Refer to these materials to get a better understanding of dbt projects and commands: * The [About projects](https://docs.getdbt.com/docs/build/projects.md) page guides you through the structure of a dbt project and its components. * [dbt command reference](https://docs.getdbt.com/reference/dbt-commands.md) explains the various commands available and what they do. * [dbt Labs courses](https://courses.getdbt.com/collections) offer a variety of beginner, intermediate, and advanced learning modules designed to help you become a dbt expert. * Once you see the potential of dbt and what it can do for your organization, sign up for a free trial of [dbt](https://www.getdbt.com/signup). It's the fastest and easiest way to deploy dbt today! * Check out the other [quickstart guides](https://docs.getdbt.com/guides.md?tags=Quickstart) to begin integrating into your existing data warehouse.
Additionally, with your new understanding of the basics of using DuckDB, consider optimizing your setup by [documenting your project](https://docs.getdbt.com/guides/duckdb.md#document-your-project), [committing your changes](https://docs.getdbt.com/guides/duckdb.md#commit-your-changes), and [scheduling a job](https://docs.getdbt.com/guides/duckdb.md#schedule-a-job). ##### Document your project[​](#document-your-project "Direct link to Document your project") To document your dbt projects with DuckDB, follow these steps: * Use the `dbt docs generate` command to compile information about your dbt project and warehouse into `manifest.json` and `catalog.json` files * Run the [`dbt docs serve`](https://docs.getdbt.com/reference/commands/cmd-docs.md#dbt-docs-serve) command to create a local website using the generated `.json` files. This allows you to view your project's documentation in a web browser. * Enhance your documentation by adding [descriptions](https://docs.getdbt.com/reference/resource-properties/description.md) to models, columns, and sources using the `description` key in your YAML files. ##### Commit your changes[​](#commit-your-changes "Direct link to Commit your changes") Commit your changes to ensure the repository is up to date with the latest code. 1. In the GitHub repository you created for your project, run the following commands in the terminal: ```shell git add -A git commit -m "Your commit message" git push ``` 2. Go back to your GitHub repository to verify your new files have been added. ##### Schedule a job[​](#schedule-a-job "Direct link to Schedule a job") 1. Ensure dbt Core is installed and configured to connect to your DuckDB instance. 2. Create a dbt project and define your [`models`](https://docs.getdbt.com/docs/build/models.md), [`seeds`](https://docs.getdbt.com/reference/seed-properties.md), and [`tests`](https://docs.getdbt.com/reference/commands/test.md). 3.
Use a scheduler such as [Prefect](https://docs.getdbt.com/docs/deploy/deployment-tools.md#prefect) to schedule your dbt runs. You can create a DAG (Directed Acyclic Graph) that triggers dbt commands at specified intervals. 4. Write a script that runs your dbt commands, such as [`dbt run`](https://docs.getdbt.com/reference/commands/run.md) and `dbt test`. 5. Use your chosen scheduler to run the script at your desired frequency. Congratulations on making it through the guide 🎉! --- ### Quickstart for the dbt Catalog workshop [Back to guides](https://docs.getdbt.com/guides.md) Explorer Snowflake dbt platform Quickstart Catalog Beginner #### Introduction[​](#introduction "Direct link to Introduction") Unlock the power of [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) in this hands-on workshop designed for analytics engineers, data analysts, stakeholders, and data leaders. This quickstart guide accompanies the Catalog hands-on workshop and helps you dive into a production-level Mesh implementation and discover how to explore your data workflows. Whether you're looking to streamline your data operations, improve data quality, or self-serve information about your data platform, this workshop will equip you with the tools and knowledge to take your dbt projects to the next level. By the end of the guide and workshop, you'll understand how to leverage Catalog and have the confidence to navigate multiple dbt projects, trace dependencies, and identify opportunities to improve performance and data quality.
##### What you'll learn[​](#what-youll-learn "Direct link to What you'll learn") In this guide, you will learn how to: * Navigate multiple dbt projects using Catalog * Self-serve on data documentation * Trace dependencies at the model and column level * Identify opportunities to improve performance and data quality ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * Familiarity with data platforms #### Setup[​](#setup "Direct link to Setup") Now you'll create your dbt account and connect it to a data warehouse. 1. Go to this URL (sign out if you're already logged in): 2. Enter your first name and last name. 3. Select the **Exploring a Mesh implementation with Catalog** option. 4. Use the passcode provided by the workshop facilitator. 5. Agree to the terms of service and click the **Complete Registration** button. 6. Wait about 30 seconds; you'll be in the dbt account for this workshop, already connected to a data warehouse. 7. Toggle into the **Platform project**. Go to the **Orchestration** tab and select **Jobs** from the dropdown menu. 8. Run each job you see by clicking on the job and then selecting **Run now**. This will run the *upstream* project job in both a production and staging environment. 9. Toggle into the **Analytics project**. Go to the **Orchestration** tab and select **Jobs** from the dropdown menu. 10. Run each job you see by clicking on the job and then selecting **Run now**. This will run the *downstream* project job in both a production and staging environment.
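As an aside, if you'd rather script the **Run now** step than click through the UI, the dbt Cloud Administrative API exposes a trigger-run endpoint for jobs. This is a minimal sketch, not part of the workshop: the account ID, job ID, and service token are placeholders you'd supply, and `cloud.getdbt.com` assumes a North America multi-tenant account (other regions use a different access URL).

```python
import json
import urllib.request

DBT_API = "https://cloud.getdbt.com/api/v2"  # assumed host; varies by region/plan

def run_url(account_id: int, job_id: int) -> str:
    """Build the trigger-run endpoint that the **Run now** button maps to."""
    return f"{DBT_API}/accounts/{account_id}/jobs/{job_id}/run/"

def trigger_job(account_id: int, job_id: int, token: str, cause: str = "Workshop trigger") -> dict:
    """Start a job run; `cause` is a required field on this endpoint."""
    req = urllib.request.Request(
        run_url(account_id, job_id),
        data=json.dumps({"cause": cause}).encode(),
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # network call; needs a real token
        return json.load(resp)["data"]
```

The returned `data` object describes the newly queued run, so a script could poll its status instead of watching the UI.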
[![Run the jobs](/img/quickstarts/dbt-cloud/run_job.png?v=2 "Run the jobs")](#)Run the jobs #### Performance[​](#performance "Direct link to Performance") [![dbt Catalog's Performance tab](/img/quickstarts/dbt-cloud/explorer_performance_tab.png?v=2 "dbt Catalog's Performance tab")](#)dbt Catalog's Performance tab Catalog will show you your project's most executed models, longest model executions, most failed models and tests, and most consumed models all in one place: the **Performance** tab. ##### Hands-On[​](#hands-on "Direct link to Hands-On") * Trigger the Daily Prod job to run again. * Explore the **Performance** tab on the **Project details** page. * Which model took the longest over the last two weeks? Over the last month? * Which model failed the most tests? * Click on the model that took the longest to run in the *Longest model executions* graph. * What is the average duration time over the last two weeks? Over the last month? * How often is the model being built? What is the Model Test Failure Rate? #### Resources[​](#resources "Direct link to Resources") With Catalog, you can view your project's resources (such as models, tests, and metrics), their lineage, and model consumption to gain a better understanding of its latest production state. Navigate and manage your projects within dbt to help you and other data developers, analysts, and consumers discover and leverage your dbt resources. [![dbt Catalog's Models tab](/img/quickstarts/dbt-cloud/explorer_models_tab.png?v=2 "dbt Catalog's Models tab")](#)dbt Catalog's Models tab ##### Hands-On[​](#hands-on-1 "Direct link to Hands-On") * Explore the **Model** tab * Pick a model. What’s its row count? * Use the test results dropdown to see if this model’s tests passed. What other models does it depend on? * Explore the **Tests** tab * What tests do we see? Which tests have warnings? Failures? * Explore the **Sources** tab * What sources can we see? Which sources have stale data? Which sources have fresh data?
* Explore **Exposures** * Use the lineage graph to find an exposure. Which models and metrics does the Exposure reference? #### Lineage[​](#lineage "Direct link to Lineage") Catalog provides a visualization of your project’s DAG that you can interact with. The nodes in the lineage graph represent the project’s resources and the edges represent the relationships between the nodes. Nodes are color-coded and include iconography according to their resource type. * Use the search bar and [node selectors](https://docs.getdbt.com/reference/node-selection/syntax.md) to filter your DAG. * [Lenses](https://docs.getdbt.com/docs/explore/explore-projects.md#lenses) make it easier to understand your project’s contextual metadata at scale, especially to distinguish a particular model or a subset of models. * Applying a lens adds tags to the nodes, showing metadata like layer values, with color coding to help you distinguish them. [![dbt Catalog's lineage graph](/img/quickstarts/dbt-cloud/dbt_explorer_dag.png?v=2 "dbt Catalog's lineage graph")](#)dbt Catalog's lineage graph * Use the [advanced search](https://docs.getdbt.com/docs/explore/explore-projects.md#search-resources) feature to locate resources in your project. * Perform hard searches and keyword searches. * All resource names, column names, resource descriptions, warehouse relations, and code matching your search criteria will appear in the center of the page. * Apply filters to further refine your search. * When searching for a column name, the results show all relational nodes containing that column in their schemas. [![dbt Catalog's advanced search feature](/img/quickstarts/dbt-cloud/dbt_explorer_advanced_search.png?v=2 "dbt Catalog's advanced search feature")](#)dbt Catalog's advanced search feature ##### Hands-On[​](#hands-on-2 "Direct link to Hands-On") * Explore **Project-Level lineage** * Pick a model and review its upstream and downstream dependencies * Which sources does this model depend on?
Which models depend on this model? * Explore **Lenses** * Apply the Test Status Lenses. Which models passed tests? Which had warnings? * Explore different lenses (Model Layer, Materialization Type, Resource). What information do you see? * Explore **Column-Level Lineage** * Navigate to the model’s **Model resource** page and explore the primary key column’s **Column-Level Lineage** #### Multi-project[​](#multi-project "Direct link to Multi-project") Use Catalog to gain a deeper understanding of *all* your dbt projects with its [multi-project capabilities](https://docs.getdbt.com/docs/explore/explore-multiple-projects.md). * See the number of public, protected, and private models, as well as metrics for each project. * View cross-project lineage and navigate between individual projects’ lineage graphs. * Explore column-level lineage across projects. ##### Hands-On[​](#hands-on-3 "Direct link to Hands-On") * In the lineage graph, filter the Platform Project’s Project-Level Lineage for Public models using the `access:public` filter * Make a note of which models are referenced by the analytics project. * Explore the Analytics Project’s lineage * Choose a model in the Platform project referenced by the Analytics project. * Look at the multi-project column-level lineage of its primary key column. * Open the Analytics project’s lineage graph. Which models does it reference? #### Project recommendations[​](#project-recommendations "Direct link to Project recommendations") These recommendations are designed to build trust in your project and reduce confusion. To learn more about the specific suggestions and the reasons behind them, check out [our docs](https://docs.getdbt.com/docs/explore/project-recommendations.md). 
[![dbt Catalog's project recommendation tab](/img/quickstarts/dbt-cloud/dbt_explorer_project_recs.png?v=2 "dbt Catalog's project recommendation tab")](#)dbt Catalog's project recommendation tab ##### Hands-On[​](#hands-on-4 "Direct link to Hands-On") * Review your project recommendations. * Find the project recommendation for the model `agg_daily_returned_orders`. * Add documentation to this model in the `aggregates.yml` file. #### What's next[​](#whats-next "Direct link to What's next") Congratulations! You've completed the Catalog workshop. You now have the tools and knowledge to navigate multiple dbt projects, trace dependencies, and identify opportunities to improve performance and data quality. You've learned how to: * Use Catalog to visualize your project’s lineage and interact with the DAG * Search for resources in your project and apply filters to refine your search * Explore lenses and find table materializations in your current project * Navigate multiple dbt projects using Catalog * Trace dependencies at the model and column level * Review project recommendations and implement improvements For the next steps, you can check out the [Catalog documentation](https://docs.getdbt.com/docs/explore/explore-projects.md) and [FAQs](https://docs.getdbt.com/docs/explore/dbt-explorer-faqs.md) to learn more about how to use Catalog. Keep an eye out for new features coming out soon, like: * [Visualize downstream exposures](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures-tableau.md) integrations (like Tableau). * [Model query history](https://docs.getdbt.com/docs/explore/model-query-history.md) for additional warehouses (like Redshift and Databricks) * Improvements to [data health tiles](https://docs.getdbt.com/docs/explore/data-tile.md)
--- ### Quickstart for the dbt Fusion engine [Back to guides](https://docs.getdbt.com/guides.md) dbt Fusion engine dbt Cloud Quickstart Beginner #### Introduction[​](#introduction "Direct link to Introduction") important The dbt Fusion engine is currently available for installation in: * [Local command line interface (CLI) tools](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") * [VS Code and Cursor with the dbt extension](https://docs.getdbt.com/docs/install-dbt-extension.md) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") * [dbt platform environments](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine) [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") Join the conversation in our Community Slack channel [`#dbt-fusion-engine`](https://getdbt.slack.com/archives/C088YCAB6GH). Read the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements) for the latest updates. The dbt Fusion engine is a powerful new approach to classic dbt ideas! Completely rebuilt from the ground up in Rust, Fusion lets you compile and run your dbt projects faster than ever — often in seconds. This quickstart guide will get you from zero to running your first dbt project with Fusion + VS Code.
By the end, you’ll have: * A working dbt project (`jaffle_shop`) built with the dbt Fusion engine * The dbt VS Code extension installed and connected * The ability to preview, compile, and run dbt commands directly from your IDE ##### About the dbt Fusion engine[​](#about-the-dbt-fusion-engine "Direct link to About the dbt Fusion engine") Fusion and the features it provides are available in multiple environments: | Environment | How to use Fusion | | --- | --- | | **Studio IDE** | Fusion is automatically enabled; just [upgrade your environment(s)](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine). | | **dbt CLI (local)** | [Install dbt Fusion engine](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) locally following this guide. | | **VS Code / Cursor IDE** | [Install the dbt extension](https://docs.getdbt.com/docs/install-dbt-extension.md) to unlock Fusion's interactive power in your editor. | To learn more about which tool is best for you, see the [Fusion availability](https://docs.getdbt.com/docs/fusion/fusion-availability.md) page. To learn about the dbt Fusion engine and how it works, read more [about the dbt Fusion engine](https://docs.getdbt.com/docs/fusion/about-fusion.md). #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") To take full advantage of this guide, you'll need to meet the following prerequisites: * You should have a basic understanding of [dbt projects](https://docs.getdbt.com/docs/build/projects.md), [git workflows](https://docs.getdbt.com/docs/cloud/git/git-version-control.md), and [data warehouse requirements](https://docs.getdbt.com/docs/supported-data-platforms.md).
* Make sure you're using a supported adapter and authentication method:  BigQuery * Service Account / User Token * Native OAuth * External OAuth * [Required permissions](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#required-permissions)  Databricks * Service Account / User Token * Native OAuth  Redshift * Username / Password * IAM profile  Snowflake * Username / Password * Native OAuth * External OAuth * Key pair using a modern PKCS#8 method * MFA * You need a macOS (Terminal), Linux, or Windows (PowerShell) machine to run the dbt Fusion engine. * You need to have [Visual Studio Code](https://code.visualstudio.com/) installed. The [Cursor](https://www.cursor.com/en) code editor will also work, but these instructions will focus on VS Code. * You need admin or install privileges on your machine. ##### What you’ll learn[​](#what-youll-learn "Direct link to What you’ll learn") By following this guide, you will: * Set up a fully functional dbt environment with an operational project * Install and use the dbt Fusion engine + dbt VS Code extension * Run dbt commands from your IDE or terminal * Preview data, view lineage, write SQL faster with autocomplete, and more! You can learn more through high-quality [dbt Learn courses and workshops](https://learn.getdbt.com/). #### Installation[​](#installation "Direct link to Installation") It's easy to think of the dbt Fusion engine and the dbt extension as two different products, but they're a powerful combo that works together to unlock the full potential of dbt. Think of the dbt Fusion engine as exactly that — an engine. The dbt extension and VS Code are the chassis, and together they form a powerful vehicle for transforming your data. info * You can install the dbt Fusion engine and use it standalone with the CLI. * You *cannot* use the dbt extension without Fusion installed.
The following are the essential steps from the [dbt Fusion engine](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) and [extension](https://docs.getdbt.com/docs/install-dbt-extension.md) installation guides: * macOS & Linux * Windows (PowerShell) 1. Run the following command in the terminal to install the `dbtf` binary — Fusion’s CLI command. ```shell curl -fsSL https://public.cdn.getdbt.com/fs/install/install.sh | sh -s -- --update ``` 2. To use `dbtf` immediately after installation, reload your shell so that the new `$PATH` is recognized: ```shell exec $SHELL ``` Or you can close and reopen your terminal window. This will load the updated environment settings into the new session. 1) Run the following command in PowerShell to install the `dbtf` binary: ```powershell irm https://public.cdn.getdbt.com/fs/install/install.ps1 | iex ``` 2) To use `dbtf` immediately after installation, reload your shell so that the new `Path` is recognized: ```powershell Start-Process powershell ``` Or you can close and reopen your terminal window. This will load the updated environment settings into the new session. ##### Verify the dbt Fusion engine installation[​](#verify-the--installation "Direct link to verify-the--installation") 1. After installation, open a new command-line window to confirm that Fusion was installed correctly by checking the version. ```bash dbtf --version ``` 2. You should see output similar to the following: ```bash dbt-fusion 2.0.0-preview.45 ``` tip You can run these commands using `dbt`, or use `dbtf` as an unambiguous alias for Fusion, if you have another dbt CLI installed on your machine. ##### Install the dbt VS Code extension[​](#install-the-dbt-vs-code-extension "Direct link to Install the dbt VS Code extension") The dbt VS Code extension is available in the [Visual Studio extension marketplace](https://marketplace.visualstudio.com/items?itemName=dbtLabsInc.dbt). Download it directly from your VS Code editor: 1. 
Navigate to the **Extensions** tab of VS Code (or Cursor). 2. Search for `dbt` and choose the one from the publisher `dbt Labs Inc`. [![Search for the extension](/img/docs/extension/extension-marketplace.png?v=2 "Search for the extension")](#)Search for the extension 3. Click **Install**. 4. When the prompt appears, you can register the extension now or skip it (you can register later). You can also check out our [installation instructions](https://docs.getdbt.com/docs/install-dbt-extension.md) to come back to it later. 5. Confirm you've installed the extension by looking for the **dbt Extension** label in the status bar. If you see it, the extension was installed successfully! [![Verify installation in the status bar.](/img/docs/extension/extension-lsp-download.png?v=2 "Verify installation in the status bar.")](#)Verify installation in the status bar. #### Initialize the Jaffle Shop project[​](#initialize-the-jaffle-shop-project "Direct link to Initialize the Jaffle Shop project") Now let's create your first dbt project powered by Fusion! 1. Run `dbt init` to set up an example project and configure a database connection profile. * If you *do not* have a connection profile that you want to use, start with `dbt init` and use the prompts to configure a profile. * If you already have a connection profile that you want to use, use the `--skip-profile-setup` flag, then edit the generated `dbt_project.yml` to replace `profile: jaffle_shop` with the name of your existing profile. ```bash dbtf init --skip-profile-setup ``` * If you created new credentials through the interactive prompts, `init` automatically runs `dbtf debug` at the end. This ensures the newly created profile establishes a valid connection with the database. 2. Change directories into your newly created project: ```bash cd jaffle_shop ``` 3.
Build your dbt project (which includes creating example data): ```bash dbtf build ``` This will: * Load example data into your warehouse * Create, build, and test models * Verify your dbt environment is fully operational #### Explore with the dbt VS Code extension[​](#explore-with-the-dbt-vs-code-extension "Direct link to Explore with the dbt VS Code extension") The dbt VS Code extension compiles and builds your project with the dbt Fusion engine, a powerful and blazing fast rebuild of dbt from the ground up. Want to see Fusion in action? Check out the following video to get a sense of how it works: [dbt Fusion + VS Code extension walkthrough](https://app.storylane.io/share/a1rkqx0mbd7a) Now that your project works, open it in VS Code and see Fusion in action: 1. In VS Code, open the **View** menu and click **Command Palette**. Enter **Workspaces: Add Folder to Workspace**. 2. Select your `jaffle_shop` folder. If you don't add the root folder of the dbt project to the workspace, the [dbt language server](https://docs.getdbt.com/blog/dbt-fusion-engine-components#the-dbt-vs-code-extension-and-language-server) (LSP) will not run. The LSP enables features like autocomplete, hover info, and inline error highlights. 3. Open a model file to see the definition for the `orders` model. This is the model we'll use in all of the examples below. ```bash models/marts/orders.sql ``` 4. Locate **Lineage** and **Query Results** in the lower panel, and the **dbt icon** in the upper right corner next to your editor groups. If you see all of these, the extension is installed correctly and running! [![The VS Code UI with the extension running.](/img/docs/extension/extension-running.png?v=2 "The VS Code UI with the extension running.")](#)The VS Code UI with the extension running. Now you're ready to see some of these awesome features in action! 
* [Preview data and code](#preview-data-and-code) * [Navigate your project with lineage tools](#navigate-your-project-with-lineage-tools) * [Use the power of SQL understanding](#use-the-power-of-sql-understanding) * [Speed up common dbt commands](#speed-up-common-dbt-commands) ###### Preview data and code[​](#preview-data-and-code "Direct link to Preview data and code") Gain valuable insights into your data transformation during each step of your development process. You can quickly access model results and underlying data structures directly from your code. These previews help validate your code step-by-step. 1. Locate the **table icon** for **Preview File** in the upper right corner. Click it to preview results in the **Query Results** tab. [![Preview model query results.](/img/docs/extension/preview-query-results.png?v=2 "Preview model query results.")](#)Preview model query results. 2. Click **Preview CTE** above `orders as (` to preview results in the **Query Results** tab. [![Preview CTE query results.](/img/docs/extension/preview-cte-query-results-3.png?v=2 "Preview CTE query results.")](#)Preview CTE query results. 3. Locate the code icon for **Compile File** in between the dbt and the table icons. Clicking this icon opens a window with the compiled version of the model. [![Compile File icon.](/img/docs/extension/compile-file-icon.png?v=2 "Compile File icon.")](#)Compile File icon. [![Compile File results.](/img/docs/extension/compile-file.png?v=2 "Compile File results.")](#)Compile File results. ###### Navigate your project with lineage tools[​](#navigate-your-project-with-lineage-tools "Direct link to Navigate your project with lineage tools") Almost as important as where your data is going is where it's been. The lineage tools in the extension let you visualize the lineage of the resources in your models as well as the column-level lineage. These capabilities deepen your understanding of model relationships and dependencies. 1. 
Open the **Lineage** tab to visualize the model-level lineage of this model. [![Visualizing model-level lineage.](/img/docs/extension/extension-pane.png?v=2 "Visualizing model-level lineage.")](#)Visualizing model-level lineage. 2. Open the **View** menu, click **Command Palette** and enter `dbt: Show Column Lineage` to visualize the column-level lineage in the **Lineage** tab. [![Show column-level lineage.](/img/docs/extension/show-cll.png?v=2 "Show column-level lineage.")](#)Show column-level lineage. ###### Use the power of SQL understanding[​](#use-the-power-of-sql-understanding "Direct link to Use the power of SQL understanding") Code smarter, not harder. The autocomplete and context clues help avoid mistakes and enable you to write fast and accurate SQL. Catch issues before you commit them! 1. To see **Autocomplete** in action, delete `ref('stg_orders')`, and begin typing `ref(stg_` to see the subset of matching model names. Use up and down arrows to select `stg_orders`. [![Autocomplete for a model name.](/img/docs/extension/autocomplete.png?v=2 "Autocomplete for a model name.")](#)Autocomplete for a model name. 2. Hover over any `*` to see the list of column names and data types being selected. [![Hovering over \* to see column names and data types.](/img/docs/extension/hover-star.png?v=2 "Hovering over * to see column names and data types.")](#)Hovering over \* to see column names and data types. ###### Speed up common dbt commands[​](#speed-up-common-dbt-commands "Direct link to Speed up common dbt commands") Testing, testing... is this mic on? It is and it's ready to execute your commands with blazing fast speeds! When you want to test your code against various dbt commands: 1. The dbt icon in the top right opens a list of extension-specific commands: [![Select a command via the dbt icon.](/img/docs/extension/run-command.png?v=2 "Select a command via the dbt icon.")](#)Select a command via the dbt icon. 2. 
Opening the **View** menu, clicking the **Command Palette**, and entering `>dbt:` in the command bar shows all the new commands that are available. [![dbt commands in the command bar.](/img/docs/extension/extension-commands-all.png?v=2 "dbt commands in the command bar.")](#)dbt commands in the command bar. Try choosing some of them and see what they do 😎 This is just the start. There is so much more available and so much more coming. Be sure to check out our resources for all the information about the dbt Fusion engine and the dbt VS Code extension! #### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting") If you run into any issues, check out the troubleshooting section below.  How to create a .dbt directory in root and move dbt\_cloud.yml file If you've never had a `.dbt` directory, you should perform the following recommended steps to create one. If you already have a `.dbt` directory, move the `dbt_cloud.yml` file into it. Some information about the `.dbt` directory: * A `.dbt` directory is a hidden folder in the root of your home directory. It's used to store your dbt configuration files. The `.` prefix is used to create a hidden folder, which means it's not visible in Finder or File Explorer by default. * To view hidden files and folders, press Command + Shift + G on macOS or Ctrl + Shift + G on Windows. This opens the "Go to Folder" dialog where you can search for the `.dbt` directory. - Create a .dbt directory - Move the dbt\_cloud.yml file 1. Clone your dbt project repository locally. 2. Use the `mkdir` command followed by the name of the folder you want to create.
* If using macOS, add the `~` prefix to create a `.dbt` folder in the root of your home directory: ```bash mkdir ~/.dbt # macOS mkdir %USERPROFILE%\.dbt # Windows ``` You can move the `dbt_cloud.yml` file into the `.dbt` directory using the terminal, or by opening the hidden `.dbt` folder with the "Go to Folder" dialog and dragging the file in from your Downloads folder. To move the file using the terminal, use the `mv` (macOS/Linux) or `move` (Windows) command. This command moves the `dbt_cloud.yml` from the `Downloads` folder to the `.dbt` folder. If your `dbt_cloud.yml` file is located elsewhere, adjust the path accordingly. ###### Mac or Linux[​](#mac-or-linux "Direct link to Mac or Linux") In your command line, use the `mv` command to move your `dbt_cloud.yml` file into the `.dbt` directory. If you've just downloaded the `dbt_cloud.yml` file and it's in your Downloads folder, the command might look something like this: ```bash mv ~/Downloads/dbt_cloud.yml ~/.dbt/dbt_cloud.yml ``` ###### Windows[​](#windows "Direct link to Windows") In your command line, use the move command. Assuming your file is in the Downloads folder, the command might look like this: ```bash move %USERPROFILE%\Downloads\dbt_cloud.yml %USERPROFILE%\.dbt\dbt_cloud.yml ```  I can't see the lineage tab in Cursor If you're using the dbt VS Code extension in Cursor, the lineage tab works best in Editor mode and doesn't render in Agent mode. If you're in Agent mode and the lineage tab isn't rendering, just switch to Editor mode to view your project's table and column lineage.  The extension gets stuck in a loading state If the extension is attempting to activate during startup and locks into a permanent loading state, check that: * Your dbt VS Code extension is on the latest version. * Your IDE is on the latest version. * You have a valid `dbt_cloud.yml` file configured and in the [correct location](#register-with-dbt_cloudyml).
If you're still experiencing issues, try these steps before contacting dbt Support: * Delete and download a new copy of your `dbt_cloud.yml` file. * Delete and reinstall the dbt VS Code extension.  dbt platform configurations If you're a cloud-based dbt platform user who has the `dbt-cloud:` config in the `dbt_project.yml` file and are also using dbt Mesh, you must have the project ID configured: ```yaml dbt-cloud: project-id: 12345 # Required ``` If you don’t configure this correctly, cross-project references will not resolve properly, and you will encounter errors executing dbt commands.  dbt extension not activating If the dbt extension has activated successfully, you will see the **dbt Extension** label in the status bar at the bottom left of your editor. You can view diagnostic information about the dbt extension by clicking the **dbt Extension** button. If the **dbt Extension** label is not present, then it is likely that the dbt extension was not installed successfully. If this happens, try uninstalling the extension, restarting your editor, and then reinstalling the extension. **Note:** It is possible to "hide" status bar items in VS Code. Double-check if the dbt Extension status bar label is hidden by right-clicking on the status bar in your editor. If you see dbt Extension in the right-click menu, then the extension has installed successfully.  Missing dbt LSP features If you receive a `no active LSP for this workspace` error message or aren't seeing dbt Language Server (LSP) features in your editor (like autocomplete, go-to-definition, or hover text), start by first following the general troubleshooting steps mentioned earlier. If you've confirmed the dbt extension is installed correctly but don't see LSP features, try the following: 1.
Check extension version — Ensure that you're using the latest available version of the dbt extension by: * Opening the **Extensions** page in your editor, or * Going to the **Output** tab and looking for the version number, or * Running `dbtf --version` in the terminal. 2. Reinstall the LSP — If the version is correct, reinstall the LSP: 1. Open the Command Palette: Command + Shift + P (macOS) or Ctrl + Shift + P (Windows/Linux). 2. Paste `dbt: Reinstall dbt LSP` and press Enter. This command downloads the LSP and re-activates the extension to resolve the error.  Unsupported dbt version If you see an error message indicating that your version of dbt is unsupported, then there is likely a problem with your environment. Check the dbt Path setting in your VS Code settings. If this path is set, ensure that it is pointing to a valid dbt Fusion engine executable. If necessary, you can also install the dbt Fusion engine directly using these instructions: [Install the Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)  Addressing the 'dbt language server is not running in this workspace' error To resolve the `dbt language server is not running in this workspace` error, you need to add your dbt project folder to a workspace: 1. In VS Code, click **File** in the toolbar then select **Add Folder to Workspace**. 2. Select the dbt project folder you want to add to a workspace. 3. To save your workspace, click **File** then select **Save Workspace As**. 4. Navigate to the location you want to save your workspace. This should resolve the error by opening your dbt project as part of the workspace it belongs to. For more information on workspaces, refer to [What is a VS Code workspace?](https://code.visualstudio.com/docs/editing/workspaces/workspaces).
Manifest cannot be downloaded from the dbt platform If the dbt VS Code extension cannot download the manifest from the dbt platform or you get `warning: dbt1200: Failed to download manifest` using Fusion locally, you are probably experiencing DNS-related issues. To confirm this, do a DNS lookup for the host Fusion is trying to download from (for example, prodeu2.blob.core.windows.net) by using `dig` on Linux/Mac or `nslookup` on Windows. If this doesn't return an IP address, the likely reason is that your company uses the same cloud provider with private endpoints for cloud resources, and DNS requests for these are forwarded to private DNS zones. This situation can be remedied by setting up an internet fallback, which will then return a public IP for any cloud storage that does not have a private IP registered with the private DNS zone. For Azure, refer to [Fallback to internet for Azure Private DNS zones](https://learn.microsoft.com/en-us/azure/dns/private-dns-fallback). #### More information about Fusion[​](#more-information-about-fusion "Direct link to More information about Fusion") Fusion marks a significant update to dbt. While many of the workflows you've grown accustomed to remain unchanged, there are a lot of new ideas, and a lot of old ones going away.
The following is a list of the full scope of our current release of the Fusion engine, including implementation, installation, deprecations, and limitations:

* [About the dbt Fusion engine](https://docs.getdbt.com/docs/fusion/about-fusion.md)
* [About the dbt extension](https://docs.getdbt.com/docs/about-dbt-extension.md)
* [New concepts in Fusion](https://docs.getdbt.com/docs/fusion/new-concepts.md)
* [Supported features matrix](https://docs.getdbt.com/docs/fusion/supported-features.md)
* [Installing Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)
* [Installing VS Code extension](https://docs.getdbt.com/docs/install-dbt-extension.md)
* [Fusion release track](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine)
* [Quickstart for Fusion](https://docs.getdbt.com/guides/fusion.md?step=1)
* [Upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md)
* [Fusion licensing](http://www.getdbt.com/licenses-faq)

---

### Quickstart for the dbt Semantic Layer and Snowflake

#### Introduction[​](#introduction "Direct link to Introduction")

The [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), powered by [MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md), simplifies the setup of key business metrics. It centralizes definitions, avoids duplicate code, and ensures easy access to metrics in downstream tools.
MetricFlow makes it easier to manage company metrics, allowing you to define metrics in your dbt project and query them in dbt with [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md).

📹 Learn about the dbt Semantic Layer with on-demand video courses! Explore our [dbt Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer) to learn how to define and query metrics in your dbt project. Additionally, dive into mini-courses for querying the dbt Semantic Layer in your favorite tools: [Tableau](https://courses.getdbt.com/courses/tableau-querying-the-semantic-layer), [Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel), [Hex](https://courses.getdbt.com/courses/hex-querying-the-semantic-layer), and [Mode](https://courses.getdbt.com/courses/mode-querying-the-semantic-layer).

This quickstart guide is designed for dbt users using Snowflake as their data platform. It focuses on building and defining metrics, setting up the Semantic Layer in a dbt project, and querying metrics in Google Sheets. If you're on a different data platform, you can still follow this guide; you'll need to modify the setup for your specific platform. See the [users on different platforms](#for-users-on-different-data-platforms) section for more information.

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You need a [dbt](https://www.getdbt.com/signup/) Trial, Starter, or Enterprise-tier account for all deployments.
* Have the correct [dbt license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) and [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) based on your plan:

  **More info on license and permissions**
  * Enterprise-tier — Developer license with Account Admin permissions. Or "Owner" with a Developer license, assigned Project Creator, Database Admin, or Admin permissions.
  * Starter — "Owner" access with a Developer license.
  * Trial — Automatic "Owner" access under a Starter plan trial.
* Create a [trial Snowflake account](https://signup.snowflake.com/):
  * Select the Enterprise Snowflake edition with ACCOUNTADMIN access. Consider organizational questions when choosing a cloud provider, and refer to Snowflake's [Introduction to Cloud Platforms](https://docs.snowflake.com/en/user-guide/intro-cloud-platforms).
  * Select a cloud provider and region. All cloud providers and regions will work, so choose whichever you prefer.
* Complete the [Quickstart for dbt and Snowflake](https://docs.getdbt.com/guides/snowflake.md) guide.
* Basic understanding of SQL and dbt. For example, you've used dbt before or have completed the [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) course.

##### For users on different data platforms[​](#for-users-on-different-data-platforms "Direct link to For users on different data platforms")

If you're using a data platform other than Snowflake, this guide is also applicable to you. You can adapt the setup for your specific platform by following the account setup and data loading instructions detailed in the following tabs for each respective platform. The rest of this guide applies universally across all supported platforms, ensuring you can fully leverage the Semantic Layer.
* **BigQuery** — Open a new tab and follow these quick steps for account setup and data loading instructions:
  * [Step 2: Create a new GCP project](https://docs.getdbt.com/guides/bigquery.md?step=2)
  * [Step 3: Create BigQuery dataset](https://docs.getdbt.com/guides/bigquery.md?step=3)
  * [Step 4: Generate BigQuery credentials](https://docs.getdbt.com/guides/bigquery.md?step=4)
  * [Step 5: Connect dbt to BigQuery](https://docs.getdbt.com/guides/bigquery.md?step=5)
* **Databricks** — Open a new tab and follow these quick steps for account setup and data loading instructions:
  * [Step 2: Create a Databricks workspace](https://docs.getdbt.com/guides/databricks.md?step=2)
  * [Step 3: Load data](https://docs.getdbt.com/guides/databricks.md?step=3)
  * [Step 4: Connect dbt to Databricks](https://docs.getdbt.com/guides/databricks.md?step=4)
* **Microsoft Fabric** — Open a new tab and follow these quick steps for account setup and data loading instructions:
  * [Step 2: Load data into your Microsoft Fabric warehouse](https://docs.getdbt.com/guides/microsoft-fabric.md?step=2)
  * [Step 3: Connect dbt to Microsoft Fabric](https://docs.getdbt.com/guides/microsoft-fabric.md?step=3)
* **Redshift** — Open a new tab and follow these quick steps for account setup and data loading instructions:
  * [Step 2: Create a Redshift cluster](https://docs.getdbt.com/guides/redshift.md?step=2)
  * [Step 3: Load data](https://docs.getdbt.com/guides/redshift.md?step=3)
  * [Step 4: Connect dbt to Redshift](https://docs.getdbt.com/guides/redshift.md?step=4)
* **Starburst Galaxy** — Open a new tab and follow these quick steps for account setup and data loading instructions:
  * [Step 2: Load data to an Amazon S3 bucket](https://docs.getdbt.com/guides/starburst-galaxy.md?step=2)
  * [Step 3: Connect Starburst Galaxy to Amazon S3 bucket data](https://docs.getdbt.com/guides/starburst-galaxy.md?step=3)
  * [Step 4: Create tables with Starburst Galaxy](https://docs.getdbt.com/guides/starburst-galaxy.md?step=4)
  * [Step 5: Connect dbt to Starburst Galaxy](https://docs.getdbt.com/guides/starburst-galaxy.md?step=5)

#### Create new Snowflake worksheet and set up environment[​](#create-new-snowflake-worksheet-and-set-up-environment "Direct link to Create new Snowflake worksheet and set up environment")

1. Log in to your [trial Snowflake account](https://signup.snowflake.com).
2. In the Snowflake user interface (UI), click **+ Worksheet** in the upper right corner.
3. Select **SQL Worksheet** to create a new worksheet.

##### Set up and load data into Snowflake[​](#set-up-and-load-data-into-snowflake "Direct link to Set up and load data into Snowflake")

The data used here is stored as CSV files in a public S3 bucket, and the following steps will guide you through preparing your Snowflake account for that data and uploading it.

1. Create a new virtual warehouse, two new databases (one for raw data, the other for future dbt development), and two new schemas (one for `jaffle_shop` data, the other for `stripe` data). To do this, run these SQL commands by typing them into the Editor of your new Snowflake worksheet and clicking **Run** in the upper right corner of the UI:

```sql
create warehouse transforming;

create database raw;
create database analytics;

create schema raw.jaffle_shop;
create schema raw.stripe;
```

2. In the `raw` database and `jaffle_shop` and `stripe` schemas, create three tables and load relevant data into them:
   * First, delete all contents in the Editor of the Snowflake worksheet.
Then, run this SQL command to create the `customers` table:

```sql
create table raw.jaffle_shop.customers
(
    id integer,
    first_name varchar,
    last_name varchar
);
```

   * Delete all contents in the Editor, then run this command to load data into the `customers` table:

```sql
copy into raw.jaffle_shop.customers (id, first_name, last_name)
from 's3://dbt-tutorial-public/jaffle_shop_customers.csv'
file_format = (
    type = 'CSV'
    field_delimiter = ','
    skip_header = 1
);
```

   * Delete all contents in the Editor, then run this command to create the `orders` table:

```sql
create table raw.jaffle_shop.orders
(
    id integer,
    user_id integer,
    order_date date,
    status varchar,
    _etl_loaded_at timestamp default current_timestamp
);
```

   * Delete all contents in the Editor, then run this command to load data into the `orders` table:

```sql
copy into raw.jaffle_shop.orders (id, user_id, order_date, status)
from 's3://dbt-tutorial-public/jaffle_shop_orders.csv'
file_format = (
    type = 'CSV'
    field_delimiter = ','
    skip_header = 1
);
```

   * Delete all contents in the Editor, then run this command to create the `payment` table:

```sql
create table raw.stripe.payment
(
    id integer,
    orderid integer,
    paymentmethod varchar,
    status varchar,
    amount integer,
    created date,
    _batched_at timestamp default current_timestamp
);
```

   * Delete all contents in the Editor, then run this command to load data into the `payment` table:

```sql
copy into raw.stripe.payment (id, orderid, paymentmethod, status, amount, created)
from 's3://dbt-tutorial-public/stripe_payments.csv'
file_format = (
    type = 'CSV'
    field_delimiter = ','
    skip_header = 1
);
```

3. Verify that the data is loaded by running these SQL queries. Confirm that you can see output for each one.
```sql
select * from raw.jaffle_shop.customers;
select * from raw.jaffle_shop.orders;
select * from raw.stripe.payment;
```

[![Snowflake's confirmation output when data is loaded correctly in the Editor](/img/docs/dbt-cloud/semantic-layer/sl-snowflake-confirm.jpg?v=2 "Snowflake's confirmation output when data is loaded correctly in the Editor")](#)

#### Connect dbt to Snowflake[​](#connect-dbt-to-snowflake "Direct link to Connect dbt to Snowflake")

There are two ways to connect dbt to Snowflake. The first option is Partner Connect, which provides a streamlined setup to create your dbt account from within your new Snowflake trial account. The second option is to create your dbt account separately and build the Snowflake connection yourself (connect manually). If you want to get started quickly, dbt Labs recommends using Partner Connect. If you want to customize your setup from the very beginning and gain familiarity with the dbt setup flow, dbt Labs recommends connecting manually.

* Use Partner Connect
* Connect manually

Using Partner Connect allows you to create a complete dbt account with your [Snowflake connection](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-snowflake.md), [a managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md), [environments](https://docs.getdbt.com/docs/build/custom-schemas.md#managing-environments), and credentials.

1. On the left sidebar of the Snowflake UI, go to **Admin > Partner Connect**. Find the dbt tile under the **Data Integration** section or search for dbt in the search bar. Click the tile to connect to dbt.
[![Snowflake Partner Connect Box](/img/snowflake_tutorial/snowflake_partner_connect_box.png?v=2 "Snowflake Partner Connect Box")](#)

If you’re using the classic version of the Snowflake UI, you can click the **Partner Connect** button in the top bar of your account. From there, click on the dbt tile to open up the connect box.

[![Snowflake Classic UI - Partner Connect](/img/snowflake_tutorial/snowflake_classic_ui_partner_connect.png?v=2 "Snowflake Classic UI - Partner Connect")](#)

2. In the **Connect to dbt** popup, find the **Optional Grant** option and select the **RAW** and **ANALYTICS** databases. This grants access for your new dbt user role to each selected database. Then, click **Connect**.

[![Snowflake Classic UI - Connection Box](/img/snowflake_tutorial/snowflake_classic_ui_connection_box.png?v=2 "Snowflake Classic UI - Connection Box")](#)

[![Snowflake New UI - Connection Box](/img/snowflake_tutorial/snowflake_new_ui_connection_box.png?v=2 "Snowflake New UI - Connection Box")](#)

3. Click **Activate** when a popup appears:

[![Snowflake Classic UI - Activation Window](/img/snowflake_tutorial/snowflake_classic_ui_activation_window.png?v=2 "Snowflake Classic UI - Activation Window")](#)

[![Snowflake New UI - Activation Window](/img/snowflake_tutorial/snowflake_new_ui_activation_window.png?v=2 "Snowflake New UI - Activation Window")](#)

4. After the new tab loads, you will see a form. If you already created a dbt account, you will be asked to provide an account name. If you haven't created an account, you will be asked to provide an account name and password.
5. After you have filled out the form and clicked **Complete Registration**, you will be logged into dbt automatically.
6.
Click your account name in the left side menu and select **Account settings**, choose the "Partner Connect Trial" project, and select **snowflake** in the overview table. Select **Edit** and update the **Database** field to `analytics` and the **Warehouse** field to `transforming`.

[![dbt - Snowflake Project Overview](/img/snowflake_tutorial/dbt_cloud_snowflake_project_overview.png?v=2 "dbt - Snowflake Project Overview")](#)

[![dbt - Update Database and Warehouse](/img/snowflake_tutorial/dbt_cloud_update_database_and_warehouse.png?v=2 "dbt - Update Database and Warehouse")](#)

**Connect manually**

1. Create a new project in dbt. Navigate to **Account settings** (by clicking on your account name in the left side menu), and click **+ New Project**.
2. Enter a project name and click **Continue**.
3. In the **Configure your development environment** section, click the **Connection** dropdown menu and select **Add new connection**. This directs you to the connection configuration settings.
4. In the **Type** section, select **Snowflake**.

[![dbt - Choose Snowflake Connection](/img/snowflake_tutorial/dbt_cloud_setup_snowflake_connection_start.png?v=2 "dbt - Choose Snowflake Connection")](#)

5. Enter your **Settings** for Snowflake with:
   * **Account** — Find your account by using the Snowflake trial account URL and removing `snowflakecomputing.com`. The order of your account information will vary by Snowflake version. For example, Snowflake's Classic console URL might look like: `oq65696.west-us-2.azure.snowflakecomputing.com`. The AppUI or Snowsight URL might look more like: `snowflakecomputing.com/west-us-2.azure/oq65696`. In both examples, your account will be: `oq65696.west-us-2.azure`. For more information, see [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html) in the Snowflake docs.

     ✅ `db5261993` or `db5261993.east-us-2.azure`
     ❌ `db5261993.eu-central-1.snowflakecomputing.com`

   * **Role** — Leave blank for now. You can update this to a default Snowflake role later.
   * **Database** — `analytics`. This tells dbt to create new models in the analytics database.
   * **Warehouse** — `transforming`. This tells dbt to use the transforming warehouse that was created earlier.

[![dbt - Snowflake Account Settings](/img/snowflake_tutorial/dbt_cloud_snowflake_account_settings.png?v=2 "dbt - Snowflake Account Settings")](#)

6. Click **Save**.
7. Set up your personal development credentials by going to **Your profile** > **Credentials**.
8. Select your project that uses the Snowflake connection.
9. Click the **configure your development environment and add a connection** link. This directs you to a page where you can enter your personal development credentials.
10. Enter your **Development credentials** for Snowflake with:
    * **Username** — The username you created for Snowflake. The username is not your email address and is usually your first and last name together in one word.
    * **Password** — The password you set when creating your Snowflake account.
    * **Schema** — You’ll notice that the schema name has been auto-created for you. By convention, this is `dbt_`. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Studio IDE.
    * **Target name** — Leave as the default.
    * **Threads** — Leave as 4. This is the number of simultaneous connections that dbt will make to build models concurrently.

[![dbt - Snowflake Development Credentials](/img/snowflake_tutorial/dbt_cloud_snowflake_development_credentials.png?v=2 "dbt - Snowflake Development Credentials")](#)

11. Click **Test connection**. This verifies that dbt can access your Snowflake account.
12. If the test succeeded, click **Save** to complete the configuration.
If it failed, you may need to check your Snowflake settings and credentials.

#### Set up dbt project[​](#set-up-dbt-project "Direct link to Set up dbt project")

In this section, you will set up a dbt managed repository and initialize your dbt project to start developing.

##### Set up a dbt managed repository[​](#set-up-a-dbt-managed-repository "Direct link to Set up a dbt managed repository")

If you used Partner Connect, you can skip to [initializing your dbt project](#initialize-your-dbt-project) as Partner Connect provides you with a [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md). Otherwise, you will need to create your repository connection.

When you develop in dbt, you can leverage [Git](https://docs.getdbt.com/docs/cloud/git/git-version-control.md) to version control your code. To connect to a repository, you can either set up a dbt-hosted [managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) or directly connect to a [supported git provider](https://docs.getdbt.com/docs/cloud/git/connect-github.md). Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md).

To set up a managed repository:

1. Under "Setup a repository", select **Managed**.
2. Type a name for your repo, such as `bbaggins-dbt-quickstart`.
3. Click **Create**. It will take a few seconds for your repository to be created and imported.
4. Once you see the "Successfully imported repository" message, click **Continue**.
##### Initialize your dbt project[​](#initialize-your-dbt-project "Direct link to Initialize your dbt project")

This guide assumes you use the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) to develop your dbt project, define metrics, and query and preview metrics using [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md).

Now that you have a repository configured, you can initialize your project and start development in dbt using the Studio IDE:

1. Click **Start developing in the Studio IDE**. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.
2. Above the file tree to the left, click **Initialize your project**. This builds out your folder structure with example models.
3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit`. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
4. You can now directly query data from your warehouse and execute `dbt run`. You can try this out now:
   * Delete the `models/examples` folder in the **File Catalog**.
   * Click **+ Create new file**, add this query to the new file, and click **Save as** to save the new file:

```sql
select * from raw.jaffle_shop.customers
```

   * In the command line bar at the bottom, enter `dbt run` and press Enter. You should see a `dbt run succeeded` message.

#### Build your dbt project[​](#build-your-dbt-project "Direct link to Build your dbt project")

The next step is to build your project. This involves adding sources, staging models, business-defined entities, and packages to your project.

##### Add sources[​](#add-sources "Direct link to Add sources")

[Sources](https://docs.getdbt.com/docs/build/sources.md) in dbt are the raw data tables you'll transform.
By organizing your source definitions, you document the origin of your data. It also makes your project and transformation more reliable, structured, and understandable.

You have two options for working with files in the Studio IDE:

* **Create a new branch (recommended)** — Create a new branch to edit and commit your changes. Navigate to **Version Control** on the left sidebar and click **Create branch**.
* **Edit in the protected primary branch** — If you prefer to edit, format, or lint files and execute dbt commands directly in your primary git branch, use this option. The Studio IDE prevents commits to the protected branch, so you'll be prompted to commit your changes to a new branch.

Name the new branch `build-project`.

1. Hover over the `models` directory and click the three-dot menu (**...**), then select **Create file**.
2. Name the file `staging/jaffle_shop/src_jaffle_shop.yml`, then click **Create**.
3. Copy the following text into the file and click **Save**.

models/staging/jaffle\_shop/src\_jaffle\_shop.yml

```yaml
sources:
  - name: jaffle_shop
    database: raw
    schema: jaffle_shop
    tables:
      - name: customers
      - name: orders
```

tip: In your source file, you can also use the **Generate model** button to create a new model file for each source. This creates a new file in the `models` directory with the given source name and fills in the SQL code of the source definition.

4. Hover over the `models` directory and click the three-dot menu (**...**), then select **Create file**.
5. Name the file `staging/stripe/src_stripe.yml`, then click **Create**.
6. Copy the following text into the file and click **Save**.

models/staging/stripe/src\_stripe.yml

```yaml
sources:
  - name: stripe
    database: raw
    schema: stripe
    tables:
      - name: payment
```

##### Add staging models[​](#add-staging-models "Direct link to Add staging models")

[Staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md) are the first transformation step in dbt.
They clean and prepare your raw data, making it ready for more complex transformations and analyses. Follow these steps to add your staging models to your project.

1. In the `jaffle_shop` sub-directory, create the file `stg_customers.sql`. Or, you can use the **Generate model** button to create a new model file for each source.
2. Copy the following query into the file and click **Save**.

models/staging/jaffle\_shop/stg\_customers.sql

```sql
select
    id as customer_id,
    first_name,
    last_name
from {{ source('jaffle_shop', 'customers') }}
```

3. In the same `jaffle_shop` sub-directory, create the file `stg_orders.sql`.
4. Copy the following query into the file and click **Save**.

models/staging/jaffle\_shop/stg\_orders.sql

```sql
select
    id as order_id,
    user_id as customer_id,
    order_date,
    status
from {{ source('jaffle_shop', 'orders') }}
```

5. In the `stripe` sub-directory, create the file `stg_payments.sql`.
6. Copy the following query into the file and click **Save**.

models/staging/stripe/stg\_payments.sql

```sql
select
    id as payment_id,
    orderid as order_id,
    paymentmethod as payment_method,
    status,
    -- amount is stored in cents, convert it to dollars
    amount / 100 as amount,
    created as created_at
from {{ source('stripe', 'payment') }}
```

7. Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run and see the three models.

##### Add business-defined entities[​](#add-business-defined-entities "Direct link to Add business-defined entities")

This phase involves creating [models that serve as the entity layer or concept layer of your dbt project](https://docs.getdbt.com/best-practices/how-we-structure/4-marts.md), making the data ready for reporting and analysis. It also includes adding [packages](https://docs.getdbt.com/docs/build/packages.md) and the [MetricFlow time spine](https://docs.getdbt.com/docs/build/metricflow-time-spine.md) that extend dbt's functionality.
This phase is the [marts layer](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md#guide-structure-overview), which brings together modular pieces into a wide, rich vision of the entities an organization cares about.

1. Create the file `models/marts/fct_orders.sql`.
2. Copy the following query into the file and click **Save**.

models/marts/fct\_orders.sql

```sql
with orders as (
    select * from {{ ref('stg_orders') }}
),

payments as (
    select * from {{ ref('stg_payments') }}
),

order_payments as (
    select
        order_id,
        sum(case when status = 'success' then amount end) as amount
    from payments
    group by 1
),

final as (
    select
        orders.order_id,
        orders.customer_id,
        orders.order_date,
        coalesce(order_payments.amount, 0) as amount
    from orders
    left join order_payments using (order_id)
)

select * from final
```

3. In the `models/marts` directory, create the file `dim_customers.sql`.
4. Copy the following query into the file and click **Save**.

models/marts/dim\_customers.sql

```sql
with customers as (
    select * from {{ ref('stg_customers') }}
),

orders as (
    select * from {{ ref('fct_orders') }}
),

customer_orders as (
    select
        customer_id,
        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders,
        sum(amount) as lifetime_value
    from orders
    group by 1
),

final as (
    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders,
        customer_orders.lifetime_value
    from customers
    left join customer_orders using (customer_id)
)

select * from final
```

5. Create a MetricFlow time spine model by following the [MetricFlow time spine guide](https://docs.getdbt.com/guides/mf-time-spine.md?step=1). This guide walks you through creating both the SQL model and YAML configuration required for time-based metric calculations.
6.
Enter `dbt run` in the command prompt at the bottom of the screen. You should get a successful run message and also see in the run details that dbt has successfully built your models.

#### Create semantic models[​](#create-semantic-models "Direct link to Create semantic models")

In this section, you'll learn about [semantic models](https://docs.getdbt.com/guides/sl-snowflake-qs.md?step=6#about-semantic-models), [their components](https://docs.getdbt.com/guides/sl-snowflake-qs.md?step=6#semantic-model-components), and [how to configure a time spine](https://docs.getdbt.com/guides/sl-snowflake-qs.md?step=6#configure-a-time-spine).

##### About semantic models[​](#about-semantic-models "Direct link to About semantic models")

##### Semantic model components[​](#semantic-model-components "Direct link to Semantic model components")

##### Entities[​](#entities "Direct link to Entities")

[Entities](https://docs.getdbt.com/docs/build/semantic-models.md#entities) represent real-world concepts in a business, serving as the backbone of your semantic model. These are ID columns (like `order_id`) in your semantic models, and they serve as join keys to other semantic models. Add entities to your `fct_orders.yml` semantic model file:

##### Dimensions[​](#dimensions "Direct link to Dimensions")

[Dimensions](https://docs.getdbt.com/docs/build/semantic-models.md#dimensions) are a way to group or filter information based on categories or time. Add dimensions to your `fct_orders.yml` semantic model file:

##### Configure a time spine[​](#configure-a-time-spine "Direct link to Configure a time spine")

To ensure accurate time-based aggregations, you must configure a [time spine](https://docs.getdbt.com/docs/build/metricflow-time-spine.md). The time spine allows you to have accurate metric calculations over different time granularities.
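At a high level, a time spine is a model containing one row per date, marked with a `time_spine` config in its YAML properties so MetricFlow can join against it when aggregating metrics over time. A rough sketch (the model and column names here, `metricflow_time_spine` and `date_day`, are common-convention assumptions rather than requirements of this guide):

```yaml
models:
  - name: metricflow_time_spine
    time_spine:
      standard_granularity_column: date_day
    columns:
      - name: date_day
        granularity: day
```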
Follow the [MetricFlow time spine guide](https://docs.getdbt.com/guides/mf-time-spine.md?step=1) for complete step-by-step instructions on creating and configuring your time spine model. This guide provides the current best practices and avoids deprecated configurations.

#### Define metrics and add a second semantic model[​](#define-metrics-and-add-a-second-semantic-model "Direct link to Define metrics and add a second semantic model")

In this section, you will [define metrics](#define-metrics) and [add a second semantic model](#add-second-semantic-model-to-your-project) to your project.

##### Define metrics[​](#define-metrics "Direct link to Define metrics")

[Metrics](https://docs.getdbt.com/docs/build/metrics-overview.md) are the language your business users speak and measure business performance. They are an aggregation over a column in your warehouse that you enrich with dimensional cuts. There are different types of metrics you can configure:

* [Conversion metrics](https://docs.getdbt.com/docs/build/conversion.md) — Track when a base event and a subsequent conversion event occur for an entity within a set time period.
* [Cumulative metrics](https://docs.getdbt.com/docs/build/cumulative.md) — Aggregate a measure over a given window. If no window is specified, the measure accumulates over the entire recorded time period. Note that you must create the time spine model before you add cumulative metrics.
* [Derived metrics](https://docs.getdbt.com/docs/build/metrics-overview.md#derived-metrics) — Allow you to perform calculations on top of other metrics.
* [Simple metrics](https://docs.getdbt.com/docs/build/metrics-overview.md#simple-metrics) — Directly reference a single column expression within a semantic model, without any additional columns involved. They are aggregations over a column in your data platform and can be filtered by one or multiple dimensions.
* [Ratio metrics](https://docs.getdbt.com/docs/build/metrics-overview.md#ratio-metrics) — Involve a numerator metric and a denominator metric. A constraint string can be applied to both the numerator and denominator, or separately to either one.

Once you've created your semantic models, it's time to start referencing them to create some metrics:

1. Add metrics to your `fct_orders.yml` file:

tip: Make sure to save all semantic models and metrics under the directory defined in the [`model-paths`](https://docs.getdbt.com/reference/project-configs/model-paths.md) (or a subdirectory of it, like `models/semantic_models/`). If you save them outside of this path, it will result in an empty `semantic_manifest.json` file, and your semantic models or metrics won't be recognized.

##### Add second semantic model to your project[​](#add-second-semantic-model-to-your-project "Direct link to Add second semantic model to your project")

Let’s expand your project's analytical capabilities by adding another semantic model for your other marts model, `dim_customers.yml`. After setting up your orders model:

1. Create the file `dim_customers.yml`.
2. Copy the following code into the file and click **Save**. This semantic model uses simple metrics to focus on customer metrics and emphasizes customer dimensions like name, type, and order dates. It uniquely analyzes customer behavior, lifetime value, and order patterns.

#### Test and query metrics[​](#test-and-query-metrics "Direct link to Test and query metrics")

To work with metrics in dbt, you have several tools to validate or run commands. Here's how you can test and query metrics depending on your setup:

* [**Studio IDE users**](#studio-ide-users) — Run [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md#metricflow-commands) directly in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) to query/preview metrics.
You can also view metrics visually in the **Lineage** tab.

* [**dbt CLI users**](#cloud-cli-users) — The [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) enables you to run [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md#metricflow-commands) to query and preview metrics directly in your command line interface.
* **dbt Core users** — Use the MetricFlow CLI for command execution. While this guide focuses on dbt users, dbt Core users can find detailed MetricFlow CLI setup instructions on the [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md#metricflow-commands) page. Note that to use the Semantic Layer, you need a [Starter or Enterprise-tier account](https://www.getdbt.com/).

Alternatively, you can run commands with SQL client tools like DataGrip, DBeaver, or RazorSQL.

##### Studio IDE users[​](#studio-ide-users "Direct link to Studio IDE users")

You can run MetricFlow commands in dbt by adding the `dbt sl` prefix before the command name. For example, to list all metrics, run `dbt sl list metrics`. For a complete list of the MetricFlow commands available in the Studio IDE, refer to the [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md#metricflow-commands) page.

The Studio IDE **Status button** (located in the bottom right of the editor) displays an **Error** status if there's an error in your metric or semantic model definition. You can click the button to see the specific issue and resolve it. Once resolved, make sure you commit and merge your changes in your project.

##### Cloud CLI users[​](#cloud-cli-users "Direct link to Cloud CLI users")

This section is for dbt CLI users. MetricFlow commands are integrated with dbt, which means you can run MetricFlow commands as soon as you install the dbt CLI. Your account will automatically manage version control for you.

Refer to the following steps to get started:

1. Install the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) (if you haven't already). Then, navigate to your dbt project directory.
2. Run a dbt command, such as `dbt parse`, `dbt run`, `dbt compile`, or `dbt build`. If you don't, you'll receive an error message that begins with: "ensure that you've ran an artifacts....".
3. MetricFlow builds a semantic graph and generates a `semantic_manifest.json` file in dbt, which is stored in the `/target` directory. If using the Jaffle Shop example, run `dbt seed && dbt run` to ensure the required data is in your data platform before proceeding.

   Run dbt parse to reflect metric changes

   When you make changes to metrics, make sure to run `dbt parse` at a minimum to update the Semantic Layer. This updates the `semantic_manifest.json` file, reflecting your changes when querying metrics. By running `dbt parse`, you won't need to rebuild all the models.

4. Run `dbt sl --help` to confirm you have MetricFlow installed and that you can view the available commands.
5. Run `dbt sl query --metrics <metric_name> --group-by <dimension_name>` to query the metrics and dimensions. For example, to query the `order_total` and `order_count` metrics and group them by the `order_date` dimension, you would run:

   ```bash
   dbt sl query --metrics order_total,order_count --group-by order_date
   ```

6. Verify that the metric values are what you expect. To further understand how a metric is being generated, you can view the generated SQL by adding `--compile` to the command.
7. Commit and merge the code changes that contain the metric definitions.

#### Run a production job[​](#run-a-production-job "Direct link to Run a production job")

This section explains how to run a job in your deployment environment in dbt to materialize and deploy your metrics. Currently, only the deployment environment is supported.

1. 
Once you’ve [defined your semantic models and metrics](https://docs.getdbt.com/guides/sl-snowflake-qs.md?step=10), commit and merge your metric changes in your dbt project.
2. In dbt, create a new [deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#create-a-deployment-environment) or use an existing environment on dbt 1.6 or higher.
   * Note — Only deployment environments are currently supported (*development experience coming soon*).
3. To create a new environment, navigate to **Deploy** in the navigation menu, select **Environments**, and then select **Create new environment**.
4. Fill in your deployment credentials with your Snowflake username and password. You can name the schema anything you want. Click **Save** to create your new production environment.
5. [Create a new deploy job](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#create-and-schedule-jobs) that runs in the environment you just created. Go back to the **Deploy** menu, select **Jobs**, select **Create job**, and click **Deploy job**.
6. Set the job to run the `dbt parse` command to parse your project and generate a [`semantic_manifest.json` artifact](https://docs.getdbt.com/reference/artifacts/sl-manifest.md) file. Although running `dbt build` isn't required, you can choose to do so if needed.

   note

   If you are on the dbt Fusion engine, add the `dbt docs generate` command to your job to successfully deploy your metrics.

7. Run the job by clicking the **Run now** button. Monitor the job's progress in real time through the **Run summary** tab. Once the job completes successfully, your dbt project, including the generated documentation, will be fully deployed and available for use in your production environment. If any issues arise, review the logs to diagnose and address any errors.

What’s happening internally?

* Merging the code into your main branch allows dbt to pull those changes and build the definition in the manifest produced by the run.
* Re-running the job in the deployment environment materializes the models that the metrics depend on in the data platform. It also ensures that the manifest is up to date.
* The Semantic Layer APIs pull in the most recent manifest and enable your integration to extract metadata from it.

#### Administer the Semantic Layer[​](#administer-the-semantic-layer "Direct link to Administer the Semantic Layer")

In this section, you will learn how to add credentials and create service tokens to start querying the dbt Semantic Layer. This section covers the following topics:

* [Select environment](#1-select-environment)
* [Configure credentials and create tokens](#2-configure-credentials-and-create-tokens)
* [View connection detail](#3-view-connection-detail)
* [Add more credentials](#4-add-more-credentials)
* [Delete configuration](#delete-configuration)

You must be part of the Owner group and have the correct [license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) and [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to administer the Semantic Layer at the environment and project level:

* Enterprise+ and Enterprise plans:
  * Developer license with Account Admin permissions, or
  * Owner with a Developer license, assigned Project Creator, Database Admin, or Admin permissions.
* Starter plan: Owner with a Developer license.
* Free trial: You are on a free trial of the Starter plan as an Owner, which means you have access to the dbt Semantic Layer.

##### 1. Select environment[​](#1-select-environment "Direct link to 1. Select environment")

Select the environment where you want to enable the Semantic Layer:

1. Navigate to **Account settings** in the navigation menu.
2. Under **Settings**, click **Projects** and select the specific project you want to enable the Semantic Layer for.
3. In the **Project details** page, navigate to the **Semantic Layer** section. Select **Configure Semantic Layer**.
[![Semantic Layer section in the 'Project details' page](/img/docs/dbt-cloud/semantic-layer/new-sl-configure.png?v=2 "Semantic Layer section in the 'Project details' page")](#)Semantic Layer section in the 'Project details' page

4. In the **Set Up Semantic Layer Configuration** page, select the deployment environment you want for the Semantic Layer and click **Save**. This gives administrators the flexibility to choose the environment where the Semantic Layer will be enabled.

[![Select the deployment environment to run your Semantic Layer against.](/img/docs/dbt-cloud/semantic-layer/sl-select-env.png?v=2 "Select the deployment environment to run your Semantic Layer against.")](#)Select the deployment environment to run your Semantic Layer against.

##### 2. Configure credentials and create tokens[​](#2-configure-credentials-and-create-tokens "Direct link to 2. Configure credentials and create tokens")

There are two options for setting up the Semantic Layer using API tokens:

* [Add a credential and create service tokens](#add-a-credential-and-create-service-tokens)
* [Configure development credentials and create personal tokens](#configure-development-credentials-and-create-a-personal-token)

###### Add a credential and create service tokens[​](#add-a-credential-and-create-service-tokens "Direct link to Add a credential and create service tokens")

The first option is to use [service tokens](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) for authentication, which are tied to an underlying data platform credential that you configure. The configured credential is used to execute the queries that the Semantic Layer issues against your data platform. This credential controls physical access to the underlying data accessed by the Semantic Layer, and all access policies set in the data platform for this credential will be respected.
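For example, a minimal read-only Snowflake setup for a dedicated Semantic Layer credential might look like the following sketch. The role, user, warehouse, database, and schema names here are illustrative placeholders, not values defined by this guide:

```sql
-- Hypothetical read-only role for the Semantic Layer credential.
-- Replace the role, warehouse, database, schema, and user names with your own.
create role if not exists sl_read_only;
grant usage on warehouse transforming to role sl_read_only;
grant usage on database analytics to role sl_read_only;
grant usage on schema analytics.marts to role sl_read_only;
grant select on all tables in schema analytics.marts to role sl_read_only;
grant select on all views in schema analytics.marts to role sl_read_only;
grant role sl_read_only to user sl_service_user;
```

Because the Semantic Layer only needs to read the schemas that back your semantic models, scoping the credential this narrowly means the access policies you set in the data platform are exactly the ones the Semantic Layer respects.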
| Feature | Starter plan | Enterprise+ and Enterprise plans |
| --- | --- | --- |
| Service tokens | Can create multiple service tokens linked to one credential. | Can use multiple credentials and link multiple service tokens to each credential. Note that you cannot link a single service token to more than one credential. |
| Credentials per project | One credential per project. | Can [add multiple](#4-add-more-credentials) credentials per project. |
| Link multiple service tokens to a single credential | ✅ | ✅ |

*If you're on a Starter plan and need to add more credentials, consider upgrading to our [Enterprise+ or Enterprise plan](https://www.getdbt.com/contact). All Enterprise users can refer to [Add more credentials](#4-add-more-credentials) for detailed steps on adding multiple credentials.*

###### 1. Select deployment environment[​](#1--select-deployment-environment "Direct link to 1. Select deployment environment")

* After selecting the deployment environment, you should see the **Credentials & service tokens** page.
* Click the **Add Semantic Layer credential** button.

###### 2. Configure credential[​](#2-configure-credential "Direct link to 2. Configure credential")

* In the **1. Add credentials** section, enter the credentials specific to your data platform that you want the Semantic Layer to use.
* Use credentials with minimal privileges.
The Semantic Layer requires read access to the schema(s) that contain the dbt models used in your semantic models for downstream applications.
* Use [Extended Attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) and [Environment Variables](https://docs.getdbt.com/docs/build/environment-variables.md) when connecting to the Semantic Layer. If you set a value directly in the Semantic Layer credentials, it takes priority over Extended Attributes. When using environment variables, the default value for the environment will be used. For example, set the warehouse by using `{{env_var('DBT_WAREHOUSE')}}` in your Semantic Layer credentials. Similarly, if you set the account value using `{{env_var('DBT_ACCOUNT')}}` in Extended Attributes, dbt will check both the Extended Attributes and the environment variable.

[![Add credentials and map them to a service token.](/img/docs/dbt-cloud/semantic-layer/sl-add-credential.png?v=2 "Add credentials and map them to a service token.")](#)Add credentials and map them to a service token.

###### 3. Create or link service tokens[​](#3-create-or-link-service-tokens "Direct link to 3. Create or link service tokens")

* If you have permission to create service tokens, you’ll see the [**Map new service token** option](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#map-service-tokens-to-credentials) after adding the credential. Name the token, set permissions to 'Semantic Layer Only' and 'Metadata Only', and click **Save**.
* Once the token is generated, you won't be able to view it again, so make sure to record it somewhere safe.
* If you don’t have access to create service tokens, you’ll see a message prompting you to contact your admin to create one for you. Admins can create and link tokens as needed.
[![If you don’t have access to create service tokens, you can create a credential and contact your admin to create one for you.](/img/docs/dbt-cloud/semantic-layer/sl-credential-no-service-token.png?v=2 "If you don’t have access to create service tokens, you can create a credential and contact your admin to create one for you.")](#)If you don’t have access to create service tokens, you can create a credential and contact your admin to create one for you.

info

* Starter plans can create multiple service tokens that link to a single underlying credential, but each project can only have one credential.
* All Enterprise plans can [add multiple credentials](#4-add-more-credentials) and map those to service tokens for tailored access. [Book a free live demo](https://www.getdbt.com/contact) to discover the full potential of dbt Enterprise and higher plans.

###### Configure development credentials and create a personal token[​](#configure-development-credentials-and-create-a-personal-token "Direct link to Configure development credentials and create a personal token")

Using [personal access tokens (PATs)](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) is also a supported authentication method for the dbt Semantic Layer. This enables user-level authentication, reducing the need to share tokens between users. When you authenticate using PATs, queries run using your personal development credentials.

To use PATs in the Semantic Layer:

1. Configure your development credentials:
   1. Click your account name in the bottom left-hand menu and go to **Account settings** > **Credentials**.
   2. Select your project.
   3. Click **Edit**.
   4. Go to **Development credentials** and enter your details.
   5. Click **Save**.
2. [Create a personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md). Make sure to copy the token.
You can use the generated PAT as the authentication method for the Semantic Layer [APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md) and [integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md).

##### 3. View connection detail[​](#3-view-connection-detail "Direct link to 3. View connection detail")

1. Go back to the **Project details** page for the connection details you need to connect to downstream tools.
2. Copy and share the Environment ID, service or personal token, host, and the token name with the relevant teams for BI connection setup. If your tool uses the GraphQL API, save the GraphQL API host information instead of the JDBC URL.

For info on how to connect to other integrations, refer to [Available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md).

[![After configuring, you'll be provided with the connection details to connect to your downstream tools.](/img/docs/dbt-cloud/semantic-layer/sl-configure-example.png?v=2 "After configuring, you'll be provided with the connection details to connect to your downstream tools.")](#)After configuring, you'll be provided with the connection details to connect to your downstream tools.

##### 4. Add more credentials [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#4-add-more-credentials- "Direct link to 4-add-more-credentials-")

All dbt Enterprise plans can optionally add multiple credentials and map them to service tokens, offering more granular control and tailored access for different teams; the tokens can then be shared with the relevant teams for BI connection setup. These credentials control physical access to the underlying data accessed by the Semantic Layer.

We recommend configuring credentials and service tokens to reflect your teams and their roles.
For example, create tokens or credentials that align with your team's needs, such as providing access to finance-related schemas to the Finance team.  Considerations for linking credentials * Admins can link multiple service tokens to a single credential within a project, but each service token can only be linked to one credential per project. * When you send a request through the APIs, the service token of the linked credential will follow access policies of the underlying view and tables used to build your semantic layer requests. * Use [Extended Attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) and [Environment Variables](https://docs.getdbt.com/docs/build/environment-variables.md) when connecting to the Semantic Layer. If you set a value directly in the Semantic Layer Credentials, it will have a higher priority than Extended Attributes. When using environment variables, the default value for the environment will be used. For example, set the warehouse by using `{{env_var('DBT_WAREHOUSE')}}` in your Semantic Layer credentials. Similarly, if you set the account value using `{{env_var('DBT_ACCOUNT')}}` in Extended Attributes, dbt will check both the Extended Attributes and the environment variable. ###### 1. Add more credentials[​](#1-add-more-credentials "Direct link to 1. Add more credentials") * After configuring your environment, on the **Credentials & service tokens** page, click the **Add Semantic Layer credential** button to create multiple credentials and map them to a service token.
* In the **1. Add credentials** section, fill in the data platform's credential fields. We recommend using “read-only” credentials. [![Add credentials and map them to a service token. ](/img/docs/dbt-cloud/semantic-layer/sl-add-credential.png?v=2 "Add credentials and map them to a service token. ")](#)Add credentials and map them to a service token. ###### 2. Map service tokens to credentials[​](#2-map-service-tokens-to-credentials "Direct link to 2. Map service tokens to credentials") * In the **2. Map new service token** section, [map a service token to the credential](https://docs.getdbt.com/docs/use-dbt-semantic-layer/setup-sl.md#map-service-tokens-to-credentials) you configured in the previous step. dbt automatically selects the service token permission set you need (Semantic Layer Only and Metadata Only). * To add another service token during configuration, click **Add Service Token**. * You can link more service tokens to the same credential later on in the **Semantic Layer Configuration Details** page. To add another service token to an existing Semantic Layer configuration, click **Add service token** under the **Linked service tokens** section. * Click **Save** to link the service token to the credential. Remember to copy and save the service token securely, as it won't be viewable again after generation. [![Use the configuration page to manage multiple credentials or link or unlink service tokens for more granular control.](/img/docs/dbt-cloud/semantic-layer/sl-credentials-service-token.png?v=2 "Use the configuration page to manage multiple credentials or link or unlink service tokens for more granular control.")](#)Use the configuration page to manage multiple credentials or link or unlink service tokens for more granular control. ###### 3. Delete credentials[​](#3-delete-credentials "Direct link to 3. Delete credentials") * To delete a credential, go back to the **Credentials & service tokens** page. 
* Under **Linked Service Tokens**, click **Edit** and select **Delete Credential** to remove a credential.

When you delete a credential, any service tokens mapped to that credential in the project will no longer work and will break for any end users.

##### Delete configuration[​](#delete-configuration "Direct link to Delete configuration")

You can delete the entire Semantic Layer configuration for a project. Note that deleting the Semantic Layer configuration will remove all credentials and unlink all service tokens from the project. It will also cause all queries to the Semantic Layer to fail.

Follow these steps to delete the Semantic Layer configuration for a project:

1. Navigate to the **Project details** page.
2. In the **Semantic Layer** section, select **Delete Semantic Layer**.
3. Confirm the deletion by clicking **Yes, delete semantic layer** in the confirmation pop-up.

To re-enable the dbt Semantic Layer setup in the future, you will need to recreate your setup configurations by following the [previous steps](#set-up-dbt-semantic-layer). If your semantic models and metrics are still in your project, no changes are needed. If you've removed them, you'll need to set up the YAML configs again.

[![Delete the Semantic Layer configuration for a project.](/img/docs/dbt-cloud/semantic-layer/sl-delete-config.png?v=2 "Delete the Semantic Layer configuration for a project.")](#)Delete the Semantic Layer configuration for a project.

#### Additional configuration[​](#additional-configuration "Direct link to Additional configuration")

The following are additional flexible configurations for Semantic Layer credentials.

##### Map service tokens to credentials[​](#map-service-tokens-to-credentials "Direct link to Map service tokens to credentials")

* After configuring your environment, you can map additional service tokens to the same credential if you have the required [permissions](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#permission-sets).
* Go to the **Credentials & service tokens** page and click the **+Add Service Token** button in the **Linked Service Tokens** section. * Type the service token name and select the permission set you need (Semantic Layer Only and Metadata Only). * Click **Save** to link the service token to the credential. * Remember to copy and save the service token securely, as it won't be viewable again after generation. [![Map additional service tokens to a credential.](/img/docs/dbt-cloud/semantic-layer/sl-add-service-token.gif?v=2 "Map additional service tokens to a credential.")](#)Map additional service tokens to a credential. ##### Unlink service tokens[​](#unlink-service-tokens "Direct link to Unlink service tokens") * Unlink a service token from the credential by clicking **Unlink** under the **Linked service tokens** section. If you try to query the Semantic Layer with an unlinked credential, you'll experience an error in your BI tool because no valid token is mapped. ##### Manage from service token page[​](#manage-from-service-token-page "Direct link to Manage from service token page") **View credential from service token** * View your Semantic Layer credential directly by navigating to the **API tokens** and then **Service tokens** page. * Select the service token to view the credential it's linked to. This is useful if you want to know which service tokens are mapped to credentials in your project. ###### Create a new service token[​](#create-a-new-service-token "Direct link to Create a new service token") * From the **Service tokens** page, create a new service token and map it to the credential(s) (assuming the semantic layer permission exists). This is useful if you want to create a new service token and directly map it to a credential in your project. * Make sure to select the correct permission set for the service token (Semantic Layer Only and Metadata Only). 
[![Create a new service token and map credentials directly on the separate 'Service tokens' page.](/img/docs/dbt-cloud/semantic-layer/sl-create-service-token-page.png?v=2 "Create a new service token and map credentials directly on the separate 'Service tokens' page.")](#)Create a new service token and map credentials directly on the separate 'Service tokens' page.

#### Query the Semantic Layer[​](#query-the-semantic-layer "Direct link to Query the Semantic Layer")

This page will guide you on how to connect and use the following integrations to query your metrics:

* [Connect and query with Google Sheets](#connect-and-query-with-google-sheets)
* [Connect and query with Hex](#connect-and-query-with-hex)
* [Connect and query with Sigma](#connect-and-query-with-sigma)

The Semantic Layer enables you to connect and query your metrics with various available tools like [PowerBI](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/power-bi.md), [Google Sheets](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md), [Hex](https://learn.hex.tech/docs/connect-to-data/data-connections/dbt-integration#dbt-semantic-layer-integration), [Microsoft Excel](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/excel.md), [Tableau](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md), and more.

You can also query metrics using other tools such as [first-class integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md), the [Semantic Layer APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-api-overview.md), and [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md), which expose tables of metrics and dimensions in your data platform and let you create custom integrations.

##### Connect and query with Google Sheets[​](#connect-and-query-with-google-sheets "Direct link to Connect and query with Google Sheets")

The Google Sheets integration allows you to query your metrics using Google Sheets.
This section will guide you on how to connect and use the Google Sheets integration. To query your metrics using Google Sheets: 1. Make sure you have a [Gmail](http://gmail.com/) account. 2. To set up Google Sheets and query your metrics, follow the detailed instructions on [Google Sheets integration](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/gsheets.md). 3. Start exploring and querying metrics! * Query a metric, like `order_total`, and filter it with a dimension, like `order_date`. * You can also use the `group_by` parameter to group your metrics by a specific dimension. [![Use the dbt Semantic Layer's Google Sheet integration to query metrics with a Query Builder menu.](/img/docs/dbt-cloud/semantic-layer/sl-gsheets.jpg?v=2 "Use the dbt Semantic Layer's Google Sheet integration to query metrics with a Query Builder menu.")](#)Use the dbt Semantic Layer's Google Sheet integration to query metrics with a Query Builder menu. ##### Connect and query with Hex[​](#connect-and-query-with-hex "Direct link to Connect and query with Hex") This section will guide you on how to use the Hex integration to query your metrics using Hex. Select the appropriate tab based on your connection method: * Query Semantic Layer with Hex * Getting started with the Semantic Layer workshop 1. Navigate to the [Hex login page](https://app.hex.tech/login). 2. Sign in or make an account (if you don’t already have one). * You can make Hex free trial accounts with your work email or a .edu email. 3. In the top left corner of your page, click on the **HEX** icon to go to the home page. 4. Then, click the **+ New project** button on the top right. [![Click the '+ New project' button on the top right](/img/docs/dbt-cloud/semantic-layer/hex_new.png?v=2 "Click the '+ New project' button on the top right")](#)Click the '+ New project' button on the top right 5. Go to the menu on the left side and select **Data browser**. Then select **Add a data connection**. 6. Click **Snowflake**. 
Give your data connection a name and description. You don't need to enter your data warehouse credentials to use the Semantic Layer.

[![Select 'Data browser' and then 'Add a data connection' to connect to Snowflake.](/img/docs/dbt-cloud/semantic-layer/hex_new_data_connection.png?v=2 "Select 'Data browser' and then 'Add a data connection' to connect to Snowflake.")](#)Select 'Data browser' and then 'Add a data connection' to connect to Snowflake.

7. Under **Integrations**, toggle the dbt switch to the right to enable the dbt integration.

[![Click on the dbt toggle to enable the integration.](/img/docs/dbt-cloud/semantic-layer/hex_dbt_toggle.png?v=2 "Click on the dbt toggle to enable the integration.")](#)Click on the dbt toggle to enable the integration.

8. Enter the following information:
   * Select your version of dbt as 1.6 or higher.
   * Enter your Environment ID.
   * Enter your service or personal token.
   * Make sure to click on the **Use Semantic Layer** toggle. This way, all queries are routed through dbt.
   * Click **Create connection** in the bottom right corner.
9. Hover over **More** on the menu shown in the following image and select **Semantic Layer**.

[![Hover over 'More' on the menu and select 'dbt Semantic Layer'.](/img/docs/dbt-cloud/semantic-layer/hex_make_sl_cell.png?v=2 "Hover over 'More' on the menu and select 'dbt Semantic Layer'.")](#)Hover over 'More' on the menu and select 'dbt Semantic Layer'.

10. Now you should be able to query metrics using Hex! Try it yourself:
    * Create a new cell and pick a metric.
    * Filter it by one or more dimensions.
    * Create a visualization.

1. Click on the link provided to you in the workshop’s chat.
   * Look at the **Pinned message** section of the chat if you don’t see it right away.
2. Enter your email address in the textbox provided. Then, select **SQL and Python** to be taken to Hex’s home screen.
[![The 'Welcome to Hex' homepage.](/img/docs/dbt-cloud/semantic-layer/welcome_to_hex.png?v=2 "The 'Welcome to Hex' homepage.")](#)The 'Welcome to Hex' homepage. 3. Then click the purple Hex button in the top left corner. 4. Click the **Collections** button on the menu on the left. 5. Select the **Semantic Layer Workshop** collection. 6. Click the **Getting started with the Semantic Layer** project collection. [![Click 'Collections' to select the 'Semantic Layer Workshop' collection.](/img/docs/dbt-cloud/semantic-layer/hex_collections.png?v=2 "Click 'Collections' to select the 'Semantic Layer Workshop' collection.")](#)Click 'Collections' to select the 'Semantic Layer Workshop' collection. 7. To edit this Hex notebook, click the **Duplicate** button from the project dropdown menu (as displayed in the following image). This creates a new copy of the Hex notebook that you own. [![Click the 'Duplicate' button from the project dropdown menu to create a Hex notebook copy.](/img/docs/dbt-cloud/semantic-layer/hex_duplicate.png?v=2 "Click the 'Duplicate' button from the project dropdown menu to create a Hex notebook copy.")](#)Click the 'Duplicate' button from the project dropdown menu to create a Hex notebook copy. 8. To make it easier to find, rename your copy of the Hex project to include your name. [![Rename your Hex project to include your name.](/img/docs/dbt-cloud/semantic-layer/hex_rename.png?v=2 "Rename your Hex project to include your name.")](#)Rename your Hex project to include your name. 9. Now, you should be able to query metrics using Hex! Try it yourself with the following example queries: * In the first cell, you can see a table of the `order_total` metric over time. Add the `order_count` metric to this table. * The second cell shows a line graph of the `order_total` metric over time. Play around with the graph! Try changing the time grain using the **Time unit** drop-down menu. 
* The next table in the notebook, labeled “Example\_query\_2”, shows the number of customers who have made their first order on a given day. Create a new chart cell. Make a line graph of `first_ordered_at` vs `customers` to see how the number of new customers each day changes over time. * Create a new semantic layer cell and pick one or more metrics. Filter your metric(s) by one or more dimensions. [![Query metrics using Hex ](/img/docs/dbt-cloud/semantic-layer/hex_make_sl_cell.png?v=2 "Query metrics using Hex ")](#)Query metrics using Hex ##### Connect and query with Sigma[​](#connect-and-query-with-sigma "Direct link to Connect and query with Sigma") This section guides you through using the Sigma integration to query your metrics. If you already have a Sigma account, simply log in and skip to step 6. Otherwise, you'll be using a Sigma account you'll create with Snowflake Partner Connect. 1. Go back to your Snowflake account. In the Snowflake UI, click on the home icon in the upper left corner. In the left sidebar, select **Data Products**. Then, select **Partner Connect**. Find the Sigma tile by scrolling or by searching for Sigma in the search bar. Click the tile to connect to Sigma. [![Find the Sigma tile in Snowflake Partner Connect.](/img/docs/dbt-cloud/semantic-layer/sl-sigma-partner-connect.png?v=2 "Find the Sigma tile in Snowflake Partner Connect.")](#)Find the Sigma tile in Snowflake Partner Connect. 2. Select the Sigma tile from the list. Click the **Optional Grant** dropdown menu. Write **RAW** and **ANALYTICS** in the text box and then click **Connect**. [![Enter RAW and ANALYTICS under 'Optional Grant', then click 'Connect'.](/img/docs/dbt-cloud/semantic-layer/sl-sigma-optional-grant.png?v=2 "Enter RAW and ANALYTICS under 'Optional Grant', then click 'Connect'.")](#)Enter RAW and ANALYTICS under 'Optional Grant', then click 'Connect'. 3. Make up a company name and URL to use. It doesn’t matter what URL you use, as long as it’s unique. 
[![Enter a unique company name and URL.](/img/docs/dbt-cloud/semantic-layer/sl-sigma-company-name.png?v=2 "Enter a unique company name and URL.")](#)Enter a unique company name and URL. 4. Enter your name and email address. Choose a password for your account. [![Enter your name, email address, and password to create your profile.](/img/docs/dbt-cloud/semantic-layer/sl-sigma-create-profile.png?v=2 "Enter your name, email address, and password to create your profile.")](#)Enter your name, email address, and password to create your profile. 5. Great! You now have a Sigma account. Before we get started, go back to Snowflake and open a blank worksheet. Run these lines: * `grant all privileges on all views in schema analytics.SCHEMA to role pc_sigma_role;` * `grant all privileges on all tables in schema analytics.SCHEMA to role pc_sigma_role;` 6. Click on your bubble in the top right corner. Click the **Administration** button from the dropdown menu. [![Click the 'Administration' button from the dropdown menu.](/img/docs/dbt-cloud/semantic-layer/sl-sigma-admin.png?v=2 "Click the 'Administration' button from the dropdown menu.")](#)Click the 'Administration' button from the dropdown menu. 7. Scroll down to the integrations section, then select **Add** next to the dbt integration. [![Select 'Add' next to the dbt integration.](/img/docs/dbt-cloud/semantic-layer/sl-sigma-add-integration.png?v=2 "Select 'Add' next to the dbt integration.")](#)Select 'Add' next to the dbt integration. 8. In the **dbt Integration** section, fill out the required fields, and then click **Save**: * Your dbt [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) or [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md). * Your dbt access URL. Use `cloud.getdbt.com` as your access URL. * Your dbt Environment ID. 
[![Fill out the required fields in the 'dbt Integration' section.](/img/docs/dbt-cloud/semantic-layer/sl-sigma-add-info.png?v=2 "Fill out the required fields in the 'dbt Integration' section.")](#)Fill out the required fields in the 'dbt Integration' section. 9. Return to the Sigma home page. Create a new workbook. [![Create a new workbook from the Sigma home page.](/img/docs/dbt-cloud/semantic-layer/sl-sigma-make-workbook.png?v=2 "Create a new workbook from the Sigma home page.")](#)Create a new workbook from the Sigma home page. 10. Click on **Table**, then click on **SQL**. Select Snowflake `PC_SIGMA_WH` as your data connection. [![Click 'Table', then 'SQL', and select your data connection.](/img/docs/dbt-cloud/semantic-layer/sl-sigma-make-table.png?v=2 "Click 'Table', then 'SQL', and select your data connection.")](#)Click 'Table', then 'SQL', and select your data connection. 11. Go ahead and query a working metric in your project! For example, let's say you have metrics that measure various order-related values. Here’s how you would query them:

```sql
select *
from {{ semantic_layer.query(
    metrics = ['order_total', 'order_count', 'large_orders', 'customers_with_orders', 'avg_order_value', 'pct_of_orders_that_are_large'],
    group_by = [Dimension('metric_time').grain('day')]
) }}
```

#### What's next[​](#whats-next "Direct link to What's next") Great job on completing the comprehensive Semantic Layer guide 🎉! You should now have a clear understanding of what the Semantic Layer is, its purpose, and when to use it in your projects. You've learned how to: * Set up your Snowflake environment and dbt, including creating worksheets and loading data. * Connect and configure dbt with Snowflake. * Build, test, and manage dbt projects, focusing on metrics and semantic layers. * Run production jobs and query metrics with our available integrations. 
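As you explore further, note that the `semantic_layer.query` interface also accepts optional `where`, `order_by`, and `limit` arguments in addition to `metrics` and `group_by`. The following is a sketch, not verbatim guide code: the `customer__customer_type` dimension is an assumption and may not exist in your project, so swap in a dimension from your own semantic models.

```sql
select *
from {{ semantic_layer.query(
    metrics = ['order_total', 'order_count'],
    group_by = [Dimension('metric_time').grain('month')],
    -- filter on a dimension value (the dimension name here is an assumption)
    where = ["{{ Dimension('customer__customer_type') }} = 'new'"],
    order_by = ['metric_time'],
    limit = 12
) }}
```

This returns at most 12 monthly rows for new customers only, ordered by month.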
For next steps, you can start defining your own metrics and learn additional configuration options such as [exports](https://docs.getdbt.com/docs/use-dbt-semantic-layer/exports.md), [fill null values](https://docs.getdbt.com/docs/build/advanced-topics.md), [implementing Mesh with the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md#how-can-i-implement-dbt-mesh-with-the-dbt-semantic-layer), and more. Here are some additional resources to help you continue your journey: * [Semantic Layer FAQs](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md) * [Available integrations](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations.md) * Demo on [how to define and query metrics with MetricFlow](https://www.loom.com/share/60a76f6034b0441788d73638808e92ac?sid=861a94ac-25eb-4fd8-a310-58e159950f5a) * [Join our live demos](https://www.getdbt.com/resources/webinars/dbt-cloud-demos-with-experts) --- ### Quickstart with dbt Mesh [Back to guides](https://docs.getdbt.com/guides.md) dbt platform Quickstart Intermediate #### Introduction[​](#introduction "Direct link to Introduction") Mesh is a framework that helps organizations scale their teams and data assets effectively. It promotes governance best practices and breaks large projects into manageable sections — for faster data development. Mesh is available for [dbt Enterprise](https://www.getdbt.com/) accounts. 
This guide will teach you how to set up a multi-project design using foundational concepts of [Mesh](https://www.getdbt.com/blog/what-is-data-mesh-the-definition-and-importance-of-data-mesh) and how to implement a data mesh in dbt: * Set up a foundational project called “Jaffle | Data Analytics” * Set up a downstream project called “Jaffle | Finance” * Add model access, versions, and contracts * Set up a dbt job that is triggered on completion of an upstream job For more information on why data mesh is important, read this post: [What is data mesh? The definition and importance of data mesh](https://www.getdbt.com/blog/what-is-data-mesh-the-definition-and-importance-of-data-mesh). Videos for you: You can check out [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) for free if you're interested in course learning with videos. You can also watch the [YouTube video on dbt and Snowflake](https://www.youtube.com/watch?v=kbCkwhySV_I\&list=PL0QYlrC86xQm7CoOH6RS7hcgLnd3OQioG). 
##### Related content:[​](#related-content "Direct link to Related content:") * [Data mesh concepts: What it is and how to get started](https://www.getdbt.com/blog/data-mesh-concepts-what-it-is-and-how-to-get-started) * [Deciding how to structure your Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-3-structures.md) * [Mesh best practices guide](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-4-implementation.md) * [Mesh FAQs](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-5-faqs.md) #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") To leverage Mesh, you need the following: * You must have a [dbt Enterprise-tier account](https://www.getdbt.com/get-started/enterprise-contact-pricing) * You have access to a cloud data platform, permissions to load the sample data tables, and dbt permissions to create new projects. * This guide uses the Jaffle Shop sample data, including `customers`, `orders`, and `payments` tables. Follow the provided instructions to load this data into your respective data platform: * [Snowflake](https://docs.getdbt.com/guides/snowflake.md?step=3) * [Databricks](https://docs.getdbt.com/guides/databricks.md?step=3) * [Redshift](https://docs.getdbt.com/guides/redshift.md?step=3) * [BigQuery](https://docs.getdbt.com/guides/bigquery.md?step=3) * [Fabric](https://docs.getdbt.com/guides/microsoft-fabric.md?step=2) * [Starburst Galaxy](https://docs.getdbt.com/guides/starburst-galaxy.md?step=2) This guide assumes you have experience with or fundamental knowledge of dbt. Take the [dbt Fundamentals](https://learn.getdbt.com/courses/dbt-fundamentals) course first if you are brand new to dbt. 
#### Create and configure two projects[​](#create-and-configure-two-projects "Direct link to Create and configure two projects") In this section, you'll create two new, empty projects in dbt to serve as your foundational and downstream projects: * **Foundational projects** (or upstream projects) typically contain core models and datasets that serve as the base for further analysis and reporting. * **Downstream projects** build on these foundations, often adding more specific transformations or business logic for dedicated teams or purposes. For example, the always-enterprising and fictional account "Jaffle Labs" will create two projects for their data analytics and finance teams: **Jaffle | Data Analytics** and **Jaffle | Finance**. To [create](https://docs.getdbt.com/docs/cloud/about-cloud-setup.md) a new project in dbt: 1. From **Account settings**, go to **Projects**. Click **New project**. 2. Enter a project name and click **Continue**. * Use "Jaffle | Data Analytics" for one project * Use "Jaffle | Finance" for the other project 3. Select your data platform, then click **Next** to set up your connection. 4. In the **Configure your environment** section, enter the **Settings** for your new project. 5. Click **Test Connection**. This verifies that dbt can access your data platform account. 6. Click **Next** if the test succeeded. If it fails, you might need to go back and double-check your settings. * For this guide, make sure you create a single [development](https://docs.getdbt.com/docs/dbt-cloud-environments.md#create-a-development-environment) environment and [deployment](https://docs.getdbt.com/docs/deploy/deploy-environments.md) environment per project. * For "Jaffle | Data Analytics", set the default database to `jaffle_da`. * For "Jaffle | Finance", set the default database to `jaffle_finance`. 7. Continue the prompts to complete the project setup. 
Once configured, each project should have: * A data platform connection * A new git repo * One or more [environments](https://docs.getdbt.com/docs/deploy/deploy-environments.md) (such as development, deployment) [![Navigate to Account settings.](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/1-settings-gear-icon.png?v=2 "Navigate to Account settings.")](#)Navigate to Account settings. [![Select projects from the menu.](/img/guides/dbt-mesh/select_projects.png?v=2 "Select projects from the menu.")](#)Select projects from the menu. [![Create a new project in the Studio IDE.](/img/guides/dbt-mesh/create_a_new_project.png?v=2 "Create a new project in the Studio IDE.")](#)Create a new project in the Studio IDE. [![Name your project.](/img/guides/dbt-mesh/enter_project_name.png?v=2 "Name your project.")](#)Name your project. [![Select the relevant connection for your projects.](/img/guides/dbt-mesh/select_a_connection.png?v=2 "Select the relevant connection for your projects.")](#)Select the relevant connection for your projects. ##### Create a production environment[​](#create-a-production-environment "Direct link to Create a production environment") In dbt, each project can have one deployment environment designated as "Production". You must set up a ["Production" or "Staging" deployment environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md) for each project you want to "mesh" together. This enables you to leverage Catalog in the [later steps](https://docs.getdbt.com/guides/mesh-qs.md?step=5#create-and-run-a-dbt-cloud-job) of this guide. To set up a production environment: 1. Navigate to **Deploy** -> **Environments**, then click **Create New Environment**. 2. Select **Deployment** as the environment type. 3. Under **Set deployment type**, select the **Production** button. 4. Select the dbt version. 5. Continue filling out the fields as necessary in the **Deployment connection** and **Deployment credentials** sections. 6. 
Click **Test Connection** to confirm the deployment connection. 7. Click **Save** to create a production environment. [![Set your production environment as the default environment in your Environment Settings](/img/docs/dbt-cloud/using-dbt-cloud/prod-settings-1.png?v=2 "Set your production environment as the default environment in your Environment Settings")](#)Set your production environment as the default environment in your Environment Settings #### Set up a foundational project[​](#set-up-a-foundational-project "Direct link to Set up a foundational project") This upstream project is where you build your core data assets. This project will contain the raw data sources, staging models, and core business logic. dbt enables data practitioners to develop in their tool of choice and comes equipped with a local [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) or in-browser [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md). In this section of the guide, you will set up the "Jaffle | Data Analytics" project as your foundational project using the Studio IDE. 1. First, navigate to the **Studio IDE** page to verify your setup. 2. Click **Initialize dbt project** if you’ve started with an empty repo. 3. Delete the `models/example` folder. 4. Navigate to the `dbt_project.yml` file and rename the project (line 5) from `my_new_project` to `analytics`. 5. In your `dbt_project.yml` file, remove lines 39-42 (the `my_new_project` model reference). 6. In the **File Catalog**, hover over the project directory and click the **...**, then select **Create file**. 7. Create two new folders: `models/staging` and `models/core`. ##### Staging layer[​](#staging-layer "Direct link to Staging layer") Now that you've set up the foundational project, let's start building the data assets. Set up the staging layer as follows: 1. Create a new properties YAML file `models/staging/sources.yml`. 2. 
Declare the sources by copying the following into the file and clicking **Save**. models/staging/sources.yml

```yaml
sources:
  - name: jaffle_shop
    description: This is a replica of the Postgres database used by our app
    database: raw
    schema: jaffle_shop
    tables:
      - name: customers
        description: One record per customer.
      - name: orders
        description: One record per order. Includes cancelled and deleted orders.
```

3. Create a `models/staging/stg_customers.sql` file to select from the `customers` table in the `jaffle_shop` source. models/staging/stg\_customers.sql

```sql
select
    id as customer_id,
    first_name,
    last_name
from {{ source('jaffle_shop', 'customers') }}
```

4. Create a `models/staging/stg_orders.sql` file to select from the `orders` table in the `jaffle_shop` source. models/staging/stg\_orders.sql

```sql
select
    id as order_id,
    user_id as customer_id,
    order_date,
    status
from {{ source('jaffle_shop', 'orders') }}
```

5. Create a `models/core/fct_orders.sql` file to build a fact table with customer and order details. models/core/fct\_orders.sql

```sql
with customers as (
    select * from {{ ref('stg_customers') }}
),

orders as (
    select * from {{ ref('stg_orders') }}
),

customer_orders as (
    select
        customer_id,
        min(order_date) as first_order_date
    from orders
    group by customer_id
),

final as (
    select
        o.order_id,
        o.order_date,
        o.status,
        c.customer_id,
        c.first_name,
        c.last_name,
        co.first_order_date,
        -- Note that we've used a macro for this so that the appropriate DATEDIFF syntax is used for each respective data platform
        {{ datediff('first_order_date', 'order_date', 'day') }} as days_as_customer_at_purchase
    from orders o
    left join customers c using (customer_id)
    left join customer_orders co using (customer_id)
)

select * from final
```

6. Navigate to the **Command bar** and execute `dbt build`. 
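The `datediff` call in `fct_orders` uses dbt's cross-database macro, so the SQL it compiles to depends on your adapter. On Snowflake, for example, the compiled expression should look roughly like the following (a sketch for illustration, not verbatim compiler output):

```sql
-- approximate Snowflake compilation of {{ datediff('first_order_date', 'order_date', 'day') }}
datediff(
    day,
    first_order_date,
    order_date
) as days_as_customer_at_purchase
```

You can confirm the exact output for your warehouse by compiling the model and inspecting the generated SQL.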
Before a downstream team can leverage assets from this foundational project, you need to first: * [Create and define](https://docs.getdbt.com/docs/mesh/govern/model-access.md) at least one model as “public” * Run a [deployment job](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) successfully * Note: Enable [**Generate docs on run**](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) for this job to update assets in Catalog. Once run, you can click **Catalog** from the main navigation and select your project to see its lineage, tests, and documentation coming through successfully. #### Define a public model and run first job[​](#define-a-public-model-and-run-first-job "Direct link to Define a public model and run first job") In the previous section, you arranged your basic building blocks; now let's integrate Mesh. Although the Finance team requires the `fct_orders` model for analyzing payment trends, other models, particularly those in the staging layer used for data cleansing and joining, are not needed by downstream teams. To make `fct_orders` publicly available: 1. 
Add an `access: public` config to the `models/core/core.yml` properties file by adding and saving the following: models/core/core.yml

```yaml
models:
  - name: fct_orders
    config:
      access: public # changed to config in v1.10
    description: "Customer and order details"
    columns:
      - name: order_id
        data_type: number
        description: ""
      - name: order_date
        data_type: date
        description: ""
      - name: status
        data_type: varchar
        description: "Indicates the status of the order"
      - name: customer_id
        data_type: number
        description: ""
      - name: first_name
        data_type: varchar
        description: ""
      - name: last_name
        data_type: varchar
        description: ""
      - name: first_order_date
        data_type: date
        description: ""
      - name: days_as_customer_at_purchase
        data_type: number
        description: "Days between this purchase and customer's first purchase"
```

Note: By default, model access is set to "protected", which means models can only be referenced within the same project. Learn more about access types and model groups [here](https://docs.getdbt.com/docs/mesh/govern/model-access.md#access-modifiers). 2. Navigate to the Studio IDE **Lineage** tab to see the model noted as **Public**, below the model name. [![Jaffle | Data Analytics lineage](/img/guides/dbt-mesh/da_lineage.png?v=2 "Jaffle | Data Analytics lineage")](#)Jaffle | Data Analytics lineage 3. Go to **Version control** and click the **Commit and Sync** button to commit your changes. 4. Merge your changes to the main or production branch. ##### Create and run a dbt job[​](#create-and-run-a-dbt-job "Direct link to Create and run a dbt job") Before a downstream team can leverage assets from this foundational project, you need to [create a production environment](https://docs.getdbt.com/guides/mesh-qs.md?step=3#create-a-production-environment) and run a [deployment job](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) successfully. To run your first deployment dbt job, create a new dbt job: 1. Go to **Orchestration** > **Jobs**. 
2. Click **Create job** and then **Deploy job**. 3. Select the **Generate docs on run** option. This will hydrate your metadata in Catalog. [![ Select the 'Generate docs on run' option when configuring your dbt job.](/img/guides/dbt-mesh/generate_docs_on_run.png?v=2 " Select the 'Generate docs on run' option when configuring your dbt job.")](#) Select the 'Generate docs on run' option when configuring your dbt job. 4. Click **Save**. 5. Click **Run now** to trigger the job. 6. After the run is complete, navigate to Catalog. You should now see your lineage, tests, and documentation coming through successfully. For details on how dbt uses metadata from the Staging environment to resolve references in downstream projects, check out the section on [Staging with downstream dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#staging-with-downstream-dependencies). #### Reference a public model in your downstream project[​](#reference-a-public-model-in-your-downstream-project "Direct link to Reference a public model in your downstream project") In this section, you will set up the downstream project, "Jaffle | Finance", and [cross-project reference](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md) the `fct_orders` model from the foundational project. Navigate to the **Develop** page to set up our project: 1. If you’ve also started with a new git repo, click **Initialize dbt project** under the **Version control** section. 2. Delete the `models/example` folder. 3. Navigate to the `dbt_project.yml` file and rename the project (line 5) from `my_new_project` to `finance`. 4. Navigate to the `dbt_project.yml` file and remove lines 39-42 (the `my_new_project` model reference). 5. In the **File Catalog**, hover over the project directory, click the **...** and select **Create file**. 6. Name the file `dependencies.yml`. 7. Add the upstream `analytics` project and the `dbt_utils` package. Click **Save**. 
dependencies.yml

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

projects:
  - name: analytics
```

##### Staging layer[​](#staging-layer-1 "Direct link to Staging layer") Now that you've set up the downstream project, let's start building the data assets. Set up the staging layer as follows: 1. Create a new property file `models/staging/sources.yml` and declare the sources by copying the following into the file and clicking **Save**. models/staging/sources.yml

```yml
sources:
  - name: stripe
    database: raw
    schema: stripe
    tables:
      - name: payment
```

2. Create `models/staging/stg_payments.sql` to select from the `payment` table in the `stripe` source. models/staging/stg\_payments.sql

```sql
with payments as (
    select * from {{ source('stripe', 'payment') }}
),

final as (
    select
        id as payment_id,
        orderID as order_id,
        paymentMethod as payment_method,
        amount,
        created as payment_date
    from payments
)

select * from final
```

##### Reference the public model[​](#reference-the-public-model "Direct link to Reference the public model") You're now set to add a model that explores how payment types vary throughout a customer's journey. This helps determine whether coupon gift cards decrease with repeat purchases, as our marketing team anticipates, or remain consistent. 1. 
To reference the public model, create a `models/core/agg_customer_payment_journey.sql` file with the following logic: models/core/agg\_customer\_payment\_journey.sql

```sql
with stg_payments as (
    select * from {{ ref('stg_payments') }}
),

fct_orders as (
    select * from {{ ref('analytics', 'fct_orders') }}
),

final as (
    select
        days_as_customer_at_purchase,
        -- we use the pivot macro in the dbt_utils package to create columns that total payments for each method
        {{ dbt_utils.pivot(
            'payment_method',
            dbt_utils.get_column_values(ref('stg_payments'), 'payment_method'),
            agg='sum',
            then_value='amount',
            prefix='total_',
            suffix='_amount'
        ) }},
        sum(amount) as total_amount
    from fct_orders
    left join stg_payments using (order_id)
    group by 1
)

select * from final
```

2. Notice the cross-project ref at work! When you add the `ref`, the Studio IDE's auto-complete feature recognizes the public model as available. [![Cross-project ref autocomplete in the Studio IDE](/img/guides/dbt-mesh/cross_proj_ref_autocomplete.png?v=2 "Cross-project ref autocomplete in the Studio IDE")](#)Cross-project ref autocomplete in the Studio IDE 3. This automatically resolves (or links) to the correct database, schema, and table/view set by the upstream project. [![Cross-project ref compile](/img/guides/dbt-mesh/cross_proj_ref_compile.png?v=2 "Cross-project ref compile")](#)Cross-project ref compile 4. You can also see this connection displayed in the live **Lineage** tab. [![Cross-project ref lineage](/img/guides/dbt-mesh/cross_proj_ref_lineage.png?v=2 "Cross-project ref lineage")](#)Cross-project ref lineage #### Add model versions and contracts[​](#add-model-versions-and-contracts "Direct link to Add model versions and contracts") How can you enhance resilience and add guardrails to this type of multi-project relationship? You can adopt best practices from software engineering by: 1. 
Defining model contracts — Set up [model contracts](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md) in dbt to define a set of upfront "guarantees" that define the shape of your model. While building your model, dbt will verify that the model's transformation produces a dataset matching its contract; if not, the build fails. 2. Defining model versions — Use [model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md) to manage updates and handle breaking changes systematically. ##### Set up model contracts[​](#set-up-model-contracts "Direct link to Set up model contracts") As part of the Data Analytics team, you may want to ensure the `fct_orders` model is reliable for downstream users, like the Finance team. 1. Navigate to `models/core/core.yml` and, under the `fct_orders` model before the `columns:` section, add a data contract to enforce reliability:

```yaml
models:
  - name: fct_orders
    description: "Customer and order details"
    config:
      access: public # changed to config in v1.10
      contract:
        enforced: true
    columns:
      - name: order_id
        .....
```

2. Test what would happen if this contract were violated. In `models/core/fct_orders.sql`, comment out the `orders.status` column and click **Build** to try building the model. * If the contract is breached, the build fails, as seen in the command bar history. [![The data contract was breached and the dbt build run failed.](/img/guides/dbt-mesh/break_contract.png?v=2 "The data contract was breached and the dbt build run failed.")](#)The data contract was breached and the dbt build run failed. ##### Set up model versions[​](#set-up-model-versions "Direct link to Set up model versions") In this section, you will set up model versions as the Data Analytics team upgrades the `fct_orders` model while offering backward compatibility and a migration notice to the downstream Finance team. 1. Rename the existing model file from `models/core/fct_orders.sql` to `models/core/fct_orders_v1.sql`. 
2. Create a new file `models/core/fct_orders_v2.sql` and adjust the schema: * Comment out `o.status` in the `final` CTE. * Add a new field, `case when o.status = 'returned' then true else false end as is_return` to indicate if an order was returned. 3. Then, add the following to your `models/core/core.yml` file: * The `is_return` column * The two model `versions` * A `latest_version` to indicate which model is the latest (and should be used by default, unless specified otherwise) * A `deprecation_date` on version 1 to indicate when the model will be deprecated. 4. It should now read as follows: models/core/core.yml

```yaml
models:
  - name: fct_orders
    description: "Customer and order details"
    latest_version: 2
    config:
      access: public # changed to config in v1.10
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: number
        description: ""
      - name: order_date
        data_type: date
        description: ""
      - name: status
        data_type: varchar
        description: "Indicates the status of the order"
      - name: is_return
        data_type: boolean
        description: "Indicates if an order was returned"
      - name: customer_id
        data_type: number
        description: ""
      - name: first_name
        data_type: varchar
        description: ""
      - name: last_name
        data_type: varchar
        description: ""
      - name: first_order_date
        data_type: date
        description: ""
      - name: days_as_customer_at_purchase
        data_type: number
        description: "Days between this purchase and customer's first purchase"

    # Declare the versions, and highlight the diffs
    versions:
      - v: 1
        deprecation_date: 2024-06-30 00:00:00.00+00:00
        columns:
          # This means: use the 'columns' list from above, but exclude is_return
          - include: all
            exclude: [is_return]

      - v: 2
        columns:
          # This means: use the 'columns' list from above, but exclude status
          - include: all
            exclude: [status]
```

5. Verify how dbt compiles the `ref` statement based on the updates. Open a new file, add the following select statements, and click **Compile**. 
Note how each ref is compiled to the specified version (or the latest version if not specified).

```sql
select * from {{ ref('fct_orders', v=1) }}

select * from {{ ref('fct_orders', v=2) }}

select * from {{ ref('fct_orders') }}
```

#### Add a dbt job in the downstream project[​](#add-a-dbt-job-in-the-downstream-project "Direct link to Add a dbt job in the downstream project") Before proceeding, make sure you commit and merge your changes in both the “Jaffle | Data Analytics” and “Jaffle | Finance” projects. A member of the Finance team would like to schedule a dbt job for their customer payment journey analysis immediately after the data analytics team refreshes their pipelines. 1. In the “Jaffle | Finance” project, go to the **Jobs** page by navigating to **Orchestration** > **Jobs**. 2. Click **Create job** and then **Deploy job**. 3. Add a name for the job, then scroll to the bottom of the **Job completion** section. 4. In the **Triggers** section, configure the job to **Run when another job finishes** and select the upstream job from the “Jaffle | Data Analytics” project. [![Trigger job on completion](/img/guides/dbt-mesh/trigger_on_completion.png?v=2 "Trigger job on completion")](#)Trigger job on completion 5. Click **Save** and verify the job is set up correctly. 6. Go to the “Jaffle | Data Analytics” jobs page. Select the **Daily job** and click **Run now**. 7. Once this job completes successfully, go back to the “Jaffle | Finance” jobs page. You should see that the Finance team’s job was triggered automatically. This simplifies the process of staying in sync with the upstream tables and removes the need for more sophisticated orchestration skills, such as coordinating jobs across projects via an external orchestrator. #### View deprecation warning[​](#view-deprecation-warning "Direct link to View deprecation warning") To find out how long the Finance team has to migrate from `fct_orders_v1` to `fct_orders_v2`, follow these steps: 1. 
In the “Jaffle | Finance” project, navigate to the **Develop** page. 2. Edit the cross-project ref to use v=1 in `models/core/agg_customer_payment_journey.sql`: models/core/agg\_customer\_payment\_journey.sql

```sql
with stg_payments as (
    select * from {{ ref('stg_payments') }}
),

fct_orders as (
    select * from {{ ref('analytics', 'fct_orders', v=1) }}
),

final as (
    select
        days_as_customer_at_purchase,
        -- we use the pivot macro in the dbt_utils package to create columns that total payments for each method
        {{ dbt_utils.pivot(
            'payment_method',
            dbt_utils.get_column_values(ref('stg_payments'), 'payment_method'),
            agg='sum',
            then_value='amount',
            prefix='total_',
            suffix='_amount'
        ) }},
        sum(amount) as total_amount
    from fct_orders
    left join stg_payments using (order_id)
    group by 1
)

select * from final
```

3. In the Studio IDE, go to **Version control** to commit and merge the changes. 4. Go to the **Deploy** and then **Jobs** page. 5. Click **Run now** to run the Finance job. The `agg_customer_payment_journey` model will build and display a deprecation date warning. [![The model will display a deprecation date warning.](/img/guides/dbt-mesh/deprecation_date_warning.png?v=2 "The model will display a deprecation date warning.")](#)The model will display a deprecation date warning. #### View lineage with dbt Catalog[​](#view-lineage-with-dbt-catalog "Direct link to View lineage with dbt Catalog") Use [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) to view the lineage across projects in dbt. Navigate to the **Explore** page for each of your projects — you should now see the [lineage seamlessly across projects](https://docs.getdbt.com/docs/explore/explore-multiple-projects.md). 
[![View 'Jaffle | Data Analytics' lineage with dbt Catalog ](/img/guides/dbt-mesh/jaffle_da_final_lineage.png?v=2 "View 'Jaffle | Data Analytics' lineage with dbt Catalog ")](#)View 'Jaffle | Data Analytics' lineage with dbt Catalog #### What's next[​](#whats-next "Direct link to What's next") Congratulations 🎉! You're ready to bring the benefits of Mesh to your organization. You've learned: * How to establish a foundational project, “Jaffle | Data Analytics.” * How to create a downstream project, “Jaffle | Finance.” * How to implement model access, versions, and contracts. * How to set up dbt jobs triggered by upstream job completions. Here are some additional resources to help you continue your journey: * [How we build our dbt mesh projects](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md) * [Mesh FAQs](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-5-faqs.md) * [Implement Mesh with the Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/sl-faqs.md#how-can-i-implement-dbt-mesh-with-the-dbt-semantic-layer) * [Cross-project references](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md#how-to-write-cross-project-ref) * [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) --- ### Quickstart with MetricFlow time spine [Back to guides](https://docs.getdbt.com/guides.md) Quickstart Semantic Layer Intermediate #### Introduction[​](#introduction "Direct link to Introduction") This guide explains how to configure a time spine using the [Semantic Layer Jaffle shop example project](https://github.com/dbt-labs/jaffle-sl-template) as a reference.
##### What is a time spine model?[​](#what-is-a-time-spine-model "Direct link to What is a time spine model?") A [time spine](https://docs.getdbt.com/docs/build/metricflow-time-spine.md) is essential for time-based joins and aggregations in MetricFlow, the engine that powers the Semantic Layer. To use MetricFlow with time-based metrics and dimensions, you must provide one. You can either: * Create a time spine SQL model from scratch, or * Use an existing model in your project, like `dim_date`. Once you have a time spine, you need to configure it in YAML to tell MetricFlow how to use it. This guide will show you how to do both! ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") Before you start, make sure you have: * A dbt project set up. If you don't have one, follow the [Semantic Layer quickstart guide](https://docs.getdbt.com/guides/sl-snowflake-qs.md?step=1) or the [dbt quickstart guides](https://docs.getdbt.com/guides.md?tags=Quickstart) to help you get started. #### Add a time spine SQL model[​](#add-a-time-spine-sql-model "Direct link to Add a time spine SQL model") Let's get started by assuming you're creating a time spine from scratch. If you already have a dbt project set up with your own time spine (like a `dim_date` type model), you can skip this step and go to [Use an existing dim\_date model](https://docs.getdbt.com/guides/mf-time-spine.md#using-an-existing-dim-date-model). The time spine is a dbt model that generates a series of dates (or timestamps) at a specific granularity. In this example, let's create a daily time spine — `time_spine_daily.sql`. 1. Navigate to the `models/marts` directory in your dbt project. 2.
Add a new SQL file named `time_spine_daily.sql` with the following content: models/marts/time\_spine\_daily.sql ```sql {{ config( materialized = 'table', ) }} with base_dates as ( {{ dbt.date_spine( 'day', "DATE('2000-01-01')", "DATE('2030-01-01')" ) }} ), final as ( select cast(date_day as date) as date_day from base_dates ) select * from final where date_day > dateadd(year, -5, current_date()) -- Keep recent dates only and date_day < dateadd(day, 30, current_date()) ``` This generates a model of daily dates ranging from 5 years in the past to 30 days into the future. 3. Run the model to create it, then preview the output: ```bash dbt run --select time_spine_daily dbt show --select time_spine_daily # Use this command to preview the model if developing locally ``` 4. If developing in the Studio IDE, you can preview the model by clicking the **Preview** button: [![Preview the time spine model in the Studio IDE](/img/mf-guide-preview-time-spine-table.png?v=2 "Preview the time spine model in the Studio IDE")](#)Preview the time spine model in the Studio IDE #### Add YAML configuration for the time spine[​](#add-yaml-configuration-for-the-time-spine "Direct link to Add YAML configuration for the time spine") Now that you've created the SQL file, configure it in YAML so MetricFlow can recognize and use it. 1. Navigate to the `models/marts` directory. 2. Add a new YAML file named `_models.yml` with the following content: models/marts/\_models.yml ```yaml models: - name: time_spine_daily description: A time spine with one row per day, ranging from 5 years in the past to 30 days into the future. time_spine: standard_granularity_column: date_day # The base column used for time joins columns: - name: date_day description: The base date column for daily granularity granularity: day ``` This time spine YAML file: * Defines `date_day` as the base column for daily granularity. * Configures `time_spine` properties so MetricFlow can use the model.
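If it helps to reason about what the model should produce, here is the same logic sketched in plain Python. The date range and trimming rules mirror the SQL above, but this is purely illustrative — it is not what dbt generates:

```python
from datetime import date, timedelta

def date_spine(start: date, end: date) -> list[date]:
    # One entry per day from start (inclusive) to end (exclusive),
    # mirroring dbt.date_spine('day', start, end).
    return [start + timedelta(days=i) for i in range((end - start).days)]

def trim(days: list[date], today: date) -> list[date]:
    # Keep roughly 5 years back through 30 days forward,
    # like the model's where clause.
    five_years_ago = today.replace(year=today.year - 5)
    thirty_days_out = today + timedelta(days=30)
    return [d for d in days if five_years_ago < d < thirty_days_out]

spine = date_spine(date(2000, 1, 1), date(2030, 1, 1))
recent = trim(spine, date(2025, 6, 15))
print(recent[0], recent[-1])  # first and last dates in the trimmed spine
```

Checking the first and last values of the trimmed range against your warehouse output is a quick way to confirm the model's date boundaries behave as expected.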
##### Using an existing dim\_date model[​](#using-an-existing-dim_date-model "Direct link to Using an existing dim\_date model") This optional approach reuses an existing model, saving you the effort of creating a new one. However, if you created a time spine from scratch, you can skip this section. If your project already includes a `dim_date` or similar model, you can configure it as a time spine: 1. Locate the existing model (`dim_date`). 2. Update the `_models.yml` file to configure it as a time spine: \_models.yml ```yaml models: - name: dim_date description: An existing date dimension model used as a time spine. time_spine: standard_granularity_column: date_day columns: - name: date_day granularity: day - name: day_of_week granularity: day - name: full_date granularity: day ``` This time spine YAML file configures the `time_spine` property so MetricFlow can use the model. #### Run and preview the time spine[​](#run-and-preview-the-time-spine "Direct link to Run and preview the time spine") If you haven't already, run the time spine you created and preview the output. If you've already run the model, skip this step. 1. Run the following commands: ```bash dbt run --select time_spine_daily dbt show --select time_spine_daily # Use this command to preview the model if developing locally ``` 2. If developing in the Studio IDE, you can preview the model by clicking the **Preview** button: [![Preview the time spine model in the Studio IDE](/img/mf-guide-preview-time-spine-table.png?v=2 "Preview the time spine model in the Studio IDE")](#)Preview the time spine model in the Studio IDE 3. Check that the model: * Contains one row per day. * Covers the date range you want (5 years back to 30 days forward). 4.
(Optional) If you have [metrics](https://docs.getdbt.com/docs/build/metrics-overview.md) already defined in your project, you can query them using [Semantic Layer commands](https://docs.getdbt.com/docs/build/metricflow-commands.md) to validate the time spine. Let's say you have a `revenue` metric defined. You can query it using the following command: ```bash dbt sl query --metrics revenue --group-by metric_time ``` This will output results similar to the following in the Studio IDE: [![Validate the metrics and time spine output in the Studio IDE](/img/quickstarts/dbt-cloud/validate-mf-timespine-output.png?v=2 "Validate the metrics and time spine output in the Studio IDE")](#)Validate the metrics and time spine output in the Studio IDE 5. Double-check that the results are correct and return the expected data. #### Add additional granularities[​](#add-additional-granularities "Direct link to Add additional granularities") This section is optional and will show you how to add additional granularities to your time spine: * [Yearly](#yearly-time-spine) * [Custom calendars](#custom-calendars) ##### Yearly time spine[​](#yearly-time-spine "Direct link to Yearly time spine") To support multiple granularities (like hourly, yearly, monthly), create additional time spine models and configure them in YAML. 1. Add a new SQL file named `time_spine_yearly.sql` with the following content: models/marts/time\_spine\_yearly.sql ```sql {{ config( materialized = 'table', ) }} with years as ( {{ dbt.date_spine( 'year', "to_date('01/01/2000','mm/dd/yyyy')", "to_date('01/01/2025','mm/dd/yyyy')" ) }} ), final as ( select cast(date_year as date) as date_year from years ) select * from final -- filter the time spine to a specific range where date_year >= date_trunc('year', dateadd(year, -4, current_timestamp())) and date_year < date_trunc('year', dateadd(year, 1, current_timestamp())) ``` 2.
Then update the `_models.yml` file and add the yearly time spine (below the daily time spine config): \_models.yml ```yaml models: - name: time_spine_daily ... rest of the daily time spine config ... - name: time_spine_yearly description: A time spine with one row per year. time_spine: standard_granularity_column: date_year columns: - name: date_year granularity: year ``` 3. Run the model to create it, then preview the output: ```bash dbt run --select time_spine_yearly dbt show --select time_spine_yearly # Use this command to preview the model if developing locally ``` 4. Validate the output by querying the generated model: ```bash dbt sl query --metrics orders --group-by metric_time__year ``` If you're developing in the Studio IDE, you can preview the model by clicking the **Preview** button. [![Validate the metrics and time spine output in the Studio IDE](/img/mf-guide-query.png?v=2 "Validate the metrics and time spine output in the Studio IDE")](#)Validate the metrics and time spine output in the Studio IDE Extra credit! For some extra practice, try one of the following exercises: * Order the `dbt sl query --metrics orders --group-by metric_time__year` command output in ascending order of `metric_time__year`. Check out the [dbt Semantic Layer commands](https://docs.getdbt.com/docs/build/metricflow-commands.md#query) docs for more information on how to do this. * Filter to this year and last year only to limit data returned. * Try creating a monthly time spine — duplicate your daily time spine model, adjust it to generate one row per month, and update the YAML file to include `granularity: month`. Give it a try! ##### Custom calendars[​](#custom-calendars "Direct link to Custom calendars") To support custom calendars (like fiscal years, fiscal quarters, and so on), create an additional time spine and configure it in YAML.
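The example in this section uses a fiscal year that starts in October, so any date from October onward belongs to the next fiscal year. That mapping is easy to sanity-check outside the warehouse — here's a minimal Python sketch of the same rule (illustrative only; the `start_month` parameter is an assumption for generality):

```python
from datetime import date

def fiscal_year(d: date, start_month: int = 10) -> int:
    # Dates on or after the fiscal start month roll into the next
    # fiscal year; earlier dates keep the calendar year.
    return d.year + 1 if d.month >= start_month else d.year

print(fiscal_year(date(2024, 9, 30)), fiscal_year(date(2024, 10, 1)))  # 2024 2025
```

A handful of boundary dates like these (the last day of September and the first day of October) make good spot checks against the `fiscal_year` column your SQL model produces.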
This feature is available in dbt's [Latest release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) or [dbt Core 1.9 and later](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.9.md). 1. Add a new SQL file named `fiscal_calendar.sql` with the following content (or use your own custom calendar and configure it in YAML): models/marts/fiscal\_calendar.sql ```sql with date_spine as ( select date_day, extract(year from date_day) as calendar_year, extract(week from date_day) as calendar_week from {{ ref('time_spine_daily') }} ), fiscal_calendar as ( select date_day, -- Define custom fiscal year starting in October case when extract(month from date_day) >= 10 then extract(year from date_day) + 1 else extract(year from date_day) end as fiscal_year, -- Define fiscal weeks (e.g., shift by 1 week) extract(week from date_day) + 1 as fiscal_week from date_spine ) select * from fiscal_calendar ``` 2. Then update the `_models.yml` file and add the fiscal calendar time spine (below the yearly time spine config): \_models.yml ```yaml models: - name: time_spine_yearly ... rest of the yearly time spine config ... - name: fiscal_calendar description: A custom fiscal calendar with fiscal year and fiscal week granularities. time_spine: standard_granularity_column: date_day custom_granularities: - name: fiscal_year column_name: fiscal_year - name: fiscal_week column_name: fiscal_week columns: - name: date_day granularity: day - name: fiscal_year description: "Custom fiscal year starting in October" - name: fiscal_week description: "Fiscal week, shifted by 1 week from standard calendar" ``` 3. Run the model to create it, then preview the output: ```bash dbt run --select fiscal_calendar dbt show --select fiscal_calendar # Use this command to preview the model if developing locally ``` If you're developing in the Studio IDE, you can preview the model by clicking the **Preview** button. 4.
Validate the output by querying the generated model along with your metrics: ```bash dbt sl query --metrics orders --group-by metric_time__fiscal_year ``` [![Validate the custom calendar metrics and time spine output in the Studio IDE](/img/mf-guide-fiscal-preview.png?v=2 "Validate the custom calendar metrics and time spine output in the Studio IDE")](#)Validate the custom calendar metrics and time spine output in the Studio IDE #### What's next[​](#whats-next "Direct link to What's next") Congratulations 🎉! You've set up a time spine and are ready to bring the benefits of MetricFlow and the Semantic Layer to your organization. You've learned: * How to create a time spine or use an existing model. * How to configure a time spine in YAML. * How to add additional granularities to your time spine. Here are some additional resources to help you continue your journey: * [MetricFlow time spine](https://docs.getdbt.com/docs/build/metricflow-time-spine.md) * [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) * [Build metrics](https://docs.getdbt.com/docs/build/metrics-overview.md) * [Quickstart with Semantic Layer](https://docs.getdbt.com/guides/sl-snowflake-qs.md?step=1) --- ### Refactoring legacy SQL to dbt [Back to guides](https://docs.getdbt.com/guides.md) SQL Advanced #### Introduction[​](#introduction "Direct link to Introduction") You may have already learned how to build dbt models from scratch. But in reality, you probably already have some queries or stored procedures that power analyses and dashboards, and now you’re wondering how to port those into dbt.
There are two parts to accomplish this: migration and refactoring. In this guide we’re going to learn a process to help us turn legacy SQL code into modular dbt models. When migrating and refactoring code, it’s of course important to stay organized. We'll do this by following several steps: 1. Migrate your code 1:1 into dbt 2. Implement dbt sources rather than referencing raw database tables 3. Choose a refactoring strategy 4. Implement CTE groupings and cosmetic cleanup 5. Separate [data transformations](https://www.getdbt.com/analytics-engineering/transformation/) into standardized layers 6. Audit the output of dbt models vs legacy SQL Let's get into it! More resources This guide is excerpted from the new dbt Learn On-demand Course, "Refactoring SQL for Modularity" - if you're curious, pick up the [free refactoring course here](https://learn.getdbt.com/courses/refactoring-sql-for-modularity), which includes example and practice refactoring projects. Or for a more in-depth look at migrating DDL and DML from stored procedures, refer to the [Migrate from stored procedures](https://docs.getdbt.com/guides/migrate-from-stored-procedures.md) guide. #### Migrate your existing SQL code[​](#migrate-your-existing-sql-code "Direct link to Migrate your existing SQL code") Your goal in this initial step is simply to use dbt to run your existing SQL transformation, with as few modifications as possible. This will give you a solid base to work from. While refactoring you'll be **moving around** a lot of logic, but ideally you won't be **changing** the logic. More changes = more auditing work, so if you come across anything you'd like to fix, note it as a separate task to tackle after refactoring! We'll save the bulk of our auditing for the end when we've finalized our legacy-to-dbt model restructuring. To get going, you'll copy your legacy SQL query into your dbt project, saving it in a `.sql` file under the `/models` directory of your project.
[![Your dbt project's folder structure](/img/tutorial/refactoring/legacy-query-model.png?v=2 "Your dbt project's folder structure")](#)Your dbt project's folder structure Once you've copied it over, you'll want to `dbt run` to execute the query and populate the table in your warehouse. If this is your first time running dbt, you may want to start with the [Introduction to dbt](https://docs.getdbt.com/docs/introduction.md) and the earlier sections of the [quickstart guide](https://docs.getdbt.com/guides.md) before diving into refactoring. This step may sound simple, but if you're porting over an existing set of SQL transformations to a new SQL dialect, you will need to consider how your legacy SQL dialect differs from your new SQL flavor, and you may need to modify your legacy code to get it to run at all. This will commonly happen if you're migrating from a [stored procedure workflow on a legacy database](https://getdbt.com/analytics-engineering/case-for-elt-workflow/) into dbt + a cloud data warehouse. Functions that you were using previously may not exist, or their syntax may shift slightly between SQL dialects. If you're not migrating data warehouses at the moment, then you can keep your SQL syntax the same. You have access to the exact same SQL dialect inside of dbt that you have querying directly from your warehouse. #### Create sources from table references[​](#create-sources-from-table-references "Direct link to Create sources from table references") To query from your data warehouse, we recommend creating [sources in dbt](https://docs.getdbt.com/docs/build/sources.md) rather than querying the database table directly. This allows you to call the same table in multiple places with `{{ source('my_source', 'my_table') }}` rather than `my_database.my_schema.my_table`.
We start here for several reasons: ###### Source freshness reporting[​](#source-freshness-reporting "Direct link to Source freshness reporting") Using sources unlocks the ability to run [source freshness reporting](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness) to make sure your raw data isn't stale. ###### Easy dependency tracing[​](#easy-dependency-tracing "Direct link to Easy dependency tracing") If you're migrating multiple stored procedures into dbt, with sources you can see which queries depend on the same raw tables. This allows you to consolidate modeling work on those base tables, rather than calling them separately in multiple places. [![Sources appear in green in your DAG in dbt docs](/img/docs/building-a-dbt-project/sources-dag.png?v=2 "Sources appear in green in your DAG in dbt docs")](#)Sources appear in green in your DAG in dbt docs ###### Build the habit of analytics-as-code[​](#build-the-habit-of-analytics-as-code "Direct link to Build the habit of analytics-as-code") Sources are an easy way to get your feet wet using config files to define aspects of your transformation pipeline. ```yml sources: - name: jaffle_shop tables: - name: orders - name: customers ``` With a few lines of code in a `.yml` file in your dbt project's `/models` subfolder, you can now version control how your data sources (Snowplow, Shopify, etc) map to actual database tables. For example, let's say you migrate from one ETL tool to another, and the new tool writes to a new schema in your warehouse. dbt sources allow you to make that update in a single config file, and flip on the change with one pull request to your dbt project. #### Choose a refactoring strategy[​](#choose-a-refactoring-strategy "Direct link to Choose a refactoring strategy") There are two ways you can choose to refactor: in-place or alongside. 
###### In-place refactoring[​](#in-place-refactoring "Direct link to In-place refactoring") In-place refactoring means you work directly on the SQL script that you ported over in the first step. You'll move it into a `/marts` subfolder within your project's `/models` folder and go to town. **Pros**: * You won't have any old models to delete once refactoring is done. **Cons**: * More pressure to get it right the first time, especially if you've referenced this model from any BI tool or downstream process. * Harder to audit, since you've overwritten your audit comparison model. * Requires navigating through Git commits to see what code you've changed throughout. ###### Alongside refactoring[​](#alongside-refactoring "Direct link to Alongside refactoring") Alongside refactoring means you copy your model to a `/marts` folder and work on changes on that copy. **Pros**: * Less impact on end users - anything that is referencing the model you're refactoring can keep that reference until you can safely deprecate that model. * Less pressure to get it right the first time, meaning you can push/merge smaller PRs. This is better for you and your reviewers. * Easier auditing: you can run the old and new models in your dev branch and compare the results, ensuring the datasets you're comparing have the same or very close to the same records. * You can look at old code more easily, as it has not been changed. * You can decide when the old model is ready to be deprecated. **Cons**: * You'll have the old file(s) in your project until you can deprecate them - running side-by-side like this can feel duplicative, and may be a headache to manage if you're migrating a number of queries in bulk. We generally recommend the **alongside** approach, which we'll follow in this tutorial.
#### Implement CTE groupings[​](#implement-cte-groupings "Direct link to Implement CTE groupings") Once you choose your refactoring strategy, you'll want to do some cosmetic cleanups according to your data modeling best practices and start moving code into CTE groupings. This will give you a head start on porting SQL snippets from CTEs into modular [dbt data models](https://docs.getdbt.com/docs/build/models.md). ##### What's a CTE?[​](#whats-a-cte "Direct link to What's a CTE?") CTE stands for “Common Table Expression”, which is a temporary result set available for use until the end of SQL script execution. Using the `with` keyword at the top of a query allows us to use CTEs in our code. Inside of the model we're refactoring, we’re going to use a 4-part layout: 1. 'Import' CTEs 2. 'Logical' CTEs 3. A 'Final' CTE 4. A simple SELECT statement In practice this looks like: ```sql with import_orders as ( -- query only non-test orders select * from {{ source('jaffle_shop', 'orders') }} where amount > 0 ), import_customers as ( select * from {{ source('jaffle_shop', 'customers') }} ), logical_cte_1 as ( -- perform some math on import_orders ), logical_cte_2 as ( -- perform some math on import_customers ), final_cte as ( -- join together logical_cte_1 and logical_cte_2 ) select * from final_cte ``` Notice there are no nested queries here, which makes reading our logic much more straightforward. If a query needs to be nested, it's just a new CTE that references the previous CTE. ###### 1. Import CTEs[​](#1-import-ctes "Direct link to 1. Import CTEs") Let's start with our components, and identify raw data that is being used in our analysis. For this exercise, the components are three sources: * jaffle\_shop.customers * jaffle\_shop.orders * stripe.payment Let's make a CTE for each of these under the `Import CTEs` comment. These import CTEs should be only simple `select *` statements, but can have filters if necessary. 
We'll cover that later - for now, just use `select * from {{ source('schema', 'table') }}` for each, with the appropriate reference. Then, we will switch out all hard-coded references with our import CTE names. ###### 2. Logical CTEs[​](#2-logical-ctes "Direct link to 2. Logical CTEs") Logical CTEs contain unique transformations used to generate the final product, and we want to separate these into logical blocks. To identify our logical CTEs, we will follow subqueries in order. If a subquery has nested subqueries, we will want to continue moving down until we get to the first layer, then pull out the subqueries in order as CTEs, making our way back to the final select statement. Name these CTEs as the alias that the subquery was given - you can rename it later, but for now it is best to make as few changes as possible. If the script is particularly complicated, once you've finished pulling out subqueries it's worth following the CTEs from top to bottom to make sure they happen in an order that makes sense for the end result. ###### 3. Final CTE[​](#3-final-cte "Direct link to 3. Final CTE") The previous process usually results in a select statement that is left over at the end - this select statement can be moved into its own CTE called the final CTE, or can be named something that is intuitive for others to understand. This CTE determines the final product of the model. ###### 4. Simple SELECT statement[​](#4-simple-select-statement "Direct link to 4. Simple SELECT statement") After you have moved everything into CTEs, you'll want to write a `select * from final` (or something similar, depending on your final CTE name) at the end of the model. This allows anyone after us to easily step through the CTEs when troubleshooting, rather than having to untangle nested queries. > For more background on CTEs, check out the [dbt Labs style guide](https://docs.getdbt.com/best-practices/how-we-style/0-how-we-style-our-dbt-projects.md).
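The readability payoff of this layout is the same one you get in any language when you name intermediate results instead of nesting expressions. A loose Python analogy (the numbers and thresholds are hypothetical, not part of the guide's project):

```python
orders = [120, 80, 45]  # hypothetical order amounts

# Nested, subquery-style: correct, but hard to step through when debugging
nested = round(sum(o for o in orders if o > 50) / 100, 2)

# CTE-style: each step named, ending with a simple final expression
filtered = [o for o in orders if o > 50]  # like an import CTE with a filter
total = sum(filtered)                     # like a logical CTE
final = round(total / 100, 2)             # like the final CTE

assert nested == final  # same result, but each step is inspectable
print(final)
```

Either version returns the same answer; the second lets you inspect `filtered` and `total` in isolation, exactly as a chain of CTEs lets you `select` from any intermediate step.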
#### Port CTEs to individual data models[​](#port-ctes-to-individual-data-models "Direct link to Port CTEs to individual data models") Rather than keep our SQL code confined to one long SQL file, we'll now start splitting it into modular + reusable [dbt data models](https://docs.getdbt.com/docs/build/models.md). Internally at dbt Labs, we follow roughly this [data modeling technique](https://www.getdbt.com/analytics-engineering/modular-data-modeling-technique/) and we [structure our dbt projects](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md) accordingly. We'll follow those structures in this walkthrough, but your team's conventions may of course differ from ours. ##### Identifying staging models[​](#identifying-staging-models "Direct link to Identifying staging models") To identify our [staging models](https://www.getdbt.com/analytics-engineering/modular-data-modeling-technique/#staging-models), we want to look at the things we've imported in our import CTEs. For us, that's customers, orders, and payments. We want to look at the transformations that can occur within each of these sources without needing to be joined to each other, and then we want to make components out of those so they can be our building blocks for further development. ##### CTEs or intermediate models[​](#ctes-or-intermediate-models "Direct link to CTEs or intermediate models") Our left-over logic can then be split into steps that are more easily understandable. We'll start by using CTEs, but when a model becomes complex or can be divided out into reusable components you may consider an intermediate model. Intermediate models are optional and are not always needed, but do help when you have large data flows coming together. ##### Final model[​](#final-model "Direct link to Final model") Our final model accomplishes the result set we want, and it uses the components we've built. By this point we've identified what we think should stay in our final model. 
#### Data model auditing[​](#data-model-auditing "Direct link to Data model auditing") We'll want to audit our results using the dbt [audit\_helper package](https://hub.getdbt.com/dbt-labs/audit_helper/latest/). Under the hood, it generates comparison queries between our before and after states, so that we can compare our original query results to our refactored results to identify differences. Sure, we could write our own query manually to audit these models, but using the dbt `audit_helper` package gives us a head start and allows us to identify variances more quickly. ##### Ready for refactoring practice?[​](#ready-for-refactoring-practice "Direct link to Ready for refactoring practice?") Head to the free on-demand course, [Refactoring from Procedural SQL to dbt](https://learn.getdbt.com/courses/refactoring-sql-for-modularity) for a more in-depth refactoring example + a practice refactoring problem to test your skills. Questions on this guide or the course? Drop a note in #learn-on-demand in [dbt Community Slack](https://getdbt.com/community). --- ### Refresh a Mode dashboard when a job completes [Back to guides](https://docs.getdbt.com/guides.md) Webhooks Advanced #### Introduction[​](#introduction "Direct link to Introduction") This guide will teach you how to refresh a Mode dashboard when a dbt job has completed successfully and there is fresh data available.
The integration will: * Receive a webhook notification in Zapier * Trigger a refresh of a Mode report Although we are using the Mode API for a concrete example, the principles are readily transferable to your [tool](https://learn.hex.tech/docs/develop-logic/hex-api/api-reference#operation/RunProject) [of](https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/refresh-dataset) [choice](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_ref.htm#update_workbook_now). ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") In order to set up the integration, you should have familiarity with: * [dbt Webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md) * Zapier * The [Mode API](https://mode.com/developer/api-reference/introduction/) #### Create a new Zap in Zapier[​](#create-a-new-zap-in-zapier "Direct link to Create a new Zap in Zapier") Use **Webhooks by Zapier** as the Trigger, and **Catch Raw Hook** as the Event. If you don't intend to [validate the authenticity of your webhook](https://docs.getdbt.com/docs/deploy/webhooks.md#validate-a-webhook) (not recommended!) then you can choose **Catch Hook** instead. Press **Continue**, then copy the webhook URL. ![Screenshot of the Zapier UI, showing the webhook URL ready to be copied](/assets/images/catch-raw-hook-16dd72d8a6bc26284c5fad897f3da646.png) #### Configure a new webhook in dbt[​](#configure-a-new-webhook-in-dbt "Direct link to Configure a new webhook in dbt") See [Create a webhook subscription](https://docs.getdbt.com/docs/deploy/webhooks.md#create-a-webhook-subscription) for full instructions. Your event should be **Run completed**, and you need to change the **Jobs** list to contain only the jobs whose completion should trigger a report refresh. Make note of the Webhook Secret Key for later. Once you've tested the endpoint in dbt, go back to Zapier and click **Test Trigger**, which will create a sample webhook body based on the test event dbt sent.
The sample body's values are hard-coded and not reflective of your project, but they give Zapier a correctly-shaped object during development.

#### Store secrets[​](#store-secrets "Direct link to Store secrets")

In the next step, you will need the Webhook Secret Key from the prior step, and a dbt [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) or [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md), as well as a [Mode API token and secret](https://mode.com/developer/api-reference/authentication/).

Zapier allows you to [store secrets](https://help.zapier.com/hc/en-us/articles/8496293271053-Save-and-retrieve-data-from-Zaps), which prevents your keys from being displayed in plaintext in the Zap code. You will be able to access them via the [StoreClient utility](https://help.zapier.com/hc/en-us/articles/8496293969549-Store-data-from-code-steps-with-StoreClient).

This guide assumes the names for the secret keys are: `DBT_WEBHOOK_KEY`, `MODE_API_TOKEN`, and `MODE_API_SECRET`. If you are using different names, make sure you update all references to them in the sample code.

This guide uses a short-lived code action to store the secrets, but you can also use a tool like Postman to interact with the [REST API](https://store.zapier.com/) or create a separate Zap and call the [Set Value Action](https://help.zapier.com/hc/en-us/articles/8496293271053-Save-and-retrieve-data-from-Zaps#3-set-a-value-in-your-store-0-3).

##### a. Create a Storage by Zapier connection[​](#a-create-a-storage-by-zapier-connection "Direct link to a. Create a Storage by Zapier connection")

If you don't already have a Storage by Zapier connection, create a new one. Remember the UUID secret you generate for later.

##### b. Add a temporary code step[​](#b-add-a-temporary-code-step "Direct link to b. Add a temporary code step")

Choose **Run Python** as the Event.
Run the following code:

```python
store = StoreClient('abc123')             #replace with your UUID secret
store.set('DBT_WEBHOOK_KEY', 'abc123')    #replace with your dbt Webhook Secret Key
store.set('MODE_API_TOKEN', 'abc123')     #replace with your Mode API Token
store.set('MODE_API_SECRET', 'abc123')    #replace with your Mode API Secret
```

Test the step. You can delete this Action when the test succeeds. The keys will remain stored as long as they are accessed at least once every three months.

#### Add a code action[​](#add-a-code-action "Direct link to Add a code action")

Select **Code by Zapier** as the App, and **Run Python** as the Event.

In the **Set up action** area, add two items to **Input Data**: `raw_body` and `auth_header`. Map those to the `1. Raw Body` and `1. Headers Http Authorization` fields from the **Catch Raw Hook** step above.

![Screenshot of the Zapier UI, showing the mappings of raw\_body and auth\_header](/assets/images/run-python-40333883c6a20727c02d25224d0e40a4.png)

In the **Code** field, paste the following code, replacing `YOUR_SECRET_HERE` in the StoreClient constructor with the secret you created when setting up the Storage by Zapier integration (not your dbt secret), and setting the `account_username` and `report_token` variables to actual values.

The code below will validate the authenticity of the request, then send a [`run report` command to the Mode API](https://mode.com/developer/api-reference/analytics/report-runs/#runReport) for the given report token.
```python
import hashlib
import hmac
import json

import requests
from requests.auth import HTTPBasicAuth

#replace with the report token you want to run
account_username = 'YOUR_MODE_ACCOUNT_USERNAME_HERE'
report_token = 'YOUR_REPORT_TOKEN_HERE'

auth_header = input_data['auth_header']
raw_body = input_data['raw_body']

# Access secret credentials
secret_store = StoreClient('YOUR_SECRET_HERE')
hook_secret = secret_store.get('DBT_WEBHOOK_KEY')
username = secret_store.get('MODE_API_TOKEN')
password = secret_store.get('MODE_API_SECRET')

# Validate the webhook came from dbt
signature = hmac.new(hook_secret.encode('utf-8'), raw_body.encode('utf-8'), hashlib.sha256).hexdigest()

if signature != auth_header:
    raise Exception("Calculated signature doesn't match contents of the Authorization header. This webhook may not have been sent from dbt.")

full_body = json.loads(raw_body)
hook_data = full_body['data']

if hook_data['runStatus'] == "Success":
    # Create a report run with the Mode API
    url = f'https://app.mode.com/api/{account_username}/reports/{report_token}/run'
    params = {
        'parameters': {
            "user_id": 123,
            "location": "San Francisco"
        }
    }
    headers = {
        'Content-Type': 'application/json',
        'Accept': 'application/hal+json'
    }
    response = requests.post(
        url,
        json=params,
        headers=headers,
        auth=HTTPBasicAuth(username, password)
    )
    response.raise_for_status()

return
```

#### Test and deploy[​](#test-and-deploy "Direct link to Test and deploy")

You can iterate on the Code step by modifying the code and then running the test again. When you're happy with it, you can publish your Zap.
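Before wiring everything together, it can help to exercise the signature check from the webhook-validation code above in isolation. This is a minimal local sketch with a made-up secret and body (not real credentials): dbt signs the raw request body with HMAC-SHA256 using your Webhook Secret Key, so identical payloads produce the same hex digest and any tampering produces a mismatch.

```python
import hashlib
import hmac
import json

def sign(secret: str, raw_body: str) -> str:
    # Hex digest of HMAC-SHA256 over the raw request body,
    # as dbt sends it in the Authorization header
    return hmac.new(secret.encode('utf-8'), raw_body.encode('utf-8'), hashlib.sha256).hexdigest()

secret = 'example-webhook-secret'  # placeholder, not a real key
body = json.dumps({"data": {"runStatus": "Success"}})

signature = sign(secret, body)

assert len(signature) == 64                   # SHA-256 hex digest is 64 characters
assert signature == sign(secret, body)        # same payload -> same signature
assert signature != sign(secret, body + " ")  # tampered payload -> mismatch
```

Comparing the computed digest against the incoming Authorization header is exactly what the Zap's code step does before trusting the payload.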
---

### Refresh Tableau workbook with extracts after a job finishes

[Back to guides](https://docs.getdbt.com/guides.md)

Webhooks Advanced

#### Introduction[​](#introduction "Direct link to Introduction")

This guide will teach you how to refresh a Tableau workbook that leverages [extracts](https://help.tableau.com/current/pro/desktop/en-us/extracting_data.htm) when a dbt job has completed successfully and there is fresh data available.

The integration will:

* Receive a webhook notification in Zapier
* Trigger a refresh of a Tableau workbook

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

To set up the integration, you need to be familiar with:

* [dbt Webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md)
* Zapier
* The [Tableau API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api.htm)
* The [version](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_concepts_versions.htm#rest_api_versioning) of Tableau's REST API that is compatible with your server

#### Obtain authentication credentials from Tableau[​](#obtain-authentication-credentials-from-tableau "Direct link to Obtain authentication credentials from Tableau")

To authenticate with the Tableau API, obtain a [Personal Access Token](https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm) from your Tableau Server/Cloud instance. In addition, make sure your Tableau workbook uses data sources that allow refresh access, which is usually set when publishing.

#### Create a new Zap in Zapier[​](#create-a-new-zap-in-zapier "Direct link to Create a new Zap in Zapier")

To trigger an action with the delivery of a webhook in Zapier, you'll want to create a new Zap with **Webhooks by Zapier** as the Trigger and **Catch Raw Hook** as the Event.
However, if you choose not to [validate the authenticity of your webhook](https://docs.getdbt.com/docs/deploy/webhooks.md#validate-a-webhook), which isn't recommended, you can choose **Catch Hook** instead.

Press **Continue**, then copy the webhook URL.

![Screenshot of the Zapier UI, showing the webhook URL ready to be copied](/assets/images/catch-raw-hook-16dd72d8a6bc26284c5fad897f3da646.png)

#### Configure a new webhook in dbt[​](#configure-a-new-webhook-in-dbt "Direct link to Configure a new webhook in dbt")

To set up a webhook subscription for dbt, follow the instructions in [Create a webhook subscription](https://docs.getdbt.com/docs/deploy/webhooks.md#create-a-webhook-subscription). For the event, choose **Run completed** and modify the **Jobs** list to include only the jobs that should trigger a report refresh. Remember to save the Webhook Secret Key for later.

Paste the webhook URL you copied from Zapier earlier into the **Endpoint** field and test the endpoint. Once you've tested the endpoint in dbt, go back to Zapier and click **Test Trigger**, which will create a sample webhook body based on the test event dbt sent. The sample body's values are hard-coded and not reflective of your project, but they give Zapier a correctly-shaped object during development.

#### Store secrets[​](#store-secrets "Direct link to Store secrets")

In the next step, you will need the Webhook Secret Key from the prior step, and your Tableau authentication credentials and details. Specifically, you'll need your Tableau server/site URL, server/site name, PAT name, and PAT secret.

Zapier allows you to [store secrets](https://help.zapier.com/hc/en-us/articles/8496293271053-Save-and-retrieve-data-from-Zaps), which prevents your keys from being displayed in plaintext in the Zap code. You will be able to access them via the [StoreClient utility](https://help.zapier.com/hc/en-us/articles/8496293969549-Store-data-from-code-steps-with-StoreClient).
This guide assumes the names for the secret keys are: `DBT_WEBHOOK_KEY`, `TABLEAU_SITE_URL`, `TABLEAU_SITE_NAME`, `TABLEAU_API_TOKEN_NAME`, and `TABLEAU_API_TOKEN_SECRET`. If you are using different names, make sure you update all references to them in the sample code.

This guide uses a short-lived code action to store the secrets, but you can also use a tool like Postman to interact with the [REST API](https://store.zapier.com/) or create a separate Zap and call the [Set Value Action](https://help.zapier.com/hc/en-us/articles/8496293271053-Save-and-retrieve-data-from-Zaps#3-set-a-value-in-your-store-0-3).

##### a. Create a Storage by Zapier connection[​](#a-create-a-storage-by-zapier-connection "Direct link to a. Create a Storage by Zapier connection")

If you don't already have a Storage by Zapier connection, create a new one and remember the UUID secret you generate for later.

##### b. Add a temporary code step[​](#b-add-a-temporary-code-step "Direct link to b. Add a temporary code step")

Choose **Run Python** as the Event and input the following code:

```python
store = StoreClient('abc123')                    #replace with your UUID secret
store.set('DBT_WEBHOOK_KEY', 'abc123')           #replace with your Webhook Secret Key
store.set('TABLEAU_SITE_URL', 'abc123')          #replace with your Tableau Site URL, inclusive of https:// and .com
store.set('TABLEAU_SITE_NAME', 'abc123')         #replace with your Tableau Site/Server Name
store.set('TABLEAU_API_TOKEN_NAME', 'abc123')    #replace with your Tableau API Token Name
store.set('TABLEAU_API_TOKEN_SECRET', 'abc123')  #replace with your Tableau API Token Secret
```

Test the step to run the code. You can delete this action when the test succeeds. The keys will remain stored as long as they are accessed at least once every three months.

#### Add a code action[​](#add-a-code-action "Direct link to Add a code action")

Select **Code by Zapier** as the App, and **Run Python** as the Event. In the **Set up action** area, add two items to **Input Data**: `raw_body` and `auth_header`.
Map those to the `1. Raw Body` and `1. Headers Http Authorization` fields from the **Catch Raw Hook** step above.

![Screenshot of the Zapier UI, showing the mappings of raw\_body and auth\_header](/assets/images/run-python-40333883c6a20727c02d25224d0e40a4.png)

In the **Code** field, paste the following code, replacing `YOUR_STORAGE_SECRET_HERE` in the StoreClient constructor with the UUID secret you created when setting up the Storage by Zapier integration, and setting the `workbook_name` and `api_version` variables to actual values.

The following code validates the authenticity of the request and obtains the workbook ID for the specified workbook name. Next, the code sends an [`update workbook` command to the Tableau API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_ref_workbooks_and_views.htm#update_workbook_now) for the given workbook ID.

```python
import requests
import hashlib
import json
import hmac

# Access secret credentials
secret_store = StoreClient('YOUR_STORAGE_SECRET_HERE')
hook_secret = secret_store.get('DBT_WEBHOOK_KEY')
server_url = secret_store.get('TABLEAU_SITE_URL')
server_name = secret_store.get('TABLEAU_SITE_NAME')
pat_name = secret_store.get('TABLEAU_API_TOKEN_NAME')
pat_secret = secret_store.get('TABLEAU_API_TOKEN_SECRET')

#Enter the name of the workbook to refresh and a compatible API version
workbook_name = "YOUR_WORKBOOK_NAME"
api_version = "ENTER_COMPATIBLE_VERSION"

#Validate authenticity of webhook coming from dbt
auth_header = input_data['auth_header']
raw_body = input_data['raw_body']
signature = hmac.new(hook_secret.encode('utf-8'), raw_body.encode('utf-8'), hashlib.sha256).hexdigest()

if signature != auth_header:
    raise Exception("Calculated signature doesn't match contents of the Authorization header. This webhook may not have been sent from dbt.")

full_body = json.loads(raw_body)
hook_data = full_body['data']

if hook_data['runStatus'] == "Success":
    #Authenticate with Tableau Server to get an authentication token
    auth_url = f"{server_url}/api/{api_version}/auth/signin"
    auth_data = {
        "credentials": {
            "personalAccessTokenName": pat_name,
            "personalAccessTokenSecret": pat_secret,
            "site": {
                "contentUrl": server_name
            }
        }
    }
    auth_headers = {
        "Accept": "application/json",
        "Content-Type": "application/json"
    }
    auth_response = requests.post(auth_url, data=json.dumps(auth_data), headers=auth_headers)

    #Extract token to use for subsequent calls
    auth_token = auth_response.json()["credentials"]["token"]
    site_id = auth_response.json()["credentials"]["site"]["id"]

    #Extract the workbook ID
    workbooks_url = f"{server_url}/api/{api_version}/sites/{site_id}/workbooks"
    workbooks_headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "X-Tableau-Auth": auth_token
    }
    workbooks_params = {
        "filter": f"name:eq:{workbook_name}"
    }
    workbooks_response = requests.get(workbooks_url, headers=workbooks_headers, params=workbooks_params)

    #Assign workbook ID
    workbooks_data = workbooks_response.json()
    workbook_id = workbooks_data["workbooks"]["workbook"][0]["id"]

    # Refresh the workbook
    refresh_url = f"{server_url}/api/{api_version}/sites/{site_id}/workbooks/{workbook_id}/refresh"
    refresh_data = {}
    refresh_headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "X-Tableau-Auth": auth_token
    }
    refresh_trigger = requests.post(refresh_url, data=json.dumps(refresh_data), headers=refresh_headers)

return {"message": "Workbook refresh has been queued"}
```

#### Test and deploy[​](#test-and-deploy "Direct link to Test and deploy")

To make changes to your code, modify it and test it again. When you're happy with it, you can publish your Zap.
---

### Set up your dbt project with Databricks

[Back to guides](https://docs.getdbt.com/guides.md)

Databricks dbt Core dbt platform Intermediate

#### Introduction[​](#introduction "Direct link to Introduction")

Databricks and dbt Labs are partnering to help data teams think like software engineering teams and ship trusted data, faster. The dbt-databricks adapter enables dbt users to leverage the latest Databricks features in their dbt projects. Hundreds of customers are now using dbt and Databricks to build expressive and reliable data pipelines on the Lakehouse, generating data assets that enable analytics, ML, and AI use cases throughout the business.

In this guide, we discuss how to set up your dbt project on the Databricks Lakehouse Platform so that it scales from a small team all the way up to a large organization.

#### Configuring the Databricks Environments[​](#configuring-the-databricks-environments "Direct link to Configuring the Databricks Environments")

To get started, we will use Databricks’s Unity Catalog. Without it, we would not be able to design separate [environments](https://docs.getdbt.com/docs/environments-in-dbt.md) for development and production per our [best practices](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md). It also allows us to ensure the proper access controls have been applied using SQL. You will need to use the dbt-databricks adapter (as opposed to the dbt-spark adapter) to work with Unity Catalog.

We will set up two different *catalogs* in Unity Catalog: **dev** and **prod**.
A catalog is a top-level container for *schemas* (previously known as databases in Databricks), which in turn contain tables and views. Our dev catalog will be the development environment that analytics engineers interact with through their Studio IDE. Developers should have their own sandbox to build and test objects in without worry of overwriting or dropping a coworker’s work; we recommend creating personal schemas for this purpose. In terms of permissions, developers should only have access to the **dev** catalog. Only production runs will have access to data in the **prod** catalog.

In a future guide, we will discuss a **test** catalog where our continuous integration/continuous deployment (CI/CD) system can run `dbt test`. For now, let’s keep things simple and [create two catalogs](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-catalog.html) either using the Data Catalog or in the SQL editor with these commands:

```sql
create catalog if not exists dev;
create catalog if not exists prod;
```

As long as your developer is given write access to the dev catalog, there is no need to create the sandbox schemas ahead of time.

#### Setting up Service Principals[​](#setting-up-service-principals "Direct link to Setting up Service Principals")

When an analytics engineer runs a dbt project from their Studio IDE, it is perfectly fine for the resulting queries to execute with that user’s identity. However, we want production runs to execute with a *service principal's* identity. As a reminder, a service principal is a headless account that does not belong to an actual person. Service principals are used to remove humans from deploying to production, both for convenience and for security.

Personal identities should not be used to build production pipelines because they could break if the user leaves the company or changes their credentials. Also, there should not be ad hoc commands modifying production data.
Only scheduled jobs running code that has passed CI tests and code reviews should be allowed to modify production data. If something breaks, there is an auditable trail of changes to find the root cause, easily revert to the last working version of the code, and minimize the impact on end users.

[Let’s create a service principal](https://docs.databricks.com/administration-guide/users-groups/service-principals.html#add-a-service-principal-to-your-databricks-account) in Databricks:

1. Have your Databricks Account admin [add a service principal](https://docs.databricks.com/administration-guide/users-groups/service-principals.html#add-a-service-principal-to-your-databricks-account) to your account. The service principal’s name should be clearly distinguishable from a user ID and make its purpose clear (for example, dbt\_prod\_sp).
2. Add the service principal to any groups it needs to be a member of at this time. There are more details on permissions in our ["Unity Catalog best practices" guide](https://docs.getdbt.com/best-practices/dbt-unity-catalog-best-practices.md).
3. [Add the service principal to your workspace](https://docs.databricks.com/administration-guide/users-groups/service-principals.html#add-a-service-principal-to-a-workspace) and apply any [necessary entitlements](https://docs.databricks.com/administration-guide/users-groups/service-principals.html#add-a-service-principal-to-a-workspace-using-the-admin-console), such as Databricks SQL access and Workspace access.

#### Setting up Databricks Compute[​](#setting-up-databricks-compute "Direct link to Setting up Databricks Compute")

When you run a dbt project, it generates SQL, which can run on All Purpose Clusters or SQL warehouses. We strongly recommend running dbt-generated SQL on a Databricks SQL warehouse. Since SQL warehouses are optimized for executing SQL queries, you can save on cost with the lower uptime needed for the cluster to run the queries.
If you need to debug, you will also have access to a Query Profile. We recommend using a serverless cluster if you want to minimize the time spent spinning up a cluster and remove the need to change cluster sizes depending on workflows.

If you use a Databricks serverless SQL warehouse, you still need to choose a [cluster size](https://docs.databricks.com/aws/en/compute/sql-warehouse/create#configure-sql-warehouse-settings) (for example, 2X-Small, X-Small, Small, Medium, Large). For more information on serverless SQL warehouses, see the [Databricks docs](https://docs.databricks.com/aws/en/compute/sql-warehouse/warehouse-behavior#sizing-a-serverless-sql-warehouse).

Let’s [create a Databricks SQL warehouse](https://docs.databricks.com/sql/admin/sql-endpoints.html#create-a-sql-warehouse):

1. Click **SQL Warehouses** in the sidebar.
2. Click **Create SQL Warehouse**.
3. Enter a name for the warehouse.
4. If using a serverless SQL warehouse, select a [cluster size](https://docs.databricks.com/aws/en/compute/sql-warehouse/warehouse-behavior#sizing-a-serverless-sql-warehouse) (2X-Small through 4X-Large) or leave the default, but ensure it suits your workload.
5. Accept the default warehouse settings or edit them.
6. Click **Create**.
7. Configure warehouse permissions to ensure our service principal and developer have the right access.

We are not covering Python in this post, but if you want to learn more, check out these [docs](https://docs.getdbt.com/docs/build/python-models.md#specific-data-platforms).

Depending on your workload, you may wish to create a larger SQL warehouse for production workflows while having a smaller development SQL warehouse (if you’re not using serverless SQL warehouses). As your project grows, you might want to apply [compute per model configurations](https://docs.getdbt.com/reference/resource-configs/databricks-configs.md#specifying-the-compute-for-models).
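As a sketch of that last point, dbt-databricks lets an individual model opt into a specific warehouse via the `databricks_compute` config. The model name and the `large_warehouse` alias below are hypothetical; the alias must match a compute entry defined in your connection/profile (see the linked reference for details):

```sql
-- models/marts/fct_orders.sql (hypothetical model)
{{ config(
    materialized='table',
    databricks_compute='large_warehouse'
) }}

select * from {{ ref('stg_orders') }}
```

This lets a handful of heavy models run on a larger warehouse while the rest of the project stays on the default, smaller one.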
#### Configure your dbt project[​](#configure-your-dbt-project "Direct link to Configure your dbt project")

Now that the Databricks components are in place, we can configure our dbt project. This involves connecting dbt to our Databricks SQL warehouse to run SQL queries and using a version control system like GitHub to store our transformation code.

If you are migrating an existing dbt project from the dbt-spark adapter to dbt-databricks, follow this [migration guide](https://docs.getdbt.com/guides/migrate-from-spark-to-databricks.md) to switch adapters without needing to update developer credentials and other existing configs. If you’re starting a new dbt project, follow the steps below. For a more detailed setup flow, check out our [quickstart guide](https://docs.getdbt.com/guides/databricks.md).

##### Connect dbt to Databricks[​](#connect-dbt-to-databricks "Direct link to Connect dbt to Databricks")

First, you’ll need to connect your dbt project to Databricks so it can send transformation instructions and build objects in Unity Catalog. Follow the instructions for [dbt](https://docs.getdbt.com/guides/databricks.md?step=4) or [Core](https://docs.getdbt.com/docs/local/connect-data-platform/databricks-setup.md) to configure your project’s connection credentials.

Each developer must generate their own Databricks PAT and use the token in their development credentials. They will also specify a unique developer schema that will store the tables and views generated by dbt runs executed from their Studio IDE. This provides isolated developer environments and ensures data access is fit for purpose.

Let’s generate a [Databricks personal access token (PAT)](https://docs.databricks.com/sql/user/security/personal-access-tokens.html) for development:

1. In Databricks, click on your Databricks username in the top bar and select User Settings in the drop-down.
2. On the Access token tab, click Generate new token.
3. Click Generate.
4. Copy the displayed token and click Done.
Don’t lose the token!

For your development credentials/profiles.yml:

1. Set your default catalog to dev.
2. Your developer schema should be named after yourself. We recommend `dbt_<username>`.

During your first invocation of `dbt run`, dbt will create the developer schema if it doesn't already exist in the dev catalog.

#### Defining your dbt deployment environment[​](#defining-your-dbt-deployment-environment "Direct link to Defining your dbt deployment environment")

We need to give dbt a way to deploy code outside of development environments. To do so, we’ll use dbt [environments](https://docs.getdbt.com/docs/environments-in-dbt.md) to define the production targets that end users will interact with. Core projects can use [targets in profiles](https://docs.getdbt.com/docs/local/profiles.yml.md#understanding-targets-in-profiles) to separate environments. [dbt environments](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#set-up-and-access-the-cloud-ide) allow you to define environments via the UI and [schedule jobs](https://docs.getdbt.com/guides/databricks.md#create-and-run-a-job) for specific environments.

Let’s set up our deployment environment:

1. Follow the Databricks instructions to [set up your service principal’s token](https://docs.databricks.com/dev-tools/service-principals.html#use-curl-or-postman). Note that `lifetime_seconds` defines how long this credential stays valid; use a large value to avoid frequently regenerating tokens and causing production job failures.
2. Now let’s pop back over to dbt to fill out the environment fields. Click on environments in the dbt UI or define a new target in your profiles.yml.
3. Set the Production environment’s *catalog* to the **prod** catalog created above.
   Provide the [service token](https://docs.databricks.com/administration-guide/users-groups/service-principals.html#manage-access-tokens-for-a-service-principal) for your **prod** service principal and set that as the *token* in your production environment’s deployment credentials.
4. Set the schema to the default for your prod environment. This can be overridden by [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md#what-is-a-custom-schema) if you need to use more than one.

#### Connect dbt to your git repository[​](#connect-dbt-to-your-git-repository "Direct link to Connect dbt to your git repository")

Next, you’ll need somewhere to store and version control your code that allows you to collaborate with teammates. Connect your dbt project to a git repository with [dbt](https://docs.getdbt.com/guides/databricks.md#set-up-a-dbt-cloud-managed-repository). [Core](https://docs.getdbt.com/guides/manual-install.md#create-a-repository) projects will use the git CLI.

##### Next steps[​](#next-steps "Direct link to Next steps")

Now that your project is configured, you can start transforming your Databricks data with dbt. To help you scale efficiently, we recommend you follow our best practices, starting with the [Unity Catalog best practices](https://docs.getdbt.com/best-practices/dbt-unity-catalog-best-practices.md); then you can [Optimize dbt models on Databricks](https://docs.getdbt.com/guides/optimize-dbt-models-on-databricks.md).
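For Core users, the dev and prod targets described in this guide can be sketched in a `profiles.yml` like the one below. All host, path, schema, and token values are placeholders to adapt to your workspace; the prod token belongs to the service principal, not a person:

```yaml
# ~/.dbt/profiles.yml — a sketch, not a drop-in config
my_databricks_project:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: dev
      schema: dbt_your_name          # personal developer schema
      host: YOUR_WORKSPACE.cloud.databricks.com
      http_path: /sql/1.0/warehouses/YOUR_WAREHOUSE_ID
      token: YOUR_DEVELOPER_PAT
    prod:
      type: databricks
      catalog: prod
      schema: analytics              # default prod schema
      host: YOUR_WORKSPACE.cloud.databricks.com
      http_path: /sql/1.0/warehouses/YOUR_WAREHOUSE_ID
      token: YOUR_SERVICE_PRINCIPAL_TOKEN
```

With this layout, `dbt run` builds into your personal schema in the dev catalog, while `dbt run --target prod` deploys with the service principal’s identity into the prod catalog.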
---

### Trigger PagerDuty alarms when dbt jobs fail

[Back to guides](https://docs.getdbt.com/guides.md)

Webhooks Advanced

#### Introduction[​](#introduction "Direct link to Introduction")

This guide will teach you how to build and host a basic Python app which will monitor dbt jobs and create PagerDuty alarms based on failure. When a dbt job completes, the app will:

* Check for any failed nodes (e.g. non-passing tests or errored models), and
* Create a PagerDuty alarm based on those nodes by calling the PagerDuty Events API. Events are deduplicated per run ID.

![Screenshot of the PagerDuty UI, showing an alarm created by invalid SQL in a dbt model](/assets/images/pagerduty-example-alarm-b963e5d15b2ec724c8fd76abd58ec13c.png)

In this example, we will use fly.io for hosting and running the service. fly.io is a platform for running full-stack apps without provisioning servers. This level of usage should comfortably fit inside the free tier. You can also use an alternative tool such as [AWS Lambda](https://ademoverflow.com/en/posts/tutorial-fastapi-aws-lambda-serverless/) or [Google Cloud Run](https://github.com/sekR4/FastAPI-on-Google-Cloud-Run).

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

This guide assumes some familiarity with:

* [dbt Webhooks](https://docs.getdbt.com/docs/deploy/webhooks.md)
* CLI apps
* Deploying code to a serverless code runner like fly.io or AWS Lambda

#### Clone the `dbt-cloud-webhooks-pagerduty` repo[​](#clone-the-dbt-cloud-webhooks-pagerduty-repo "Direct link to clone-the-dbt-cloud-webhooks-pagerduty-repo")

[This repository](https://github.com/dpguthrie/dbt-cloud-webhooks-pagerduty) contains the sample code for validating a webhook and creating events in PagerDuty.
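For orientation before diving into the repo, the per-run-ID deduplication mentioned above relies on the `dedup_key` field of a PagerDuty Events API v2 payload. This is a sketch of such a payload builder (the function name and the example routing key and node names are illustrative, not the repo's exact code): because the dbt run ID becomes the `dedup_key`, repeated webhook deliveries for the same run collapse into a single incident instead of paging twice.

```python
def build_pagerduty_event(routing_key: str, run_id: int, failed_nodes: list) -> dict:
    # Shape of a PagerDuty Events API v2 "trigger" event; the app POSTs
    # this JSON to https://events.pagerduty.com/v2/enqueue
    return {
        "routing_key": routing_key,          # integration key from PagerDuty
        "event_action": "trigger",
        "dedup_key": f"dbt-run-{run_id}",    # deduplicates alerts per run ID
        "payload": {
            "summary": f"dbt run {run_id} failed: {len(failed_nodes)} node(s)",
            "source": "dbt",
            "severity": "error",
            "custom_details": {"failed_nodes": failed_nodes},
        },
    }

event = build_pagerduty_event("YOUR_ROUTING_KEY", 123456, ["model.jaffle_shop.fct_orders"])
```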
#### Install `flyctl` and sign up for fly.io[​](#install-flyctl-and-sign-up-for-flyio "Direct link to install-flyctl-and-sign-up-for-flyio")

Follow the directions for your OS in the [fly.io docs](https://fly.io/docs/hands-on/install-flyctl/), then from your command line, run the following commands.

Switch to the directory containing the repo you cloned in step 1:

```shell
#example: replace with your actual path
cd ~/Documents/GitHub/dbt-cloud-webhooks-pagerduty
```

Sign up for fly.io:

```shell
flyctl auth signup
```

Your console should show `successfully logged in as YOUR_EMAIL` when you're done, but if it doesn't then sign in to fly.io from your command line:

```shell
flyctl auth login
```

#### Launch your fly.io app[​](#launch-your-flyio-app "Direct link to Launch your fly.io app")

Launching your app publishes it to the web and makes it ready to catch webhook events:

```shell
flyctl launch
```

You will see a message saying that an existing `fly.toml` file was found. Type `y` to copy its configuration to your new app.

Choose an app name of your choosing, such as `YOUR_COMPANY-dbt-cloud-webhook-pagerduty`, or leave blank and one will be generated for you. Note that your name can only contain numbers, lowercase letters and dashes.

Choose a deployment region, and take note of the hostname that is generated (normally `APP_NAME.fly.dev`).

When asked if you would like to set up Postgresql or Redis databases, type `n` for each.

Type `y` when asked if you would like to deploy now.

Sample output from the setup wizard:

```shell
joel@Joel-Labes dbt-cloud-webhooks-pagerduty % flyctl launch
An existing fly.toml file was found for app dbt-cloud-webhooks-pagerduty
? Would you like to copy its configuration to the new app? Yes
Creating app in /Users/joel/Documents/GitHub/dbt-cloud-webhooks-pagerduty
Scanning source code
Detected a Dockerfile app
? Choose an app name (leave blank to generate one): demo-dbt-cloud-webhook-pagerduty
automatically selected personal organization: Joel Labes
Some regions require a paid plan (fra, maa). See https://fly.io/plans to set up a plan.
? Choose a region for deployment: Sydney, Australia (syd)
Created app dbtlabs-dbt-cloud-webhook-pagerduty in organization personal
Admin URL: https://fly.io/apps/demo-dbt-cloud-webhook-pagerduty
Hostname: demo-dbt-cloud-webhook-pagerduty.fly.dev
? Would you like to set up a Postgresql database now? No
? Would you like to set up an Upstash Redis database now? No
Wrote config file fly.toml
? Would you like to deploy now? Yes
```

#### Create a PagerDuty integration application[​](#create-a-pagerduty-integration-application "Direct link to Create a PagerDuty integration application")

See [PagerDuty's guide](https://developer.pagerduty.com/docs/ZG9jOjExMDI5NTgw-events-api-v2-overview#getting-started) for full instructions. Make note of the integration key for later.

#### Configure a new webhook in dbt[​](#configure-a-new-webhook-in-dbt "Direct link to Configure a new webhook in dbt")

See [Create a webhook subscription](https://docs.getdbt.com/docs/deploy/webhooks.md#create-a-webhook-subscription) for full instructions. Your event should be **Run completed**. Set the webhook URL to the hostname you created earlier (`APP_NAME.fly.dev`).

Make note of the Webhook Secret Key for later.

*Do not test the endpoint*; it won't work until you have stored the auth keys (next step).

#### Store secrets[​](#store-secrets "Direct link to Store secrets")

The application requires three secrets to be set, using these names:

* `DBT_CLOUD_SERVICE_TOKEN`: a dbt [personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md) or [service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md) with at least the `Metadata Only` permission.
* `DBT_CLOUD_AUTH_TOKEN`: the Secret Key for the dbt webhook you created earlier.
* `PD_ROUTING_KEY`: the integration key for the PagerDuty integration you created earlier.

Set these secrets as follows, replacing `abc123` etc. with the actual values:

```shell
flyctl secrets set DBT_CLOUD_SERVICE_TOKEN=abc123 DBT_CLOUD_AUTH_TOKEN=def456 PD_ROUTING_KEY=ghi789
```

#### Deploy your app[​](#deploy-your-app "Direct link to Deploy your app")

After you set your secrets, fly.io will redeploy your application. When it has completed successfully, go back to the dbt webhook settings and click **Test Endpoint**.

---

### Upgrade to Fusion part 1: Preparing to upgrade

This guide helps you prepare for an in-place upgrade from dbt Core to the dbt Fusion engine in the dbt platform.

[Back to guides](https://docs.getdbt.com/guides.md) dbt Fusion engine dbt platform Upgrade Intermediate

#### Introduction[​](#introduction "Direct link to Introduction")

private preview

The dbt Fusion engine is available as a private preview for all tiers of dbt platform accounts. dbt Labs is enabling Fusion only on accounts that have eligible projects. Following the steps outlined in this guide doesn't guarantee Fusion eligibility.

The dbt Fusion engine represents the next evolution of data transformation. dbt has been rebuilt from the ground up, but at its most basic, Fusion is a new version of dbt, and like any new version, you should prepare before you upgrade. This guide takes you through those preparations.
If Fusion is brand new to you, take a look at our [comprehensive documentation](https://docs.getdbt.com/docs/fusion.md) on what it is, how it behaves, and what's different from dbt Core before getting started with this guide. Once you're caught up, it's time to begin preparing your projects for the speed and power that Fusion has to offer.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

This guide covers the preparations for upgrading to the dbt Fusion engine and is intended for customers already using the dbt platform with a version of dbt Core. If you're brand new to dbt, check out our [quickstart guides](https://docs.getdbt.com/guides.md).

To follow the steps in this guide, you must meet the following prerequisites:

* You're using a dbt platform account on any tier.
* You have a developer license.
* You have [proper permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to edit projects.
* Your project is using a Fusion-supported adapter and authentication method:
  * BigQuery: Service Account / User Token, Native OAuth, External OAuth ([required permissions](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#required-permissions))
  * Databricks: Service Account / User Token, Native OAuth
  * Redshift: Username / Password, IAM profile
  * Snowflake: Username / Password, Native OAuth, External OAuth, key pair using a modern PKCS#8 method, MFA

Upgrading your first project

Start with smaller, newer, or more familiar projects first. This makes it easier to identify and troubleshoot any issues before upgrading larger, more complex projects.

#### Upgrade to the latest dbt Core version[​](#upgrade-to-the-latest-dbt-core-version "Direct link to Upgrade to the latest dbt Core version")

Before upgrading to Fusion, you need to move your environments to the **Latest** [dbt Core release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).
The **Latest** track includes all the features and tooling to help you prepare for Fusion. It ensures the smoothest upgrade experience by validating that your project doesn't rely on deprecated behaviors. Test before you deploy Always test version upgrades in development first. Use the [Override dbt version](#step-1-test-in-development-using-override) feature to safely try the **Latest** release track without affecting your team or production runs. ##### Step 1: Test in development (using override)[​](#step-1-test-in-development-using-override "Direct link to Step 1: Test in development (using override)") Test the **Latest** release track for your individual account without changing the environment for your entire team: 1. Click your account name in the left sidebar and select **Account settings**. 2. Select **Credentials** from the sidebar and choose your project. 3. In the side panel, click **Edit** and scroll to **User development settings**. 4. Select **Latest** from the **dbt version** dropdown and click **Save**. [![Override dbt version in your account settings](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-override-version.png?v=2 "Override dbt version in your account settings")](#)Override dbt version in your account settings 5. Launch the Studio IDE or dbt CLI and test your normal development workflows. 6. Verify the override is active by running any dbt command and checking the **System Logs**. The first line should show `Running with dbt=` and your selected version. If the version number is `v1.11` or higher, you're on the right path to Fusion readiness. If everything works as expected, proceed to the next step to start upgrading your environments. If you encounter deprecation warnings, don't fear! We'll address those [later in this guide](https://docs.getdbt.com/guides/prepare-fusion-upgrade.md?step=4). 
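The version check in step 6 can also be scripted. Below is a minimal sketch of that check: a hypothetical helper (not part of dbt or the dbt platform) that parses the `Running with dbt=` line from the system logs and confirms the version is at least 1.11:

```python
import re

def dbt_version_at_least(log_line: str, minimum=(1, 11)) -> bool:
    """Parse 'Running with dbt=X.Y.Z' from a log line and compare to a minimum version."""
    match = re.search(r"Running with dbt=v?(\d+)\.(\d+)", log_line)
    if not match:
        raise ValueError("no dbt version found in log line")
    version = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison gives correct major.minor ordering, e.g. (1, 9) < (1, 11)
    return version >= minimum

print(dbt_version_at_least("Running with dbt=1.11.2"))  # True: Fusion-ready track
print(dbt_version_at_least("Running with dbt=1.9.0"))   # False: upgrade needed
```

Tuple comparison is used here so that `1.9` correctly sorts below `1.11`, which a naive string comparison would get wrong.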
If you encounter errors, revert to your previous version and refer to the [version upgrade guides](https://docs.getdbt.com/docs/dbt-versions/core-upgrade.md) to resolve any differences between your current version and the latest available dbt Core version. ##### Step 2: Upgrade your development environment[​](#step-2-upgrade-your-development-environment "Direct link to Step 2: Upgrade your development environment") After successfully testing your individual development environment with the override, upgrade the development environment for the entire project (be sure to give your team notice!): 1. Navigate to **Environments** in your project settings. 2. Select your **Development** environment and click **Edit**. 3. Click the **dbt version** dropdown and select **Latest**. 4. Click **Save** to apply the changes. [![Upgrade development environment to Latest dbt Core release track](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/select-development.png?v=2 "Upgrade development environment to Latest dbt Core release track")](#)Upgrade development environment to Latest dbt Core release track Remove your override Once your development environment is upgraded, you can remove your personal override by returning to your account credentials and selecting the same version as your environment. ##### Step 3: Upgrade staging and pre-production[​](#step-3-upgrade-staging-and-pre-production "Direct link to Step 3: Upgrade staging and pre-production") If your organization has staging or pre-production environments, upgrade these before production: 1. Navigate to **Environments** and select your staging/pre-production environment. 2. Click **Edit** and select **Latest** from the **dbt version** dropdown. 3. Click **Save**. 4. Run your jobs in this environment for a few days to validate everything works correctly. This provides a final validation layer before upgrading production environments. 
##### Step 4: Upgrade your production environment[​](#step-4-upgrade-your-production-environment "Direct link to Step 4: Upgrade your production environment") After validating in staging (or development if you don't have staging), upgrade your production environment: 1. Navigate to **Environments** and select your **Production** environment. 2. Click **Edit** and select **Latest** from the **dbt version** dropdown. 3. Click **Save** to apply the changes. 4. Monitor your first few production runs to ensure everything executes successfully. ##### Step 5: Update jobs[​](#step-5-update-jobs "Direct link to Step 5: Update jobs") While environments control the dbt version for most scenarios, some older job configurations may have version overrides. Review your jobs and [update any that specify a dbt version](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#jobs) to ensure they use the environment's Latest release track. #### Resolve all deprecation warnings[​](#resolve-all-deprecation-warnings "Direct link to Resolve all deprecation warnings") Fusion enforces strict validation and won't accept deprecated code that currently generates warnings in dbt Core. You must resolve all deprecation warnings before upgrading to Fusion. Fortunately, the autofix tool in the Studio IDE can automatically resolve most common deprecations for you. VS Code extension This guide provides steps to resolve deprecation warnings without leaving dbt platform. If you prefer to work in the VS Code or Cursor editors locally, you can run the autofix in our dbt VS Code extension. Check out the [installation guide](https://docs.getdbt.com/docs/install-dbt-extension.md) for more information about those workflows. 
##### What the autofix tool handles[​](#what-the-autofix-tool-handles "Direct link to What the autofix tool handles") The autofix tool can resolve many deprecations automatically, including: * Moving custom configurations into the `meta` dictionary * Fixing duplicate YAML keys * Correcting unrecognized resource properties * Updating deprecated configuration patterns Check out the [autofix readme](https://github.com/dbt-labs/dbt-autofix/) for a complete list of the deprecations it addresses. Fusion package compatibility In addition to deprecations, the autofix tool attempts to upgrade packages to the lowest supported Fusion-compatible version. Check out [package support](https://docs.getdbt.com/docs/fusion/supported-features.md#package-support) for more information about Fusion compatibility. ##### Step 1: Create a new branch[​](#step-1-create-a-new-branch "Direct link to Step 1: Create a new branch") Before running the autofix tool, create a new branch to isolate your changes: 1. Navigate to the Studio IDE by clicking **Studio** in the left-side menu. 2. Click the **Version control** panel (git branch icon) on the left sidebar. 3. Click **Create branch** and name it something descriptive like `fusion-deprecation-fixes`. 4. Click **Create** to switch to your new branch. Save before autofixing The autofix tool will modify files in your project. Make sure to commit or stash any unsaved work to avoid losing changes. ##### Step 2: Run the autofix tool[​](#step-2-run-the-autofix-tool "Direct link to Step 2: Run the autofix tool") Now you're ready to scan for and automatically fix deprecation warnings: 1. Click the **three-dot menu** in the bottom right corner of the Studio IDE. 2. Select **Check & fix deprecations**. 
[![Access the Studio IDE options menu](/img/docs/dbt-cloud/cloud-ide/ide-options-menu-with-save.png?v=2 "Access the Studio IDE options menu")](#)Access the Studio IDE options menu The tool runs `dbt parse --show-all-deprecations --no-partial-parse` to identify all deprecations in your project. This may take a few moments depending on your project size. 3. When parsing completes, view the results in the **Command history** panel in the bottom left. [![View command history and deprecation results](/img/docs/dbt-cloud/cloud-ide/command-history.png?v=2 "View command history and deprecation results")](#)View command history and deprecation results ##### Step 3: Review and apply autofixes[​](#step-3-review-and-apply-autofixes "Direct link to Step 3: Review and apply autofixes") After the deprecation scan completes, review the findings and apply automatic fixes: 1. In the **Command history** panel, review the list of deprecation warnings. 2. Click the **Autofix warnings** button to proceed. [![Click Autofix warnings to resolve deprecations automatically](/img/docs/dbt-cloud/cloud-ide/autofix-button.png?v=2 "Click Autofix warnings to resolve deprecations automatically")](#)Click Autofix warnings to resolve deprecations automatically 3. In the **Proceed with autofix** dialog, review the warning and click **Continue**. [![Confirm autofix operation](/img/docs/dbt-cloud/cloud-ide/proceed-with-autofix.png?v=2 "Confirm autofix operation")](#)Confirm autofix operation The tool automatically modifies your project files to resolve fixable deprecations, then runs another parse to identify any remaining warnings. 4. When complete, a success message appears. Click **Review changes**. [![Autofix complete](/img/docs/dbt-cloud/cloud-ide/autofix-success.png?v=2 "Autofix complete")](#)Autofix complete ##### Step 4: Verify the changes[​](#step-4-verify-the-changes "Direct link to Step 4: Verify the changes") Review the changes made by the autofix tool to ensure they're correct: 1. 
Open the **Version control** panel to view all modified files. 2. Click on individual files to review the specific changes. 3. Look for files with moved configurations, corrected properties, or updated syntax. 4. If needed, make any additional manual adjustments. ##### Step 5: Commit your changes[​](#step-5-commit-your-changes "Direct link to Step 5: Commit your changes") Once you're satisfied with the autofix changes, commit them to your branch: 1. In the **Version control** panel, add a descriptive commit message like "Fix deprecation warnings for Fusion upgrade". 2. Click **Commit and sync** to save your changes. ##### Step 6: Address remaining deprecations[​](#step-6-address-remaining-deprecations "Direct link to Step 6: Address remaining deprecations") If the autofix tool reports remaining deprecation warnings that couldn't be automatically fixed: 1. Review the warning messages in the **Command history** panel. Each warning includes the file path and line number. 2. Manually update the code based on the deprecation guidance: * Custom inputs should be moved to the `meta` config. * Deprecated properties should be updated to their new equivalents. * Refer to specific [version upgrade guides](https://docs.getdbt.com/docs/dbt-versions/core-upgrade.md) for detailed migration instructions. 3. After making manual fixes, run **Check & fix deprecations** again to verify all warnings are resolved. 4. Commit your changes. ##### Step 7: Merge to your main branch[​](#step-7-merge-to-your-main-branch "Direct link to Step 7: Merge to your main branch") Once all deprecations are resolved: 1. Create a pull request in your git provider to merge your deprecation fixes. 2. Have your team review the changes. 3. Merge the PR to your main development branch. 4. Ensure these changes are deployed to your environments before proceeding with the Fusion upgrade. 
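The first manual fix listed in step 6, moving custom inputs under the `meta` config, can be sketched in code. The following is purely illustrative and is not the dbt-autofix implementation; the `RECOGNIZED_CONFIGS` set is a hypothetical subset of real dbt config keys:

```python
# Illustrative sketch of the "move custom keys under meta" deprecation fix.
# RECOGNIZED_CONFIGS is a simplified, hypothetical subset of dbt's config keys.
RECOGNIZED_CONFIGS = {"materialized", "enabled", "tags", "schema", "alias", "meta"}

def move_custom_keys_to_meta(config: dict) -> dict:
    """Return a config dict with unrecognized keys relocated under 'meta'."""
    fixed = {k: v for k, v in config.items() if k in RECOGNIZED_CONFIGS}
    custom = {k: v for k, v in config.items() if k not in RECOGNIZED_CONFIGS}
    if custom:
        fixed.setdefault("meta", {}).update(custom)
    return fixed

before = {"materialized": "table", "owner": "data-eng", "sla_hours": 24}
after = move_custom_keys_to_meta(before)
# Custom keys now live under 'meta'; recognized keys are untouched.
```

In practice the autofix tool rewrites your YAML and config blocks for you; this sketch only shows the shape of the transformation so you can recognize it when reviewing the diff.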
#### Validate and upgrade your dbt packages[​](#validate-and-upgrade-your-dbt-packages "Direct link to Validate and upgrade your dbt packages") Run autofix first This section contains instructions for manual package upgrades. We recommend running the autofix tool before taking these steps. The autofix tool finds packages incompatible with Fusion and upgrades them to the lowest compatible version. For more information, check out [package support](https://docs.getdbt.com/docs/fusion/supported-features.md#package-support). dbt packages extend your project's functionality, but they must be compatible with Fusion. Most commonly used packages from dbt Labs (like `dbt_utils` and `dbt_project_evaluator`) and many community packages [already support Fusion](https://docs.getdbt.com/docs/fusion/supported-features.md#package-support). Before upgrading, verify your packages are compatible and upgrade them to the latest versions. Check for packages that support version 2.0.0, or ask the maintainer if you're unsure. What if a package isn't compatible? If a critical package isn't yet compatible with Fusion: * Check with the package maintainer about their roadmap. * Open an issue requesting Fusion support. * Consider contributing the compatibility updates yourself. * Try it out anyway! The incompatible portion of the package might not impact your project. ###### Package compatibility messages[​](#package-compatibility-messages "Direct link to Package compatibility messages") Inconsistent Fusion warnings and `dbt-autofix` logs Fusion warnings and `dbt-autofix` logs may show different messages about package compatibility. If you use [`dbt-autofix`](https://github.com/dbt-labs/dbt-autofix) while upgrading to Fusion in the Studio IDE or dbt VS Code extension, you may see different messages about package compatibility between `dbt-autofix` and Fusion warnings. 
Here's why: * Fusion warnings are emitted based on a package's `require-dbt-version` and whether `require-dbt-version` contains `2.0.0`. * Some packages are already Fusion-compatible even though package maintainers haven't yet updated `require-dbt-version`. * `dbt-autofix` knows about these compatible packages and will not try to upgrade a package that it knows is already compatible. This means that even if you see a Fusion warning for a package that `dbt-autofix` identifies as compatible, you don't need to change the package. The message discrepancy is temporary while we implement and roll out `dbt-autofix`'s enhanced compatibility detection to Fusion warnings. Here's an example of a Fusion warning in the Studio IDE that says a package isn't compatible with Fusion but `dbt-autofix` indicates it is compatible: ```text dbt1065: Package 'dbt_utils' requires dbt version [>=1.30,<2.0.0], but current version is 2.0.0-preview.72. This package may not be compatible with your dbt version. dbt(1065) [Ln 1, Col 1] ``` ##### Step 1: Review your current packages[​](#step-1-review-your-current-packages "Direct link to Step 1: Review your current packages") Identify which packages your project uses: 1. In the Studio IDE, open your project's root directory. 2. Look for either `packages.yml` or `dependencies.yml` file. 3. Review the list of packages and their current versions. Your file will look something like this: ```yaml packages: - package: dbt-labs/dbt_utils version: 1.0.0 - package: dbt-labs/codegen version: 0.9.0 ``` ##### Step 2: Check compatibility and find the latest package versions[​](#step-2-check-compatibility-and-find-the-latest-package-versions "Direct link to Step 2: Check compatibility and find the latest package versions") Review [the dbt package hub](https://hub.getdbt.com) to see verified Fusion-compatible packages by checking that the `require-dbt-version` configuration includes `2.0.0` or higher. 
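The compatibility check described above, whether a package's `require-dbt-version` range admits `2.0.0`, can be sketched with a hand-rolled comparator. This is a simplification for illustration only: dbt's actual resolver handles full semver, including prerelease tags like `2.0.0-preview.72`, which this sketch does not:

```python
def parse_version(v: str) -> tuple:
    """Turn 'X.Y.Z' into a comparable tuple of ints (no prerelease support)."""
    return tuple(int(part) for part in v.split("."))

def admits(specs: list, version: str) -> bool:
    """Check whether a version satisfies every specifier, e.g. ['>=1.3.0', '<3.0.0']."""
    target = parse_version(version)
    ops = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "==": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    for spec in specs:
        for op in (">=", "<=", "==", ">", "<"):  # check two-character operators first
            if spec.startswith(op):
                if not ops[op](target, parse_version(spec[len(op):])):
                    return False
                break
    return True

# A range capped below 2.0.0 triggers the Fusion warning; a range including it does not.
print(admits([">=1.3.0", "<2.0.0"], "2.0.0"))  # False
print(admits([">=1.3.0", "<3.0.0"], "2.0.0"))  # True
```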
Refer to [package support](https://docs.getdbt.com/docs/fusion/supported-features.md#package-support) for more information. For packages that aren't Fusion-compatible: * Visit the package's GitHub repository. * Check the README or recent releases for Fusion compatibility information. * Look for issues or discussions about Fusion support. For each package, find the most recent version: * Visit [dbt Hub](https://hub.getdbt.com) for packages hosted there. * For packages from GitHub, check the repository's releases page. * Note the latest version number for each package you use. For Hub packages, you can use version ranges to stay up-to-date: ```yaml packages: - package: dbt-labs/dbt_utils version: [">=1.0.0", "<3.0.0"] # Gets latest 1.x or 2.x version ``` ##### Step 3: Update your package versions[​](#step-3-update-your-package-versions "Direct link to Step 3: Update your package versions") Update your `packages.yml` or `dependencies.yml` file with the latest compatible versions: 1. In the Studio IDE, open your `packages.yml` or `dependencies.yml` file. 2. Update each package version to the latest compatible version. 3. Save the file. Before update: ```yaml packages: - package: dbt-labs/dbt_utils version: 0.9.6 - package: dbt-labs/codegen version: 0.9.0 ``` After update: ```yaml packages: - package: dbt-labs/dbt_utils version: [">=1.0.0", "<2.0.0"] - package: dbt-labs/codegen version: [">=0.12.0", "<1.0.0"] ``` ##### Step 4: Install updated packages[​](#step-4-install-updated-packages "Direct link to Step 4: Install updated packages") After updating your package versions, install them: 1. In the Studio IDE command line, run: ```bash dbt deps --upgrade ``` The `--upgrade` flag ensures dbt installs the latest versions within your specified ranges, updating the `package-lock.yml` file. 2. Review the output to confirm all packages installed successfully. 3. Check that the `package-lock.yml` file was updated with the new package versions. 
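Conceptually, the `--upgrade` flag picks the newest release that still satisfies your declared range. Here is a simplified sketch of that selection (not dbt's actual dependency resolver, and limited to plain `X.Y.Z` version strings):

```python
def parse_version(v: str) -> tuple:
    """Turn 'X.Y.Z' into a comparable tuple of ints."""
    return tuple(int(p) for p in v.split("."))

def in_range(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper, the shape of ['>=X', '<Y'] ranges."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)

def latest_in_range(available: list, lower: str, upper: str) -> str:
    """Pick the newest available release that satisfies the range."""
    candidates = [v for v in available if in_range(v, lower, upper)]
    return max(candidates, key=parse_version)

releases = ["0.9.6", "1.0.0", "1.3.1", "2.0.1", "3.0.0"]
print(latest_in_range(releases, "1.0.0", "3.0.0"))  # 2.0.1
```

Because the upper bound is exclusive, `3.0.0` is skipped and the newest `2.x` release wins, which is why a range like `[">=1.0.0", "<3.0.0"]` keeps you current without pulling in a breaking major version.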
About package-lock.yml The `package-lock.yml` file pins your packages to specific versions for reproducible builds. We recommend committing this file to version control so your entire team uses the same package versions. ##### Step 5: Test your project with updated packages[​](#step-5-test-your-project-with-updated-packages "Direct link to Step 5: Test your project with updated packages") After upgrading packages, test your project to ensure everything works: 1. Run a subset of your models to verify basic functionality: ```bash dbt run --select tag:daily ``` 2. Run your tests to catch any breaking changes (exact command may vary): ```bash dbt test ``` 3. If you encounter issues: * Review the package's changelog for breaking changes * Adjust your code to match new package behavior * If problems persist, temporarily pin to an older compatible version (if possible) ##### Step 6: Commit package updates[​](#step-6-commit-package-updates "Direct link to Step 6: Commit package updates") Once you've verified the updated packages work correctly: 1. In the **Version control** panel, stage your changes: * `packages.yml` or `dependencies.yml` * `package-lock.yml` 2. Add a commit message like "Upgrade dbt packages for Fusion compatibility". 3. Click **Commit and sync**. #### Check for known Fusion limitations[​](#check-for-known-fusion-limitations "Direct link to Check for known Fusion limitations") While Fusion supports most of dbt Core's capabilities, some features have limited support or are still in development. Before upgrading, review your project to identify any features that Fusion doesn't yet fully support. This allows you to plan accordingly — whether that means removing non-critical features, implementing workarounds, or waiting for specific features to become available. Fusion is rapidly evolving Many limitations are being addressed as Fusion moves toward General Availability. 
You can track progress on specific features through the [dbt-fusion GitHub milestones](https://github.com/dbt-labs/dbt-fusion/milestones) and stay updated via the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements). ##### Step 1: Review the limitations table[​](#step-1-review-the-limitations-table "Direct link to Step 1: Review the limitations table") Start by understanding which features have limited or no support in Fusion: Visit the [Fusion supported features page](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations) and review the limitations table to see features that may affect your project. Common limitations include: * **Model-level notifications:** Job-level notifications work, model-level don't yet * **Semantic Layer development:** Active semantic model development should stay on dbt Core * **SQLFluff linting:** Not integrated yet (though linting will be built into Fusion directly) ##### Step 2: Search your project for limited features[​](#step-2-search-your-project-for-limited-features "Direct link to Step 2: Search your project for limited features") Check if your project uses any features with limited support. For example: 1. Check for Python models: * Python models for Snowflake, BigQuery, and Databricks are supported in Fusion. If you use Python models on other data platforms, confirm [Fusion support](https://docs.getdbt.com/docs/fusion/supported-features.md) for your data platform. * In the Studio IDE, look in your `models/` directory * Search for files with `.py` extensions 2. Review your `dbt_project.yml` for specific configurations: * Look for `store_failures` settings * Check for custom materializations beyond `view`, `table`, and `incremental` * Review any `warn-error` or `warn-error-options` configurations 3. 
Check your job configurations: * Review any jobs using `--fail-fast` flag * Identify jobs using `--store-failures` * Note that [Advanced CI (dbt compare in orchestration)](https://docs.getdbt.com/docs/deploy/advanced-ci.md) is supported in Fusion. 4. Review model governance settings: * Search for models with `deprecation_date` set * Note these may not generate deprecation warnings yet in Fusion ##### Step 3: Assess the impact[​](#step-3-assess-the-impact "Direct link to Step 3: Assess the impact") For each limitation that affects your project, determine its criticality: * **Critical features:** Features your project can't function without: * Python models for Snowflake, BigQuery, and Databricks are supported in Fusion. If you use Python models on other data platforms, confirm [Fusion support](https://docs.getdbt.com/docs/fusion/supported-features.md) for your data platform. * If Semantic Layer development is active, continue those workloads on dbt Core * **Nice-to-have features:** Features that improve workflows but aren't blockers: * Model-level notifications can be replaced with job-level notifications temporarily * SQLFluff linting can continue running with dbt Core in CI * **Minimal impact:** Features you can easily work around: * `--fail-fast` can be removed from job commands * `--store-failures` can be disabled temporarily ##### Step 4: Create an action plan[​](#step-4-create-an-action-plan "Direct link to Step 4: Create an action plan") Based on your assessment, decide how to handle each limitation: * Remove non-critical features: Temporarily disable features you can live without: Before (in model config): ```sql {{ config( materialized='incremental', store_failures=true ) }} ``` After: ```sql {{ config( materialized='incremental' ) }} ``` * Implement workarounds for low-impact features. 
* Use job-level notifications instead of model-level * Run SQLFluff linting separately in CI with dbt Core * Use standard state selection instead of granular subselectors ##### Step 5: Document your findings[​](#step-5-document-your-findings "Direct link to Step 5: Document your findings") Create a record of limitations affecting your project: 1. In your Studio IDE, create a document (like `FUSION_MIGRATION.md`) listing: * Features your project uses that Fusion doesn't fully support * Which models or jobs are affected * Your mitigation strategy for each limitation * GitHub issue links to track when features become available 2. It's critical that your teams understand the limitations, so share this document with your stakeholders. ##### Step 6: Track feature progress[​](#step-6-track-feature-progress "Direct link to Step 6: Track feature progress") Stay up-to-date with feature availability: 1. Subscribe to relevant GitHub issues for features you need (linked in the [limitations table](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations)). 2. Follow the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements) for updates. 3. Check the [dbt-fusion milestones](https://github.com/dbt-labs/dbt-fusion/milestones) to see release timelines. #### What's next?[​](#whats-next "Direct link to What's next?") With limitations identified and addressed, you've completed all the preparation steps. Your project is now ready to upgrade to Fusion! Check out [Part 2: Making the move](https://docs.getdbt.com/guides/upgrade-to-fusion.md).
---

### Upgrade to Fusion part 2: Making the move

This guide helps you implement an in-place upgrade from the latest version of dbt Core to the dbt Fusion engine in the dbt platform.

[Back to guides](https://docs.getdbt.com/guides.md) dbt Fusion engine dbt platform Upgrade Intermediate

#### Introduction[​](#introduction "Direct link to Introduction")

private preview

The dbt Fusion engine is available as a private preview for all tiers of dbt platform accounts. dbt Labs is enabling Fusion only on accounts that have eligible projects. Following the steps outlined in this guide doesn't guarantee Fusion eligibility.

The dbt Fusion engine represents the next evolution of data transformation. dbt has been rebuilt from the ground up, but at its most basic, Fusion is a new version of dbt, and moving to it is the same as upgrading between dbt Core versions in the dbt platform. Once your project is Fusion ready, it's only a matter of pulling a few levers to make the move, but you have some flexibility in how you do so, especially in your development environments.

Once you complete the Fusion migration, your team will benefit from:

* ⚡ Up to 30x faster parsing and compilation
* 💰 30%+ reduction in warehouse costs (with state-aware orchestration)
* 🔍 Enhanced SQL validation and error messages
* 🚀 [State-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) for intelligent model rebuilding
* 🛠️ Modern development tools

Fusion availability

Fusion on the dbt platform is currently in `Private preview`. Enabling it for your account depends on your plan:

* **Enterprise and Enterprise+ plans:** Contact your account manager to enable Fusion for your environment.
* **Developer and Starter plans:** Complete the steps in the [Part 1: Preparing to upgrade](https://docs.getdbt.com/guides/prepare-fusion-upgrade.md) guide to become Fusion eligible, and it will be enabled for your account automatically so you can start the upgrade process.
#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

Before upgrading your development environment, confirm:

* Your project is on the **Latest** release track (completed in [Part 1: Preparing to upgrade](https://docs.getdbt.com/guides/prepare-fusion-upgrade.md)).
* Your project is using a supported adapter and authentication method:
  * BigQuery: Service Account / User Token, Native OAuth, External OAuth ([required permissions](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#required-permissions))
  * Databricks: Service Account / User Token, Native OAuth
  * Redshift: Username / Password, IAM profile
  * Snowflake: Username / Password, Native OAuth, External OAuth, key pair using a modern PKCS#8 method, MFA
* You have a developer license in dbt platform.
* Fusion has been enabled for your account.
* You have appropriate permissions to modify environments (see [Assign upgrade access](https://docs.getdbt.com/guides/upgrade-to-fusion?step=3#assign-upgrade-access-optional) if restricted).

#### Upgrade your development environment[​](#upgrade-your-development-environment "Direct link to Upgrade your development environment")

With your project prepared and tested on the **Latest** release track, you're ready to upgrade your development environment to Fusion. The dbt platform provides a guided upgrade assistant that walks you through the process and helps validate your project is Fusion ready.

Start with development

Always upgrade your development environment first before moving to production. This lets you and your team test Fusion in a safe environment and address any issues before they affect production workflows.
##### Assign upgrade access (optional)[​](#assign-upgrade-access-optional "Direct link to Assign upgrade access (optional)") By default, the Fusion upgrade assistant is visible to all users, but account admins can restrict access using the **Fusion admin** [permission set](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#fusion-admin). To limit access to the upgrade workflow: 1. Navigate to **Account settings** in dbt platform. 2. Select **Groups** and choose the group to grant access. 3. Click **Edit** and scroll to **Access and permissions**. 4. Click **Add permission** and select **Fusion admin** from the dropdown. 5. Select the project(s) users should access. 6. Click **Save**. [![Assign Fusion admin permissions to groups](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/assign-fusion-admin.png?v=2 "Assign Fusion admin permissions to groups")](#)Assign Fusion admin permissions to groups For more details on access control, see [Assign access to upgrade](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#assign-access-to-upgrade). ##### Step 1: Start the upgrade assistant[​](#step-1-start-the-upgrade-assistant "Direct link to Step 1: Start the upgrade assistant") Launch the Fusion upgrade workflow from your project: 1. Log into dbt platform and navigate to your project. 2. From the project homepage or sidebar, click **Start Fusion upgrade** or **Get started**. [![Start the Fusion upgrade from the project homepage](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/start-upgrade.png?v=2 "Start the Fusion upgrade from the project homepage")](#)Start the Fusion upgrade from the project homepage You'll be redirected to the Studio IDE with the upgrade assistant visible at the top.
##### Step 2: Check for deprecation warnings[​](#step-2-check-for-deprecation-warnings "Direct link to Step 2: Check for deprecation warnings") Even if you resolved deprecations in Part 1, run a final check to ensure nothing was missed: 1. At the top of the Studio IDE, click **Check deprecation warnings**. [![Check for deprecation warnings in your project](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/check-deprecations.png?v=2 "Check for deprecation warnings in your project")](#)Check for deprecation warnings in your project 2. Wait for the parse to complete (this may take a few moments depending on project size). 3. Review the results: * **No warnings found**: Skip to Step 4 to continue upgrading. * **Warnings found**: Continue to Step 3 to resolve them. Inconsistent Fusion warnings and `dbt-autofix` logs You may see Fusion deprecation warnings about packages not being compatible with Fusion, while `dbt autofix` indicates they are compatible. Use `dbt autofix` as the source of truth because it has additional context that Fusion warnings don't have yet. This conflict is temporary and will be resolved as soon as we implement and roll out `dbt-autofix`'s enhanced compatibility detection to Fusion warnings. ##### Step 3: Resolve remaining deprecations[​](#step-3-resolve-remaining-deprecations "Direct link to Step 3: Resolve remaining deprecations") If you find deprecation warnings, use the autofix tool to resolve them: 1. In the deprecation warnings list, click **Autofix warnings**. 2. Review the proposed changes in the dialog. 3. Click **Continue** to apply the fixes automatically. 4. Wait for the autofix tool to complete and run a follow-up parse. 5. Review the modified files in the **Version control** panel. 6. If all warnings are resolved, you'll see a success message. 
[![Success message when deprecations are resolved](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/autofix-success.png?v=2 "Success message when deprecations are resolved")](#)Success message when deprecations are resolved For detailed information about the autofix process, see [Fix deprecation warnings](https://docs.getdbt.com/docs/cloud/studio-ide/autofix-deprecations.md). Manual fixes required? If the autofix tool can't resolve all deprecations automatically, you'll need to fix them manually. Review the warning messages for specific guidance, make the necessary changes in your code, then run **Check deprecation warnings** again. ##### Step 4: Enable Fusion[​](#step-4-enable-fusion "Direct link to Step 4: Enable Fusion") After you resolve all deprecations, upgrade your development environment: 1. Click the **Enable Fusion** button at the top of the Studio IDE. 2. Confirm the upgrade when prompted. 3. Wait for the environment to update (this typically takes just a few seconds). Your development environment is now running on Fusion! ##### Step 5: Restart the IDE[​](#step-5-restart-the-ide "Direct link to Step 5: Restart the IDE") After upgrading, all users need to restart their IDE to connect to the new Fusion-powered environment: 1. If you're currently in the Studio IDE, refresh your browser window. 2. Notify your team members that they also need to restart their IDEs. ##### Step 6: Verify the upgrade[​](#step-6-verify-the-upgrade "Direct link to Step 6: Verify the upgrade") Confirm your development environment is running Fusion: 1. Open or create a dbt model file in the Studio IDE. 2. Look for Fusion-powered [features](https://docs.getdbt.com/docs/fusion/supported-features.md#features-and-capabilities): * Faster parsing and compilation times * Enhanced SQL validation and error messages * Improved autocomplete functionality 3. Run a simple command to test functionality: ```bash dbt compile ``` 4. 
Check the command output for significantly faster performance. ##### Step 7: Test your workflows[​](#step-7-test-your-workflows "Direct link to Step 7: Test your workflows") Before declaring victory, test your typical development workflows: 1. Make changes to a model and compile it by running `dbt compile`. 2. Run a subset of models: `dbt run --select model_name`. 3. Execute tests. 4. Preview results in the integrated query tool. 5. Verify Git operations (commit, push, pull) work as expected. Share feedback If you encounter any unexpected behavior or have feedback about the Fusion experience, share it with your account team or [dbt Support](https://docs.getdbt.com/docs/dbt-support.md). ##### What about production?[​](#what-about-production "Direct link to What about production?") Your development environment is now on Fusion, but your production environment and deployment jobs are still running on dbt Core. This is intentional as it gives you and your team time to: * Test Fusion thoroughly in development. * Build confidence in the new engine. * Identify and resolve any project-specific issues. * Train team members on any workflow changes. When you're ready to upgrade production, you'll update your deployment environments and jobs to use the `Latest Fusion` release track. We'll cover that in the next section. #### Upgrade staging and intermediate environments[​](#upgrade-staging-and-intermediate-environments "Direct link to Upgrade staging and intermediate environments") After successfully upgrading and testing your development environment, the next step is upgrading your staging or other intermediate deployment environments. These environments serve as a critical validation layer before promoting Fusion to production, allowing you to test with production-like data and workflows while limiting risk. Why upgrade staging first? 
Staging environments provide: * A final validation layer for Fusion with production-scale data * The ability to test scheduled jobs and deployment workflows * An opportunity to verify integrations and downstream dependencies * A safe environment to identify performance characteristics before production ##### What is a staging environment?[​](#what-is-a-staging-environment "Direct link to What is a staging environment?") A [staging environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#staging-environment) is a deployment environment that mirrors your production setup but uses non-production data or limited access credentials. It enables your team to test deployment workflows, scheduled jobs, and data transformations without affecting production systems. If you don't have a staging environment yet, consider creating one before upgrading production to Fusion. It provides an invaluable testing ground. ##### Step 1: Navigate to environment settings[​](#step-1-navigate-to-environment-settings "Direct link to Step 1: Navigate to environment settings") Access the settings for your staging or intermediate environment: 1. Log into dbt platform and navigate to your project. 2. Click **Orchestration** in the left sidebar. 3. Select **Environments** from the dropdown. 4. Click on your staging environment name to open its settings. 5. Click the **Edit** button in the top right. [![Navigate to environment settings](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-environment-settings.png?v=2 "Navigate to environment settings")](#)Navigate to environment settings ##### Step 2: Update the dbt version[​](#step-2-update-the-dbt-version "Direct link to Step 2: Update the dbt version") Change your staging environment to use the Fusion release track: 1. In the environment settings, scroll to the **dbt version** section. 2. Click the **dbt version** dropdown menu. 3. Select **Latest Fusion** from the list. 4. 
Scroll to the top and click **Save**. [![Select Latest Fusion from the dbt version dropdown](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-fusion.png?v=2 "Select Latest Fusion from the dbt version dropdown")](#)Select Latest Fusion from the dbt version dropdown Your staging environment is now configured to use Fusion! Any jobs associated with this environment will use Fusion on their next run. ##### Step 3: Run a test job[​](#step-3-run-a-test-job "Direct link to Step 3: Run a test job") Validate that Fusion works correctly in your staging environment by running a job: 1. From the **Environments** page, click on your staging environment. 2. Select an existing job or click **Create job** to make a new one. 3. Click **Run now** to execute the job immediately. 4. Monitor the job run in real-time by clicking into the run details. ##### Step 4: Monitor scheduled jobs[​](#step-4-monitor-scheduled-jobs "Direct link to Step 4: Monitor scheduled jobs") If you have scheduled jobs in your staging environment, monitor their next scheduled runs: 1. Navigate to **Deploy** → **Jobs** and filter to your staging environment. 2. Wait for scheduled jobs to run automatically (or trigger them manually). 3. Review job run history for any unexpected failures or warnings. 4. Compare run times to previous dbt Core runs. You should see significant improvements. ##### Step 5: Validate integrations and dependencies[​](#step-5-validate-integrations-and-dependencies "Direct link to Step 5: Validate integrations and dependencies") Test any integrations or dependencies that rely on your staging environment: 1. **Cross-project references**: If using [dbt Mesh](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md), verify downstream projects can still reference your staging models. 2. **BI tools**: Check that any BI tools or dashboards connected to staging still function correctly. 3. 
**Downstream consumers**: Notify teams that consume staging data to verify their processes still work. 4. **CI/CD workflows**: Run any CI jobs that target staging to ensure they execute properly. If you have other intermediate environments, repeat these steps for each of them. Found an issue? If you encounter problems in staging: * Review the [Fusion limitations](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations) to see if it's a known issue. * Check job logs for specific error messages. * Test the same models in your development environment to isolate the problem. * Contact [dbt Support](https://docs.getdbt.com/docs/dbt-support.md) or your account team for assistance. You can revert the staging environment to the **Latest** release track while investigating. ##### How long should I test in staging?[​](#how-long-should-i-test-in-staging "Direct link to How long should I test in staging?") The recommended testing period depends on your organization: * **Minimum**: Run all critical jobs at least once successfully. * **Recommended**: Monitor scheduled jobs for 3-7 days to catch any time-based or data-dependent issues. * **Enterprise/Complex projects**: Consider 1-2 weeks of testing, especially if you have many downstream dependencies. Don't rush this phase. Thorough testing in staging prevents production disruptions. *** #### Upgrade your production environment[​](#upgrade-your-production-environment "Direct link to Upgrade your production environment") Congratulations! You've successfully upgraded your development and staging environments, and you're now ready for the final step: upgrading your production environment to the dbt Fusion engine. Production environment upgrade considerations Upgrading production is a critical operation. While Fusion is production ready and has been thoroughly tested in your dev and staging environments, follow these best practices: * Plan the upgrade during a low-traffic window to minimize impact. * Notify stakeholders about the maintenance window. 
* Have a rollback plan ready (reverting to **Latest** release track). * Monitor closely for the first few job runs after upgrading. ##### Step 1: Plan your maintenance window[​](#step-1-plan-your-maintenance-window "Direct link to Step 1: Plan your maintenance window") Choose an optimal time to upgrade production: * **Review your job schedule:** Identify periods with minimal job activity. * **Check downstream dependencies:** Ensure dependent systems can tolerate brief interruptions. * **Notify stakeholders:** Inform BI tool users, data consumers, and team members. * **Document the plan:** Note which jobs to monitor and success criteria. ##### Step 2: Navigate to production environment settings[​](#step-2-navigate-to-production-environment-settings "Direct link to Step 2: Navigate to production environment settings") Access your production environment configuration: 1. Log into dbt platform and navigate to your project. 2. Click **Orchestration** in the left sidebar. 3. Select **Environments** from the dropdown. 4. Click on your production environment (typically marked with a **Production** badge). 5. Click the **Edit** button in the top right. [![Access production environment settings](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/choosing-dbt-version/example-environment-settings.png?v=2 "Access production environment settings")](#)Access production environment settings ##### Step 3: Upgrade to Latest Fusion[​](#step-3-upgrade-to-latest-fusion "Direct link to Step 3: Upgrade to Latest Fusion") Update your production environment to use Fusion: 1. In the environment settings, scroll to the **dbt version** section. 2. Click the **dbt version** dropdown menu. 3. Select **Latest Fusion** from the list. 4. Review your settings one final time to ensure everything is correct. 5. Scroll to the top and click **Save**. 
[![Select Latest Fusion for production](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cloud-upgrading-dbt-versions/upgrade-fusion.png?v=2 "Select Latest Fusion for production")](#)Select Latest Fusion for production Your production environment is now running on Fusion! ##### Step 4: Run an immediate test job[​](#step-4-run-an-immediate-test-job "Direct link to Step 4: Run an immediate test job") Validate the upgrade by running a job: 1. From the **Environments** page, click on your production environment. 2. Select a critical job that covers a good subset of your models. 3. Click **Run now** to execute the job immediately. 4. Monitor the job run closely: * Check the **parse** and **compile** steps. * Verify all models build successfully. * Confirm tests pass as expected. * Review the logs for any unexpected warnings. If the job succeeds, your production upgrade is successful! ##### Step 5: Enable state-aware orchestration (optional but recommended) [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")Enterprise+[​](#step-5-enable-state-aware-orchestration-optional-but-recommended- "Direct link to step-5-enable-state-aware-orchestration-optional-but-recommended-") One of Fusion's most powerful features is [state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md), which automatically determines which models need rebuilding based on code or data changes. This can reduce warehouse costs by 30% or more. New jobs automatically have state-aware orchestration enabled in Fusion environments. To enable it for existing jobs: 1. Navigate to **Deploy** → **Jobs**. 2. Click on a production job to open its settings. 3. Click **Edit** in the top right. 4. Scroll to **Execution settings**. 5. Check the box for **Enable Fusion cost optimization features**. 6. Expand **More options** to see additional settings: * **State-aware orchestration** * **Efficient testing** 7. Click **Save**. 
[![Enable Fusion cost optimization features](/img/docs/dbt-cloud/using-dbt-cloud/example-triggers-section.png?v=2 "Enable Fusion cost optimization features")](#)Enable Fusion cost optimization features Repeat this for all production jobs to maximize cost savings. For more details, see [Setting up state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-setup.md). Dropped tables and views If using state-aware orchestration, dbt doesn’t detect a change if a table or view is dropped outside of dbt, as the cache is unique to each dbt platform environment. This means state-aware orchestration will not rebuild that model until either there is new data or a change in the code that the model uses. To circumvent this limitation: * Use the **Clear cache** button on the target Environment page to force a full rebuild (acts like a reset), or * Temporarily disable State-aware orchestration for the job and rerun it. ##### Step 6: Monitor production jobs[​](#step-6-monitor-production-jobs "Direct link to Step 6: Monitor production jobs") Watch your production jobs closely for the first 24-48 hours: * **Check scheduled job runs:** Navigate to **Deploy** → **Jobs** → **Run history** * **Monitor run times:** Compare to historical averages. You should see significant improvements. * **Review the state-aware interface**: Check the [Models built and reused chart](https://docs.getdbt.com/docs/deploy/state-aware-interface.md) to see cost savings in action. * **Watch for warnings**: Review logs for any unexpected messages. State-aware monitoring With state-aware orchestration enabled, you'll see models marked as **Reused** in the job logs when they don't need rebuilding. This is expected behavior and indicates cost savings! ##### Step 7: Validate downstream integrations[​](#step-7-validate-downstream-integrations "Direct link to Step 7: Validate downstream integrations") Ensure all systems dependent on your production data still function correctly: 1. 
**BI tools:** Verify dashboards and reports refresh properly. 2. **Data consumers:** Confirm downstream teams can access and query data. 3. **APIs and integrations:** Test any applications that consume dbt outputs. 4. **Semantic Layer:** If using the [dbt Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md), verify metrics queries work. 5. **Alerts and monitoring**: Check that data quality alerts and monitors function correctly. ##### Step 8: Update any remaining jobs with version overrides[​](#step-8-update-any-remaining-jobs-with-version-overrides "Direct link to Step 8: Update any remaining jobs with version overrides") Some jobs might have [version overrides](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#override-dbt-version) set from earlier testing. Now that production is on Fusion, remove these overrides: 1. Navigate to **Orchestration** → **Jobs**. 2. Review each job's settings. 3. If a job has a version override (shown in the **dbt version** section), click **Edit**. 4. Remove the override to let the job inherit the environment's Fusion setting. 5. Click **Save**. ##### Rollback procedure[​](#rollback-procedure "Direct link to Rollback procedure") If you encounter critical issues in production, you can revert your dbt version: 1. Navigate to **Orchestration** → **Environments** → **Production**. 2. Click **Edit**. 3. Change **dbt version** from **Latest Fusion** back to **Latest**. 4. Click **Save**. 5. Jobs will use dbt Core on their next run. Rollback impact Rolling back to **Latest** will disable Fusion-specific features like state-aware orchestration. Only roll back if you're experiencing production-critical issues. #### Next steps[​](#next-steps "Direct link to Next steps") 🎉 Congratulations! You've successfully upgraded your entire dbt platform project to Fusion! 
For your next steps: * **Optimize further**: Explore [advanced state-aware configurations](https://docs.getdbt.com/docs/deploy/state-aware-setup.md#advanced-configurations) to fine-tune refresh intervals. * **Monitor savings**: Use the [state-aware interface](https://docs.getdbt.com/docs/deploy/state-aware-interface.md) to track models built vs. reused. * **Train your team**: Share Fusion features and best practices with your team. * **Explore new features**: Check out column-level lineage, live CTE previews, and other Fusion-powered capabilities. * **Stay informed**: Follow the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements) for updates on new features. Share your success We'd love to hear about your Fusion upgrade experience! Share feedback with your account team or join the [dbt Community Slack](https://www.getdbt.com/community/join-the-community/) to discuss Fusion with other users. --- ### Use Databricks workflows to run dbt jobs [Back to guides](https://docs.getdbt.com/guides.md) Databricks dbt Core dbt platform Orchestration Intermediate #### Introduction[​](#introduction "Direct link to Introduction") Using Databricks workflows to call the dbt job API can be useful for several reasons: 1. **Integration with other ETL processes** — If you're already running other ETL processes in Databricks, you can use a Databricks workflow to trigger a dbt job after those processes are done. 2. 
**dbt job features —** dbt gives you the ability to monitor job progress, manage historical logs and documentation, optimize model timing, and much [more](https://docs.getdbt.com/docs/deploy/deploy-jobs.md). 3. [**Separation of concerns —**](https://en.wikipedia.org/wiki/Separation_of_concerns) Detailed logs for dbt jobs in the dbt environment can lead to more modularity and efficient debugging. By doing so, it becomes easier to isolate bugs quickly while still being able to see the overall status in Databricks. 4. **Custom job triggering —** Use a Databricks workflow to trigger dbt jobs based on custom conditions or logic that aren't natively supported by dbt's scheduling feature. This can give you more flexibility in terms of when and how your dbt jobs run. ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * Active [Enterprise or Enterprise+ dbt account](https://www.getdbt.com/pricing/) * An existing, configured [dbt deploy job](https://docs.getdbt.com/docs/deploy/deploy-jobs.md) * Active Databricks account with access to [Data Science and Engineering workspace](https://docs.databricks.com/workspace-index.html) and [Manage secrets](https://docs.databricks.com/security/secrets/index.html) * [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) * **Note**: You only need to set up your authentication. Once you have set up your Host and Token and are able to run `databricks workspace ls /Users/`, you can proceed with the rest of this guide. #### Set up a Databricks secret scope[​](#set-up-a-databricks-secret-scope "Direct link to Set up a Databricks secret scope") 1. Retrieve a **[personal access token](https://docs.getdbt.com/docs/dbt-cloud-apis/user-tokens.md)** or **[service account token](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens.md#generating-service-account-tokens)** from dbt. 2. Set up a **Databricks secret scope**, which is used to securely store your dbt API key. 3. 
Enter the **following commands** in your terminal:

```bash
# In this example we set up a secret scope and key called "dbt-cloud" and "api-key" respectively.
databricks secrets create-scope --scope <scope-name>
databricks secrets put --scope <scope-name> --key <key-name> --string-value "<dbt-api-key>"
```

4. Replace `<scope-name>` and `<key-name>` with your own unique identifiers. Refer to [Databricks secrets](https://docs.databricks.com/security/secrets/index.html) for more information on secrets. 5. Replace `<dbt-api-key>` with the actual API key value that you copied from dbt in step 1. #### Create a Databricks Python notebook[​](#create-a-databricks-python-notebook "Direct link to Create a Databricks Python notebook") 1. [Create a **Databricks Python notebook**](https://docs.databricks.com/notebooks/notebooks-manage.html), which executes a Python script that calls the dbt job API. 2. Write a **Python script** that uses the `requests` library to make an HTTP POST request to the dbt job API endpoint with the required parameters. Here's an example script:

```python
import enum
import os
import time
import json
import requests
from getpass import getpass

dbutils.widgets.text("job_id", "Enter the Job ID")
job_id = dbutils.widgets.get("job_id")

account_id = <account_id>
base_url = "<base_url>"
api_key = dbutils.secrets.get(scope="<scope-name>", key="<key-name>")

# These run statuses are documented on the dbt API docs
class DbtJobRunStatus(enum.IntEnum):
    QUEUED = 1
    STARTING = 2
    RUNNING = 3
    SUCCESS = 10
    ERROR = 20
    CANCELLED = 30

def _trigger_job() -> int:
    res = requests.post(
        url=f"https://{base_url}/api/v2/accounts/{account_id}/jobs/{job_id}/run/",
        headers={'Authorization': f"Token {api_key}"},
        json={
            # Optionally pass a description that can be viewed within the API.
            # See the API docs for additional parameters that can be passed in,
            # including `schema_override`
            'cause': "Triggered by Databricks Workflows.",
        }
    )

    try:
        res.raise_for_status()
    except:
        print(f"API token (last four): ...{api_key[-4:]}")
        raise

    response_payload = res.json()
    return response_payload['data']['id']

def _get_job_run_status(job_run_id):
    res = requests.get(
        url=f"https://{base_url}/api/v2/accounts/{account_id}/runs/{job_run_id}/",
        headers={'Authorization': f"Token {api_key}"},
    )

    res.raise_for_status()
    response_payload = res.json()
    return response_payload['data']['status']

def run():
    job_run_id = _trigger_job()
    print(f"job_run_id = {job_run_id}")
    while True:
        time.sleep(5)
        status = _get_job_run_status(job_run_id)
        print(DbtJobRunStatus(status))
        if status == DbtJobRunStatus.SUCCESS:
            break
        elif status == DbtJobRunStatus.ERROR or status == DbtJobRunStatus.CANCELLED:
            raise Exception("Failure!")

if __name__ == '__main__':
    run()
```

3. Replace `<scope-name>` and `<key-name>` with the values you used [previously](#set-up-a-databricks-secret-scope). 4. Replace `<account_id>` and `<base_url>` with the correct values for your environment and [Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan. * To find these values, navigate to dbt, select **Deploy -> Jobs**. Select the job you want to run and copy the URL. For example: `https://YOUR_ACCESS_URL/deploy/000000/projects/111111/jobs/222222`. Your URL is structured `https://<base_url>/deploy/<account_id>/projects/<project_id>/jobs/<job_id>`, so the valid values here would be `account_id = 000000`, `job_id = 222222`, and `base_url = "cloud.getdbt.com"`. 5. Run the notebook. It will fail, but you should see a **`job_id` widget** at the top of your notebook. 6. In the widget, **enter your `job_id`** from step 4. 7. **Run the notebook again** to trigger the dbt job. 
Your results should look similar to the following:

```bash
job_run_id = 123456
DbtJobRunStatus.QUEUED
DbtJobRunStatus.QUEUED
DbtJobRunStatus.QUEUED
DbtJobRunStatus.STARTING
DbtJobRunStatus.RUNNING
DbtJobRunStatus.RUNNING
DbtJobRunStatus.RUNNING
DbtJobRunStatus.RUNNING
DbtJobRunStatus.RUNNING
DbtJobRunStatus.RUNNING
DbtJobRunStatus.RUNNING
DbtJobRunStatus.RUNNING
DbtJobRunStatus.SUCCESS
```

You can cancel the job from dbt if necessary. #### Configure the workflows to run the dbt jobs[​](#configure-the-workflows-to-run-the-dbt-jobs "Direct link to Configure the workflows to run the dbt jobs") You can set up workflows directly from the notebook or by adding the notebook to one of your existing workflows.

To create a workflow from the notebook:

1. Click **Schedule** on the upper right side of the page.
2. Click **Add a schedule**.
3. Configure the job name, schedule, and cluster.
4. Add a new parameter called `job_id` and fill in your job ID. Refer to [step 4 in the previous section](#create-a-databricks-python-notebook) to find your job ID.
5. Click **Create**.
6. Click **Run Now** to test the job.

To add the notebook to an existing workflow:

1. Open the existing **Workflow**.
2. Click **Tasks**.
3. Press the **"+" icon** to add a new task.
4. Enter the following:

| Field | Value |
| ---------- | ----------------------------- |
| Task name | `<task name>` |
| Type | Notebook |
| Source | Workspace |
| Path | `<path to notebook>` |
| Cluster | `<cluster>` |
| Parameters | `job_id`: `<your job id>` |

5. Select **Save Task**.
6. Click **Run Now** to test the workflow.

Multiple workflow tasks can be set up using the same notebook by configuring the `job_id` parameter to point to different dbt jobs. Using Databricks workflows to access the dbt job API can improve integration of your data pipeline processes and enable scheduling of more complex workflows. 
--- ### Use Jinja to improve your SQL code [Back to guides](https://docs.getdbt.com/guides.md) Jinja dbt Core Advanced #### Introduction[​](#introduction "Direct link to Introduction") In this guide, we're going to take a common pattern used in SQL, and then use Jinja to improve our code. If you'd like to work through this query, add [this CSV](https://github.com/dbt-labs/jaffle_shop/blob/core-v1.0.0/seeds/raw_payments.csv) to the `seeds/` folder of your dbt project, and then execute `dbt seed`. While working through the steps of this model, we recommend that you have your compiled SQL open as well, to check what your Jinja compiles to. To do this: * **Using dbt:** Click the compile button to see the compiled SQL in the right hand pane * **Using dbt Core:** Run `dbt compile` from the command line. Then open the compiled SQL file in the `target/compiled/{project name}/` directory. Use a split screen in your code editor to keep both files open at once. #### Write the SQL without Jinja[​](#write-the-sql-without-jinja "Direct link to Write the SQL without Jinja") Consider a data model in which an `order` can have many `payments`. Each `payment` may have a `payment_method` of `bank_transfer`, `credit_card` or `gift_card`, and therefore each `order` can have multiple `payment_methods`. From an analytics perspective, it's important to know how much of each `order` was paid for with each `payment_method`. 
In your dbt project, you can create a model, named `order_payment_method_amounts`, with the following SQL: models/order\_payment\_method\_amounts.sql ```sql select order_id, sum(case when payment_method = 'bank_transfer' then amount end) as bank_transfer_amount, sum(case when payment_method = 'credit_card' then amount end) as credit_card_amount, sum(case when payment_method = 'gift_card' then amount end) as gift_card_amount, sum(amount) as total_amount from {{ ref('raw_payments') }} group by 1 ``` The SQL for each payment method amount is repetitive, which can be difficult to maintain for a number of reasons: * If the logic or field name were to change, the code would need to be updated in three places. * Often this code is created by copying and pasting, which may lead to mistakes. * Other analysts that review the code are less likely to notice errors as it's common to only scan through repeated code. So we're going to use Jinja to help us clean it up, or to make our code more "DRY" ("Don't Repeat Yourself"). #### Use a for loop in models for repeated SQL[​](#use-a-for-loop-in-models-for-repeated-sql "Direct link to Use a for loop in models for repeated SQL") Here, the repeated code can be replaced with a `for` loop. The following will be compiled to the same query, but is significantly easier to maintain. /models/order\_payment\_method\_amounts.sql ```sql select order_id, {% for payment_method in ["bank_transfer", "credit_card", "gift_card"] %} sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount, {% endfor %} sum(amount) as total_amount from {{ ref('raw_payments') }} group by 1 ``` #### Set variables at the top of a model[​](#set-variables-at-the-top-of-a-model "Direct link to Set variables at the top of a model") We recommend setting variables at the top of a model, as it helps with readability, and enables you to reference the list in multiple places if required. 
This is a practice we've borrowed from many other programming languages.

/models/order\_payment\_method\_amounts.sql

```sql
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

select
    order_id,
    {% for payment_method in payment_methods %}
    sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount,
    {% endfor %}
    sum(amount) as total_amount
from {{ ref('raw_payments') }}
group by 1
```

#### Use loop.last to avoid trailing commas[​](#use-looplast-to-avoid-trailing-commas "Direct link to Use loop.last to avoid trailing commas")

In the above query, our last column is outside of the `for` loop. However, this may not always be the case. If the last iteration of a loop is our final column, we need to ensure there isn't a trailing comma at the end. We often use an `if` statement, along with the Jinja variable `loop.last`, to ensure we don't add an extraneous comma:

/models/order\_payment\_method\_amounts.sql

```sql
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

select
    order_id,
    {% for payment_method in payment_methods %}
    sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount
    {% if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('raw_payments') }}
group by 1
```

An alternative way to write this is `{{ "," if not loop.last }}`.
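The same trailing-comma problem comes up whenever SQL is generated from code. As a rough illustration (plain Python, not dbt; the names mirror the example above), here are both the `loop.last`-style guard and a join-based alternative:

```python
payment_methods = ["bank_transfer", "credit_card", "gift_card"]

# Equivalent of the {% if not loop.last %} guard: append a comma to
# every generated column except the final one.
columns = []
for i, method in enumerate(payment_methods):
    suffix = "," if i < len(payment_methods) - 1 else ""
    columns.append(
        f"sum(case when payment_method = '{method}' then amount end) as {method}_amount{suffix}"
    )

# A tidier alternative: build the expressions first and let join()
# place the separators, so a trailing comma is impossible.
exprs = [
    f"sum(case when payment_method = '{m}' then amount end) as {m}_amount"
    for m in payment_methods
]
sql = "select\n    order_id,\n    " + ",\n    ".join(exprs) + "\nfrom raw_payments\ngroup by 1"
```

Jinja has a `join` filter that supports the same join-based style, but the `loop.last` idiom is the one you'll most often see in dbt projects.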
#### Use whitespace control to tidy up compiled code[​](#use-whitespace-control-to-tidy-up-compiled-code "Direct link to Use whitespace control to tidy up compiled code")

If you've been checking your code in the `target/compiled` folder, you might have noticed that this code results in a lot of white space:

target/compiled/jaffle\_shop/order\_payment\_method\_amounts.sql

```sql
select
    order_id,

    sum(case when payment_method = 'bank_transfer' then amount end) as bank_transfer_amount
,

    sum(case when payment_method = 'credit_card' then amount end) as credit_card_amount
,

    sum(case when payment_method = 'gift_card' then amount end) as gift_card_amount

from raw_jaffle_shop.payments
group by 1
```

We can use [whitespace control](https://jinja.palletsprojects.com/page/templates/#whitespace-control) to tidy up our code:

models/order\_payment\_method\_amounts.sql

```sql
{%- set payment_methods = ["bank_transfer", "credit_card", "gift_card"] -%}

select
    order_id,
    {%- for payment_method in payment_methods %}
    sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount
    {%- if not loop.last %},{% endif -%}
    {% endfor %}
from {{ ref('raw_payments') }}
group by 1
```

Getting whitespace control right is often a lot of trial and error! We recommend that you prioritize the readability of your model code over the readability of the compiled code, and only do this as an extra polish.

#### Use a macro to return payment methods[​](#use-a-macro-to-return-payment-methods "Direct link to Use a macro to return payment methods")

Here, we've hardcoded the list of payment methods in our model. We may need to access this list from another model. A good solution here is to use a [variable](https://docs.getdbt.com/docs/build/project-variables.md), but for the purpose of this tutorial, we're going to instead use a macro!
[Macros](https://docs.getdbt.com/docs/build/jinja-macros.md#macros) in Jinja are pieces of code that can be called multiple times – they are analogous to a function in Python, and are extremely useful if you find yourself repeating code across multiple models.

Our macro is simply going to return the list of payment methods:

/macros/get\_payment\_methods.sql

```sql
{% macro get_payment_methods() %}
{{ return(["bank_transfer", "credit_card", "gift_card"]) }}
{% endmacro %}
```

There are a few things worth noting here:

* Normally, macros take arguments -- we'll see this later on, but for now, we still need to set up our macro with empty parentheses where the arguments would normally go (i.e. `get_payment_methods()`).
* We've used the [return](https://docs.getdbt.com/reference/dbt-jinja-functions/return.md) function to return a list – without this function, the macro would return a string.

Now that we have a macro for our payment methods, we can update our model as follows:

models/order\_payment\_method\_amounts.sql

```sql
{%- set payment_methods = get_payment_methods() -%}

select
    order_id,
    {%- for payment_method in payment_methods %}
    sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount
    {%- if not loop.last %},{% endif -%}
    {% endfor %}
from {{ ref('raw_payments') }}
group by 1
```

Note that we didn't use curly braces when calling the macro – we're already within a Jinja statement, so there's no need to use the brackets again.

#### Dynamically retrieve the list of payment methods[​](#dynamically-retrieve-the-list-of-payment-methods "Direct link to Dynamically retrieve the list of payment methods")

So far, we've been hardcoding the list of possible payment methods. If a new `payment_method` was introduced, or one of the existing methods was renamed, the list would need to be updated.
However, you can always find out which `payment_methods` have actually been used by running the following query:

```sql
select distinct
    payment_method
from {{ ref('raw_payments') }}
order by 1
```

[Statements](https://docs.getdbt.com/reference/dbt-jinja-functions/statement-blocks.md) provide a way to run this query and return the results to your Jinja context. This means that the list of `payment_methods` can be set based on the data in your database rather than a hardcoded value. The easiest way to use a statement is through the [run\_query](https://docs.getdbt.com/reference/dbt-jinja-functions/run_query.md) macro.

For the first version, let's check what we get back from the database, by logging the results to the command line using the [log](https://docs.getdbt.com/reference/dbt-jinja-functions/log.md) function.

macros/get\_payment\_methods.sql

```sql
{% macro get_payment_methods() %}

{% set payment_methods_query %}
select distinct
    payment_method
from {{ ref('raw_payments') }}
order by 1
{% endset %}

{% set results = run_query(payment_methods_query) %}

{{ log(results, info=True) }}

{{ return([]) }}

{% endmacro %}
```

The command line gives us back the following:

```
| column         | data_type |
| -------------- | --------- |
| payment_method | Text      |
```

This is actually an [Agate table](https://agate.readthedocs.io/page/api/table.html). To get the payment methods back as a list, we need to do some further transformation.
macros/get\_payment\_methods.sql

```sql
{% macro get_payment_methods() %}

{% set payment_methods_query %}
select distinct
    payment_method
from {{ ref('raw_payments') }}
order by 1
{% endset %}

{% set results = run_query(payment_methods_query) %}

{% if execute %}
{# Return the first column #}
{% set results_list = results.columns[0].values() %}
{% else %}
{% set results_list = [] %}
{% endif %}

{{ return(results_list) }}

{% endmacro %}
```

There are a few tricky pieces in here:

* We used the [execute](https://docs.getdbt.com/reference/dbt-jinja-functions/execute.md) variable so that this code only touches the query results during the `execute` stage of dbt. During the `parse` stage, `run_query` doesn't return results, so accessing `results.columns` would throw an error.
* We used Agate methods to get the column back as a list.

Fortunately, our model code doesn't need to be updated, since we're already calling the macro to get the list of payment methods. And now, any new `payment_methods` added to the underlying data model will automatically be handled by the dbt model.

#### Write modular macros[​](#write-modular-macros "Direct link to Write modular macros")

You may wish to use a similar pattern elsewhere in your dbt project. As a result, you decide to break up your logic into two separate macros -- one to generically return a column from a relation, and the other that calls this macro with the correct arguments to get back the list of payment methods.
macros/get\_payment\_methods.sql

```sql
{% macro get_column_values(column_name, relation) %}

{% set relation_query %}
select distinct
    {{ column_name }}
from {{ relation }}
order by 1
{% endset %}

{% set results = run_query(relation_query) %}

{% if execute %}
{# Return the first column #}
{% set results_list = results.columns[0].values() %}
{% else %}
{% set results_list = [] %}
{% endif %}

{{ return(results_list) }}

{% endmacro %}


{% macro get_payment_methods() %}

{{ return(get_column_values('payment_method', ref('raw_payments'))) }}

{% endmacro %}
```

#### Use a macro from a package[​](#use-a-macro-from-a-package "Direct link to Use a macro from a package")

Macros let analysts bring software engineering principles to the SQL they write. One of the features of macros that makes them even more powerful is their ability to be shared across projects.

A number of useful dbt macros have already been written in the [dbt-utils package](https://github.com/dbt-labs/dbt-utils). For example, the [get\_column\_values](https://github.com/dbt-labs/dbt-utils#get_column_values-source) macro from dbt-utils could be used instead of the `get_column_values` macro we wrote ourselves (saving us a lot of time, but at least we learnt something along the way!).

Install the [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) package in your project (docs [here](https://docs.getdbt.com/docs/build/packages.md)), and then update your model to use the macro from the package instead:

models/order\_payment\_method\_amounts.sql

```sql
{%- set payment_methods = dbt_utils.get_column_values(
    table=ref('raw_payments'),
    column='payment_method'
) -%}

select
    order_id,
    {%- for payment_method in payment_methods %}
    sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount
    {%- if not loop.last %},{% endif -%}
    {% endfor %}
from {{ ref('raw_payments') }}
group by 1
```

You can then remove the macros that we built in previous steps.
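The two-macro split above mirrors ordinary function composition, including the `execute` guard. A rough Python analogy (the names and the stubbed `run_query` are ours, not dbt's actual API):

```python
def get_column_values(column_name, relation, run_query, execute):
    """Generic helper: return the values of one column.

    `run_query` and `execute` stand in for dbt's Jinja context. During
    parsing, execute is False and run_query returns nothing useful, so
    we must not touch the results.
    """
    results = run_query(f"select distinct {column_name} from {relation} order by 1")
    if execute:
        # Analogous to results.columns[0].values() on the Agate table;
        # here results is a plain list of single-column rows.
        return [row[0] for row in results]
    return []


def get_payment_methods(run_query, execute):
    # Specific helper: just calls the generic one with fixed arguments.
    return get_column_values("payment_method", "raw_payments", run_query, execute)


# Simulate both stages with a stubbed warehouse:
fake_run_query = lambda sql: [("bank_transfer",), ("credit_card",), ("gift_card",)]
parsed = get_payment_methods(lambda sql: None, execute=False)   # parse stage
executed = get_payment_methods(fake_run_query, execute=True)    # execute stage
```

Note that without the guard, the parse-stage call would crash when iterating over `None`, which is exactly the error the `{% if execute %}` block prevents in the macro.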
Whenever you're trying to solve a problem that you think others may have solved previously, it's worth checking the [dbt-utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) package to see if someone has shared their code!

---

### Using BigQuery DataFrames with dbt Python models

[Back to guides](https://docs.getdbt.com/guides.md)

BigQuery Google GCP BigFrames Quickstart Intermediate

#### Introduction[​](#introduction "Direct link to Introduction")

In this guide, you'll learn how to set up dbt so you can use it with BigQuery DataFrames (BigFrames):

* Build scalable data transformation pipelines using dbt and Google Cloud, with SQL and Python.
* Leverage BigFrames from dbt for scalable BigQuery SQL.

In addition to the existing Dataproc/PySpark-based submission methods for executing Python models, you can now use the BigFrames submission method to execute Python models with pandas-like and scikit-learn-like APIs, without the need for any Spark setup or knowledge. BigQuery DataFrames is an open source Python package that transpiles pandas and scikit-learn code to scalable BigQuery SQL. The dbt-bigquery adapter relies on the BigQuery Studio Notebook Executor Service to run the Python client-side code.

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* A [Google Cloud account](https://cloud.google.com/free)
* A [dbt account](https://www.getdbt.com/signup/)
* Basic to intermediate SQL and Python skills
* A basic understanding of dbt fundamentals. We recommend the [dbt Fundamentals course](https://learn.getdbt.com).
During setup, you’ll need to select the **BigQuery** adapter and enter values for your **Google Cloud Storage Bucket** and **Dataproc Region** in the dbt platform. See [Configure BigQuery in dbt platform](https://docs.getdbt.com/guides/dbt-python-bigframes.md?step=2#configure-bigquery-in-dbt-platform) for details.

##### What you'll build[​](#what-youll-build "Direct link to What you'll build")

Here's what you'll build in two parts:

* Google Cloud project setup
  * A one-time setup to configure the Google Cloud project you’ll be working with.
* Build and run the Python model
  * Create, configure, and execute a Python model using BigQuery DataFrames and dbt. You will set up the environments, build scalable pipelines in dbt, and execute a Python model.

[![Implementation of the BigFrames submission method](/img/guides/gcp-guides/gcp-bigframes-architecture.png?v=2 "Implementation of the BigFrames submission method")](#)

**Figure 1** - Implementation of the BigFrames submission method for dbt Python models

#### Configure Google Cloud[​](#configure-google-cloud "Direct link to Configure Google Cloud")

The dbt BigFrames submission method supports both service account and OAuth credentials. You will use a service account in the following steps.

1. **Create a new Google Cloud project**

   a. Your new project will have a list of APIs already enabled, including BigQuery, which is required.
      * [Default APIs](https://cloud.google.com/service-usage/docs/enabled-service#default)

   b. Enabling the BigQuery API also enables the following additional APIs automatically:
      * [BigQuery APIs](https://cloud.google.com/bigquery/docs/enable-assets#automatic-api-enablement)

   c. Required APIs:
      * **BigQuery API:** For all core BigQuery operations.
      * **Vertex AI API:** To use the Colab Enterprise executor service.
      * **Cloud Storage API:** For staging code and logs.
      * **IAM API:** For managing permissions.
      * **Compute Engine API:** As an underlying dependency for the notebook runtime environment.
      * **Dataform API:** For managing the notebook code assets within BigQuery.

2. **Create a service account and grant IAM permissions**

   This service account will be used by dbt to read and write data on BigQuery and use BigQuery Studio Notebooks. Create the service account with IAM permissions:

   ```bash
   # Create the service account
   gcloud iam service-accounts create dbt-bigframes-sa

   # Grant the BigQuery User role
   gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
     --member=serviceAccount:dbt-bigframes-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
     --role=roles/bigquery.user

   # Grant the BigQuery Data Editor role. This can be restricted at the dataset level
   gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
     --member=serviceAccount:dbt-bigframes-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
     --role=roles/bigquery.dataEditor

   # Grant the Service Account User role
   gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
     --member=serviceAccount:dbt-bigframes-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
     --role=roles/iam.serviceAccountUser

   # Grant the Colab Enterprise User role
   gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
     --member=serviceAccount:dbt-bigframes-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
     --role=roles/aiplatform.colabEnterpriseUser
   ```

   When using a Shared VPC

   When using Colab Enterprise in a Shared VPC environment, additional roles are required for the following service accounts on the Shared VPC host project:

   * Vertex AI P4SA (`service-<project number>@gcp-sa-aiplatform.iam.gserviceaccount.com`): This service account always requires the Compute Network User (`roles/compute.networkUser`) role on the Shared VPC host project. Replace `<project number>` with the project number.
   * Colab Enterprise P6SA (`service-<project number>@gcp-sa-vertex-nb.iam.gserviceaccount.com`): This service account also needs the Compute Network User (`roles/compute.networkUser`) role on the Shared VPC host project.
     Replace `<project number>` with the project number.

3. *(Optional)* **Create a test BigQuery dataset**

   Create a new BigQuery dataset if you don't already have one:

   ```bash
   # Create a BigQuery dataset (dashes in the project ID become underscores)
   bq mk --location=${REGION} $(echo "${GOOGLE_CLOUD_PROJECT}" | tr '-' '_')_dataset
   ```

4. **Create a GCS bucket to stage the Python code and store logs**

   For temporary log and code storage, create a GCS bucket and assign the required permissions:

   ```bash
   # Create the GCS bucket
   gcloud storage buckets create gs://${GOOGLE_CLOUD_PROJECT}-bucket --location=${REGION}

   # Grant Storage Admin over the bucket to your service account
   gcloud storage buckets add-iam-policy-binding gs://${GOOGLE_CLOUD_PROJECT}-bucket \
     --member=serviceAccount:dbt-bigframes-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
     --role=roles/storage.admin
   ```

##### Configure BigQuery in the dbt platform[​](#configure-bigquery-in-the-dbt-platform "Direct link to Configure BigQuery in the dbt platform")

To set up your BigQuery DataFrames connection in the dbt platform, follow these steps:

1. Go to **Account settings** > **Connections**. Click **New connection**.
2. In the **Type** section, select **BigQuery**.
3. Select **BigQuery** as your adapter.
4. Under **Optional settings**, enter values for the following fields:
   * **Google Cloud Storage Bucket** (for example: `dbt_name_bucket`)
   * **Dataproc Region** (for example: `us-central1`)
5. Click **Save**. This is required so that BigFrames jobs execute correctly.

Refer to [Connect to BigQuery](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-bigquery.md) for more info on how to connect to BigQuery in the dbt platform.

#### Create, configure, and execute your Python models[​](#create-configure-and-execute-your-python-models "Direct link to Create, configure, and execute your Python models")

1. In your dbt project, create a SQL model in your models directory, ending in the `.sql` file extension. Name it `my_sql_model.sql`.
2. Copy this SQL into the file:
   ```sql
   select 1 as foo, 2 as bar
   ```

3. Now create a new model file in the models directory, named `my_first_python_model.py`.
4. In the `my_first_python_model.py` file, add this code:

   ```python
   def model(dbt, session):
       dbt.config(submission_method="bigframes")
       bdf = dbt.ref("my_sql_model")  # loading from the previous step
       return bdf
   ```

5. Configure the BigFrames submission method by using either:

   a. Project-level configuration via `dbt_project.yml`:

   ```yaml
   models:
     my_dbt_project:
       submission_method: bigframes
       python_models:
         +materialized: view
   ```

   or

   b. `dbt.config` in the `my_first_python_model.py` file:

   ```python
   def model(dbt, session):
       dbt.config(submission_method="bigframes")
       # rest of the python code...
   ```

6. Run `dbt run`.
7. You can view the logs in [dbt logs](https://docs.getdbt.com/reference/events-logging.md). You can optionally view the code and logs (including previous executions) from the [Colab Enterprise Executions](https://console.cloud.google.com/vertex-ai/colab/execution-jobs) tab and [GCS bucket](https://console.cloud.google.com/storage/browser) in the GCP console.
8. Congrats! You just created your first two Python models to run on BigFrames!
---

## Platform

### About Canvas

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Canvas helps you quickly access and transform data through a visual, drag-and-drop experience, with built-in AI for custom code generation.

Canvas allows organizations to enjoy the many benefits of code-driven development, such as increased precision, ease of debugging, and ease of validation, while retaining the flexibility to have different contributors develop wherever they are most comfortable. Users can also take advantage of built-in AI for custom code generation, making it an end-to-end frictionless experience.

These models compile directly to SQL and are indistinguishable from other dbt models in your projects:

* Visual models are version-controlled in your backing Git provider.
* All models are accessible across projects in [Mesh](https://docs.getdbt.com/best-practices/how-we-mesh/mesh-1-intro.md).
* Models can be materialized into production through [dbt orchestration](https://docs.getdbt.com/docs/deploy/deployments.md), or be built directly into a user's development schema.
* Models integrate with [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) and the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md).

[![Create or edit dbt models with Canvas, enabling everyone to develop with dbt through a drag-and-drop experience inside of dbt.](/img/docs/dbt-cloud/canvas/canvas.png?v=2 "Create or edit dbt models with Canvas, enabling everyone to develop with dbt through a drag-and-drop experience inside of dbt.")](#)
#### Canvas prerequisites[​](#canvas-prerequisites "Direct link to Canvas prerequisites")

Before using Canvas, you should:

* Have a [dbt Enterprise or Enterprise+](https://www.getdbt.com/pricing) account.
* Have a [developer license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) with developer credentials set up.
* Be using one of the following adapters:
  * BigQuery
  * Databricks
  * Redshift
  * Snowflake
  * Trino
  * You can access Canvas with adapters not listed here, but some features may be missing at this time.
* Use [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md), [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md), or [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) as your Git provider, connected to dbt via HTTPS.
  * SSH connections aren't supported at this time.
  * Self-hosted or on-premises deployments of any Git provider aren't supported for Canvas at this time.
* Have an existing dbt project with a Staging or Production run completed.
* Verify your Development environment is on a supported [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to receive ongoing updates.
* Have read-only access to the [Staging environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#staging-environment) with the data, to be able to execute `run` in the Canvas. To customize the required access for the Canvas user group, refer to [Set up environment-level permissions](https://docs.getdbt.com/docs/cloud/manage-access/environment-permissions-setup.md) for more information.
* Have the AI-powered features toggle enabled (for [Copilot integration](https://docs.getdbt.com/docs/cloud/dbt-copilot.md)).

#### Feedback[​](#feedback "Direct link to Feedback")

Always review AI-generated code and content, as it may produce incorrect results. To give feedback, please reach out to your dbt Labs account team.
We appreciate your feedback and suggestions as we improve Canvas.

#### Resources[​](#resources "Direct link to Resources")

Learn more about Canvas:

* How to [use Canvas](https://docs.getdbt.com/docs/cloud/use-canvas.md)
* The Canvas [quickstart guide](https://docs.getdbt.com/guides/canvas.md)
* The [Canvas fundamentals course](https://learn.getdbt.com/learn/course/canvas-fundamentals) on [dbt Learn](https://learn.getdbt.com/catalog)

---

### About dbt Copilot

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Copilot is a powerful, AI-powered assistant fully integrated into your dbt experience, designed to accelerate your analytics workflows. Copilot embeds AI-driven assistance across every stage of the [analytics development life cycle (ADLC)](https://www.getdbt.com/resources/guides/the-analytics-development-lifecycle) and harnesses rich metadata, capturing relationships, lineage, and context, so you can deliver refined, trusted data products at speed.

tip

Copilot is available on Starter, Enterprise, and Enterprise+ accounts. [Book a demo](https://www.getdbt.com/contact) to see how AI-driven development can streamline your workflow.
[![Example of using dbt Copilot to generate documentation in the IDE](/img/docs/dbt-cloud/cloud-ide/dbt-copilot-doc.gif?v=2 "Example of using dbt Copilot to generate documentation in the IDE")](#)

#### How dbt Copilot works[​](#how-dbt-copilot-works "Direct link to How dbt Copilot works")

Copilot enhances efficiency by automating repetitive tasks while ensuring data privacy and security. It works as follows:

* Access Copilot through:
  * The [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md) to generate documentation, tests, and semantic models. Access the [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md) from the same Copilot panel to build, refactor, and validate models end-to-end.
  * The [Canvas](https://docs.getdbt.com/docs/cloud/build-canvas-copilot.md) to generate SQL code using natural language prompts. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")
  * [Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md) to generate SQL queries for analysis using natural language prompts. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")
* Copilot gathers metadata (like column names, model SQL, and documentation) but never accesses row-level warehouse data.
* The metadata and user prompts are sent to the AI provider (in this case, OpenAI) through API calls for processing.
* The AI-generated content is returned to dbt for you to review, edit, and save within your project files.
* Copilot does not use warehouse data to train AI models.
* No sensitive data persists on dbt Labs' systems, except for usage data.
* Client data, including any personal or sensitive data inserted into the query by the user, is deleted within 30 days by OpenAI.
* Copilot uses a best-practice style guide to ensure consistency across teams.

tip

Copilot accelerates, but doesn’t replace, your analytics engineer. It helps deliver better data products faster, but always review AI-generated content, as it may be incorrect.

To learn about prompt best practices, check out the [Prompt cookbook](https://docs.getdbt.com/guides/prompt-cookbook.md).

---

### About dbt platform profiles

dbt platform profiles define the connections, credentials, and attributes you use to connect to a data warehouse. Assign profiles to [deployment environments](https://docs.getdbt.com/docs/dbt-cloud-environments.md#deployment-environment) and reuse those profiles in other deployment environments within the same project. You can manage profiles programmatically using our [API documentation](https://docs.getdbt.com/dbt-cloud/api-v3#/operations/List%20Profiles).

###### Considerations[​](#considerations "Direct link to Considerations")

* Profiles don't apply to development environments because of the unique configurations and individual credentials applied.
* The Semantic Layer configuration isn't supported with profiles yet.

#### Create a profile[​](#create-a-profile "Direct link to Create a profile")

New feature rollout

dbt automatically creates a new project-level profile for each deployment environment and populates it with your existing connection, credentials, and extended attributes. You don't need to take any action to create profiles for your existing projects.
You can create profiles from either the project or the environment settings. No matter which approach you take, dbt creates the profile at the project level. Profiles you create in one project won't be visible in others.

To create a new profile:

**From project settings**

1. From the main menu, navigate to your project's **Dashboard**.
2. Click **Settings**.
3. Scroll down to the **Profiles** section and click **Create new profile**.

[![Creating a profile from project settings.](/img/docs/dbt-cloud/profile-from-project.png?v=2 "Creating a profile from project settings.")](#)

**From environment settings**

1. From the main menu, click **Orchestration** and select **Environments**.
2. Click an available deployment environment.
3. Click **Settings**, then click **Edit**.
4. Navigate to the **Connection profiles** section, click the three-dot menu next to an existing profile, and select **Change profile**.
5. Click the **Profile** dropdown and select **Create new profile**.

[![Creating a profile from the environment settings.](/img/docs/dbt-cloud/profile-from-environment.png?v=2 "Creating a profile from the environment settings.")](#)

The following steps are the same regardless of which approach you take:

1. Give the profile a name that's unique across all projects in the account, easy to identify, and adheres to the naming policy:
   * Starts with a letter
   * Ends with a letter or number
   * Contains only letters, numbers, dashes, or underscores
   * Has no consecutive dashes or underscores
2. From **Connection details**, select a connection from the list of available [global connections](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md#connection-management) or add a new connection.
3. Configure the **Deployment credentials** for your warehouse connection.
4. Add any [**Extended attributes**](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) you need.
5. Click **Save** at the top of the screen.

[![Sample of a configured profile.](/img/docs/dbt-cloud/profile-sample.png?v=2 "Sample of a configured profile.")](#)

Repeat these steps until you've created all the profiles you need for your project's deployment environments.

#### Assign a profile[​](#assign-a-profile "Direct link to Assign a profile")

You configure profiles when you create a deployment environment. For accounts that already have environments configured when you enable profiles, dbt automatically creates and assigns a default profile to all projects. To assign a different profile, update the deployment environment settings:

1. From the main menu, click **Orchestration** and select **Environments**.
2. Click an available deployment environment.
3. Click **Settings**, then click **Edit**.
4. Navigate to the **Connection profiles** section, click the three-dot menu next to an existing profile, and select **Change profile**.
5. Click the **Profile** dropdown and select the new profile to assign.

#### Permissions and access to profiles[​](#permissions-and-access-to-profiles "Direct link to Permissions and access to profiles")

Profiles are created at the project level. Only users with permission to edit the project can create profiles, and anyone with permission to create or edit deployment environments in that project can assign those profiles and their credentials to those environments.

To avoid unintended access, only grant permission sets like **Job Admin** or **Project Admin** to users who should have access to all credentials in a project. Be mindful that profiles created at the project level can be used to configure credentials for any deployment environment in that project.
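The naming policy for profiles is easy to check mechanically. As an illustration (our own helper, not part of dbt or its APIs), a Python validator for the four rules listed above:

```python
import re

# Hypothetical validator for the documented profile naming policy:
# starts with a letter, ends with a letter or number, contains only
# letters, numbers, dashes, or underscores, and has no consecutive
# dashes or underscores.
NAME_RE = re.compile(
    r"^[A-Za-z]"                                # starts with a letter
    r"(?:(?:[A-Za-z0-9]|[-_](?=[A-Za-z0-9]))*"  # a dash/underscore must be followed by a letter/number
    r"[A-Za-z0-9])?$"                           # ends with a letter or number
)


def is_valid_profile_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming policy."""
    return bool(NAME_RE.match(name))
```

For example, `prod-warehouse` and `analytics_2` pass, while names like `2prod`, `prod--warehouse`, or `prod_` would be rejected.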
For more information on permission sets, see [Enterprise permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md).

#### FAQs[​](#faqs "Direct link to FAQs")

**Do I need to create profiles for all of my existing projects?**

You don't need to take any action. dbt automatically creates profiles for all existing projects and deployment environments based on the existing connection, credentials, and extended attributes.

**Are there any changes to development environments?**

Not at this time. Profiles only apply to deployment environments.

**What happens if I change my connection details, credentials, or attributes?**

Any profiles using those settings automatically update with the new information.

**What if I use APIs to configure project settings?**

Existing APIs continue to work and automatically map to a profile behind the scenes. You won't need to take any manual action unless you use APIs to create a deployment environment with no credentials configured. This is a rare occurrence unique to APIs, but it's the only scenario where dbt wouldn't create a profile. Profile-specific APIs are available; check out our [API documentation](https://docs.getdbt.com/docs/dbt-cloud-apis/overview.md) for more information.

**Does the Semantic Layer support profiles?**

Semantic Layer configuration isn't supported with profiles yet.

---

### About dbt setup

dbt platform (formerly dbt Cloud) is the fastest and most reliable way to deploy your dbt jobs. It contains a myriad of settings that admins can configure, from the necessities (data platform integration) to security enhancements (SSO) and quality-of-life features (RBAC).
This portion of our documentation takes you through the various settings in the dbt UI, including:

* [Connecting to a data platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md)
* Configuring access to [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md), [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md), or your own [git repo URL](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md)
* [Managing users and licenses](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md)
* [Configuring secure access](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md)

For steps on installing dbt development tools, refer to the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) or the [Studio IDE (browser-based)](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md).

These settings are intended for dbt administrators. If you need a more detailed first-time setup guide for specific data platforms, read our [quickstart guides](https://docs.getdbt.com/guides.md) or follow the [dbt platform configuration checklist](https://docs.getdbt.com/docs/configuration-checklist.md). If you want a more in-depth learning experience, we recommend taking the dbt Fundamentals course on our [dbt Learn site](https://learn.getdbt.com/).

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* To set up dbt, you'll need a dbt account with administrator access. If you still need to create a dbt account, [sign up today](https://getdbt.com) on our North American servers or [contact us](https://getdbt.com/contact) for international options.
* For the best experience using dbt, we recommend a modern, up-to-date web browser like Chrome, Safari, Edge, or Firefox.

---

### About Studio IDE

The dbt integrated development environment (Studio IDE) is a single interface for building, testing, running, and version-controlling dbt projects from your browser. With the Studio IDE, you can:

* Write modular SQL models with select statements and the `ref()` function
* Compile dbt code into SQL and execute it against your database directly
* Test every model before deploying it to production
* Generate and view documentation of your dbt project
* Leverage git and version-control your code from your browser with a couple of clicks
* Create and test Python models:
  * Compile Python models to see the full function that gets executed in your data platform
  * See Python models in the DAG in dbt version 1.3 and higher
  * Currently, you can't preview Python models
* Visualize a directed acyclic graph (DAG), and more

![The Studio IDE in dark mode](/img/docs/dbt-cloud/cloud-ide/cloud-ide-v2.png?v=2)

For more information, read the complete [Studio IDE guide](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md).

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Studio IDE user interface](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md)
* [Keyboard shortcuts](https://docs.getdbt.com/docs/cloud/studio-ide/keyboard-shortcuts.md)
---

### About the Studio IDE

The dbt integrated development environment (Studio IDE) is a single web-based interface for building, testing, running, and version-controlling dbt projects. It compiles dbt code into SQL and executes it directly on your database.

The Studio IDE offers several [keyboard shortcuts](https://docs.getdbt.com/docs/cloud/studio-ide/keyboard-shortcuts.md) and [editing features](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md#editing-features) for faster, more efficient development and governance:

* Syntax highlighting for SQL — Makes it easy to distinguish different parts of your code, reducing syntax errors and enhancing readability.
* AI copilot — Use [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md), an AI-powered assistant, to [generate code](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md#generate-and-edit-code) using natural language prompts and [create resources](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md#generate-resources) such as documentation, tests, and semantic models. With the [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md), you can generate or refactor models from natural language with plan-based, auditable changes. See [Develop with Copilot](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md) for more details.
* Auto-completion — Suggests table names, arguments, and column names as you type, saving time and reducing typos.
* Code [formatting and linting](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md) — Helps standardize and fix your SQL code effortlessly.
* Navigation tools — Easily move around your code, jump to specific lines, find and replace text, and navigate between project files.
* Version control — Manage code versions with a few clicks.
* Project documentation — Generate and view your [project documentation](#build-and-document-your-projects) for your dbt project in real time.
* Build, test, and run button — Build, test, and run your project with a button click or by using the Studio IDE command bar.

These [features](#studio-ide-features) create a powerful editing environment for efficient SQL coding, suitable for both experienced and beginner developers.

![The Studio IDE includes version control, files/folders, an editor, a command/console, and more.](/img/docs/dbt-cloud/cloud-ide/ide-basic-layout.png?v=2)

![Enable dark mode for a great viewing experience in low-light environments.](/img/docs/dbt-cloud/cloud-ide/cloud-ide-v2.png?v=2)

**Disable ad blockers:** To improve your experience using dbt, we suggest that you turn off ad blockers. Some project file names, such as `google_adwords.sql`, might resemble ad traffic and trigger ad blockers.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* A [dbt account](https://www.getdbt.com/signup) and [Developer seat license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md)
* A git repository set up, and your git provider must have `write` access enabled.
See [Connecting your GitHub Account](https://docs.getdbt.com/docs/cloud/git/connect-github.md) or [Importing a project by git URL](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) for detailed setup instructions.
* A dbt project connected to a [data platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md)
* A [development environment and development credentials](#get-started-with-the-studio-ide) set up
* The environment must be on dbt version 1.0 or higher

#### Studio IDE features[​](#studio-ide-features "Direct link to Studio IDE features")

The Studio IDE comes with features that make it easier for you to develop, build, compile, run, and test data models. To understand how to navigate the Studio IDE and its user interface elements, refer to the [Studio IDE user interface](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md) page.

| Feature | Description |
| ------- | ----------- |
| [**Studio IDE shortcuts**](https://docs.getdbt.com/docs/cloud/studio-ide/keyboard-shortcuts.md) | You can access a variety of [commands and actions](https://docs.getdbt.com/docs/cloud/studio-ide/keyboard-shortcuts.md) in the Studio IDE by choosing the appropriate keyboard shortcut. Use the shortcuts for common tasks like building modified models or resuming builds from the last failure. |
| **IDE version control** | The Studio IDE version control section and git button let you apply [version control](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md) to your project directly in the Studio IDE.<br>• Create or change branches, and execute git commands using the git button<br>• Commit or revert individual files by right-clicking the edited file<br>• [Resolve merge conflicts](https://docs.getdbt.com/docs/cloud/git/merge-conflicts.md)<br>• Link to the repo directly by clicking the branch name<br>• Edit, format, or lint files and execute dbt commands in your primary protected branch, and commit to a new branch<br>• Use Git diff view to see what has changed in a file before you make a pull request<br>• Use the **Prune branches** [button](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md#prune-branches-modal) to delete local branches that have been deleted from the remote repository, keeping your branch management tidy<br>• Sign your [git commits](https://docs.getdbt.com/docs/cloud/studio-ide/git-commit-signing.md) to mark them as 'Verified'. [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") |
| **Preview and Compile button** | You can [compile or preview](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md#console-section) code, a snippet of dbt code, or one of your dbt models after editing and saving. |
| [**Copilot**](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md) | A powerful AI-powered assistant that can [generate code](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md#generate-and-edit-code) using natural language and [generate resources](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md#generate-resources) (like documentation, tests, metrics, and semantic models) for you with the click of a button. [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") |
| [**Developer agent**](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md) | Autonomous AI agent in the Studio IDE that writes or refactors dbt models from natural language, validates with the dbt Fusion engine, and runs against your warehouse with full context. [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") |
| **Build, test, and run button** | Build, test, and run your project with the click of a button or by using the command bar. |
| **Command bar** | You can enter and run commands from the command bar at the bottom of the Studio IDE. Use the [rich model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md) to execute [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md) directly within dbt. You can also view the history, status, and logs of previous runs by clicking **History** on the left of the bar. |
| **Drag and drop** | Drag and drop files located in the file explorer, and use the file breadcrumb at the top of the Studio IDE for quick, linear navigation. Access adjacent files in the same folder by right-clicking the breadcrumb file. |
| **Organize tabs and files** | • Move your tabs around to reorganize your work in the IDE<br>• Right-click a tab to view and select a list of actions, including duplicating files<br>• Close multiple, unsaved tabs to batch save your work<br>• Double-click files to rename them |
| **Find and replace** | • Press Command-F or Control-F to open the find-and-replace bar in the upper right corner of the current file in the IDE. The IDE highlights your search results in the current file and code outline<br>• Use the up and down arrows to jump between matches when there are multiple matches in the current file<br>• Use the left arrow to replace the text with something else |
| **Multiple selections** | You can make multiple selections for small, simultaneous edits. The following commands are a common way to add more cursors, letting you insert cursors below or above with ease:<br>• Option-Command-Down arrow or Ctrl-Alt-Down arrow<br>• Option-Command-Up arrow or Ctrl-Alt-Up arrow<br>• Press Option and click on an area, or press Ctrl-Alt and click on an area |
| **Lint and Format** | [Lint and format](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md) your files with a click of a button, powered by SQLFluff, sqlfmt, Prettier, and Black. |
| **dbt autocomplete** | Autocomplete features to help you develop faster:<br>• Use `ref` to autocomplete your model names<br>• Use `source` to autocomplete your source name + table name<br>• Use `macro` to autocomplete your arguments<br>• Use `env var` to autocomplete environment variables<br>• Start typing a hyphen (-) to use in-line autocomplete in a YAML file<br>• Automatically create models from dbt sources with a click of a button |
| **DAG in the IDE** | You can see how models are used as building blocks from left to right to transform your data from raw sources into cleaned-up, modular derived pieces and final outputs on the far right of the DAG. The default view is 2+model+2 (displays two nodes away), but you can change it to +model+ (full DAG). Note the `--exclude` flag isn't supported. |
| **Status bar** | This area provides useful information about your Studio IDE and project status. You also have additional options like enabling light or dark mode, restarting the Studio IDE, or [recloning your repo](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md). |
| **Dark mode** | From the status bar in the Studio IDE, enable dark mode for a great viewing experience in low-light environments. |

##### Code generation[​](#code-generation "Direct link to Code generation")

The Studio IDE comes with a code-generation CodeLens feature that simplifies creating models from your sources with a click of a button. To use this feature, click the **Generate model** action next to each table in the source YAML file(s). It automatically creates a basic starting staging model for you to expand on. This feature helps streamline your workflow by automating the first steps of model generation.

##### dbt YAML validation[​](#dbt-yaml-validation "Direct link to dbt YAML validation")

Use dbt-jsonschema to validate dbt YAML files, helping you leverage the autocomplete and assistance capabilities of the Studio IDE. This also provides immediate feedback on YAML file structure and syntax, helping you make sure your project configurations meet the required standards.
#### Get started with the Studio IDE[​](#get-started-with-the-studio-ide "Direct link to Get started with the Studio IDE")

To start using the Studio IDE, you first need to set up a [dbt development environment](https://docs.getdbt.com/docs/dbt-cloud-environments.md). The following steps outline how to set up developer credentials and access the Studio IDE. If you're creating a new project, you configure this automatically during project setup.

The Studio IDE uses developer credentials to connect to your data platform. These developer credentials should be specific to your user, and they should *not* be superuser credentials or the same credentials you use for your production deployment of dbt.

Set up your developer credentials:

1. Navigate to your **Credentials** under **Your Profile** settings, which you can access at `https://YOUR_ACCESS_URL/settings/profile#credentials`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan.
2. Select the relevant project in the list.
3. Click **Edit** on the bottom right of the page.
4. Enter the details under **Development Credentials**.
5. Click **Save**.

   ![Configure developer credentials in your profile](/img/docs/dbt-cloud/refresh-ide/dev-credentials.png?v=2)
6. Navigate to the Studio IDE by clicking **Studio** in the left menu.
7. Initialize your project and familiarize yourself with the Studio IDE and its delightful [features](#studio-ide-features).

Nice job, you're ready to start developing and building models 🎉!

##### Considerations[​](#considerations "Direct link to Considerations")

* To improve your experience using dbt, we suggest that you turn off ad blockers. Some project file names, such as `google_adwords.sql`, might resemble ad traffic and trigger ad blockers.
* To preserve performance, there's a size limitation for repositories: if your repo is over 6 GB, please contact [dbt Support](mailto:support@getdbt.com) before running dbt.
* The Studio IDE's idle session timeout is one hour.

The following sections describe the start-up process and work retention in the Studio IDE.

###### Start-up process[​](#start-up-process "Direct link to Start-up process")

There are three start-up states when using or launching the Studio IDE:

* **Creation start —** The state where you start the IDE for the first time. You can also view this as a *cold start* (see below), and you can expect this state to take longer because the git repository is being cloned.
* **Cold start —** The process of starting a new develop session, which is available to you for one hour. The environment automatically turns off one hour after the last activity. Activity includes compile, preview, or any dbt invocation; it *does not* include editing and saving a file.
* **Hot start —** The state of resuming an existing or active develop session within one hour of the last activity.

###### Work retention[​](#work-retention "Direct link to Work retention")

The Studio IDE needs explicit action to save your changes. There are three ways your work is stored:

* **Unsaved, local code —** The browser stores your code only in its local storage. In this state, you might need to commit any unsaved changes in order to switch branches or browsers. If you have saved and committed changes, you can access the **Change branch** option even if there are unsaved changes. But if you attempt to switch branches without saving changes, a warning message will appear, notifying you that you will lose any unsaved changes.

  ![If you attempt to switch branches without saving changes, a warning message will appear, telling you that you will lose your changes.](/img/docs/dbt-cloud/cloud-ide/ide-unsaved-modal.png?v=2)
* **Saved but uncommitted code —** When you save a file, the data gets stored in durable, long-term storage, but isn't synced back to git. To switch branches using the **Change branch** option, you must "Commit and sync" or "Revert" changes. Changing branches isn't available for saved-but-uncommitted code. This ensures your uncommitted changes don't get lost.
* **Committed code —** This is stored in the branch with your git provider, and you can check out other (remote) branches.

#### Build and document your projects[​](#build-and-document-your-projects "Direct link to Build and document your projects")

* **Build, compile, and run projects** — You can *build*, *compile*, *run*, and *test* dbt projects using the command bar or **Build** button. Use the **Build** button to quickly build, run, or test the model you're working on. The Studio IDE updates in real time when you run models, tests, seeds, and operations.
  * If a model or test fails, dbt makes it easy for you to view and download the run logs for your dbt invocations to fix the issue.
  * Use dbt's [rich model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md) to [run dbt commands](https://docs.getdbt.com/reference/dbt-commands.md) directly within dbt.
  * Leverage [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md#special-environment-variables) to dynamically use the git branch name. For example, use the branch name as a prefix for a development schema.
  * Run [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md) to create and manage metrics in your project with the [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md).
* **Generate your YAML configurations with Copilot** — [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) is a powerful artificial intelligence (AI) feature that helps automate development in dbt. It can [generate code](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md#generate-and-edit-code) using natural language, and [generate resources](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md#generate-resources) (like documentation, tests, metrics, and semantic models) for you directly in the Studio IDE, so you can accomplish more in less time. [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")
* **Build and view your project's docs** — The Studio IDE makes it possible to [build and view](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) documentation for your dbt project while your code is still in development.
With this workflow, you can inspect and verify what your project's generated documentation will look like before your changes are released to production.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [How we style our dbt projects](https://docs.getdbt.com/best-practices/how-we-style/0-how-we-style-our-dbt-projects.md)
* [User interface](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md)
* [Version control basics](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md)
* [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md)

#### FAQs[​](#faqs "Direct link to FAQs")

**Is there a cost to using the Studio IDE?**

Not at all! You can use dbt when you sign up for the [Free Developer plan](https://www.getdbt.com/pricing/), which comes with one developer seat. If you'd like to access more features or have more developer seats, you can upgrade your account to the Starter, Enterprise, or Enterprise+ plan. Refer to [dbt pricing plans](https://www.getdbt.com/pricing/) for more details.

**Can I be a contributor to dbt?**

As a proprietary product, dbt's source code isn't available for community contributions. If you want to build something in the dbt ecosystem, we encourage you to review [this article](https://docs.getdbt.com/community/contributing/contributing-coding.md) about contributing to a dbt package, a plugin, dbt-core, or this documentation site. Participation in open source is a great way to level yourself up as a developer and give back to the community.

**What is the difference between developing on the Studio IDE, the dbt CLI, and dbt Core?**

You can develop dbt using the web-based IDE in dbt or on the command line interface using the dbt CLI or open-source dbt Core, all of which enable you to execute dbt commands. The key distinction between the dbt CLI and dbt Core is that the dbt CLI is tailored for dbt's infrastructure and integrates with all its features:

* Studio IDE: [dbt](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features) is a web-based application that allows you to develop dbt projects with the IDE, includes a purpose-built scheduler, and provides an easier way to share your dbt documentation with your team. The IDE is a faster and more reliable way to deploy your dbt models and provides a real-time editing and execution environment for your dbt project.
* dbt CLI: [The dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation) allows you to run dbt commands against your dbt development environment from your local command line or code editor. It supports cross-project ref, speedier and lower-cost builds, automatic deferral of build artifacts, and more.
* dbt Core: dbt Core is [open-source](https://github.com/dbt-labs/dbt) software that's freely available. You can build your dbt project in a code editor and run dbt commands from the command line.
---

### Access, Regions, & IP addresses

dbt is [hosted](https://docs.getdbt.com/docs/cloud/about-cloud/architecture.md) in multiple regions across the following service providers:

* [Amazon Web Services](#AWS)
* [Google Cloud Platform](#GCP)
* [Microsoft Azure](#Azure)

Your dbt account will always connect to your data platform or git provider from the IP addresses listed below. Be sure to allow traffic from these IPs in your firewall, and include them in any database grants.

* [dbt Enterprise-tier](https://www.getdbt.com/pricing/) plans can choose to have their account hosted in any of the regions listed in the following table.
* Organizations **must** choose a single region per dbt account. To run dbt in multiple regions, we recommend using multiple dbt accounts.

#### Amazon Web Services (AWS)[​](#AWS "Direct link to Amazon Web Services (AWS)")

| Region | Location | Access URL | IP addresses | Available plans | Status page link |
| ------ | -------- | ---------- | ------------ | --------------- | ---------------- |
| North America | AWS us-east-1 (N. Virginia) | ACCOUNT\_PREFIX.us1.dbt.com | 52.45.144.63<br>54.81.134.249<br>52.22.161.231<br>52.3.77.232<br>3.214.191.130<br>34.233.79.135 | [All dbt platform plans](https://www.getdbt.com/pricing/) | **Multi-tenant:**<br>[US AWS](https://status.getdbt.com/us-aws)<br>**Cell based:**<br>[US Cell 1 AWS](https://status.getdbt.com/us-cell-1-aws)<br>[US Cell 2 AWS](https://status.getdbt.com/us-cell-2-aws)<br>[US Cell 3 AWS](https://status.getdbt.com/us-cell-3-aws) |
| EMEA | eu-central-1 (Frankfurt) | ACCOUNT\_PREFIX.eu1.dbt.com | 3.123.45.39<br>3.126.140.248<br>3.72.153.148 | All Enterprise plans | [EMEA AWS](https://status.getdbt.com/emea-aws) |
| APAC | ap-southeast-2 (Sydney) | ACCOUNT\_PREFIX.au1.dbt.com | 52.65.89.235<br>3.106.40.33<br>13.239.155.206 | All Enterprise plans | [APAC AWS](https://status.getdbt.com/apac-aws) |
| Japan | ap-northeast-1 (Tokyo) | ACCOUNT\_PREFIX.jp1.dbt.com | 35.76.76.152<br>54.238.211.79<br>13.115.236.233 | All Enterprise plans | [JP Cell 1 AWS](https://status.getdbt.com/jp-cell-1-aws) |
| Virtual Private dbt or Single tenant | Customized | Customized | Ask [Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support) for your IPs | All Enterprise plans | Customized |

---

### Billing

dbt offers a variety of [plans and pricing](https://www.getdbt.com/pricing/) to fit your organization’s needs. With flexible billing options that appeal to large enterprises and small businesses and [server availability](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) worldwide, dbt platform is the fastest and easiest way to begin transforming your data.

#### How does dbt pricing work?[​](#how-does-dbt-pricing-work "Direct link to How does dbt pricing work?")

As a customer, you pay for the number of seats you have and the amount of usage consumed each month. Seats are billed primarily on the number of Developer and Read licenses purchased. Usage is based on the number of [Successful Models Built](#what-counts-as-a-successful-model-built) and, if purchased and used, Semantic Layer [Queried Metrics](#what-counts-as-a-queried-metric), subject to reasonable usage. All billing computations are conducted in Coordinated Universal Time (UTC).

##### What counts as a seat license?[​](#what-counts-as-a-seat-license "Direct link to What counts as a seat license?")

You can learn more about allocating users to your account in [Users and licenses](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md). There are four types of possible seat licenses:

* **Analyst** — for permission sets assigned and shared amongst those who don't need day-to-day access. Requires developer seat license purchase.
* **Developer** — for permission sets that require day-to-day interaction with the dbt platform.
* **IT** — for access to specific features related to account management (for example, configuring git integration).
* **Read-Only** — for access to view certain documents and reports.

##### What counts as a Successful Model Built?[​](#what-counts-as-a-successful-model-built "Direct link to What counts as a Successful Model Built?")

dbt counts a Successful Model Built as any model that is successfully built via a run through dbt’s orchestration functionality in a dbt deployment environment. Models are counted when they are built and run. This includes any jobs run via dbt's scheduler, CI builds (jobs triggered by pull requests), runs kicked off via the dbt API, and any successor dbt tools with similar functionality.

Models are counted even when the run they belong to fails to complete. For example, if a job contains 100 models and, on one of its runs, 51 models are successfully built before the job fails, 51 models are counted.

Any models built in a dbt development environment (for example, via the Studio IDE) do not count towards your usage. Tests, seeds, ephemeral models, and snapshots also do not count.

When a dynamic table is initially created, the model is counted (if the creation is successful). However, in subsequent runs, dbt skips these models unless the definition of the dynamic table has changed. This refers not to changes in the SQL logic but to changes in dbt's logic, specifically those governed by the [`on_configuration_change`](https://docs.getdbt.com/reference/resource-configs/on_configuration_change.md) config. The dynamic table continues to update on a cadence because the adapter, rather than dbt, orchestrates that refresh.
| What counts towards Successful Models Built | |
| ------------------------------------------- | -- |
| View | ✅ |
| Table | ✅ |
| Incremental | ✅ |
| Ephemeral Models | ❌ |
| Tests | ❌ |
| Seeds | ❌ |
| Snapshots | ❌ |

##### What counts as a Queried Metric?[​](#what-counts-as-a-queried-metric "Direct link to What counts as a Queried Metric?")

The Semantic Layer, powered by MetricFlow, measures usage in distinct Queried Metrics:

* Every successful request you make to render or run SQL to the Semantic Layer API counts as at least one queried metric, even if no data is returned.
* If the query calculates or renders SQL for multiple metrics, each calculated metric is counted as a queried metric.
* If a request to run a query is not executed successfully in the data platform, or a query results in an error without completing, it is not counted as a queried metric.
* Requests for metadata from the Semantic Layer are also not counted as queried metrics.
Examples of queried metrics include:

* Querying one metric, grouping by one dimension → 1 queried metric

  ```shell
  dbt sl query --metrics revenue --group-by metric_time
  ```

* Querying one metric, grouping by two dimensions → 1 queried metric

  ```shell
  dbt sl query --metrics revenue --group-by metric_time,user__country
  ```

* Querying two metrics, grouping by two dimensions → 2 queried metrics

  ```shell
  dbt sl query --metrics revenue,gross_sales --group-by metric_time,user__country
  ```

* Running a compile for one metric → 1 queried metric

  ```shell
  dbt sl query --metrics revenue --group-by metric_time --compile
  ```

* Running a compile for two metrics → 2 queried metrics

  ```shell
  dbt sl query --metrics revenue,gross_sales --group-by metric_time --compile
  ```

##### Viewing usage in the product[​](#viewing-usage-in-the-product "Direct link to Viewing usage in the product")

Viewing usage in the product is restricted to specific roles:

* Starter plan — Owner group
* Enterprise and Enterprise+ plans — Account and billing admin roles

For an account-level view of usage, if you have access to the **Billing** and **Usage** pages, you can see an estimate of the usage for the month. In the **Billing** page of the **Account Settings**, you can see how your account tracks against its usage. You can also see which projects are building the most models.

![To view account-level estimated usage, go to 'Account settings' and then select 'Billing'.](/img/docs/building-a-dbt-project/billing-usage-page.jpg?v=2 "To view account-level estimated usage, go to 'Account settings' and then select 'Billing'.")

As a Starter or Developer plan user, you can see how the account is tracking against the included models built. As an Enterprise plan user, you can see how much you have drawn down from your annual commit and how much remains.
On each **Project Home** page, any user with access to that project can see how many models are built each month. From there, additional details on top jobs by models built can be found on each **Environment** page.

![Your Project home page displays how many models are built each month.](/img/docs/building-a-dbt-project/billing-project-page.jpg?v=2 "Your Project home page displays how many models are built each month.")

In addition, the **Job Details** page's **Insights** tab shows how many models are being built per month for that particular job and which models are taking the longest to build.

![View how many models are being built per month for a particular job by going to the 'Insights' tab in the 'Job details' page.](/img/docs/building-a-dbt-project/billing-job-page.jpg?v=2 "View how many models are being built per month for a particular job by going to the 'Insights' tab in the 'Job details' page.")

Usage information is available to customers on consumption-based plans, and some usage visualizations might not be visible to customers on legacy plans. Any usage data shown in dbt is only an estimate of your usage, and there could be a delay in showing usage data in the product. Your final usage for the month will be visible on your monthly statements (applicable to Starter and Enterprise-tier plans).

#### dbt Copilot: Usage metering and limiting[​](#dbt-copilot-usage-metering-and-limiting- "Direct link to dbt Copilot: Usage metering and limiting")

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise+](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Copilot usage is measured based on the number of completed AI requests, known as Copilot actions.
Usage limits are enforced to ensure fair access and system performance. A defined number of Copilot invocations is allocated monthly based on your [subscription plan](https://www.getdbt.com/pricing). Once the usage limit is reached, access to Copilot functionality is temporarily disabled until the start of the next billing cycle.

##### Usage and metering information[​](#usage-and-metering-information "Direct link to Usage and metering information")

Copilot actions are requests made to the Copilot assistant through the dbt interface. These actions are recorded and displayed on the billing page alongside other usage metrics. The following interactions count as Copilot actions:

* **Each inline generation** — Every time Copilot writes or suggests code in your file, it counts toward your usage limit.
* **Each generation of documentation, tests, semantic models, or metrics** — Any time you ask Copilot to automatically create things like documentation, tests, data models, or metrics, it counts as one interaction.
* **Each generation within Copilot chats on Canvas or Insights** — Any time you use Copilot chat in Canvas or Insights to generate something, it counts as an interaction.

The following table outlines the limits on Copilot actions per license per month by plan:

| Plan | Limit |
| ----------- | ------ |
| Developer | ❌ |
| Starter\* | 500 |
| Enterprise | 5,000 |
| Enterprise+ | 10,000 |

\*Team plan customers who enrolled in the Copilot Beta prior to March 19, 2025 have access to Copilot. All other legacy Team plan customers must move to the [Starter plan or above](https://www.getdbt.com/pricing) to get access.

When usage limits are reached, a notification appears in the UI.
Additionally, an email notification is sent to the designated recipient. On the Starter plan, the account owner receives an email notification when the usage limit is reached. On the Enterprise and Enterprise+ plans, both the billing administrator and the account administrator are notified by email.

Once usage limits are reached, attempts to perform an action in Copilot trigger a banner notification indicating that the limit has been exceeded.

Under Bring Your Own Key (BYOK), usage is not tracked by Copilot and is subject to your OpenAI limits.

##### Viewing usage in the product[​](#viewing-usage-in-the-product-1 "Direct link to Viewing usage in the product")

To view the usage in your account:

1. Navigate to [**Account settings**](https://docs.getdbt.com/docs/cloud/account-settings.md).
2. Select **Billing** under the Settings header.
3. On the billing page, click **Copilot** to view your usage.

![View usage in Copilot](/img/docs/dbt-cloud/view-usage-in-copilot.gif?v=2 "View usage in Copilot")

#### Plans and Billing[​](#plans-and-billing "Direct link to Plans and Billing")

dbt offers several [plans](https://www.getdbt.com/pricing) with different features to meet your needs. We may make changes to our plan details from time to time, and we'll always let you know in advance so you can be prepared. The following sections explain how billing works on each plan.

##### Developer plan billing[​](#developer-plan-billing "Direct link to Developer plan billing")

Developer plans are free and include one Developer license and 3,000 models each month. The model count resets at the beginning of each calendar month. If you exceed 3,000 models, any subsequent runs are canceled until the count resets or until you upgrade to a paid plan. The rest of the dbt platform remains accessible, and no work is lost.
All included successful models built numbers above reflect our most current pricing and packaging. Depending on the usage terms you agreed to when you signed up for the Developer plan, your included model entitlements may differ from what’s reflected above.

##### Starter plan billing[​](#starter-plan-billing "Direct link to Starter plan billing")

Starter customers pay monthly via credit card for seats and usage, and accounts include 15,000 models monthly. Seats are charged upfront at the beginning of the month. If you add seats during the month, the added seats are prorated and charged on the same day. Seats removed during the month are reflected on the next invoice and are not eligible for refunds. You can change the credit card information and the number of seats from the billing section at any time.

Accounts receive one monthly invoice that includes the upfront charge for seats and the usage charged in arrears for the previous month. If you exceed 15,000 models in any month, you will be billed for additional usage on your next invoice. Additional usage is billed at the rates on our [pricing page](https://www.getdbt.com/pricing). Included models that are not consumed do not roll over to future months.

You can estimate your bill with a simple formula:

`($100 x number of developer seats) + ((models built - 15,000) x $0.01)`

All included successful models built numbers above reflect our most current pricing and packaging. Depending on the usage terms you agreed to when you signed up for the Starter plan, your included model entitlements may differ from what’s reflected above.

##### Enterprise plan billing[​](#enterprise-plan-billing "Direct link to Enterprise plan billing")

As an Enterprise customer, you pay annually via invoice, monthly in arrears for additional usage (if applicable), and may benefit from negotiated usage rates.
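Returning to the Starter plan estimate above, the formula can be sketched as a small function. This is only an illustration using the rates quoted in that formula ($100 per developer seat, $0.01 per model beyond the included 15,000); check the pricing page for current rates:

```python
def estimate_starter_bill(seats: int, models_built: int) -> float:
    """Estimate a monthly Starter plan bill:
    ($100 x developer seats) + ((models built - 15,000) x $0.01).
    Only models beyond the 15,000 included each month are billed."""
    overage = max(0, models_built - 15_000)  # unused allowance doesn't roll over
    return seats * 100 + overage * 0.01

# 3 seats and 20,000 models built: $300 + 5,000 x $0.01 = $350
print(estimate_starter_bill(3, 20_000))  # → 350.0
```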
Please refer to your order form or contract for your specific pricing details, or [contact the account team](https://www.getdbt.com/contact-demo) with any questions. Enterprise plan billing information is not available in the dbt UI; changes are handled through your dbt Labs Solutions Architect or account team manager.

##### Legacy plans[​](#legacy-plans "Direct link to Legacy plans")

Customers who purchased the dbt Starter (formerly Team) plan before August 11, 2023, remain on a legacy pricing plan as long as their account is in good standing. The legacy pricing plan is based on seats and includes unlimited models, subject to reasonable use.

For customers using the legacy Semantic Layer with the dbt\_metrics package, this product was deprecated in December 2023. Legacy users may choose to upgrade at any time to the revamped version, the Semantic Layer powered by MetricFlow. The revamped version is available to most customers (see the [prerequisites](https://docs.getdbt.com/guides/sl-snowflake-qs.md#prerequisites)) for a limited time on a free trial basis, subject to reasonable use. dbt Labs may institute use limits if reasonable use is exceeded. Additional features, upgrades, or updates may be subject to separate charges. Any changes to your current plan pricing will be communicated in advance according to our Terms of Use.

#### Managing usage[​](#managing-usage "Direct link to Managing usage")

From dbt, click on your account name in the left side menu and select **Account settings**. The **Billing** option is in the left side menu under the **Settings** heading. Here, you can view the available plans and the features provided with each.

##### Usage notifications[​](#usage-notifications "Direct link to Usage notifications")

Every plan automatically sends email alerts when 75%, 90%, and 100% of usage estimates have been reached.

* Starter plan — All users within the Owner group receive alerts.
* Enterprise-tier plans — All users with the Account Admin and Billing Admin [permission sets](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#permission-sets) receive alerts.

Users cannot opt out of these emails. To have additional users receive these alert emails, assign them the applicable permissions mentioned earlier. Note that your usage may already be higher than the percentage indicated in the alert due to your usage pattern and minor latency.

##### How do I stop usage from accruing?[​](#how-do-i-stop-usage-from-accruing "Direct link to How do I stop usage from accruing?")

There are two options for stopping models from being built and charged:

1. Open the **Job Settings** of every job and navigate to the **Triggers** section. Disable **Run on Schedule** and set the **Continuous Integration** feature **Run on Pull Requests?** to **No**. Check your workflows to ensure that you are not triggering any runs via the dbt API. This option lets you keep your dbt jobs without building more models.
2. Alternatively, you can delete some or all of your dbt jobs. This ensures that no runs are kicked off, but you will permanently lose your job(s).

#### Optimize costs in dbt[​](#optimize-costs-in-dbt "Direct link to Optimize costs in dbt")

dbt offers ways to optimize your models built usage and warehouse costs.

##### Best practices for optimizing successful models built[​](#best-practices-for-optimizing-successful-models-built "Direct link to Best practices for optimizing successful models built")

There are ways to reduce your costs from successful models built while still adhering to best practices. To ensure that you are still running tests and rebuilding views when logic changes, implement the combination of the following best practices that fits your needs.
In particular, if you decide to exclude views from your regularly scheduled dbt job runs, it's imperative that you set up a merge job (see [Build only changed views](#build-only-changed-views)) to deploy updated view logic when changes are detected.

###### Exclude views in a dbt job[​](#exclude-views-in-a-dbt-job "Direct link to Exclude views in a dbt job")

Many dbt users utilize views, which don’t always need to be rebuilt every time you run a job. For any jobs whose views *do not* include macros that dynamically generate code (for example, case statements) based on upstream tables and also *do not* have tests, you can implement these steps:

1. Go to your current production deployment job in dbt.
2. Modify your command to include: `--exclude config.materialized:view`.
3. Save your job changes.

If you have views that contain macros with case statements based on upstream tables, these need to be run each time to account for new values. If you still need to test your views with each run, follow the [Exclude views while still running tests](#exclude-views-while-running-tests) best practice to create a custom selector.

###### Exclude views while running tests[​](#exclude-views-while-running-tests "Direct link to Exclude views while running tests")

Running tests for views in every job run can help keep data quality intact and save you from the need to rerun failed jobs. To exclude views from your job run while still running tests, follow these steps to create a custom [selector](https://docs.getdbt.com/reference/node-selection/yaml-selectors.md) for your job command:

1. Open your dbt project in the Studio IDE.
2. Add a file called `selectors.yml` in your top-level project folder.
3. In the file, add the following code:

   ```yaml
   selectors:
     - name: skip_views_but_test_views
       description: >
         A default selector that will exclude materializing views
         without skipping tests on views.
       default: true
       definition:
         union:
           - union:
               - method: path
                 value: "*"
               - exclude:
                   - method: config.materialized
                     value: view
                   - method: resource_type
                     value: test
   ```

4. Save the file and commit it to your project.
5. Modify your dbt jobs to include `--selector skip_views_but_test_views`.

###### Build only changed views[​](#build-only-changed-views "Direct link to Build only changed views")

If you want to ensure that views are rebuilt whenever their logic changes, create a merge job that is triggered when code is merged into main:

1. Ensure you have a [CI job set up](https://docs.getdbt.com/docs/deploy/ci-jobs.md) in your environment.
2. Create a new [deploy job](https://docs.getdbt.com/docs/deploy/deploy-jobs.md#create-and-schedule-jobs) and call it "Merge Job".
3. Set the **Environment** to your CI environment. Refer to [Types of environments](https://docs.getdbt.com/docs/deploy/deploy-environments.md#types-of-environments) for more details.
4. Set **Commands** to: `dbt run -s state:modified+`. Executing `dbt build` in this context is unnecessary because the CI job was used to both run and test the code that just got merged into main.
5. Under the **Execution Settings**, select the job to compare changes against:
   * **Defer to a previous run state** — Select the "Merge Job" you created so the job compares and identifies what has changed since the last merge.
6. In your dbt project, follow the steps in Run a dbt job on merge in the [Customizing CI/CD with custom pipelines](https://docs.getdbt.com/guides/custom-cicd-pipelines.md) guide to create a script that triggers the dbt API to run your job after a merge happens in your git repository, or watch this [video](https://www.loom.com/share/e7035c61dbed47d2b9b36b5effd5ee78?sid=bcf4dd2e-b249-4e5d-b173-8ca204d9becb).

The purpose of the merge job is to:

* Immediately deploy any changes from PRs to production.
* Ensure your production views remain up to date with how they’re defined in your codebase while remaining cost-efficient when running jobs in production.

The merge job will optimize your cloud data platform spend and shorten job times, but you’ll need to decide whether making the change is right for your dbt project.

##### Rework inefficient models[​](#rework-inefficient-models "Direct link to Rework inefficient models")

###### Job Insights tab[​](#job-insights-tab "Direct link to Job Insights tab")

To reduce your warehouse spend, you can identify which models, on average, take the longest to build on the **Job** page under the **Insights** tab. This chart looks at the average run time for each model based on its last 20 runs. Any models that take longer than anticipated to build are prime candidates for optimization, which will ultimately reduce cloud warehouse spending.

###### Model Timing tab[​](#model-timing-tab "Direct link to Model Timing tab")

To better understand how long each model takes to run within the context of a specific run, look at the **Model Timing** tab. Select the run of interest on the **Run History** page, then on that **Run** page, click **Model Timing**.

Once you've identified which models could be optimized, check out these other resources that walk through how to optimize your work:

* [Build scalable and trustworthy data pipelines with dbt and BigQuery](https://services.google.com/fh/files/misc/dbt_bigquery_whitepaper.pdf)
* [Best Practices for Optimizing Your dbt and Snowflake Deployment](https://www.snowflake.com/wp-content/uploads/2021/10/Best-Practices-for-Optimizing-Your-dbt-and-Snowflake-Deployment.pdf)
* [How to optimize and troubleshoot dbt models on Databricks](https://docs.getdbt.com/guides/optimize-dbt-models-on-databricks.md)

#### FAQs[​](#faqs "Direct link to FAQs")

* What happens if I need more seats on the Starter plan?
  *If you need more developer seats, select the [Contact Sales](https://www.getdbt.com/contact) option from the billing settings to talk to our sales team about an Enterprise or Enterprise+ plan.*

* What if I go significantly over my included free models on the Starter or Developer plan?

  *Consider upgrading to a Starter or Enterprise-tier plan. Starter and Enterprise-tier plans include more models and allow you to exceed the monthly usage limit. Enterprise accounts are supported by a dedicated account management team and offer annual plans, custom configurations, and negotiated usage rates.*

* I want to upgrade my plan. Will all of my work carry over?

  *Yes. Your dbt account will be upgraded without impacting your existing projects and account settings.*

* How do I determine the right plan for me?

  *The best option is to consult with our sales team. They'll help you figure out what is right for your needs. We also offer a free two-week trial on the Starter plan.*

* What are the Semantic Layer trial terms?

  *Starter and Enterprise-tier customers can sign up for a free trial of the Semantic Layer, powered by MetricFlow, for use of up to 1,000 Queried Metrics per month. The trial will be available at least through January 2024. dbt Labs may extend the trial period in its sole discretion. During the trial period, we may reach out to discuss pricing options or ask for feedback. At the end of the trial, free access may be removed and a purchase may be required to continue use. dbt Labs reserves the right to change limits in a free trial or institute pricing when required or at any time in its sole discretion.*

* What is the reasonable use limitation for the Semantic Layer powered by MetricFlow during the trial?

  *Each account is limited to 1,000 Queried Metrics per month during the trial period; this limit may be changed at the sole discretion of dbt Labs.*
---

### Build with dbt Copilot

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Use Copilot to build visual models in the Canvas with natural language prompts. [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) seamlessly integrates with [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md), a drag-and-drop experience that helps you build your visual models using natural language prompts.

Before you begin, make sure you can access [Canvas](https://docs.getdbt.com/docs/cloud/use-canvas.md#access-canvas).

To begin building models with natural language prompts in the Canvas:

1. Click the **dbt Copilot** icon in the Canvas menu.
2. In the dbt Copilot prompt box, enter a natural language prompt describing the model(s) you want Copilot to build. You can also reference existing models using the `@` symbol. For example, to build a model that calculates the total price of orders, enter `@orders` in the prompt and Copilot will pull in and reference the `orders` model.
3. Click **Generate** and dbt Copilot generates a summary of the model(s) you want to build.
   * To start over, click the **+** icon. To close the prompt box, click **X**.

   ![Enter a prompt in the dbt Copilot prompt box to build models using natural language](/img/docs/dbt-cloud/copilot-generate.jpg?v=2 "Enter a prompt in the dbt Copilot prompt box to build models using natural language")

4.
Click **Apply** to generate the model(s) in the Canvas.
5. dbt Copilot displays a visual "diff" view to help you compare the proposed changes with your existing code. Review the diff view in the canvas to see the operators generated by Copilot:
   * White — at the top of the canvas; the existing setup (or blank canvas) that will be removed or replaced by the suggested changes.
   * Green — at the bottom of the canvas; the new code that will be added if you accept the suggestion.
   ![Visual diff view of proposed changes](/img/docs/dbt-cloud/copilot-diff.jpg?v=2 "Visual diff view of proposed changes")

6. Reject or accept the suggestions.
7. In the **generated** operator box, click the play icon to preview the data.
8. Confirm the results or continue building your model.

   ![Use the generated operator with play icon to preview the data](/img/docs/dbt-cloud/copilot-output.jpg?v=2 "Use the generated operator with play icon to preview the data")

9. To edit the generated model, open the **Copilot** prompt box and type your edits.
10. Click **Submit** and Copilot will generate the revised model. Repeat steps 5-8 until you're happy with the model.

---

### Change your dbt theme

dbt supports **Light mode** (default), **Dark mode**, and **System mode** (which respects your browser's light or dark theme) under the **Theme** section of your user profile, available on all [plans](https://www.getdbt.com/pricing). You can switch between these modes directly from the profile menu, customizing your viewing experience. Your selected theme is stored in your user profile, ensuring a consistent experience across dbt.

Theme selection applies across all areas of dbt, including the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md), [environments](https://docs.getdbt.com/docs/environments-in-dbt.md), [jobs](https://docs.getdbt.com/docs/deploy/jobs.md), and more.
Learn more about customizing themes in [Change themes in dbt](https://docs.getdbt.com/docs/cloud/about-cloud/change-your-dbt-cloud-theme.md#change-themes-in-dbt-cloud).

#### Change themes in dbt[​](#change-themes-in-dbt "Direct link to Change themes in dbt")

To switch to dark mode in the dbt UI, follow these steps:

1. Navigate to your account name at the bottom left of your account.
2. Under **Theme**, select **Dark**.

![Enable dark mode](/img/docs/dbt-cloud/using-dbt-cloud/dark-mode.png?v=2 "Enable dark mode")

And that’s it! 🎉 Your selected theme will follow you across all devices. To revert to **Light mode** or **System mode**, repeat the same steps and select your preferred theme.

---

### Configure and use the dbt CLI

Learn how to configure the dbt CLI for your dbt project to run dbt commands, like `dbt environment show` to view your dbt configuration or `dbt compile` to compile your project and validate models and tests. You'll also benefit from:

* Secure credential storage in the dbt platform.
* [Automatic deferral](https://docs.getdbt.com/docs/cloud/about-cloud-develop-defer.md) of build artifacts to your project's production environment.
* Speedier, lower-cost builds.
* Support for Mesh ([cross-project ref](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md)), and more.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You must set up a project in dbt.
  * **Note** — If you're using the dbt CLI, you can connect to your [data platform](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md) directly in the dbt platform interface and don't need a [`profiles.yml`](https://docs.getdbt.com/docs/local/profiles.yml.md) file.
* You must have your [personal development credentials](https://docs.getdbt.com/docs/dbt-cloud-environments.md#set-developer-credentials) set for that project. The dbt CLI uses these credentials, stored securely in dbt, to communicate with your data platform.
* You must be on dbt version 1.5 or higher. Refer to [dbt versions](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md) to upgrade.

#### Configure the dbt CLI[​](#configure-the-dbt-cli "Direct link to Configure the dbt CLI")

Once you install the dbt CLI, you need to configure it to connect to a dbt project.

1. In dbt, select the project you want to configure the dbt CLI with. The project must already have a [development environment](https://docs.getdbt.com/docs/dbt-cloud-environments.md#create-a-development-environment) set up.
2. From the main menu, go to **CLI**.
3. In the **Configure Cloud authentication** section, click **Download CLI configuration file** to download your `dbt_cloud.yml` credentials file. You can also download the credentials from the links provided based on your region:
   * North America: [https://cloud.getdbt.com/cloud-cli](https://cloud.getdbt.com/cloud-cli)
   * EMEA: [https://emea.dbt.com/cloud-cli](https://emea.dbt.com/cloud-cli)
   * APAC: [https://au.dbt.com/cloud-cli](https://au.dbt.com/cloud-cli)
   * North American Cell 1: `https://ACCOUNT_PREFIX.us1.dbt.com/cloud-cli`
   * Single-tenant: `https://YOUR_ACCESS_URL/cloud-cli`
4. Save the `dbt_cloud.yml` file in the `.dbt` directory, which stores your dbt CLI configuration.
* Mac or Linux: `~/.dbt/dbt_cloud.yml`
* Windows: `C:\Users\yourusername\.dbt\dbt_cloud.yml`

The config file looks like this:

```yaml
version: "1"
context:
  active-project: ""
  active-host: ""
  defer-env-id: ""
projects:
- project-name: ""
  project-id: ""
  account-name: ""
  account-id: ""
  account-host: "" # for example, "cloud.getdbt.com"
  token-name: ""
  token-value: ""
- project-name: ""
  project-id: ""
  account-name: ""
  account-id: ""
  account-host: "" # for example, "cloud.getdbt.com"
  token-name: ""
  token-value: ""
```

Store the config file in a safe place as it contains API keys. Check out the [FAQs](#faqs) to learn how to create a `.dbt` directory and move the `dbt_cloud.yml` file. If you have multiple copies and your file has a numerical suffix (for example, `dbt_cloud(2).yml`), remove the additional text from the filename.

5. After downloading the config file and creating your directory, navigate to a project in your terminal:

   ```bash
   cd ~/dbt-projects/jaffle_shop
   ```

6. In your `dbt_project.yml` file, ensure you have or include a `dbt-cloud` section with a `project-id` field. The `project-id` field contains the dbt project ID you want to use.

   ```yaml
   # dbt_project.yml
   name:
   version:
   # Your project configs...

   dbt-cloud:
     project-id: PROJECT_ID
   ```

   * To find your project ID, select **Develop** in the dbt navigation menu. You can use the URL to find the project ID. For example, in `https://YOUR_ACCESS_URL/develop/26228/projects/123456`, the project ID is `123456`.

7. You should now be able to [use the dbt CLI](#use-the-dbt-cli) and run [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md) like [`dbt environment show`](https://docs.getdbt.com/reference/commands/dbt-environment.md) to view your dbt configuration details or `dbt compile` to compile models in your dbt project.
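The relationship between the two config files in the steps above can be sketched end-to-end. The snippet below is a minimal, self-contained illustration using a temporary directory and placeholder values (such as project ID `123456`), not your real account details: the downloaded credentials live under `.dbt/`, while `dbt_project.yml` names the project the CLI should use via `dbt-cloud.project-id`.

```shell
# Sketch of the two-file setup the dbt CLI expects (placeholder values only).
set -eu
workdir="$(mktemp -d)"

# Stand-in for ~/.dbt/dbt_cloud.yml: the downloaded credentials file.
mkdir -p "$workdir/.dbt"
cat > "$workdir/.dbt/dbt_cloud.yml" <<'EOF'
version: "1"
context:
  active-project: "123456"
  active-host: "cloud.getdbt.com"
EOF

# Stand-in for the project's dbt_project.yml: links this repo to the
# platform project via dbt-cloud.project-id.
cat > "$workdir/dbt_project.yml" <<'EOF'
name: jaffle_shop
dbt-cloud:
  project-id: 123456
EOF

# The CLI resolves which credentials to use by matching this project-id
# against the entries in dbt_cloud.yml.
grep 'project-id' "$workdir/dbt_project.yml"
```

In a real setup, the credentials file lives at `~/.dbt/dbt_cloud.yml` and the `project-id` comes from your project's URL, as described in step 6.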
#### Set environment variables[​](#set-environment-variables "Direct link to Set environment variables")

To set environment variables in the dbt CLI for your dbt project:

1. From dbt, click on your account name in the left side menu and select **Account settings**.
2. Under the **Your profile** section, select **Credentials**.
3. Click on your project and scroll to the **Environment variables** section.
4. Click **Edit** on the lower right and then set the user-level environment variables.

#### Use the dbt CLI[​](#use-the-dbt-cli "Direct link to Use the dbt CLI")

The dbt CLI uses the same set of [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md) and [MetricFlow commands](https://docs.getdbt.com/docs/build/metricflow-commands.md) as dbt Core to execute the commands you provide. For example, use the [`dbt environment`](https://docs.getdbt.com/reference/commands/dbt-environment.md) command to view your dbt configuration details.

With the dbt CLI, you can:

* Run [multiple invocations in parallel](https://docs.getdbt.com/reference/dbt-commands.md) and ensure [safe parallelism](https://docs.getdbt.com/reference/dbt-commands.md#parallel-execution), which `dbt-core` doesn't currently guarantee.
* Automatically defer build artifacts to your project's production environment.
* Support [project dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md), which allows you to depend on another project using the metadata service in dbt.
* Project dependencies instantly connect to and reference (or `ref`) public models defined in other projects. You don't need to execute or analyze these upstream models yourself. Instead, you treat them as an API that returns a dataset.

**Use the `--help` flag** — As a tip, most command-line tools have a `--help` flag to show available commands and arguments. Use the `--help` flag with dbt in two ways:

* `dbt --help`: Lists the commands available for dbt
* `dbt run --help`: Lists the flags available for the `run` command

#### Lint SQL files[​](#lint-sql-files "Direct link to Lint SQL files")

From the dbt CLI, you can invoke [SQLFluff](https://sqlfluff.com/), which is a modular and configurable SQL linter that warns you of complex functions, syntax, formatting, and compilation errors. Many of the same flags that you can pass to SQLFluff are available from the dbt CLI. The available SQLFluff commands are:

* `lint` — Lint SQL files by passing a list of files or from standard input (stdin).
* `fix` — Fix SQL files.
* `format` — Autoformat SQL files.

To lint SQL files, run the command as follows:

```text
dbt sqlfluff lint [PATHS]... [flags]
```

When you don't specify a path, dbt lints all SQL files in the current project. To lint a specific SQL file or a directory, set `PATHS` to the path of the SQL file(s) or directory of files. To lint multiple files or directories, pass multiple `PATHS` arguments. To show detailed information on all the dbt supported commands and flags, run the `dbt sqlfluff -h` command.

###### Considerations[​](#considerations "Direct link to Considerations")

When running `dbt sqlfluff` from the dbt CLI, the following are important behaviors to consider:

* dbt reads the `.sqlfluff` file, if it exists, for any custom configurations you might have.
* For continuous integration/continuous development (CI/CD) workflows, your project must have a `dbt_cloud.yml` file and you must have successfully run commands from within this dbt project.
* An SQLFluff command returns an exit code of 0 even if it ran with file violations. This dbt behavior differs from SQLFluff behavior, where a linting violation returns a non-zero exit code. dbt Labs plans on addressing this in a later release.

#### Considerations[​](#considerations-1 "Direct link to Considerations")

The dbt CLI doesn't currently support relative paths in the [`packages.yml` file](https://docs.getdbt.com/docs/build/packages.md).
Instead, use the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), which fully supports relative paths in this scenario. Here's an example of a [local package](https://docs.getdbt.com/docs/build/packages.md#local-packages) configuration in the `packages.yml` that won't work with the dbt CLI:

```yaml
# repository_root/my_dbt_project_in_a_subdirectory/packages.yml
packages:
  - local: ../shared_macros
```

In this example, `../shared_macros` is a relative path that tells dbt to look for:

* `..` — Go one directory up (to `repository_root`).
* `/shared_macros` — Find the `shared_macros` folder in the root directory.

#### FAQs[​](#faqs "Direct link to FAQs")

**How to create a .dbt directory and move your file**

If you've never had a `.dbt` directory, you should perform the following recommended steps to create one. If you already have a `.dbt` directory, move the `dbt_cloud.yml` file into it.

Some information about the `.dbt` directory:

* A `.dbt` directory is a hidden folder in the root of your filesystem. It's used to store your dbt configuration files. The `.` prefix is used to create a hidden folder, which means it's not visible in Finder or File Explorer by default.
* To view hidden files and folders, press Command + Shift + G on macOS or Ctrl + Shift + G on Windows. This opens the "Go to Folder" dialog where you can search for the `.dbt` directory.

- Create a .dbt directory
- Move the `dbt_cloud.yml` file

1. Clone your dbt project repository locally.
2. Use the `mkdir` command followed by the name of the folder you want to create.
* If using macOS, add the `~` prefix to create a `.dbt` folder in the root of your filesystem:

```bash
mkdir ~/.dbt              # macOS
mkdir %USERPROFILE%\.dbt  # Windows
```

You can move the `dbt_cloud.yml` file into the `.dbt` directory using the `mv` command (or `move` on Windows), or by opening the Downloads folder with the "Go to Folder" dialog and dragging and dropping the file into the `.dbt` directory.

To move the file using the terminal, use the `mv` (Mac or Linux) or `move` (Windows) command. This command moves the `dbt_cloud.yml` from the `Downloads` folder to the `.dbt` folder. If your `dbt_cloud.yml` file is located elsewhere, adjust the path accordingly.

###### Mac or Linux[​](#mac-or-linux "Direct link to Mac or Linux")

In your command line, use the `mv` command to move your `dbt_cloud.yml` file into the `.dbt` directory. If you've just downloaded the `dbt_cloud.yml` file and it's in your Downloads folder, the command might look something like this:

```bash
mv ~/Downloads/dbt_cloud.yml ~/.dbt/dbt_cloud.yml
```

###### Windows[​](#windows "Direct link to Windows")

In your command line, use the `move` command. Assuming your file is in the Downloads folder, the command might look like this:

```bash
move %USERPROFILE%\Downloads\dbt_cloud.yml %USERPROFILE%\.dbt\dbt_cloud.yml
```

**How to skip artifacts from being downloaded**

By default, the dbt CLI downloads [all artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md) when you execute dbt commands. To skip downloading these files, add `--download-artifacts=false` to the command you want to run. This can help improve run-time performance but might break workflows that depend on assets like the [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md).

**I'm getting a `Session occupied` error in the dbt CLI**
If you're receiving a `Session occupied` error in the dbt CLI or if you're experiencing a long-running session, you can use the `dbt invocation list` command in a separate terminal window to view the status of your active session. This helps debug the issue and identify the arguments that are causing the long-running session. To cancel an active session, use the `Ctrl + Z` shortcut. To learn more about the `dbt invocation` command, see the [dbt invocation command reference](https://docs.getdbt.com/reference/commands/invocation.md).

Alternatively, you can reattach to your existing session with `dbt reattach` and then press `Control-C` and choose to cancel the invocation.

---

### Copilot chat in Studio

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")Enterprise+

Use the Copilot chat feature in Studio IDE to generate SQL using your input and the context of the active project.

Copilot chat is an interactive interface within Studio IDE that allows users to generate SQL from natural language prompts and ask analytics-related questions. By integrating contextual understanding of your dbt project, Copilot assists in streamlining SQL development while ensuring users remain actively involved in the process. This collaborative approach helps maintain accuracy, relevance, and adherence to best practices in your organization’s analytics workflows.

**Need more than inline SQL generation?**
For multi-step workflows like building new models end-to-end, refactoring existing models, or generating tests and documentation, use the [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md). The Developer agent is an autonomous agent that can write, validate, and run changes across your project — activate it by switching to **Ask** or **Code** mode in the **Copilot** panel.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* Must have a [dbt Starter, Enterprise or Enterprise+ account](https://www.getdbt.com/pricing).
* Development environment is on a supported [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to receive ongoing updates.
* Copilot enabled for your account.
* Admins must [enable Copilot](https://docs.getdbt.com/docs/cloud/enable-dbt-copilot.md#enable-dbt-copilot) (and opt in to AI features, if required) in your dbt Cloud project settings.

#### Copilot chat overview[​](#copilot-chat-overview "Direct link to Copilot chat overview")

This section covers the different ways you can use Copilot chat in Studio IDE:

- Generate SQL
- Mention a model in the project
- Add and replace buttons

Ask Copilot to generate SQL queries using natural language, making it faster to build or modify dbt models without manual SQL coding. You can describe the query or data transformation you want, and Copilot will produce the corresponding SQL code for you within the Studio IDE environment. This includes the ability to:

* Scaffold new SQL models from scratch by describing your needs in plain English.
* Refactor or optimize existing SQL in your models.
* Generate complex queries, CTEs, and even automate best-practice SQL formatting, all directly in the chat or command palette UI.

To generate SQL queries:

1. Navigate to the **Copilot** button in the Studio IDE.
2. Select **SQL** from the menu.

[![SQL option.](/img/docs/dbt-cloud/copilot-chat-generate-sql.png?v=2 "SQL option.")](#)SQL option.
This model mention capability is designed to provide a much more project-aware experience than generic code assistants, enabling you to:

* Pose questions about specific models (for example, "Add a test for the model `stg_orders`")

[![Mention model with menu open.](/img/docs/dbt-cloud/copilot-chat-mention-model-menu-open.png?v=2 "Mention model with menu open.")](#)Mention model with menu open.

[![Mention model after selecting from menu.](/img/docs/dbt-cloud/copilot-chat-mention-model-menu-select.png?v=2 "Mention model after selecting from menu.")](#)Mention model after selecting from menu.

Add generated code or content into your project, or replace the selected section with the Copilot suggestion, all directly from the chat interface. This lets you review and apply changes with a single click for an efficient workflow.
The **Add** button lets you append Copilot's output, while **Replace** swaps your current code or selection with the generated suggestion, giving you precise, in-context editing control. Note, if the file is empty, you'll only see **Add** as an option, since there's nothing to replace.

[![Add and replace buttons.](/img/docs/dbt-cloud/copilot-chat-add-replace.png?v=2 "Add and replace buttons.")](#)Add and replace buttons.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Prompt cookbook](https://docs.getdbt.com/guides/prompt-cookbook.md) — Learn how to write effective prompts for dbt Copilot

---

### Copilot style guide

This guide provides an overview of the Copilot `dbt-styleguide.md` file, outlining its structure, recommended usage, and best practices for effective implementation in your dbt projects.

The `dbt-styleguide.md` is a template for creating a style guide for dbt projects. It includes:

* SQL style guidelines (for example, using lowercase keywords and trailing commas)
* Model organization and naming conventions
* Model configurations and testing practices
* Recommendations for using pre-commit hooks to enforce style rules

This guide helps ensure consistency and clarity in dbt projects.

#### `dbt-styleguide.md` for Copilot[​](#dbt-styleguidemd-for-copilot "Direct link to dbt-styleguidemd-for-copilot")

Using Copilot in the Studio IDE, you can automatically generate a style guide template called `dbt-styleguide.md`. If the style guide is manually added or edited, it must also follow this naming convention. No other file name can be used with Copilot.
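As a concrete illustration of the four areas a style guide covers, the sketch below writes a minimal, hypothetical `dbt-styleguide.md` (the contents and rules shown are illustrative placeholders, not dbt's default style guide):

```shell
# Write a minimal, illustrative dbt-styleguide.md in the current directory.
cat > dbt-styleguide.md <<'EOF'
# dbt style guide

## SQL style
- Use lowercase keywords and trailing commas.

## Model organization and naming
- Prefix staging models with `stg_`.

## Model configuration and testing
- Every model gets `unique` and `not_null` tests on its primary key.

## Enforcement
- Enforce these rules with pre-commit hooks.
EOF

# The file has one section per area listed above.
grep -c '^## ' dbt-styleguide.md
```

Because Copilot only recognizes the `dbt-styleguide.md` file name, the file must be created with exactly that name, as described above.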
Add the `dbt-styleguide.md` file to the root of your project. Copilot will use it as context for the large language model (LLM) when generating [data tests](https://docs.getdbt.com/docs/build/data-tests.md), [metrics](https://docs.getdbt.com/docs/build/metrics-overview.md), [semantic models](https://docs.getdbt.com/docs/build/semantic-models.md), and [documentation](https://docs.getdbt.com/docs/build/documentation.md). Note, by creating a `dbt-styleguide.md` for Copilot, you are overriding dbt's default style guide.

#### Creating `dbt-styleguide.md` in the Studio IDE[​](#creating-dbt-styleguidemd-in-the-studio-ide "Direct link to creating-dbt-styleguidemd-in-the-studio-ide")

1. Open a file in the Studio IDE.
2. Click **Copilot** in the toolbar.
3. Select **Generate ... Style guide** from the menu.

[![Generate styleguide in Copilot](/img/docs/dbt-cloud/generate-styleguide.png?v=2 "Generate styleguide in Copilot")](#)Generate styleguide in Copilot

4. The style guide template appears in the Studio IDE. Click **Save**. `dbt-styleguide.md` is added at the root level of your project.

If you haven't previously generated a style guide file, the latest version will be automatically sourced from the dbt platform.

#### If `dbt-styleguide.md` already exists[​](#if-dbt-styleguidemd-already-exists "Direct link to if-dbt-styleguidemd-already-exists")

If there is an existing `dbt-styleguide.md` file and you attempt to generate a new style guide, a modal appears with the following options:

* **Cancel** — Exit without making changes.
* **Restore** — Revert to the latest version from the dbt platform.
* **Edit** — Modify the existing style guide manually.
[![Styleguide exists](/img/docs/dbt-cloud/styleguide-exists.png?v=2 "Styleguide exists")](#)Styleguide exists

#### Further reading[​](#further-reading "Direct link to Further reading")

* [About dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md)
* [How we style our dbt projects](https://docs.getdbt.com/best-practices/how-we-style/0-how-we-style-our-dbt-projects.md)

---

### dbt Architecture

This page is for practitioners and anyone interested in dbt's architecture and data flow.

#### About dbt architecture[​](#about-dbt-architecture "Direct link to About dbt architecture")

The dbt application has two types of components: static and dynamic. The static components are always running to serve highly available dbt functions, like the dbt web application. On the other hand, the dynamic components are created ad hoc to handle tasks such as background jobs or requests to use the Studio IDE.

dbt is available in most regions around the world in both [single tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md#single-tenant) (AWS and Azure) and [multi-tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md#multi-tenant) configurations.

dbt uses PostgreSQL for its backend, S3-compatible object storage systems for logs and artifacts, and a Kubernetes storage solution for creating dynamic, persistent volumes. All data at rest on dbt servers is protected using AES-256 encryption.

![](/img/docs/dbt-cloud/on-premises/data-flows.png)

For a more detailed breakdown of the dbt apps, [download the advanced architecture guide PDF](https://drive.google.com/uc?export=download\&id=1lktNuMZybXfqFtr24J8zAssEfoL9r51S).
#### Communication[​](#communication "Direct link to Communication")

dbt can communicate with several external services, including data platforms, git repositories, authentication services, and directories. All communications occur over HTTPS (attempts to connect via HTTP are redirected to HTTPS). dbt encrypts data in transit using the TLS 1.2 cryptographic protocol. TLS (Transport Layer Security) 1.2 is an industry-standard protocol for encrypting sensitive data while it travels over the public internet (which does not offer native encryption).

A typical scenario is an employee working in a public space, such as an airport or café, connected to an unsecured public network that many others are also using. Suppose a bad actor among them is running a program that can capture network packets and analyze them over the air. When that user is accessing dbt and running models that interact with the data platform, the information sent between their computer and the services is encrypted with TLS 1.2. If that user runs a command that initiates communication between dbt and the data warehouse (or a git repo or an auth service) over the internet, that communication is also encrypted. This means that while the bad actor can technically see the traffic moving over the unsecured network, they can't read or otherwise parse any of the information. They would see only a nonsensical set of characters that they can't decrypt.

For more detailed information on our security practices, read our [Security page](https://getdbt.com/security).

##### Data warehouse interaction[​](#data-warehouse-interaction "Direct link to Data warehouse interaction")

dbt's primary role is as a data processor, not a data store. The dbt application enables users to dispatch SQL to the warehouse for transformation.
However, users can post SQL that returns customer data into the dbt application. This data never persists and only exists in memory on the instance for the duration of the session. To lock down customer data correctly, proper data warehouse permissions must be applied to prevent improper access or storage of sensitive data.

Some data warehouse providers offer advanced security features that can be leveraged in dbt. [Private connections](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/private-connectivity.md) allow supported data platforms on AWS to communicate with dbt without the traffic traversing the public internet. [Snowflake](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md) and [BigQuery](https://docs.getdbt.com/docs/cloud/manage-access/set-up-bigquery-oauth.md) offer OAuth integration, which adds a layer of security for the data platforms (Enterprise-tier plan only).

##### Git sync[​](#git-sync "Direct link to Git sync")

dbt can sync with a variety of git providers, including [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md), [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md), and [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md), within its integrated development environment ([Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md)). Communication takes place over HTTPS rather than SSH and is protected using the TLS 1.2 protocol for data in transit.

The git repo information is stored on dbt servers to make it accessible during Studio IDE sessions. When git sync is disabled, you must [contact support](mailto:support@getdbt.com) to request the deletion of the synced data.

##### Authentication services[​](#authentication-services "Direct link to Authentication services")

The default settings of dbt enable local users with credentials stored in dbt.
Still, integrations with various authentication services are offered as an alternative, including [single sign-on services](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md). Access to features can be granted or restricted by role using [RBAC](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control-).

SSO features are essential because they reduce the number of credentials a user must maintain. Users sign in once, and the authentication token is shared among integrated services (such as dbt). The token expires and must be refreshed at predetermined intervals, requiring the user to go through the authentication process again. If the user is disabled in the SSO provider service, their access to dbt is disabled, and they cannot override this with local auth credentials.

[Snowflake](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md) and [BigQuery](https://docs.getdbt.com/docs/cloud/manage-access/set-up-bigquery-oauth.md) offer OAuth (JSON to pass info and API calls for auth) services as an alternative to SAML (XML to pass info and session cookies for auth). Users can authenticate against the data platform for secure access to dbt, and access is prevented when credentials are revoked.

#### Security[​](#security "Direct link to Security")

dbt Labs is dedicated to upholding industry standards for cloud security and GDPR compliance. Our compliance certifications include the following:

* SOC 2 Type II — assesses a service provider’s security control environment against the trust services principles and criteria set forth by the American Institute of Certified Public Accountants (AICPA).
* ISO 27001:2013 — a globally recognized standard for establishing and certifying an information security management system (ISMS).
* GDPR — dbt Labs is committed to maintaining GDPR compliance standards. Read more about our [Data Processing Addendum](https://www.getdbt.com/cloud/dpa).
For more detailed information about our security practices, read our [Security page](https://www.getdbt.com/security/).

---

### dbt Copilot FAQs

Read answers to common questions about Copilot to understand how it works and how it can help you.

Copilot is an AI-powered assistant fully integrated into your dbt experience that handles the tedious tasks, speeds up workflows, and ensures consistency, helping you deliver exceptional data products faster.

dbt Labs is committed to protecting your privacy and data. This page provides information about how Copilot handles your data. For more information, check out the [dbt Labs AI development principles](https://www.getdbt.com/legal/ai-principles) page.

#### Overview[​](#overview "Direct link to Overview")

**What is dbt Copilot?**

Copilot is a powerful AI-powered assistant that's fully integrated into your dbt experience and designed to accelerate your analytics workflows. Copilot embeds AI-driven assistance across every stage of the analytics development life cycle (ADLC), empowering data practitioners to deliver data products faster, improve data quality, and enhance data accessibility.
With automatic code generation, let Copilot [generate code](https://docs.getdbt.com/docs/cloud/use-dbt-copilot.md) using natural language, and [generate documentation](https://docs.getdbt.com/docs/build/documentation.md), [data tests](https://docs.getdbt.com/docs/build/data-tests.md), [metrics](https://docs.getdbt.com/docs/build/metrics-overview.md), and [semantic models](https://docs.getdbt.com/docs/build/semantic-models.md) for you with the click of a button in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md), [Canvas](https://docs.getdbt.com/docs/cloud/use-canvas.md), and [Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md).

**Where can I find dbt Copilot?**

Copilot is available in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-copilot.md), [Canvas](https://docs.getdbt.com/docs/cloud/use-canvas.md), and [Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md). Future releases will bring Copilot to even more parts of the dbt application!

To use Copilot, you must have a dbt [Starter, Enterprise, or Enterprise+ account](https://www.getdbt.com/contact) and administrative privileges to opt in to the feature for your team. Certain features like [BYOK](https://docs.getdbt.com/docs/cloud/enable-dbt-copilot.md#bringing-your-own-openai-api-key-byok), [natural prompts in Canvas](https://docs.getdbt.com/docs/cloud/build-canvas-copilot.md), and more are only available on Enterprise and Enterprise+ plans.

**What are the benefits of using dbt Copilot?**

Use Copilot to:

* Generate code from scratch or edit existing code with natural language.
* Generate documentation, tests, metrics, and semantic models for your models.
* Accelerate your development workflow with AI-driven assistance, all with the click of a button and with data privacy and security ensured.
[![Example of using dbt Copilot to generate documentation in the IDE](/img/docs/dbt-cloud/cloud-ide/dbt-copilot-doc.gif?v=2 "Example of using dbt Copilot to generate documentation in the IDE")](#)Example of using dbt Copilot to generate documentation in the IDE

#### Availability[​](#availability "Direct link to Availability")

**Who has access to dbt Copilot?**

When enabled by an admin, Copilot is available on a dbt [Starter, Enterprise, or Enterprise+ account](https://www.getdbt.com/contact) to all dbt [developer license users](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md).

**Is dbt Copilot available for all deployment types?**

Yes, Copilot is powered by ai-codegen-api, which is deployed everywhere, including [multi-tenant and single-tenant deployments](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md).

#### How it works[​](#how-it-works "Direct link to How it works")

**What data/code is used to train the model supporting dbt Copilot?**

Copilot is not used to train a large language model (LLM). dbt Labs does not train any models at all. Currently, we use OpenAI models, and our agreement with OpenAI prohibits OpenAI from retaining our data persistently. Refer to our [dbt Labs AI principles page](https://www.getdbt.com/legal/ai-principles) for more information.

**Which model providers does dbt Copilot use?**

dbt Labs works with OpenAI to build and operationalize Copilot. Enterprise-tier accounts can [supply their own OpenAI keys](https://docs.getdbt.com/docs/cloud/enable-dbt-copilot.md#bringing-your-own-openai-api-key-byok).

**Do we support BYOK (bring your own key) at the project level?**

The Copilot BYOK option is currently an account-only configuration. However, it may become configurable at the project level in the future.

#### Privacy and data[​](#privacy-and-data "Direct link to Privacy and data")

**Does dbt Copilot store or use personal data?**

The user clicks the Copilot button.
Aside from authentication, it works without personal data, but the user controls what is input into Copilot.

**Can dbt Copilot data be deleted upon client written request?**

To the extent a client identifies personal or sensitive information uploaded by or on behalf of the client to dbt Labs systems in error, such data can be deleted within 30 days of a written request.

**Does dbt Labs own the output generated by dbt Copilot?**

No, dbt Labs will not dispute your ownership of any code or artifacts unique to your company that are generated when you use Copilot. Your code will not be used to train AI models for the benefit of dbt Labs or other third parties, including other dbt Labs customers.

**Does dbt Labs have terms in place for dbt Copilot?**

Clients who signed with terms after January 2024 don't need additional terms prior to enabling Copilot. Longer-term clients have also protected their data through confidentiality and data deletion obligations. If a client prefers additional terms, they may enter into the presigned AI & Beta Addendum available [here](https://na2.docusign.net/Member/PowerFormSigning.aspx?PowerFormId=85817ff4-9ce5-4fae-8e34-20b854fdb52a\&env=na2\&acct=858db9e4-4a6d-48df-954f-84ece3303aac\&v=2) (the dbt Labs signature will be dated as of the date the client signs).

#### Considerations[​](#considerations "Direct link to Considerations")

**What are the considerations for using dbt Copilot?**

Copilot has the following considerations to keep in mind:

* Copilot is not available in the dbt CLI.
* Copilot is not available in the dbt API.

Future releases are planned that may bring Copilot to even more parts of the dbt application.

#### Copilot allowlisting URLs[​](#copilot-allowlisting-urls "Direct link to Copilot allowlisting URLs")

**Allowlisting URLs**

Copilot doesn't specifically block AI-related URLs.
However, if your organization uses endpoint protection platforms, firewalls, or network proxies (such as Zscaler), those tools may:

* Block unknown or AI-related domains.
* Break TLS/SSL traffic to inspect it.
* Disallow specific ports or services.

Any of these can interfere with Copilot. We recommend allowlisting the following URL paths:

**For Copilot in the IDE**:

* `/api/ide/accounts/${accountId}/develop/${developId}/ai/generate_generic_tests/...`
* `/api/ide/accounts/${accountId}/develop/${developId}/ai/generate_documentation/...`
* `/api/ide/accounts/${accountId}/develop/${developId}/ai/generate_semantic_model/...`
* `/api/ide/accounts/${accountId}/develop/${developId}/ai/generate_inline`
* `/api/ide/accounts/${accountId}/develop/${developId}/ai/generate_metrics/...`
* `/api/ide/accounts/${accountId}/develop/${developId}/ai/track_response`

**For Copilot in Canvas**:

* `/api/private/visual-editor/v1/ai/llm-generate`
* `/api/private/visual-editor/v1/ai/track-response`
* `/api/private/visual-editor/v1/files/${fileId}/llm-generate-dag-through-chat`

---

### dbt Copilot

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Copilot is dbt's AI-powered product surface — the single place where inline code assistance, autonomous agents, and structured workflows come together across your analytics development lifecycle, all grounded in dbt's structured context.
From generating SQL and documentation with a single click, to delegating complex multi-step workflows to autonomous agents like the [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md), Copilot brings AI into the places you already work in dbt.

#### What Copilot includes[​](#what-copilot-includes "Direct link to What Copilot includes")

Copilot includes the following capabilities:

###### Inline assistance [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#inline-assistance- "Direct link to inline-assistance-")

* **[Inline assistance](https://docs.getdbt.com/docs/cloud/dbt-copilot.md)** — Generate code, documentation, tests, semantic models, and SQL with a single click — available in the Studio IDE, Canvas, and Insights.

###### dbt Agents [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#dbt-agents- "Direct link to dbt-agents-")

* [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md) — Write or refactor models, generate tests, and validate changes from natural language, all within the Studio IDE.
* [Analyst agent](https://docs.getdbt.com/docs/dbt-ai/analyst-agent.md) — Ask natural language questions and get accurate, governed answers powered by the dbt Semantic Layer.
---

### Develop with dbt

Develop dbt projects using the dbt platform, a faster and more reliable way to deploy dbt and manage your project in a single, web-based UI. You can develop in your browser using a dbt-powered command line interface (CLI), an integrated development environment (Studio IDE), or Canvas.

#### Getting started[​](#getting-started "Direct link to Getting started")

To get started, you'll need a [dbt](https://www.getdbt.com/signup) account and a developer license. For a more comprehensive guide about developing in dbt, refer to the [quickstart guides](https://docs.getdbt.com/docs/get-started-dbt.md).

Choose the option that best fits your needs:

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md)

###### [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md)

[Allows you to develop and run dbt commands from your local command line or code editor against your dbt development environment.](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md)

[![](/img/icons/vsce.svg)](https://docs.getdbt.com/docs/about-dbt-extension.md)

###### [dbt VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md)

[Bring the speed and intelligence of the dbt Fusion engine to VS Code for a seamless local development experience.](https://docs.getdbt.com/docs/about-dbt-extension.md)

[![](/img/icons/dashboard.svg)](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md)

###### [dbt Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md)

[Develop dbt projects directly in your browser with seamless SQL compilation and an intuitive, visual
workflow.](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md)

[![](/img/icons/canvas.svg)](https://docs.getdbt.com/docs/cloud/canvas.md)

###### [dbt Canvas](https://docs.getdbt.com/docs/cloud/canvas.md)

[Develop with Canvas, a seamless drag-and-drop experience that helps analysts quickly create and visualize dbt models.](https://docs.getdbt.com/docs/cloud/canvas.md)
---

### Develop with dbt Copilot

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

This page describes how to use Copilot in the Studio IDE to improve your development workflow.

Use [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) to generate documentation, tests, semantic models, metrics, and SQL code from scratch — making it easier for you to build your dbt project, accelerate your development, and focus on high-level tasks.

For information about using Copilot in the [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md), see [Build with Copilot](https://docs.getdbt.com/docs/cloud/build-canvas-copilot.md).

#### Developer agent [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#developer-agent- "Direct link to developer-agent-")

For autonomous model generation, refactoring, and multi-step workflows in the Studio IDE, see the [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md). The Developer agent is accessible from the Copilot panel. Switch to **Ask** or **Code** mode to activate the agent.

Example of using the Developer agent to refactor a model in the Studio IDE.
#### Generate resources[​](#generate-resources "Direct link to Generate resources")

Use dbt Copilot in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) to generate documentation, tests, metrics, and semantic model [resources](https://docs.getdbt.com/docs/build/projects.md) with the click of a button, saving you time. To access and use this AI feature:

1. Navigate to the Studio IDE and select a SQL model file under the **File Explorer**.
2. In the **Console** section (under the **File Editor**), click **dbt Copilot** to view the available AI options.
3. Select an option to generate the YAML config: **Generate Documentation**, **Generate Tests**, **Generate Semantic Model**, or **Generate Metrics**. To generate multiple YAML configs for the same model, click each option separately. dbt Copilot intelligently saves the YAML config in the same file.

   note: [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) doesn't yet support generating semantic models with the latest YAML spec.

   To generate metrics:
   * You first need to have semantic models defined.
   * Once defined, click **dbt Copilot** and select **Generate Metrics**.
   * Write a prompt describing the metrics you want to generate and press enter.
   * **Accept** or **Reject** the generated code.
4. Verify the AI-generated code. You can update or fix the code as needed.
5. Click **Save As**. You should see the file changes under the **Version control** section.
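The options above produce standard dbt resource YAML. As a hedged illustration (the model and column names here are hypothetical, and Copilot's actual output varies with your model), a config produced by **Generate Documentation** followed by **Generate Tests** might look like:

```yaml
version: 2

models:
  - name: stg_customers  # hypothetical model name
    description: "Staged customer records, one row per customer."
    columns:
      - name: customer_id
        description: "Primary key for the customer."
        data_tests:      # generic tests commonly suggested
          - unique
          - not_null
      - name: first_order_date
        description: "Date of the customer's first completed order."
```

Because Copilot saves each generated config to the same file, running **Generate Tests** after **Generate Documentation** extends this YAML block rather than creating a new file.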
[![Example of using dbt Copilot to generate documentation in the IDE](/img/docs/dbt-cloud/cloud-ide/dbt-copilot-doc.gif?v=2 "Example of using dbt Copilot to generate documentation in the IDE")](#)Example of using dbt Copilot to generate documentation in the IDE

#### Generate and edit code[​](#generate-and-edit-code "Direct link to Generate and edit code")

Copilot also allows you to generate SQL code directly within a SQL file in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) using natural language prompts, so you can rewrite or add specific portions of the SQL file without editing the entire file. This streamlines SQL development by reducing errors, scaling with complexity, and saving valuable time.

Copilot's [prompt window](#use-the-prompt-window), accessible by keyboard shortcut, handles repetitive or complex SQL generation so you can focus on high-level tasks. Use the prompt window for use cases like:

* Writing advanced transformations
* Performing bulk edits efficiently
* Crafting complex patterns like regex

##### Use the prompt window[​](#use-the-prompt-window "Direct link to Use the prompt window")

Open Copilot's AI prompt window with the keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows) to:

###### 1. Generate SQL from scratch[​](#1-generate-sql-from-scratch "Direct link to 1. Generate SQL from scratch")

* Press Cmd+B (Mac) or Ctrl+B (Windows) in a SQL file to generate SQL from scratch.
* Enter your instructions in natural language to generate SQL code tailored to your needs.
* Ask Copilot to fix the code or add a specific portion of the SQL file.
[![dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows)](/img/docs/dbt-cloud/cloud-ide/copilot-sql-generation-prompt.png?v=2 "dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows)")](#)dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows)

###### 2. Edit existing SQL code[​](#2-edit-existing-sql-code "Direct link to 2. Edit existing SQL code")

* Highlight a section of SQL code and press Cmd+B (Mac) or Ctrl+B (Windows) to open the prompt window for editing.
* Use this to refine or modify specific code snippets based on your needs.
* Ask Copilot to fix the code or add a specific portion of the SQL file.

###### 3. Review changes with the diff view[​](#3-review-changes-with-the-diff-view-to-quickly-assess-the-impact-of-the-changes-before-making-changes "Direct link to 3. Review changes with the diff view")

* When a suggestion is generated, Copilot displays a visual "diff" view so you can quickly assess the impact of the proposed changes before applying them:
  * **Green**: New code that will be added if you accept the suggestion.
  * **Red**: Existing code that will be removed or replaced by the suggested changes.

###### 4. Accept or reject suggestions[​](#4-accept-or-reject-suggestions "Direct link to 4. Accept or reject suggestions")

* **Accept**: If the generated SQL meets your requirements, click **Accept** to apply the changes to your `.sql` file directly in the IDE.
* **Reject**: If the suggestion doesn't align with your request or prompt, click **Reject** to discard the generated SQL without making changes and start again.

###### 5. Regenerate code[​](#5-regenerate-code "Direct link to 5. Regenerate code")

* To regenerate, press the **Escape** key on your keyboard (or click the **Reject** button in the popup).
This removes the generated code and puts your cursor back into the prompt text area.
* Update your prompt and press **Enter** to try another generation. Press **Escape** again to close the popover entirely.

Once you've accepted a suggestion, you can continue to use the prompt window to generate additional SQL code and commit your changes to the branch.

[![Edit existing SQL code using dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows)](/img/docs/dbt-cloud/cloud-ide/copilot-sql-generation.gif?v=2 "Edit existing SQL code using dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows)")](#)Edit existing SQL code using dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows)

---

### Edit and create dbt models

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Access and use Canvas to create or edit dbt models through a visual, drag-and-drop experience, with built-in AI for custom code generation.

#### Access Canvas[​](#access-canvas "Direct link to Access Canvas")

Before accessing the editor, you should have a dbt project already set up, including a Git repository, data platform connection, environments, and developer credentials. If you don't have this set up, contact your dbt admin.

Access **Canvas** at any time from the left-side menu.
#### Canvas prerequisites[​](#canvas-prerequisites "Direct link to Canvas prerequisites")

Before using Canvas, you should:

* Have a [dbt Enterprise or Enterprise+](https://www.getdbt.com/pricing) account.
* Have a [developer license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) with developer credentials set up.
* Be using one of the following adapters:
  * BigQuery
  * Databricks
  * Redshift
  * Snowflake
  * Trino
  * You can access Canvas with adapters not listed, but some features may be missing at this time.
* Use [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md), [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md), or [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) as your Git provider, connected to dbt via HTTPS.
  * SSH connections aren't supported at this time.
  * Self-hosted or on-premises deployments of any Git provider aren't supported for Canvas at this time.
* Have an existing dbt project with a Staging or Production run completed.
* Verify your Development environment is on a supported [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to receive ongoing updates.
* Have read-only access to the [Staging environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#staging-environment) and its data so you can execute `run` in Canvas. To customize the required access for the Canvas user group, refer to [Set up environment-level permissions](https://docs.getdbt.com/docs/cloud/manage-access/environment-permissions-setup.md) for more information.
* Have the AI-powered features toggle enabled (for [Copilot integration](https://docs.getdbt.com/docs/cloud/dbt-copilot.md)).

#### Create a model[​](#create-a-model "Direct link to Create a model")

To create a dbt SQL model, click **Create a new model** and perform the following steps. Note that you can't create source models in Canvas.
This is because you need a production run with sources already created.

1. Drag an [operator](https://docs.getdbt.com/docs/cloud/canvas-interface.md#operators) from the operator toolbar and drop it onto the canvas.
2. Click on the operator to open its configuration panel.
3. View the **Output** and **SQL Code** tabs.
   * Each operator has an **Output** tab that lets you preview the data from that configured node.
   * The **SQL Code** tab displays the SQL code generated by the node's configuration. Use this to see the SQL behind your visual model config.
4. Connect the operators by dragging your cursor from an operator's "+" start point to each operator you want to connect. This creates a connector line, allowing data to flow from the source table through the transformations you configured to the final output.
5. Keep building your dbt model and confirm the output through the **Output** tab.

#### Edit an existing model[​](#edit-an-existing-model "Direct link to Edit an existing model")

To edit an existing model:

1. Navigate to a Canvas workspace.
2. Click **+Add** on the top navigation bar.
3. Click **Edit existing model**. This will allow you to select the model you'd like to edit.

[![Edit a model using the 'Edit a model' button.](/img/docs/dbt-cloud/canvas/edit-model.png?v=2 "Edit a model using the 'Edit a model' button.")](#)Edit a model using the 'Edit a model' button.

#### Upload data to Canvas[​](#upload-data-to-canvas "Direct link to Upload data to Canvas")

You can upload a CSV file of source data for model creation directly from Canvas:

1. Click **+Add**.
2. Select **Upload CSV source**.
3. Drag your file to the canvas area or click **Upload** to select from your file explorer. This uploads the data to your data warehouse in a new table in your developer schema, prefixed with `VE_UPLOADS_`.
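The uploaded table behaves like any other table in your warehouse, so you can also declare it as a dbt source to reference it from code. A sketch, assuming a hypothetical developer schema `dbt_jsmith` and an uploaded file named `customers.csv` (your actual schema and `VE_UPLOADS_`-prefixed table name will differ):

```yaml
version: 2

sources:
  - name: uploads                # hypothetical source name
    schema: dbt_jsmith           # your developer schema
    tables:
      - name: VE_UPLOADS_customers  # uploaded table, prefixed with VE_UPLOADS_
```

A model could then select from it with `{{ source('uploads', 'VE_UPLOADS_customers') }}`.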
In the canvas window, it creates a source operator and a basic SQL model that you can customize. You can now work with this data in both Canvas and the Studio IDE.

#### Test and document[​](#test-and-document "Direct link to Test and document")

Testing and documenting your models is an important part of the development process. Coming soon, you'll be able to test and document your dbt models in Canvas, helping you maintain high data quality and clarity around how your models should be used.

---

### Enable dbt Copilot

[Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

This page explains how to enable Copilot, an AI-powered assistant, in dbt to speed up your development and allow you to focus on delivering quality data.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* Available in the dbt platform only.
* Must have a [dbt Starter, Enterprise, or Enterprise+ account](https://www.getdbt.com/pricing).
* Certain features like [BYOK](#bringing-your-own-openai-api-key-byok), [natural prompts in Canvas](https://docs.getdbt.com/docs/cloud/build-canvas-copilot.md), and more are only available on Enterprise and Enterprise+ plans.
* Development environment is on a supported [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to receive ongoing updates.
* By default, Copilot deployments use a central OpenAI API key managed by dbt Labs. Alternatively, you can [bring your own OpenAI API key](#bringing-your-own-openai-api-key-byok) (BYOK).
  * For BYOK, make sure to enable the latest text generation models as well as the `text-embedding-3-small` model.
* Opt in to AI features by following the steps in the next section in your **Account settings**.

#### Enable dbt Copilot[​](#enable-dbt-copilot "Direct link to Enable dbt Copilot")

To opt in to Copilot, a dbt admin can follow these steps:

1. Navigate to **Account settings** in the navigation menu.
2. Under **Settings**, confirm the account you're enabling.
3. Click **Edit** in the top right corner.
4. Enable the **Enable account access to dbt Copilot features** option.
5. Click **Save**. You should now have Copilot enabled for use.

To disable Copilot after it has been enabled, repeat steps 1 through 3, toggle the option off in step 4, and save.

[![Example of the 'Enable account access to dbt Copilot features' option in Account settings](/img/docs/deploy/example-account-settings.png?v=2 "Example of the 'Enable account access to dbt Copilot features' option in Account settings")](#)Example of the 'Enable account access to dbt Copilot features' option in Account settings

#### Bringing your own OpenAI API key (BYOK) [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#bringing-your-own-openai-api-key-byok- "Direct link to bringing-your-own-openai-api-key-byok-")

Once AI features have been enabled, you can provide your organization's OpenAI API key. dbt will then leverage your OpenAI account and terms to power Copilot. This will incur billing charges to your organization from OpenAI for requests made by Copilot.
Configure AI keys using:

* dbt Labs-managed OpenAI API key
* Your own OpenAI API key
* Azure OpenAI

##### AI integrations[​](#ai-integrations "Direct link to AI integrations")

Once AI features have been [enabled](https://docs.getdbt.com/docs/cloud/enable-dbt-copilot.md#enable-dbt-copilot), you can use dbt Labs' AI integration or bring your own provider to support AI-powered dbt features like [Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) and [Ask dbt](https://docs.getdbt.com/docs/cloud-integrations/snowflake-native-app.md). dbt supports AI integrations for dbt Labs-managed OpenAI keys, self-managed OpenAI keys, or self-managed Azure OpenAI keys. Note, if you bring your own provider, you will incur API calls and associated charges for features used in dbt. Bringing your own provider is available for Enterprise or Enterprise+ plans.

To configure the AI integration in your dbt account, a dbt admin can perform the following steps:

1. Click on your account name and select **Account settings** in the side menu.
2. Under **Settings**, click **Copilot**.
3. Under **API Keys**, click the **Pencil** icon to the right of **OpenAI** to configure the AI integration. [![Example of the AI integration page](/img/docs/dbt-cloud/account-integration-ai.png?v=2 "Example of the AI integration page")](#)Example of the AI integration page
4. Configure the AI integration for either **dbt Labs OpenAI**, **OpenAI**, or **Azure OpenAI**, as described next.

**dbt Labs OpenAI**

1. Select the toggle for **dbt Labs** to use dbt Labs' managed OpenAI key.
2. Click **Save**.

[![Example of the dbt Labs integration page](/img/docs/dbt-cloud/account-integration-dbtlabs.png?v=2 "Example of the dbt Labs integration page")](#)Example of the dbt Labs integration page

**OpenAI**

Bringing your own OpenAI key is available for Enterprise or Enterprise+ plans.

1. Select the toggle for **OpenAI** to use your own OpenAI key.
2. Enter the API key.
3.
Click **Save**.

[![Example of the OpenAI integration page](/img/docs/dbt-cloud/account-integration-openai.png?v=2 "Example of the OpenAI integration page")](#)Example of the OpenAI integration page

**Data residency limitation**: OpenAI projects with [data residency controls](https://platform.openai.com/docs/guides/your-data#data-residency-controls) enabled and configured for the United States (project region set to US) don't currently support BYOK. These projects can only use the API key in the dbt platform configuration. Specifying the custom endpoints required for data residency isn't yet supported, and we're evaluating a solution for this. To use BYOK, ensure your OpenAI project doesn't have data residency controls enabled. Projects without project region settings use the standard OpenAI endpoint (`https://api.openai.com`) and support BYOK.

**Azure OpenAI**

Bringing your own Azure OpenAI key is available for Enterprise or Enterprise+ plans. To learn about deploying your own OpenAI model on Azure, refer to [Deploy models on Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-openai).

Configure credentials for your Azure OpenAI deployment in dbt as follows:

1. Locate your Azure OpenAI configuration in your Azure Deployment details page.
2. Enter your Azure OpenAI API key.
3. Enter the **Endpoint**, **API Version**, and **Deployment / Model Name**.
4. Click **Save**.

[![Example of Azure OpenAI integration section](/img/docs/dbt-cloud/account-integration-azure-manual.png?v=2 "Example of Azure OpenAI integration section")](#)Example of Azure OpenAI integration section
---

### Fix deprecation warnings

You can address deprecation warnings in the dbt platform by finding and fixing them using the autofix tool in the Studio IDE. You can run the autofix tool on the [Compatible or Latest release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) of dbt Core before you upgrade to Fusion.

To find and fix deprecations:

1. Navigate to the Studio IDE by clicking **Studio** in the left menu.
2. Make sure to save and commit your work before proceeding. The autofix tool may overwrite any unsaved changes.
3. Click the three-dot menu located at the bottom right corner of the Studio IDE.
4. Select **Check & fix deprecations**. [![Access the Studio IDE options menu to autofix deprecation warnings](/img/docs/dbt-cloud/cloud-ide/ide-options-menu-with-save.png?v=2 "Access the Studio IDE options menu to autofix deprecation warnings")](#)Access the Studio IDE options menu to autofix deprecation warnings

   The tool runs `dbt parse --show-all-deprecations --no-partial-parse` to find the deprecations in your project.
5. If you don't see the deprecations and the **Autofix warnings** button, click the command history in the bottom left: [![Access recent commands to see the autofix button](/img/docs/dbt-cloud/cloud-ide/command-history.png?v=2 "Access recent commands to see the autofix button")](#)Access recent commands to see the autofix button
6. When the command history opens, click the **Autofix warnings** button: [![Learn what deprecations need to be auto fixed](/img/docs/dbt-cloud/cloud-ide/autofix-button.png?v=2 "Learn what deprecations need to be auto fixed")](#)Learn what deprecations need to be auto fixed
7. When the **Proceed with autofix** dialog opens, click **Continue** to begin resolving project deprecations and start a follow-up parse to show remaining deprecations. [![Proceed with autofix](/img/docs/dbt-cloud/cloud-ide/proceed-with-autofix.png?v=2 "Proceed with autofix")](#)Proceed with autofix
8.
Once complete, a success message appears. Click **Review changes** to verify the changes. [![Success](/img/docs/dbt-cloud/cloud-ide/autofix-success.png?v=2 "Success")](#)Success
9. Click **Commit and sync** in the top left of the Studio IDE to commit these changes to the project repository.
10. You are now ready to enable Fusion if you [meet the requirements](https://docs.getdbt.com/docs/fusion/supported-features.md#requirements)!

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Quickstart guide](https://docs.getdbt.com/guides.md)
* [About dbt](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md)
* [Develop in the Cloud](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md)

---

### Git commit signing

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

To prevent impersonation and enhance security, you can sign your Git commits before pushing them to your repository. Using your signature, a Git provider can cryptographically verify a commit and mark it as "verified", providing increased confidence about its origin.

You can configure dbt to sign your Git commits when using the Studio IDE for development. To set up, enable the feature in dbt, follow the flow to generate a keypair, and upload the public key to your Git provider to use for signature verification.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* GitHub or GitLab is your Git provider. Currently, Azure DevOps is not supported.
* You have a dbt account on the [Enterprise or Enterprise+ plan](https://www.getdbt.com/pricing/).

#### Generate GPG keypair in dbt[​](#generate-gpg-keypair-in-dbt "Direct link to Generate GPG keypair in dbt")

To generate a GPG keypair in dbt, follow these steps:

1. Go to your **Personal profile** page in dbt.
2. Navigate to the **Signed Commits** section.
3. Enable the **Sign commits originating from this user** toggle.
4. This generates a GPG keypair. The private key will be used to sign all future Git commits. The public key will be displayed, allowing you to upload it to your Git provider.

[![Example of profile setting Signed commits](/img/docs/dbt-cloud/example-git-signed-commits-setting.png?v=2 "Example of profile setting Signed commits")](#)Example of profile setting Signed commits

#### Upload public key to Git provider[​](#upload-public-key-to-git-provider "Direct link to Upload public key to Git provider")

To upload the public key to your Git provider, follow the detailed documentation from your provider:

* [GitHub instructions](https://docs.github.com/en/authentication/managing-commit-signature-verification/adding-a-gpg-key-to-your-github-account)
* [GitLab instructions](https://docs.gitlab.com/ee/user/project/repository/signed_commits/gpg.html)

Once you have uploaded the public key to your Git provider, your Git commits will be marked as "Verified" after you push the changes to the repository.

[![Example of a verified Git commit in a Git provider.](/img/docs/dbt-cloud/git-sign-verified.png?v=2 "Example of a verified Git commit in a Git provider.")](#)Example of a verified Git commit in a Git provider.

#### Considerations[​](#considerations "Direct link to Considerations")

* The GPG keypair is tied to the user, not a specific account. There is a 1:1 relationship between the user and keypair. The same key will be used for signing commits on any accounts the user is a member of.
* The GPG keypair generated in dbt is linked to the email address associated with your account at the time of keypair creation. This email identifies the author of signed commits. * For your Git commits to be marked as "verified", your dbt email address must be a verified email address with your Git provider. The Git provider (such as GitHub or GitLab) checks that the commit's signed email matches a verified email in your Git provider account. If they don’t match, the commit won't be marked as "verified." * Keep your dbt email and your Git provider's verified email in sync to avoid verification issues. If you change your dbt email address: * Generate a new GPG keypair with the updated email, following the [steps mentioned earlier](https://docs.getdbt.com/docs/cloud/studio-ide/git-commit-signing.md#generate-gpg-keypair-in-dbt-cloud). * Add and verify the new email in your Git provider. #### FAQs[​](#faqs "Direct link to FAQs")  What happens if I delete my GPG keypair in dbt? If you delete your GPG keypair in dbt, your Git commits will no longer be signed. You can generate a new GPG keypair by following the [steps mentioned earlier](https://docs.getdbt.com/docs/cloud/studio-ide/git-commit-signing.md#generate-gpg-keypair-in-dbt-cloud).  Which Git providers support GPG keys? GitHub and GitLab support commit signing, while Azure DevOps does not. Commit signing is a [git feature](https://git-scm.com/book/ms/v2/Git-Tools-Signing-Your-Work) and is independent of any specific provider. However, not all providers support the upload of public keys, or the display of verification badges on commits.  What if my Git provider doesn't support GPG keys? If your Git provider does not explicitly support the uploading of public GPG keys, then commits will still be signed using the private key, but no verification information will be displayed by the provider.  What if my Git provider requires that all commits are signed? 
If your Git provider is configured to enforce commit verification, then unsigned commits will be rejected. To avoid this, ensure that you have followed all previous steps to generate a keypair, and uploaded the public key to the provider.

---

### IDE user interface

The [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) is a tool for developers to effortlessly build, test, run, and version-control their dbt projects, and enhance data governance — all from the convenience of your browser. Use the Studio IDE to compile dbt code into SQL and run it against your database directly — no command line required! This page offers comprehensive definitions and terminology of user interface elements, allowing you to navigate the Studio IDE landscape with ease.
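To make "compile dbt code into SQL" concrete, here is a minimal sketch. The model and source names (`stg_orders`, `raw_orders`, the `analytics` schema) are hypothetical, not from this page; the point is that dbt resolves Jinja expressions such as `{{ ref() }}` into plain warehouse SQL before running anything:

```sql
-- models/stg_orders.sql (hypothetical model written with Jinja)
select order_id, amount
from {{ ref('raw_orders') }}
where amount > 0

-- What dbt compiles it to, assuming raw_orders is built into the
-- hypothetical analytics schema:
--
--   select order_id, amount
--   from analytics.raw_orders
--   where amount > 0
```

Clicking **Compile** in the console shows this compiled SQL for the open file, and **Preview** runs it against your warehouse.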
[![The Studio IDE layout includes version control on the upper left, files/folders and search on the left, editor on the right, command palette at the top, and command/console at the bottom](/img/docs/dbt-cloud/cloud-ide/ide-basic-layout.png?v=2 "The Studio IDE layout includes version control on the upper left, files/folders and search on the left, editor on the right, command palette at the top, and command/console at the bottom")](#)The Studio IDE layout includes version control on the upper left, files/folders and search on the left, editor on the right, command palette at the top, and command/console at the bottom #### Basic layout[​](#basic-layout "Direct link to Basic layout") The Studio IDE streamlines your workflow, and features a popular user interface layout with files and folders on the left, editor on the right, and command and console information at the bottom. [![The Git repo link, documentation site button, Version Control menu, and File Explorer](/img/docs/dbt-cloud/cloud-ide/ide-side-menu.png?v=2 "The Git repo link, documentation site button, Version Control menu, and File Explorer")](#)The Git repo link, documentation site button, Version Control menu, and File Explorer 1. **Git repository link —** The Git repository link, located on the upper left of the Studio IDE, takes you to your repository on the same active branch. It also displays the repository name and the active branch name. * **Note:** This linking feature is only available for GitHub or GitLab repositories on multi-tenant dbt accounts. 2. **Documentation site button —** Clicking the Documentation site book icon, located next to the Git repository link, leads to the dbt Documentation site. The site is powered by the latest dbt artifacts generated in the IDE using the `dbt docs generate` command from the Command bar. 3. 
[**Version Control**](#editing-features) — The Studio IDE's powerful Version Control section contains all git-related elements, including the Git actions button and the **Changes** section. 4. **File explorer —** The File explorer shows the filetree of your repository. You can: * Click on any file in the filetree to open the file in the file editor. * Click and drag files between directories to move files. * Right-click a file to access the sub-menu options such as duplicate file, copy file name, copy as `ref`, rename, and delete. * Use file indicators, located to the right of your files or folder name, to see when changes or actions were made: * Unsaved (•) — The Studio IDE detects unsaved changes to your file/folder * Modification (M) — The Studio IDE detects a modification of existing files/folders * Added (A) — The Studio IDE detects added files * Deleted (D) — The Studio IDE detects deleted files. [![Use the Command bar to write dbt commands, toggle 'Defer', and view the current IDE status](/img/docs/dbt-cloud/cloud-ide/ide-command-bar.png?v=2 "Use the Command bar to write dbt commands, toggle 'Defer', and view the current IDE status")](#)Use the Command bar to write dbt commands, toggle 'Defer', and view the current IDE status 5. **Command bar —** The Command bar, located in the lower left of the Studio IDE, is used to invoke [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md). When a command is invoked, the associated logs are shown in the Invocation History Drawer. 6. **Defer to production —** The **Defer to production** toggle allows developers to build, run, and test only the models they've edited, without having to first build all of the upstream models those edits depend on (their parents). Refer to [Using defer in dbt](https://docs.getdbt.com/docs/cloud/about-cloud-develop-defer.md#defer-in-the-dbt-cloud-ide) for more info. 7. 
**Status button —** The Studio IDE Status button, located on the lower right of the Studio IDE, displays the current Studio IDE status. If there is an error in the status or in the dbt code that stops the project from parsing, the button will turn red and display "Error". If there aren't any errors, the button will display a green "Ready" status. To access the [Studio IDE Status modal](#modals-and-menus), simply click on this button. #### Search bar and command palette[​](#search-bar-and-command-palette "Direct link to Search bar and command palette") The Studio IDE provides tools to help you quickly navigate your project's files, find information, run commands, and replace syntax with just a few clicks in a layout that's familiar to users of popular IDEs. [![Use the search bar and command palette to quickly navigate your file tree and open tabs.](/img/docs/dbt-cloud/cloud-ide/search-and-command.png?v=2 "Use the search bar and command palette to quickly navigate your file tree and open tabs.")](#)Use the search bar and command palette to quickly navigate your file tree and open tabs. 1. [Search and replace](#search-and-replace) 2. [Command palette](#command-palette) ##### Search and replace[​](#search-and-replace "Direct link to Search and replace") The search feature enables you to quickly find specific terms or phrases and replace them with the click of a button. [![Search files for specific terms and quickly replace them.](/img/docs/dbt-cloud/cloud-ide/search-and-replace.png?v=2 "Search files for specific terms and quickly replace them.")](#)Search files for specific terms and quickly replace them. 1. Toggle between **file tree** and **search** navigation. 2. Search for words or phrases. Enhance the search to match case and/or whole words. You can also input replacement words or phrases. Click the icon next to the **Replace** field to replace all entries. 3. Navigate the search results. Click an entry to open the related file and highlight it on the screen. 
If you've entered replacement text, you'll see a preview of the new syntax. Click the symbol next to an entry to substitute the text with whatever is in the **Replace** field. ##### Command palette[​](#command-palette "Direct link to Command palette") The command palette enhances navigation of your dbt project, enabling you to search files, content, and symbols, show and run IDE commands, view recent files, and more. Click the command palette to view the available options. Actions supporting keyboard shortcuts display to the right of the text. [![The command palette enables you to quickly navigate your project and run commands.](/img/docs/dbt-cloud/cloud-ide/command-palette.png?v=2 "The command palette enables you to quickly navigate your project and run commands.")](#)The command palette enables you to quickly navigate your project and run commands. * **Go to File:** Search for files in your current project and open them in a new tab. * **Show and Run Commands:** View and run commands related to IDE navigation and settings. Note: dbt commands (such as `run` and `build`) are available only in the [Command bar](#console-section) menu in the console; the command palette doesn't currently support them. * **Search for Text:** Search for text across your project and either open files from the results or send results to the [search and replace](#search-and-replace) section for bulk changes. * **Go to Symbol in Editor:** Quickly jump to symbols in the current file. * **More:** Display advanced features such as **Go to Line/Column**, **Go to Symbol in Workspace**, and search within currently open files only. [![Go to File.](/img/docs/dbt-cloud/cloud-ide/go-to-file.png?v=2 "Go to File.")](#)Go to File. [![Show and Run Commands.](/img/docs/dbt-cloud/cloud-ide/show-and-run-commands.png?v=2 "Show and Run Commands.")](#)Show and Run Commands. [![Search for text.](/img/docs/dbt-cloud/cloud-ide/search-for-text.png?v=2 "Search for text.")](#)Search for text. 
[![Go to Symbol in Editor.](/img/docs/dbt-cloud/cloud-ide/go-to-symbol.png?v=2 "Go to Symbol in Editor.")](#)Go to Symbol in Editor. [![More.](/img/docs/dbt-cloud/cloud-ide/more.png?v=2 "More.")](#)More. #### Editing features[​](#editing-features "Direct link to Editing features") The Studio IDE features some delightful tools and layouts to make it easier for you to write dbt code and collaborate with teammates. [![Use the file editor, version control section, and save button during your development workflow](/img/docs/dbt-cloud/cloud-ide/ide-editing.png?v=2 "Use the file editor, version control section, and save button during your development workflow")](#)Use the file editor, version control section, and save button during your development workflow 1. **File editor —** The file editor is where you edit code. Tabs break out the region for each opened file, and unsaved files are marked with a blue dot icon in the tab view. You can edit, format, or lint files and execute dbt commands in your protected primary git branch. Since the Studio IDE prevents commits to the protected branch, it prompts you to commit those changes to a new branch. * Use intuitive [keyboard shortcuts](https://docs.getdbt.com/docs/cloud/studio-ide/keyboard-shortcuts.md) to make development easier for you and your team. 2. **Save button —** The editor has a **Save** button that saves editable files. Pressing the button or using the Command-S or Control-S shortcut saves the file contents. You don't need to save to preview code results in the Console section, but it's necessary before changes appear in a dbt invocation. The file editor tab shows a blue icon for unsaved changes. 3. **Version Control —** This menu contains all git-related elements, including the Git actions button. 
The button updates relevant actions based on your editor's state, such as prompting you to pull remote changes, to commit and sync when reverted commit changes are present, to create a merge/pull request when appropriate, or to prune branches deleted from the remote repository. * The dropdown menu on the Git actions button allows users to revert changes, refresh Git state, create merge/pull requests, prune branches, and change branches. * You can also [resolve merge conflicts](https://docs.getdbt.com/docs/cloud/git/merge-conflicts.md). For more info on git, refer to [Version control basics](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide). * **Version Control Options menu —** The **Changes** section, under the Git actions button, lists all file changes since the last commit. You can click on a change to open the Git Diff View to see the inline changes. You can also right-click any file and use the file-specific options in the Version Control Options menu. [![Right-click edited files to access Version Control Options menu](/img/docs/dbt-cloud/cloud-ide/version-control-options-menu.png?v=2 "Right-click edited files to access Version Control Options menu")](#)Right-click edited files to access Version Control Options menu * Use the **Prune branches** option to remove local branches that have already been deleted from the remote repository. Selecting this triggers a [pop-up modal](#prune-branches-modal), where you can confirm the deletion of the specific local branches, keeping your branch management tidy. Note that this won't delete the branch you're currently on. Pruning branches isn't available for [managed repositories](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) because they don't have a typical remote setup, which prevents remote branch deletion. 
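Outside the IDE, the cleanup that **Prune branches** performs can be sketched in plain git. This is an illustration of the concept, not dbt's implementation; the script builds a throwaway remote and clone so it is self-contained:

```shell
# Illustrative sketch (not dbt's implementation): what "Prune branches"
# amounts to in plain git. A throwaway remote and clone keep it self-contained.
set -e
tmp=$(mktemp -d)
g() { git -c user.email=demo@example.com -c user.name=demo -c init.defaultBranch=main "$@"; }

g init -q --bare "$tmp/origin.git"
g clone -q "$tmp/origin.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
g commit -q --allow-empty -m "initial commit"
g push -q -u origin main
g checkout -q -b feature
g push -q -u origin feature
g checkout -q main
g push -q origin --delete feature     # the branch is deleted on the remote...

# ...so prune: drop stale remote-tracking refs, then delete local branches
# whose upstream is marked "gone"
g fetch -q --prune
g branch -vv | awk '/: gone]/ {print $1}' | xargs -r git branch -q -D

git branch --list feature             # prints nothing: the stale branch was pruned
```

The IDE adds a confirmation modal before any local branch is actually deleted.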
#### Additional editing features[​](#additional-editing-features "Direct link to Additional editing features") * **Minimap —** A Minimap (code outline) gives you a high-level overview of your source code, which is useful for quick navigation and code understanding. A file's minimap is displayed on the upper-right side of the editor. To quickly jump to different sections of your file, click the shaded area. [![Use the Minimap for quick navigation and code understanding](/img/docs/dbt-cloud/cloud-ide/ide-minimap.png?v=2 "Use the Minimap for quick navigation and code understanding")](#)Use the Minimap for quick navigation and code understanding * **Git Diff View —** Clicking on a file in the **Changes** section of the **Version Control Menu** will open the changed file with Git Diff view. The editor will show the previous version on the left and the in-line changes made on the right. [![The Git Diff View displays the previous version on the left and the changes made on the right of the Editor](/img/docs/dbt-cloud/cloud-ide/ide-git-diff-view-with-save.png?v=2 "The Git Diff View displays the previous version on the left and the changes made on the right of the Editor")](#)The Git Diff View displays the previous version on the left and the changes made on the right of the Editor * **Markdown Preview console tab —** The Markdown Preview console tab shows a preview of your .md file's markdown code in your repository and updates it automatically as you edit your code. [![The Markdown Preview console tab renders markdown code below the Editor tab.](/img/docs/dbt-cloud/cloud-ide/ide-markdown-with-save.png?v=2 "The Markdown Preview console tab renders markdown code below the Editor tab.")](#)The Markdown Preview console tab renders markdown code below the Editor tab. * **CSV Preview console tab —** The CSV Preview console tab displays the data from your CSV file in a table, which updates automatically as you edit the file in your seed directory. 
[![View CSV code in the CSV Preview console tab below the Editor tab.](/img/docs/dbt-cloud/cloud-ide/ide-csv.png?v=2 "View CSV code in the CSV Preview console tab below the Editor tab.")](#)View CSV code in the CSV Preview console tab below the Editor tab. #### Console section[​](#console-section "Direct link to Console section") The console section, located below the file editor, includes various console tabs and buttons to help you with tasks such as previewing, compiling, building, and viewing the DAG. Refer to the following sub-bullets for more details on the console tabs and buttons. [![The Console section is located below the file editor and has various tabs and buttons to help execute tasks](/img/docs/dbt-cloud/cloud-ide/ide-console-overview.png?v=2 "The Console section is located below the file editor and has various tabs and buttons to help execute tasks")](#)The Console section is located below the file editor and has various tabs and buttons to help execute tasks 1. **Preview button —** When you click on the **Preview** button, it runs the SQL in the active file editor regardless of whether you have saved it or not and sends the results to the **Results** console tab. You can preview a selected portion of saved or unsaved code by highlighting it and then clicking the **Preview** button. **Row limits in the IDE —** The Studio IDE applies default row limits; however, you can also specify the number of records returned. Refer to the following sub-bullets for more info:

* **500-row limit:** To prevent the IDE from returning too much data and causing browser problems, dbt automatically sets a 500-row limit when using the **Preview** button. You can modify this by adding `limit your_number` at the end of your SQL statement. For example, `SELECT * FROM table limit 100` will return up to 100 rows. Remember that you must write the `limit your_number` explicitly and cannot derive it from a macro. * **Change row limit default:** In dbt version 1.6 or higher, you can change the default limit of 500 rows shown in the **Results** tab when you run a query. To adjust the setting, click **Change row display** next to the displayed rows. Keep in mind that you can't set it higher than 10,000 rows. If you refresh the page or close your development session, the default limit will go back to 500 rows. * **Specify records returned:** The IDE also supports `SELECT TOP #`, which specifies the number of records to return. 2. **Compile button —** The **Compile** button compiles the saved or unsaved SQL code and displays it in the **Compiled code** tab. Starting from dbt v1.6 or higher, when you save changes to a model, you can compile its code with the model's specific context. This context is similar to what you'd have when building the model and involves useful context variables like `{{ this }}` or `{{ is_incremental() }}`. 3. **Build button —** The build button allows users to quickly access dbt commands related to the active model in the file editor. The available commands include `dbt build`, `dbt test`, and `dbt run`, with options to include only the current resource, the resource and its upstream dependencies, the resource and its downstream dependencies, or the resource with all dependencies. This menu is available for all executable nodes. 4. **Lint button** — The **Lint** button runs the [linter](https://docs.getdbt.com/docs/cloud/studio-ide/lint-format.md) on the active file in the file editor. 
The linter checks for syntax errors and style issues in your code and displays the results in the **Code quality** tab. 5. **dbt Copilot** — [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) is an AI assistant integrated into the Studio IDE. Use the quick-action buttons to generate documentation, tests, semantic models, and metrics with a single click. The Copilot panel also provides access to the [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md), which uses natural language prompts to autonomously generate or refactor models, semantic models, tests, and documentation. Select **Ask** or **Code** mode in the bottom toolbar to activate the Developer agent. [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") 6. **Commands tab** — View the most recently run [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md) from your current IDE session, their results, and relevant system logs. 7. **Problems tab** — You must be running the dbt Fusion engine to use the Problems tab. It surfaces problems in your dbt project that may prevent it from running properly in Fusion, as you edit and before you execute runs. [![The Problems tab surfaces project issues as you edit](/img/docs/dbt-cloud/cloud-ide/ide-problems-tab.png?v=2 "The Problems tab surfaces project issues as you edit")](#)The Problems tab surfaces project issues as you edit 8. **Results tab** — The Results console tab displays the most recent Preview results in tabular format. [![Preview results show up in the Results console tab](/img/docs/dbt-cloud/cloud-ide/results-console-tab.png?v=2 "Preview results show up in the Results console tab")](#)Preview results show up in the Results console tab 9. 
**Code quality tab** — The Code quality tab displays the results of the linter on the active file in the File editor. It allows you to view code errors, provides code quality visibility and management, and displays the SQLFluff version used. 10. **Compiled code tab —** The Compiled code tab displays the compiled SQL generated for the active file when you click the **Compile** button. [![Compile results show up in the Compiled Code tab](/img/docs/dbt-cloud/cloud-ide/compiled-code-console-tab.png?v=2 "Compile results show up in the Compiled Code tab")](#)Compile results show up in the Compiled Code tab 11. **Lineage tab —** The Lineage tab in the file editor displays the active model's lineage or DAG. By default, it shows two degrees of lineage in both directions (`2+model_name+2`); however, you can change it to `+model+` (full DAG). To use the lineage: * Double-click a node in the DAG to open that file in a new tab * Expand or shrink the DAG using node selection syntax. * Note that the `--exclude` flag isn't supported. [![View resource lineage in the Lineage tab](/img/docs/dbt-cloud/cloud-ide/lineage-console-tab.png?v=2 "View resource lineage in the Lineage tab")](#)View resource lineage in the Lineage tab #### Invocation history[​](#invocation-history "Direct link to Invocation history") The Invocation History Drawer stores information on dbt invocations in the IDE. When you invoke a dbt command, such as `dbt run`, the associated logs are displayed in the Invocation History Drawer. 
You can open the drawer in multiple ways: * Clicking the `^` icon next to the Command bar on the lower left of the page * Typing a dbt command and pressing enter * Or pressing Control-backtick (or Ctrl + \`) [![The Invocation History Drawer returns a log and detail of all your dbt invocations.](/img/docs/dbt-cloud/cloud-ide/ide-inv-history-drawer.png?v=2 "The Invocation History Drawer returns a log and detail of all your dbt invocations.")](#)The Invocation History Drawer returns a log and detail of all your dbt invocations. 1. **Invocation History list —** The left-hand panel of the Invocation History Drawer displays a list of previous invocations in the Studio IDE, including the command, branch name, command status, and elapsed time. 2. **Invocation Summary —** The Invocation Summary, located above **System Logs**, displays information about a selected command from the Invocation History list, such as the command, its status (`Running` if it's still running), the git branch that was active during the command, and the time the command was invoked. 3. **System Logs toggle —** The System Logs toggle, located under the Invocation Summary, allows the user to see the full stdout and debug logs for the entirety of the invoked command. 4. **Command Control button —** Use the Command Control button, located on the right side, to control your invocation and cancel or rerun a selected run. [![The Invocation History list displays a list of previous invocations in the IDE](/img/docs/dbt-cloud/cloud-ide/ide-results.png?v=2 "The Invocation History list displays a list of previous invocations in the IDE")](#)The Invocation History list displays a list of previous invocations in the IDE 5. **Node Summary tab —** Clicking on the Results Status Tabs will filter the Node Status List based on their corresponding status. 
The available statuses are Pass (successful invocation of a node), Warn (test executed with a warning), Error (database error or test failure), Skip (nodes not run due to upstream error), and Queued (nodes that have not executed yet). 6. **Node result toggle —** After running a dbt command, information about each executed node can be found in a Node Result toggle, which includes a summary and debug logs. 7. **Node result list —** The Node result list shows all the Node Results used in the dbt run, and you can filter it by clicking on a Result Status tab. #### Modals and Menus[​](#modals-and-menus "Direct link to Modals and Menus") Use menus and modals to interact with Studio IDE and access useful options to help your development workflow. ###### Editor tab menu[​](#editor-tab-menu "Direct link to Editor tab menu") To interact with open editor tabs, right-click any tab to access the helpful options in the file tab menu. [![ Right-click a tab to view the Editor tab menu options](/img/docs/dbt-cloud/cloud-ide/editor-tab-menu-with-save.png?v=2 " Right-click a tab to view the Editor tab menu options")](#) Right-click a tab to view the Editor tab menu options ###### Global command shortcut[​](#global-command-shortcut "Direct link to Global command shortcut") The global command shortcut provides helpful shortcuts to interact with the Studio IDE, such as git actions, specialized dbt commands, and compile and preview actions, among others. To open the menu, use Command-P or Control-P. [![The global command menu provides shortcuts to interact with the Studio IDE.](/img/docs/dbt-cloud/cloud-ide/ide-global-command-palette-with-save.png?v=2 "The global command menu provides shortcuts to interact with the Studio IDE.")](#)The global command menu provides shortcuts to interact with the Studio IDE. 
###### Studio IDE Status modal[​](#-status-modal "Direct link to -status-modal") The Studio IDE Status modal shows the current error message and debug logs for the server. This also contains an option to restart the Studio IDE. Open this by clicking on the Studio IDE Status button. [![The Studio IDE Status modal shows the current status and debug logs.](/img/docs/dbt-cloud/cloud-ide/ide-status-modal-with-save.png?v=2 "The Studio IDE Status modal shows the current status and debug logs.")](#)The Studio IDE Status modal shows the current status and debug logs. ###### Commit to a new branch[​](#commit-to-a-new-branch "Direct link to Commit to a new branch") Edit directly on your protected primary git branch and commit those changes to a new branch when ready. [![Commit changes to a new branch](/img/docs/dbt-cloud/using-dbt-cloud/create-new-branch.png?v=2 "Commit changes to a new branch")](#)Commit changes to a new branch ###### Commit Changes modal[​](#commit-changes-modal "Direct link to Commit Changes modal") The Commit Changes modal is accessible via the Git Actions button to commit all changes or via the Version Control Options menu to commit individual changes. Once you enter a commit message, you can use the modal to commit and sync the selected changes. [![The Commit Changes modal is how users commit changes to their branch.](/img/docs/dbt-cloud/cloud-ide/commit-changes-modal.png?v=2 "The Commit Changes modal is how users commit changes to their branch.")](#)The Commit Changes modal is how users commit changes to their branch. ###### Change Branch modal[​](#change-branch-modal "Direct link to Change Branch modal") The Change Branch modal allows users to switch git branches in the Studio IDE. It can be accessed through the **Change Branch** link or the **Git actions** button under the **Version control** menu. 
[![The Change Branch modal is how users change their branch.](/img/docs/dbt-cloud/cloud-ide/change-branch-modal.png?v=2 "The Change Branch modal is how users change their branch.")](#)The Change Branch modal is how users change their branch. ###### Prune branches modal[​](#prune-branches-modal "Direct link to Prune branches modal") The Prune branches modal allows users to delete local branches that have been deleted from the remote repository, keeping your branch management tidy. This is accessible through the **Git actions** button under the [**Version control** menu](#editing-features). Note that this won't delete the branch you're currently on. Pruning branches isn't available for managed repositories because they don't have a typical remote setup, which prevents remote branch deletion. [![The Prune branches modal allows users to delete local branches that have already been deleted from the remote repository.](/img/docs/dbt-cloud/cloud-ide/prune-branch-modal.png?v=2 "The Prune branches modal allows users to delete local branches that have already been deleted from the remote repository.")](#)The Prune branches modal allows users to delete local branches that have already been deleted from the remote repository. ###### Revert Uncommitted Changes modal[​](#revert-uncommitted-changes-modal "Direct link to Revert Uncommitted Changes modal") The Revert Uncommitted Changes modal is how users revert changes in the IDE. This is accessible via the `Revert File` option above the Version Control Options menu, or via the Git Actions button when there are saved, uncommitted changes in the IDE. [![The Revert Uncommitted Changes modal is how users revert changes.](/img/docs/dbt-cloud/cloud-ide/revert-uncommitted-changes-with-save.png?v=2 "The Revert Uncommitted Changes modal is how users revert changes.")](#)The Revert Uncommitted Changes modal is how users revert changes. 
###### Studio IDE Options menu[​](#-options-menu "Direct link to -options-menu") Access the Studio IDE Options menu by clicking the three-dot menu located at the bottom right corner of the Studio IDE. This menu contains global options: * View status details, including the Studio IDE Status modal * Restart the Studio IDE * Reinstall dependencies * Clean dbt project * [Check & fix deprecations](https://docs.getdbt.com/docs/cloud/studio-ide/autofix-deprecations.md) * Rollback your repo to remote to refresh your git state and view status details [![Access the IDE options menu to switch to dark or light mode, restart the IDE, rollback to remote, or view the IDE status](/img/docs/dbt-cloud/cloud-ide/ide-options-menu-with-save.png?v=2 "Access the IDE options menu to switch to dark or light mode, restart the IDE, rollback to remote, or view the IDE status")](#)Access the IDE options menu to switch to dark or light mode, restart the IDE, rollback to remote, or view the IDE status

---

### Install dbt CLI

The dbt platform natively supports developing using a command line interface (CLI), empowering team members to contribute with enhanced flexibility and collaboration. The dbt CLI allows you to run dbt commands against your dbt platform development environment from your local command line.

**CLI compatibility** — The dbt CLI is a dbt platform tool available to users on any [available plan](https://www.getdbt.com/pricing). It is not compatible with existing installations of the dbt Core or dbt Fusion engine CLIs.
dbt commands run against the platform's infrastructure and benefit from: * Secure credential storage in the dbt platform * [Automatic deferral](https://docs.getdbt.com/docs/cloud/about-cloud-develop-defer.md) of build artifacts to your Cloud project's production environment * Speedier, lower-cost builds * Support for dbt Mesh ([cross-project `ref`](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md)) * Significant platform improvements, to be released over the coming months [![Diagram of how the dbt CLI works with dbt's infrastructure to run dbt commands from your local command line.](/img/docs/dbt-cloud/cloud-cli-overview.jpg?v=2 "Diagram of how the dbt CLI works with dbt's infrastructure to run dbt commands from your local command line.")](#)Diagram of how the dbt CLI works with dbt's infrastructure to run dbt commands from your local command line. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") The dbt CLI is available in all [deployment regions](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) and for both multi-tenant and single-tenant accounts. #### Install dbt CLI[​](#install-dbt-cli "Direct link to Install dbt CLI") You can install the dbt CLI via the command line by using one of the following methods: * macOS (brew) * Windows (native executable) * Linux (native executable) * Existing dbt Core users (pip) Before you begin, make sure you have [Homebrew installed](http://brew.sh/) in your command line terminal. Refer to the [FAQs](#faqs) if your operating system runs into path conflicts. 1. Verify that you don't already have dbt Core installed by running the following command: ```bash which dbt ``` If the output is `dbt not found`, that confirms you don't have it installed. If you've installed dbt Core globally in some other way, uninstall it first before proceeding: ```bash pip uninstall dbt ``` 2. 
Install the dbt CLI with Homebrew: * First, remove the `dbt-labs` tap, the separate repository for packages, from Homebrew. This prevents Homebrew from installing packages from that repository: ```bash brew untap dbt-labs/dbt ``` * Then, add and install the dbt CLI as a package: ```bash brew tap dbt-labs/dbt-cli brew install dbt ``` If you have multiple taps, use `brew install dbt-labs/dbt-cli/dbt`. 3. Verify your installation by running `dbt --help` in the command line. If you see the following output, you installed it correctly: ```bash The dbt CLI - an ELT tool for running SQL transformations and data models in dbt... ``` If you don't see this output, check that you've deactivated pyenv or venv and don't have a global dbt version installed. * Note that you no longer need to run the `dbt deps` command when your environment starts. Previously, initialization required this step. However, you should still run `dbt deps` if you make any changes to your `packages.yml` file. 4. Clone your repository to your local computer using `git clone`. For example, to clone a GitHub repo using HTTPS format, run `git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY`. 5. After cloning your repo, [configure](https://docs.getdbt.com/docs/cloud/configure-cloud-cli.md) the dbt CLI for your dbt project. This lets you run dbt commands like [`dbt environment show`](https://docs.getdbt.com/reference/commands/dbt-environment.md) to view your dbt configuration or `dbt compile` to compile your project and validate models and tests. You can also add, edit, and synchronize files with your repo. Refer to the [FAQs](#faqs) if your operating system runs into path conflicts. 1. Download the latest Windows release for your platform from [GitHub](https://github.com/dbt-labs/dbt-cli/releases). 2. Extract the `dbt.exe` executable into the same folder as your dbt project. info Advanced users can configure multiple projects to use the same dbt CLI by: 1. 
Placing the executable file (`.exe`) in the "Program Files" folder 2. [Adding it to their Windows PATH environment variable](https://medium.com/@kevinmarkvi/how-to-add-executables-to-your-path-in-windows-5ffa4ce61a53) 3. Saving it where needed Note that if you're using VS Code, you must restart it to pick up modified environment variables. 4. Verify your installation by running `./dbt --help` in the command line. If you see the following output, you installed it correctly: ```bash The dbt CLI - an ELT tool for running SQL transformations and data models in dbt... ``` If you don't see this output, check that you've deactivated pyenv or venv and don't have a global dbt version installed. * Note that you no longer need to run the `dbt deps` command when your environment starts. Previously, initialization required this step. However, you should still run `dbt deps` if you make any changes to your `packages.yml` file. 5. Clone your repository to your local computer using `git clone`. For example, to clone a GitHub repo using HTTPS format, run `git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY`. 6. After cloning your repo, [configure](https://docs.getdbt.com/docs/cloud/configure-cloud-cli.md) the dbt CLI for your dbt project. This lets you run dbt commands like [`dbt environment show`](https://docs.getdbt.com/reference/commands/dbt-environment.md) to view your dbt configuration or `dbt compile` to compile your project and validate models and tests. You can also add, edit, and synchronize files with your repo. Refer to the [FAQs](#faqs) if your operating system runs into path conflicts. 1. Download the latest Linux release for your platform from [GitHub](https://github.com/dbt-labs/dbt-cli/releases). (Pick the file based on your CPU architecture) 2. Extract the `dbt-cloud-cli` binary to the same folder as your dbt project. 
```bash tar -xf dbt_0.29.9_linux_amd64.tar.gz ./dbt --version ``` info Advanced users can configure multiple projects to use the same dbt CLI executable by adding it to their PATH environment variable in their shell profile. 3. Verify your installation by running `./dbt --help` in the command line. If you see the following output, you installed it correctly: ```bash The dbt CLI - an ELT tool for running SQL transformations and data models in dbt... ``` If you don't see this output, check that you've deactivated pyenv or venv and don't have a global dbt version installed. * Note that you no longer need to run the `dbt deps` command when your environment starts. Previously, initialization required this step. However, you should still run `dbt deps` if you make any changes to your `packages.yml` file. 4. Clone your repository to your local computer using `git clone`. For example, to clone a GitHub repo using HTTPS format, run `git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY`. 5. After cloning your repo, [configure](https://docs.getdbt.com/docs/cloud/configure-cloud-cli.md) the dbt CLI for your dbt project. This lets you run dbt commands like [`dbt environment show`](https://docs.getdbt.com/reference/commands/dbt-environment.md) to view your dbt configuration or `dbt compile` to compile your project and validate models and tests. You can also add, edit, and synchronize files with your repo. If you already have dbt Core installed, the dbt CLI may conflict. Here are some considerations: * **Prevent conflicts**
Use both the dbt CLI and dbt Core with `pip` and create a new virtual environment.

* **Use both dbt CLI and dbt Core with brew or native installs**
If you use Homebrew, consider aliasing the dbt CLI as "dbt-cloud" to avoid conflict. For more details, check the [FAQs](#faqs) if your operating system experiences path conflicts.

* **Reverting to dbt Core from the dbt CLI**
If you've already installed the dbt CLI and need to switch back to dbt Core:
* Uninstall the dbt CLI using the command: `pip uninstall dbt` * Reinstall dbt Core using the following command, replacing `adapter_name` with the appropriate adapter name: ```shell python -m pip install dbt-adapter_name --force-reinstall ``` For example, if you use Snowflake as your adapter, run: `python -m pip install dbt-snowflake --force-reinstall` *** Before installing the dbt CLI, make sure you have Python installed and your virtual environment (venv or pyenv) configured. If you already have a Python environment configured, you can skip to the [pip installation step](#install-dbt-cloud-cli-in-pip). ##### Install a virtual environment[​](#install-a-virtual-environment "Direct link to Install a virtual environment") We recommend using virtual environments (venv) to namespace `cloud-cli`. 1. Create a new virtual environment named "dbt-cloud" with this command: ```shell python3 -m venv dbt-cloud ``` 2. Activate the virtual environment each time you create a shell window or session, depending on your operating system: * For Mac and Linux, use: `source dbt-cloud/bin/activate`
* For Windows, use: `dbt-cloud\Scripts\activate` 3. (Mac and Linux only) Create an alias to activate your dbt environment with every new shell window or session. You can add the following to your shell's configuration file (for example, `$HOME/.bashrc, $HOME/.zshrc`) while replacing `<PATH>` with the path to your virtual environment configuration: ```shell alias env_dbt='source <PATH>/bin/activate' ``` ##### Install dbt CLI in pip[​](#install-dbt-cli-in-pip "Direct link to Install dbt CLI in pip") 1. (Optional) If you already have dbt Core installed, this installation will override that package. Check your dbt Core version in case you need to reinstall it later by running the following command: ```bash dbt --version ``` 2. Make sure you're in your virtual environment and run the following command to install the dbt CLI: ```bash pip install dbt --no-cache-dir ``` If there are installation issues, running the command with the `--force-reinstall` argument might help: ```bash pip install dbt --no-cache-dir --force-reinstall ``` 3. (Optional) To revert to dbt Core, first uninstall both the dbt CLI and dbt Core. Then reinstall dbt Core. ```bash pip uninstall dbt-core dbt pip install dbt-adapter_name --force-reinstall ``` 4. Clone your repository to your local computer using `git clone`. For example, to clone a GitHub repo using HTTPS format, run `git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY`. 5. After cloning your repo, [configure](https://docs.getdbt.com/docs/cloud/configure-cloud-cli.md) the dbt CLI for your dbt project. This lets you run dbt commands like [`dbt environment show`](https://docs.getdbt.com/reference/commands/dbt-environment.md) to view your dbt configuration or `dbt compile` to compile your project and validate models and tests. You can also add, edit, and synchronize files with your repo. 
#### Update dbt CLI[​](#update-dbt-cli "Direct link to Update dbt CLI") The following instructions explain how to update the dbt CLI to the latest version depending on your operating system. * macOS (brew) * Windows (executable) * Linux (executable) * Existing dbt Core users (pip) To update the dbt CLI, run `brew update` and then `brew upgrade dbt`. To update, follow the [Windows installation instructions](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md?install=windows#install-dbt-cloud-cli) and replace the existing `dbt.exe` executable with the new one. To update, follow the [Linux installation instructions](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md?install=linux#install-dbt-cloud-cli) and replace the existing `dbt` executable with the new one. To update: * Make sure you're in your virtual environment * Run `python -m pip install --upgrade dbt`. #### Considerations[​](#considerations "Direct link to Considerations") The dbt CLI doesn't currently support relative paths in the [`packages.yml` file](https://docs.getdbt.com/docs/build/packages.md). Instead, use the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), which supports relative paths in this scenario. Here's an example of a [local package](https://docs.getdbt.com/docs/build/packages.md#local-packages) configuration in the `packages.yml` that won't work with the dbt CLI: ```yaml # repository_root/my_dbt_project_in_a_subdirectory/packages.yml packages: - local: ../shared_macros ``` In this example, `../shared_macros` is a relative path that tells dbt to look for: * `..` — Go one directory up (to `repository_root`). * `/shared_macros` — Find the `shared_macros` folder in the root directory. To work around this limitation, use the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), which fully supports relative paths in `packages.yml`. 
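If the shared code also lives in its own git repository, one alternative worth considering (not covered on this page, and the URL and revision below are placeholders) is to reference it as a git package rather than a local relative path, using dbt's standard git package syntax:

```yaml
# Hypothetical alternative to a local relative path: install the
# shared macros as a git package (URL and revision are illustrative).
packages:
  - git: "https://github.com/YOUR-ORG/shared_macros.git"
    revision: main
```

After updating `packages.yml`, run `dbt deps` to install the package into your project.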
#### FAQs[​](#faqs "Direct link to FAQs")  What's the difference between the dbt CLI and dbt Core? The dbt CLI and [dbt Core](https://github.com/dbt-labs/dbt-core), an open-source project, are both command line tools that enable you to run dbt commands. The key distinction is that the dbt CLI is tailored for the dbt platform's infrastructure and integrates with all its [features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features).  How do I run both the dbt CLI and dbt Core? For compatibility, both the dbt CLI and dbt Core are invoked by running `dbt`. This can create path conflicts if your operating system selects one over the other based on your $PATH environment variable (settings). If you have dbt Core installed locally, either: 1. Install using the `pip3 install dbt` [pip](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md?install=pip#install-dbt-cloud-cli) command. 2. Install natively, ensuring you either deactivate the virtual environment containing dbt Core or create an alias for the dbt CLI. 3. (Advanced users) Install natively, but modify the $PATH environment variable to correctly point to the dbt CLI binary to use both dbt CLI and dbt Core together. You can always uninstall the dbt CLI to return to using dbt Core.  How to create an alias? To create an alias for the dbt CLI:
1. Open your shell's profile configuration file. Depending on your shell and system, this could be `~/.bashrc`, `~/.bash_profile`, `~/.zshrc`, or another file.
2. Add an alias that points to the dbt CLI binary. For example: `alias dbt-cloud="path_to_dbt_cloud_cli_binary"`. Replace `path_to_dbt_cloud_cli_binary` with the actual path to the dbt CLI binary (for a Homebrew install on macOS, this is `/opt/homebrew/bin/dbt`). With this alias, you can use the command `dbt-cloud` to invoke the dbt CLI.
3. Save the file and then either restart your shell or run `source` on the profile file to apply the changes. As an example, in bash you would run: `source ~/.bashrc`
4. Test and use the alias to run commands:
* To run the dbt CLI, use the `dbt-cloud` command: `dbt-cloud command_name`. Replace `command_name` with the specific dbt command you want to execute.
* To run dbt Core, use the `dbt` command: `dbt command_name`. Replace `command_name` with the specific dbt command you want to execute.
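If you're unsure which binary a plain `dbt` resolves to after setting up the alias, a quick check (standard POSIX shell, nothing dbt-specific) is:

```shell
# Print the full path of the first `dbt` found on $PATH --
# this is the binary a plain `dbt` command will invoke.
# (Prints a fallback message if no dbt is installed.)
command -v dbt || echo "no dbt on PATH"
```

If this prints the dbt CLI path rather than your dbt Core install, your `$PATH` ordering favors the CLI; the `dbt-cloud` alias is expanded by your interactive shell before any `$PATH` lookup.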
This alias will allow you to use the `dbt-cloud` command to invoke the dbt CLI while having dbt Core installed natively.  Why am I receiving a `Stuck session` error when trying to run a new command? The dbt CLI allows only one command that writes to the data warehouse at a time. If you attempt to run multiple write commands simultaneously (for example, `dbt run` and `dbt build`), you will encounter a `stuck session` error. To resolve this, cancel the specific invocation by passing its ID to the cancel command. For more information, refer to [parallel execution](https://docs.getdbt.com/reference/dbt-commands.md#parallel-execution).  Why am I getting a `Session occupied` error in the dbt CLI? If you're receiving a `Session occupied` error in the dbt CLI or if you're experiencing a long-running session, you can use the `dbt invocation list` command in a separate terminal window to view the status of your active session. This helps debug the issue and identify the arguments that are causing the long-running session. To cancel an active session, use the `Ctrl + Z` shortcut. To learn more about the `dbt invocation` command, see the [dbt invocation command reference](https://docs.getdbt.com/reference/commands/invocation.md). Alternatively, you can reattach to your existing session with `dbt reattach` and then press `Control-C` and choose to cancel the invocation. --- ### Lint and format your code Enhance your development workflow by integrating with popular linters and formatters like [SQLFluff](https://sqlfluff.com/), [sqlfmt](http://sqlfmt.com/), [Black](https://black.readthedocs.io/en/latest/), and [Prettier](https://prettier.io/). 
Leverage these powerful tools directly in the Studio IDE without interrupting your development flow. Details What are linters and formatters? Linters analyze code for errors, bugs, and style issues, while formatters fix style and formatting rules. Read more about when to use linters or formatters in the [FAQs](#faqs) In the Studio IDE, you can perform linting, auto-fix, and formatting on five different file types: * SQL — [Lint](#lint) and fix with SQLFluff, and [format](#format) with sqlfmt * YAML, Markdown, and JSON — Format with Prettier * Python — Format with Black Each file type has its own unique linting and formatting rules. You can [customize](#customize-linting) the linting process to add more flexibility and enhance problem and style detection. By default, the IDE uses sqlfmt rules to format your code, making it convenient to use right away. However, if you have a file named `.sqlfluff` in the root directory of your dbt project, the IDE will default to SQLFluff rules instead. [![Use SQLFluff to lint/format your SQL code, and view code errors in the Code Quality tab.](/img/docs/dbt-cloud/cloud-ide/sqlfluff.gif?v=2 "Use SQLFluff to lint/format your SQL code, and view code errors in the Code Quality tab.")](#)Use SQLFluff to lint/format your SQL code, and view code errors in the Code Quality tab. [![Use sqlfmt to format your SQL code.](/img/docs/dbt-cloud/cloud-ide/sqlfmt.gif?v=2 "Use sqlfmt to format your SQL code.")](#)Use sqlfmt to format your SQL code. [![Format YAML, Markdown, and JSON files using Prettier.](/img/docs/dbt-cloud/cloud-ide/prettier.gif?v=2 "Format YAML, Markdown, and JSON files using Prettier.")](#)Format YAML, Markdown, and JSON files using Prettier. [![Use the config button to select your tool.](/img/docs/dbt-cloud/cloud-ide/ide-sql-popup.png?v=2 "Use the config button to select your tool.")](#)Use the config button to select your tool. 
[![Customize linting by configuring your own linting code rules, including dbtonic linting/styling.](/img/docs/dbt-cloud/cloud-ide/ide-sqlfluff-config.png?v=2 "Customize linting by configuring your own linting code rules, including dbtonic linting/styling.")](#)Customize linting by configuring your own linting code rules, including dbtonic linting/styling. #### Lint[​](#lint "Direct link to Lint") With the Studio IDE, you can seamlessly use [SQLFluff](https://sqlfluff.com/), a configurable SQL linter, to warn you of complex functions, syntax, formatting, and compilation errors. This integration allows you to run checks, fix errors, and display any code issues directly within the Studio IDE: * Works with Jinja and SQL. * Comes with built-in [linting rules](https://docs.sqlfluff.com/en/stable/rules.html). You can also [customize](#customize-linting) your own linting rules. * Empowers you to [enable linting](#enable-linting) with options like **Lint** (displays linting errors and recommends actions) or **Fix** (auto-fixes errors in the Studio IDE). * Displays a **Code Quality** tab to view code errors, providing code quality visibility and management. Linting considerations * The Studio IDE runs linting using the dbt Core engine, even when your development environment uses the **Latest Fusion** release track. For more information, see [Fusion limitations](https://docs.getdbt.com/docs/fusion/supported-features.md#limitations). * Linting doesn't support ephemeral models in dbt v1.5 and lower. Refer to the [FAQs](#faqs) for more info. ##### Enable linting[​](#enable-linting "Direct link to Enable linting") Linting is available on all branches, including your protected primary git branch. Since the Studio IDE prevents commits to the protected branch, it prompts you to commit those changes to a new branch. 1. To enable linting, open a `.sql` file and click the **Code Quality** tab. 2. 
Click the **Config** button on the bottom right side of the [console section](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md#console-section), below the **File editor**. 3. In the code quality tool config pop-up, you have the option to select **sqlfluff** or **sqlfmt**. 4. To lint your code, select the **sqlfluff** radio button. (Use sqlfmt to [format](#format) your code) 5. Once you've selected the **sqlfluff** radio button, go back to the console section (below the **File editor**) to select the **Lint** or **Fix** dropdown button: * **Lint** button — Displays linting issues in the Studio IDE as wavy underlines in the **File editor**. You can hover over an underlined issue to display the details and actions, including a **Quick Fix** option to fix all or specific issues. After linting, you'll see a message confirming the outcome. Linting doesn't rerun after saving. Click **Lint** again to rerun linting. * **Fix** button — Automatically fixes linting errors in the **File editor**. When fixing is complete, you'll see a message confirming the outcome. * Use the **Code Quality** tab to view and debug any code errors. [![Use the Lint or Fix button in the console section to lint or auto-fix your code.](/img/docs/dbt-cloud/cloud-ide/ide-lint-format-console.gif?v=2 "Use the Lint or Fix button in the console section to lint or auto-fix your code.")](#)Use the Lint or Fix button in the console section to lint or auto-fix your code. ##### Lint multiple files[​](#lint-multiple-files "Direct link to Lint multiple files") You can lint multiple SQL files at once, depending on how you are working with dbt. The behavior differs between the Studio IDE, dbt Core, and the dbt CLI. * **Studio IDE:** By default, linting runs against all modified `.sql` files in your project on your current branch. See [Snapshot linting](#snapshot-linting) for more information. * **dbt Core:** Does not include a built-in linter. 
To lint SQL files in your project, use a third-party linter such as SQLFluff configured to use the [dbt templater](https://docs.sqlfluff.com/en/stable/configuration/templating/dbt.html). You can lint multiple files by specifying one or more file or directory paths as arguments to the command. * **dbt CLI:** Supports the same linting [commands](https://docs.getdbt.com/docs/cloud/configure-cloud-cli.md#lint-sql-files) as dbt Core: ```text dbt sqlfluff lint [PATHS]... [flags] ``` If no path is specified (for example, `dbt sqlfluff lint`), all SQL files in the project are linted. ##### Customize linting[​](#customize-linting "Direct link to Customize linting") SQLFluff is a configurable SQL linter, which means you can configure your own linting rules instead of using the default linting settings in the IDE. You can exclude files and directories by using a standard `.sqlfluffignore` file. Learn more about the syntax in the [.sqlfluffignore syntax docs](https://docs.sqlfluff.com/en/stable/configuration.html#id2). To configure your own linting rules: 1. Create a new file in the root project directory (the parent or top-level directory for your files). Note: The root project directory is the directory where your `dbt_project.yml` file resides. 2. Name the file `.sqlfluff` (make sure you add the `.` before `sqlfluff`). 3. [Create](https://docs.sqlfluff.com/en/stable/configuration/setting_configuration.html#new-project-configuration) and add your custom config code. 4. Save and commit your changes. 5. Restart the Studio IDE. 6. Test it out and happy linting! ###### Snapshot linting[​](#snapshot-linting "Direct link to Snapshot linting") By default, dbt lints all modified `.sql` files in your project, including snapshots. [Snapshots](https://docs.getdbt.com/docs/build/snapshots.md) can be defined in YAML *and* `.sql` files, but their SQL isn't lintable and can cause errors during linting. 
To prevent SQLFluff from linting snapshot files, add the snapshots directory to your `.sqlfluffignore` file (for example `snapshots/`). Note that you should explicitly exclude snapshots in your `.sqlfluffignore` file since dbt doesn't automatically ignore snapshots on the backend. ##### Configure dbtonic linting rules[​](#configure-dbtonic-linting-rules "Direct link to Configure dbtonic linting rules") Refer to the [Jaffle shop SQLFluff config file](https://github.com/dbt-labs/jaffle-shop-template/blob/main/.sqlfluff) for dbt-specific (or dbtonic) linting rules we use for our own projects: dbtonic config code example provided by dbt Labs ```text [sqlfluff] templater = dbt # This change (from Jinja to dbt templater) will make linting slower # because linting will first compile dbt code into data warehouse code. runaway_limit = 10 max_line_length = 80 indent_unit = space [sqlfluff:indentation] tab_space_size = 4 [sqlfluff:layout:type:comma] spacing_before = touch line_position = trailing [sqlfluff:rules:capitalisation.keywords] capitalisation_policy = lower [sqlfluff:rules:aliasing.table] aliasing = explicit [sqlfluff:rules:aliasing.column] aliasing = explicit [sqlfluff:rules:aliasing.expression] allow_scalar = False [sqlfluff:rules:capitalisation.identifiers] extended_capitalisation_policy = lower [sqlfluff:rules:capitalisation.functions] capitalisation_policy = lower [sqlfluff:rules:capitalisation.literals] capitalisation_policy = lower [sqlfluff:rules:ambiguous.column_references] # Number in group by group_by_and_order_by_style = implicit ``` For more info on styling best practices, refer to [How we style our SQL](https://docs.getdbt.com/best-practices/how-we-style/2-how-we-style-our-sql.md). 
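Putting the snapshot exclusion described above into practice, a minimal `.sqlfluffignore` in your project root might look like the following (the `target/` and `dbt_packages/` entries are common additions, not requirements from this page):

```text
# .sqlfluffignore -- one path pattern per line
snapshots/
target/
dbt_packages/
```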
#### Format[​](#format "Direct link to Format") In the Studio IDE, you can format your code to match style guides with a click of a button. The Studio IDE integrates with formatters like sqlfmt, Prettier, and Black to automatically format code on five different file types — SQL, YAML, Markdown, Python, and JSON: * SQL — Format with [sqlfmt](http://sqlfmt.com/), which provides one way to format your dbt SQL and Jinja. * **Note**: Custom sqlfmt configuration in the Studio IDE is not supported. * YAML, Markdown, and JSON — Format with [Prettier](https://prettier.io/). * Python — Format with [Black](https://black.readthedocs.io/en/latest/). The Studio IDE formatting integrations take care of manual tasks like code formatting, enabling you to focus on creating quality data models, collaborating, and driving impactful results. ##### Format SQL[​](#format-sql "Direct link to Format SQL") To format your SQL code, dbt integrates with [sqlfmt](http://sqlfmt.com/), which is an uncompromising SQL query formatter that provides one way to format the SQL query and Jinja. By default, the Studio IDE uses sqlfmt rules to format your code, making the **Format** button available and convenient to use immediately. However, if you have a file named `.sqlfluff` in the root directory of your dbt project, the Studio IDE will default to SQLFluff rules instead. Formatting is available on all branches, including your protected primary git branch. Since the Studio IDE prevents commits to the protected branch, it prompts you to commit those changes to a new branch. 1. Open a `.sql` file and click on the **Code Quality** tab. 2. 
Click the **Config** button on the right side of the console. 3. In the code quality tool config pop-up, you have the option to select **sqlfluff** or **sqlfmt**. 4. To format your code, select the **sqlfmt** radio button. (Use sqlfluff to [lint](#lint) your code). 5. Once you've selected the **sqlfmt** radio button, go to the console section (located below the **File editor**) to select the **Format** button. 6. The **Format** button auto-formats your code in the **File editor**. Once you've auto-formatted, you'll see a message confirming the outcome. [![Use sqlfmt to format your SQL code.](/img/docs/dbt-cloud/cloud-ide/sqlfmt.gif?v=2 "Use sqlfmt to format your SQL code.")](#)Use sqlfmt to format your SQL code. ##### Format YAML, Markdown, JSON[​](#format-yaml-markdown-json "Direct link to Format YAML, Markdown, JSON") To format your YAML, Markdown, or JSON code, dbt integrates with [Prettier](https://prettier.io/), which is an opinionated code formatter. Formatting is available on all branches, including your protected primary git branch. Since the Studio IDE prevents commits to the protected branch, it prompts you to commit those changes to a new branch. 1. Open a `.yml`, `.md`, or `.json` file. 2. In the console section (located below the **File editor**), select the **Format** button to auto-format your code in the **File editor**. Use the **Code Quality** tab to view code errors. 3. Once you've auto-formatted, you'll see a message confirming the outcome. [![Format YAML, Markdown, and JSON files using Prettier.](/img/docs/dbt-cloud/cloud-ide/prettier.gif?v=2 "Format YAML, Markdown, and JSON files using Prettier.")](#)Format YAML, Markdown, and JSON files using Prettier. You can add a configuration file to customize formatting rules for YAML, Markdown, or JSON files using Prettier. The IDE looks for the configuration file based on an order of precedence. For example, it first checks for a "prettier" key in your `package.json` file. 
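For example, a minimal `.prettierrc` in the project root could set a couple of common options (the values shown are illustrative, not defaults required by the IDE):

```json
{
  "tabWidth": 2,
  "printWidth": 80
}
```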
For more info on the order of precedence and how to configure files, refer to [Prettier's documentation](https://prettier.io/docs/en/configuration.html). Please note, `.prettierrc.json5`, `.prettierrc.js`, and `.prettierrc.toml` files aren't currently supported. ##### Format Python[​](#format-python "Direct link to Format Python") To format your Python code, dbt integrates with [Black](https://black.readthedocs.io/en/latest/), which is an uncompromising Python code formatter. Formatting is available on all branches, including your protected primary git branch. Since the Studio IDE prevents commits to the protected branch, it prompts you to commit those changes to a new branch. 1. Open a `.py` file. 2. In the console section (located below the **File editor**), select the **Format** button to auto-format your code in the **File editor**. 3. Once you've auto-formatted, you'll see a message confirming the outcome. [![Format Python files using Black.](/img/docs/dbt-cloud/cloud-ide/python-black.gif?v=2 "Format Python files using Black.")](#)Format Python files using Black. #### FAQs[​](#faqs "Direct link to FAQs")  When should I use SQLFluff and when should I use sqlfmt? SQLFluff and sqlfmt are both tools used for formatting SQL code, but some differences may make one preferable to the other depending on your use case.
SQLFluff is a SQL code linter and formatter. This means that it analyzes your code to identify potential issues and bugs, and follows coding standards. It also formats your code according to a set of rules, which are [customizable](#customize-linting), to ensure consistent coding practices. You can also use SQLFluff to keep your SQL code well-formatted and follow styling best practices.
sqlfmt is a SQL code formatter. This means it automatically formats your SQL code according to a set of formatting rules that aren't customizable. It focuses solely on the appearance and layout of the code, which helps ensure consistent indentation, line breaks, and spacing. sqlfmt doesn't analyze your code for errors or bugs and doesn't look at coding issues beyond code formatting.
You can use either SQLFluff or sqlfmt depending on your preference and what works best for you: * Use SQLFluff to have your code linted and formatted (meaning analyze and fix your code for errors/bugs, and format your styling). It allows you the flexibility to customize your own rules. * Use sqlfmt to only have your code well-formatted without analyzing it for errors and bugs. You can use sqlfmt out of the box, making it convenient to use right away without having to configure it.  Can I nest \`.sqlfluff\` files? To ensure optimal code quality and consistent styles, it's highly recommended that you have one main `.sqlfluff` configuration file in the root folder of your project. Having multiple configuration files can result in inconsistent SQL styles across your project.
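As an illustration, a single root-level `.sqlfluff` might pin the SQL dialect, the templater, and a styling rule for the whole project (the values below are examples; adjust them to your warehouse and style guide):

```ini
[sqlfluff]
dialect = snowflake
templater = dbt

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = lower
```

Keeping these settings in one root file means every model in the project is linted against the same rules.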

However, you can customize and include an additional child `.sqlfluff` configuration file within specific subfolders of your dbt project.
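For example, a child `.sqlfluff` placed in a hypothetical `models/staging/` subfolder could switch off a single rule for just that folder, inheriting everything else from the root configuration:

```ini
[sqlfluff]
exclude_rules = ambiguous.column_references
```
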

By nesting a `.sqlfluff` file in a subfolder, SQLFluff will apply the rules defined in that subfolder's configuration file to any files located within it. The rules specified in the parent `.sqlfluff` file will be used for all other files and folders outside of the subfolder. This hierarchical approach allows for tailored linting rules while maintaining consistency throughout your project. Refer to [SQLFluff documentation](https://docs.sqlfluff.com/en/stable/configuration.html#configuration-files) for more info.  Can I run SQLFluff commands from the terminal? Currently, running SQLFluff commands from the terminal isn't supported.  What are some considerations when using dbt linting? Currently, the Studio IDE can lint or fix files up to a certain size and complexity. If you attempt to lint or fix files that are too large, taking more than 60 seconds for the dbt backend to process, you will see an 'Unable to complete linting this file' error. To avoid this, break up your model into smaller models (files) so that they are less complex to lint or fix. Note that linting is simpler than fixing, so there may be cases where a file can be linted but not fixed. #### Related docs[​](#related-docs "Direct link to Related docs") * [User interface](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md) * [Keyboard shortcuts](https://docs.getdbt.com/docs/cloud/studio-ide/keyboard-shortcuts.md) * [SQL linting in CI jobs](https://docs.getdbt.com/docs/deploy/continuous-integration.md#sql-linting)
--- ### Navigate the interface [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") The Canvas interface contains an operator toolbar, operators, canvas, built-in AI, and more to help you access and transform data through a seamless drag-and-drop dbt model creation experience. This page offers comprehensive definitions and terminology of user interface elements, allowing you to navigate the Canvas landscape with ease. The Canvas interface is composed of: * **Navigation bars** — The top and left-side navigation bars contain options for switching between models in the workspace, opening existing or creating new models, uploading CSV data, previewing data and runs, and viewing helpful shortcuts. * **Operator toolbar** — Located at the top of the canvas area, the toolbar displays all the node categories available, as well as tools to help you develop: * **Input:** Source models and data * **Transform:** Data transformation tools * **Output:** Output model configurations * **[Copilot](https://docs.getdbt.com/docs/cloud/build-canvas-copilot.md):** AI tools to help you build fast and efficiently * **SQL:** View your completed model's compiled SQL * **Operators** — Tiles that provide source data, perform specific transformations, and layer configurations (such as model, join, aggregate, filter, and so on). Use connectors to link the operators and build a complete data transformation pipeline. * **Canvas** — The main whiteboard space below the node toolbar. The canvas allows you to create or modify models through a sleek drag-and-drop experience. * **Configuration panel** — Each operator has a configuration panel that opens when you click on it.
The configuration panel allows you to configure the operator, review the current model, preview changes to the table, view the SQL code for the node, and delete the operator. #### Operators[​](#operators "Direct link to Operators") The operator toolbar above the canvas contains the different transformation operators available to use. Use each operator to configure or perform specific tasks, like adding filters or joining models by dragging an operator onto the canvas. You can connect operators using the connector line, which allows you to form a complete model for your data transformation. [![Use the operator toolbar to perform different transformation operations.](/img/docs/dbt-cloud/canvas/operators.png?v=2 "Use the operator toolbar to perform different transformation operations.")](#)Use the operator toolbar to perform different transformation operations. The following operators are available: ###### Input[​](#input "Direct link to Input") Input operators configure source data: * **Model explorer**: Select the model and columns you want to use. ###### Transform[​](#transform "Direct link to Transform") Transform operators shape your data: * **Join**: Define the join conditions and choose columns from both tables. * **Union:** Perform a `UNION` to remove duplicates or `UNION ALL` to prevent deduplication. * **Formula**: Add the formula to create a new column. Use the built-in AI code generator to help generate SQL code by clicking on the question mark (?) icon. Enter your prompt and wait to see the results. * **Aggregate**: Specify the aggregation functions and the columns they apply to. * **Pivot:** Select the column and values to create a pivot. * **Limit**: Set the maximum number of rows you want to return. * **Order**: Select the columns to sort by and the sort order. * **Filter**: Set the conditions to filter data. * **Rename:** Provide custom aliases for your columns.
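As an illustration of the Union operator's two modes, the SQL it corresponds to resembles the following (the model and column names are hypothetical):

```sql
-- UNION removes duplicate rows across the two inputs
select customer_id, email from stg_customers_us
union
select customer_id, email from stg_customers_eu

-- UNION ALL keeps every row, including duplicates
select customer_id, email from stg_customers_us
union all
select customer_id, email from stg_customers_eu
```
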
###### Output model[​](#output-model "Direct link to Output model") Output operators configure the names and location of your transformed data: * **Output model**: The final transformed dataset generated by a dbt model. You can only have one output model. When you click on each operator, it opens a configuration panel. The configuration panel allows you to configure the operator, review the current model, preview changes to the model, view the SQL code for the node, and delete the operator. [![The Canvas interface that contains a node toolbar and canvas.](/img/docs/dbt-cloud/canvas/canvas.png?v=2 "The Canvas interface that contains a node toolbar and canvas.")](#)The Canvas interface that contains a node toolbar and canvas. If you have any feedback on additional operators that you might need, we'd love to hear it! Please contact your dbt Labs account team and share your thoughts. #### Canvas[​](#canvas "Direct link to Canvas") Canvas has a sleek drag-and-drop interface for creating and modifying dbt SQL models. It's like a digital whiteboard space for easily viewing and delivering trustworthy data. Use the canvas to: * Drag-and-drop operators to create and configure your model(s) * Generate SQL code using the built-in AI generator * Zoom in or out for better visualization * Version-control your dbt models * \[Coming soon] Test and document your created models [![The operator toolbar allows you to select different nodes to configure or perform specific tasks, like adding filters or joining models.](/img/docs/dbt-cloud/canvas/operators.png?v=2 "The operator toolbar allows you to select different nodes to configure or perform specific tasks, like adding filters or joining models.")](#)The operator toolbar allows you to select different nodes to configure or perform specific tasks, like adding filters or joining models. ##### Connector[​](#connector "Direct link to Connector") Connectors allow you to connect your operators to create dbt models. 
Once you've added operators to the canvas: * Hover over the "+" sign next to the operator and click. * Drag your cursor from the operator's "+" start point to the other operator you want to connect to. This should create a connector line. * As an example, to create a join, connect one operator to the "L" (Left) and the other to the "R" (Right). The endpoints are located to the left of the operator so you can easily drag the connectors to the endpoint. [![Click and drag your cursor to connect operators.](/img/docs/dbt-cloud/canvas/connector.png?v=2 "Click and drag your cursor to connect operators.")](#)Click and drag your cursor to connect operators. #### Configuration panel[​](#configuration-panel "Direct link to Configuration panel") Each operator has a configuration side panel that opens when you click on it. The configuration panel allows you to configure the operator, review the current model, preview changes, view the SQL code for the operator, and delete the operator. The configuration side panel has the following: * **Configure** tab — This section allows you to configure the operator to your specified requirements, such as using the built-in AI code generator to generate SQL. * **Input** tab — This section allows you to view the data for the current source table. Not available for model operators. * **Output** tab — This section allows you to preview the data for the modified source model. * **Code** tab — This section allows you to view the underlying SQL code for the data transformation. [![A sleek drag-and-drop canvas interface that allows you to create or modify dbt SQL models.](/img/docs/dbt-cloud/canvas/config-panel.png?v=2 "A sleek drag-and-drop canvas interface that allows you to create or modify dbt SQL models.")](#)A sleek drag-and-drop canvas interface that allows you to create or modify dbt SQL models.
--- ### Studio IDE keyboard shortcuts The Studio IDE provides keyboard shortcuts, features, and development tips to help you work faster and be more productive. Use this Studio IDE page to help you quickly reference some common operations. | Shortcut description | macOS | Windows | | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------- | ------------------------------- | | View the full list of editor shortcuts to help your development, such as adding a line comment, changing tab display size, building modified models, changing editor font size, and more. | Fn-F1 | Fn-F1 | | Select a file to open. | Command-O | Control-O | | Close the currently active editor tab. | Option-W | Alt-W | | Preview code. | Command-Enter | Control-Enter | | Compile code. | Command-Shift-Enter | Control-Shift-Enter | | Reveal a list of dbt functions in the editor. | Enter two underscores `__` | Enter two underscores `__` | | Open the command palette to invoke dbt commands and actions. | Command-P / Command-Shift-P | Control-P / Control-Shift-P | | Multi-edit in the editor by selecting multiple lines. | Option-Click / Shift-Option-Command / Shift-Option-Click | Alt-Click / Shift-Alt-Click | | Open the [**Invocation History Drawer**](https://docs.getdbt.com/docs/cloud/studio-ide/ide-user-interface.md#invocation-history) located at the bottom of the IDE. | Control-backtick (or Control + \`) | Control-backtick (or Ctrl + \`) | | Add a block comment to the selected code.
SQL files will use the Jinja syntax (`{# #}`) rather than the SQL one (`/* */`). | Shift-Option-A | Shift-Alt-A | #### Related docs[​](#related-docs "Direct link to Related docs") * [Quickstart guide](https://docs.getdbt.com/guides.md) * [About dbt](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) * [Develop in the Cloud](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) --- ### Supported browsers To have the best experience with dbt, we recommend using the latest versions of the following browsers: * [Google Chrome](https://www.google.com/chrome/) — Latest version is fully supported in dbt * [Mozilla Firefox](https://www.mozilla.org/en-US/firefox/) — Latest version is fully supported in dbt * [Apple Safari](https://www.apple.com/safari/) — Latest version support provided on a best-effort basis * [Microsoft Edge](https://www.microsoft.com/en-us/edge?form=MA13FJ\&exp=e00) — Latest version support provided on a best-effort basis dbt provides two types of browser support: * Fully supported — dbt is fully tested and supported on these browsers. Features display and work as intended. * Best effort — You can access dbt on these browsers. Features may not display or work as intended. You may still be able to access and use dbt without the latest recommended browser, or with an unlisted browser. However, some features might not display as intended. note To improve your experience using dbt, we suggest that you turn off ad blockers.
##### Browser sessions[​](#browser-sessions "Direct link to Browser sessions") A session is a period of time during which you’re signed in to a dbt account from a browser. If you close your browser, it will end your session and log you out. You'll need to log in again the next time you try to access dbt. If you've logged in using [SSO](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md), you can customize your maximum session duration, which might vary depending on your identity provider (IdP). --- ### Tenancy dbt is available in both single (virtual private) and multi-tenant configurations. ##### Multi-tenant[​](#multi-tenant "Direct link to Multi-tenant") The multi-tenant (SaaS) deployment environment refers to the SaaS dbt application hosted by dbt Labs. This is the most commonly used deployment and is completely managed and maintained by dbt Labs, the makers of dbt. As a SaaS product, a user can quickly [create an account](https://www.getdbt.com/signup/) on our North American servers and get started using dbt and related services immediately. *If your organization requires cloud services hosted on EMEA or APAC regions*, please [contact us](https://www.getdbt.com/contact/). The deployments are hosted on AWS or Azure and are always kept up to date with the currently supported dbt versions, software updates, and bug fixes. ##### Single tenant[​](#single-tenant "Direct link to Single tenant") The single tenant deployment environment provides a hosted alternative to the multi-tenant (SaaS) dbt environment.
While still managed and maintained by dbt Labs, single tenant dbt instances provide dedicated infrastructure in a virtual private cloud (VPC) environment. This is accomplished by spinning up all the necessary infrastructure with a reusable Infrastructure as Code (IaC) deployment built with [Terraform](https://www.terraform.io/). The single tenant infrastructure lives in a dedicated AWS or Azure account and can be customized with certain configurations, such as firewall rules, to limit inbound traffic or hosting in a specific region. A few common reasons for choosing a single tenant deployment over the Production SaaS product include: * A requirement that the dbt application be hosted in a dedicated VPC that is logically separated from other customer infrastructure * A desire for multiple isolated dbt instances for testing, development, etc. *To learn more about setting up a dbt single tenant deployment, [please contact our sales team](mailto:sales@getdbt.com).* ##### Available features[​](#available-features "Direct link to Available features") The following table outlines which dbt features are supported on the different SaaS options available today. For more information about feature availability, please [contact us](https://www.getdbt.com/contact/).
| Feature | AWS Multi-tenant | AWS single tenant | Azure multi-tenant | Azure single tenant | GCP multi-tenant | | --------------------------- | ---------------- | ----------------- | ------------------ | ------------------- | ---------------- | | Audit logs | ✅ | ✅ | ✅ | ✅ | ✅ | | Continuous integration jobs | ✅ | ✅ | ✅ | ✅ | ✅ | | dbt CLI | ✅ | ✅ | ✅ | ✅ | ✅ | | Studio IDE | ✅ | ✅ | ✅ | ✅ | ✅ | | Copilot | ✅ | ✅ | ✅ | ✅ | ✅ | | Catalog | ✅ | ✅ | ✅ | ✅ | ✅ | | Mesh | ✅ | ✅ | ✅ | ✅ | ✅ | | Semantic Layer | ✅ | ✅ | ✅ | ✅ | ✅ | | Discovery API | ✅ | ✅ | ✅ | ✅ | ✅ | | IP restrictions | ✅ | ✅ | ✅ | ✅ | ✅ | | Orchestrator | ✅ | ✅ | ✅ | ✅ | ✅ | | PrivateLink egress | ✅ | ✅ | ✅ | ✅ | ✅ | | PrivateLink ingress | ❌ | ✅ | ❌ | ✅ | ❌ | | Webhooks (Outbound) | ✅ | ✅ | ✅ | ❌ | ❌ | --- ### The dbt platform features dbt platform (formerly dbt Cloud) is the fastest and most reliable way to deploy dbt. Develop, test, schedule, document, and investigate data models all in one browser-based UI. In addition to providing a hosted architecture for running dbt across your organization, dbt comes equipped with turnkey support for scheduling jobs, CI/CD, hosting documentation, monitoring and alerting, and an integrated development environment (Studio IDE). It also allows you to develop and run dbt commands from your local command line interface (CLI) or code editor. dbt's [flexible plans](https://www.getdbt.com/pricing/) and features make it well-suited for data teams of any size — sign up for your [free 14-day trial](https://www.getdbt.com/signup/)!
[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) ###### [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) [Use the CLI for the dbt platform to develop, test, run, and version control dbt projects and commands, directly from the command line.](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) ###### [dbt Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) [The IDE is the easiest and most efficient way to develop dbt models, allowing you to build, test, run, and version control your dbt projects directly from your browser.](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/canvas.md) ###### [dbt Canvas](https://docs.getdbt.com/docs/cloud/canvas.md) [Develop with Canvas, a seamless drag-and-drop experience that helps analysts quickly create and visualize dbt models in dbt.](https://docs.getdbt.com/docs/cloud/canvas.md) [![](/img/icons/copilot.svg)](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) ###### [dbt Copilot\*](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) [Use dbt Copilot to generate documentation, tests, semantic models, metrics, and SQL code from scratch, giving you the flexibility to modify or fix generated code.](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/environments-in-dbt.md) ###### [Manage environments](https://docs.getdbt.com/docs/environments-in-dbt.md) [Set up and manage separate production and development environments in dbt to help engineers develop and test code more efficiently, without impacting users or data.](https://docs.getdbt.com/docs/environments-in-dbt.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/job-scheduler.md) ###### [Schedule and run dbt 
jobs](https://docs.getdbt.com/docs/deploy/job-scheduler.md) [Create custom schedules to run your production jobs. Schedule jobs by day of the week, time of day, or a recurring interval. Decrease operating costs by using webhooks to trigger CI jobs and the API to start jobs.](https://docs.getdbt.com/docs/deploy/job-scheduler.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/job-notifications.md) ###### [Notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) [Set up and customize job notifications in dbt to receive email or Slack alerts when a job run succeeds, fails, or is cancelled. Notifications alert the right people when something goes wrong instead of waiting for a user to report it.](https://docs.getdbt.com/docs/deploy/job-notifications.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/run-visibility.md) ###### [Run visibility](https://docs.getdbt.com/docs/deploy/run-visibility.md) [View the history of your runs and the model timing dashboard to help identify where improvements can be made to the scheduled jobs.](https://docs.getdbt.com/docs/deploy/run-visibility.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) ###### [Host & share documentation](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) [dbt hosts and authorizes access to dbt project documentation, allowing you to generate data documentation on a schedule for your project. Invite teammates to the dbt platform to collaborate and share your project's documentation.](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/git/connect-github.md) ###### [Supports GitHub, GitLab, Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-github.md) [Seamlessly connect your git account to the dbt platform and provide another layer of security to dbt.
Import new repositories, trigger continuous integration, clone repos using HTTPS, and more!](https://docs.getdbt.com/docs/cloud/git/connect-github.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/continuous-integration.md) ###### [Enable Continuous Integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md) [Configure dbt to run your dbt projects in a temporary schema when new commits are pushed to open pull requests. This build-on-PR functionality is a great way to catch bugs before deploying to production, and an essential tool in any analyst's belt.](https://docs.getdbt.com/docs/deploy/continuous-integration.md) [![](/img/icons/dbt-bit.svg)](https://www.getdbt.com/security/) ###### [Security](https://www.getdbt.com/security/) [Manage risk with SOC-2 compliance, CI/CD deployment, RBAC, and ELT architecture.](https://www.getdbt.com/security/) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md) ###### [Visualize and orchestrate exposures\*](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md) [Configure downstream exposures automatically from dashboards and understand how models are used in downstream tools. Proactively refresh the underlying data sources during scheduled dbt jobs.](https://docs.getdbt.com/docs/cloud-integrations/downstream-exposures.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) ###### [dbt Semantic Layer\*](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) [Use the dbt Semantic Layer to define metrics alongside your dbt models and query them from any integrated analytics tool. 
Get the same answers everywhere, every time.](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) ###### [Discovery API\*](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) [Enhance your workflow and run ad-hoc queries, browse schema, or query the dbt Semantic Layer. dbt serves a GraphQL API, which supports arbitrary queries.](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/explore/explore-projects.md) ###### [dbt Catalog\*](https://docs.getdbt.com/docs/explore/explore-projects.md) [Learn about dbt Catalog and how to interact with it to understand, improve, and leverage your data pipelines.](https://docs.getdbt.com/docs/explore/explore-projects.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/explore/dbt-insights.md) ###### [dbt Insights\*](https://docs.getdbt.com/docs/explore/dbt-insights.md) [Learn how to query data and perform exploratory data analysis using dbt Insights.](https://docs.getdbt.com/docs/explore/dbt-insights.md)
\*These features are available on [selected plans](https://www.getdbt.com/pricing/). #### Related docs[​](#related-docs "Direct link to Related docs") * [dbt plans and pricing](https://www.getdbt.com/pricing/) * [Quickstart guides](https://docs.getdbt.com/docs/get-started-dbt.md) * [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) --- ### Use dbt Copilot [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Use Copilot to generate documentation, tests, semantic models, and code from scratch, giving you the flexibility to modify or fix generated code. Copilot includes the following capabilities: * [Generate resources](#generate-resources): Save time by using Copilot's generation button to generate documentation, tests, and semantic model files during your development in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md). * [Generate and edit SQL inline](#generate-and-edit-sql-inline): Use natural language prompts to generate SQL code from scratch or to edit existing SQL files by using keyboard shortcuts or highlighting code in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md). * [Build visual models](#build-visual-models): Use Copilot to generate models in [Canvas](https://docs.getdbt.com/docs/cloud/use-canvas.md) with natural language prompts.
* [Build queries](#build-queries): Use Copilot to generate queries in [Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md) for exploratory data analysis using natural language prompts. * [dbt Agents](https://docs.getdbt.com/docs/dbt-ai/dbt-agents.md): Delegate entire tasks like building new models end-to-end, refactoring existing models, or analyzing data with natural language — reducing context-switching and letting you stay in flow. Agents like the [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md) and [Analyst agent](https://docs.getdbt.com/docs/dbt-ai/analyst-agent.md) are available in the same Copilot panel. tip Check out our [dbt Copilot on-demand course](https://learn.getdbt.com/learn/course/dbt-copilot/welcome-to-dbt-copilot/welcome-5-mins) to learn how to use Copilot to generate resources, and more! To learn about prompt best practices, check out the [Prompt cookbook](https://docs.getdbt.com/guides/prompt-cookbook.md). #### Generate resources[​](#generate-resources "Direct link to Generate resources") Generate documentation, test, metric, and semantic model [resources](https://docs.getdbt.com/docs/build/projects.md) with the click of a button in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) using dbt Copilot, saving you time. To access and use this AI feature: 1. Navigate to the Studio IDE and select a SQL model file under the **File Explorer**. 2. In the **Console** section (under the **File Editor**), click **dbt Copilot** to view the available AI options. 3. Select the available options to generate the YAML config: **Generate Documentation**, **Generate Tests**, **Generate Semantic Model**, or **Generate Metrics**. To generate multiple YAML configs for the same model, click each option separately. dbt Copilot intelligently saves the YAML config in the same file.
note [dbt Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) doesn't yet support generating semantic models with the latest YAML spec. * To generate metrics, you need to first have semantic models defined. * Once defined, click **dbt Copilot** and select **Generate Metrics**. * Write a prompt describing the metrics you want to generate and press enter. * **Accept** or **Reject** the generated code. 4. Verify the AI-generated code. You can update or fix the code as needed. 5. Click **Save As**. You should see the file changes under the **Version control** section. [![Example of using dbt Copilot to generate documentation in the IDE](/img/docs/dbt-cloud/cloud-ide/dbt-copilot-doc.gif?v=2 "Example of using dbt Copilot to generate documentation in the IDE")](#)Example of using dbt Copilot to generate documentation in the IDE #### Generate and edit SQL inline[​](#generate-and-edit-sql-inline "Direct link to Generate and edit SQL inline") Copilot also allows you to generate SQL code directly within the SQL file in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), using natural language prompts. This means you can rewrite or add specific portions of the SQL file without needing to edit the entire file. This intelligent AI tool streamlines SQL development by reducing errors, scaling effortlessly with complexity, and saving valuable time. Copilot's [prompt window](#use-the-prompt-window), accessible by keyboard shortcut, handles repetitive or complex SQL generation effortlessly so you can focus on high-level tasks. Use Copilot's prompt window for use cases like: * Writing advanced transformations * Performing bulk edits efficiently * Crafting complex patterns like regex ##### Use the prompt window[​](#use-the-prompt-window "Direct link to Use the prompt window") Access Copilot's AI prompt window using the keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows) to: ###### 1. 
Generate SQL from scratch[​](#1-generate-sql-from-scratch "Direct link to 1. Generate SQL from scratch") * Use the keyboard shortcuts Cmd+B (Mac) or Ctrl+B (Windows) to generate SQL from scratch. * Enter your instructions to generate SQL code tailored to your needs using natural language. * Ask Copilot to fix the code or add a specific portion of the SQL file. [![dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows)](/img/docs/dbt-cloud/cloud-ide/copilot-sql-generation-prompt.png?v=2 "dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows)")](#)dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows) ###### 2. Edit existing SQL code[​](#2-edit-existing-sql-code "Direct link to 2. Edit existing SQL code") * Highlight a section of SQL code and press Cmd+B (Mac) or Ctrl+B (Windows) to open the prompt window for editing. * Use this to refine or modify specific code snippets based on your needs. * Ask Copilot to fix the code or add a specific portion of the SQL file. ###### 3. Review changes with the diff view[​](#3-review-changes-with-the-diff-view-to-quickly-assess-the-impact-of-the-changes-before-making-changes "Direct link to 3. Review changes with the diff view") * When a suggestion is generated, Copilot displays a visual "diff" view so you can quickly assess the impact of the proposed changes before accepting them: * **Green**: New code that will be added if you accept the suggestion. * **Red**: Existing code that will be removed or replaced by the suggested changes. ###### 4. Accept or reject suggestions[​](#4-accept-or-reject-suggestions "Direct link to 4.
Accept or reject suggestions") * **Accept**: If the generated SQL meets your requirements, click the **Accept** button to apply the changes to your `.sql` file directly in the IDE. * **Reject**: If the suggestion doesn’t align with your request or prompt, click **Reject** to discard the generated SQL without making changes and start again. ###### 5. Regenerate code[​](#5-regenerate-code "Direct link to 5. Regenerate code") * To regenerate, press the **Escape** key on your keyboard (or click the Reject button in the popup). This removes the generated code and puts your cursor back into the prompt text area. * Update your prompt and press **Enter** to try another generation. Press **Escape** again to close the popover entirely. Once you've accepted a suggestion, you can continue to use the prompt window to generate additional SQL code and commit your changes to the branch. [![Edit existing SQL code using dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows)](/img/docs/dbt-cloud/cloud-ide/copilot-sql-generation.gif?v=2 "Edit existing SQL code using dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows)")](#)Edit existing SQL code using dbt Copilot's prompt window accessible by keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows) #### Build visual models[​](#build-visual-models "Direct link to Build visual models") Copilot seamlessly integrates with the [Canvas](https://docs.getdbt.com/docs/cloud/canvas.md), a drag-and-drop experience that helps you build your visual models using natural language prompts. Before you begin, make sure you can [access the Canvas](https://docs.getdbt.com/docs/cloud/use-canvas.md#access-canvas). To begin building models with natural language prompts in the Canvas: 1. Click the **dbt Copilot** icon in the Canvas menu. 2. In the dbt Copilot prompt box, enter your prompt in natural language for Copilot to build the model(s) you want.
You can also reference existing models using the `@` symbol. For example, to build a model that calculates the total price of orders, you can enter `@orders` in the prompt and it'll pull in and reference the `orders` model. 3. Click **Generate** and dbt Copilot generates a summary of the model(s) you want to build. * To start over, click the **+** icon. To close the prompt box, click **X**. [![Enter a prompt in the dbt Copilot prompt box to build models using natural language](/img/docs/dbt-cloud/copilot-generate.jpg?v=2 "Enter a prompt in the dbt Copilot prompt box to build models using natural language")](#)Enter a prompt in the dbt Copilot prompt box to build models using natural language 4. Click **Apply** to generate the model(s) in the Canvas. 5. dbt Copilot displays a visual "diff" view to help you compare the proposed changes with your existing code. Review the diff view in the canvas to see the generated operators built by Copilot: * White: Located at the top of the canvas; marks the existing setup or blank canvas that will be removed or replaced by the suggested changes. * Green: Located at the bottom of the canvas; marks new code that will be added if you accept the suggestion.
[![Visual diff view of proposed changes](/img/docs/dbt-cloud/copilot-diff.jpg?v=2 "Visual diff view of proposed changes")](#)Visual diff view of proposed changes 6. Reject or accept the suggestions. 7. In the **generated** operator box, click the play icon to preview the data. 8. Confirm the results or continue building your model. [![Use the generated operator with play icon to preview the data](/img/docs/dbt-cloud/copilot-output.jpg?v=2 "Use the generated operator with play icon to preview the data")](#)Use the generated operator with play icon to preview the data 9. To edit the generated model, open the **Copilot** prompt box and type your edits. 10. Click **Submit** and Copilot will generate the revised model. Repeat steps 5-8 until you're happy with the model. #### Build queries[​](#build-queries "Direct link to Build queries") Use Copilot to build queries in [Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md) with natural language prompts to seamlessly explore and query data with an intuitive, context-rich interface. Before you begin, make sure you can [access Insights](https://docs.getdbt.com/docs/explore/access-dbt-insights.md). To begin building SQL queries with natural language prompts in Insights: 1. Click the **Copilot** icon in the Query console sidebar menu. 2. In the dropdown menu above the Copilot prompt box, select **Generate SQL**. 3. In the dbt Copilot prompt box, enter your prompt in natural language for dbt Copilot to build the SQL query you want. 4. Click **↑** to submit your prompt. Copilot generates a summary of the SQL query you want to build. To clear the prompt, click the **Clear** button. To close the prompt box, click the Copilot icon again. 5. Copilot will automatically generate the SQL with an explanation of the query. * Click **Add** to add the generated SQL to the existing query. * Click **Replace** to replace the existing query with the generated SQL. 6.
In the **Query console menu**, click the **Run** button to preview the data. 7. Confirm the results or continue building your model. [![dbt Copilot in dbt Insights](/img/docs/dbt-insights/insights-copilot.gif?v=2 "dbt Copilot in dbt Insights")](#)dbt Copilot in dbt Insights --- ### Using defer in dbt [Defer](https://docs.getdbt.com/reference/node-selection/defer.md) is a powerful feature that allows developers to only build, run, and test models they've edited, without having to build and run all the models that come before them (upstream parents). dbt powers this by using a production manifest for comparison and resolves `{{ ref() }}` calls with upstream production artifacts. Both the Studio IDE and the dbt CLI enable users to natively defer to production metadata directly in their development workflows. [![Use 'defer' to modify end-of-pipeline models by pointing to production models, instead of running everything upstream.](/img/docs/reference/defer-diagram.png?v=2 "Use 'defer' to modify end-of-pipeline models by pointing to production models, instead of running everything upstream.")](#)Use 'defer' to modify end-of-pipeline models by pointing to production models, instead of running everything upstream. When using `--defer`, dbt will follow this order of execution for resolving `{{ ref() }}` calls. 1. If a development version of a deferred relation exists, dbt preferentially uses the development database location when resolving the reference. 2. If a development version doesn't exist, dbt uses the staging locations of parent relations based on metadata from the staging environment. 3.
If no development version and no staging environment exist, dbt uses the production locations of parent relations based on metadata from the production environment. Note that dbt only defers to one environment per invocation — either staging or production. **Note:** Passing the `--favor-state` flag will always resolve refs using staging metadata if available; otherwise, it defaults to production metadata regardless of the presence of a development relation, skipping step #1. For a clean slate, it's a good practice to drop the development schema at the start and end of your development cycle. If you require additional controls over production data, create a [staging environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#staging-environment), and dbt will use that, rather than the production environment, to resolve `{{ ref() }}` calls. #### Required setup[​](#required-setup "Direct link to Required setup") * You must select the **[Production environment](https://docs.getdbt.com/docs/deploy/deploy-environments.md#set-as-production-environment)** checkbox in the **Environment Settings** page. * This can be set for one deployment environment per dbt project. * You must have a successful job run first. When using defer, dbt compares artifacts from the most recent successful production job, excluding CI jobs. ##### Defer in the dbt IDE[​](#defer-in-the-dbt-ide "Direct link to Defer in the dbt IDE") To use deferral in the Studio IDE, you must have production artifacts generated by a deploy job. dbt will first check for these artifacts in your Staging environment (if available), or else in the Production environment. The defer feature in the Studio IDE won't work if a Staging environment exists but no deploy job has run. This is because the necessary metadata to power defer won't exist until a deploy job has run successfully in the Staging environment. To enable defer in the Studio IDE, toggle the **Defer to staging/production** button on the command bar.
Once enabled, dbt will: 1. Pull down the most recent manifest from the staging or production environment for comparison 2. Pass the `--defer` flag to the command (for any command that accepts the flag) For example, if you were to start developing on a new branch with [nothing in your development schema](https://docs.getdbt.com/reference/node-selection/defer.md#usage), edit a single model, and run `dbt build -s state:modified` — only the edited model runs. Any `{{ ref() }}` calls resolve to the staging or production location of the referenced models. [![Select the 'Defer to production' toggle on the bottom right of the command bar to enable defer in the Studio IDE.](/img/docs/dbt-cloud/defer-toggle.png?v=2 "Select the 'Defer to production' toggle on the bottom right of the command bar to enable defer in the Studio IDE.")](#)Select the 'Defer to production' toggle on the bottom right of the command bar to enable defer in the Studio IDE. ##### Defer in dbt CLI[​](#defer-in-dbt-cli "Direct link to Defer in dbt CLI") One key difference between using `--defer` in the dbt CLI and the Studio IDE is that in the dbt CLI, `--defer` is *automatically* enabled for all invocations, with comparisons made against production artifacts. You can disable it with the `--no-defer` flag. ##### Configure deferral environment ID[​](#configure-deferral-environment-id "Direct link to Configure deferral environment ID") The Studio IDE and dbt CLI both offer additional flexibility by letting you choose the source environment for deferral artifacts. You can manually set a `defer-env-id` key in either your `dbt_project.yml` (dbt CLI and Studio IDE) or `dbt_cloud.yml` (dbt CLI only) file. By default, dbt will prefer metadata from the project's "Staging" environment (if defined). Otherwise, it uses "Production."
[![Set the defer environment and the target name will change in the UI.](/img/docs/dbt-cloud/defer-env-id.png?v=2 "Set the defer environment and the target name will change in the UI.")](#)Set the defer environment and the target name will change in the UI. dbt\_cloud.yml

```yml
context:
  active-host: ...
  active-project: ...
  defer-env-id: '123456'
```

dbt\_project.yml

```yml
dbt-cloud:
  defer-env-id: '123456'
```

--- ## Reference ### .dbtignore You can create a `.dbtignore` file in the root of your [dbt project](https://docs.getdbt.com/docs/build/projects.md) to specify files that should be **entirely** ignored by dbt. The file behaves like a [`.gitignore` file, using the same syntax](https://git-scm.com/docs/gitignore). Files and subdirectories matching the pattern will not be read, parsed, or otherwise detected by dbt—as if they didn't exist. **Examples** .dbtignore

```md
# .dbtignore

# ignore individual .py files
not-a-dbt-model.py
another-non-dbt-model.py

# ignore all .py files
**.py

# ignore all .py files with "codegen" in the filename
*codegen*.py

# ignore all folders in a directory
path/to/folders/**

# ignore some folders in a directory
path/to/folders/subfolder/**
```

--- ### About adapter object dbt communicates with your database using an internal database adapter object.
For example, `BaseAdapter` and `SnowflakeAdapter`. The Jinja object `adapter` is a wrapper around this internal database adapter object. `adapter` grants the ability to invoke adapter methods of that internal class via: * `{% do adapter.<method>(...) %}` -- invoke internal adapter method * `{{ adapter.<method>(...) }}` -- invoke internal adapter method and capture its return value for use in materialization or other macros For example, the adapter methods below will be translated into specific SQL statements depending on the type of adapter your project is using: * [adapter.dispatch](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md) * [adapter.get\_missing\_columns](#get_missing_columns) * [adapter.expand\_target\_column\_types](#expand_target_column_types) * [adapter.get\_relation](#get_relation) or [load\_relation](#load_relation) * [adapter.get\_columns\_in\_relation](#get_columns_in_relation) * [adapter.create\_schema](#create_schema) * [adapter.drop\_schema](#drop_schema) * [adapter.drop\_relation](#drop_relation) * [adapter.rename\_relation](#rename_relation) * [adapter.quote](#quote) ##### Deprecated adapter functions[​](#deprecated-adapter-functions "Direct link to Deprecated adapter functions") The following adapter functions are deprecated, and will be removed in a future release.
* [adapter.get\_columns\_in\_table](#get_columns_in_table) **(deprecated)** * [adapter.already\_exists](#already_exists) **(deprecated)** * [adapter\_macro](#adapter_macro) **(deprecated)** #### dispatch[​](#dispatch "Direct link to dispatch") Moved to separate page: [dispatch](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md) #### get\_missing\_columns[​](#get_missing_columns "Direct link to get_missing_columns") **Args**: * `from_relation`: The source [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) * `to_relation`: The target [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) Returns a list of [Columns](https://docs.getdbt.com/reference/dbt-classes.md#column) that is the difference of the columns in `from_relation` and the columns in `to_relation`, i.e. (`set(from_relation.columns) - set(to_relation.columns)`). Useful for detecting new columns in a source table. **Usage**: models/example.sql

```sql
{%- set target_relation = api.Relation.create(
      database='database_name',
      schema='schema_name',
      identifier='table_name') -%}

{% for col in adapter.get_missing_columns(target_relation, this) %}
  alter table {{this}} add column "{{col.name}}" {{col.data_type}};
{% endfor %}
```

#### expand\_target\_column\_types[​](#expand_target_column_types "Direct link to expand_target_column_types") **Args**: * `from_relation`: The source [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) to use as a template * `to_relation`: The [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) to mutate Expand the `to_relation` table's column types to match the schema of `from_relation`. Column expansion is constrained to string and numeric types on supported databases. Typical usage involves expanding column types (e.g. from `varchar(16)` to `varchar(32)`) to support insert statements. **Usage**: example.sql

```sql
{% set tmp_relation = adapter.get_relation(...) %}
{% set target_relation = adapter.get_relation(...) %}
{% do adapter.expand_target_column_types(tmp_relation, target_relation) %}
```

#### get\_relation[​](#get_relation "Direct link to get_relation") **Args**: * `database`: The database of the relation to fetch * `schema`: The schema of the relation to fetch * `identifier`: The identifier of the relation to fetch Returns a cached [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) object identified by the `database.schema.identifier` provided to the method, or `None` if the relation does not exist. **Usage**: example.sql

```sql
{%- set source_relation = adapter.get_relation(
      database="analytics",
      schema="dbt_drew",
      identifier="orders") -%}

{{ log("Source Relation: " ~ source_relation, info=true) }}
```

#### load\_relation[​](#load_relation "Direct link to load_relation") **Args**: * `relation`: The [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) to try to load A convenience wrapper for [get\_relation](#get_relation). Returns the cached version of the [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) object, or `None` if the relation does not exist. **Usage**: example.sql

```sql
{% set relation_exists = load_relation(ref('my_model')) is not none %}

{% if relation_exists %}
  {{ log("my_model has already been built", info=true) }}
{% else %}
  {{ log("my_model doesn't exist in the warehouse. Maybe it was dropped?", info=true) }}
{% endif %}
```

#### get\_columns\_in\_relation[​](#get_columns_in_relation "Direct link to get_columns_in_relation") **Args**: * `relation`: The [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) to find the columns for Returns a list of [Columns](https://docs.getdbt.com/reference/dbt-classes.md#column) in a table.
**Usage**: example.sql

```sql
{%- set columns = adapter.get_columns_in_relation(this) -%}

{% for column in columns %}
  {{ log("Column: " ~ column, info=true) }}
{% endfor %}
```

#### create\_schema[​](#create_schema "Direct link to create_schema") **Args**: * `relation`: A relation object with the database and schema to create. Any identifier on the relation will be ignored. Creates a schema (or equivalent) in the target database. If the target schema already exists, then this method is a no-op. **Usage:** example.sql

```sql
{% do adapter.create_schema(api.Relation.create(database=target.database, schema="my_schema")) %}
```

#### drop\_schema[​](#drop_schema "Direct link to drop_schema") **Args**: * `relation`: A relation object with the database and schema to drop. Any identifier on the relation will be ignored. Drops a schema (or equivalent) in the target database. If the target schema does not exist, then this method is a no-op. The specific implementation is adapter-dependent, but adapters should implement a cascading drop, such that objects in the schema are also dropped. **Note**: this adapter method is destructive, so please use it with care! **Usage:** example.sql

```sql
{% do adapter.drop_schema(api.Relation.create(database=target.database, schema="my_schema")) %}
```

#### drop\_relation[​](#drop_relation "Direct link to drop_relation") **Args**: * `relation`: The Relation to drop Drops a Relation in the database. If the target relation does not exist, then this method is a no-op. The specific implementation is adapter-dependent, but adapters should implement a cascading drop, such that bound views downstream of the dropped relation are also dropped. **Note**: this adapter method is destructive, so please use it with care! The `drop_relation` method will remove the specified relation from dbt's relation cache.
**Usage:** example.sql

```sql
{% do adapter.drop_relation(this) %}
```

#### rename\_relation[​](#rename_relation "Direct link to rename_relation") **Args**: * `from_relation`: The Relation to rename * `to_relation`: The destination Relation to rename `from_relation` to Renames a Relation in the database. The `rename_relation` method will rename the specified relation in dbt's relation cache. **Usage:** example.sql

```sql
{%- set old_relation = adapter.get_relation(
      database=this.database,
      schema=this.schema,
      identifier=this.identifier) -%}

{%- set backup_relation = adapter.get_relation(
      database=this.database,
      schema=this.schema,
      identifier=this.identifier ~ "__dbt_backup") -%}

{% do adapter.rename_relation(old_relation, backup_relation) %}
```

#### quote[​](#quote "Direct link to quote") **Args**: * `identifier`: A string to quote Encloses `identifier` in the correct quotes for the adapter when escaping reserved column names, etc. **Usage:** example.sql

```sql
select
  'abc' as {{ adapter.quote('table_name') }},
  'def' as {{ adapter.quote('group by') }}
```

#### get\_columns\_in\_table[​](#get_columns_in_table "Direct link to get_columns_in_table") Deprecated This method is deprecated and will be removed in a future release. Please use [get\_columns\_in\_relation](#get_columns_in_relation) instead. **Args**: * `schema_name`: The schema to test * `table_name`: The table (or view) from which to select columns Returns a list of [Columns](https://docs.getdbt.com/reference/dbt-classes.md#column) in a table. models/example.sql

```sql
{% set dest_columns = adapter.get_columns_in_table(schema, identifier) %}
{% set dest_cols_csv = dest_columns | map(attribute='quoted') | join(', ') %}

insert into {{ this }} ({{ dest_cols_csv }}) (
  select {{ dest_cols_csv }}
  from {{ref('another_table')}}
);
```

#### already\_exists[​](#already_exists "Direct link to already_exists") Deprecated This method is deprecated and will be removed in a future release.
Please use [get\_relation](#get_relation) instead. **Args**: * `schema`: The schema to test * `table`: The relation to look for Returns true if a relation named like `table` exists in schema `schema`, false otherwise. models/example.sql

```sql
select * from {{ref('raw_table')}}

{% if adapter.already_exists(this.schema, this.name) %}
  where id > (select max(id) from {{this}})
{% endif %}
```

#### adapter\_macro[​](#adapter_macro "Direct link to adapter_macro") Deprecated This method is deprecated and will be removed in a future release. Please use [adapter.dispatch](#dispatch) instead. **Usage:** macros/concat.sql

```sql
{% macro concat(fields) -%}
  {{ adapter_macro('concat', fields) }}
{%- endmacro %}

{% macro default__concat(fields) -%}
  concat({{ fields|join(', ') }})
{%- endmacro %}

{% macro redshift__concat(fields) %}
  {{ fields|join(' || ') }}
{% endmacro %}

{% macro snowflake__concat(fields) %}
  {{ fields|join(' || ') }}
{% endmacro %}
```

--- ### About adapter-specific behavior changes Some adapters can display behavior changes when certain flags are enabled. The following sections contain details about these adapter-specific behavior changes.
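Behavior change flags of this kind are opted into per project under the top-level `flags:` key in `dbt_project.yml`. A minimal hedged sketch (the flag name below is a placeholder, not a real flag; consult the adapter pages that follow for the actual flag names and defaults):

```yml
# dbt_project.yml -- illustrative sketch only.
# <adapter_behavior_flag> is a placeholder; see your adapter's
# behavior-change page for the real flag names it supports.
flags:
  <adapter_behavior_flag>: true
```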
[![](/img/icons/databricks.svg)](https://docs.getdbt.com/reference/global-configs/databricks-changes.md) ###### [Databricks](https://docs.getdbt.com/reference/global-configs/databricks-changes.md) [Behavior changes for the Databricks adapter.](https://docs.getdbt.com/reference/global-configs/databricks-changes.md) [![](/img/icons/redshift.svg)](https://docs.getdbt.com/reference/global-configs/redshift-changes.md) ###### [Redshift](https://docs.getdbt.com/reference/global-configs/redshift-changes.md) [Behavior changes for the Amazon Redshift adapter.](https://docs.getdbt.com/reference/global-configs/redshift-changes.md) [![](/img/icons/snowflake.svg)](https://docs.getdbt.com/reference/global-configs/snowflake-changes.md) ###### [Snowflake](https://docs.getdbt.com/reference/global-configs/snowflake-changes.md) [Behavior changes for the Snowflake adapter.](https://docs.getdbt.com/reference/global-configs/snowflake-changes.md) --- ### About as_bool filter The `as_bool` Jinja filter will coerce Jinja-compiled output into a boolean value (`True` or `False`), or return an error if it cannot be represented as a bool. ##### Usage[​](#usage "Direct link to Usage") In the example below, the `as_bool` filter is used to coerce a Jinja expression to enable or disable a set of models based on the `target`. dbt\_project.yml

```yml
models:
  my_project:
    for_export:
      enabled: "{{ (target.name == 'prod') | as_bool }}"
```

--- ### About as_native filter The `as_native` Jinja filter will coerce Jinja-compiled output into its Python native representation according to [`ast.literal_eval`](https://docs.python.org/3/library/ast.html#ast.literal_eval). The result can be any Python native type (set, list, tuple, dict, etc). To render boolean and numeric values, it is recommended to use [`as_bool`](https://docs.getdbt.com/reference/dbt-jinja-functions/as_bool.md) and [`as_number`](https://docs.getdbt.com/reference/dbt-jinja-functions/as_number.md) instead. Proceed with caution Unlike `as_bool` and `as_number`, `as_native` will return a rendered value regardless of the input type. Ensure that your inputs match expectations. --- ### About as_number filter The `as_number` Jinja filter will coerce Jinja-compiled output into a numeric value (integer or float), or return an error if it cannot be represented as a number. ##### Usage[​](#usage "Direct link to Usage") In the example below, the `as_number` filter is used to coerce an environment variable into a numeric value to dynamically control the connection port. profiles.yml

```yml
my_profile:
  outputs:
    dev:
      type: postgres
      port: "{{ env_var('PGPORT') | as_number }}"
```

--- ### About builtins Jinja variable The `builtins` variable exists to provide references to builtin dbt context methods. This allows macros to be created with names that *mask* dbt builtin context methods, while still making those methods accessible in the dbt compilation context. The `builtins` variable is a dictionary containing the following keys: * [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) * [source](https://docs.getdbt.com/reference/dbt-jinja-functions/source.md) * [config](https://docs.getdbt.com/reference/dbt-jinja-functions/config.md) #### Usage[​](#usage "Direct link to Usage") important Using the `builtins` variable in this way is an advanced development workflow. Users should be ready to maintain and update these overrides when upgrading in the future. From dbt v1.5 and higher, use the following macro to override the `ref` method available in the model compilation context to return a [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) with the database name overridden to `dev`. It includes logic to extract user-provided arguments, including `version`, and call the `builtins.ref()` function with either a single `modelname` argument or both `packagename` and `modelname` arguments, based on the number of positional arguments in `varargs`. Note that the `ref`, `source`, and `config` functions can't be overridden with a package. This is because `ref`, `source`, and `config` are context properties within dbt and are not dispatched as global macros. Refer to [this GitHub discussion](https://github.com/dbt-labs/dbt-core/issues/4491#issuecomment-994709916) for more context.
```text
{% macro ref() %}

  -- extract user-provided positional and keyword arguments
  {% set version = kwargs.get('version') or kwargs.get('v') %}
  {% set packagename = none %}
  {%- if (varargs | length) == 1 -%}
    {% set modelname = varargs[0] %}
  {%- else -%}
    {% set packagename = varargs[0] %}
    {% set modelname = varargs[1] %}
  {% endif %}

  -- call builtins.ref based on provided positional arguments
  {% set rel = None %}
  {% if packagename is not none %}
    {% set rel = builtins.ref(packagename, modelname, version=version) %}
  {% else %}
    {% set rel = builtins.ref(modelname, version=version) %}
  {% endif %}

  -- finally, override the database name with "dev"
  {% set newrel = rel.replace_path(database="dev") %}
  {% do return(newrel) %}

{% endmacro %}
```

Logic within the ref macro can also be used to control which elements of the model path are rendered when run, for example the following logic renders only the schema and object identifier, but not the database reference i.e. `my_schema.my_model` rather than `my_database.my_schema.my_model`. This is especially useful when using Snowflake as a warehouse, if you intend to change the name of the database post-build and wish the references to remain accurate.

```text
-- render identifiers without a database
{% do return(rel.include(database=false)) %}
```

--- ### About config property * Models * Seeds * Snapshots * Tests * Unit tests * Sources * Metrics * Exposures * Semantic models * Saved queries models/\<filename\>.yml

```yml
models:
  - name: <model_name>
    config:
      <model_config>: ...
```

seeds/\<filename\>.yml

```yml
seeds:
  - name: <seed_name>
    config:
      <seed_config>: ...
```

snapshots/\<filename\>.yml

```yml
snapshots:
  - name: <snapshot_name>
    config:
      <snapshot_config>: ...
```

<resource_path>/<filename>.yml

```yml
<resource_type>:
  - name: <resource_name>
    data_tests:
      - <test_name>:
          arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
            <argument_name>: <argument_value>
          config:
            <test_config>: <config_value>
    columns:
      - name: <column_name>
        data_tests:
          - <test_name>
          - <test_name>:
              arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
                <argument_name>: <argument_value>
              config:
                <test_config>: <config_value>
```

💡Did you know...

Available from dbt v1.8 or with the [dbt "Latest" release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).

models/<filename>.yml

```yml
unit_tests:
  - name: <unit_test_name>
    config:
      enabled: true | false
      meta: {dictionary}
      tags: <string> | [<string>]
```

models/<filename>.yml

```yml
sources:
  - name: <source_name>
    config:
      <source_config>: <config_value>
    tables:
      - name: <table_name>
        config:
          <table_config>: <config_value>
```

models/<filename>.yml

```yml
metrics:
  - name: <metric_name>
    config:
      enabled: true | false
      group: <group_name>
      meta: {dictionary}
```

models/<filename>.yml

```yml
exposures:
  - name: <exposure_name>
    config:
      enabled: true | false
      meta: {dictionary}
```

models/<filename>.yml

```yml
saved-queries:
  - name: <saved_query_name>
    config:
      cache:
        enabled: true | false
      enabled: true | false
      group: <group_name>
      meta: {dictionary}
      schema: <schema_name>
      exports:
        - name: <export_name>
          config:
            export_as: view | table
            alias: <alias_name>
            schema: <schema_name>
```

#### Definition[​](#definition "Direct link to Definition")

The `config` property allows you to configure resources at the same time you're defining properties in YAML files.

---

### About config variable

The `config` variable exists to handle end-user configuration for custom materializations. Configs like `unique_key` can be implemented using the `config` variable in your own materializations.

For example, code in the `incremental` materialization like this:

```text
{% materialization incremental, default -%}
  {%- set unique_key = config.get('unique_key') -%}
  ...
```

is responsible for handling model code that looks like this:

```text
{{
  config(
    materialized='incremental',
    unique_key='id'
  )
}}
```

Review [Model configurations](https://docs.getdbt.com/reference/model-configs.md) for examples and more information on valid arguments.

#### config.get[​](#configget "Direct link to config.get")

**Args**:

* `name`: The name of the configuration variable (required)
* `default`: The default value to use if this configuration is not provided (optional)

The `config.get` function is used to get configurations for a model from the end-user. Configs defined in this way are optional, and a default value can be provided. There are three cases:

1. The configuration variable exists and is not `None`
2. The configuration variable exists and is `None`
3. The configuration variable does not exist

Accessing custom configurations in meta

`config.get()` doesn't return values from `config.meta`. If a key exists only in `meta`, `config.get()` returns the default value and emits a warning. To access custom configurations stored under `meta`, use [`config.meta_get()`](#configmeta_get).

Example usage:

```sql
{% materialization incremental, default -%}

  -- Example w/ no default. unique_key will be None if the user does not provide this configuration
  {%- set unique_key = config.get('unique_key') -%}

  -- Example w/ alternate value. Use alternative of 'id' if 'unique_key' config is provided, but it is None
  {%- set unique_key = config.get('unique_key') or 'id' -%}

  -- Example w/ default value. Default to 'id' if the 'unique_key' config does not exist
  {%- set unique_key = config.get('unique_key', default='id') -%}

  -- For custom configs under `meta`, use config.meta_get()
  {% set my_custom_config = config.meta_get('custom_config_key') %}
  ...
```

#### config.require[​](#configrequire "Direct link to config.require")

**Args**:

* `name`: The name of the configuration variable (required)

The `config.require` function is used to get configurations for a model from the end-user. Configs defined using this function are required, and failure to provide them will result in a compilation error.

Accessing custom configurations in meta

`config.require()` doesn't return values from `config.meta`. If a key exists only in `meta`, `config.require()` raises an error and emits a warning. To access required custom configurations stored under `meta`, use [`config.meta_require()`](#configmeta_require).

Example usage:

```sql
{% materialization incremental, default -%}

  {%- set unique_key = config.require('unique_key') -%}
  ...
```

#### config.meta\_get[​](#configmeta_get "Direct link to config.meta_get")

This functionality is available starting in dbt Core v1.10 and in the dbt Fusion engine.

**Args**:

* `name`: The name of the configuration variable to retrieve from `meta` (required)
* `default`: The default value to use if this configuration is not provided (optional)

The `config.meta_get` function retrieves custom configurations stored under the `meta` dictionary. Unlike `config.get()`, this function exclusively checks `config.meta` and won't result in a deprecation warning. Use this function when accessing custom configurations that you've defined under `meta` in your model or resource configuration; it's equivalent to writing `config.get('meta').get('<attribute>')`.

Note that `config.meta_get` is not yet supported in Python models. In the meantime, Python models should continue using `dbt.config.get("meta").get("<attribute>")` to access custom meta configurations. `dbt.config.get_meta("<attribute>")` is an alias for `dbt.config.get("meta").get("<attribute>")`.
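Before moving on to the Jinja examples, the lookup semantics described above, the three `config.get` cases plus the `require` and `meta_get` variants, can be sketched in plain Python. The `ConfigSketch` class and its behavior are illustrative assumptions for exposition, not dbt internals:

```python
# Hypothetical sketch of the lookup semantics described above -- not dbt's
# actual implementation, just the three config.get cases plus require/meta_get.

_MISSING = object()  # sentinel: tells "absent" apart from "present but None"

class ConfigSketch:
    def __init__(self, settings, meta=None):
        self._settings = settings      # e.g. {'unique_key': None}
        self._meta = meta or {}

    def get(self, name, default=None):
        # Cases 1 & 2: key exists -> return its value, even if that value is None.
        # Case 3: key absent -> return the default.
        value = self._settings.get(name, _MISSING)
        return default if value is _MISSING else value

    def require(self, name):
        # Missing required config -> error (dbt raises a compilation error).
        if name not in self._settings:
            raise KeyError(f"Required config '{name}' was not provided")
        return self._settings[name]

    def meta_get(self, name, default=None):
        # Exclusively checks the meta dictionary.
        return self._meta.get(name, default)

cfg = ConfigSketch({'unique_key': None}, meta={'owner': 'analytics'})
print(cfg.get('unique_key', default='id'))   # None: key exists, value is None
print(cfg.get('unique_key') or 'id')         # 'id': falls back past the None
print(cfg.get('missing_key', default='id'))  # 'id': key absent, default used
print(cfg.meta_get('owner'))                 # 'analytics'
```

The sentinel is what distinguishes case 2 from case 3: a plain `dict.get(name, default)` would collapse them.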
Example usage:

```sql
{% materialization custom_materialization, default -%}

  -- Retrieve a custom config from meta, returns None if not found
  {%- set custom_setting = config.meta_get('custom_setting') -%}

  -- Retrieve with a default value
  {%- set custom_setting = config.meta_get('custom_setting', default='default_value') -%}
  ...
```

Example model configuration:

```yaml
models:
  - name: my_model
    config:
      meta:
        custom_setting: "my_value"
```

#### config.meta\_require[​](#configmeta_require "Direct link to config.meta_require")

This functionality is available starting in dbt Core v1.10 and in the dbt Fusion engine.

**Args**:

* `name`: The name of the configuration variable to retrieve from `meta` (required)

The `config.meta_require` function retrieves custom configurations stored under the `meta` dictionary. Unlike `config.require()`, this function exclusively checks `config.meta` and won't result in deprecation warnings. If the configuration is not found, dbt raises a compilation error. Use this function when you need to ensure a custom configuration exists under `meta`.

Note that `config.meta_require` is not yet supported in Python models.

Example usage:

```sql
{% materialization custom_materialization, default -%}

  -- Require a custom config from meta, throws error if not found
  {%- set required_setting = config.meta_require('required_setting') -%}
  ...
```

Example model configuration:

```yaml
models:
  - name: my_model
    config:
      meta:
        required_setting: "my_value"
```

---

### About data tests property

#### Description[​](#description "Direct link to Description")

The `data_tests` property defines assertions about a column, table, or view.
The property contains a list of [generic data tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests), referenced by name, which can include the four built-in generic tests available in dbt. For example, you can add data tests that ensure a column contains no duplicates and zero null values. Any arguments or [configurations](https://docs.getdbt.com/reference/data-test-configs.md) passed to those data tests should be nested below the `arguments` property.

Once these data tests are defined, you can validate their correctness by running `dbt test`.

To help you get started, the examples below show how to define the `data_tests` property on different resource types (models, sources, seeds, snapshots, and analyses).

* Models
* Sources
* Seeds
* Snapshots
* Analyses

models/<filename>.yml

```yml
models:
  - name: <model_name>
    data_tests:
      - <test_name>:
          arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
            <argument_name>: <argument_value>
          config:
            <test_config>: <config_value>
    columns:
      - name: <column_name>
        data_tests:
          - <test_name>
          - <test_name>:
              arguments:
                <argument_name>: <argument_value>
              config:
                <test_config>: <config_value>
```

models/<filename>.yml

```yml
sources:
  - name: <source_name>
    tables:
      - name: <table_name>
        data_tests:
          - <test_name>
          - <test_name>:
              arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
                <argument_name>: <argument_value>
              config:
                <test_config>: <config_value>
        columns:
          - name: <column_name>
            data_tests:
              - <test_name>
              - <test_name>:
                  arguments:
                    <argument_name>: <argument_value>
                  config:
                    <test_config>: <config_value>
```

seeds/<filename>.yml

```yml
seeds:
  - name: <seed_name>
    data_tests:
      - <test_name>
      - <test_name>:
          arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
            <argument_name>: <argument_value>
          config:
            <test_config>: <config_value>
    columns:
      - name: <column_name>
        data_tests:
          - <test_name>
          - <test_name>:
              arguments:
                <argument_name>: <argument_value>
              config:
                <test_config>: <config_value>
```

snapshots/<filename>.yml

```yml
snapshots:
  - name: <snapshot_name>
    data_tests:
      - <test_name>
      - <test_name>:
          arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
            <argument_name>: <argument_value>
          config:
            <test_config>: <config_value>
    columns:
      - name: <column_name>
        data_tests:
          - <test_name>
          - <test_name>:
              arguments:
                <argument_name>: <argument_value>
              config:
                <test_config>: <config_value>
```

This feature is not implemented for analyses.

#### Out-of-the-box data tests[​](#out-of-the-box-data-tests "Direct link to Out-of-the-box data tests")

There are four generic data tests that are available out of the box, for everyone using dbt.
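Each generic data test ultimately compiles to a SQL query that returns failing rows; zero rows returned means the test passes. As a hedged illustration of the kind of queries the `not_null` and `unique` tests produce (these templates are approximations, not dbt's exact compiled SQL):

```python
# Illustrative only: approximations of the failing-rows queries behind the
# not_null and unique generic tests. dbt's actual compiled SQL differs in detail.

def not_null_query(model, column):
    # Failing rows are those where the column is null.
    return f"select * from {model} where {column} is null"

def unique_query(model, column):
    # Failing rows are values that appear more than once.
    return (
        f"select {column} from {model} "
        f"group by {column} having count(*) > 1"
    )

print(not_null_query("analytics.orders", "order_id"))
print(unique_query("analytics.orders", "order_id"))
```

This "select the failing records" shape is also what makes `store_failures` possible: the failing rows can be written to a table as-is.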
##### `not_null`[​](#not_null "Direct link to not_null")

This data test validates that there are no `null` values present in a column.

models/<filename>.yml

```yaml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - not_null
  ...
```

You can add an `arguments` block for test inputs and a `config` block for options like `severity` or `where`. Refer to [Data test configurations](https://docs.getdbt.com/reference/data-test-configs.md?version=2.0#data-test-specific-configurations) for the full list. If you see a deprecation warning about test arguments, refer to [Deprecations](https://docs.getdbt.com/reference/deprecations.md?version=2.0) for test-related warnings.

##### `unique`[​](#unique "Direct link to unique")

This data test validates that there are no duplicate values present in a field. The config and where clause are optional.

models/<filename>.yml

```yaml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique:
              config:
                where: "order_id > 21"
```

##### `accepted_values`[​](#accepted_values "Direct link to accepted_values")

This data test validates that all of the non-`null` values in a column are present in a supplied list of `values`. If any values other than those provided in the list are present, then the data test will fail.

The `accepted_values` test supports an optional `quote` parameter which, by default, will single-quote the list of accepted values in the test query. To test non-strings (like integers or boolean values) explicitly set the `quote` config to `false`.

schema.yml

```yaml
models:
  - name: orders
    columns:
      - name: status
        data_tests:
          - accepted_values:
              arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
                values: ['placed', 'shipped', 'completed', 'returned']
      - name: status_id
        data_tests:
          - accepted_values:
              arguments:
                values: [1, 2, 3, 4]
                quote: false
```

##### `relationships`[​](#relationships "Direct link to relationships")

This data test validates that all of the records in a child table have a corresponding record in a parent table. This property is referred to as "referential integrity".

This test automatically excludes `NULL` values from validation, consistent with how database foreign key constraints work. Use the `not_null` test separately if `NULL` values should cause failures.

The following example tests that every order's `customer_id` maps back to a valid `customer`.

schema.yml

```yaml
models:
  - name: orders
    columns:
      - name: customer_id
        data_tests:
          - relationships:
              arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
                to: ref('customers')
                field: id
```

The `to` argument accepts a [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) – this means you can pass it a `ref` to a model (e.g. `ref('customers')`), or a `source` (e.g. `source('jaffle_shop', 'customers')`).

#### Additional examples[​](#additional-examples "Direct link to Additional examples")

##### Test an expression[​](#test-an-expression "Direct link to Test an expression")

Some data tests require multiple columns, so it doesn't make sense to nest them under the `columns:` key. In this case, you can apply the data test to the model (or source, seed, or snapshot) instead:

models/orders.yml

```yaml
models:
  - name: orders
    description: Order overview data mart, offering key details for each order including if it's a customer's first order and a food vs. drink item breakdown. One row per order.
    data_tests:
      - dbt_utils.expression_is_true:
          arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
            expression: "order_items_subtotal = subtotal"
      - dbt_utils.expression_is_true:
          arguments:
            expression: "order_total = subtotal + tax_paid"
```

This example focuses on testing expressions to ensure that `order_items_subtotal` equals `subtotal` and that `order_total` correctly sums `subtotal` and `tax_paid`.

##### Use custom generic data test[​](#use-custom-generic-data-test "Direct link to Use custom generic data test")

If you've defined your own custom generic data test, you can use that as the `test_name`:

models/<filename>.yml

```yaml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - primary_key # name of my custom generic test
```

Check out the guide on writing a [custom generic data test](https://docs.getdbt.com/best-practices/writing-custom-generic-tests.md) for more information.

##### Custom data test name[​](#custom-data-test-name "Direct link to Custom data test name")

By default, dbt will synthesize a name for your generic data test by concatenating:

* test name (`not_null`, `unique`, etc)
* model name (or source/seed/snapshot)
* column name (if relevant)
* arguments (if relevant, e.g. `values` for `accepted_values`)

It does not include any configurations for the data test. If the concatenated name is too long, dbt will use a truncated and hashed version instead. The goal is to preserve unique identifiers for all resources in your project, including tests.

You may also define your own name for a specific data test, via the `name` property.

**When might you want this?** dbt's default approach can result in some wonky (and ugly) data test names. By defining a custom name, you get full control over how the data test will appear in log messages and metadata artifacts. You'll also be able to select the data test by that name.

models/<filename>.yml

```yaml
models:
  - name: orders
    columns:
      - name: status
        data_tests:
          - accepted_values:
              name: unexpected_order_status_today
              arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
                values: ['placed', 'shipped', 'completed', 'returned']
              config:
                where: "order_date = current_date"
```

```sh
$ dbt test --select unexpected_order_status_today
12:43:41  Running with dbt=1.1.0
12:43:41  Found 1 model, 1 test, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
12:43:41
12:43:41  Concurrency: 5 threads (target='dev')
12:43:41
12:43:41  1 of 1 START test unexpected_order_status_today ................................ [RUN]
12:43:41  1 of 1 PASS unexpected_order_status_today ...................................... [PASS in 0.03s]
12:43:41
12:43:41  Finished running 1 test in 0.13s.
12:43:41
12:43:41  Completed successfully
12:43:41
12:43:41  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
```

A data test's name must be unique for all tests defined on a given model-column combination. If you give the same name to data tests defined on several different columns, or across several different models, then `dbt test --select <custom_test_name>` will select them all.

**When might you need this?** In cases where you have defined the same data test twice, with only a difference in configuration, dbt will consider these data tests to be duplicates:

models/<filename>.yml

```yaml
models:
  - name: orders
    columns:
      - name: status
        data_tests:
          - accepted_values:
              arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
                values: ['placed', 'shipped', 'completed', 'returned']
              config:
                where: "order_date = current_date"
          - accepted_values:
              arguments:
                values: ['placed', 'shipped', 'completed', 'returned']
              config: # only difference is in the 'where' config
                where: "order_date = (current_date - interval '1 day')" # PostgreSQL syntax
```

```sh
Compilation Error
  dbt found two tests with the name "accepted_values_orders_status__placed__shipped__completed__returned" defined on column "status" in "models.orders".

  Since these resources have the same name, dbt will be unable to find the correct resource when running tests.
  To fix this, change the name of one of these resources:
  - test.testy.accepted_values_orders_status__placed__shipped__completed__returned.69dce9e5d5 (models/one_file.yml)
  - test.testy.accepted_values_orders_status__placed__shipped__completed__returned.69dce9e5d5 (models/one_file.yml)
```

By providing a custom name, you help dbt differentiate data tests:

models/<filename>.yml

```yaml
models:
  - name: orders
    columns:
      - name: status
        data_tests:
          - accepted_values:
              name: unexpected_order_status_today
              arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
                values: ['placed', 'shipped', 'completed', 'returned']
              config:
                where: "order_date = current_date"
          - accepted_values:
              name: unexpected_order_status_yesterday
              arguments:
                values: ['placed', 'shipped', 'completed', 'returned']
              config:
                where: "order_date = (current_date - interval '1 day')" # PostgreSQL
```

```sh
$ dbt test
12:48:03  Running with dbt=1.1.0-b1
12:48:04  Found 1 model, 2 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
12:48:04
12:48:04  Concurrency: 5 threads (target='dev')
12:48:04
12:48:04  1 of 2 START test unexpected_order_status_today ................................ [RUN]
12:48:04  2 of 2 START test unexpected_order_status_yesterday ............................ [RUN]
12:48:04  1 of 2 PASS unexpected_order_status_today ...................................... [PASS in 0.04s]
12:48:04  2 of 2 PASS unexpected_order_status_yesterday .................................. [PASS in 0.04s]
12:48:04
12:48:04  Finished running 2 tests in 0.21s.
12:48:04
12:48:04  Completed successfully
12:48:04
12:48:04  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
```

**If using [`store_failures`](https://docs.getdbt.com/reference/resource-configs/store_failures.md):** dbt uses each data test's name as the name of the table in which to store any failing records.
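The default naming scheme described earlier, concatenate test name, model name, column name, and arguments, then truncate and hash when the result is too long, can be illustrated with a small Python sketch. The length cutoff and hash format below are assumptions for illustration, not dbt's exact rules:

```python
import hashlib

# Illustrative sketch of synthesized test names like
# "accepted_values_orders_status__placed__shipped__completed__returned".
# The 64-char cutoff and 10-char md5 prefix are assumptions, not dbt's exact rules.

def synthesize_test_name(test, model, column=None, args=None, max_len=64):
    parts = [test, model] + ([column] if column else []) + [str(a) for a in (args or [])]
    name = "_".join(parts)
    if len(name) > max_len:
        # Preserve uniqueness: truncate, then append a short hash of the full name.
        digest = hashlib.md5(name.encode()).hexdigest()[:10]
        name = name[: max_len - 11] + "_" + digest
    return name

print(synthesize_test_name("not_null", "orders", "order_id"))
# not_null_orders_order_id
```

The hash suffix is why two configurations that differ only in `config:` (which is excluded from the name) still collide: the inputs to the name are identical, so a custom `name` is the only way to tell them apart.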
If you have defined a custom name for one data test, that custom name will also be used for its table of failures. You may optionally configure an [`alias`](https://docs.getdbt.com/reference/resource-configs/alias.md) for the data test, to separately control both the name of the data test (for metadata) and the name of its database table (for storing failures).

##### Alternative format for defining tests[​](#alternative-format-for-defining-tests "Direct link to Alternative format for defining tests")

When defining a generic data test with several arguments and configurations, the YAML can look and feel unwieldy. If you find it easier, you can define the same data test properties as top-level keys of a single dictionary, by providing the data test name as `test_name` instead. It's totally up to you.

This example is identical to the one above:

models/<filename>.yml

```yaml
models:
  - name: orders
    columns:
      - name: status
        data_tests:
          - name: unexpected_order_status_today
            test_name: accepted_values # name of the generic test to apply
            arguments: # available in v1.10.5 and higher. Older versions can set the <argument_name> as the top-level property.
              values:
                - placed
                - shipped
                - completed
                - returned
            config:
              where: "order_date = current_date"
```

#### Related documentation[​](#related-documentation "Direct link to Related documentation")

* [Data testing guide](https://docs.getdbt.com/docs/build/data-tests.md)

---

### About dbt --version

The `--version` command-line flag returns information about the currently installed version of dbt Core or the dbt CLI. This flag is not supported when invoking dbt in other dbt runtimes (for example, the IDE or scheduled runs).
* **dbt Core** — Returns the installed version of dbt Core and the versions of all installed adapters.
* **dbt CLI** — Returns the installed version of the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) and the *latest* version of the dbt runtime.

#### Versioning[​](#versioning "Direct link to Versioning")

To learn more about release versioning for dbt Core, refer to [How dbt Core uses semantic versioning](https://docs.getdbt.com/docs/dbt-versions/core.md#how-dbt-core-uses-semantic-versioning).

If using a [dbt release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md), which provides ongoing updates to dbt, then `dbt_version` represents the release version of dbt. This also follows semantic versioning guidelines, using the `YYYY.M.D+<suffix>` format. The year, month, and day represent the date the version was built (for example, `2024.10.8+996c6a8`). The suffix provides an additional unique identification for each build.

#### Example usages[​](#example-usages "Direct link to Example usages")

dbt Core example:

dbt Core

```text
$ dbt --version
Core:
  - installed: 1.7.6
  - latest:    1.7.6 - Up to date!

Plugins:
  - snowflake: 1.7.1 - Up to date!
```

dbt CLI example:

dbt CLI

```text
$ dbt --version
Cloud CLI - 0.35.7 (fae78a6f5f6f2d7dff3cab3305fe7f99bd2a36f3 2024-01-18T22:34:52Z)
```

---

### About dbt artifacts

With every invocation, dbt generates and saves one or more *artifacts*.
Several of these are JSON files (`semantic_manifest.json`, `manifest.json`, `catalog.json`, `run_results.json`, and `sources.json`) that are used to power:

* [documentation](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md)
* [state](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection)
* [visualizing source freshness](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness)

They could also be used to:

* gain insights into your [Semantic Layer](https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl.md)
* calculate project-level test coverage
* perform longitudinal analysis of run timing
* identify historical changes in table structure
* do much, much more

##### When are artifacts produced? [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#when-are-artifacts-produced- "Direct link to when-are-artifacts-produced-")

Most dbt commands (and corresponding RPC methods) produce artifacts:

* [semantic manifest](https://docs.getdbt.com/reference/artifacts/sl-manifest.md): produced whenever your dbt project is parsed
* [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md): produced by commands that read and understand your project
* [run results](https://docs.getdbt.com/reference/artifacts/run-results-json.md): produced by commands that run, compile, or catalog nodes in your DAG
* [catalog](https://docs.getdbt.com/reference/artifacts/catalog-json.md): produced by `docs generate`
* [sources](https://docs.getdbt.com/reference/artifacts/sources-json.md): produced by `source freshness`

When running commands from the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md), all artifacts are downloaded by default.
If you want to change this behavior, refer to [How to skip artifacts from being downloaded](https://docs.getdbt.com/docs/cloud/configure-cloud-cli.md#how-to-skip-artifacts-from-being-downloaded).

#### Where are artifacts produced?[​](#where-are-artifacts-produced "Direct link to Where are artifacts produced?")

By default, artifacts are written to the `/target` directory of your dbt project. You can configure the location using the [`target-path` flag](https://docs.getdbt.com/reference/global-configs/json-artifacts.md).

#### Common metadata[​](#common-metadata "Direct link to Common metadata")

All artifacts produced by dbt include a `metadata` dictionary with these properties:

* `dbt_version`: Version of dbt that produced this artifact. For details about release versioning, refer to [Versioning](https://docs.getdbt.com/reference/commands/version.md#versioning).
* `dbt_schema_version`: URL of this artifact's schema. See notes below.
* `generated_at`: Timestamp in UTC when this artifact was produced.
* `adapter_type`: The adapter (database), e.g. `postgres`, `spark`, etc.
* `env`: Any environment variables prefixed with `DBT_ENV_CUSTOM_ENV_` will be included in a dictionary, with the prefix-stripped variable name as its key.
* [`invocation_id`](https://docs.getdbt.com/reference/dbt-jinja-functions/invocation_id.md): Unique identifier for this dbt invocation

In the manifest, the `metadata` may also include:

* `send_anonymous_usage_stats`: Whether this invocation sent [anonymous usage statistics](https://docs.getdbt.com/reference/global-configs/usage-stats.md) while executing.
* `project_name`: The `name` defined in the root project's `dbt_project.yml`. (Added in manifest v10 / dbt Core v1.6)
* `project_id`: Project identifier, hashed from `project_name`, sent with anonymous usage stats if enabled.
* `user_id`: User identifier, stored by default in `~/.dbt/.user.yml`, sent with anonymous usage stats if enabled.
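The `env` behavior above, collecting `DBT_ENV_CUSTOM_ENV_`-prefixed variables and keying them by the prefix-stripped name, can be sketched like this. The helper is illustrative rather than dbt's own code:

```python
# Illustrative sketch of how DBT_ENV_CUSTOM_ENV_* variables end up in the
# artifact metadata's `env` dictionary -- not dbt's actual implementation.

PREFIX = "DBT_ENV_CUSTOM_ENV_"

def custom_env_metadata(environ):
    # Keep only prefixed variables, stripping the prefix from each key.
    return {
        key[len(PREFIX):]: value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }

example = {
    "DBT_ENV_CUSTOM_ENV_TEAM": "analytics",
    "DBT_ENV_CUSTOM_ENV_RUN_REASON": "nightly",
    "PATH": "/usr/bin",  # ignored: not prefixed
}
print(custom_env_metadata(example))
# {'TEAM': 'analytics', 'RUN_REASON': 'nightly'}
```

In practice you would pass `os.environ` instead of the hardcoded `example` dict when reading metadata the same way dbt does at invocation time.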
###### Notes:[​](#notes "Direct link to Notes:")

* The structure of dbt artifacts is canonized by [JSON schemas](https://json-schema.org/), which are hosted at [schemas.getdbt.com](https://schemas.getdbt.com/).
* Artifact versions may change in any minor version of dbt (`v1.x.0`). Each artifact is versioned independently.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Other artifacts](https://docs.getdbt.com/reference/artifacts/other-artifacts.md) files such as `index.html` or `graph_summary.json`.

---

### About dbt build command

The `dbt build` command will, in DAG order, for selected resources or an entire project:

* run [models](https://docs.getdbt.com/docs/build/models.md)
* test [tests](https://docs.getdbt.com/docs/build/data-tests.md)
* snapshot [snapshots](https://docs.getdbt.com/docs/build/snapshots.md)
* seed [seeds](https://docs.getdbt.com/docs/build/seeds.md)
* build [user-defined functions](https://docs.getdbt.com/docs/build/udfs.md) (available from dbt Core v1.11 and in the dbt Fusion engine)

#### Details[​](#details "Direct link to Details")

**Artifacts:** The `build` task will write a single [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md) and a single [run results artifact](https://docs.getdbt.com/reference/artifacts/run-results-json.md). The run results will include information about all models, tests, seeds, and snapshots that were selected to build, combined into one file.

**Skipping on failures:** Tests on upstream resources will block downstream resources from running, and a test failure will cause those downstream resources to skip entirely. E.g.
If `model_b` depends on `model_a`, and a `unique` test on `model_a` fails, then `model_b` will `SKIP`.

* Don't want a test to cause skipping? Adjust its [severity or thresholds](https://docs.getdbt.com/reference/resource-configs/severity.md) to `warn` instead of `error`.
* In the case of a test with multiple parents, where one parent depends on the other (e.g. a `relationships` test between `model_a` + `model_b`), that test will block-and-skip children of the most-downstream parent only (`model_b`).
* If you have a test with multiple parents that are independent of each other, dbt [skips](https://github.com/dbt-labs/dbt-core/blob/d5071fa13502be273596a0b7c8b13d14b6c68655/core/dbt/compilation.py#L224-L257) the downstream node only if that node depends on all of those parents.

**Selecting resources:** The `build` task supports standard selection syntax (`--select`, `--exclude`, `--selector`), as well as a `--resource-type` flag that offers a final filter (just like `list`). Whichever resources are selected, those are the ones that `build` will run/test/snapshot/seed.

* Remember that tests support indirect selection, so `dbt build -s model_a` will both run *and* test `model_a`. What does that mean? Any tests that directly depend on `model_a` will be included, so long as those tests don't also depend on other unselected parents. See [test selection](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) for details and examples.

**Flags:** The `build` task supports all the same flags as `run`, `test`, `snapshot`, and `seed`. For flags that are shared between multiple tasks (e.g. `--full-refresh`), `build` will use the same value for all selected resource types (e.g. both models and seeds will be full refreshed).

##### The `--empty` flag[​](#the---empty-flag "Direct link to the---empty-flag")

The `build` command supports the `--empty` flag for building schema-only dry runs. The `--empty` flag limits the refs and sources to zero rows.
dbt will still execute the model SQL against the target data warehouse but will avoid expensive reads of input data. This validates dependencies and ensures your models will build properly.

###### The render method[​](#the-render-method "Direct link to The render method")

The `.render()` method is generally used to resolve or evaluate Jinja expressions (such as `{{ source(...) }}`) during runtime. When using the `--empty` flag, dbt may skip processing `ref()` or `source()` for optimization. To avoid compilation errors and to explicitly tell dbt to process a specific relation (`ref()` or `source()`), use the `.render()` method in your model file. For example:

models.sql

```jinja
{{ config(
  pre_hook = [
    "alter external table {{ source('sys', 'customers').render() }} refresh"
  ]
) }}

select ...
```

#### Tests[​](#tests "Direct link to Tests")

When `dbt build` is executed with unit tests applied, the models will be processed according to their lineage and dependencies. The tests will be executed as follows:

* [Unit tests](https://docs.getdbt.com/docs/build/unit-tests.md) are run on a SQL model.
* The model is materialized.
* [Data tests](https://docs.getdbt.com/docs/build/data-tests.md) are run on the model.

This saves on warehouse spend, as the model will only be materialized if the unit tests pass successfully.

Unit tests and data tests can be selected using `--select test_type:unit` or `--select test_type:data` for `dbt build` (the same applies to the `--exclude` flag).

##### Examples[​](#examples "Direct link to Examples")

```text
$ dbt build
Running with dbt=1.9.0-b2
Found 1 model, 4 tests, 1 snapshot, 1 analysis, 341 macros, 0 operations, 1 seed file, 2 sources, 2 exposures

18:49:43 | Concurrency: 1 threads (target='dev')
18:49:43 |
18:49:43 | 1 of 7 START seed file dbt_jcohen.my_seed............................ [RUN]
18:49:43 | 1 of 7 OK loaded seed file dbt_jcohen.my_seed........................
[INSERT 2 in 0.09s] 18:49:43 | 2 of 7 START view model dbt_jcohen.my_model.......................... [RUN] 18:49:43 | 2 of 7 OK created view model dbt_jcohen.my_model..................... [CREATE VIEW in 0.12s] 18:49:43 | 3 of 7 START test not_null_my_seed_id................................ [RUN] 18:49:43 | 3 of 7 PASS not_null_my_seed_id...................................... [PASS in 0.05s] 18:49:43 | 4 of 7 START test unique_my_seed_id.................................. [RUN] 18:49:43 | 4 of 7 PASS unique_my_seed_id........................................ [PASS in 0.03s] 18:49:43 | 5 of 7 START snapshot snapshots.my_snapshot.......................... [RUN] 18:49:43 | 5 of 7 OK snapshotted snapshots.my_snapshot.......................... [INSERT 0 5 in 0.27s] 18:49:43 | 6 of 7 START test not_null_my_model_id............................... [RUN] 18:49:43 | 6 of 7 PASS not_null_my_model_id..................................... [PASS in 0.03s] 18:49:43 | 7 of 7 START test unique_my_model_id................................. [RUN] 18:49:43 | 7 of 7 PASS unique_my_model_id....................................... [PASS in 0.02s] 18:49:43 | 18:49:43 | Finished running 1 seed, 1 view model, 4 tests, 1 snapshot in 1.01s. Completed successfully Done. PASS=7 WARN=0 ERROR=0 SKIP=0 TOTAL=7 ``` #### Functions[​](#functions "Direct link to Functions") *Available from dbt Core v1.11 and in the dbt Fusion engine* The `build` command builds [user-defined functions](https://docs.getdbt.com/docs/build/udfs.md) as part of the DAG execution. To build or rebuild only `functions` in your project, run `dbt build --select "resource_type:function"`. For example: ```bash dbt build --select "resource_type:function" dbt-fusion 2.0.0-preview.45 Succeeded [ 0.98s] function dbt_schema.whoami (function) Succeeded [ 1.12s] function dbt_schema.area_of_circle (function) ``` #### Was this page helpful? 
---

### About dbt clean command

`dbt clean` is a utility function that deletes the paths specified within the [`clean-targets`](https://docs.getdbt.com/reference/project-configs/clean-targets.md) list in the `dbt_project.yml` file. It helps by removing unnecessary files or directories generated during the execution of other dbt commands, ensuring a clean state for the project.

#### Example usage

```text
dbt clean
```

#### Supported flags

This section will briefly explain the following flags:

* [`--clean-project-files-only`](#--clean-project-files-only) (default)
* [`--no-clean-project-files-only`](#--no-clean-project-files-only)

To view the list of all supported flags for the `dbt clean` command in the terminal, use the `--help` flag, which will display detailed information about the available flags, including their descriptions and usage:

```shell
dbt clean --help
```

##### --clean-project-files-only

By default, dbt deletes all the paths within the project directory specified in `clean-targets`.

**Note:** Avoid using paths outside the dbt project; otherwise, you will see an error.

###### Example usage

```shell
dbt clean --clean-project-files-only
```

##### --no-clean-project-files-only

Deletes all the paths specified in the `clean-targets` list of `dbt_project.yml`, including those outside the current dbt project.
```shell
dbt clean --no-clean-project-files-only
```

#### dbt clean with remote file system

To avoid complex permissions issues and potentially deleting crucial aspects of the remote file system without access to fix them, this command does not work when interfacing with the RPC server that powers the Studio IDE. Instead, when working in dbt, the `dbt deps` command cleans before it installs packages automatically. The `target` folder can be manually deleted from the sidebar file tree if needed.

---

### About dbt clone command

The `dbt clone` command clones selected nodes from the [specified state](https://docs.getdbt.com/reference/node-selection/syntax.md#establishing-state) to the target schema(s). This command makes use of the `clone` materialization:

* If your data platform supports zero-copy cloning of tables (Snowflake, Databricks, or BigQuery), and this model exists as a table in the source environment, dbt will create it in your target environment as a clone.
* Otherwise, dbt will create a simple pointer view (`select * from` the source object).
* By default, `dbt clone` will not recreate pre-existing relations in the current target. To override this, use the `--full-refresh` flag.
* You may want to specify a higher number of [threads](https://docs.getdbt.com/docs/running-a-dbt-project/using-threads.md) to decrease execution time since individual clone statements are independent of one another.
The `clone` command is useful for:

* blue/green continuous deployment (on data warehouses that support zero-copy cloning tables)
* cloning current production state into development schema(s)
* handling incremental models in dbt CI jobs (on data warehouses that support zero-copy cloning tables)
* testing code changes on downstream dependencies in your BI tool

```bash
# clone all of my models from specified state to my target schema(s)
dbt clone --state path/to/artifacts

# clone one_specific_model of my models from specified state to my target schema(s)
dbt clone --select "one_specific_model" --state path/to/artifacts

# clone all of my models from specified state to my target schema(s) and recreate all pre-existing relations in the current target
dbt clone --state path/to/artifacts --full-refresh

# clone all of my models from specified state to my target schema(s), running up to 50 clone statements in parallel
dbt clone --state path/to/artifacts --threads 50
```

##### When to use `dbt clone` instead of [deferral](https://docs.getdbt.com/reference/node-selection/defer.md)?

Unlike deferral, `dbt clone` requires some compute and creation of additional objects in your data warehouse. In many cases, deferral is a cheaper and simpler alternative to `dbt clone`. However, `dbt clone` covers additional use cases where deferral may not be possible.

For example, by creating actual data warehouse objects, `dbt clone` allows you to test out your code changes on downstream dependencies *outside of dbt* (such as a BI tool). As another example, you could `clone` your modified incremental models as the first step of your dbt CI job to prevent costly `full-refresh` builds for warehouses that support zero-copy cloning.

#### Cloning in dbt

You can clone nodes between states in dbt using the `dbt clone` command.
This is available in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) and the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) and relies on the [`--defer`](https://docs.getdbt.com/reference/node-selection/defer.md) feature. For more details on defer in dbt, read [Using defer in dbt](https://docs.getdbt.com/docs/cloud/about-cloud-develop-defer.md).

* **Using dbt CLI** — The `dbt clone` command in the dbt CLI automatically includes the `--defer` flag. This means you can use the `dbt clone` command without any additional setup.
* **Using Studio IDE** — To use the `dbt clone` command in the Studio IDE, follow these steps before running the `dbt clone` command:
  * Set up your **Production environment** and have a successful job run.
  * Enable **Defer to production** by toggling the switch in the lower-right corner of the command bar.

    ![Select the 'Defer to production' toggle on the bottom right of the command bar to enable defer in the Studio IDE.](/img/docs/dbt-cloud/defer-toggle.png?v=2)
  * Run the `dbt clone` command from the command bar.

Check out [this Developer blog post](https://docs.getdbt.com/blog/to-defer-or-to-clone) for more details on best practices for when to use `dbt clone` vs. deferral.

---

### About dbt compile command

`dbt compile` generates executable SQL from source `model`, `test`, and `analysis` files.
You can find these compiled SQL files in the `target/` directory of your dbt project.

The `compile` command is useful for:

1. Visually inspecting the compiled output of model files. This is useful for validating complex Jinja logic or macro usage.
2. Manually running compiled SQL. While debugging a model or schema test, it's often useful to execute the underlying `select` statement to find the source of the bug.
3. Compiling `analysis` files. Read more about analysis files [here](https://docs.getdbt.com/docs/build/analyses.md).

Some common misconceptions:

* `dbt compile` is *not* a prerequisite of `dbt run`, or other building commands. Those commands will handle compilation themselves.
* If you just want dbt to read and validate your project code, without connecting to the data warehouse, use `dbt parse` instead.

##### Interactive compile

Starting in dbt v1.5, `compile` can be "interactive" in the CLI, by displaying the compiled code of a node or arbitrary dbt-SQL query:

* `--select` a specific node *by name*
* `--inline` an arbitrary dbt-SQL query

This will log the compiled SQL to the terminal, in addition to writing to the `target/` directory.
For example:

```bash
dbt compile --select "stg_orders"
dbt compile --inline "select * from {{ ref('raw_orders') }}"
```

returns the following:

```bash
dbt compile --select "stg_orders"
21:17:09  Running with dbt=1.7.5
21:17:09  Registered adapter: postgres=1.7.5
21:17:09  Found 5 models, 3 seeds, 20 tests, 0 sources, 0 exposures, 0 metrics, 401 macros, 0 groups, 0 semantic models
21:17:09
21:17:09  Concurrency: 24 threads (target='dev')
21:17:09
21:17:09  Compiled node 'stg_orders' is:
with

source as (
    select * from "jaffle_shop"."main"."raw_orders"
),

renamed as (
    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status
    from source
)

select * from renamed
```

```bash
dbt compile --inline "select * from {{ ref('raw_orders') }}"
18:15:49  Running with dbt=1.7.5
18:15:50  Registered adapter: postgres=1.7.5
18:15:50  Found 5 models, 3 seeds, 20 tests, 0 sources, 0 exposures, 0 metrics, 401 macros, 0 groups, 0 semantic models
18:15:50
18:15:50  Concurrency: 5 threads (target='postgres')
18:15:50
18:15:50  Compiled inline node is:
select * from "jaffle_shop"."main"."raw_orders"
```

The command accesses the data platform to cache related metadata and to run introspective queries. Use the flags:

* `--no-populate-cache` to disable the initial cache population. If metadata is needed, it will be a cache miss, requiring dbt to run the metadata query. This is a `dbt` flag, which means you need to add `dbt` as a prefix. For example: `dbt --no-populate-cache`.
* `--no-introspect` to disable [introspective queries](https://docs.getdbt.com/faqs/Warehouse/db-connection-dbt-compile.md#introspective-queries). dbt will raise an error if a model's definition requires running one. This is a `dbt compile` flag, which means you need to add `dbt compile` as a prefix. For example: `dbt compile --no-introspect`.
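Because one is a global flag and the other is a command-level flag, they sit in different positions on the command line. A sketch of both invocations, reusing the `stg_orders` model from the example above:

```shell
# global flag: goes before the subcommand
dbt --no-populate-cache compile --select "stg_orders"

# command-level flag: goes after `compile`
dbt compile --no-introspect --select "stg_orders"
```

The two flags can also be combined in a single invocation, with each in its respective position.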
##### FAQs

**Why dbt compile needs a data platform connection**

`dbt compile` needs a data platform connection in order to gather the info it needs (including from introspective queries) to prepare the SQL for every model in your project.

##### dbt compile

The [`dbt compile` command](https://docs.getdbt.com/reference/commands/compile.md) generates executable SQL from `source`, `model`, `test`, and `analysis` files.

`dbt compile` is similar to `dbt run` except that it doesn't materialize the model's compiled SQL into an existing table. So, up until the point of materialization, `dbt compile` and `dbt run` are similar because they both require a data platform connection, run queries, and have an [`execute` variable](https://docs.getdbt.com/reference/dbt-jinja-functions/execute.md) set to `True`. However, here are some things to consider:

* You don't need to execute `dbt compile` before `dbt run`.
* In dbt, `compile` doesn't mean `parse`. This is because `parse` validates your written `YAML`, configured tags, and so on.

##### Introspective queries

To generate the compiled SQL for many models, dbt needs to run introspective queries (that is, SQL dbt runs in order to pull data back and do something with it) against the data platform. These introspective queries include:

* Populating the relation cache. For more information, refer to the [Create new materializations](https://docs.getdbt.com/guides/create-new-materializations.md) guide. Caching speeds up the metadata checks, including whether an [incremental model](https://docs.getdbt.com/docs/build/incremental-models.md) already exists in the data platform.
* Resolving [macros](https://docs.getdbt.com/docs/build/jinja-macros.md#macros), such as `run_query` or `dbt_utils.get_column_values`, that you're using to template out your SQL.
This is because dbt needs to run those queries during model SQL compilation. Without a data platform connection, dbt can't perform these introspective queries and won't be able to generate the compiled SQL needed for the next steps in the dbt workflow.

You can [`parse`](https://docs.getdbt.com/reference/commands/parse.md) a project and [`list`](https://docs.getdbt.com/reference/commands/list.md) the resources in the project, without an internet or data platform connection. Parsing a project is enough to produce a [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md); however, keep in mind that the written-out manifest won't include compiled SQL.

To configure a project, you do need a [connection profile](https://docs.getdbt.com/docs/local/profiles.yml.md) (`profiles.yml` if using the CLI). You need this file because the project's configuration depends on its contents. For example, you may need to use [`{{target}}`](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md) for conditional configs or know what platform you're running against so that you can choose the right flavor of SQL.

---

### About dbt debug command

Use `dbt debug` to test database connections and check system setup.

`dbt debug` is a utility function to test the database connection and display information for debugging purposes, such as the validity of your project file, the [dbt version](https://docs.getdbt.com/reference/dbt-jinja-functions/dbt_version.md), and your installation of any requisite dependencies (like `git` when you run `dbt deps`).
It checks your database connection, local configuration, and system setup across multiple axes to help identify potential issues before running dbt commands. By default, `dbt debug` validates:

* **Database connection** (for configured profiles)
* **dbt project setup** (like `dbt_project.yml` validity)
* **System environment** (OS, Python version, installed dbt version)
* **Required dependencies** (such as `git` for `dbt deps`)
* **Adapter details** (installed adapter versions and compatibility)

**Note:** Not to be confused with [debug-level logging](https://docs.getdbt.com/reference/global-configs/logs.md#debug-level-logging) through the `--debug` option, which increases verbosity.

#### Flags

Most of the `dbt debug` flags apply to the dbt Core CLI. Some flags also work in the dbt CLI, but only `--connection` is supported in the Studio IDE.

* dbt Core CLI: Supports all flags.
* Studio IDE: Only supports `dbt debug` and `dbt debug --connection`.
* dbt CLI: Only supports `dbt debug` and `dbt debug --connection`.

You can also use the [`dbt environment`](https://docs.getdbt.com/reference/commands/dbt-environment.md) command to interact with your dbt environment.

`dbt debug` supports the following flags in your terminal when using the command line interface (CLI):

```text
Usage: dbt debug [OPTIONS]

  Show information on the current dbt environment and check dependencies, then
  test the database connection. Not to be confused with the --debug option
  which increases verbosity.

Options:
  --cache-selected-only / --no-cache-selected-only
      At start of run, populate relational cache only for schemas containing selected nodes, or for all schemas of interest.
  -d, --debug / --no-debug
      Display debug logging during dbt execution. Useful for debugging and making bug reports.
  --defer / --no-defer
      If set, resolve unselected nodes by deferring to the manifest within the --state directory.
  --defer-state DIRECTORY
      Override the state directory for deferral only.
  --deprecated-favor-state TEXT
      Internal flag for deprecating old env var.
  -x, --fail-fast / --no-fail-fast
      Stop execution on first failure.
  --favor-state / --no-favor-state
      If set, defer to the argument provided to the state flag for resolving unselected nodes, even if the node(s) exist as a database object in the current environment.
  --indirect-selection [eager|cautious|buildable|empty]
      Controls which tests run based on their relationships to selected models in your DAG. Eager (default) is most inclusive and runs tests that reference your selected models. Cautious is most exclusive and only runs tests that reference selected models. Buildable is in between. Empty runs no tests.
  --log-cache-events / --no-log-cache-events
      Enable verbose logging for relational cache events to help when debugging.
  --log-format [text|debug|json|default]
      Specify the format of logging to the console and the log file. Use --log-format-file to configure the format for the log file differently than the console.
  --log-format-file [text|debug|json|default]
      Specify the format of logging to the log file by overriding the default value and the general --log-format setting.
  --log-level [debug|info|warn|error|none]
      Specify the minimum severity of events that are logged to the console and the log file. Use --log-level-file to configure the severity for the log file differently than the console.
  --log-level-file [debug|info|warn|error|none]
      Specify the minimum severity of events that are logged to the log file by overriding the default value and the general --log-level setting.
  --log-path PATH
      Configure the 'log-path'. Only applies this setting for the current run. Overrides the 'DBT_LOG_PATH' if it is set.
  --partial-parse / --no-partial-parse
      Allow for partial parsing by looking for and writing to a pickle file in the target directory. This overrides the user configuration file.
  --populate-cache / --no-populate-cache
      At start of run, use `show` or `information_schema` queries to populate a relational cache, which can speed up subsequent materializations.
  --print / --no-print
      Output all {{ print() }} macro calls.
  --printer-width INTEGER
      Sets the width of terminal output.
  --profile TEXT
      Which existing profile to load. Overrides setting in dbt_project.yml.
  -q, --quiet / --no-quiet
      Suppress all non-error logging to stdout. Does not affect {{ print() }} macro calls.
  -r, --record-timing-info PATH
      When this option is passed, dbt will output low-level timing stats to the specified file. Example: `--record-timing-info output.profile`
  --send-anonymous-usage-stats / --no-send-anonymous-usage-stats
      Send anonymous usage stats to dbt Labs.
  --state DIRECTORY
      Unless overridden, use this state directory for both state comparison and deferral.
  --static-parser / --no-static-parser
      Use the static parser.
  -t, --target TEXT
      Which target to load for the given profile.
  --use-colors / --no-use-colors
      Specify whether log output is colorized in the console and the log file. Use --use-colors-file/--no-use-colors-file to colorize the log file differently than the console.
  --use-colors-file / --no-use-colors-file
      Specify whether log file output is colorized by overriding the default value and the general --use-colors/--no-use-colors setting.
  --use-experimental-parser / --no-use-experimental-parser
      Enable experimental parsing features.
  -V, -v, --version
      Show version information and exit.
  --version-check / --no-version-check
      If set, ensure the installed dbt version matches the require-dbt-version specified in the dbt_project.yml file (if any). Otherwise, allow them to differ.
  --warn-error
      If dbt would normally warn, instead raise an exception. Examples include --select that selects nothing, deprecations, configurations with no associated models, invalid test configurations, and missing sources/refs in tests.
  --warn-error-options WARNERROROPTIONSTYPE
      If dbt would normally warn, instead raise an exception based on include/exclude configuration. Examples include --select that selects nothing, deprecations, configurations with no associated models, invalid test configurations, and missing sources/refs in tests. This argument should be a YAML string, with keys 'include' or 'exclude'. eg. '{"include": "all", "exclude": ["NoNodesForSelectionCriteria"]}'
  --write-json / --no-write-json
      Whether or not to write the manifest.json and run_results.json files to the target directory.
  --connection
      Test the connection to the target database independent of dependency checks. Available in Studio IDE and dbt Core CLI.
  --config-dir
      Print a system-specific command to access the directory that the current dbt project is searching for a profiles.yml. Then, exit. This flag renders other debug step flags no-ops.
  --profiles-dir PATH
      Which directory to look in for the profiles.yml file. If not set, dbt will look in the current working directory first, then HOME/.dbt/
  --project-dir PATH
      Which directory to look in for the dbt_project.yml file. Default is the current working directory and its parents.
  --vars YAML
      Supply variables to the project. This argument overrides variables defined in your dbt_project.yml file. This argument should be a YAML string, eg. '{my_variable: my_value}'
  -h, --help
      Show this message and exit.
```

#### Example usage

Only test the connection to the data platform and skip the other checks `dbt debug` looks for:

```shell
dbt debug --connection
```

Show the configured location for the `profiles.yml` file and exit:

```text
dbt debug --config-dir
To view your profiles.yml file, run:

open /Users/alice/.dbt
```

Test the connection in the Studio IDE:

```text
dbt debug --connection
```

![Test the connection in the Studio IDE](/img/reference/dbt-debug-ide.png?v=2)

---

### About dbt deps command

`dbt deps` pulls the most recent version of the dependencies listed in your `packages.yml` from git. See [Package-Management](https://docs.getdbt.com/docs/build/packages.md) for more information.

Where relevant, dbt will display up-to-date and/or latest versions of packages that are listed on dbt Hub. Example below.

> This does NOT apply to packages that are installed via git/local

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: 0.7.1

  - package: brooklyn-data/dbt_artifacts
    version: 1.2.0
    install-prerelease: true

  - package: dbt-labs/codegen
    version: 0.4.0

  - package: calogica/dbt_expectations
    version: 0.4.1

  - git: https://github.com/dbt-labs/dbt_audit_helper.git
    revision: 0.4.0

  - git: "https://github.com/dbt-labs/dbt_labs-experimental-features" # git URL
    subdirectory: "materialized-views" # name of subdirectory containing `dbt_project.yml`
    revision: 0.0.1

  - package: dbt-labs/snowplow
    version: 0.13.0
```

```txt
Installing dbt-labs/dbt_utils@0.7.1
  Installed from version 0.7.1
  Up to date!
Installing brooklyn-data/dbt_artifacts@1.2.0
  Installed from version 1.2.0
Installing dbt-labs/codegen@0.4.0
  Installed from version 0.4.0
  Up to date!
Installing calogica/dbt_expectations@0.4.1
  Installed from version 0.4.1
  Up to date!
Installing https://github.com/dbt-labs/dbt_audit_helper.git@0.4.0
  Installed from revision 0.4.0
Installing https://github.com/dbt-labs/dbt_labs-experimental-features@0.0.1
  Installed from revision 0.0.1 and subdirectory materialized-views
Installing dbt-labs/snowplow@0.13.0
  Installed from version 0.13.0
  Updated version available: 0.13.1
Installing calogica/dbt_date@0.4.0
  Installed from version 0.4.0
  Up to date!

Updates available for packages: ['tailsdotcom/dbt_artifacts', 'dbt-labs/snowplow']
Update your versions in packages.yml, then run dbt deps
```

#### Predictable package installs

dbt generates a `package-lock.yml` file in the root of your project. This file records the exact resolved versions (including commit SHAs) of all packages defined in your `packages.yml` or `dependencies.yml` file. The `package-lock.yml` file ensures consistent and repeatable installs across all environments.

When you run `dbt deps`, dbt installs packages based on the versions locked in the `package-lock.yml`. This means that as long as your packages file hasn’t changed, the exact same dependency versions will be installed even if newer versions of those packages have been released. This consistency is important to maintain stability in development and production environments, and to prevent unexpected issues from new releases with potential bugs.

If the `packages.yml` file has changed (for example, a new package is added or a version range is updated), then `dbt deps` automatically resolves the new set of dependencies and updates the lock file accordingly. You can also manually trigger an upgrade by running `dbt deps --upgrade`.
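To illustrate, a `package-lock.yml` might look something like the following sketch (illustrative only: the exact contents depend on your packages and dbt version, and the package entries, commit SHA, and hash shown here are placeholders):

```yaml
# package-lock.yml (generated by `dbt deps`; avoid editing by hand)
packages:
  # each entry is pinned to the exact version that was resolved
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - git: https://github.com/dbt-labs/dbt_audit_helper.git
    revision: 0000000000000000000000000000000000000000  # resolved commit SHA
# hash of the packages config, used to detect changes
sha1_hash: 1a2b3c4d5e6f708192a3b4c5d6e7f8091a2b3c4d
```

Committing a file like this pins every environment to the same resolved versions until you deliberately rerun resolution with `dbt deps --upgrade`.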
To maintain consistency, commit the `package-lock.yml` file to version control. This guarantees consistency across all environments and for all developers.

##### Managing `package-lock.yml`

The `package-lock.yml` file should be committed to Git initially and updated only when you intend to change versions or uninstall a package. For example, run `dbt deps --upgrade` to get updated package versions or `dbt deps --lock` to update the lock file based on changes to the packages config without installing the packages.

To bypass using `package-lock.yml` entirely, you can add it to your project's `.gitignore`. However, this approach sacrifices the predictability of builds. If you choose this route, we strongly recommend adding version pins for third-party packages in your `packages` config.

##### Detecting changes in `packages` config

The `package-lock.yml` file includes a `sha1_hash` of your packages config. If you update `packages.yml`, dbt will detect the change and rerun dependency resolution during the next `dbt deps` command. To update the lock file without installing the new packages, use the `--lock` flag:

```shell
dbt deps --lock
```

##### Forcing package updates

To update all packages, even if `packages.yml` hasn't changed, use the `--upgrade` flag:

```shell
dbt deps --upgrade
```

This is particularly useful for fetching the latest commits from the `main` branch of an internally maintained Git package.

**Warning:** Forcing package upgrades may introduce build inconsistencies unless carefully managed.

##### Adding specific packages

The `dbt deps` command can add or update package configurations directly, saving you from remembering exact syntax.
###### Hub packages (default)

Hub packages are the default package type and the easiest to install.

```shell
dbt deps --add-package dbt-labs/dbt_utils@1.0.0

# with semantic version range
dbt deps --add-package dbt-labs/snowplow@">=0.7.0,<0.8.0"
```

###### Non-Hub packages

Use the `--source` flag to specify the type of package to be installed:

```shell
# Git package
dbt deps --add-package https://github.com/fivetran/dbt_amplitude@v0.3.0 --source git

# Local package
dbt deps --add-package /opt/dbt/redshift --source local
```

---

### About dbt docs commands

---

### About dbt environment command

**Info:** The dbt platform CLI provides the `dbt environment` command for environment and connection details. If you're using Fusion or dbt Core, use `dbt debug` to inspect profile, target, and connection — or use `dbtf debug` if you have more than one dbt CLI and want to inspect Fusion.

The `dbt environment` command enables you to interact with your dbt environment. Use the command for:

* Viewing your local configuration details (account ID, active project ID, deployment environment, and more).
* Viewing your dbt configuration details (environment ID, environment name, connection type, and more).

This guide lists all the commands and options you can use with `dbt environment` in the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md). To use them, add a command or option like this: `dbt environment [command]`, or use the shorthand `dbt env [command]`.

##### dbt environment show

The `show` command allows you to view your local and dbt configuration details. To run the command with the dbt CLI, enter one of the following commands, including the shorthand:

```shell
dbt environment show
```

```shell
dbt env show
```

The command returns the following information:

```bash
❯ dbt env show
Local Configuration:
  Active account ID            185854
  Active project ID            271692
  Active host name             cloud.getdbt.com
  dbt_cloud.yml file path      /Users/cesar/.dbt/dbt_cloud.yml
  dbt_project.yml file path    /Users/cesar/git/cloud-cli-test-project/dbt_project.yml
  CLI version                  0.35.7
  OS info                      darwin arm64

Cloud Configuration:
  Account ID                   185854
  Project ID                   271692
  Project name                 Snowflake
  Environment ID               243762
  Environment name             Development
  Defer environment ID         [N/A]
  dbt version                  1.6.0-latest
  Target name                  default
  Connection type              snowflake

Snowflake Connection Details:
  Account                      ska67070
  Warehouse                    DBT_TESTING_ALT
  Database                     DBT_TEST
  Schema                       CLOUD_CLI_TESTING
  Role                         SYSADMIN
  User                         dbt_cloud_user
  Client session keep alive    false
```

Note that dbt won't return any secret keys and will return 'N/A' for any field that isn't configured.

##### dbt environment flags

Use the following flags (or options) with the `dbt environment` command:

* `-h`, `--help` — To view the help documentation for a specific command in your command line interface.
```shell dbt environment [command] --help ``` The `--help` flag returns the following information: ```bash ❯ dbt help environment Interact with dbt environments Usage: dbt environment [command] Aliases: environment, env Available Commands: show Show the working environment Flags: -h, --help help for environment Use "dbt environment [command] --help" for more information about a command. ``` For example, to view the help documentation for the `show` command, enter one of the following commands, including the shorthand: ```shell dbt environment show --help dbt env show -h ``` --- ### About dbt init command `dbt init` helps get you started using dbt Core! #### New project[​](#new-project "Direct link to New project") If this is your first time ever using the tool, it will: * ask you to name your project * ask you which database adapter you're using (or point you to [Supported Data Platforms](https://docs.getdbt.com/docs/supported-data-platforms.md) for the options) * prompt you for each piece of information that dbt needs to connect to that database: things like `account`, `user`, `password`, etc. Then, it will: * Create a new folder with your project name and sample files, enough to get you started with dbt * Create a connection profile on your local machine. The default location is `~/.dbt/profiles.yml`. Read more in [configuring your profile](https://docs.getdbt.com/docs/local/profiles.yml.md). When using `dbt init` to initialize your project, include the `--profile` flag to specify an existing `profiles.yml` as the `profile:` key to use instead of creating a new one. For example, `dbt init --profile profile_name`.
If the profile does not exist in `profiles.yml` or the command is run inside an existing project, the command raises an error. #### Existing project[​](#existing-project "Direct link to Existing project") If you've just cloned or downloaded an existing dbt project, `dbt init` can still help you set up your connection profile so that you can start working quickly. It will prompt you for connection information, as above, and add a profile (using the `profile` name from the project) to your local `profiles.yml`, or create the file if it doesn't already exist. #### profile\_template.yml[​](#profile_templateyml "Direct link to profile_template.yml") `dbt init` knows how to prompt for connection information by looking for a file named `profile_template.yml`. It will look for this file in two places: * **Adapter plugin:** What's the bare minimum Postgres profile? What's the type of each field, what are its defaults? This information is stored in a file called [`dbt/include/postgres/profile_template.yml`](https://github.com/dbt-labs/dbt-postgres/blob/main/dbt/include/postgres/profile_template.yml). If you're the maintainer of an adapter plugin, we highly recommend that you add a `profile_template.yml` to your plugin, too. Refer to the [Build, test, document, and promote adapters](https://docs.getdbt.com/guides/adapter-creation.md) guide for more information. * **Existing project:** If you're the maintainer of an existing project, and you want to help new users get connected to your database quickly and easily, you can include your own custom `profile_template.yml` in the root of your project, alongside `dbt_project.yml`. For common connection attributes, set the values in `fixed`; leave user-specific attributes in `prompts`, but with custom hints and defaults as you'd like. 
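How `dbt init` combines the two sections can be sketched in Python: values under `fixed` are copied verbatim, while each key under `prompts` is filled from the user's answer, falling back to its declared `default` if one exists. The `render_profile` helper and the trimmed template here are hypothetical, for illustration only — they are not part of dbt:

```python
# Hypothetical sketch of how `dbt init` assembles a profile target from a
# profile_template.yml: `fixed` values are copied as-is, while `prompts`
# are filled from the user's answers (falling back to a declared default).
def render_profile(template: dict, answers: dict) -> dict:
    target = dict(template.get("fixed", {}))  # common, non-interactive values
    for key, spec in template.get("prompts", {}).items():
        # use the supplied answer, else the prompt's default (if any)
        target[key] = answers.get(key, spec.get("default"))
    return target

# Trimmed-down, illustrative template (see the full example below)
template = {
    "fixed": {"type": "snowflake", "account": "abc123"},
    "prompts": {
        "user": {"type": "string", "hint": "yourname@jaffleshop.com"},
        "threads": {"type": "int", "default": 8},
    },
}
profile = render_profile(template, {"user": "summerintern@jaffleshop.com"})
```

With no answer supplied for `threads`, the declared default of 8 is used.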
profile\_template.yml ```yml fixed: account: abc123 authenticator: externalbrowser database: analytics role: transformer type: snowflake warehouse: transforming prompts: target: type: string hint: your desired target name user: type: string hint: yourname@jaffleshop.com schema: type: string hint: usually dbt_ threads: hint: "your favorite number, 1-10" type: int default: 8 ``` ```text $ dbt init Running with dbt=1.0.0 Setting up your profile. user (yourname@jaffleshop.com): summerintern@jaffleshop.com schema (usually dbt_): dbt_summerintern threads (your favorite number, 1-10) [8]: 6 Profile internal-snowflake written to /Users/intern/.dbt/profiles.yml using project's profile_template.yml and your supplied values. Run 'dbt debug' to validate the connection. ``` --- ### About dbt invocation command The `dbt invocation` command is available in the [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) and allows you to: * List active invocations to debug long-running or hanging invocations. * Identify and investigate sessions causing the `Session occupied` error. * Monitor currently active dbt commands (like `run`, `build`) in real-time. The `dbt invocation` command only lists *active invocations*. If no sessions are running, the list will be empty. Completed sessions aren't included in the output. #### Usage[​](#usage "Direct link to Usage") This page lists the command and flag you can use with `dbt invocation`. To use them, add a command or option like this: `dbt invocation [command]`. Available flags in the command line interface (CLI) are [`help`](#dbt-invocation-help) and [`list`](#dbt-invocation-list).
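The key/value layout printed by `dbt invocation list` (shown under the `list` command in this section) is easy to post-process. A hypothetical parser, assuming the field names and two-space separation of this section's sample output:

```python
# Hypothetical helper that turns the key/value lines printed by
# `dbt invocation list` into a dict. Field names and layout follow the
# sample output shown in this section; this is not a dbt API.
def parse_invocation(block: str) -> dict:
    fields = {}
    for line in block.strip().splitlines():
        # each line is "<Field name>  <value>", separated by 2+ spaces
        key, _, value = line.partition("  ")
        fields[key.strip()] = value.strip()
    return fields

sample = """\
ID          6dcf4723-e057-48b5-946f-a4d87e1d117a
Status      running
Type        cli
Args        [run --select test.sql]
Started At  2025-01-24 11:03:19
"""
info = parse_invocation(sample)
```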
##### dbt invocation help[​](#dbt-invocation-help "Direct link to dbt invocation help") The `help` command provides you with the help output for the `invocation` command in the CLI, including the available flags. ```shell dbt invocation help ``` or ```shell dbt help invocation ``` The command returns the following information: ```bash dbt invocation help Manage invocations Usage: dbt invocation [command] Available Commands: list List active invocations Flags: -h, --help help for invocation Global Flags: --log-format LogFormat The log format, either json or plain. (default plain) --log-level LogLevel The log level, one of debug, info, warning, error or fatal. (default info) --no-color Disables colorization of the output. -q, --quiet Suppress all non-error logging to stdout. Use "dbt invocation [command] --help" for more information about a command. ``` ##### dbt invocation list[​](#dbt-invocation-list "Direct link to dbt invocation list") The `list` command provides you with a list of active invocations in your dbt CLI. When a long-running session is active, you can use this command in a separate terminal window to view the active session to help debug the issue. ```shell dbt invocation list ``` The command returns the following information, including the `ID`, `status`, `type`, `arguments`, and `started at` time of the active session: ```bash dbt invocation list Active Invocations: ID 6dcf4723-e057-48b5-946f-a4d87e1d117a Status running Type cli Args [run --select test.sql] Started At 2025-01-24 11:03:19 ➜ jaffle-shop git:(test-cli) ✗ ``` **Tip:** To cancel an active session in the terminal, use the `Ctrl + Z` shortcut. #### Related docs[​](#related-docs "Direct link to Related docs") * [Install dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) * [Troubleshooting dbt CLI 'Session occupied' error](https://docs.getdbt.com/faqs/Troubleshooting/long-sessions-cloud-cli.md)
--- ### About dbt ls (list) command The `dbt ls` command lists resources in your dbt project. It accepts selector arguments that are similar to those provided in [dbt run](https://docs.getdbt.com/reference/commands/run.md). `dbt list` is an alias for `dbt ls`. While `dbt ls` will read your [connection profile](https://docs.getdbt.com/docs/local/profiles.yml.md) to resolve [`target`](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md)-specific logic, this command will not connect to your database or run any queries. ##### Usage[​](#usage "Direct link to Usage") ```text dbt ls [--resource-type {model,semantic_model,source,seed,snapshot,metric,test,exposure,analysis,function,default,all}] [--select SELECTION_ARG [SELECTION_ARG ...]] [--models SELECTOR [SELECTOR ...]] [--exclude SELECTOR [SELECTOR ...]] [--selector YML_SELECTOR_NAME] [--output {json,name,path,selector}] [--output-keys KEY_NAME [KEY_NAME]] ``` See [resource selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md) for more information on how to select resources in dbt. **Arguments**: * `--resource-type`: This flag restricts the "resource types" returned by dbt in the `dbt ls` command. By default, all resource types are included in the results of `dbt ls` except for the analysis type. * `--select`: This flag specifies one or more selection-type arguments used to filter the nodes returned by the `dbt ls` command. * `--models`: Like the `--select` flag, this flag is used to select nodes. It implies `--resource-type=model`, and will only return models in the results of the `dbt ls` command. Supported for backwards compatibility only.
* `--exclude`: Specify selectors that should be *excluded* from the list of returned nodes. * `--selector`: This flag specifies one named selector, defined in a `selectors.yml` file. * `--output`: This flag controls the format of output from the `dbt ls` command. * `--output-keys`: If `--output json`, this flag controls which node properties are included in the output. Note that the `dbt ls` command does not include disabled models, or schema tests that depend on disabled models. All returned resources will have a `config.enabled` value of `true`. ##### Example usage[​](#example-usage "Direct link to Example usage") The following examples show how to use the `dbt ls` command to list resources in your project. * [Listing models by package](#listing-models-by-package) * [Listing tests by tag name](#listing-tests-by-tag-name) * [Listing schema tests of incremental models](#listing-schema-tests-of-incremental-models) * [Listing JSON output](#listing-json-output) * [Listing JSON output with custom keys](#listing-json-output-with-custom-keys) * [Listing semantic models](#listing-semantic-models) * [Listing file paths](#listing-file-paths) * [Listing functions](#listing-functions) ###### Listing models by package[​](#listing-models-by-package "Direct link to Listing models by package") ```bash dbt ls --select snowplow.* snowplow.snowplow_base_events snowplow.snowplow_base_web_page_context snowplow.snowplow_id_map snowplow.snowplow_page_views snowplow.snowplow_sessions ... ``` ###### Listing tests by tag name[​](#listing-tests-by-tag-name "Direct link to Listing tests by tag name") ```bash dbt ls --select tag:nightly --resource-type test my_project.schema_test.not_null_orders_order_id my_project.schema_test.unique_orders_order_id my_project.schema_test.not_null_products_product_id my_project.schema_test.unique_products_product_id ...
``` ###### Listing schema tests of incremental models[​](#listing-schema-tests-of-incremental-models "Direct link to Listing schema tests of incremental models") ```bash dbt ls --select config.materialized:incremental,test_type:schema model.my_project.logs_parsed model.my_project.events_categorized ``` ###### Listing JSON output[​](#listing-json-output "Direct link to Listing JSON output") ```bash dbt ls --select snowplow.* --output json {"name": "snowplow_events", "resource_type": "model", "package_name": "snowplow", ...} {"name": "snowplow_page_views", "resource_type": "model", "package_name": "snowplow", ...} ... ``` ###### Listing JSON output with custom keys[​](#listing-json-output-with-custom-keys "Direct link to Listing JSON output with custom keys") ```bash dbt ls --select snowplow.* --output json --output-keys "name resource_type description" {"name": "snowplow_events", "description": "This is a pretty cool model", ...} {"name": "snowplow_page_views", "description": "This model is even cooler", ...} ... ``` ###### Listing semantic models[​](#listing-semantic-models "Direct link to Listing semantic models") List all resources upstream of your orders semantic model: ```bash dbt ls -s +semantic_model:orders ``` ###### Listing file paths[​](#listing-file-paths "Direct link to Listing file paths") ```bash dbt ls --select snowplow.* --output path models/base/snowplow_base_events.sql models/base/snowplow_base_web_page_context.sql models/identification/snowplow_id_map.sql ... ``` ###### Listing functions[​](#listing-functions "Direct link to Listing functions") List all functions in your project: ```bash dbt list --select "resource_type:function" # or dbt ls --resource-type function jaffle_shop.area_of_circle jaffle_shop.whoami ```
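Because `--output json` emits one standalone JSON document per line (JSON Lines), the output of `dbt ls` pipes cleanly into scripts. A sketch; the sample lines here are illustrative stand-ins for real project output:

```python
import json

# `dbt ls --output json` prints one JSON object per line, so each line
# can be decoded independently. These sample lines are illustrative only.
raw_output = """\
{"name": "snowplow_events", "resource_type": "model", "package_name": "snowplow"}
{"name": "not_null_orders_order_id", "resource_type": "test", "package_name": "my_project"}
"""

nodes = [json.loads(line) for line in raw_output.splitlines() if line.strip()]
# e.g. keep only models, as `--resource-type model` would
models = [n["name"] for n in nodes if n["resource_type"] == "model"]
```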
--- ### About dbt parse command The `dbt parse` command parses and validates the contents of your dbt project. If your project contains Jinja or YAML syntax errors, the command will fail. It also produces an artifact with detailed timing information, which is useful for understanding parsing times in large projects. Refer to [Project parsing](https://docs.getdbt.com/reference/parsing.md) for more information. Starting in v1.5, `dbt parse` will write or return a [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md), enabling you to introspect dbt's understanding of all the resources in your project. Since `dbt parse` doesn't connect to your warehouse, [this manifest will not contain any compiled code](https://docs.getdbt.com/faqs/Warehouse/db-connection-dbt-compile.md). By default, the Studio IDE will attempt a "partial" parse, which means it'll only check changes since the last parse (new or updated parts of your project when you make changes). Since the Studio IDE automatically parses in the background whenever you save your work, manually running `dbt parse` yourself is likely to be fast because it's just looking at recent changes. As an option, you can tell dbt to check the entire project from scratch by using the `--no-partial-parse` flag. This makes dbt perform a full re-parse of the project, not just the recent changes.
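Because the timing artifact is plain JSON, ranking parsers by elapsed time takes only a few lines. This sketch hardcodes a trimmed-down artifact rather than reading `target/perf_info.json` from disk; the field names follow the sample artifact shown below:

```python
import json

# Rank parsers by elapsed time from a (trimmed) target/perf_info.json.
# The structure mirrors the sample artifact in this section; in practice
# you would json.load() the file from the target/ directory.
perf_info = json.loads("""
{
  "projects": [
    {"project_name": "my_project",
     "parsers": [
       {"parser": "model", "elapsed": 0.045, "path_count": 1},
       {"parser": "seed",  "elapsed": 0.026, "path_count": 2}
     ]}
  ]
}
""")

timings = [
    (proj["project_name"], parser["parser"], parser["elapsed"])
    for proj in perf_info["projects"]
    for parser in proj["parsers"]
]
slowest = max(timings, key=lambda t: t[2])
```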
```text $ dbt parse 13:02:52 Running with dbt=1.5.0 13:02:53 Performance info: target/perf_info.json ``` target/perf\_info.json ```json { "path_count": 7, "is_partial_parse_enabled": false, "parse_project_elapsed": 0.20151838900000008, "patch_sources_elapsed": 0.00039490800000008264, "process_manifest_elapsed": 0.029363873999999957, "load_all_elapsed": 0.240095269, "projects": [ { "project_name": "my_project", "elapsed": 0.07518750299999999, "parsers": [ { "parser": "model", "elapsed": 0.04545303199999995, "path_count": 1 }, { "parser": "operation", "elapsed": 0.0006415469999998535, "path_count": 1 }, { "parser": "seed", "elapsed": 0.026538173000000054, "path_count": 2 } ], "path_count": 4 }, { "project_name": "dbt_postgres", "elapsed": 0.0016448299999998195, "parsers": [ { "parser": "operation", "elapsed": 0.00021672399999994596, "path_count": 1 } ], "path_count": 1 }, { "project_name": "dbt", "elapsed": 0.006580432000000025, "parsers": [ { "parser": "operation", "elapsed": 0.0002488560000000195, "path_count": 1 }, { "parser": "docs", "elapsed": 0.002500640000000054, "path_count": 1 } ], "path_count": 2 } ] } ``` --- ### About dbt retry command `dbt retry` re-executes the last `dbt` command from the node point of failure. * If no nodes are executed before the failure (for example, if a run failed early due to warehouse connection or permission errors), `dbt retry` won't run anything since there are no recorded nodes to retry from. * In these cases, we recommend checking your [`run_results.json` file](https://docs.getdbt.com/reference/artifacts/run-results-json.md) and manually re-running the full job so the nodes build.
* Once some nodes have run, you can use `dbt retry` to re-execute from any new point of failure. * If the previously executed command completed successfully, `dbt retry` will finish as `no operation`. Retry works with the following commands: * [`build`](https://docs.getdbt.com/reference/commands/build.md) * [`compile`](https://docs.getdbt.com/reference/commands/compile.md) * [`clone`](https://docs.getdbt.com/reference/commands/clone.md) * [`docs generate`](https://docs.getdbt.com/reference/commands/cmd-docs.md#dbt-docs-generate) * [`seed`](https://docs.getdbt.com/reference/commands/seed.md) * [`snapshot`](https://docs.getdbt.com/reference/commands/snapshot.md) * [`test`](https://docs.getdbt.com/reference/commands/test.md) * [`run`](https://docs.getdbt.com/reference/commands/run.md) * [`run-operation`](https://docs.getdbt.com/reference/commands/run-operation.md) `dbt retry` references [run\_results.json](https://docs.getdbt.com/reference/artifacts/run-results-json.md) to determine where to start. Executing `dbt retry` without first correcting the earlier failures is idempotent: it re-runs the same failing nodes and produces the same results. `dbt retry` reuses the [selectors](https://docs.getdbt.com/reference/node-selection/yaml-selectors.md) from the previously executed command. Example results of executing `dbt retry` after a successful `dbt run`: ```shell Running with dbt=1.6.1 Registered adapter: duckdb=1.6.0 Found 5 models, 3 seeds, 20 tests, 0 sources, 0 exposures, 0 metrics, 348 macros, 0 groups, 0 semantic models Nothing to do. Try checking your model configs and model specification args ``` Example of when `dbt run` encounters a syntax error in a model: ```shell Running with dbt=1.6.1 Registered adapter: duckdb=1.6.0 Found 5 models, 3 seeds, 20 tests, 0 sources, 0 exposures, 0 metrics, 348 macros, 0 groups, 0 semantic models Concurrency: 24 threads (target='dev') 1 of 5 START sql view model main.stg_customers .................................
[RUN] 2 of 5 START sql view model main.stg_orders .................................... [RUN] 3 of 5 START sql view model main.stg_payments .................................. [RUN] 1 of 5 OK created sql view model main.stg_customers ............................ [OK in 0.06s] 2 of 5 OK created sql view model main.stg_orders ............................... [OK in 0.06s] 3 of 5 OK created sql view model main.stg_payments ............................. [OK in 0.07s] 4 of 5 START sql table model main.customers .................................... [RUN] 5 of 5 START sql table model main.orders ....................................... [RUN] 4 of 5 ERROR creating sql table model main.customers ........................... [ERROR in 0.03s] 5 of 5 OK created sql table model main.orders .................................. [OK in 0.04s] Finished running 3 view models, 2 table models in 0 hours 0 minutes and 0.15 seconds (0.15s). Completed with 1 error and 0 warnings: Runtime Error in model customers (models/customers.sql) Parser Error: syntax error at or near "selct" Done. PASS=4 WARN=0 ERROR=1 SKIP=0 TOTAL=5 ``` Example of a subsequent failed `dbt retry` run without fixing the error(s): ```shell Running with dbt=1.6.1 Registered adapter: duckdb=1.6.0 Found 5 models, 3 seeds, 20 tests, 0 sources, 0 exposures, 0 metrics, 348 macros, 0 groups, 0 semantic models Concurrency: 24 threads (target='dev') 1 of 1 START sql table model main.customers .................................... [RUN] 1 of 1 ERROR creating sql table model main.customers ........................... [ERROR in 0.03s] Done. PASS=4 WARN=0 ERROR=1 SKIP=0 TOTAL=5 ``` Example of a successful `dbt retry` run after fixing error(s): ```shell Running with dbt=1.6.1 Registered adapter: duckdb=1.6.0 Found 5 models, 3 seeds, 20 tests, 0 sources, 0 exposures, 0 metrics, 348 macros, 0 groups, 0 semantic models Concurrency: 24 threads (target='dev') 1 of 1 START sql table model main.customers .................................... 
[RUN] 1 of 1 OK created sql table model main.customers ............................... [OK in 0.05s] Finished running 1 table model in 0 hours 0 minutes and 0.09 seconds (0.09s). Completed successfully Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1 ``` In each scenario, `dbt retry` picks up from the error rather than running all of the upstream dependencies again. --- ### About dbt rpc command **The dbt-rpc plugin is deprecated.** dbt Labs actively maintained `dbt-rpc` for compatibility with dbt-core versions up to v1.5. Starting with dbt-core v1.6 (released in July 2023), `dbt-rpc` is no longer supported. In the meantime, dbt Labs will be performing critical maintenance only for `dbt-rpc`, until the last compatible version of dbt-core has reached the [end of official support](https://docs.getdbt.com/docs/dbt-versions/core.md#end-of-life-versions). At that point, dbt Labs will archive this repository to be read-only. ##### Overview[​](#overview "Direct link to Overview") You can use the `dbt-rpc` plugin to run a Remote Procedure Call (RPC) dbt server. This server compiles and runs queries in the context of a dbt project. Additionally, the RPC server provides methods that enable you to list and terminate running processes. We recommend running an RPC server from a directory containing a dbt project. The server will compile the project into memory, then accept requests to operate against that project's dbt context. **Running on Windows:** We do not recommend running the RPC server on Windows because of reliability issues. A Docker container may provide a useful workaround, if required.
For more details, see the [`dbt-rpc` repository](https://github.com/dbt-labs/dbt-rpc) source code. **Running the server:** ```text $ dbt-rpc serve Running with dbt=1.5.0 16:34:31 | Concurrency: 8 threads (target='dev') 16:34:31 | 16:34:31 | Done. Serving RPC server at 0.0.0.0:8580 Send requests to http://localhost:8580/jsonrpc ``` **Configuring the server** * `--host`: Specify the host to listen on (default=`0.0.0.0`) * `--port`: Specify the port to listen on (default=`8580`) **Submitting queries to the server:** The rpc server expects requests in the following format: rpc-spec.json ```json { "jsonrpc": "2.0", "method": "{ a valid rpc server command }", "id": "{ a unique identifier for this query }", "params": { "timeout": { timeout for the query in seconds, optional }, } } ``` #### Built-in Methods[​](#built-in-methods "Direct link to Built-in Methods") ##### status[​](#status "Direct link to status") The `status` method will return the status of the rpc server. This method response includes a high-level status, like `ready`, `compiling`, or `error`, as well as the set of logs accumulated during the initial compilation of the project. When the rpc server is in the `compiling` or `error` state, only built-in methods of the RPC server will be accepted. **Example request** ```json { "jsonrpc": "2.0", "method": "status", "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d" } ``` **Example response** ```json { "result": { "status": "ready", "error": null, "logs": [..], "timestamp": "2019-10-07T16:30:09.875534Z", "pid": 76715 }, "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d", "jsonrpc": "2.0" } ``` ##### poll[​](#poll "Direct link to poll") The `poll` endpoint will return the status, logs, and results (if available) for a running or completed task. The `poll` method requires a `request_token` parameter which indicates the task to poll a response for. The `request_token` is returned in the response of dbt tasks like `compile`, `run` and `test`. 
**Parameters**: * `request_token`: The token to poll responses for * `logs`: A boolean flag indicating if logs should be returned in the response (default=false) * `logs_start`: The zero-indexed log line to fetch logs from (default=0) **Example request** ```json { "jsonrpc": "2.0", "method": "poll", "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d", "params": { "request_token": "f86926fa-6535-4891-8d24-2cfc65d2a347", "logs": true, "logs_start": 0 } } ``` **Example response** ```json { "result": { "results": [], "generated_at": "2019-10-11T18:25:22.477203Z", "elapsed_time": 0.8381369113922119, "logs": [], "tags": { "command": "run --select my_model", "branch": "abc123" }, "status": "success" }, "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d", "jsonrpc": "2.0" } ``` ##### ps[​](#ps "Direct link to ps") The `ps` method lists running and completed processes executed by the RPC server. **Parameters** * `completed`: If true, also return completed tasks (default=false) **Example request:** ```json { "jsonrpc": "2.0", "method": "ps", "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d", "params": { "completed": true } } ``` **Example response:** ```json { "result": { "rows": [ { "task_id": "561d4a02-18a9-40d1-9f01-cd875c3ec56d", "request_id": "3db9a2fe-9a39-41ef-828c-25e04dd6b07d", "request_source": "127.0.0.1", "method": "run", "state": "success", "start": "2019-10-07T17:09:49.865976Z", "end": null, "elapsed": 1.107261, "timeout": null, "tags": { "command": "run --select my_model", "branch": "feature/add-models" } } ] }, "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d", "jsonrpc": "2.0" } ``` ##### kill[​](#kill "Direct link to kill") The `kill` method will terminate a running task. You can find a `task_id` for a running task either in the original response which invoked that task, or in the results of the `ps` method.
**Example request** ```json { "jsonrpc": "2.0", "method": "kill", "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d", "params": { "task_id": "{ the task id to terminate }" } } ``` #### Running dbt projects[​](#running-dbt-projects "Direct link to Running dbt projects") The following methods make it possible to run dbt projects via the RPC server. ##### Common parameters[​](#common-parameters "Direct link to Common parameters") All RPC requests accept the following parameters in addition to the parameters listed: * `timeout`: The max amount of time to wait before cancelling the request. * `task_tags`: Arbitrary key/value pairs to attach to this task. These tags will be returned in the output of the `poll` and `ps` methods (optional). ##### Running a task with CLI syntax[​](#running-a-task-with-cli-syntax "Direct link to Running a task with CLI syntax") **Parameters:** * `cli`: A dbt command (eg. `run --select abc+ --exclude +def`) to run (required) ```json { "jsonrpc": "2.0", "method": "cli_args", "id": "", "params": { "cli": "run --select abc+ --exclude +def", "task_tags": { "branch": "feature/my-branch", "commit": "c0ff33b01" } } } ``` Several of the following request types accept these additional parameters: * `threads`: The number of [threads](https://docs.getdbt.com/docs/local/profiles.yml.md#understanding-threads) to use when compiling (optional) * `select`: The space-delimited set of resources to execute (optional). (`models` is also supported on some request types for backwards compatibility.) 
* `selector`: The name of a predefined [YAML selector](https://docs.getdbt.com/reference/node-selection/yaml-selectors.md) that defines the set of resources to execute (optional) * `exclude`: The space-delimited set of resources to exclude from compiling, running, testing, seeding, or snapshotting (optional) * `state`: The filepath of artifacts to use when establishing [state](https://docs.getdbt.com/reference/node-selection/syntax.md#about-node-selection) (optional) ##### Compile a project ([docs](https://docs.getdbt.com/reference/commands/compile.md))[​](#compile-a-project-docs "Direct link to compile-a-project-docs") ```json { "jsonrpc": "2.0", "method": "compile", "id": "", "params": { "threads": " (optional)", "select": " (optional)", "exclude": " (optional)", "selector": " (optional)", "state": " (optional)" } } ``` ##### Run models ([docs](https://docs.getdbt.com/reference/commands/run.md))[​](#run-models-docs "Direct link to run-models-docs") **Additional parameters:** * `defer`: Whether to defer references to upstream, unselected resources (optional, requires `state`) ```json { "jsonrpc": "2.0", "method": "run", "id": "", "params": { "threads": " (optional)", "select": " (optional)", "exclude": " (optional)", "selector": " (optional)", "state": " (optional)", "defer": " (optional)" } } ``` ##### Run tests ([docs](https://docs.getdbt.com/reference/commands/test.md))[​](#run-tests-docs "Direct link to run-tests-docs") **Additional parameters:** * `data`: If True, run data tests (optional, default=true) * `schema`: If True, run schema tests (optional, default=true) ```json { "jsonrpc": "2.0", "method": "test", "id": "", "params": { "threads": " (optional)", "select": " (optional)", "exclude": " (optional)", "selector": " (optional)", "state": " (optional)", "data": " (optional)", "schema": " (optional)" } } ``` ##### Run seeds ([docs](https://docs.getdbt.com/reference/commands/seed.md))[​](#run-seeds-docs "Direct link to run-seeds-docs") **Parameters:** * 
`show`: If True, show a sample of the seeded data in the response (optional, default=false) ```json { "jsonrpc": "2.0", "method": "seed", "id": "", "params": { "threads": " (optional)", "select": " (optional)", "exclude": " (optional)", "selector": " (optional)", "show": " (optional)", "state": " (optional)" } } ``` ##### Run snapshots ([docs](https://docs.getdbt.com/docs/build/snapshots.md))[​](#run-snapshots-docs "Direct link to run-snapshots-docs") ```json { "jsonrpc": "2.0", "method": "snapshot", "id": "", "params": { "threads": " (optional)", "select": " (optional)", "exclude": " (optional)", "selector": " (optional)", "state": " (optional)" } } ``` ##### Build ([docs](https://docs.getdbt.com/reference/commands/build.md))[​](#build-docs "Direct link to build-docs") ```json { "jsonrpc": "2.0", "method": "build", "id": "", "params": { "threads": " (optional)", "select": " (optional)", "exclude": " (optional)", "selector": " (optional)", "state": " (optional)", "defer": " (optional)" } } ``` ##### List project resources ([docs](https://docs.getdbt.com/reference/commands/list.md))[​](#list-project-resources-docs "Direct link to list-project-resources-docs") **Additional parameters:** * `resource_types`: Filter selected resources by type * `output_keys`: Specify which node properties to include in output ```json { "jsonrpc": "2.0", "method": "ls", "id": "", "params": { "select": " (optional)", "exclude": " (optional)", "selector": " (optional)", "resource_types": [" (optional)"], "output_keys": [" (optional)"] } } ``` ##### Generate docs ([docs](https://docs.getdbt.com/reference/commands/cmd-docs.md#dbt-docs-generate))[​](#generate-docs-docs "Direct link to generate-docs-docs") **Additional parameters:** * `compile`: If True, compile the project before generating a catalog (optional, default=false) ```json { "jsonrpc": "2.0", "method": "docs.generate", "id": "", "params": { "compile": " (optional)", "state": " (optional)" } } ``` ####
#### Compiling and running SQL statements[​](#compiling-and-running-sql-statements "Direct link to Compiling and running SQL statements")

##### Compiling a query[​](#compiling-a-query "Direct link to Compiling a query")

This query compiles the SQL `select {{ 1 + 1 }} as id` (base64-encoded) against the rpc server:

rpc-spec.json

```json
{ "jsonrpc": "2.0", "method": "compile_sql", "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d", "params": { "timeout": 60, "sql": "c2VsZWN0IHt7IDEgKyAxIH19IGFzIGlk", "name": "my_first_query" } }
```

The resulting response will include a key called `compiled_sql` with a value of `'select 2'`.

##### Executing a query[​](#executing-a-query "Direct link to Executing a query")

This query executes the SQL `select {{ 1 + 1 }} as id` (base64-encoded) against the rpc server:

rpc-run.json

```json
{ "jsonrpc": "2.0", "method": "run_sql", "id": "2db9a2fe-9a39-41ef-828c-25e04dd6b07d", "params": { "timeout": 60, "sql": "c2VsZWN0IHt7IDEgKyAxIH19IGFzIGlk", "name": "my_first_query" } }
```

The resulting response will include a key called `table` with a value of `{'column_names': ['?column?'], 'rows': [[2.0]]}`.

#### Reloading the RPC Server[​](#reloading-the-rpc-server "Direct link to Reloading the RPC Server")

When the dbt RPC Server starts, it loads the dbt project into memory using the files present on disk at startup. If the files in the dbt project change (either during development or in a deployment), the dbt RPC Server can be updated live without cycling the server process. To reload the files present on disk, send a "hangup" signal to the running server process using the process ID (pid) of the running process.
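The base64-encoded `sql` parameter used by the `compile_sql` and `run_sql` requests above can be produced in a few lines of Python. A minimal sketch — the `rpc_payload` helper is ours for illustration, not part of dbt:

```python
import base64
import json
import uuid

def rpc_payload(method: str, sql: str, name: str, timeout: int = 60) -> dict:
    """Build a compile_sql/run_sql JSON-RPC request with base64-encoded SQL."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "id": str(uuid.uuid4()),  # any unique request id works
        "params": {
            "timeout": timeout,
            "sql": base64.b64encode(sql.encode("utf-8")).decode("ascii"),
            "name": name,
        },
    }

payload = rpc_payload("compile_sql", "select {{ 1 + 1 }} as id", "my_first_query")
print(json.dumps(payload, indent=2))
# params.sql encodes to "c2VsZWN0IHt7IDEgKyAxIH19IGFzIGlk", matching the requests above
```

The resulting dictionary can be POSTed to the rpc server as the JSON request body.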
##### Finding the server PID[​](#finding-the-server-pid "Direct link to Finding the server PID")

To find the server PID, either fetch the `.result.pid` value from the `status` method response on the server, or use `ps`:

```text
# Find the server PID using `ps`:
ps aux | grep 'dbt-rpc serve' | grep -v grep
```

After finding the PID for the process (e.g. 12345), send a signal to the running server using the `kill` command:

```text
kill -HUP 12345
```

When the server receives the HUP (hangup) signal, it re-parses the files on disk and uses the updated project code when handling subsequent requests.

---

### About dbt run command

#### Overview[​](#overview "Direct link to Overview")

The `dbt run` command only applies to models. It doesn't run tests, snapshots, seeds, or other resource types. To run those, use the appropriate dbt commands found in the [dbt commands](https://docs.getdbt.com/reference/dbt-commands.md) section — such as `dbt test`, `dbt snapshot`, or `dbt seed`. Alternatively, use `dbt build` with a [resource type selector](https://docs.getdbt.com/reference/node-selection/methods.md#resource_type).

You can use the `dbt run` command when you want to build or rebuild models in your project.

##### How does `dbt run` work?[​](#how-does-dbt-run-work "Direct link to how-does-dbt-run-work")

* `dbt run` executes compiled SQL model files against the current `target` database.
* dbt connects to the target database and runs the relevant SQL required to materialize all data models using the specified materialization strategies.
* Models are run in the order defined by the dependency graph generated during compilation.
Intelligent multi-threading is used to minimize execution time without violating dependencies.
* Deploying new models frequently involves destroying prior versions of these models. In these cases, `dbt run` minimizes downtime by first building each model with a temporary name, then dropping and renaming within a single transaction (for adapters that support transactions).

#### Refresh incremental models[​](#refresh-incremental-models "Direct link to Refresh incremental models")

If you provide the `--full-refresh` flag to `dbt run`, dbt will treat incremental models as table models. This is useful when:

1. The schema of an incremental model changes and you need to recreate it.
2. You want to reprocess the entirety of the incremental model because of new logic in the model code.

```shell
dbt run --full-refresh
```

You can also supply the flag by its short name: `dbt run -f`.

In the dbt compilation context, this flag will be available as [flags.FULL\_REFRESH](https://docs.getdbt.com/reference/dbt-jinja-functions/flags.md). Further, the `is_incremental()` macro will return `false` for *all* models when the `--full-refresh` flag is specified.

models/example.sql

```sql
select * from all_events

-- if the table already exists and `--full-refresh` is
-- not set, then only add new records. otherwise, select
-- all records.
{% if is_incremental() %}
  where collector_tstamp > (
    select coalesce(max(max_tstamp), '0001-01-01') from {{ this }}
  )
{% endif %}
```

#### Running specific models[​](#running-specific-models "Direct link to Running specific models")

dbt also allows you to select which specific models you'd like to materialize. This can be useful when you prefer to run a different set of models at various intervals, or when you want to limit the tables materialized while you develop and test new models.
For more information, see the [Model Selection Syntax Documentation](https://docs.getdbt.com/reference/node-selection/syntax.md). For more information on running parents or children of specific models, see the [Graph Operators Documentation](https://docs.getdbt.com/reference/node-selection/graph-operators.md).

#### Treat warnings as errors[​](#treat-warnings-as-errors "Direct link to Treat warnings as errors")

See [global configs](https://docs.getdbt.com/reference/global-configs/warnings.md)

#### Failing fast[​](#failing-fast "Direct link to Failing fast")

See [global configs](https://docs.getdbt.com/reference/global-configs/failing-fast.md)

#### Enable or Disable Colorized Logs[​](#enable-or-disable-colorized-logs "Direct link to Enable or Disable Colorized Logs")

See [global configs](https://docs.getdbt.com/reference/global-configs/print-output.md#print-color)

#### The `--empty` flag[​](#the---empty-flag "Direct link to the---empty-flag")

The `run` command supports the `--empty` flag for building schema-only dry runs. The `--empty` flag limits the refs and sources to zero rows. dbt will still execute the model SQL against the target data warehouse but will avoid expensive reads of input data. This validates dependencies and ensures your models will build properly.

#### Status codes[​](#status-codes "Direct link to Status codes")

When calling the [list\_runs api](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/List%20Runs), you will get a status code for each run returned. The available run status codes are as follows:

* Queued = 1
* Starting = 2
* Running = 3
* Success = 10
* Error = 20
* Canceled = 30
* Skipped = 40
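A script polling the list\_runs API might translate these codes with a simple lookup. A sketch — the dictionary below just mirrors the list above:

```python
# Run status codes returned by the list_runs API, per the list above.
RUN_STATUS = {
    1: "Queued",
    2: "Starting",
    3: "Running",
    10: "Success",
    20: "Error",
    30: "Canceled",
    40: "Skipped",
}

def is_terminal(code: int) -> bool:
    """A run is finished once it reaches Success, Error, Canceled, or Skipped."""
    return code in (10, 20, 30, 40)

print(RUN_STATUS[10], is_terminal(10))  # Success True
```

This makes it easy to, for example, keep polling while `is_terminal(...)` is false and then branch on the final status.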
---

### About dbt run-operation command

##### Overview[​](#overview "Direct link to Overview")

The `dbt run-operation` command is used to invoke a macro. For usage information, consult the docs on [operations](https://docs.getdbt.com/docs/build/hooks-operations.md#about-operations).

##### Usage[​](#usage "Direct link to Usage")

```text
$ dbt run-operation {macro} --args '{args}'

{macro}      Specify the macro to invoke. dbt will call this macro with the
             supplied arguments and then exit
--args ARGS  Supply arguments to the macro. This dictionary will be mapped to
             the keyword arguments defined in the selected macro. This argument
             should be a YAML string, eg. '{my_variable: my_value}'
```

##### Command line examples[​](#command-line-examples "Direct link to Command line examples")

Example 1: `$ dbt run-operation grant_select --args '{role: reporter}'`

Example 2: `$ dbt run-operation clean_stale_models --args '{days: 7, dry_run: True}'`

---

### About dbt seed command

The `dbt seed` command will load `csv` files located in the `seed-paths` directory of your dbt project into your data warehouse.

##### Selecting seeds to run[​](#selecting-seeds-to-run "Direct link to Selecting seeds to run")

Specific seeds can be run using the `--select` flag to `dbt seed`. Example:

```text
$ dbt seed --select "country_codes"
Found 2 models, 3 tests, 0 archives, 0 analyses, 53 macros, 0 operations, 2 seed files

14:46:15 | Concurrency: 1 threads (target='dev')
14:46:15 |
14:46:15 | 1 of 1 START seed file analytics.country_codes........................... [RUN]
14:46:15 | 1 of 1 OK loaded seed file analytics.country_codes.......................
[INSERT 3 in 0.01s]
14:46:16 |
14:46:16 | Finished running 1 seed in 0.14s.
```

For information about configuring seeds (for example, column types and quoting behavior), see [Seed configurations](https://docs.getdbt.com/reference/seed-configs.md).

---

### About dbt show command

Use `dbt show` to:

* Compile the dbt-SQL definition of a single `model`, `test`, `analysis`, or an arbitrary dbt-SQL query passed `--inline`
  * `dbt show` does not support [Python (dbt-py)](https://docs.getdbt.com/docs/build/python-models.md) models.
  * Only selecting a single node is supported. [Selector methods](https://docs.getdbt.com/reference/node-selection/methods.md), [graph operators](https://docs.getdbt.com/reference/node-selection/graph-operators.md), and other methods that select multiple nodes will not work.
* Run that query against the data warehouse
* Preview the results in the terminal

#### How it works[​](#how-it-works "Direct link to How it works")

By default, `dbt show` will display the first 5 rows from the query result. This can be customized by passing the `--limit n` flag, where `n` is the number of rows to display.

If previewing a model, dbt will always compile and run the compiled query from source. It will not select from the already-materialized database relation, even if you've just run the model. (We may support that in the future; if you're interested, upvote or comment on [dbt-core#7391](https://github.com/dbt-labs/dbt-core/issues/7391).)

###### `limit` flag[​](#limit-flag "Direct link to limit-flag")

* The `--limit` flag modifies the underlying SQL and not just the number of rows displayed.
With `--limit n`, `n` is the number of rows dbt displays and retrieves from the data warehouse.
* This means dbt wraps your model's query in a subquery or CTE and applies a SQL `limit n` clause so that your data warehouse only processes and returns that number of rows, making it significantly faster for large datasets.

###### `inline` flag[​](#inline-flag "Direct link to inline-flag")

* The results of the preview query are only included in dbt's logs and displayed in the terminal; they aren't materialized in the data warehouse or stored in any dbt file.
* The `--inline` flag enables you to run ad-hoc SQL, which means dbt can't ensure the query doesn't modify the data warehouse. To ensure no changes are made, use a profile or role with read-only permissions, which are managed directly in your data warehouse. For example: `dbt show --inline "select * from my_table" --profile my-read-only-profile`.

##### `--output json` flag[​](#--output-json-flag "Direct link to --output-json-flag")

* The `--output json` flag returns `dbt show` results in JSON format instead of the default human-readable output, which is helpful for scripting and automation.
* If you want the full terminal output (including logs) to be machine-readable JSON, you can also set `--log-format json`.
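Because `--output json` is designed for scripting, a wrapper can parse the preview rows directly. A sketch — the payload shape follows the JSON output example on this page, where the `preview` field is itself a JSON-encoded string of row dictionaries:

```python
import json

# The "data" portion of a `dbt show --inline "select 1" --output json` event;
# note that "preview" is a JSON string, not a nested object.
data = {"node_name": "inline_query", "preview": "[{\"ID\": 1}]"}

# Decode the preview into ordinary Python row dicts.
rows = json.loads(data["preview"])
print(rows)  # [{'ID': 1}]
```

From here a script can assert on row counts or column values without scraping the human-readable table output.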
#### Example[​](#example "Direct link to Example") ```text dbt show --select "model_name.sql" ``` or ```text dbt show --inline "select * from {{ ref('model_name') }}" ``` The following is an example of `dbt show` output for a model named `stg_orders`: ```bash dbt show --select "stg_orders" 21:17:38 Running with dbt=1.5.0-b5 21:17:38 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 425 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups 21:17:38 21:17:38 Concurrency: 24 threads (target='dev') 21:17:38 21:17:38 Previewing node 'stg_orders' : | order_id | customer_id | order_date | status | |----------+-------------+------------+-------- | | 1 | 1 | 2023-01-01 | returned | | 2 | 3 | 2023-01-02 | completed | | 3 | 94 | 2023-01-03 | completed | | 4 | 50 | 2023-01-04 | completed | | 5 | 64 | 2023-01-05 | completed | ``` For example, if you've just built a model that has a failing test, you can quickly preview the test failures right in the terminal, to find values of `id` that are duplicated: ```bash $ dbt build -s "my_model_with_duplicates" 13:22:47 .0 ... 13:22:48 Completed with 1 error and 0 warnings: 13:22:48 13:22:48 Failure in test unique_my_model_with_duplicates (models/schema.yml) 13:22:48 Got 1 result, configured to fail if not 0 13:22:48 13:22:48 compiled code at target/compiled/my_dbt_project/models/schema.yml/unique_my_model_with_duplicates_id.sql 13:22:48 13:22:48 Done. 
PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2

$ dbt show -s "unique_my_model_with_duplicates_id"
13:22:53 Running with dbt=1.5.0
13:22:53 Found 4 models, 2 tests, 0 snapshots, 0 analyses, 309 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
13:22:53
13:22:53 Concurrency: 5 threads (target='dev')
13:22:53
13:22:53 Previewing node 'unique_my_model_with_duplicates_id':
| unique_field | n_records |
| ------------ | --------- |
| 1            | 2         |
```

```sh
dbt show --inline "select 1" --output json --log-format json
```

Gives you a result like this:

```json
{
  "data": {
    "is_inline": true,
    "node_name": "inline_query",
    "output_format": "json",
    "preview": "[{\"ID\": 1}]",
    "quiet": false,
    "unique_id": "sql_operation.jaffle_shop.inline_query"
  },
  "info": {
    "code": "Q041",
    "level": "info",
    "msg": "{\n \"show\": [\n {\n \"ID\": 1\n }\n ]\n}\n",
    "name": "ShowNode",
    "thread": "MainThread"
  }
}
```

---

### About dbt snapshot command

The `dbt snapshot` command executes the [Snapshots](https://docs.getdbt.com/docs/build/snapshots.md) defined in your project. dbt will look for Snapshots in the `snapshot-paths` paths defined in your `dbt_project.yml` file. By default, the `snapshot-paths` path is `snapshots/`.

**Usage:**

```text
$ dbt snapshot --help
usage: dbt snapshot [-h] [--profiles-dir PROFILES_DIR] [--profile PROFILE]
                    [--target TARGET] [--vars VARS] [--bypass-cache]
                    [--threads THREADS] [--select SELECTOR [SELECTOR ...]]
                    [--exclude EXCLUDE [EXCLUDE ...]]

optional arguments:
  --select SELECTOR [SELECTOR ...]
                        Specify the snapshots to include in the run.
  --exclude EXCLUDE [EXCLUDE ...]
                        Specify the snapshots to exclude in the run.
```

---

### About dbt source command

The `dbt source` command provides subcommands that are useful when working with source data. This command provides one subcommand, `dbt source freshness`.

##### dbt source freshness[​](#dbt-source-freshness "Direct link to dbt source freshness")

If your dbt project is [configured with sources](https://docs.getdbt.com/docs/build/sources.md), then the `dbt source freshness` command will query all of your defined source tables, determining the "freshness" of these tables. If the tables are stale (based on the `freshness` config specified for your sources) then dbt will report a warning or error accordingly. If a source table is in a stale state, then dbt will exit with a nonzero exit code.

You can also use [source freshness commands](https://docs.getdbt.com/reference/commands/source.md#source-freshness-commands) to help ensure the data you rely on is up to date rather than stale.

##### Configure source freshness[​](#configure-source-freshness "Direct link to Configure source freshness")

The example below shows how to configure source freshness in dbt. Refer to [Declaring source freshness](https://docs.getdbt.com/docs/build/sources.md#declaring-source-freshness) for more information.
models/\.yml

```yaml
sources:
  - name: jaffle_shop
    database: raw
    config:
      freshness: # changed to config in v1.9
        warn_after: {count: 12, period: hour}
        error_after: {count: 24, period: hour}
      loaded_at_field: _etl_loaded_at # changed to config in v1.10
    tables:
      - name: customers
      - name: orders
        config:
          freshness:
            warn_after: {count: 6, period: hour}
            error_after: {count: 12, period: hour}
            filter: datediff('day', _etl_loaded_at, current_timestamp) < 2
      - name: product_skus
        config:
          freshness: null
```

This helps to monitor the data pipeline health. You can also configure source freshness in the **Execution settings** section in your dbt job **Settings** page. For more information, refer to [Enabling source freshness snapshots](https://docs.getdbt.com/docs/deploy/source-freshness.md#enabling-source-freshness-snapshots).

##### Source freshness commands[​](#source-freshness-commands "Direct link to Source freshness commands")

Source freshness commands ensure you're receiving the most up-to-date, relevant, and accurate information. Some of the typical commands you can use are:

| **Command** | **Description** |
| ----------- | --------------- |
| [`dbt source freshness`](https://docs.getdbt.com/reference/commands/source.md#dbt-source-freshness) | Checks the "freshness" for all sources. |
| [`dbt source freshness --output target/source_freshness.json`](https://docs.getdbt.com/reference/commands/source.md#configuring-source-freshness-output) | Output of "freshness" information to a different path. |
| [`dbt source freshness --select "source:source_name"`](https://docs.getdbt.com/reference/commands/source.md#specifying-sources-to-snapshot) | Checks the "freshness" for specific sources. |
##### Specifying sources to snapshot[​](#specifying-sources-to-snapshot "Direct link to Specifying sources to snapshot")

By default, `dbt source freshness` will calculate freshness information for all of the sources in your project. To snapshot freshness for a subset of these sources, use the `--select` flag.

```bash
# Snapshot freshness for all Snowplow tables:
$ dbt source freshness --select "source:snowplow"

# Snapshot freshness for a particular source table:
$ dbt source freshness --select "source:snowplow.event"
```

##### Configuring source freshness output[​](#configuring-source-freshness-output "Direct link to Configuring source freshness output")

When `dbt source freshness` completes, a JSON file containing information about the freshness of your sources will be saved to `target/sources.json`. An example `sources.json` will look like:

target/sources.json

```json
{
  "meta": {
    "generated_at": "2019-02-15T00:53:03.971126Z",
    "elapsed_time": 0.21452808380126953
  },
  "sources": {
    "source.project_name.source_name.table_name": {
      "max_loaded_at": "2019-02-15T00:45:13.572836+00:00Z",
      "snapshotted_at": "2019-02-15T00:53:03.880509+00:00Z",
      "max_loaded_at_time_ago_in_s": 481.307673,
      "state": "pass",
      "criteria": {
        "warn_after": { "count": 12, "period": "hour" },
        "error_after": { "count": 1, "period": "day" }
      }
    }
  }
}
```

To override the destination for this `sources.json` file, use the `-o` (or `--output`) flag:

```text
# Output source freshness info to a different path
$ dbt source freshness --output target/source_freshness.json
```

##### Using source freshness[​](#using-source-freshness "Direct link to Using source freshness")

Snapshots of source freshness can be used to understand:

1. If a specific data source is in a delayed state
2. The trend of data source freshness over time

This command can be run manually to determine the state of your source data freshness at any time.
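A monitoring script might consume the `sources.json` artifact directly. A sketch using the fields documented above — the payload is trimmed to the keys the check needs:

```python
import json

# A trimmed sources.json payload, using the fields from the example above.
raw = """
{
  "sources": {
    "source.project_name.source_name.table_name": {
      "max_loaded_at_time_ago_in_s": 481.307673,
      "state": "pass",
      "criteria": {
        "warn_after": {"count": 12, "period": "hour"},
        "error_after": {"count": 1, "period": "day"}
      }
    }
  }
}
"""

freshness = json.loads(raw)

# Collect any source whose freshness check did not pass.
stale = [
    name
    for name, info in freshness["sources"].items()
    if info["state"] != "pass"
]
print(stale)  # [] — every source in this snapshot passed its freshness check
```

Pointing the same logic at files stored on a schedule gives the longitudinal view described below.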
It is also recommended that you run this command on a schedule, storing the results of the freshness snapshot at regular intervals. These longitudinal snapshots will make it possible to be alerted when source data freshness SLAs are violated, as well as understand the trend of freshness over time.

dbt makes it easy to snapshot source freshness on a schedule, and provides a dashboard out of the box indicating the state of freshness for all of the sources defined in your project. For more information on snapshotting freshness in dbt, check out the [docs](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness).

---

### About dbt test command

`dbt test` runs data tests defined on models, sources, snapshots, and seeds and unit tests defined on SQL models. It expects that you have already created those resources through the appropriate commands.

The tests to run can be selected using the `--select` flag discussed [here](https://docs.getdbt.com/reference/node-selection/syntax.md).
```bash
# run data and unit tests
dbt test

# run only data tests
dbt test --select test_type:data

# run only unit tests
dbt test --select test_type:unit

# run tests for one_specific_model
dbt test --select "one_specific_model"

# run tests for all models in package
dbt test --select "some_package.*"

# run only data tests defined singularly
dbt test --select "test_type:singular"

# run only data tests defined generically
dbt test --select "test_type:generic"

# run data tests limited to one_specific_model
dbt test --select "one_specific_model,test_type:data"

# run unit tests limited to one_specific_model
dbt test --select "one_specific_model,test_type:unit"
```

For more information on writing tests, read the [data testing](https://docs.getdbt.com/docs/build/data-tests.md) and [unit testing](https://docs.getdbt.com/docs/build/unit-tests.md) documentation.

---

### About dbt_project.yml context

The following context methods and variables are available when configuring resources in the `dbt_project.yml` file. This applies to the `models:`, `seeds:`, and `snapshots:` keys in the `dbt_project.yml` file.
**Available context methods:**

* [env\_var](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md)
* [var](https://docs.getdbt.com/reference/dbt-jinja-functions/var.md)

**Available context variables:**

* [target](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md)
* [builtins](https://docs.getdbt.com/reference/dbt-jinja-functions/builtins.md)
* [dbt\_version](https://docs.getdbt.com/reference/dbt-jinja-functions/dbt_version.md)

##### Example configuration[​](#example-configuration "Direct link to Example configuration")

dbt\_project.yml

```yml
name: my_project
version: 1.0.0

# Configure the models in models/facts/ to be materialized as views
# in development and tables in production/CI contexts
models:
  my_project:
    facts:
      +materialized: "{{ 'view' if target.name == 'dev' else 'table' }}"
```

---

### About dbt_version variable

The `dbt_version` variable returns the installed version of dbt that is currently running. It can be used for debugging or auditing purposes. For details about release versioning, refer to [Versioning](https://docs.getdbt.com/reference/commands/version.md#versioning).

#### Example usages[​](#example-usages "Direct link to Example usages")

macros/get\_version.sql

```sql
{% macro get_version() %}
  {% do log("The installed version of dbt is: " ~ dbt_version, info=true) %}
{% endmacro %}
```

```text
$ dbt run-operation get_version
The installed version of dbt is: 1.6.0
```
---

### About debug macro

Requires Core CLI

The `debug()` macro is only available when using the dbt Core CLI in a local development environment. It's *not available* in the dbt platform. Do not deploy code to production that uses the `debug` macro.

If developing in the dbt platform or using Fusion, you can instead use:

* [`{{ print() }}`](https://docs.getdbt.com/reference/dbt-jinja-functions/print.md) - Print messages to both the log file and standard output (`stdout`).
* [`{{ log() }}`](https://docs.getdbt.com/reference/dbt-jinja-functions/log.md) - Structured logging that prints messages during Jinja rendering.

The `{{ debug() }}` macro will open an iPython debugger in the context of a compiled dbt macro. The `DBT_MACRO_DEBUGGING` environment variable must be set to use the debugger.

This function requires:

* Interactive terminal access with the iPython debugger (`ipdb`) installed. Fusion doesn't provide an iPython (`ipdb`) debugger since it's built on Rust. It instead outputs a non-interactive snapshot of the MiniJinja render context in the compiled code.
* Local development environment running the dbt Core CLI
* `DBT_MACRO_DEBUGGING` environment variable set

#### Usage[​](#usage "Direct link to Usage")

my\_macro.sql

```text
{% macro my_macro() %}
  {% set something_complex = my_complicated_macro() %}
  {{ debug() }}
{% endmacro %}
```

When dbt hits the `debug()` line, you'll see something like:

```shell
$ DBT_MACRO_DEBUGGING=write dbt compile
Running with dbt=1.0
> /var/folders/31/mrzqbbtd3rn4hmgbhrtkfyxm0000gn/T/dbt-macro-compiled-cxvhhgu7.py(14)root()
     13     environment.call(context, (undefined(name='debug') if l_0_debug is missing else l_0_debug)),
---> 14     environment.call(context, (undefined(name='source') if l_0_source is missing else l_0_source), 'src', 'seedtable'),
     15 )

ipdb> l 9,12
      9     l_0_debug = resolve('debug')
     10     l_0_source = resolve('source')
     11     pass
     12     yield '%s\nselect * from %s' % (
```

---

### About dispatch config

dbt can extend functionality across [Supported Data Platforms](https://docs.getdbt.com/docs/supported-data-platforms.md) through a system of [multiple dispatch](https://en.wikipedia.org/wiki/Multiple_dispatch). Because SQL syntax, data types, and DDL/DML support vary across adapters, dbt can define and call generic functional macros, and then "dispatch" that macro to the appropriate implementation for the current adapter.

#### Syntax[​](#syntax "Direct link to Syntax")

**Args**:

* `macro_name` \[required]: Name of macro to dispatch. Must be a string literal.
* `macro_namespace` \[optional]: Namespace (package) of macro to dispatch. Must be a string literal.
**Usage**: ```sql {% macro my_macro(arg1, arg2) -%} {{ return(adapter.dispatch('my_macro')(arg1, arg2)) }} {%- endmacro %} ``` dbt uses two criteria when searching for the right candidate macro: * Adapter prefix * Namespace (package) **Adapter prefix:** Adapter-specific macros are prefixed with the lowercase adapter name and two underscores. Given a macro named `my_macro`, dbt will look for: * Postgres: `postgres__my_macro` * Redshift: `redshift__my_macro` * Snowflake: `snowflake__my_macro` * BigQuery: `bigquery__my_macro` * OtherAdapter: `otheradapter__my_macro` * *default:* `default__my_macro` If dbt does not find an adapter-specific implementation, it will dispatch to the default implementation. **Namespace:** Generally, dbt will search for implementations in the root project and internal projects (e.g. `dbt`, `dbt_postgres`). If the `macro_namespace` argument is provided, it instead searches the specified namespace (package) for viable implementations. It is also possible to dynamically route namespace searching by defining a [`dispatch` project config](https://docs.getdbt.com/reference/project-configs/dispatch-config.md); see the examples below for details. #### Examples[​](#examples "Direct link to Examples") ##### A simple example[​](#a-simple-example "Direct link to A simple example") Let's say I want to define a macro, `concat`, that compiles to the SQL function `concat()` as its default behavior. On Redshift and Snowflake, however, I want to use the `||` operator instead. 
macros/concat.sql ```sql {% macro concat(fields) -%} {{ return(adapter.dispatch('concat')(fields)) }} {%- endmacro %} {% macro default__concat(fields) -%} concat({{ fields|join(', ') }}) {%- endmacro %} {% macro redshift__concat(fields) %} {{ fields|join(' || ') }} {% endmacro %} {% macro snowflake__concat(fields) %} {{ fields|join(' || ') }} {% endmacro %} ``` The top `concat` macro follows a special, rigid formula: It is named with the macro's "primary name," `concat`, which is how the macro will be called elsewhere. It accepts one argument, named `fields`. This macro's *only* function is to dispatch—that is, look for and return—using the primary macro name (`concat`) as its search term. It also wants to pass through, to its eventual implementation, all the keyword arguments that were passed into it. In this case, there's only one argument, named `fields`. Below that macro, I've defined three possible implementations of the `concat` macro: one for Redshift, one for Snowflake, and one for use by default on all other adapters. Depending on the adapter I'm running against, one of these macros will be selected, it will be passed the specified arguments as inputs, it will operate on those arguments, and it will pass back the result to the original dispatching macro. ##### A more complex example[​](#a-more-complex-example "Direct link to A more complex example") I found an existing implementation of the `concat` macro in the dbt-utils package. However, I want to override its implementation of the `concat` macro on Redshift in particular. In all other cases—including the default implementation—I'm perfectly happy falling back to the implementations defined in `dbt_utils.concat`. 
macros/concat.sql ```sql {% macro concat(fields) -%} {{ return(adapter.dispatch('concat')(fields)) }} {%- endmacro %} {% macro default__concat(fields) -%} {{ return(dbt_utils.concat(fields)) }} {%- endmacro %} {% macro redshift__concat(fields) %} {% for field in fields %} nullif({{ field }},'') {{ ' || ' if not loop.last }} {% endfor %} {% endmacro %} ``` If I'm running on Redshift, dbt will use my version; if I'm running on any other database, the `concat()` macro will shell out to the version defined in `dbt_utils`. #### For package maintainers[​](#for-package-maintainers "Direct link to For package maintainers") Dispatched macros from [packages](https://docs.getdbt.com/docs/build/packages.md) *must* provide the `macro_namespace` argument, as this declares the namespace (package) where it plans to search for candidates. Most often, this is the same as the name of your package, e.g. `dbt_utils`. (It is possible, if rarely desirable, to define a dispatched macro *not* in the `dbt_utils` package, and dispatch it into the `dbt_utils` namespace.) Here we have the definition of the `dbt_utils.concat` macro, which specifies both the `macro_name` and `macro_namespace` to dispatch: ```sql {% macro concat(fields) -%} {{ return(adapter.dispatch('concat', 'dbt_utils')(fields)) }} {%- endmacro %} ``` ##### Overriding package macros[​](#overriding-package-macros "Direct link to Overriding package macros") Following the second example above: Whenever I call my version of the `concat` macro in my own project, it will use my special null-handling version on Redshift. But the version of the `concat` macro *within* the dbt-utils package will not use my version. Why does this matter? Other macros in dbt-utils, such as `surrogate_key`, call the `dbt_utils.concat` macro directly. What if I want `dbt_utils.surrogate_key` to use *my* version of `concat` instead, including my custom logic on Redshift? 
As a user, I can accomplish this via a [project-level `dispatch` config](https://docs.getdbt.com/reference/project-configs/dispatch-config.md). When dbt goes to dispatch `dbt_utils.concat`, it knows from the `macro_namespace` argument to search in the `dbt_utils` namespace. The config below defines dynamic routing for that namespace, telling dbt to search through an ordered sequence of packages, instead of just the `dbt_utils` package. dbt\_project.yml ```yml dispatch: - macro_namespace: dbt_utils search_order: ['my_project', 'dbt_utils'] ``` Note that this config *must* be specified in the user's root `dbt_project.yml`. dbt will ignore any `dispatch` configs defined in the project files of installed packages. Adapter prefixes still matter: dbt will only ever look for implementations that are compatible with the current adapter. But dbt will prioritize package specificity over adapter specificity. If I call the `concat` macro while running on Postgres, with the config above, dbt will look for the following macros in order: 1. `my_project.postgres__concat` (not found) 2. `my_project.default__concat` (not found) 3. `dbt_utils.postgres__concat` (not found) 4. `dbt_utils.default__concat` (found! use this one) As someone installing a package, this functionality makes it possible for me to change the behavior of another, more complex macro (`dbt_utils.surrogate_key`) by reimplementing and overriding one of its modular components. As a package maintainer, this functionality enables users of my package to extend, reimplement, or override default behavior, without needing to fork the package's source code. 
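With that config in place, the override applies even when `concat` is called from inside the package. As an illustrative sketch (the model and column names here are hypothetical, not from the docs), a model calling `dbt_utils.surrogate_key` on Redshift would now route the internal `concat` call to `my_project.redshift__concat`:

```sql
-- models/orders_keyed.sql (hypothetical model)
-- surrogate_key calls dbt_utils.concat internally; with the dispatch
-- config above, that call resolves to my_project.redshift__concat on Redshift
select
    {{ dbt_utils.surrogate_key(['order_id', 'payment_method']) }} as order_key,
    *
from {{ ref('orders') }}
```

No change to the model code is needed — the re-routing happens entirely through the `dispatch` config in the root project.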
##### Overriding global macros[​](#overriding-global-macros "Direct link to Overriding global macros") tip Certain functions like [`ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md), [`source`](https://docs.getdbt.com/reference/dbt-jinja-functions/source.md), and [`config`](https://docs.getdbt.com/reference/dbt-jinja-functions/config.md) can't be overridden with a package using the dispatch config. This is because `ref`, `source`, and `config` are context properties within dbt and are not dispatched as global macros. Refer to [this GitHub discussion](https://github.com/dbt-labs/dbt-core/issues/4491#issuecomment-994709916) for more context. I maintain an internal utility package at my organization, named `my_org_dbt_helpers`. I use this package to reimplement built-in dbt macros on behalf of all my dbt-using colleagues, who work across a number of dbt projects. My package can define custom versions of any dispatched global macro I choose, from `generate_schema_name` to `test_unique`. I can define a new default version of that macro (e.g. `default__generate_schema_name`), or custom versions for specific data warehouse adapters (e.g. `spark__generate_schema_name`). Each root project installing my package simply needs to include the [project-level `dispatch` config](https://docs.getdbt.com/reference/project-configs/dispatch-config.md) that searches my package ahead of `dbt` for the `dbt` global namespace: dbt\_project.yml ```yml dispatch: - macro_namespace: dbt search_order: ['my_project', 'my_org_dbt_helpers', 'dbt'] ``` ##### Managing different global overrides across packages[​](#managing-different-global-overrides-across-packages "Direct link to Managing different global overrides across packages") You can override global behaviors in different ways for each project that is installed as a package. This holds true for all global macros: `generate_schema_name`, `create_table_as`, etc. 
When parsing or running a resource defined in a package, the definition of the global macro within that package takes precedence over the definition in the root project because it's more specific to those resources. By combining package-level overrides and `dispatch`, it is possible to achieve three different patterns:

1. **Package always wins** — As the developer of dbt models in a project that will be deployed elsewhere as a package, you want full control over the macros used to define & materialize your models. Your macros should always take precedence for your models, and there should not be any way to override them.
   * *Mechanism:* Each project/package fully overrides the macro by its name, for example, `generate_schema_name` or `create_table_as`. Do not use dispatch.
2. **Conditional application (root project wins)** — As the maintainer of one dbt project in a mesh of multiple, your team wants conditional application of these rules. When running your project standalone (in development), you want to apply custom behavior; but when installed as a package and deployed alongside several other projects (in production), you want the root-level project's rules to apply.
   * *Mechanism:* Each package implements its "local" override by registering a candidate for dispatch with an adapter prefix, for example, `default__generate_schema_name` or `default__create_table_as`. The root-level project can then register its own candidate for dispatch (`default__generate_schema_name`), which wins the default search order, or explicitly override the macro by name (`generate_schema_name`).
3. **Same rules everywhere all the time** — As a member of the data platform team responsible for consistency across teams at your organization, you want to create a "macro package" that every team can install & use.
   * *Mechanism:* Create a standalone package of candidate macros only, for example, `default__generate_schema_name` or `default__create_table_as`.
Add a [project-level `dispatch` configuration](https://docs.getdbt.com/reference/project-configs/dispatch-config.md) in every project's `dbt_project.yml`.

#### For adapter plugin maintainers

Most packages were initially designed to work on the four original dbt adapters. By using the `dispatch` macro and project config, it is possible to "shim" existing packages to work on other adapters, by way of third-party compatibility packages. For instance, if I want to use `dbt_utils.concat` on Apache Spark, I can install a compatibility package, spark-utils, alongside dbt-utils:

packages.yml

```yml
packages:
  - package: dbt-labs/dbt_utils
    version: ...
  - package: dbt-labs/spark_utils
    version: ...
```

I then include `spark_utils` in the search order for dispatched macros in the `dbt_utils` namespace. (I still include my own project first, just in case I want to reimplement any macros with my own custom logic.)

dbt\_project.yml

```yml
dispatch:
  - macro_namespace: dbt_utils
    search_order: ['my_project', 'spark_utils', 'dbt_utils']
```

When dispatching `dbt_utils.concat`, dbt will search for:

1. `my_project.spark__concat` (not found)
2. `my_project.default__concat` (not found)
3. `spark_utils.spark__concat` (found! use this one)
4. `spark_utils.default__concat`
5. `dbt_utils.spark__concat`
6. `dbt_utils.default__concat`

As a compatibility package maintainer, I only need to reimplement the foundational building-block macros which encapsulate low-level syntactical differences. By reimplementing low-level macros, such as `spark__dateadd` and `spark__datediff`, the `spark_utils` package provides access to more complex macros (`dbt_utils.date_spine`) "for free." As a `dbt-spark` user, by installing `dbt_utils` and `spark_utils` together, I don't just get access to higher-level utility macros.
I may even be able to install and use packages with no Spark-specific logic, and which have never been tested against Spark, so long as they rely on `dbt_utils` macros for cross-adapter compatibility. ##### Adapter inheritance[​](#adapter-inheritance "Direct link to Adapter inheritance") Some adapters "inherit" from other adapters (e.g. `dbt-postgres` → `dbt-redshift`, and `dbt-spark` → `dbt-databricks`). If using a child adapter, dbt will include any parent adapter implementations in its search order, too. Instead of just looking for `redshift__` and falling back to `default__`, dbt will look for `redshift__`, `postgres__`, and `default__`, in that order. Child adapters tend to have very similar SQL syntax to their parents, so this allows them to skip reimplementing a macro that has already been reimplemented by the parent adapter. Following the example above with `dbt_utils.concat`, the full search order on Redshift is actually: 1. `my_project.redshift__concat` 2. `my_project.postgres__concat` 3. `my_project.default__concat` 4. `dbt_utils.redshift__concat` 5. `dbt_utils.postgres__concat` 6. `dbt_utils.default__concat` In rare cases, the child adapter may prefer the default implementation to its parent's adapter-specific implementation. In that case, the child adapter should define an adapter-specific macro that calls the default. 
For instance, the PostgreSQL syntax for adding dates ought to work on Redshift, too, but I may happen to prefer the simplicity of `dateadd`:

```sql
{% macro dateadd(datepart, interval, from_date_or_timestamp) %}
    {{ return(adapter.dispatch('dateadd')(datepart, interval, from_date_or_timestamp)) }}
{% endmacro %}

{% macro default__dateadd(datepart, interval, from_date_or_timestamp) %}
    dateadd({{ datepart }}, {{ interval }}, {{ from_date_or_timestamp }})
{% endmacro %}

{% macro postgres__dateadd(datepart, interval, from_date_or_timestamp) %}
    {{ from_date_or_timestamp }} + ((interval '1 {{ datepart }}') * ({{ interval }}))
{% endmacro %}

{# Use default syntax instead of postgres syntax #}
{% macro redshift__dateadd(datepart, interval, from_date_or_timestamp) %}
    {{ return(default__dateadd(datepart, interval, from_date_or_timestamp)) }}
{% endmacro %}
```

#### FAQs

**\[Error] Could not find my\_project package**

If a package name is included in the `search_order` of a project-level `dispatch` config, dbt expects that package to contain macros which are viable candidates for dispatching. If an included package does not contain *any* macros, dbt will raise an error like:

```shell
Compilation Error
  In dispatch: Could not find package 'my_project'
```

This does not mean the package or root project is missing—it means that it contains no macros, and so it is missing from the search spaces available to `dispatch`.

If you've tried the step above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help!
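One way to clear the error in the FAQ above — sketched here as a suggestion, not taken verbatim from the docs — is to ensure the project named in `search_order` defines at least one candidate macro (adapter-prefixed or `default__`-prefixed):

```sql
-- macros/concat.sql in my_project (hypothetical file)
-- Defining any dispatch candidate makes 'my_project' a valid
-- entry in a dispatch search_order
{% macro default__concat(fields) -%}
    concat({{ fields|join(', ') }})
{%- endmacro %}
```

Alternatively, remove the macro-less project from the `search_order` list.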
---

### About doc function

The `doc` function is used to reference docs blocks in the description field of schema.yml files. It is analogous to the `ref` function. For more information, consult the [Documentation guide](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md).

Usage:

orders.md

```jinja2
{% docs orders %}
# docs
- go
- here
{% enddocs %}
```

schema.yml

```yaml
models:
  - name: orders
    description: "{{ doc('orders') }}"
```

---

### About env_var function

The `env_var` function can be used to incorporate environment variables from the system into your dbt project. You can use the `env_var` function in your `profiles.yml` file, the `dbt_project.yml` file, the `sources.yml` file, your `schema.yml` files, and in model `.sql` files. Essentially, `env_var` is available anywhere dbt processes Jinja code.

When used in a `profiles.yml` file (to avoid putting credentials on a server), it can be used like this:

profiles.yml

```yaml
profile:
  target: prod
  outputs:
    prod:
      type: postgres
      host: 127.0.0.1
      # IMPORTANT: Make sure to quote the entire Jinja string here
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_ENV_SECRET_PASSWORD') }}"
      ....
```

If the `DBT_USER` and `DBT_ENV_SECRET_PASSWORD` environment variables are present when dbt is invoked, then these variables will be pulled into the profile as expected. If any environment variables are not set, then dbt will raise a compilation error.

##### Converting env\_vars

Environment variables are always strings.
When using them for configurations that expect integers or booleans, you must explicitly convert the value to the correct type. Use a Jinja filter to convert the string to the correct type: * **Integers** — Convert the string to a number using the `int` or [`as_number`](https://docs.getdbt.com/reference/dbt-jinja-functions/as_number.md) filter to avoid errors like `'1' is not of type 'integer'`. For example, `"{{ env_var('DBT_THREADS') | int }}"` or `"{{ env_var('DB_PORT') | as_number }}"`. * **Booleans** — Convert the string to a boolean explicitly using the [`as_bool`](https://docs.getdbt.com/reference/dbt-jinja-functions/as_bool.md) filter. For example, `"{{ env_var('DBT_PERSIST_DOCS_RELATION', False) | as_bool }}"`. For boolean defaults, use capitalized `True` or `False`. Using lowercase `true` or `false` will be treated as a string and can result in unexpected results. For example, to disable [`persist_docs`](https://docs.getdbt.com/reference/resource-configs/persist_docs.md) using environment variables: dbt\_project.yml ```yml +persist_docs: relation: "{{ env_var('DBT_PERSIST_DOCS_RELATION', False) | as_bool }}" columns: "{{ env_var('DBT_PERSIST_DOCS_COLUMNS', False) | as_bool }}" ``` Quoting, curly brackets, & you Be sure to quote the entire Jinja string. Otherwise, the YAML parser will be confused by the Jinja curly brackets. ##### Default values[​](#default-values "Direct link to Default values") You can also provide a default value as a second argument: dbt\_project.yml ```yaml ... models: jaffle_shop: +materialized: "{{ env_var('DBT_MATERIALIZATION', 'view') }}" ``` This can be useful to avoid compilation errors when the environment variable isn't available. ##### Secrets[​](#secrets "Direct link to Secrets") For certain configurations, you can use "secret" env vars. 
Any env var named with the prefix `DBT_ENV_SECRET` will be: * Available for use in `profiles.yml` + `packages.yml`, via the same `env_var()` function * Disallowed everywhere else, including `dbt_project.yml` and model SQL, to prevent accidentally writing these secret values to the data warehouse or metadata artifacts * Scrubbed from dbt logs and replaced with `*****`, any time its value appears in those logs (even if the env var was not called directly) The primary use case of secret env vars is git access tokens for [private packages](https://docs.getdbt.com/docs/build/packages.md#private-packages). **Note:** When dbt is loading profile credentials and package configuration, secret env vars will be replaced with the string value of the environment variable. You cannot modify secrets using Jinja filters, including type-casting filters such as [`as_number`](https://docs.getdbt.com/reference/dbt-jinja-functions/as_number.md) or [`as_bool`](https://docs.getdbt.com/reference/dbt-jinja-functions/as_bool.md), or pass them as arguments into other Jinja macros. 
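As noted above, the primary use case for secret env vars is git access tokens for private packages. As an illustrative sketch (the organization, repository, and variable names here are placeholders, not from the docs), a `packages.yml` entry might reference a token like this:

```yml
# packages.yml -- hypothetical private package; org/repo names are placeholders
packages:
  - git: "https://{{ env_var('DBT_ENV_SECRET_GIT_TOKEN') }}@github.com/my_org/private_package.git"
    revision: main
```

Because the token carries the `DBT_ENV_SECRET` prefix, it is allowed in `packages.yml` and its value will be scrubbed from dbt logs.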
You can only use *one secret* per configuration:

```yml
# works
host: "{{ env_var('DBT_ENV_SECRET_HOST') }}"
# does not work
host: "www.{{ env_var('DBT_ENV_SECRET_HOST_DOMAIN') }}.com/{{ env_var('DBT_ENV_SECRET_HOST_PATH') }}"
```

##### Custom metadata

Any env var named with the prefix `DBT_ENV_CUSTOM_ENV_` will be included in two places, with its prefix-stripped name as the key:

* [dbt artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md#common-metadata): `metadata` -> `env`
* [events and structured logs](https://docs.getdbt.com/reference/events-logging.md#info-fields): `info` -> `extra`

A dictionary of these prefixed env vars will also be available in a `dbt_metadata_envs` context variable:

```sql
-- {{ dbt_metadata_envs }}
select 1 as id
```

```shell
$ DBT_ENV_CUSTOM_ENV_MY_FAVORITE_COLOR=indigo DBT_ENV_CUSTOM_ENV_MY_FAVORITE_NUMBER=6 dbt compile
```

Compiles to:

```sql
-- {'MY_FAVORITE_COLOR': 'indigo', 'MY_FAVORITE_NUMBER': '6'}
select 1 as id
```

##### dbt platform usage

If you are using the dbt platform, you must adhere to the naming conventions for environment variables. Environment variables in dbt must be prefixed with `DBT_` (including `DBT_ENV_CUSTOM_ENV_` or `DBT_ENV_SECRET`). Environment variable keys are uppercased and case-sensitive. When referencing `{{ env_var('DBT_KEY') }}` in your project's code, the key must exactly match the variable defined in dbt's UI.

---

### About exceptions namespace

The `exceptions` namespace can be used to raise warnings and errors in dbt userspace.
#### raise\_compiler\_error

The `exceptions.raise_compiler_error` method will raise a compiler error with the provided message. This is typically only useful in macros or materializations when invalid arguments are provided by the calling model. Note that throwing an exception will cause a model to fail, so use this method with care!

**Example usage**:

exceptions.sql

```sql
{% if number < 0 or number > 100 %}
  {{ exceptions.raise_compiler_error("Invalid `number`. Got: " ~ number) }}
{% endif %}
```

#### warn

Use the `exceptions.warn` method to raise a compiler warning with the provided message; the model will still succeed and be treated as a PASS.

By default, warnings will not cause dbt runs to fail. However:

* If you use the `--warn-error` flag, all warnings will be promoted to errors.
* To promote only Jinja warnings to errors (and leave other warnings alone), use `--warn-error-options`. For example, `--warn-error-options '{"error": ["JinjaLogWarning"]}'`.

Learn more about [Warnings](https://docs.getdbt.com/reference/global-configs/warnings.md).

**Example usage**:

warn.sql

```sql
{% if number < 0 or number > 100 %}
  {% do exceptions.warn("Invalid `number`. Got: " ~ number) %}
{% endif %}
```

---

### About execute variable

`execute` is a Jinja variable that returns True when dbt is in "execute" mode.

When you execute a `dbt compile` or `dbt run` command, dbt:

1. Reads all of the files in your project and generates a [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md) composed of models, tests, and other graph nodes present in your project. During this phase, dbt uses the [`ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) and [`source`](https://docs.getdbt.com/reference/dbt-jinja-functions/source.md) statements it finds to generate the DAG for your project. **No SQL is run during this phase**, and `execute == False`.
2. Compiles (and runs) each node (e.g. building models, or running tests). **SQL is run during this phase**, and `execute == True`.

Any Jinja that relies on a result being returned from the database will error during the parse phase. For example, this SQL will return an error:

models/order\_payment\_methods.sql

```sql
1  {% set payment_method_query %}
2  select distinct
3  payment_method
4  from {{ ref('raw_payments') }}
5  order by 1
6  {% endset %}
7
8  {% set results = run_query(payment_method_query) %}
9
10 {# Return the first column #}
11 {% set payment_methods = results.columns[0].values() %}
```

The error returned by dbt will look as follows:

```text
Encountered an error:
Compilation Error in model order_payment_methods (models/order_payment_methods.sql)
  'None' has no attribute 'table'
```

This is because line #11 in the earlier code example (`{% set payment_methods = results.columns[0].values() %}`) assumes that a table has been returned, when, during the parse phase, this query hasn't been run.
To work around this, wrap any problematic Jinja in an `{% if execute %}` statement:

models/order\_payment\_methods.sql

```sql
{% set payment_method_query %}
select distinct
payment_method
from {{ ref('raw_payments') }}
order by 1
{% endset %}

{% set results = run_query(payment_method_query) %}

{% if execute %}
{# Return the first column #}
{% set payment_methods = results.columns[0].values() %}
{% else %}
{% set payment_methods = [] %}
{% endif %}
```

#### Parsing vs execution

Parsing in Jinja is when dbt:

* Reads your project files.
* Identifies [`ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) and [`source`](https://docs.getdbt.com/reference/dbt-jinja-functions/source.md).
* Identifies macro definitions.
* Builds the dependency graph (DAG).

It doesn't run any SQL — `execute == False`.

Execution is when dbt actually runs SQL and builds models — `execute == True`. During execution, dbt:

* Renders full Jinja templates into SQL.
* Resolves all instances of `ref()` and `source()` to their corresponding table or view names.
* Runs the SQL in your models during commands like [`dbt run`](https://docs.getdbt.com/reference/commands/run.md), [`dbt test`](https://docs.getdbt.com/reference/commands/test.md), [`dbt seed`](https://docs.getdbt.com/reference/commands/seed.md), or [`dbt snapshot`](https://docs.getdbt.com/reference/commands/snapshot.md).
* Creates or updates tables/views in the warehouse.
* Applies any materializations (incremental, table, view, ephemeral).

`execute` impacts the values of `ref()` and `source()`, and won't work as expected inside of a [`sql_header`](https://docs.getdbt.com/reference/resource-configs/sql_header.md#usage). This is because in the initial parse of the project, dbt identifies every use of `ref()` and `source()` to build the DAG, but doesn’t resolve them to actual database identifiers.
Instead, it replaces each with a placeholder value to ensure the SQL compiles cleanly during parsing. #### Examples[​](#examples "Direct link to Examples") Macros like [`log()`](https://docs.getdbt.com/reference/dbt-jinja-functions/log.md) and [`exceptions.warn()`](https://docs.getdbt.com/reference/dbt-jinja-functions/exceptions.md#warn) are still evaluated at parse time, during dbt's "first-pass" Jinja render to extract `ref`, `source` and `config`. As a result, dbt will also run any logging or warning messages during this process. Even though nothing is being executed yet, dbt still runs those log lines while parsing. This can be confusing — it looks like dbt is doing something real but it’s just parsing. ```text $ dbt run 15:42:01 Running with dbt=1.10.2 15:42:01 I'm running a query now. <------ this one is misleading!!!! no query is actually being run 15:42:01 Found 1 model, 0 tests, 0 snapshots, 0 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics 15:42:01 15:42:01 Concurrency: 8 threads (target='dev') 15:42:01 15:42:01 1 of 1 START table model analytics.my_model .................................. [RUN] 15:42:01 I'm running a query now 15:42:02 1 of 1 OK created table model analytics.my_model ............................. [OK in 0.36s] ``` ##### Logging fully-qualified relation names[​](#logging-fully-qualified-relation-names "Direct link to Logging fully-qualified relation names") Let's assume you have a relation named `relation` obtained using something like `{% set relation = ref('my_model') %}` or `{% set relation = source('source_name', 'table_name') %}` — this will lead to unexpected or confusing behavior during parsing: ```jinja {%- if load_relation(relation) is none -%} {{ log("Relation is missing: " ~ relation, True) }} {% endif %} ``` To prevent this, add the `execute` flag to make sure the check only runs when dbt is actually running the code — not just when it's preparing it. 
Use a `{% do exceptions.warn(...) %}` statement to emit a warning during model execution without failing the run.

```jinja
{%- if execute and load_relation(relation) is none -%}
    {% do exceptions.warn("Relation is missing: " ~ relation) %}
    {{ log("Relation is missing: " ~ relation, info=True) }}
{%- endif -%}
```

---

### About flags (global configs)

In dbt, "flags" (also called "global configs") are configurations for fine-tuning *how* dbt runs your project. They differ from [resource-specific configs](https://docs.getdbt.com/reference/configs-and-properties.md) that tell dbt *what* to run. Flags control things like the visual output of logs, whether to treat specific warning messages as errors, or whether to "fail fast" after encountering the first error.

Flags are "global" configs because they are available for all dbt commands and they can be set in multiple places. There is significant overlap between dbt's flags and dbt's command line options, but there are differences:

* Certain flags can only be set in [`dbt_project.yml`](https://docs.getdbt.com/reference/dbt_project.yml.md) and cannot be overridden for specific invocations via CLI options.
* If a CLI option is supported by specific commands, rather than supported by all commands ("global"), it is generally not considered to be a "flag".

##### Setting flags

There are multiple ways of setting flags, which depend on the use case:

* **[Project-level `flags` in `dbt_project.yml`](https://docs.getdbt.com/reference/global-configs/project-flags.md):** Define version-controlled defaults for everyone running this project.
Also, opt in or opt out of [behavior changes](https://docs.getdbt.com/reference/global-configs/behavior-changes.md) to manage your migration off legacy functionality.

* **[Environment variables](https://docs.getdbt.com/reference/global-configs/environment-variable-configs.md):** Define different behavior in different runtime environments (development vs. production vs. [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md)), or different behavior for different users in development (based on personal preferences).
* **[CLI options](https://docs.getdbt.com/reference/global-configs/command-line-options.md):** Define behavior specific to *this invocation*. Supported for all dbt commands.

The most specific setting "wins." If you set the same flag in all three places, the CLI option will take precedence, followed by the environment variable, and finally, the value in `dbt_project.yml`. If you set the flag in none of those places, it will use the default value defined within dbt.

Most flags can be set in all three places:

```yaml
# dbt_project.yml
flags:
  # set default for running this project -- anywhere, anytime, by anyone
  fail_fast: true
```

```bash
# set this environment variable to 'True' (bash syntax)
export DBT_FAIL_FAST=1
dbt run
```

```bash
dbt run --fail-fast # set to True for this specific invocation
dbt run --no-fail-fast # set to False
```

There are two categories of exceptions:

1. **Flags setting file paths:** Flags for file paths that are relevant to runtime execution (for example, `--log-path` or `--state`) cannot be set in `dbt_project.yml`. To override defaults, pass CLI options or set environment variables (`DBT_LOG_PATH`, `DBT_STATE`). Flags that tell dbt where to find project resources (for example, `model-paths`) are set in `dbt_project.yml`, but as a top-level key, outside the `flags` dictionary; these configs are expected to be fully static and never vary based on the command or execution environment.
2.
**Opt-in flags:** Flags opting in or out of [behavior changes](https://docs.getdbt.com/reference/global-configs/behavior-changes.md) can *only* be defined in `dbt_project.yml`. These are intended to be set in version control and migrated via pull/merge request. Their values should not diverge indefinitely across invocations, environments, or users. ##### Accessing flags[​](#accessing-flags "Direct link to Accessing flags") Custom user-defined logic, written in Jinja, can check the values of flags using [the `flags` context variable](https://docs.getdbt.com/reference/dbt-jinja-functions/flags.md). ```yaml # dbt_project.yml on-run-start: - '{{ log("I will stop at the first sign of trouble", info = true) if flags.FAIL_FAST }}' ``` #### Available flags[​](#available-flags "Direct link to Available flags") Because the values of `flags` can differ across invocations, we strongly advise against using `flags` as an input to configurations or dependencies (`ref` + `source`) that dbt resolves [during parsing](https://docs.getdbt.com/reference/parsing.md#known-limitations). ##### Common flag examples[​](#common-flag-examples "Direct link to Common flag examples") Use the `--target` flag to specify which target (environment) to use when running dbt commands. For example: ```bash dbt run --target dev dbt run --target prod dbt build --target staging ``` The `--target` flag allows you to run the same dbt project against different environments without modifying your configuration files. Define the target in your `profiles.yml` file. Learn more about [connection profiles and targets](https://docs.getdbt.com/docs/local/profiles.yml.md#understanding-targets-in-profiles).

| Flag name | Type | Default | Supported in project? | Environment variable | CLI options | Supported in dbt CLI? |
| --- | --- | --- | --- | --- | --- | --- |
| [cache\_selected\_only](https://docs.getdbt.com/reference/global-configs/cache.md) | boolean | False | ✅ | `DBT_CACHE_SELECTED_ONLY` | `--cache-selected-only`, `--no-cache-selected-only` | ✅ |
| [clean\_project\_files\_only](https://docs.getdbt.com/reference/commands/clean.md#--clean-project-files-only) | boolean | True | ❌ | `DBT_CLEAN_PROJECT_FILES_ONLY` | `--clean-project-files-only`, `--no-clean-project-files-only` | ❌ |
| [debug](https://docs.getdbt.com/reference/global-configs/logs.md#debug-level-logging) | boolean | False | ✅ | `DBT_DEBUG` | `--debug`, `--no-debug` | ✅ |
| [defer](https://docs.getdbt.com/reference/node-selection/defer.md) | boolean | False | ❌ | `DBT_DEFER` | `--defer`, `--no-defer` | ✅ (default) |
| [defer\_state](https://docs.getdbt.com/reference/node-selection/defer.md) | path | None | ❌ | `DBT_DEFER_STATE` | `--defer-state` | ❌ |
| [favor\_state](https://docs.getdbt.com/reference/node-selection/defer.md#favor-state) | boolean | False | ❌ | `DBT_FAVOR_STATE` | `--favor-state`, `--no-favor-state` | ✅ |
| [empty](https://docs.getdbt.com/docs/build/empty-flag.md) | boolean | False | ❌ | `DBT_EMPTY` | `--empty`, `--no-empty` | ✅ |
| [event\_time\_start](https://docs.getdbt.com/reference/dbt-jinja-functions/model.md#batch-properties-for-microbatch-models) | datetime | None | ❌ | `DBT_EVENT_TIME_START` | `--event-time-start` | ✅ |
| [event\_time\_end](https://docs.getdbt.com/reference/dbt-jinja-functions/model.md#batch-properties-for-microbatch-models) | datetime | None | ❌ | `DBT_EVENT_TIME_END` | `--event-time-end` | ✅ |
| [fail\_fast](https://docs.getdbt.com/reference/global-configs/failing-fast.md) | boolean | False | ✅ | `DBT_FAIL_FAST` | `--fail-fast`, `-x`, `--no-fail-fast` | ✅ |
| [full\_refresh](https://docs.getdbt.com/reference/resource-configs/full_refresh.md) | boolean | False | ✅ (as resource config) | `DBT_FULL_REFRESH` | `--full-refresh`, `--no-full-refresh` | ✅ |
| [indirect\_selection](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md#syntax-examples) | enum | eager | ✅ | `DBT_INDIRECT_SELECTION` | `--indirect-selection` | ❌ |
| [introspect](https://docs.getdbt.com/reference/commands/compile.md#introspective-queries) | boolean | True | ❌ | `DBT_INTROSPECT` | `--introspect`, `--no-introspect` | ❌ |
| [log\_cache\_events](https://docs.getdbt.com/reference/global-configs/logs.md#logging-relational-cache-events) | boolean | False | ❌ | `DBT_LOG_CACHE_EVENTS` | `--log-cache-events`, `--no-log-cache-events` | ❌ |
| [log\_format\_file](https://docs.getdbt.com/reference/global-configs/logs.md#log-formatting) | enum | default (text) | ✅ | `DBT_LOG_FORMAT_FILE` | `--log-format-file` | ❌ |
| [log\_format](https://docs.getdbt.com/reference/global-configs/logs.md#log-formatting) | enum | default (text) | ✅ | `DBT_LOG_FORMAT` | `--log-format` | ❌ |
| [log\_level\_file](https://docs.getdbt.com/reference/global-configs/logs.md#log-level) | enum | debug | ✅ | `DBT_LOG_LEVEL_FILE` | `--log-level-file` | ❌ |
| [log\_level](https://docs.getdbt.com/reference/global-configs/logs.md#log-level) | enum | info | ✅ | `DBT_LOG_LEVEL` | `--log-level` | ❌ |
| [log\_path](https://docs.getdbt.com/reference/global-configs/logs.md) | path | None (uses `logs/`) | ❌ | `DBT_LOG_PATH` | `--log-path` | ❌ |
| [partial\_parse](https://docs.getdbt.com/reference/global-configs/parsing.md#partial-parsing) | boolean | True | ✅ | `DBT_PARTIAL_PARSE` | `--partial-parse`, `--no-partial-parse` | ✅ |
| [populate\_cache](https://docs.getdbt.com/reference/global-configs/cache.md) | boolean | True | ✅ | `DBT_POPULATE_CACHE` | `--populate-cache`, `--no-populate-cache` | ✅ |
| [print](https://docs.getdbt.com/reference/global-configs/print-output.md#suppress-print-messages-in-stdout) | boolean | True | ❌ | `DBT_PRINT` | `--print`, `--no-print` | ❌ |
| [printer\_width](https://docs.getdbt.com/reference/global-configs/print-output.md#printer-width) | int | 80 | ✅ | `DBT_PRINTER_WIDTH` | `--printer-width` | ❌ |
| [profile](https://docs.getdbt.com/docs/local/profiles.yml.md#about-profiles) | string | None | ✅ (as top-level key) | `DBT_PROFILE` | [`--profile`](https://docs.getdbt.com/docs/local/profiles.yml.md#overriding-profiles-and-targets) | ❌ |
| [profiles\_dir](https://docs.getdbt.com/docs/local/profiles.yml.md#about-profiles) | path | None (current dir, then HOME dir) | ❌ | `DBT_PROFILES_DIR` | `--profiles-dir` | ❌ |
| [project\_dir](https://docs.getdbt.com/reference/dbt_project.yml.md) | path | | ❌ | `DBT_PROJECT_DIR` | `--project-dir` | ❌ |
| [quiet](https://docs.getdbt.com/reference/global-configs/logs.md#suppress-non-error-logs-in-output) | boolean | False | ❌ | `DBT_QUIET` | `--quiet` | ✅ |
| [resource-type](https://docs.getdbt.com/reference/global-configs/resource-type.md) (v1.8+) | string | None | ❌ | `DBT_RESOURCE_TYPES`, `DBT_EXCLUDE_RESOURCE_TYPES` | `--resource-type`, `--exclude-resource-type` | ✅ |
| [sample](https://docs.getdbt.com/docs/build/sample-flag.md) | string | None | ❌ | `DBT_SAMPLE` | `--sample` | ✅ |
| [send\_anonymous\_usage\_stats](https://docs.getdbt.com/reference/global-configs/usage-stats.md) | boolean | True | ✅ | `DBT_SEND_ANONYMOUS_USAGE_STATS` | `--send-anonymous-usage-stats`, `--no-send-anonymous-usage-stats` | ❌ |
| [source\_freshness\_run\_project\_hooks](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#source_freshness_run_project_hooks) | boolean | True | ✅ | ❌ | ❌ | ❌ |
| [state](https://docs.getdbt.com/reference/node-selection/defer.md) | path | None | ❌ | `DBT_STATE`, `DBT_DEFER_STATE` | `--state`, `--defer-state` | ❌ |
| [static\_parser](https://docs.getdbt.com/reference/global-configs/parsing.md#static-parser) | boolean | True | ✅ | `DBT_STATIC_PARSER` | `--static-parser`, `--no-static-parser` | ❌ |
| [store\_failures](https://docs.getdbt.com/reference/resource-configs/store_failures.md) | boolean | False | ✅ (as resource config) | `DBT_STORE_FAILURES` | `--store-failures`, `--no-store-failures` | ✅ |
| [target\_path](https://docs.getdbt.com/reference/global-configs/json-artifacts.md) | path | None (uses `target/`) | ❌ | `DBT_TARGET_PATH` | `--target-path` | ❌ |
| [target](https://docs.getdbt.com/docs/local/profiles.yml.md#about-profiles) | string | None | ❌ | `DBT_TARGET` | [`--target`](https://docs.getdbt.com/docs/local/profiles.yml.md#overriding-profiles-and-targets) | ❌ |
| [use\_colors\_file](https://docs.getdbt.com/reference/global-configs/logs.md#color) | boolean | True | ✅ | `DBT_USE_COLORS_FILE` | `--use-colors-file`, `--no-use-colors-file` | ❌ |
| [use\_colors](https://docs.getdbt.com/reference/global-configs/print-output.md#print-color) | boolean | True | ✅ | `DBT_USE_COLORS` | `--use-colors`, `--no-use-colors` | ❌ |
| [use\_experimental\_parser](https://docs.getdbt.com/reference/global-configs/parsing.md#experimental-parser) | boolean | False | ✅ | `DBT_USE_EXPERIMENTAL_PARSER` | `--use-experimental-parser`, `--no-use-experimental-parser` | ❌ |
| [version\_check](https://docs.getdbt.com/reference/global-configs/version-compatibility.md) | boolean | varies | ✅ | `DBT_VERSION_CHECK` | `--version-check`, `--no-version-check` | ❌ |
| [warn\_error\_options](https://docs.getdbt.com/reference/global-configs/warnings.md) | dict | | ✅ | `DBT_WARN_ERROR_OPTIONS` | `--warn-error-options` | ✅ |
| [warn\_error](https://docs.getdbt.com/reference/global-configs/warnings.md) | boolean | False | ✅ | `DBT_WARN_ERROR` | `--warn-error` | ✅ |
| [write\_json](https://docs.getdbt.com/reference/global-configs/json-artifacts.md) | boolean | True | ✅ | `DBT_WRITE_JSON` | `--write-json`, `--no-write-json` | ✅ |

--- ### About flags variable The `flags` variable contains values of flags provided on the command line. **Example usage:** flags.sql ```sql {% if flags.FULL_REFRESH %} drop table ... {% else %} -- no-op {% endif %} ``` The list of available flags is defined in the [`flags` module](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/flags.py) within `dbt-core`. Recommended use cases include: * different materialization logic based on "run modes," such as `flags.FULL_REFRESH` and `flags.STORE_FAILURES` * running hooks conditionally based on the current command / task type, via `flags.WHICH` **Note:** It is *not* recommended to use flags as an input to parse-time configurations, properties, or dependencies (`ref` + `source`). Flags are likely to change in every invocation of dbt, and their parsed values will become stale (and yield incorrect results) in subsequent invocations that have partial parsing enabled. For more details, see [the docs on parsing](https://docs.getdbt.com/reference/parsing.md).
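As a sketch of the second use case above (a hypothetical `dbt_project.yml` entry, not taken from the official docs), a hook body can branch on `flags.WHICH` so that it only fires for a particular command:

```yaml
# dbt_project.yml -- hypothetical example: log a message only during `dbt run`
on-run-start:
  - "{{ log('Starting model builds', info=true) if flags.WHICH == 'run' }}"
```

Because the hook body is an ordinary Jinja expression, it renders to an empty string for every other command (`dbt test`, `dbt compile`, and so on).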
##### invocation\_args\_dict[​](#invocation_args_dict "Direct link to invocation_args_dict") For the full set of information passed from the CLI—subcommand, flags, arguments—you can use `invocation_args_dict`. This is equivalent to the `args` dictionary in [`run_results.json`](https://docs.getdbt.com/reference/artifacts/run-results-json.md). models/my\_model.sql ```sql -- invocation_args_dict: -- {{ invocation_args_dict }} -- dbt_metadata_envs: -- {{ dbt_metadata_envs }} select 1 as id ``` The `invocation_command` key within `invocation_args_dict` includes the entire subcommand when it compiles: ```shell $ DBT_ENV_CUSTOM_ENV_MYVAR=myvalue dbt compile -s my_model 12:10:22 Running with dbt=1.6.0-b8 12:10:22 Registered adapter: postgres=1.6.0-b8 12:10:22 Found 1 seed, 1 model, 349 macros 12:10:22 12:10:22 Concurrency: 5 threads (target='dev') 12:10:22 12:10:22 Compiled node 'my_model' is: -- invocation_args_dict: -- {'log_format_file': 'debug', 'log_level': 'info', 'exclude': (), 'send_anonymous_usage_stats': True, 'which': 'compile', 'defer': False, 'output': 'text', 'log_format': 'default', 'macro_debugging': False, 'populate_cache': True, 'static_parser': True, 'vars': {}, 'warn_error_options': WarnErrorOptions(include=[], exclude=[]), 'quiet': False, 'select': ('my_model',), 'indirect_selection': 'eager', 'strict_mode': False, 'version_check': False, 'enable_legacy_logger': False, 'log_path': '/Users/jerco/dev/scratch/testy/logs', 'profiles_dir': '/Users/jerco/.dbt', 'invocation_command': 'dbt compile -s my_model', 'log_level_file': 'debug', 'project_dir': '/Users/jerco/dev/scratch/testy', 'favor_state': False, 'use_colors_file': True, 'write_json': True, 'partial_parse': True, 'printer_width': 80, 'print': True, 'cache_selected_only': False, 'use_colors': True, 'introspect': True} -- dbt_metadata_envs: -- {'MYVAR': 'myvalue'} select 1 as id ``` #### flags.WHICH[​](#flagswhich "Direct link to flags.WHICH") `flags.WHICH` is a global variable that gets set when you 
run a dbt command. If used in a macro, it allows you to conditionally change behavior depending on the command currently being executed. For example, conditionally modifying SQL: ```sql {% macro conditional_filter(table_name) %} {# Add a WHERE clause only during `dbt run`, not during `dbt test` or `dbt compile` #} select * from {{ table_name }} {% if flags.WHICH == "run" %} where is_active = true {% elif flags.WHICH == "test" %} -- In test runs, restrict rows to keep tests fast limit 10 {% elif flags.WHICH == "compile" %} -- During compile, just add a harmless comment -- compile mode detected {% endif %} {% endmacro %} ``` The following commands are supported:

| `flags.WHICH` value | Description |
| --- | --- |
| `"build"` | Build and test all selected resources. |
| `"clean"` | Remove artifacts like target directory and packages. |
| `"clone"` | Clone models and other resources. |
| `"compile"` | Compile SQL, but do not execute. |
| `"debug"` | Test connections and validate configs. |
| `"deps"` | Download package dependencies. |
| `"docs"` | Generate and serve documentation. |
| `"environment"` | Workspace environment commands (cloud CLI). |
| `"help"` | Show help for commands and subcommands. |
| `"init"` | Bootstrap a new project. |
| `"invocation"` | For interacting with or inspecting current invocation (cloud CLI). |
| `"list"` | List resources. |
| `"parse"` | Parse project and report errors, but don’t build/test. |
| `"retry"` | Retry the last invocation from the point of failure. |
| `"run"` | Execute models. |
| `"run-operation"` | Invoke arbitrary macros or SQL ops. |
| `"seed"` | Load CSV(s) into the database. |
| `"show"` | Inspect resource definitions or materializations. |
| `"snapshot"` | Execute snapshots. |
| `"source"` | Validate freshness and inspect source definitions. |
| `"test"` | Schema and data tests. |
| `"version"` | Display dbt version. |
--- ### About fromjson context method The `fromjson` context method can be used to deserialize a JSON string into a Python object primitive, e.g. a `dict` or `list`. **Args**: * `string`: The JSON string to deserialize (required) * `default`: A default value to return if the `string` argument cannot be deserialized (optional) ##### Usage:[​](#usage "Direct link to Usage:") ```text {% set my_json_str = '{"abc": 123}' %} {% set my_dict = fromjson(my_json_str) %} {% do log(my_dict['abc']) %} ``` --- ### About fromyaml context method The `fromyaml` context method can be used to deserialize a YAML string into a Python object primitive, e.g. a `dict` or `list`. **Args**: * `string`: The YAML string to deserialize (required) * `default`: A default value to return if the `string` argument cannot be deserialized (optional) ##### Usage:[​](#usage "Direct link to Usage:") ```text {% set my_yml_str -%} dogs: - good - bad {%- endset %} {% set my_dict = fromyaml(my_yml_str) %} {% do log(my_dict['dogs'], info=true) %} -- ["good", "bad"] {% do my_dict['dogs'].pop() %} {% do log(my_dict['dogs'], info=true) %} -- ["good"] ```
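Neither usage example above exercises the optional `default` argument. A minimal sketch (the input string here is hypothetical, chosen only to be unparseable):

```text
{% set bad_str = 'not: valid: yaml: [' %}
{# Fall back to an empty dict instead of raising a deserialization error #}
{% set parsed = fromyaml(bad_str, {}) %}
{% do log(parsed, info=true) %}
```

Passing a `default` this way lets downstream logic treat "missing" and "malformed" input uniformly rather than branching on errors.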
--- ### About graph context variable The `graph` context variable contains information about the *nodes* in your dbt project. Models, sources, tests, and snapshots are all examples of nodes in dbt projects. Heads up dbt actively builds the `graph` variable during the [parsing phase](https://docs.getdbt.com/reference/dbt-jinja-functions/execute.md) of running dbt projects, so some properties of the `graph` context variable will be missing or incorrect during parsing. Please read the information below carefully to understand how to effectively use this variable. ##### The graph context variable[​](#the-graph-context-variable "Direct link to The graph context variable") The `graph` context variable is a dictionary which maps node ids onto dictionary representations of those nodes. A simplified example might look like: ```json { "nodes": { "model.my_project.model_name": { "unique_id": "model.my_project.model_name", "config": {"materialized": "table", "sort": "id"}, "tags": ["abc", "123"], "path": "models/path/to/model_name.sql", ... }, ... }, "sources": { "source.my_project.snowplow.event": { "unique_id": "source.my_project.snowplow.event", "database": "analytics", "schema": "analytics", "tags": ["abc", "123"], "path": "models/path/to/schema.yml", ... }, ... }, "exposures": { "exposure.my_project.traffic_dashboard": { "unique_id": "exposure.my_project.traffic_dashboard", "type": "dashboard", "maturity": "high", "path": "models/path/to/schema.yml", ... }, ... }, "metrics": { "metric.my_project.count_all_events": { "unique_id": "metric.my_project.count_all_events", "type": "count", "path": "models/path/to/schema.yml", ... }, ... 
}, "groups": { "group.my_project.finance": { "unique_id": "group.my_project.finance", "name": "finance", "owner": { "email": "finance@jaffleshop.com" } ... }, ... } } ``` The exact contract for these model and source nodes is not currently documented, but that will change in the future. ##### Accessing models[​](#accessing-models "Direct link to Accessing models") The `model` entries in the `graph` dictionary will be incomplete or incorrect during parsing. If accessing the models in your project via the `graph` variable, be sure to use the [execute](https://docs.getdbt.com/reference/dbt-jinja-functions/execute.md) flag to ensure that this code only executes at run-time and not at parse-time. Do not use the `graph` variable to build your DAG, as the resulting dbt behavior will be undefined and likely incorrect. Example usage: graph-usage.sql ```sql /* Print information about all of the models in the Snowplow package */ {% if execute %} {% for node in graph.nodes.values() | selectattr("resource_type", "equalto", "model") | selectattr("package_name", "equalto", "snowplow") %} {% do log(node.unique_id ~ ", materialized: " ~ node.config.materialized, info=true) %} {% endfor %} {% endif %} /* Example output --------------------------------------------------------------- model.snowplow.snowplow_id_map, materialized: incremental model.snowplow.snowplow_page_views, materialized: incremental model.snowplow.snowplow_web_events, materialized: incremental model.snowplow.snowplow_web_page_context, materialized: table model.snowplow.snowplow_web_events_scroll_depth, materialized: incremental model.snowplow.snowplow_web_events_time, materialized: incremental model.snowplow.snowplow_web_events_internal_fixed, materialized: ephemeral model.snowplow.snowplow_base_web_page_context, materialized: ephemeral model.snowplow.snowplow_base_events, materialized: ephemeral model.snowplow.snowplow_sessions_tmp, materialized: incremental model.snowplow.snowplow_sessions, materialized: table */ 
``` ##### Accessing sources[​](#accessing-sources "Direct link to Accessing sources") To access the sources in your dbt project programmatically, use the `sources` attribute of the `graph` object. Example usage: models/events\_unioned.sql ```sql /* Union all of the Snowplow sources defined in the project which begin with the string "event_" */ {% set sources = [] -%} {% for node in graph.sources.values() -%} {%- if node.name.startswith('event_') and node.source_name == 'snowplow' -%} {%- do sources.append(source(node.source_name, node.name)) -%} {%- endif -%} {%- endfor %} select * from ( {%- for source in sources %} select * from {{ source }} {% if not loop.last %} union all {% endif %} {% endfor %} ) /* Example compiled SQL --------------------------------------------------------------- select * from ( select * from raw.snowplow.event_add_to_cart union all select * from raw.snowplow.event_remove_from_cart union all select * from raw.snowplow.event_checkout ) */ ``` ##### Accessing exposures[​](#accessing-exposures "Direct link to Accessing exposures") To access the exposures in your dbt project programmatically, use the `exposures` attribute of the `graph` object. Example usage: models/my\_important\_view\_model.sql ```sql {# Include a SQL comment naming all of the exposures that this model feeds into #} {% set exposures = [] -%} {% for exposure in graph.exposures.values() -%} {%- if model['unique_id'] in exposure.depends_on.nodes -%} {%- do exposures.append(exposure) -%} {%- endif -%} {%- endfor %} -- HELLO database administrator! Before dropping this view, -- please be aware that doing so will affect: {% for exposure in exposures %} -- * {{ exposure.name }} ({{ exposure.type }}) {% endfor %} /* Example compiled SQL --------------------------------------------------------------- -- HELLO database administrator! 
Before dropping this view, -- please be aware that doing so will affect: -- * our_metrics (dashboard) -- * my_sync (application) */ ``` ##### Accessing metrics[​](#accessing-metrics "Direct link to Accessing metrics") To access the metrics in your dbt project programmatically, use the `metrics` attribute of the `graph` object. Example usage: macros/get\_metric.sql ```sql {% macro get_metric_sql_for(metric_name) %} {% set metrics = graph.metrics.values() %} {% set metric = (metrics | selectattr('name', 'equalto', metric_name) | list).pop() %} /* Elsewhere, I've defined a macro, get_metric_timeseries_sql, that will return the SQL needed to perform a time-based rollup of this metric's calculation */ {% set metric_sql = get_metric_timeseries_sql( relation = metric['model'], type = metric['type'], expression = metric['sql'], ... ) %} {{ return(metric_sql) }} {% endmacro %} ``` ##### Accessing groups[​](#accessing-groups "Direct link to Accessing groups") To access the groups in your dbt project programmatically, use the `groups` attribute of the `graph` object. Example usage: macros/get\_group.sql ```sql {% macro get_group_owner_for(group_name) %} {% set groups = graph.groups.values() %} {# Match on the group's name, then return its owner #} {% set group = (groups | selectattr('name', 'equalto', group_name) | list).pop() %} {{ return(group['owner']) }} {% endmacro %} ``` --- ### About invocation_id The `invocation_id` outputs a UUID generated for this dbt command. This value is useful when auditing or analyzing dbt invocation metadata. 
The `invocation_id` is: * available in the compilation context of [`query-comment`](https://docs.getdbt.com/reference/project-configs/query-comment.md) * included in the `info` dictionary in dbt [events and logs](https://docs.getdbt.com/reference/events-logging.md#info) * included in the `metadata` dictionary in [dbt artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md#common-metadata) * included as a label in all BigQuery jobs that dbt originates **Example usage**: You can use the following example code for all data platforms. Remember to replace `TABLE_NAME` with the actual name of your target table: `select '{{ invocation_id }}' as test_id from TABLE_NAME` --- ### About local_md5 context variable The `local_md5` context variable calculates an [MD5 hash](https://en.wikipedia.org/wiki/MD5) of the given string. The name `local_md5` emphasizes that the hash is calculated *locally*, in the dbt-Jinja context. This variable is typically useful for advanced use cases. For example, when you generate unique identifiers within custom materialization or operational logic, you can either avoid collisions between temporary relations or identify changes by comparing checksums. It differs from the `md5` SQL function, supported by many SQL dialects, which runs remotely in the data platform. Always use SQL hashing functions when generating surrogate keys. Usage: ```sql -- source {%- set value_hash = local_md5("hello world") -%} '{{ value_hash }}' -- compiled '5eb63bbbe01eeed093cb22bb8f5acdc3' ```
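As an illustrative sketch of the temporary-relation use case (the suffix scheme and relation name below are hypothetical, not a dbt convention):

```jinja
{# Derive a short, deterministic suffix for a scratch relation,
   so repeated invocations of custom materialization logic don't collide #}
{% set suffix = local_md5(this.name ~ '__tmp')[:8] %}
{% set tmp_relation_name = this.name ~ '_' ~ suffix %}
```

Because the hash is computed locally at compile time, no query is issued to the warehouse to produce the identifier.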
--- ### About model object `model` is the dbt [graph object](https://docs.getdbt.com/reference/dbt-jinja-functions/graph.md) (or node) for the current model. It can be used to: * Access `config` settings, say, in a post-hook * Access the path to the model For example: ```jinja {% if model.config.materialized == 'view' %} {{ log(model.name ~ " is a view.", info=True) }} {% endif %} ``` To view the contents of `model` for a given model: * Command line interface * Studio IDE If you're using the command line interface (CLI), use [log()](https://docs.getdbt.com/reference/dbt-jinja-functions/log.md) to print the full contents: ```jinja {{ log(model, info=True) }} ``` If you're using the Studio IDE, compile the following to print the full contents:

```jinja {{ model | tojson(indent = 4) }} ``` #### Batch properties for microbatch models[​](#batch-properties-for-microbatch-models "Direct link to Batch properties for microbatch models") Starting in dbt Core v1.9, the model object includes a `batch` property (`model.batch`), which provides details about the current batch when executing an [incremental microbatch](https://docs.getdbt.com/docs/build/incremental-microbatch.md) model. This property is only populated during the batch execution of a microbatch model. The following table describes the properties of the `batch` object, each accessed through `model.batch`.

| Property | Description | Example |
| --- | --- | --- |
| `id` | The unique identifier for the batch within the context of the microbatch model. | `model.batch.id` |
| `event_time_start` | The start time of the batch's [`event_time`](https://docs.getdbt.com/reference/resource-configs/event-time.md) filter (inclusive). | `model.batch.event_time_start` |
| `event_time_end` | The end time of the batch's `event_time` filter (exclusive). | `model.batch.event_time_end` |

##### Usage notes[​](#usage-notes "Direct link to Usage notes") `model.batch` is only available during the execution of a microbatch model batch. Outside of the microbatch execution, `model.batch` is `None`, and its sub-properties aren't accessible. ###### Example of safeguarding access to batch properties[​](#example-of-safeguarding-access-to-batch-properties "Direct link to Example of safeguarding access to batch properties") We recommend always checking that `model.batch` is populated before accessing its properties. 
To do this, use an `if` statement for safe access to `batch` properties: ```jinja {% if model.batch %} {{ log(model.batch.id) }} {# Log the batch ID #} {{ log(model.batch.event_time_start) }} {# Log the start time of the batch #} {{ log(model.batch.event_time_end) }} {# Log the end time of the batch #} {% endif %} ``` In this example, the `if model.batch` statement makes sure that the code only runs during a batch execution. `log()` is used to print the `batch` properties for debugging. ###### Example of log batch details[​](#example-of-log-batch-details "Direct link to Example of log batch details") This is a practical example of how you might use `model.batch` in a microbatch model to log batch details for the `batch.id`: ```jinja {% if model.batch %} {{ log("Processing batch with ID: " ~ model.batch.id, info=True) }} {{ log("Batch event time range: " ~ model.batch.event_time_start ~ " to " ~ model.batch.event_time_end, info=True) }} {% endif %} ``` As before, the `if model.batch` guard ensures the code only runs during a batch execution. #### Model structure and JSON schema[​](#model-structure-and-json-schema "Direct link to Model structure and JSON schema") To view the structure of `models` and their definitions: * Refer to [dbt JSON Schema](https://schemas.getdbt.com/) for describing and consuming dbt generated artifacts * Select the corresponding manifest version under **Manifest**. For example, if you're on dbt v1.8, you would select Manifest v12 * The `manifest.json` version number is related to (but not *equal* to) your dbt version, so you *must* use the correct `manifest.json` version for your dbt version. To find the correct `manifest.json` version, refer to [Manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md) and select the dbt version on the top navigation (such as `v1.5`). This will help you find out which tags are associated with your model. 
* Then go to `nodes` --> Select Additional properties --> `CompiledModelNode` or view other definitions/objects. Use the following table to understand how the versioning pattern works and match the Manifest version with the dbt version:

| dbt version | Manifest version |
| --- | --- |
| dbt Fusion engine v2.0 | v20 (Identical to [v12](https://schemas.getdbt.com/dbt/manifest/v12/index.html)) |
| Core v1.11 | [v12](https://schemas.getdbt.com/dbt/manifest/v12/index.html) |
| Core v1.10 | [v12](https://schemas.getdbt.com/dbt/manifest/v12/index.html) |
| Core v1.9 | [v12](https://schemas.getdbt.com/dbt/manifest/v12/index.html) |
| Core v1.8 | [v12](https://schemas.getdbt.com/dbt/manifest/v12/index.html) |
| Core v1.7 | [v11](https://schemas.getdbt.com/dbt/manifest/v11/index.html) |
| Core v1.6 | [v10](https://schemas.getdbt.com/dbt/manifest/v10/index.html) |
| Core v1.5 | [v9](https://schemas.getdbt.com/dbt/manifest/v9/index.html) |
| Core v1.4 | [v8](https://schemas.getdbt.com/dbt/manifest/v8/index.html) |
| Core v1.3 | [v7](https://schemas.getdbt.com/dbt/manifest/v7/index.html) |

#### Related docs[​](#related-docs "Direct link to Related docs") * [dbt JSON Schema](https://schemas.getdbt.com/) --- ### About model schema The schema that the model is configured to be materialized in. This is typically the same as `model['schema']`.
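A minimal sketch of inspecting this value (a hypothetical debugging snippet, placed in a model or hook):

```jinja
{# Log which schema the current model is configured to build into #}
{{ log("This model is configured to build into schema: " ~ schema, info=True) }}
```

This can be handy when custom `generate_schema_name` logic makes the target schema non-obvious.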
--- ### About modules variable The `modules` variable in the Jinja context is a predefined namespace that contains only a limited set of supported Python modules for operating on data. You cannot import or access arbitrary Python modules (for example, `os`, `requests`, or custom third-party libraries) from within Jinja. There is no user-facing configuration to modify or extend the `modules` namespace. This restriction helps ensure consistent behavior, security, and portability across environments. If your workflow requires functionality from additional Python libraries, use a [Python model](https://docs.getdbt.com/docs/build/python-models.md) (where supported) instead of Jinja. Python models run in a different execution context and allow you to import and use external libraries as needed. #### datetime[​](#datetime "Direct link to datetime") This variable is a pointer to the Python [`datetime`](https://docs.python.org/3/library/datetime.html) module, which supports complex date and time logic. It includes the module's `date`, `datetime`, `time`, `timedelta`, and `tzinfo` classes. **Usage** ```text {% set now = modules.datetime.datetime.now() %} {% set three_days_ago_iso = (now - modules.datetime.timedelta(3)).isoformat() %} ``` This returns the current date and time on every Jinja evaluation. For the date and time of the start of the run, see [run\_started\_at](https://docs.getdbt.com/reference/dbt-jinja-functions/run_started_at.md). #### pytz[​](#pytz "Direct link to pytz") This variable is a pointer to the Python [`pytz`](https://pypi.org/project/pytz/) module, which supports timezone logic. 
**Usage**

```text
{% set dt = modules.datetime.datetime(2002, 10, 27, 6, 0, 0) %}
{% set dt_local = modules.pytz.timezone('US/Eastern').localize(dt) %}
{{ dt_local }}
```

#### re

This variable is a pointer to the Python [`re`](https://docs.python.org/3/library/re.html) module, which supports regular expressions.

**Usage**

```text
{% set my_string = 's3://example/path' %}
{% set s3_path_pattern = 's3://[a-z0-9-_/]+' %}
{% set re = modules.re %}
{% set is_match = re.match(s3_path_pattern, my_string, re.IGNORECASE) %}
{% if not is_match %}
  {%- do exceptions.raise_compiler_error(
    my_string ~ ' is not a valid s3 path'
  ) -%}
{% endif %}
```

#### itertools

Note

Starting in `dbt-core==1.10.6`, using `modules.itertools` raises a deprecation warning. For more information and suggested workarounds, refer to the [documentation on `ModulesItertoolsUsageDeprecation`](https://docs.getdbt.com/reference/deprecations.md#modulesitertoolsusagedeprecation).

This variable is a pointer to the Python [`itertools`](https://docs.python.org/3/library/itertools.html) module, which includes useful functions for working with iterators (loops, lists, and the like).

The supported functions are:

* `count`
* `cycle`
* `repeat`
* `accumulate`
* `chain`
* `compress`
* `islice`
* `starmap`
* `tee`
* `zip_longest`
* `product`
* `permutations`
* `combinations`
* `combinations_with_replacement`

**Usage**

```text
{%- set A = [1, 2] -%}
{%- set B = ['x', 'y', 'z'] -%}
{%- set AB_cartesian = modules.itertools.product(A, B) -%}

{%- for item in AB_cartesian %}
  {{ item }}
{%- endfor -%}
```

```text
(1, 'x')
(1, 'y')
(1, 'z')
(2, 'x')
(2, 'y')
(2, 'z')
```
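Because each entry in `modules` is a pointer to the corresponding Python module, you can prototype this logic in plain Python before embedding it in Jinja. A minimal sketch of the examples above (plain Python, outside of dbt; the literal date value is illustrative):

```python
import datetime
import itertools
import re

# modules.datetime: timedelta arithmetic, mirroring the datetime usage example
# (a fixed timestamp is used here for reproducibility instead of datetime.now())
now = datetime.datetime(2024, 1, 10, 12, 0, 0)
three_days_ago_iso = (now - datetime.timedelta(3)).isoformat()
print(three_days_ago_iso)  # 2024-01-07T12:00:00

# modules.re: validate an s3 path, case-insensitively
s3_path_pattern = "s3://[a-z0-9-_/]+"
is_match = re.match(s3_path_pattern, "s3://example/path", re.IGNORECASE)

# modules.itertools.product: the cartesian product from the itertools usage example
AB_cartesian = list(itertools.product([1, 2], ["x", "y", "z"]))
print(AB_cartesian[0])  # (1, 'x')
```

The Jinja calls map one-to-one: `modules.datetime.timedelta(3)` is `datetime.timedelta(3)`, and so on.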
---

### About on-run-end context variable

Caution

These variables are only available in the context for `on-run-end` hooks. They will evaluate to `none` if used outside of an `on-run-end` hook!

#### schemas

The `schemas` context variable can be used to reference the schemas that dbt has built models into during a run of dbt. This variable can be used to grant usage on these schemas to certain users at the end of a dbt run.

Example:

dbt_project.yml

```yaml
on-run-end:
  - "{% for schema in schemas %}grant usage on schema {{ schema }} to db_reader;{% endfor %}"
```

In practice, it might not be a bad idea to put this code into a macro:

macros/grants.sql

```jinja2
{% macro grant_usage_to_schemas(schemas, user) %}
  {% for schema in schemas %}
    grant usage on schema {{ schema }} to {{ user }};
  {% endfor %}
{% endmacro %}
```

dbt_project.yml

```yaml
on-run-end:
  - "{{ grant_usage_to_schemas(schemas, 'user') }}"
```

#### database_schemas

The `database_schemas` context variable can be used to reference the databases *and* schemas that dbt has built models into during a run of dbt. This variable is similar to the `schemas` variable, and should be used if a dbt run builds resources into multiple different databases.
Example:

macros/grants.sql

```jinja2
{% macro grant_usage_to_schemas(database_schemas, user) %}
  {% for (database, schema) in database_schemas %}
    grant usage on {{ database }}.{{ schema }} to {{ user }};
  {% endfor %}
{% endmacro %}
```

dbt_project.yml

```yaml
on-run-end:
  - "{{ grant_usage_to_schemas(database_schemas, 'user') }}"
```

#### Results

The `results` variable contains a list of [Result objects](https://docs.getdbt.com/reference/dbt-classes.md#result-objects) with one element per resource that executed in the dbt job. The Result object provides access within the Jinja on-run-end context to the information that will populate the [run results JSON artifact](https://docs.getdbt.com/reference/artifacts/run-results-json.md).

Example usage:

macros/log_results.sql

```sql
{% macro log_results(results) %}
  {% if execute %}
    {{ log("========== Begin Summary ==========", info=True) }}
    {% for res in results -%}
      {% set line -%}
        node: {{ res.node.unique_id }}; status: {{ res.status }} (message: {{ res.message }})
      {%- endset %}
      {{ log(line, info=True) }}
    {% endfor %}
    {{ log("========== End Summary ==========", info=True) }}
  {% endif %}
{% endmacro %}
```

dbt_project.yml

```yaml
on-run-end: "{{ log_results(results) }}"
```

Results:

```text
12:48:17 | Concurrency: 1 threads (target='dev')
12:48:17 |
12:48:17 | 1 of 2 START view model dbt_jcohen.abc............................... [RUN]
12:48:17 | 1 of 2 OK created view model dbt_jcohen.abc.......................... [CREATE VIEW in 0.11s]
12:48:17 | 2 of 2 START table model dbt_jcohen.def.............................. [RUN]
12:48:17 | 2 of 2 ERROR creating table model dbt_jcohen.def.....................
[ERROR in 0.09s]
12:48:17 |
12:48:17 | Running 1 on-run-end hook
========== Begin Summary ==========
node: model.testy.abc; status: success (message: CREATE VIEW)
node: model.testy.def; status: error (message: Database Error in model def (models/def.sql)
  division by zero
  compiled SQL at target/run/testy/models/def.sql)
========== End Summary ==========
12:48:17 | 1 of 1 START hook: testy.on-run-end.0................................ [RUN]
12:48:17 | 1 of 1 OK hook: testy.on-run-end.0................................... [OK in 0.00s]
12:48:17 |
12:48:17 |
12:48:17 | Finished running 1 view model, 1 table model, 1 hook in 1.94s.
```

---

### About packages.yml context

The following context methods and variables are available when configuring a `packages.yml` file.

**Available context methods:**

* [env_var](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md)
  * Use `env_var()` in any dbt YAML file that supports Jinja. Only `packages.yml` and `profiles.yml` support environment variables for [secure values](https://docs.getdbt.com/docs/build/dbt-tips.md#yaml-tips) (using the `DBT_ENV_SECRET_` prefix).
* [var](https://docs.getdbt.com/reference/dbt-jinja-functions/var.md) (Note: only variables defined with `--vars` are available.
Refer to [YAML tips](https://docs.getdbt.com/docs/build/dbt-tips.md#yaml-tips) for more information)

**Available context variables:**

* [builtins](https://docs.getdbt.com/reference/dbt-jinja-functions/builtins.md)
* [dbt_version](https://docs.getdbt.com/reference/dbt-jinja-functions/dbt_version.md)
* [target](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md)

#### Example usage

The following examples show how to use the different context methods and variables in your `packages.yml`.

Use `builtins` in your `packages.yml`:

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: "{% if builtins is defined %}0.14.0{% else %}0.13.1{% endif %}"
```

Use `env_var` in your `packages.yml`:

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: "{{ env_var('DBT_UTILS_VERSION') }}"
```

Use `dbt_version` in your `packages.yml`:

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: "{% if dbt_version is defined %}0.14.0{% else %}0.13.1{% endif %}"
```

Use `target` in your `packages.yml`:

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: "{% if target.name == 'prod' %}0.14.0{% else %}0.13.1{% endif %}"
```

#### Related docs

* [Packages](https://docs.getdbt.com/docs/build/packages.md)

---

### About print function

Use the `print()` function when you want to print messages to both the log file and standard output (stdout). When used in conjunction with the `QUIET` global config, which suppresses non-error logs, you will only see error logs and the print messages in stdout.
For more information, see [Global configs](https://docs.getdbt.com/reference/global-configs/about-global-configs.md).

#### Example

```sql
{% macro some_macro(arg1, arg2) %}
  {{ print("Running some_macro: " ~ arg1 ~ ", " ~ arg2) }}
{% endmacro %}
```

---

### About profiles.yml context

The following context methods are available when configuring resources in the `profiles.yml` file.

**Available context methods:**

* [env_var](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md)
* [var](https://docs.getdbt.com/reference/dbt-jinja-functions/var.md) (*Note: only variables defined with `--vars` are available*)

##### Example usage

~/.dbt/profiles.yml

```yml
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: redshift
      host: "{{ env_var('DBT_HOST') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASS') }}"
      port: 5439
      dbname: analytics
      schema: dbt_dbanin
      threads: 4
```

---

### About project_name context variable

The `project_name` context variable returns the `name` of the root-level project being run by dbt. This variable can be used to defer execution to a root-level project macro if one exists.
##### Example Usage

redshift/macros/helper.sql

```sql
/*
  This macro vacuums tables in a Redshift database. If a macro exists in the
  root-level project called `get_tables_to_vacuum`, this macro will call _that_
  macro to find the tables to vacuum. If the macro is not defined in the root
  project, this macro will use a default implementation instead.
*/

{% macro vacuum_tables() %}
  {% set root_project = context[project_name] %}

  {% if root_project.get_tables_to_vacuum %}
    {% set tables = root_project.get_tables_to_vacuum() %}
  {% else %}
    {% set tables = redshift.get_tables_to_vacuum() %}
  {% endif %}

  {% for table in tables %}
    {% do redshift.vacuum_table(table) %}
  {% endfor %}
{% endmacro %}
```

---

### About properties.yml context

The following context methods and variables are available when configuring resources in a `properties.yml` file.
**Available context methods:**

* [env_var](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md)
* [var](https://docs.getdbt.com/reference/dbt-jinja-functions/var.md)

**Available context variables:**

* [target](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md)
* [builtins](https://docs.getdbt.com/reference/dbt-jinja-functions/builtins.md)
* [dbt_version](https://docs.getdbt.com/reference/dbt-jinja-functions/dbt_version.md)

##### Example configuration

properties.yml

```yml
# Configure this model to be materialized as a view
# in development and a table in production/CI contexts
models:
  - name: dim_customers
    config:
      materialized: "{{ 'view' if target.name == 'dev' else 'table' }}"
```

---

### About ref function

```sql
select * from {{ ref("node_name") }}
```

#### Definition

This function:

* Returns a [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) for a [model](https://docs.getdbt.com/docs/build/models.md), [seed](https://docs.getdbt.com/docs/build/seeds.md), or [snapshot](https://docs.getdbt.com/docs/build/snapshots.md)
* Creates dependencies between the referenced node and the current model, which is useful for documentation and [node selection](https://docs.getdbt.com/reference/node-selection/syntax.md)
* Compiles to the full object name in the database

The most important function in dbt is `ref()`; it's impossible to build even moderately complex models without it. `ref()` is how you reference one model within another.
This is a very common behavior, as typically models are built to be "stacked" on top of one another. Here is how this looks in practice:

model_a.sql

```sql
select * from public.raw_data
```

model_b.sql

```sql
select * from {{ ref('model_a') }}
```

Under the hood, `ref()` is actually doing two important things. First, it is interpolating the schema into your model file to allow you to change your deployment schema via configuration. Second, it is using these references between models to automatically build the dependency graph. This enables dbt to deploy models in the correct order when using `dbt run`.

The `{{ ref }}` function returns a `Relation` object that has the same `table`, `schema`, and `name` attributes as the [{{ this }} variable](https://docs.getdbt.com/reference/dbt-jinja-functions/this.md).

#### Advanced ref usage

##### Versioned ref

The `ref` function supports an optional keyword argument - `version` (or `v`). When a version argument is provided to the `ref` function, dbt returns the `Relation` object corresponding to the specified version of the referenced model. This functionality is useful when referencing versioned models that make breaking changes by creating new versions, while guaranteeing no breaking changes to existing versions of the model.

If the `version` argument is not supplied to a `ref` of a versioned model, the latest version is used. This has the benefit of automatically incorporating the latest changes of a referenced model, but there is a risk of incorporating breaking changes.
###### Example

models/\.yml

```yml
models:
  - name: model_name
    latest_version: 2
    versions:
      - v: 2
      - v: 1
```

```sql
-- returns the `Relation` object corresponding to version 1 of model_name
select * from {{ ref('model_name', version=1) }}
```

```sql
-- returns the `Relation` object corresponding to version 2 (the latest version) of model_name
select * from {{ ref('model_name') }}
```

##### Ref project-specific models

You can also reference models from different projects using the two-argument variant of the `ref` function. By specifying both a namespace (which could be a project or package) and a model name, you ensure clarity and avoid any ambiguity in the `ref`. This is also useful when dealing with models across various projects or packages. When using two arguments with projects (not packages), you also need to set [cross project dependencies](https://docs.getdbt.com/docs/mesh/govern/project-dependencies.md).

The following syntax demonstrates how to reference a model from a specific project or package:

```sql
select * from {{ ref('project_or_package', 'model_name') }}
```

We recommend using two-argument `ref` any time you are referencing a model defined in a different package or project. While not required in all cases, it's more explicit for you, for dbt, and for future readers of your code. We especially recommend using two-argument `ref` to avoid ambiguity in cases where a model name is duplicated across multiple projects or installed packages. If you use one-argument `ref` (just the `model_name`), dbt will look for a model by that name in the same namespace (package or project); if it finds none, it will raise an error.

**Note:** The `project_or_package` should match the `name` of the project/package, as defined in its `dbt_project.yml`. This might be different from the name of the repository. It never includes the repository's organization name.
For example, if you use the [`fivetran/stripe`](https://hub.getdbt.com/fivetran/stripe/latest/) package, the package name is `stripe`, not `fivetran/stripe`.

##### Forcing dependencies

In normal usage, dbt knows the proper order to run all models based on the use of the `ref` function, because it discovers them all during its parse phase. dbt will throw an error if it discovers an "unexpected" `ref` at run time (meaning it was hidden during the parsing phase). The most common cause for this is that the `ref` is inside a branch of an `if` statement that wasn't evaluated during parsing.

conditional_ref.sql

```sql
-- This ref is hidden inside the `if execute` branch, so dbt can't detect the dependency during parsing
{% if execute %}
  {% set sql_statement %}
      select max(created_at) from {{ ref('processed_orders') }}
  {% endset %}
  {%- set newest_processed_order = dbt_utils.get_single_value(sql_statement, default="'2020-01-01'") -%}
{% endif %}

select
  *,
  last_order_at > '{{ newest_processed_order }}' as has_unprocessed_order
from {{ ref('users') }}
```

* In this case, dbt doesn't know that `processed_orders` is a dependency because `execute` is false during parsing.
* To address this, use a SQL comment along with the `ref` function — dbt will understand the dependency and the compiled query will still be valid:

conditional_ref.sql

```sql
-- Now that this ref is outside of the if block, it will be detected during parsing
-- depends_on: {{ ref('processed_orders') }}
{% if execute %}
  {% set sql_statement %}
      select max(created_at) from {{ ref('processed_orders') }}
  {% endset %}
  {%- set newest_processed_order = dbt_utils.get_single_value(sql_statement, default="'2020-01-01'") -%}
{% endif %}

select
  *,
  last_order_at > '{{ newest_processed_order }}' as has_unprocessed_order
from {{ ref('users') }}
```

tip

To ensure dbt understands the dependency, use a SQL comment instead of a Jinja comment.
Jinja comments (`{# ... #}`) *don't* work and are ignored by dbt's parser, meaning the `ref` is never processed and resolved. SQL comments (`--` or `/* ... */`) *do* work because dbt still evaluates Jinja inside SQL comments.

---

### About References

The References section contains reference materials for developing with dbt, which includes dbt and dbt Core. Learn how to add more configurations to your dbt project or adapter, use properties for extra capabilities, refer to dbt commands, use powerful Jinja functions to streamline your dbt project, and understand how to use dbt artifacts.
[![](/img/icons/computer.svg)](https://docs.getdbt.com/reference/dbt_project.yml.md)

###### [Project configurations](https://docs.getdbt.com/reference/dbt_project.yml.md)

[Customize and configure your dbt project to optimize performance.](https://docs.getdbt.com/reference/dbt_project.yml.md)

[![](/img/icons/computer.svg)](https://docs.getdbt.com/reference/resource-configs.md)

###### [Platform-specific configurations](https://docs.getdbt.com/reference/resource-configs.md)

[Learn how to optimize performance with data platform-specific configurations in dbt and dbt Core.](https://docs.getdbt.com/reference/resource-configs.md)

[![](/img/icons/computer.svg)](https://docs.getdbt.com/reference/configs-and-properties.md)

###### [Resource configurations and properties](https://docs.getdbt.com/reference/configs-and-properties.md)

[Properties and configurations that provide extra abilities to your project's resources.](https://docs.getdbt.com/reference/configs-and-properties.md)

[![](/img/icons/computer.svg)](https://docs.getdbt.com/reference/dbt-commands.md)

###### [dbt Commands](https://docs.getdbt.com/reference/dbt-commands.md)

[Outlines the commands supported by dbt and their relevant flags.](https://docs.getdbt.com/reference/dbt-commands.md)

[![](/img/icons/computer.svg)](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md)

###### [dbt Jinja functions and context variables](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md)

[Additional functions and variables in the Jinja context that are useful when working with a dbt project.](https://docs.getdbt.com/reference/dbt-jinja-functions-context-variables.md)

[![](/img/icons/computer.svg)](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md)

###### [dbt Artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md)

[Information on dbt-generated artifacts and how you can use them.](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md)
[![](/img/icons/computer.svg)](https://docs.getdbt.com/reference/database-permissions/snowflake-permissions.md)

###### [Snowflake permissions](https://docs.getdbt.com/reference/database-permissions/snowflake-permissions.md)

[Provides example Snowflake database role permissions.](https://docs.getdbt.com/reference/database-permissions/snowflake-permissions.md)

[![](/img/icons/computer.svg)](https://docs.getdbt.com/reference/database-permissions/databricks-permissions.md)

###### [Databricks permissions](https://docs.getdbt.com/reference/database-permissions/databricks-permissions.md)

[Provides example Databricks database role permissions.](https://docs.getdbt.com/reference/database-permissions/databricks-permissions.md)

[![](/img/icons/computer.svg)](https://docs.getdbt.com/reference/database-permissions/redshift-permissions.md)

###### [Redshift permissions](https://docs.getdbt.com/reference/database-permissions/redshift-permissions.md)

[Provides example Redshift database role permissions.](https://docs.getdbt.com/reference/database-permissions/redshift-permissions.md)

[![](/img/icons/computer.svg)](https://docs.getdbt.com/reference/database-permissions/postgres-permissions.md)

###### [Postgres permissions](https://docs.getdbt.com/reference/database-permissions/postgres-permissions.md)

[Provides example Postgres database role permissions.](https://docs.getdbt.com/reference/database-permissions/postgres-permissions.md)

---

### About return function

**Args**:

* `data`: The data to return to the caller

The `return` function can be used in macros to return data to the caller.
The type of the data (dict, list, int, etc.) is preserved through the `return` call. You can use the `return` function in the following ways within your macros: as an expression or as a statement.

* Expression — Use an expression when the goal is to output a string from the macro.
* Statement with a `do` tag — Use a statement with a `do` tag to execute the return function without generating an output string. This is particularly useful when you want to perform actions without necessarily inserting their results directly into the template.

In the following example, `{{ return([1,2,3]) }}` acts as an *expression* that directly outputs a string, making it suitable for directly inserting returned values into SQL code.

macros/get_data.sql

```sql
{% macro get_data() %}
  {{ return([1,2,3]) }}
{% endmacro %}
```

Alternatively, you can use a statement with a [do](https://jinja.palletsprojects.com/en/3.0.x/extensions/#expression-statement) tag (or expression-statements) to execute the return function without generating an output string.

In the following example, `{% do return([1,2,3]) %}` acts as a *statement* that executes the return action but does not output a string:

macros/get_data.sql

```sql
{% macro get_data() %}
  {% do return([1,2,3]) %}
{% endmacro %}
```

models/my_model.sql

```sql
select
  -- get_data() returns a list!
  {% for i in get_data() %}
    {{ i }}
    {%- if not loop.last %},{% endif -%}
  {% endfor %}
```

---

### About run_query macro

The `run_query` macro provides a convenient way to run queries and fetch their results.
It is a wrapper around the [statement block](https://docs.getdbt.com/reference/dbt-jinja-functions/statement-blocks.md), which is more flexible, but also more complicated to use.

**Args**:

* `sql`: The SQL query to execute

Returns a [Table](https://agate.readthedocs.io/page/api/table.html) object with the result of the query. If the specified query does not return results (e.g. a DDL, DML, or maintenance query), then the return value will be `none`.

**Note:** The `run_query` macro will not begin a transaction automatically - if you wish to run your query inside of a transaction, please use `begin` and `commit` statements as appropriate.

Using run_query for the first time?

Check out the section of the Getting Started guide on [using Jinja](https://docs.getdbt.com/guides/using-jinja.md#dynamically-retrieve-the-list-of-payment-methods) for an example of working with the results of the `run_query` macro!

**Example Usage:**

models/my_model.sql

```jinja2
{% set results = run_query('select 1 as id') %}
{% if results is not none %}
  {{ log(results.print_table(), info=True) }}
{% endif %}

{# do something with `results` here... #}
```

Here's an example of using this (though if you're using `run_query` to return the values of a column, check out the [get_column_values](https://github.com/dbt-labs/dbt-utils#get_column_values-source) macro in the dbt-utils package).
models/my_model.sql

```sql
{% set payment_methods_query %}
select distinct payment_method
from app_data.payments
order by 1
{% endset %}

{% set results = run_query(payment_methods_query) %}

{% if execute %}
  {# Return the first column #}
  {% set results_list = results.columns[0].values() %}
{% else %}
  {% set results_list = [] %}
{% endif %}

select
  order_id,
  {% for payment_method in results_list %}
  sum(case when payment_method = '{{ payment_method }}' then amount end) as {{ payment_method }}_amount,
  {% endfor %}
  sum(amount) as total_amount
from {{ ref('raw_payments') }}
group by 1
```

You can also use `run_query` to perform SQL queries that aren't select statements.

macros/run_vacuum.sql

```sql
{% macro run_vacuum(table) %}
  {% set query %}
    vacuum table {{ table }}
  {% endset %}
  {% do run_query(query) %}
{% endmacro %}
```

Use the `length` filter to verify whether `run_query` returned any rows or not. Make sure to wrap the logic in an [if execute](https://docs.getdbt.com/reference/dbt-jinja-functions/execute.md) block to avoid unexpected behavior during parsing.

```sql
{% if execute %}
  {% set results = run_query(payment_methods_query) %}
  {% if results|length > 0 %}
    -- do something with `results` here...
  {% else %}
    -- do fallback here...
  {% endif %}
{% endif %}
```

---

### About run_started_at variable

`run_started_at` outputs the timestamp that this run started, e.g. `2017-04-21 01:23:45.678`. The `run_started_at` variable is a Python `datetime` object. As of 0.9.1, the timezone of this variable defaults to UTC.

run_started_at_example.sql

```sql
select
  '{{ run_started_at.strftime("%Y-%m-%d") }}' as date_day
from ...
```

To modify the timezone of this variable, use the `pytz` module:

run_started_at_utc.sql

```sql
select
  '{{ run_started_at.astimezone(modules.pytz.timezone("America/New_York")) }}' as run_started_est
from ...
```

---

### About schemas variable

`schemas` is a variable available in an `on-run-end` hook, representing a list of schemas that dbt built objects in on this run.

If you do not use [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md), `schemas` will evaluate to your target schema, e.g. `['dbt_alice']`. If you use custom schemas, it will include these as well, e.g. `['dbt_alice', 'dbt_alice_marketing', 'dbt_alice_finance']`.

The `schemas` variable is useful for granting privileges to all schemas that dbt builds relations in, like so (note this is Redshift-specific syntax):

dbt_project.yml

```yaml
...

on-run-end:
  - "{% for schema in schemas %}grant usage on schema {{ schema }} to group reporter;{% endfor %}"
  - "{% for schema in schemas %}grant select on all tables in schema {{ schema }} to group reporter;{% endfor %}"
  - "{% for schema in schemas %}alter default privileges in schema {{ schema }} grant select on tables to group reporter;{% endfor %}"
```

Want more in-depth instructions on the recommended way to grant privileges? We've written a full discourse article [here](https://discourse.getdbt.com/t/the-exact-grant-statements-we-use-in-a-dbt-project/430).
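Each hook above is just a Jinja loop that expands to one SQL statement per schema. The expansion can be sketched in plain Python (the schema names are illustrative, taken from the example earlier on this page):

```python
# What the first on-run-end hook expands to, one grant per schema dbt built into
schemas = ["dbt_alice", "dbt_alice_marketing", "dbt_alice_finance"]

statements = [
    f"grant usage on schema {schema} to group reporter;" for schema in schemas
]

for stmt in statements:
    print(stmt)
# grant usage on schema dbt_alice to group reporter;
# grant usage on schema dbt_alice_marketing to group reporter;
# grant usage on schema dbt_alice_finance to group reporter;
```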
---

### About selected_resources context variable

The `selected_resources` context variable contains a list of all the *nodes* selected by the current dbt command. Currently, this variable is not accessible when using the command `run-operation`.

Warning!

dbt actively builds the graph during the [parsing phase](https://docs.getdbt.com/reference/dbt-jinja-functions/execute.md) of running dbt projects, so the `selected_resources` context variable will be empty during parsing. Please read the information on this page to effectively use this variable.

##### Usage

The `selected_resources` context variable is a list of all the resources selected by the current dbt command selector. Its value depends on the usage of parameters like `--select`, `--exclude` and `--selector`. For a given run it will look like:

```json
["model.my_project.model1", "model.my_project.model2", "snapshot.my_project.my_snapshot"]
```

Each value corresponds to a key in the `nodes` object within the [graph](https://docs.getdbt.com/reference/dbt-jinja-functions/graph.md) context variable. It can be used in macros in a `pre-hook`, `post-hook`, `on-run-start` or `on-run-end` to evaluate which nodes are selected and trigger different logic depending on whether a particular node is selected or not.
check-node-selected.sql ```sql /* Check if a given model is selected and trigger a different action, depending on the result */ {% if execute %} {% if 'model.my_project.model1' in selected_resources %} {% do log("model1 is included based on the current selection", info=true) %} {% else %} {% do log("model1 is not included based on the current selection", info=true) %} {% endif %} {% endif %} /* Example output when running the code in on-run-start when doing `dbt build`, including all nodes --------------------------------------------------------------- model1 is included based on the current selection Example output when running the code in on-run-start when doing `dbt run --select model2` --------------------------------------------------------------- model1 is not included based on the current selection */ ``` --- ### About set context method tip Not to be confused with the `{% set foo = "bar" ... %}` expression in Jinja, which defines a variable. For examples of constructing SQL strings with `{% set %}` (and why `{{ }}` should not be nested inside quoted strings), see [Don’t nest your curlies](https://docs.getdbt.com/best-practices/dont-nest-your-curlies.md). You can use the `set` context method to convert any iterable to a sequence of iterable elements that are unique (a set).
**Args**: * `value`: The iterable to convert (for example, a list) * `default`: A default value to return if the `value` argument is not a valid iterable ##### Usage[​](#usage "Direct link to Usage") ```text {% set my_list = [1, 2, 2, 3] %} {% set my_set = set(my_list) %} {% do log(my_set) %} {# {1, 2, 3} #} ``` ```text {% set my_invalid_iterable = 1234 %} {% set my_set = set(my_invalid_iterable) %} {% do log(my_set) %} {# None #} ``` ##### set\_strict[​](#set_strict "Direct link to set_strict") The `set_strict` context method can be used to convert any iterable to a sequence of iterable elements that are unique (a set). The difference from the `set` context method is that `set_strict` raises an exception (a `TypeError`) if the provided value is not a valid iterable and cannot be converted to a set. **Args**: * `value`: The iterable to convert (for example, a list) ```text {% set my_list = [1, 2, 2, 3] %} {% set my_set = set_strict(my_list) %} {% do log(my_set) %} {# {1, 2, 3} #} ``` ```text {% set my_invalid_iterable = 1234 %} {% set my_set = set_strict(my_invalid_iterable) %} {% do log(my_set) %} Compilation Error in ... (...) 'int' object is not iterable ```
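Beyond logging, `set` is handy for deduplicating a list before iterating over it. For example, building one aggregate column per distinct payment method (the `payments` model and its columns are illustrative):

```sql
{% set payment_methods = ["bank", "card", "card", "gift_card"] %}

select
    order_id
    {% for method in set(payment_methods) %}
    , sum(case when payment_method = '{{ method }}' then amount end) as {{ method }}_amount
    {% endfor %}
from {{ ref('payments') }}
group by 1
```

Note that a set is unordered, so if the column order matters, sort it first (for example, `set(payment_methods) | sort`).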
--- ### About source function ```sql select * from {{ source("source_name", "table_name") }} ``` #### Definition[​](#definition "Direct link to Definition") This function: * Returns a [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) for a [source](https://docs.getdbt.com/docs/build/sources.md) * Creates dependencies between a source and the current model, which is useful for documentation and [node selection](https://docs.getdbt.com/reference/node-selection/syntax.md) * Compiles to the full object name in the database #### Related guides[​](#related-guides "Direct link to Related guides") * [Using sources](https://docs.getdbt.com/docs/build/sources.md) #### Arguments[​](#arguments "Direct link to Arguments") * `source_name`: The `name:` defined under a `sources:` key * `table_name`: The `name:` defined under a `tables:` key #### Example[​](#example "Direct link to Example") Consider a source defined as follows: models/\.yml ```yaml sources: - name: jaffle_shop # this is the source_name database: raw tables: - name: customers # this is the table_name - name: orders ``` Select from the source in a model: models/orders.sql ```sql select ... from {{ source('jaffle_shop', 'customers') }} left join {{ source('jaffle_shop', 'orders') }} using (customer_id) ``` --- ### About state in dbt One of the greatest underlying assumptions about dbt is that its operations should be **stateless** and **idempotent**. That is, it doesn't matter how many times a model has been run before, or whether it has ever been run at all; running it once or a thousand times makes no difference.
Given the same raw data, you can expect the same transformed result. A given run of dbt doesn't need to "know" about *any other* run; it just needs to know about the code in the project and the objects in your database as they exist *right now*. That said, dbt does store "state" — a detailed, point-in-time view of project resources (also referred to as nodes), database objects, and invocation results — in the form of its [artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md). If you choose, dbt can use these artifacts to inform certain operations. Crucially, the operations themselves are still stateless and idempotent: given the same manifest and the same raw data, dbt will produce the same transformed result. dbt can leverage artifacts from a prior invocation as long as their file path is passed to the `--state` flag. This is a prerequisite for: * [The `state` selector](https://docs.getdbt.com/reference/node-selection/methods.md#state), whereby dbt can identify resources that are new or modified by comparing code in the current project against the state manifest. * [Deferring](https://docs.getdbt.com/reference/node-selection/defer.md) to another environment, whereby dbt can identify upstream, unselected resources that don't exist in your current environment and instead "defer" their references to the environment provided by the state manifest. * The [`dbt clone` command](https://docs.getdbt.com/reference/commands/clone.md), whereby dbt can clone nodes based on their location in the manifest provided to the `--state` flag. Together, the [`state`](https://docs.getdbt.com/reference/node-selection/methods.md#state) selector and deferral enable ["slim CI"](https://docs.getdbt.com/best-practices/best-practice-workflows.md#run-only-modified-models-to-test-changes-slim-ci). We expect to add more features in future releases that can leverage artifacts passed to the `--state` flag. 
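For example, a typical slim CI invocation builds only new or modified resources and defers the rest to production, assuming artifacts from a production run were previously saved to a `prod-artifacts/` directory (the path is illustrative):

```bash
# Compare against the production manifest, build only what changed
# (plus downstream dependents), and defer unbuilt parents to prod
dbt build --select state:modified+ --defer --state prod-artifacts/
```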
#### Related docs[​](#related-docs "Direct link to Related docs") * [Configure state selection](https://docs.getdbt.com/reference/node-selection/configure-state.md) * [State comparison caveats](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats.md) --- ### About statement blocks Recommendation We recommend using the [`run_query` macro](https://docs.getdbt.com/reference/dbt-jinja-functions/run_query.md) instead of `statement` blocks. The `run_query` macro provides a more convenient way to run queries and fetch their results by wrapping `statement` blocks. You can use this macro to write more concise code that is easier to maintain. `statement`s are SQL queries that hit the database and return results to your Jinja context. Here’s an example of a `statement` that gets all of the states from a `users` table: get\_states\_statement.sql ```sql -- depends_on: {{ ref('users') }} {%- call statement('states', fetch_result=True) -%} select distinct state from {{ ref('users') }} {%- endcall -%} ``` The signature of the `statement` block looks like this: ```text statement(name=None, fetch_result=False, auto_begin=True) ``` When executing a `statement`, dbt needs to understand how to resolve references to other dbt models or resources. If you are already `ref`ing the model outside of the statement block, the dependency will be automatically inferred, but otherwise you will need to [force the dependency](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md#forcing-dependencies) with `-- depends_on`. 
Example using -- depends\_on ```sql -- depends_on: {{ ref('users') }} {% call statement('states', fetch_result=True) -%} select distinct state from {{ ref('users') }} /* The unique states are: {{ load_result('states')['data'] }} */ {%- endcall %} ```  Example using ref() function ```sql {% call statement('states', fetch_result=True) -%} select distinct state from {{ ref('users') }} /* The unique states are: {{ load_result('states')['data'] }} */ {%- endcall %} select id * 2 from {{ ref('users') }} ``` **Args**: * `name` (string): The name for the result set returned by this statement * `fetch_result` (bool): If True, load the results of the statement into the Jinja context * `auto_begin` (bool): If True, open a transaction if one does not exist. If False, do not open a transaction. Once the statement block has executed, the result set is accessible via the `load_result` function. The result object includes three keys: * `response`: Structured object containing metadata returned from the database, which varies by adapter. E.g. success `code`, number of `rows_affected`, total `bytes_processed`, etc. Comparable to `adapter_response` in the [Result object](https://docs.getdbt.com/reference/dbt-classes.md#result-objects). * `data`: Pythonic representation of data returned by query (arrays, tuples, dictionaries). * `table`: [Agate](https://agate.readthedocs.io/page/api/table.html) table representation of data returned by query. For the above statement, that could look like: load\_states.sql ```sql {%- set states = load_result('states') -%} {%- set states_data = states['data'] -%} {%- set states_status = states['response'] -%} ``` The returned `data` field is a matrix: a list of rows, with each row being a list of the values returned by the database. For the above example, this data structure might look like: states.sql ```python >>> log(states_data) [ ['PA'], ['NY'], ['CA'], ... ] ```
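As recommended above, the `run_query` macro wraps this same machinery; the states example can be written more concisely with it. A sketch: `run_query` returns an Agate table, so the values are read through its `columns` attribute, and the `execute` guard is needed because the query only runs during execution, not parsing:

```sql
{% set results = run_query("select distinct state from " ~ ref('users')) %}

{% if execute %}
  {# .values() returns the column's rows as a plain list #}
  {% set states = results.columns[0].values() %}
  {% do log(states, info=true) %}
{% endif %}
```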
--- ### About target variables The `target` variable contains information about your connection to the warehouse. * **dbt Core:** These values are based on the target defined in your [profiles.yml](https://docs.getdbt.com/docs/local/profiles.yml.md) file. Please note that for certain adapters, additional configuration steps may be required. Refer to the [set up page](https://docs.getdbt.com/docs/local/connect-data-platform/about-dbt-connections.md) for your data platform. * **dbt platform:** To learn more about setting up your adapter in dbt, refer to [About data platform connections](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md). * **[Orchestrator](https://docs.getdbt.com/docs/deploy/job-scheduler.md)**: `target.name` is defined per job as described in [Custom target names](https://docs.getdbt.com/docs/build/custom-target-names.md). For other attributes, values are defined by the deployment connection. To check these values, click **Deploy** and select **Environments**. Then, select the relevant deployment environment, and click **Settings**. * **[Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md)**: These values are defined by your connection and credentials. To edit these values, click on your account name in the left side menu and select **Account settings**. Then, click **Credentials**. Select and edit a project to set up the credentials and target name. Some configurations are shared between all adapters, while others are adapter-specific. You can also use the [`--target` flag](#using-the---target-flag) to set the active target when running dbt commands.
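For example, since adapters differ in SQL dialect, a model or macro can branch on `target.type` to emit the right function for each warehouse (the `events` model name is illustrative):

```sql
{% if target.type == 'bigquery' %}
  select timestamp_trunc(created_at, day) as created_day from {{ ref('events') }}
{% else %}
  select date_trunc('day', created_at) as created_day from {{ ref('events') }}
{% endif %}
```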
#### Common[​](#common "Direct link to Common") | Variable | Example | Description | | --------------------- | ------------ | --------------------------------------------------------------------------------------------------- | | `target.profile_name` | jaffle\_shop | The name of the active profile | | `target.name` | dev | Name of the active target | | `target.schema` | dbt\_alice | Name of the dbt schema (or, dataset on BigQuery) | | `target.type` | postgres | The active adapter being used. One of "postgres", "snowflake", "bigquery", "redshift", "databricks" | | `target.threads` | 4 | The number of threads in use by dbt | #### Adapter-specific[​](#adapter-specific "Direct link to Adapter-specific") ##### Snowflake[​](#snowflake "Direct link to Snowflake") | Variable | Example | Description | | ------------------ | --------------- | ------------------------------------------ | | `target.database` | RAW | Database name specified in active target. | | `target.warehouse` | TRANSFORM | Name of the Snowflake virtual warehouse | | `target.user` | TRANSFORM\_USER | The user specified in the active target | | `target.role` | TRANSFORM\_ROLE | The role specified in the active target | | `target.account` | abc123 | The account specified in the active target | ##### Postgres/Redshift[​](#postgresredshift "Direct link to Postgres/Redshift") | Variable | Example | Description | | --------------- | --------------------------------------- | ----------------------------------------- | | `target.dbname` | analytics | Database name specified in active target. 
| | `target.host` | abc123.us-west-2.redshift.amazonaws.com | The host specified in active target | | `target.user` | dbt\_user | The user specified in the active target | | `target.port` | 5439 | The port specified in the active profile | ##### BigQuery[​](#bigquery "Direct link to BigQuery") | Variable | Example | Description | | ---------------- | ---------- | ------------------------------------------- | | `target.project` | abc-123 | The project specified in the active profile | | `target.dataset` | dbt\_alice | The dataset specified in the active profile | #### Using the --target flag[​](#using-the---target-flag "Direct link to Using the --target flag") Use the `--target` flag when running dbt commands to set the active target and its associated `target.name` value: ```bash dbt run --target dev ``` ```bash dbt run --target prod ``` You can use the `--target` flag with any dbt command to override the default target specified in your `profiles.yml` file. This is useful for running the same dbt project against different environments (like dev, staging, or prod) without changing your configuration files. #### Examples[​](#examples "Direct link to Examples") ##### Use `target.name` to limit data in dev[​](#use-targetname-to-limit-data-in-dev "Direct link to use-targetname-to-limit-data-in-dev") As long as you use sensible target names, you can perform conditional logic to limit data when working in dev.
```sql select * from {{ source('web_events', 'page_views') }} {% if target.name == 'dev' %} where created_at >= dateadd('day', -3, current_date) {% endif %} ``` ##### Use `target.name` to change your source database[​](#use-targetname-to-change-your-source-database "Direct link to use-targetname-to-change-your-source-database") If you have specific Snowflake databases configured for your dev/qa/prod environments, you can set up your sources to compile to different databases depending on your environment. ```yml sources: - name: source_name database: | {%- if target.name == "dev" -%} raw_dev {%- elif target.name == "qa" -%} raw_qa {%- elif target.name == "prod" -%} raw_prod {%- else -%} invalid_database {%- endif -%} schema: source_schema ``` --- ### About this `this` is the database representation of the current model. It is useful when: * Defining a `where` statement within [incremental models](https://docs.getdbt.com/docs/build/incremental-models.md) * Using [pre or post hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md) `this` is a [Relation](https://docs.getdbt.com/reference/dbt-classes.md#relation), and as such, properties such as `{{ this.database }}` and `{{ this.schema }}` compile as expected. * Note — Prior to dbt v1.6, the dbt Cloud IDE returns `request` as the result of `{{ ref.identifier }}`. `this` can be thought of as equivalent to `ref('')`, and is a neat way to avoid circular dependencies.
#### Examples[​](#examples "Direct link to Examples") ##### Configuring incremental models[​](#configuring-incremental-models "Direct link to Configuring incremental models") models/stg\_events.sql ```sql {{ config(materialized='incremental') }} select *, my_slow_function(my_column) from raw_app_data.events {% if is_incremental() %} where event_time > (select max(event_time) from {{ this }}) {% endif %} ``` --- ### About thread_id The `thread_id` outputs an identifier for the current Python thread that is executing a node, like `Thread-1`. This value is useful when auditing or analyzing dbt invocation metadata. It corresponds to the `thread_id` within the [`Result` object](https://docs.getdbt.com/reference/dbt-classes.md#result-objects) and [`run_results.json`](https://docs.getdbt.com/reference/artifacts/run-results-json.md). The `thread_id` is: * available in the compilation context of [`query-comment`](https://docs.getdbt.com/reference/project-configs/query-comment.md) * included in the `info` dictionary in dbt [events and logs](https://docs.getdbt.com/reference/events-logging.md#info) * included in the `metadata` dictionary in [dbt artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md#common-metadata) * included as a label in all BigQuery jobs that dbt originates.
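Because `thread_id` is available when compiling a `query-comment`, you can tag every query dbt issues with the thread that ran it. A minimal sketch for `dbt_project.yml` (the comment wording is illustrative):

```yaml
# dbt_project.yml
query-comment: "run by {{ target.user }} on {{ thread_id }}"
```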
--- ### About tojson context method The `tojson` context method can be used to serialize a Python object primitive, e.g. a `dict` or `list`, to a JSON string. **Args**: * `value`: The value to serialize to JSON (required) * `default`: A default value to return if the `value` argument cannot be serialized (optional) ##### Usage:[​](#usage "Direct link to Usage:") ```text {% set my_dict = {"abc": 123} %} {% set my_json_string = tojson(my_dict) %} {% do log(my_json_string) %} ``` --- ### About toyaml context method The `toyaml` context method can be used to serialize a Python object primitive, e.g. a `dict` or `list`, to a YAML string. **Args**: * `value`: The value to serialize to YAML (required) * `default`: A default value to return if the `value` argument cannot be serialized (optional) ##### Usage:[​](#usage "Direct link to Usage:") ```text {% set my_dict = {"abc": 123} %} {% set my_yaml_string = toyaml(my_dict) %} {% do log(my_yaml_string) %} ```
--- ### About unit tests property 💡Did you know... Available from dbt v1.8 or with the [dbt "Latest" release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md). Unit tests validate your SQL modeling logic on a small set of static inputs before you materialize your full model in production. They support a test-driven development approach, improving both developer efficiency and code reliability. To run only your unit tests, use the command: `dbt test --select test_type:unit` #### Before you begin[​](#before-you-begin "Direct link to Before you begin") * We currently only support unit testing SQL models. * We currently only support adding unit tests to models in your *current* project. * If your model has multiple versions, by default the unit test will run on *all* versions of your model. Read [unit testing versioned models](https://docs.getdbt.com/reference/resource-properties/unit-testing-versions.md) for more information. * Unit tests must be defined in a YML file in your `models/` directory. * If you want to unit test a model that depends on an ephemeral model, you must use `format: sql` for that input. models/schema.yml ```yml unit_tests: - name: # this is the unique name of the test model: versions: #optional include: #optional exclude: #optional config: meta: {dictionary} tags: | [] enabled: {boolean} # optional. v1.9 or higher. If not configured, defaults to `true` given: - input: # optional for seeds format: dict | csv | sql # either define rows inline or the name of a fixture rows: {dictionary} | fixture: # SQL or csv - input: ... # declare additional inputs expect: format: dict | csv | sql # either define rows inline or the name of a fixture rows: {dictionary} | fixture: # SQL or csv overrides: # optional: configuration for the dbt execution environment macros: is_incremental: true | false dbt_utils.current_timestamp: # ... any other Jinja function from https://docs.getdbt.com/reference/dbt-jinja-functions # ... 
any other context property vars: {dictionary} env_vars: {dictionary} - name: ... # declare additional unit tests ``` #### Examples[​](#examples "Direct link to Examples") models/schema.yml ```yml unit_tests: - name: test_is_valid_email_address # this is the unique name of the test model: dim_customers # name of the model I'm unit testing given: # the mock data for your inputs - input: ref('stg_customers') rows: - {email: cool@example.com, email_top_level_domain: example.com} - {email: cool@unknown.com, email_top_level_domain: unknown.com} - {email: badgmail.com, email_top_level_domain: gmail.com} - {email: missingdot@gmailcom, email_top_level_domain: gmail.com} - input: ref('top_level_email_domains') rows: - {tld: example.com} - {tld: gmail.com} expect: # the expected output given the inputs above rows: - {email: cool@example.com, is_valid_email_address: true} - {email: cool@unknown.com, is_valid_email_address: false} - {email: badgmail.com, is_valid_email_address: false} - {email: missingdot@gmailcom, is_valid_email_address: false} ``` models/schema.yml ```yml unit_tests: - name: test_is_valid_email_address # this is the unique name of the test model: dim_customers # name of the model I'm unit testing given: # the mock data for your inputs - input: ref('stg_customers') rows: - {email: cool@example.com, email_top_level_domain: example.com} - {email: cool@unknown.com, email_top_level_domain: unknown.com} - {email: badgmail.com, email_top_level_domain: gmail.com} - {email: missingdot@gmailcom, email_top_level_domain: gmail.com} - input: ref('top_level_email_domains') format: csv rows: | tld example.com gmail.com expect: # the expected output given the inputs above format: csv fixture: valid_email_address_fixture_output ``` models/schema.yml ```yml unit_tests: - name: test_is_valid_email_address # this is the unique name of the test model: dim_customers # name of the model I'm unit testing given: # the mock data for your inputs - input: ref('stg_customers') rows: - 
{email: cool@example.com, email_top_level_domain: example.com} - {email: cool@unknown.com, email_top_level_domain: unknown.com} - {email: badgmail.com, email_top_level_domain: gmail.com} - {email: missingdot@gmailcom, email_top_level_domain: gmail.com} - input: ref('top_level_email_domains') format: sql rows: | select 'example.com' as tld union all select 'gmail.com' as tld expect: # the expected output given the inputs above format: sql fixture: valid_email_address_fixture_output ``` --- ### About var function To retrieve a variable inside a model, hook, or macro, use the `var()` function. The `var()` function returns the value defined in your project or passed using `--vars`, based on precedence. You can use `var()` anywhere dbt renders Jinja during compilation, including most `.sql` and `.yml` files in your project. It does not work in configuration files that dbt reads before compilation, such as [`profiles.yml`](https://docs.getdbt.com/reference/dbt-jinja-functions/profiles-yml-context.md) or `packages.yml`. To add a variable to a model, use the `var()` function: my\_model.sql ```sql select * from events where event_type = '{{ var("event_type") }}' ``` If you try to run this model without supplying an `event_type` variable, you'll receive a compilation error that looks like this: ```text Encountered an error: ! Compilation error while compiling model package_name.my_model: !
Required var 'event_type' not found in config: Vars supplied to package_name.my_model = { } ``` ##### Variable default values[​](#variable-default-values "Direct link to Variable default values") The `var()` function takes an optional second argument, `default`. If this argument is provided, then it will be the default value for the variable if one is not explicitly defined. my\_model.sql ```sql -- Use 'activation' as the event_type if the variable is not defined. select * from events where event_type = '{{ var("event_type", "activation") }}' ``` ##### Command line variables[​](#command-line-variables "Direct link to Command line variables") The `dbt_project.yml` file is a great place to define variables that rarely change. When you need to override a variable for a specific run, use the `--vars` command line option. For example, when you want to test with a different date range, run models with environment-specific settings, or adjust behavior dynamically. Use `--vars` to pass one or more variables to a dbt command. Provide the argument as a YAML dictionary string. For example: ```text $ dbt run --vars '{"event_type": "signup"}' ``` Inside a model or macro, access the value using the `var()` function: ```text select '{{ var("event_type") }}' as event_type ``` When you pass variables using `--vars`, you can access them anywhere you use the `var()` function in your project. You can pass multiple variables at once: ```text $ dbt run --vars '{event_type: signup, region: us}' ``` If only one variable is being set, the brackets are optional: ```text $ dbt run --vars 'event_type: signup' ``` The `--vars` argument accepts a YAML dictionary as a string on the command line. YAML is convenient because it does not require strict quoting as with JSON. 
Both of the following are valid and equivalent: ```text $ dbt run --vars '{"key": "value", "date": 20180101}' $ dbt run --vars '{key: value, date: 20180101}' ``` Variables defined using `--vars` override values defined in `dbt_project.yml`. This makes `--vars` useful for temporarily overriding configuration without changing your committed project files. For the complete order of precedence (including package-scoped variables and default values defined in `var()`), see [Variable precedence](https://docs.getdbt.com/docs/build/project-variables.md#variable-precedence). --- ### About zip context method The `zip` context method can be used to return an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument iterables. For more information, see [Python docs](https://docs.python.org/3/library/functions.html#zip). **Args**: * `*args`: Any number of iterables * `default`: A default value to return if `*args` is not iterable ##### Usage[​](#usage "Direct link to Usage") ```text {% set my_list_a = [1, 2] %} {% set my_list_b = ['alice', 'bob'] %} {% set my_zip = zip(my_list_a, my_list_b) | list %} {% do log(my_zip) %} {# [(1, 'alice'), (2, 'bob')] #} ``` ```text {% set my_list_a = 12 %} {% set my_list_b = ['alice', 'bob'] %} {% set my_zip = zip(my_list_a, my_list_b, default = []) | list %} {% do log(my_zip) %} {# [] #} ``` ##### zip\_strict[​](#zip_strict "Direct link to zip_strict") The `zip_strict` context method can be used to return an iterator of tuples, just like `zip`.
The difference from the `zip` context method is that `zip_strict` raises an exception (a `TypeError`) if one of the provided values is not a valid iterable. **Args**: * `*args`: Any number of iterables ```text {% set my_list_a = [1, 2] %} {% set my_list_b = ['alice', 'bob'] %} {% set my_zip = zip_strict(my_list_a, my_list_b) | list %} {% do log(my_zip) %} {# [(1, 'alice'), (2, 'bob')] #} ``` ```text {% set my_list_a = 12 %} {% set my_list_b = ['alice', 'bob'] %} {% set my_zip = zip_strict(my_list_a, my_list_b) %} Compilation Error in ... (...) 'int' object is not iterable ``` --- ### access models/\.yml ```yml models: - name: model_name config: access: private | protected | public # changed to config in v1.10 ``` You can apply `access` modifiers in config files, including `dbt_project.yml`, or to models one-by-one in `properties.yml`. Applying `access` configs to a subfolder sets the default for all models in that subfolder, so make sure you intend for this behavior. When setting access model-by-model, a group or subfolder might contain a variety of access levels, so take extra care before designating a model `access: public`. Note that for backwards compatibility, `access` is supported as a top-level key, but without the capabilities of config inheritance.
There are multiple approaches to configuring access:

* In `properties.yml` using the older method, as a top-level property (still supported):

  models/properties\_my\_public\_model.yml

  ```yml
  models:
    - name: my_public_model
      access: public # older method, still supported
  ```

* In `properties.yml` using the new method (for v1.7 or higher). Use either the older method or the new method, but not both for the same model:

  models/properties\_my\_public\_model.yml

  ```yml
  models:
    - name: my_public_model
      config:
        access: public # changed to config in v1.10
  ```

* In `dbt_project.yml`:

  dbt\_project.yml

  ```yml
  models:
    my_project_name:
      subfolder_name:
        +group: my_group
        +access: private # sets default for all models in this subfolder
  ```

* In the `my_public_model.sql` file:

  models/my\_public\_model.sql

  ```sql
  -- models/my_public_model.sql
  {{ config(access = "public") }}

  select ...
  ```

After you define `access`, rerun a production job to apply the change.

#### Definition[​](#definition "Direct link to Definition")

The access level of the model you are declaring properties for. Some models (not all) are designed to be referenced through the [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) function across [groups](https://docs.getdbt.com/docs/build/groups.md).

| Access    | Referenceable by                                                                          |
| --------- | ----------------------------------------------------------------------------------------- |
| private   | Same group                                                                                |
| protected | Same project/package                                                                      |
| public    | Any group, package, or project. When defined, rerun a production job to apply the change. |

If you try to reference a model outside of its supported access, you will see an error:

```shell
dbt run -s marketing_model
...
dbt.exceptions.DbtReferenceError: Parsing Error
  Node model.jaffle_shop.marketing_model attempted to reference node
  model.jaffle_shop.finance_model, which is not allowed because the referenced node is private to the finance group.
```

#### Default[​](#default "Direct link to Default")

By default, all models are "protected." This means that other models in the same project can reference them.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Model Access](https://docs.getdbt.com/docs/mesh/govern/model-access.md#groups)
* [Group configuration](https://docs.getdbt.com/reference/resource-configs/group.md)

---

### Advanced configuration usage

#### Alternative SQL file config syntax[​](#alternative-sql-file-config-syntax "Direct link to Alternative SQL file config syntax")

Some configurations may contain characters (e.g. dashes) that cannot be parsed as a Jinja argument. For example, the following would return an error:

```sql
{{ config(
  post-hook="grant select on {{ this }} to role reporter",
  materialized='table'
) }}

select ...
```

While dbt provides an alias for any core configurations (for example, you should use `pre_hook` instead of `pre-hook` in a config block), your dbt project may contain custom configurations without aliases. If you want to specify these configurations inside of a model, use the alternative config block syntax:

models/events/base/base\_events.sql

```sql
{{ config({
  "post-hook": "grant select on {{ this }} to role reporter",
  "materialized": "table"
}) }}

select ...
```
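The reason `post-hook` fails as a Jinja keyword argument is that keyword arguments must be valid Python identifiers, which cannot contain dashes. An illustrative check (not part of dbt) for which keys need the dict-style syntax:

```python
def needs_dict_syntax(config_key):
    # Keys that aren't valid Python identifiers (e.g. contain dashes)
    # can't be passed as Jinja keyword arguments, so they need the
    # dict-style config block instead. Illustrative helper only.
    return not config_key.isidentifier()

needs_dict_syntax("post-hook")     # True: requires the dict-style config
needs_dict_syntax("materialized")  # False: works as a keyword argument
```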
---

### alias

Specify a custom alias for a model, data test, snapshot, or seed to give it a more user-friendly name in the database.

* Models
* Seeds
* Snapshots
* Tests

Specify a custom alias for a model in your project YAML file (`dbt_project.yml`), properties YAML file (for example, `models/properties.yml`), or in a SQL file config block.

For example, if you have a model that calculates `sales_total` and want to give it a more user-friendly alias, you can alias it as shown in the following examples.

In the `dbt_project.yml` file, the following example sets a default `alias` for the `sales_total` model at the project level:

dbt\_project.yml

```yml
models:
  your_project:
    sales_total:
      +alias: sales_dashboard
```

The following specifies an `alias` as part of the `models/properties.yml` file metadata, useful for centralized configuration:

models/properties.yml

```yml
models:
  - name: sales_total
    config:
      alias: sales_dashboard
```

The following assigns the `alias` directly in the `models/sales_total.sql` file:

models/sales\_total.sql

```sql
{{ config(
    alias="sales_dashboard"
) }}
```

This would return `analytics.finance.sales_dashboard` in the database, instead of the default `analytics.finance.sales_total`.

Configure a seed's alias in your project file (`dbt_project.yml`) or a properties file config (for example, `seeds/properties.yml`). The following examples demonstrate how to `alias` a seed named `product_categories` to `categories_data`.

In the `dbt_project.yml` file at the project level:

dbt\_project.yml

```yml
seeds:
  your_project:
    product_categories:
      +alias: categories_data
```

In the `seeds/properties.yml` file:

seeds/properties.yml

```yml
seeds:
  - name: product_categories
    config:
      alias: categories_data
```

This would return the name `analytics.finance.categories_data` in the database.

In the following second example, the seed at `seeds/country_codes.csv` will be built as a table named `country_mappings`.
dbt\_project.yml

```yml
seeds:
  jaffle_shop:
    country_codes:
      +alias: country_mappings
```

Configure a snapshot's alias in your project YAML file (`dbt_project.yml`), properties YAML file (for example, `snapshots/snapshot_name.yml`), or in a SQL file config block. The following examples demonstrate how to `alias` a snapshot named `your_snapshot` to `the_best_snapshot`.

In the `dbt_project.yml` file at the project level:

dbt\_project.yml

```yml
snapshots:
  your_project:
    your_snapshot:
      +alias: the_best_snapshot
```

In the `snapshots/snapshot_name.yml` file:

snapshots/snapshot\_name.yml

```yml
snapshots:
  - name: your_snapshot_name
    config:
      alias: the_best_snapshot
```

In the `snapshots/your_snapshot.sql` file:

```sql
{{ config(
    alias="the_best_snapshot"
) }}
```

This would build your snapshot to `analytics.finance.the_best_snapshot` in the database.

Configure a data test's alias in your project YAML file (`dbt_project.yml`), properties YAML file (for example, `models/properties.yml`), or in a SQL file config block. The following examples demonstrate how to `alias` a unique data test named `order_id` to `unique_order_id_test` to identify a specific data test.

In the `dbt_project.yml` file at the project level:

dbt\_project.yml

```yml
data_tests:
  your_project:
    +alias: unique_order_id_test
```

In the `models/properties.yml` file:

models/properties.yml

```yml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique:
              arguments: # available in v1.10.5 and higher. Older versions can set `alias` as a top-level property.
                alias: unique_order_id_test
```

In the `tests/unique_order_id_test.sql` file:

tests/unique\_order\_id\_test.sql

```sql
{{ config(
    alias="unique_order_id_test",
    severity="error"
) }}
```

When using [`store_failures_as`](https://docs.getdbt.com/reference/resource-configs/store_failures_as.md), this would return the name `analytics.dbt_test__audit.orders_order_id_unique_order_id_test` in the database.
#### Definition[​](#definition "Direct link to Definition")

Optionally specify a custom alias for a [model](https://docs.getdbt.com/docs/build/models.md), [data test](https://docs.getdbt.com/docs/build/data-tests.md), [snapshot](https://docs.getdbt.com/docs/build/snapshots.md), or [seed](https://docs.getdbt.com/docs/build/seeds.md).

When dbt creates a relation (table/view) in a database, it creates it as: `{{ database }}.{{ schema }}.{{ identifier }}`, e.g. `analytics.finance.payments`

The standard behavior of dbt is:

* If a custom alias is *not* specified, the identifier of the relation is the resource name (i.e. the filename).
* If a custom alias is specified, the identifier of the relation is the `{{ alias }}` value.

**Note** With an [ephemeral model](https://docs.getdbt.com/docs/build/materializations.md), dbt will always apply the prefix `__dbt__cte__` to the CTE identifier. This means that if an alias is set on an ephemeral model, then its CTE identifier will be `__dbt__cte__{{ alias }}`, but if no alias is set then its identifier will be `__dbt__cte__{{ filename }}`.

To learn more about changing the way that dbt generates a relation's `identifier`, read [Using Aliases](https://docs.getdbt.com/docs/build/custom-aliases.md).
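The identifier rules above, including the ephemeral-model prefix, can be sketched as follows. This is a hypothetical helper for illustration, not dbt's actual code:

```python
def resolve_identifier(filename, alias=None, ephemeral=False):
    # If an alias is set, it becomes the identifier; otherwise the
    # resource name (filename) is used. Ephemeral models always get
    # the __dbt__cte__ prefix. Hypothetical sketch, not dbt internals.
    identifier = alias if alias is not None else filename
    return f"__dbt__cte__{identifier}" if ephemeral else identifier

resolve_identifier("sales_total", alias="sales_dashboard")  # 'sales_dashboard'
resolve_identifier("stg_orders", ephemeral=True)            # '__dbt__cte__stg_orders'
```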
---

### Amazon Athena configurations

#### Models[​](#models "Direct link to Models")

##### Table configuration[​](#table-configuration "Direct link to Table configuration")

| Parameter | Default | Description |
| --------- | ------- | ----------- |
| `external_location` | None | The full S3 path to where the table is saved. It only works with incremental models. It doesn't work with Hive tables with `ha` set to `true`. |
| `partitioned_by` | None | An array list of columns by which the table will be partitioned. Currently limited to 100 partitions. |
| `bucketed_by` | None | An array list of the columns to bucket data. Ignored if using Iceberg. |
| `bucket_count` | None | The number of buckets for bucketing your data. This parameter is ignored if using Iceberg. |
| `table_type` | Hive | The type of table. Supports `hive` or `iceberg`. |
| `ha` | False | Build the table using the high-availability method. Only available for Hive tables. |
| `format` | Parquet | The data format for the table. Supports `ORC`, `PARQUET`, `AVRO`, `JSON`, and `TEXTFILE`. |
| `write_compression` | None | The compression type for any storage format that allows compression. |
| `field_delimeter` | None | Specify the custom field delimiter to use when the format is set to `TEXTFILE`. |
| `table_properties` | N/A | The table properties to add to the table. This is only for Iceberg. |
| `native_drop` | N/A | Relation drop operations will be performed with SQL, not direct Glue API calls. No S3 calls will be made to manage data in S3. Data in S3 will only be cleared up for Iceberg tables. See the [AWS docs](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-managing-tables.html) for more info. Iceberg DROP TABLE operations may time out if they take longer than 60 seconds. |
| `seed_by_insert` | False | Creates seeds using a SQL insert statement. Large seed files can't exceed the Athena 262144-byte limit. |
| `force_batch` | False | Run the table creation directly in batch insert mode. Useful when the standard table creation fails due to partition limitations. |
| `unique_tmp_table_suffix` | False | Replace the `__dbt_tmp` table suffix with a unique UUID for incremental models using insert overwrite on Hive tables. |
| `temp_schema` | None | Defines a schema to hold temporary create statements used in incremental model runs. The schema will be created in the model's target database if it does not exist. |
| `lf_tags_config` | None | [AWS Lake Formation](#aws-lake-formation-integration) tags to associate with the table and columns. Existing tags will be removed. <br/> \* `enabled` (`default=False`): whether LF tags management is enabled for a model <br/> \* `tags`: dictionary with tags and their values to assign for the model <br/> \* `tags_columns`: dictionary with a tag key, value, and list of columns they must be assigned to |
| `lf_inherited_tags` | None | List of the Lake Formation tag keys that are to be inherited from the database level and shouldn't be removed during the assignment of those defined in `lf_tags_config`. |
| `lf_grants` | None | Lake Formation grants config for `data_cell` filters. |

###### Configuration examples[​](#configuration-examples "Direct link to Configuration examples")

* schema.yml
* dbt\_project.yml
* Lake formation grants

models/schema.yml

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='append',
    on_schema_change='append_new_columns',
    table_type='iceberg',
    schema='test_schema',
    lf_tags_config={
      'enabled': true,
      'tags': {
        'tag1': 'value1',
        'tag2': 'value2'
      },
      'tags_columns': {
        'tag1': {
          'value1': ['column1', 'column2'],
          'value2': ['column3', 'column4']
        }
      },
      'inherited_tags': ['tag1', 'tag2']
    }
  )
}}
```

dbt\_project.yml

```yaml
+lf_tags_config:
  enabled: true
  tags:
    tag1: value1
    tag2: value2
  tags_columns:
    tag1:
      value1: [ column1, column2 ]
  inherited_tags: [ tag1, tag2 ]
```

```python
lf_grants={
    'data_cell_filters': {
        'enabled': True | False,
        'filters': {
            'filter_name': {
                'row_filter': '',
                'principals': ['principal_arn1', 'principal_arn2']
            }
        }
    }
}
```

Consider these limitations and recommendations:

* `lf_tags` and `lf_tags_columns` configs support only attaching LF tags to corresponding resources.
* We recommend managing LF tag permissions somewhere outside dbt, for example with [Terraform](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lakeformation_permissions) or the [AWS CDK](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_lakeformation-readme.html).
* `data_cell_filters` management can't be automated outside dbt because the filter can't be attached to a table that doesn't exist yet.
  Once you `enable` this config, dbt will set all filters and their permissions during every dbt run. This keeps the row-level security configuration up to date after every dbt run, applying changes as they occur: dropping, creating, and updating filters and their permissions.
* Any tags listed in `lf_inherited_tags` should be strictly inherited from the database level and never overridden at the table or column level.
  * Currently, `dbt-athena` does not differentiate between an inherited tag association and an override it made previously.
  * For example, if an `lf_tags_config` value overrides an inherited tag in one run, and that override is removed before a subsequent run, the prior override will linger: it is no longer encoded anywhere (neither in Terraform, where the inherited value is configured, nor in the dbt project, where the override previously existed but is now gone).

##### Table location[​](#table-location "Direct link to Table location")

The saved location of a table is determined, in order of precedence, by the following conditions:

1. If `external_location` is defined, that value is used.
2. If `s3_data_dir` is defined, the path is determined by that and `s3_data_naming`.
3. If `s3_data_dir` is not defined, data is stored under `{s3_staging_dir}/tables/`.

The following options are available for `s3_data_naming`:

* `unique`: `{s3_data_dir}/{uuid4()}/`
* `table`: `{s3_data_dir}/{table}/`
* `table_unique`: `{s3_data_dir}/{table}/{uuid4()}/`
* `schema_table`: `{s3_data_dir}/{schema}/{table}/`
* `schema_table_unique`: `{s3_data_dir}/{schema}/{table}/{uuid4()}/`

You can set `s3_data_naming` globally in the target profile, override it in a table config, or set it for groups of models in `dbt_project.yml`.

Note: If you're using a workgroup with a default output location configured, `s3_data_naming` ignores any configured buckets and uses the location configured in the workgroup.
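The precedence rules and `s3_data_naming` options above can be sketched as a small resolver. This is a hypothetical helper for illustration, not adapter code:

```python
import uuid


def table_location(external_location, s3_data_dir, s3_staging_dir,
                   s3_data_naming, schema, table):
    # Sketch of the precedence above: external_location wins, then
    # s3_data_dir combined with s3_data_naming, else the fallback
    # {s3_staging_dir}/tables/. Hypothetical helper, not adapter code.
    if external_location:
        return external_location
    if s3_data_dir:
        naming = {
            "unique": f"{s3_data_dir}/{uuid.uuid4()}/",
            "table": f"{s3_data_dir}/{table}/",
            "table_unique": f"{s3_data_dir}/{table}/{uuid.uuid4()}/",
            "schema_table": f"{s3_data_dir}/{schema}/{table}/",
            "schema_table_unique": f"{s3_data_dir}/{schema}/{table}/{uuid.uuid4()}/",
        }
        return naming[s3_data_naming]
    return f"{s3_staging_dir}/tables/"


table_location(None, "s3://bucket/data", "s3://bucket/stage",
               "schema_table", "analytics", "orders")
# 's3://bucket/data/analytics/orders/'
```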
##### Incremental models[​](#incremental-models "Direct link to Incremental models")

The following [incremental model](https://docs.getdbt.com/docs/build/incremental-models.md) strategies are supported:

* `insert_overwrite` (default): The insert-overwrite strategy deletes the overlapping partitions from the destination table and then inserts the new records from the source. This strategy depends on the `partitioned_by` keyword! dbt will fall back to the `append` strategy if no partitions are defined.
* `append`: Insert new records without updating, deleting, or overwriting any existing data. There might be duplicate data (great for log or historical data).
* `merge`: Conditionally updates, deletes, or inserts rows into an Iceberg table. Used in combination with `unique_key`. It is only available when using Iceberg.

Consider this limitation when using Iceberg models:

* Incremental Iceberg models — Sync all columns on schema change. You can't remove columns used for partitioning with an incremental refresh; you must fully refresh the model.

##### On schema change[​](#on-schema-change "Direct link to On schema change")

The `on_schema_change` option reflects changes of the schema in incremental models. The values you can set this to are:

* `ignore` (default)
* `fail`
* `append_new_columns`
* `sync_all_columns`

To learn more, refer to [What if the columns of my incremental model change](https://docs.getdbt.com/docs/build/incremental-models.md#what-if-the-columns-of-my-incremental-model-change).

##### Iceberg[​](#iceberg "Direct link to Iceberg")

The adapter supports table materialization for Iceberg.
For example:

```sql
{{ config(
    materialized='table',
    table_type='iceberg',
    format='parquet',
    partitioned_by=['bucket(user_id, 5)'],
    table_properties={
        'optimize_rewrite_delete_file_threshold': '2'
    }
) }}

select
    'A' as user_id,
    'pi' as name,
    'active' as status,
    17.89 as cost,
    1 as quantity,
    100000000 as quantity_big,
    current_date as my_date
```

Iceberg supports bucketing as hidden partitions. Use the `partitioned_by` config to add specific bucketing conditions.

Iceberg supports the `PARQUET`, `AVRO`, and `ORC` table formats for data.

The following are the supported strategies for using Iceberg incrementally:

* `append`: New records are appended to the table (this can lead to duplicates).
* `merge`: Perform an update and insert (and optional delete) where new and existing records are added. This is only available with Athena engine version 3.
  * `unique_key` (required): Columns that define a unique source and target table record.
  * `incremental_predicates` (optional): The SQL conditions that enable custom join clauses in the merge statement. This helps improve performance via predicate pushdown on target tables.
  * `delete_condition` (optional): SQL condition that identifies records that should be deleted.
  * `update_condition` (optional): SQL condition that identifies records that should be updated.
  * `insert_condition` (optional): SQL condition that identifies records that should be inserted.

`incremental_predicates`, `delete_condition`, `update_condition`, and `insert_condition` can include any column of the incremental table (`src`) or the final table (`target`). Column names must be prefixed by either `src` or `target` to prevent a `Column is ambiguous` error.
* delete\_condition
* update\_condition
* insert\_condition

```sql
{{ config(
    materialized='incremental',
    table_type='iceberg',
    incremental_strategy='merge',
    unique_key='user_id',
    incremental_predicates=["src.quantity > 1", "target.my_date >= now() - interval '4' year"],
    delete_condition="src.status != 'active' and target.my_date < now() - interval '2' year",
    format='parquet'
) }}

select
    'A' as user_id,
    'pi' as name,
    'active' as status,
    17.89 as cost,
    1 as quantity,
    100000000 as quantity_big,
    current_date as my_date
```

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key=['id'],
    update_condition='target.id > 1',
    schema='sandbox'
) }}

{% if is_incremental() %}

select * from (
    values
    (1, 'v1-updated'),
    (2, 'v2-updated')
) as t (id, value)

{% else %}

select * from (
    values
    (-1, 'v-1'),
    (0, 'v0'),
    (1, 'v1'),
    (2, 'v2')
) as t (id, value)

{% endif %}
```

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key=['id'],
    insert_condition='target.status != 0',
    schema='sandbox'
) }}

select * from (
    values
    (1, 0),
    (2, 1)
) as t (id, status)
```

##### High availability (HA) table[​](#high-availability-ha-table "Direct link to High availability (HA) table")

The current implementation of table materialization can lead to downtime, as the target table is dropped and re-created. For less destructive behavior, you can use the `ha` config on your `table` materialized models. It leverages the table versions feature of the Glue catalog, which creates a temporary table and swaps the target table to the location of the temporary table. This materialization is only available for `table_type=hive` and requires using unique locations. For Iceberg, high availability is the default.

By default, the materialization keeps the last 4 table versions, but you can change this by setting `versions_to_keep`.
```sql
{{ config(
    materialized='table',
    ha=true,
    format='parquet',
    table_type='hive',
    partitioned_by=['status'],
    s3_data_naming='table_unique'
) }}

select 'a' as user_id, 'pi' as user_name, 'active' as status
union all
select 'b' as user_id, 'sh' as user_name, 'disabled' as status
```

##### HA known issues[​](#ha-known-issues "Direct link to HA known issues")

* There could be a little downtime when swapping from a table with partitions to a table without (and the other way around). If higher performance is needed, consider bucketing instead of partitions.
* By default, Glue "duplicates" the versions internally, so the last two versions of a table point to the same location.
* It's recommended to set `versions_to_keep` >= 4, as this will avoid having the older location removed.

##### Avoid deleting parquet files[​](#avoid-deleting-parquet-files "Direct link to Avoid deleting parquet files")

If a dbt model has the same name as an existing table in the AWS Glue catalog, the `dbt-athena` adapter deletes the files in that table's S3 location before recreating the table using the SQL from the model.

The adapter may also delete data if a model is configured to use the same S3 location as an existing table. In this case, it clears the folder before creating the new table to avoid conflicts during setup.

When dropping a model, the `dbt-athena` adapter performs two cleanup steps for both Iceberg and Hive tables:

* It deletes the table from the AWS Glue catalog using Glue APIs.
* It removes the associated S3 data files using a delete operation.

However, for Iceberg tables, using standard SQL like [`DROP TABLE`](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-drop-table.html) may not remove all related S3 objects. To ensure proper cleanup in a dbt workflow, the adapter includes a workaround that explicitly deletes these S3 objects.
Alternatively, users can enable [`native_drop`](https://docs.getdbt.com/reference/resource-configs/athena-configs.md#table-configuration) to let Iceberg handle the cleanup natively.

##### Update glue data catalog[​](#update-glue-data-catalog "Direct link to Update glue data catalog")

You can persist your column and model level descriptions to the Glue Data Catalog as [glue table properties](https://docs.aws.amazon.com/glue/latest/dg/tables-described.html#table-properties) and [column parameters](https://docs.aws.amazon.com/glue/latest/webapi/API_Column.html). To enable this, set the configuration to `true` as shown in the following example. By default, documentation persistence is disabled, but it can be enabled for specific resources or groups of resources as needed.

For example:

```yaml
models:
  - name: test_deduplicate
    description: another value
    config:
      persist_docs:
        relation: true
        columns: true
      meta:
        test: value
    columns:
      - name: id
        config:
          meta: # changed to config in v1.10 and backported to 1.9
            primary_key: true
```

Refer to [persist\_docs](https://docs.getdbt.com/reference/resource-configs/persist_docs.md) for more details.

#### Snapshots[​](#snapshots "Direct link to Snapshots")

The adapter supports snapshot materialization. It supports both the timestamp and check strategies. To create a snapshot, create a snapshot file in the `snapshots` directory. You'll need to create this directory if it doesn't already exist.

##### Timestamp strategy[​](#timestamp-strategy "Direct link to Timestamp strategy")

Refer to [Timestamp strategy](https://docs.getdbt.com/docs/build/snapshots.md#timestamp-strategy-recommended) for details on how to use it.

##### Check strategy[​](#check-strategy "Direct link to Check strategy")

Refer to [Check strategy](https://docs.getdbt.com/docs/build/snapshots.md#check-strategy) for details on how to use it.

##### Hard deletes[​](#hard-deletes "Direct link to Hard deletes")

The materialization also supports invalidating hard deletes.
For usage details, refer to [Hard deletes](https://docs.getdbt.com/docs/build/snapshots.md#hard-deletes-opt-in).

##### Snapshots known issues[​](#snapshots-known-issues "Direct link to Snapshots known issues")

* Table, schema, and database names should only be lowercase.
* To avoid potential conflicts, make sure [`dbt-athena-adapter`](https://github.com/Tomme/dbt-athena) is not installed in the target environment.
* Snapshots do not support dropping columns from the source table. If you drop a column, make sure to drop the column from the snapshot as well. Another workaround is to NULL the column in the snapshot definition to preserve the history.

#### AWS Lake Formation integration[​](#aws-lake-formation-integration "Direct link to AWS Lake Formation integration")

The following describes how the adapter implements AWS Lake Formation tag management:

* [Enable](#table-configuration) LF tags management with the `lf_tags_config` parameter. By default, it's disabled.
* Once enabled, LF tags are updated on every dbt run.
  * First, all LF tags for columns are removed to avoid inheritance issues.
  * Then, all redundant LF tags are removed from tables, and the actual tags from table configs are applied.
  * Finally, LF tags for columns are applied.

It's important to understand the following points:

* dbt doesn't manage `lf-tags` for databases
* dbt doesn't manage Lake Formation permissions

That's why it's important to take care of this yourself or use an automation tool such as Terraform or the AWS CDK.
For more details, refer to:

* [terraform aws\_lakeformation\_permissions](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lakeformation_permissions)
* [terraform aws\_lakeformation\_resource\_lf\_tags](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lakeformation_resource_lf_tags)

#### Python models[​](#python-models "Direct link to Python models")

The adapter supports Python models using [`spark`](https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark.html).

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* A Spark-enabled workgroup created in Athena.
* A Spark execution role granted access to Athena, Glue, and S3.
* The Spark workgroup is added to the `~/.dbt/profiles.yml` file and the profile to be used is referenced in `dbt_project.yml`.

##### Spark-specific table configuration[​](#spark-specific-table-configuration "Direct link to Spark-specific table configuration")

| Configuration | Default | Description |
| ------------- | ------- | ----------- |
| `timeout` | 43200 | Timeout in seconds for each Python model execution. Defaults to 12 hours (43200 seconds). |
| `spark_encryption` | False | When set to `true`, it encrypts data stored locally by Spark and in transit between Spark nodes. |
| `spark_cross_account_catalog` | False | When using the Spark Athena workgroup, queries can only be made against catalogs on the same AWS account by default. Setting this parameter to `true` enables querying catalogs on other AWS accounts. Use the syntax `external_catalog_id/database.table` to access an external table on an external catalog (for example, `999999999999/mydatabase.cloudfront_logs`, where 999999999999 is the external catalog ID). |
| `spark_requester_pays` | False | When set to `true`, if an Amazon S3 bucket is configured as `requester pays`, the user account running the query is charged for data access and data transfer fees associated with the query. |

##### Spark notes[​](#spark-notes "Direct link to Spark notes")

* A session is created for each unique engine configuration defined in the models that are part of the invocation. A session's idle timeout is set to 10 minutes. Within the timeout period, if a new calculation (Spark Python model) is ready for execution and the engine configuration matches, the process will reuse the same session.
* The number of Python models running simultaneously depends on the `threads` setting. The number of sessions created for the entire run depends on the number of unique engine configurations and the availability of sessions to maintain thread concurrency.
* For Iceberg tables, it's recommended to use the `table_properties` configuration to set the `format_version` to `2`. This helps maintain compatibility between the Iceberg tables created by Trino and those created by Spark.
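The session-reuse rule described in the first Spark note can be sketched as follows. This is a hypothetical helper illustrating the behavior, not the adapter's implementation:

```python
def acquire_session(sessions, engine_config):
    # Reuse an idle session whose engine configuration matches the
    # model's; otherwise start a new session. Hypothetical sketch of
    # the session-reuse behavior described above, not adapter code.
    for session in sessions:
        if session["idle"] and session["engine_config"] == engine_config:
            session["idle"] = False
            return session
    session = {"engine_config": engine_config, "idle": False}
    sessions.append(session)
    return session

pool = [{"engine_config": {"MaxConcurrentDpus": 3}, "idle": True}]
acquire_session(pool, {"MaxConcurrentDpus": 3})  # reuses the idle session
len(pool)  # 1: no new session was created
```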
##### Example models[​](#example-models "Direct link to Example models")

* Simple pandas
* Simple Spark
* Spark incremental
* Config Spark model
* PySpark UDF

```python
import pandas as pd


def model(dbt, session):
    dbt.config(materialized="table")

    model_df = pd.DataFrame({"A": [1, 2, 3, 4]})

    return model_df
```

```python
def model(dbt, spark_session):
    dbt.config(materialized="table")

    data = [(1,), (2,), (3,), (4,)]

    df = spark_session.createDataFrame(data, ["A"])

    return df
```

```python
def model(dbt, spark_session):
    dbt.config(materialized="incremental")
    df = dbt.ref("model")

    if dbt.is_incremental:
        max_from_this = (
            f"select max(run_date) from {dbt.this.schema}.{dbt.this.identifier}"
        )
        df = df.filter(df.run_date >= spark_session.sql(max_from_this).collect()[0][0])

    return df
```

```python
def model(dbt, spark_session):
    dbt.config(
        materialized="table",
        engine_config={
            "CoordinatorDpuSize": 1,
            "MaxConcurrentDpus": 3,
            "DefaultExecutorDpuSize": 1,
        },
        spark_encryption=True,
        spark_cross_account_catalog=True,
        spark_requester_pays=True,
        polling_interval=15,
        timeout=120,
    )

    data = [(1,), (2,), (3,), (4,)]

    df = spark_session.createDataFrame(data, ["A"])

    return df
```

Using imported external Python files:

```python
def model(dbt, spark_session):
    dbt.config(
        materialized="incremental",
        incremental_strategy="merge",
        unique_key="num",
    )

    sc = spark_session.sparkContext
    sc.addPyFile("s3://athena-dbt/test/file1.py")
    sc.addPyFile("s3://athena-dbt/test/file2.py")

    def func(iterator):
        from file2 import transform

        return [transform(i) for i in iterator]

    from pyspark.sql.functions import col, udf

    udf_with_import = udf(func)

    data = [(1, "a"), (2, "b"), (3, "c")]
    cols = ["num", "alpha"]
    df = spark_session.createDataFrame(data, cols)

    return df.withColumn("udf_test_col", udf_with_import(col("alpha")))
```

##### Known issues in Python models[​](#known-issues-in-python-models "Direct link to Known issues in Python models")

* Python models can't [reference Athena SQL views](https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark.html).
* You can use third-party Python libraries; however, they must be [included in the pre-installed list](https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark-preinstalled-python-libraries.html) or [imported manually](https://docs.aws.amazon.com/athena/latest/ug/notebooks-import-files-libraries.html).
* Python models can only reference or write to tables with names matching the regular expression `^[0-9a-zA-Z_]+$`. Spark doesn't support dashes or special characters in table names, even though Athena does.
* Incremental models don't fully utilize Spark capabilities. They depend partially on existing SQL-based logic that runs on Trino.
* Snapshot materializations are not supported.
* Spark can only reference tables within the same catalog.
* For tables created outside of dbt, be sure to populate the `location` field, or dbt will throw an error when creating the table.

#### Contracts[​](#contracts "Direct link to Contracts")

The adapter partly supports contract definitions:

* `data_type` is supported but needs to be adjusted for complex types. Types must be specified in full (for example, `array<int>`) even though the nested types won't be checked. As dbt recommends, only the broader type (array, map, int, varchar) is compared. The complete definition is used in a pre-flight check that validates the data types defined for Athena.
* Constraints are not supported, since Athena has no concept of constraints.
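The table-name rule above is easy to check ahead of time. A minimal sketch using Python's `re` module (the helper name is ours, not part of dbt-athena):

```python
import re

# Matches the constraint described above: letters, digits, and underscores only.
VALID_TABLE_NAME = re.compile(r"^[0-9a-zA-Z_]+$")


def is_valid_spark_table_name(name: str) -> bool:
    # Hypothetical helper: returns True when Spark-on-Athena can use this name.
    return VALID_TABLE_NAME.match(name) is not None


print(is_valid_spark_table_name("my_model_v2"))  # True
print(is_valid_spark_table_name("my-model"))     # False: dashes aren't allowed
```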
---

### Amazon Redshift adapter behavior changes

The following are the current [behavior change flags](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#behavior-change-flags) that are specific to `dbt-redshift`:

| Flag | `dbt-redshift`: Intro | `dbt-redshift`: Maturity | Status |
| ---- | --------------------- | ------------------------ | ------ |
| [`redshift_skip_autocommit_transaction_statements`](#redshift_skip_autocommit_transaction_statements-flag) | 1.12.0 | TBD | Active |

#### `redshift_skip_autocommit_transaction_statements` flag[​](#redshift_skip_autocommit_transaction_statements-flag "Direct link to redshift_skip_autocommit_transaction_statements-flag")

The `redshift_skip_autocommit_transaction_statements` flag is `True` by default. When `autocommit=True` (the default since `dbt-redshift` 1.5), each statement is automatically committed by the driver. Previously, dbt still sent explicit `BEGIN` / `COMMIT` / `ROLLBACK` statements, which were unnecessary and added extra round trips to Redshift. With the `redshift_skip_autocommit_transaction_statements` flag enabled, dbt skips sending transaction management statements when you enable autocommit, reducing unnecessary round trips and improving performance.

###### Key behaviors[​](#key-behaviors "Direct link to Key behaviors")

When both the flag and autocommit are `True`:

* `begin()` skips sending `BEGIN`
* `commit()` skips sending `COMMIT`
* `rollback_if_open()` skips sending `ROLLBACK`

dbt still maintains its internal `transaction_open` state to preserve compatibility with dbt's transaction tracking, even when actual statements are skipped.
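The decision the flag controls reduces to a small truth table. The following is a conceptual sketch only (the function name is hypothetical, not dbt-redshift source code):

```python
def sends_transaction_statements(autocommit: bool, skip_flag: bool) -> bool:
    """Whether dbt sends explicit BEGIN/COMMIT/ROLLBACK to Redshift.

    Conceptual model: statements are skipped exactly when both autocommit
    and redshift_skip_autocommit_transaction_statements are enabled.
    """
    return not (autocommit and skip_flag)


# autocommit=False: explicit transactions unchanged, regardless of the flag
assert sends_transaction_statements(autocommit=False, skip_flag=True)
assert sends_transaction_statements(autocommit=False, skip_flag=False)

# autocommit=True with the flag enabled (default): statements are skipped
assert not sends_transaction_statements(autocommit=True, skip_flag=True)

# autocommit=True with the flag disabled: legacy behavior
assert sends_transaction_statements(autocommit=True, skip_flag=False)
```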
##### Preserving legacy behavior[​](#preserving-legacy-behavior "Direct link to Preserving legacy behavior")

To preserve the legacy behavior of sending `BEGIN`/`COMMIT`/`ROLLBACK` statements even when autocommit is enabled, set the flag to `False` in your `dbt_project.yml`:

dbt\_project.yml

```yaml
flags:
  redshift_skip_autocommit_transaction_statements: false
```

##### Backward compatibility[​](#backward-compatibility "Direct link to Backward compatibility")

* **`autocommit=False`**: Unchanged. Explicit transactions still work as before, regardless of this flag.
* **`autocommit=True` with the flag enabled (default)**: Skips unnecessary transaction statements for better performance.
* **`autocommit=True` with the flag disabled**: Sends `BEGIN`/`COMMIT`/`ROLLBACK` (legacy behavior).

---

### Analysis properties

We recommend you define analysis properties in your `analyses/` directory, which is illustrated in the [`analysis-paths`](https://docs.getdbt.com/reference/project-configs/analysis-paths.md) configuration.

Analysis properties are "special properties" in that you can't configure them in the `dbt_project.yml` file or using `config()` blocks. Refer to [Configs and properties](https://docs.getdbt.com/reference/define-properties#which-properties-are-not-also-configs) for more info.
You can name these files `whatever_you_want.yml` and nest them arbitrarily deeply in subfolders within the `analyses/` or `models/` directory.

analyses/\<filename>.yml

```yml
analyses:
  - name: <analysis name> # required
    description: <markdown_string>
    config:
      docs: # changed to config in v1.10
        show: true | false
        node_color: <color_id> # Use name (such as node_color: purple) or hex code with quotes (such as node_color: "#cd7f32")
      tags: <string> | [<string>]
    columns:
      - name: <column name>
        description: <markdown_string>
      - name: ... # declare properties of additional columns

  - name: ... # declare properties of additional analyses
```

---

### analysis-paths

dbt\_project.yml

```yml
analysis-paths: [directorypath]
```

#### Definition[​](#definition "Direct link to Definition")

Specify a custom list of directories where [analyses](https://docs.getdbt.com/docs/build/analyses.md) are located.

#### Default[​](#default "Direct link to Default")

Without specifying this config, dbt will not compile any `.sql` files as analyses. However, the [`dbt init` command](https://docs.getdbt.com/reference/commands/init.md) populates this value as `analyses` ([source](https://github.com/dbt-labs/dbt-starter-project/blob/HEAD/dbt_project.yml#L15)).

Paths specified in `analysis-paths` must be relative to the location of your `dbt_project.yml` file. Avoid absolute paths like `/Users/username/project/analyses`, as they can lead to unexpected behavior.
* ✅ **Do**
  * Use relative paths:
    ```yml
    analysis-paths: ["analyses"]
    ```
* ❌ **Don't**
  * Avoid absolute paths:
    ```yml
    analysis-paths: ["/Users/username/project/analyses"]
    ```

#### Examples[​](#examples "Direct link to Examples")

##### Use a subdirectory named `analyses`[​](#use-a-subdirectory-named-analyses "Direct link to use-a-subdirectory-named-analyses")

This is the value populated by the [`dbt init` command](https://docs.getdbt.com/reference/commands/init.md).

dbt\_project.yml

```yml
analysis-paths: ["analyses"]
```

##### Use a subdirectory named `custom_analyses`[​](#use-a-subdirectory-named-custom_analyses "Direct link to use-a-subdirectory-named-custom_analyses")

dbt\_project.yml

```yml
analysis-paths: ["custom_analyses"]
```

---

### anchors

#### Definition[​](#definition "Direct link to Definition")

Anchors are a [YAML feature](https://yaml.org/spec/1.2.2/#692-node-anchors) that let you reuse configuration blocks inside a single YAML file.

In dbt Core v1.10, the `anchors:` key was introduced to enclose configuration fragments that aren't valid on their own or that only exist as template data. Using the `anchors:` key ensures these fragments won't be rejected during file validation. In dbt Core v1.10 and higher, invalid anchors trigger a warning. In the dbt Fusion engine, these invalid anchors will result in errors when Fusion leaves beta.

note

You can define anchors in dbt Core v1.9 and earlier, but there is no dedicated location for anchors in these versions. If you need to define a standalone anchor, you can put it at the top level of your YAML file.
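Semantically, a YAML alias pastes the anchored mapping in place, and the `<<:` merge key overlays extra keys on top of it. A conceptual stand-in using plain Python dict merging (illustrative only; the names mirror the examples in this section):

```python
# The anchor supplies the base mapping...
id_column_anchor = {
    "name": "id",
    "description": "This is a unique identifier.",
    "data_type": "int",
    "data_tests": ["not_null", "unique"],
}

# `- *id_column_alias` is like reusing the mapping as-is:
first_model_id = dict(id_column_anchor)

# `- <<: *id_column_alias` followed by `data_type: bigint` is like
# copying the mapping and overriding one key:
second_model_id = {**id_column_anchor, "data_type": "bigint"}

print(second_model_id["name"], second_model_id["data_type"])  # id bigint
```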
#### YAML anchor syntax[​](#yaml-anchor-syntax "Direct link to YAML anchor syntax")

##### Anchors and aliases[​](#anchors-and-aliases "Direct link to Anchors and aliases")

To define a YAML anchor, add an `anchors:` block in your YAML file and use the `&` symbol in front of the anchor's name (for example, `&id_column_alias`). This creates an alias which you can reference elsewhere by prefixing the alias with a `*` character.

The following example creates an anchor whose alias is `*id_column_alias`. The `id` column, its description, data type, and data tests are all applied to `my_first_model`, `my_second_model`, and `my_third_model`.

models/\_models.yml

```yml
anchors:
  - &id_column_alias
    name: id
    description: This is a unique identifier.
    data_type: int
    data_tests:
      - not_null
      - unique

models:
  - name: my_first_model
    columns:
      - *id_column_alias
      - name: unrelated_column_a
        description: This column is not repeated in other models.
      - name: unrelated_column_b

  - name: my_second_model
    columns:
      - *id_column_alias
      - name: unrelated_column_c

  - name: my_third_model
    columns:
      - *id_column_alias
      - name: unrelated_column_d
```

[![Behind the scenes, the alias is replaced with the object defined by the anchor.](/img/reference/resource-properties/anchor_example_expansion.png?v=2 "Behind the scenes, the alias is replaced with the object defined by the anchor.")](#)Behind the scenes, the alias is replaced with the object defined by the anchor.

##### Merge syntax[​](#merge-syntax "Direct link to Merge syntax")

Sometimes, an anchor is mostly the same but one part needs to be overridden. When the anchor refers to a dictionary/mapping (not a list or a scalar value), you can use the `<<:` merge syntax to override an already-defined key, or add extra keys to the dictionary. For example:

models/\_models.yml

```yml
anchors:
  - &id_column_alias
    name: id
    description: This is a unique identifier.
    data_type: int
    data_tests:
      - not_null
      - unique
  - &source_template_alias
    database: RAW
    loader: fivetran
    config:
      freshness:
        warn_after: {count: 1, period: day}

models:
  - name: my_first_model
    columns:
      - *id_column_alias # brings in the full anchor defined above
      - name: unrelated_column_a
        description: This column is not repeated in other models.
      - name: unrelated_column_b

  - name: my_second_model
    columns:
      - <<: *id_column_alias
        data_type: bigint # overrides the data_type from int to bigint, while inheriting the name, description, and data tests
      - name: unrelated_column_c

  - name: my_third_model
    columns:
      - <<: *id_column_alias
        config:
          meta:
            extra_key: extra_value # adds config.meta.extra_key to just this version of the id column, in addition to the name, description, data type, and data tests
      - name: unrelated_column_d

sources: # both sources start with their database, loader, and freshness expectations set from the anchor, and merge in additional keys
  - <<: *source_template_alias
    name: salesforce
    schema: etl_salesforce_schema
    tables:
      - name: opportunities
      - name: users
  - <<: *source_template_alias
    name: hubspot
    schema: etl_hubspot_schema
    tables:
      - name: contacts
```

#### Usage notes[​](#usage-notes "Direct link to Usage notes")

* Old versions of dbt Core (v1.9 and earlier) do not have a dedicated `anchors:` key. If you need to define a standalone anchor, you can leave it at the top level of your file.
* You can't merge additional elements into a list which was defined as an anchor. For example, if you define an anchor containing multiple columns, you can't attach extra columns to the end of the list. Instead, define each column as an individual anchor and add each one to the relevant tables.
* You do not need to move existing anchors to the `anchors:` key if they are already defined in a larger valid YAML object. For example, the following `&customer_id_tests` anchor does not need to be moved because it is a valid part of the existing `columns` block.
```yml
models:
  - name: my_first_model
    columns:
      - name: customer_id
        tests: &customer_id_tests
          - not_null
          - unique
      - name: order_id
        tests: *customer_id_tests
```

---

### Anonymous usage stats

dbt Labs is on a mission to build the best version of dbt possible, and a crucial part of that is understanding how users work with dbt. To this end, we've added some simple event tracking (or telemetry) to dbt using Snowplow. Importantly, we do not track credentials, raw model contents, or model names: we consider these private, and frankly none of our business.

The data we collect is used for use cases such as industry identification, use-case research, and improvements to sales, marketing, product features, and services. Telemetry allows users to seamlessly contribute to the continuous improvement of dbt, enabling us to better serve the data community.

Usage statistics are fired when dbt is invoked and when models are run. These events contain basic platform information (OS + Python version) and metadata such as:

* Whether the invocation succeeded.
* How long it took.
* An anonymized hash key representing the raw model content.
* The number of nodes that were run.

For full transparency, you can see all the event definitions in [`tracking.py`](https://github.com/dbt-labs/dbt-core/blob/HEAD/core/dbt/tracking.py).

* dbt has telemetry enabled by default to help us enhance the user experience and improve the product by using real user feedback and usage patterns. While it cannot be disabled, we ensure the data is [secure](https://www.getdbt.com/security) and used responsibly.
Collecting this data enables us to provide a better product experience, including improvements to the performance of dbt.

* dbt Core users have telemetry enabled by default to help us understand usage patterns and improve the product. You can opt out of event tracking at any time by adding the following to your `dbt_project.yml` file:

dbt\_project.yml

```yaml
flags:
  send_anonymous_usage_stats: false
```

dbt Core users can also use the `DO_NOT_TRACK` environment variable to enable or disable sending anonymous data. For more information, see [Environment variables](https://docs.getdbt.com/docs/build/environment-variables.md). `DO_NOT_TRACK=1` is the same as `DBT_SEND_ANONYMOUS_USAGE_STATS=False`.

---

### Apache Spark configurations

If you're using Databricks, use `dbt-databricks`

If you're using Databricks, the `dbt-databricks` adapter is recommended over `dbt-spark`. If you're still using `dbt-spark` with Databricks, consider [migrating from the dbt-spark adapter to the dbt-databricks adapter](https://docs.getdbt.com/guides/migrate-from-spark-to-databricks.md). For the Databricks version of this page, refer to [Databricks setup](#databricks-setup).

#### Configuring tables[​](#configuring-tables "Direct link to Configuring tables")

When materializing a model as `table`, you may include several optional configs that are specific to the dbt-spark plugin, in addition to the standard [model configs](https://docs.getdbt.com/reference/model-configs.md).

| Option | Description | Required? | Example |
| ------ | ----------- | --------- | ------- |
| file\_format | The file format to use when creating tables (`parquet`, `delta`, `iceberg`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | `parquet` |
| location\_root [1](#user-content-fn-1) | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | `/mnt/root` |
| partition\_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | `date_day` |
| clustered\_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | `country_code` |
| buckets | The number of buckets to create while clustering. | Required if `clustered_by` is specified | `8` |
| tblproperties | The table properties configure table behavior. Properties differ depending on the file format; see the reference docs ([Iceberg](https://iceberg.apache.org/docs/latest/configuration/#table-properties), [Parquet](https://spark.apache.org/docs/3.5.4/sql-data-sources-parquet.html#data-source-option), [Delta](https://docs.databricks.com/aws/en/delta/table-properties#delta-table-properties), [Hudi](https://hudi.apache.org/docs/sql_ddl/#table-properties)). | Optional | `read.split.target-size: 268435456`, `commit.retry.num-retries: 10` (Iceberg) |

#### Incremental models[​](#incremental-models "Direct link to Incremental models")

dbt seeks to offer useful, intuitive modeling abstractions by means of its built-in configurations and materializations. Because there is so much variance between Apache Spark clusters out in the world—not to mention the powerful features offered to Databricks users by the Delta file format and custom runtime—making sense of all the available options is an undertaking in its own right. Alternatively, you can use the Apache Iceberg or Apache Hudi file format with the Apache Spark runtime for building incremental models.

For that reason, the dbt-spark plugin leans heavily on the [`incremental_strategy` config](https://docs.getdbt.com/docs/build/incremental-strategy.md). This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of four values:

* **`append`** (default): Insert new records without updating or overwriting any existing data.
* **`insert_overwrite`**: If `partition_by` is specified, overwrite partitions in the table with new data. If no `partition_by` is specified, overwrite the entire table with new data.
* **`merge`** (Delta, Iceberg, and Hudi file formats only): Match records based on a `unique_key`; update old records, insert new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.)
* **`microbatch`**: Implements the [microbatch strategy](https://docs.getdbt.com/docs/build/incremental-microbatch.md) using `event_time` to define time-based ranges for filtering data.

Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block.
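The behavioral differences between `append`, `insert_overwrite`, and `merge` can be sketched in plain Python, using lists of dicts as stand-ins for table rows. This is only a conceptual model of the semantics described in the following sections, not how dbt-spark actually executes anything:

```python
def append(existing, new_rows):
    # append: insert all new records; duplicates are possible
    return existing + new_rows


def insert_overwrite(existing, new_rows, partition_key):
    # insert_overwrite: replace every partition present in the new data
    touched = {row[partition_key] for row in new_rows}
    kept = [row for row in existing if row[partition_key] not in touched]
    return kept + new_rows


def merge(existing, new_rows, unique_key):
    # merge: update rows that match on the key, insert the rest
    by_key = {row[unique_key]: row for row in existing}
    for row in new_rows:
        by_key[row[unique_key]] = row  # matched -> update, unmatched -> insert
    return list(by_key.values())


existing = [{"user_id": 1, "last_seen": "2024-01-01"}]
new = [
    {"user_id": 1, "last_seen": "2024-02-01"},
    {"user_id": 2, "last_seen": "2024-02-01"},
]

print(len(append(existing, new)))            # 3: user_id 1 is duplicated
print(len(merge(existing, new, "user_id")))  # 2: user_id 1 is updated in place
```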
##### The `append` strategy[​](#the-append-strategy "Direct link to the-append-strategy")

Following the `append` strategy, dbt will perform an `insert into` statement with all new data. The appeal of this strategy is that it is straightforward and functional across all platforms, file types, connection methods, and Apache Spark versions. However, this strategy *cannot* update, overwrite, or delete existing data, so it is likely to insert duplicate records for many data sources.

Specifying `append` as the incremental strategy is optional, since it's the default strategy used when none is specified.

* Source code
* Run code

spark\_incremental.sql

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='append',
) }}

--  All rows returned by this query will be appended to the existing table

select * from {{ ref('events') }}
{% if is_incremental() %}
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

spark\_incremental.sql

```sql
create temporary view spark_incremental__dbt_tmp as

    select * from analytics.events

    where event_ts >= (select max(event_ts) from {{ this }})

;

insert into table analytics.spark_incremental
    select `date_day`, `users` from spark_incremental__dbt_tmp
```

##### The `insert_overwrite` strategy[​](#the-insert_overwrite-strategy "Direct link to the-insert_overwrite-strategy")

This strategy is most effective when specified alongside a `partition_by` clause in your model config. dbt will run an [atomic `insert overwrite` statement](https://downloads.apache.org/spark/docs/3.0.0/sql-ref-syntax-dml-insert-overwrite-table.html) that dynamically replaces all partitions included in your query. Be sure to re-select *all* of the relevant data for a partition when using this incremental strategy.

If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however.
This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` + `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead.

**Usage notes:**

* This strategy is not supported for tables with `file_format: delta`.
* This strategy is not available when connecting via Databricks SQL endpoints (`method: odbc` + `endpoint`).
* If connecting via a Databricks cluster + ODBC driver (`method: odbc` + `cluster`), you **must** include `set spark.sql.sources.partitionOverwriteMode DYNAMIC` in the [cluster Spark Config](https://docs.databricks.com/clusters/configure.html#spark-config) in order for dynamic partition replacement to work (`incremental_strategy: insert_overwrite` + `partition_by`).

[![Databricks cluster: Spark Config](/img/reference/databricks-cluster-sparkconfig-partition-overwrite.png?v=2 "Databricks cluster: Spark Config")](#)Databricks cluster: Spark Config

* Source code
* Run code

spark\_incremental.sql

```sql
{{ config(
    materialized='incremental',
    partition_by=['date_day'],
    file_format='parquet',
    incremental_strategy='insert_overwrite'
) }}

/*
  Every partition returned by this query will be overwritten
  when this model runs
*/

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    date_day,
    count(*) as users

from new_events
group by 1
```

spark\_incremental.sql

```sql
create temporary view spark_incremental__dbt_tmp as

    with new_events as (

        select * from analytics.events

        where date_day >= date_add(current_date, -1)

    )

    select
        date_day,
        count(*) as users

    from new_events
    group by 1

;

insert overwrite table analytics.spark_incremental
    partition (date_day)
    select `date_day`, `users` from spark_incremental__dbt_tmp
```

##### The `merge` strategy[​](#the-merge-strategy "Direct link to the-merge-strategy")

**Usage notes:** The `merge` incremental strategy requires:

* `file_format: delta, iceberg or hudi`
* Databricks Runtime 5.1 and above for the Delta file format
* Apache Spark for the Iceberg or Hudi file formats

dbt will run an [atomic `merge` statement](https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html) which looks nearly identical to the default merge behavior on Snowflake and BigQuery. If a `unique_key` is specified (recommended), dbt will update old records with values from new records that match on the key column. If a `unique_key` is not specified, dbt will forgo match criteria and simply insert all new records (similar to the `append` strategy).

* Source code
* Run code

merge\_incremental.sql

```sql
-- file_format can be 'delta', 'iceberg', or 'hudi'
{{ config(
    materialized='incremental',
    file_format='delta',
    unique_key='user_id',
    incremental_strategy='merge'
) }}

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    user_id,
    max(date_day) as last_seen

from new_events
group by 1
```

target/run/merge\_incremental.sql

```sql
create temporary view merge_incremental__dbt_tmp as

    with new_events as (

        select * from analytics.events

        where date_day >= date_add(current_date, -1)

    )

    select
        user_id,
        max(date_day) as last_seen

    from new_events
    group by 1

;

merge into analytics.merge_incremental as DBT_INTERNAL_DEST
    using merge_incremental__dbt_tmp as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.user_id = DBT_INTERNAL_DEST.user_id
    when matched then update set *
    when not matched then insert *
```

#### Persisting model descriptions[​](#persisting-model-descriptions "Direct link to Persisting model descriptions")

Relation-level docs persistence is supported in dbt. For more information on configuring docs persistence, see [the docs](https://docs.getdbt.com/reference/resource-configs/persist_docs.md).
When the `persist_docs` option is configured appropriately, you'll be able to see model descriptions in the `Comment` field of `describe [table] extended` or `show table extended in [database] like '*'`.

#### Always `schema`, never `database`[​](#always-schema-never-database "Direct link to always-schema-never-database")

Apache Spark uses the terms "schema" and "database" interchangeably. dbt understands `database` to exist at a higher level than `schema`. As such, you should *never* use or set `database` as a node config or in the target profile when running dbt-spark.

If you want to control the schema/database in which dbt will materialize models, use the `schema` config and `generate_schema_name` macro *only*.

#### Default file format configurations[​](#default-file-format-configurations "Direct link to Default file format configurations")

To access advanced incremental strategy features, such as [snapshots](https://docs.getdbt.com/docs/build/snapshots.md) and the `merge` incremental strategy, you will want to use the Delta, Iceberg, or Hudi file format as the default file format when materializing models as tables. It's quite convenient to do this by setting a top-level configuration in your project file:

dbt\_project.yml

```yml
models:
  +file_format: delta # or iceberg or hudi

seeds:
  +file_format: delta # or iceberg or hudi

snapshots:
  +file_format: delta # or iceberg or hudi
```

#### Footnotes[​](#footnote-label "Direct link to Footnotes")

1. If you configure `location_root`, dbt specifies a location path in the `create table` statement. This changes the table from "managed" to "external" in Spark/Databricks. [↩](#user-content-fnref-1)
---

### arguments (for functions)

💡Did you know...

Available from dbt v1.11 or with the [dbt "Latest" release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).

functions/\<filename>.yml

```yml
functions:
  - name: <function_name>
    arguments:
      - name: <argument_name>
        data_type: <string> # warehouse-specific
        description: <markdown_string>
        default_value: <value> # optional, available in Snowflake and Postgres
```

#### Definition[​](#definition "Direct link to Definition")

The `arguments` property is used to define the parameters that a resource can accept. Each argument can have a `name`, a `data_type` field, and optional properties such as `description` and `default_value`.

For **functions**, you can add `arguments` to a [function property](https://docs.getdbt.com/reference/function-properties.md), which defines the parameters for user-defined functions (UDFs) in your warehouse. The `data_type` for function arguments is warehouse-specific (for example, `STRING`, `VARCHAR`, `INTEGER`) and should match the data types supported by your data platform.

#### Properties[​](#properties "Direct link to Properties")

##### name[​](#name "Direct link to name")

The name of the argument. This is a required field if `arguments` is specified.

##### data\_type[​](#data_type "Direct link to data_type")

The data type that the warehouse expects for this parameter. This is a required field if `arguments` is specified and must match the data types supported by your specific data platform.

Warehouse-specific data types

The `data_type` values are warehouse-specific. Use the data type syntax that your warehouse requires:

* **Snowflake**: `STRING`, `NUMBER`, `BOOLEAN`, `TIMESTAMP_NTZ`, etc.
* **BigQuery**: `STRING`, `INT64`, `BOOL`, `TIMESTAMP`, `ARRAY`, etc.
* **Redshift**: `VARCHAR`, `INTEGER`, `BOOLEAN`, `TIMESTAMP`, etc.
* **Postgres**: `TEXT`, `INTEGER`, `BOOLEAN`, `TIMESTAMP`, etc.

Refer to your warehouse documentation for the complete list of supported data types.

##### description[​](#description "Direct link to description")

An optional markdown string describing the argument. This is helpful for documentation purposes.

##### default\_value[​](#default_value "Direct link to default_value")

Use the `default_value` property to make a function argument optional.

* When an argument isn't defined with a `default_value`, it becomes a required argument, and you must pass a value for it when you use the function. If a required argument isn't passed, the function call fails.
* Arguments with a `default_value` are optional: if you don't pass a value for the argument, the warehouse uses the value you set in `default_value`.

This property is supported in [Snowflake](https://docs.snowflake.com/en/developer-guide/udf-stored-procedure-arguments#designating-an-argument-as-optional) and [Postgres](https://www.postgresql.org/docs/current/sql-createfunction.html).

When you use `default_value`, the order of your arguments matters. Any required arguments (those without default values) have to come before optional ones. Here's an example with the correct order:

functions/schema.yml

```yml
functions:
  - name: sum_2_values
    description: Add two values together
    arguments:
      - name: val1 # this argument comes first because it has no default value
        data_type: integer
        description: The first value
      - name: val2
        data_type: integer
        description: The second value
        default_value: 0
    returns:
      data_type: integer
```

In this example:

* `val1` has no `default_value`, so it's required.
* `val2` has a `default_value` of `0`, so it's optional. If you don't provide a value for `val2`, the function uses `0` instead.
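The ordering rule works the same way as default parameters in Python, which may be a helpful analogy (this is plain Python, not warehouse SQL):

```python
def sum_2_values(val1, val2=0):
    # val1 is required; val2 falls back to 0 when omitted,
    # mirroring the default_value behavior of the UDF above
    return val1 + val2


print(sum_2_values(5))      # 5: val2 defaults to 0
print(sum_2_values(5, 10))  # 15
# sum_2_values() would raise TypeError, because val1 is required
```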
See the following examples of calling the `sum_2_values` function:

```text
sum_2_values(5)     # val1 = 5, val2 = 0 (default value used since user did not specify val2)
sum_2_values(5, 10) # val1 = 5, val2 = 10
sum_2_values()      # ❌ error: val1 is required and must be passed
```

#### Examples[​](#examples "Direct link to Examples")

##### Simple function arguments[​](#simple-function-arguments "Direct link to Simple function arguments")

functions/schema.yml

```yml
functions:
  - name: is_positive_int
    arguments:
      - name: a_string
        data_type: string
        description: "The string that I want to check if it's representing a positive integer (like '10')"
    returns:
      data_type: boolean
```

##### Complex data types[​](#complex-data-types "Direct link to Complex data types")

functions/schema.yml

```yml
functions:
  - name: calculate_discount
    arguments:
      - name: original_price
        data_type: DECIMAL(10,2)
        description: "The original price before discount"
      - name: discount_percent
        data_type: INTEGER
        description: "The discount percentage to apply"
    returns:
      data_type: DECIMAL(10,2)
      description: "The discounted price"
```

##### Array data types (BigQuery example)[​](#array-data-types-bigquery-example "Direct link to Array data types (BigQuery example)")

functions/schema.yml

```yml
functions:
  - name: get_tags
    arguments:
      - name: tag_string
        data_type: STRING
        description: "Comma-separated string of tags"
    returns:
      data_type: ARRAY<STRING>
      description: "An array of individual tag strings"
```

#### Related documentation[​](#related-documentation "Direct link to Related documentation")

* [Function properties](https://docs.getdbt.com/reference/function-properties.md)
* [Function configurations](https://docs.getdbt.com/reference/function-configs.md)
* [Arguments (for macros)](https://docs.getdbt.com/reference/resource-properties/arguments.md)
* [Returns](https://docs.getdbt.com/reference/resource-properties/returns.md)
--- ### arguments (for macros) macros/\<filename\>.yml ```yml macros: - name: <macro name> arguments: - name: <argument name> type: <string> description: <markdown string> ``` #### Definition[​](#definition "Direct link to Definition") The `arguments` property is used to define the parameters that a resource can accept. Each argument can have a `name`, a `type` field, and an optional `description`. For **macros**, you can add `arguments` to a [macro property](https://docs.getdbt.com/reference/macro-properties.md), which helps in documenting the macro and understanding what inputs it requires. #### type[​](#type "Direct link to type") tip From dbt Core v1.10, you can opt into validating the arguments you define in macro documentation using the `validate_macro_args` behavior change flag. When enabled, dbt will: * Infer arguments from the macro and include them in the [manifest.json](https://docs.getdbt.com/reference/artifacts/manifest-json.md) file if no arguments are documented. * Raise a warning if documented argument names don't match the macro definition. * Raise a warning if `type` fields don't follow [supported formats](https://docs.getdbt.com/reference/resource-properties/arguments.md#supported-types). Learn more about [macro argument validation](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#macro-argument-validation).
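To opt into this validation, enable the flag in your `dbt_project.yml` (the flag name comes from the behavior change documentation linked above):

```yml
# dbt_project.yml
flags:
  validate_macro_args: True
```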
macros/\<filename\>.yml ```yml macros: - name: <macro name> arguments: - name: <argument name> type: <string> ``` ##### Supported types[​](#supported-types "Direct link to Supported types") From dbt Core v1.10, when you use the [`validate_macro_args`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#macro-argument-validation) flag, dbt supports the following types for macro arguments: * `string` or `str` * `boolean` or `bool` * `integer` or `int` * `float` * `any` * `list[<type>]`, for example, `list[string]` * `dict[<key type>, <value type>]`, for example, `dict[str, list[int]]` * `optional[<type>]`, for example, `optional[integer]` * [`relation`](https://docs.getdbt.com/reference/dbt-classes.md#relation) * [`column`](https://docs.getdbt.com/reference/dbt-classes.md#column) Note that the types follow a Python-like style but are used for documentation and validation only. They are not Python types. #### Examples[​](#examples "Direct link to Examples") macros/cents\_to\_dollars.sql ```sql {% macro cents_to_dollars(column_name, scale=2) %} ({{ column_name }} / 100)::numeric(16, {{ scale }}) {% endmacro %} ``` macros/cents\_to\_dollars.yml ```yml macros: - name: cents_to_dollars arguments: - name: column_name type: column description: "The name of a column" - name: scale type: integer description: "The number of decimal places to round to. Default is 2." ``` #### Related documentation[​](#related-documentation "Direct link to Related documentation") * [Macro properties](https://docs.getdbt.com/reference/macro-properties.md) * [Arguments (for functions)](https://docs.getdbt.com/reference/resource-properties/function-arguments.md)
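For context, a model calls the documented `cents_to_dollars` macro like any other macro; the model and column names below are illustrative:

```sql
-- models/orders.sql (illustrative model and column names)
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_dollars,
    {{ cents_to_dollars('tax_cents', scale=4) }} as tax_dollars
from {{ ref('raw_orders') }}
```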
--- ### asset-paths dbt\_project.yml ```yml asset-paths: [directorypath] ``` #### Definition[​](#definition "Direct link to Definition") Optionally specify a custom list of directories to copy to the `target` directory as part of the `docs generate` command. This is useful for rendering images in your repository in your project documentation. #### Default[​](#default "Direct link to Default") By default, dbt will not copy any additional files as part of docs generate. For example, `asset-paths: []`. Paths specified in `asset-paths` must be relative to the location of your `dbt_project.yml` file. Avoid using absolute paths like `/Users/username/project/assets`, as it will lead to unexpected behavior and outcomes. * ✅ **Do** * Use relative path: ```yml asset-paths: ["assets"] ``` * ❌ **Don't** * Avoid absolute paths: ```yml asset-paths: ["/Users/username/project/assets"] ``` #### Examples[​](#examples "Direct link to Examples") ##### Compile files in the `assets` subdirectory as part of `docs generate`[​](#compile-files-in-the-assets-subdirectory-as-part-of-docs-generate "Direct link to compile-files-in-the-assets-subdirectory-as-part-of-docs-generate") dbt\_project.yml ```yml asset-paths: ["assets"] ``` Any files included in this directory will be copied to the `target/` directory as part of `dbt docs generate`, making them accessible as images in your project documentation. Check out the full writeup on including images in your descriptions [here](https://docs.getdbt.com/reference/resource-properties/description.md#include-an-image-from-your-repo-in-your-descriptions).
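Once the `assets` directory is copied by `dbt docs generate`, a model description can embed one of the copied files; the model and file names here are illustrative:

```yml
# models/schema.yml (illustrative model and file names)
models:
  - name: customers
    description: "Customer dimension. ![ERD](assets/erd.png)"
```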
--- ### AWS Glue configurations #### Configuring tables[​](#configuring-tables "Direct link to Configuring tables") When materializing a model as `table`, you may include several optional configs that are specific to the dbt-glue plugin, in addition to the [Apache Spark model configuration](https://docs.getdbt.com/reference/resource-configs/spark-configs.md#configuring-tables). | Option | Description | Required? | Example | | --- | --- | --- | --- | | custom\_location | By default, the adapter will store your data in the following path: `location path`/`database`/`table`. If you don't want to follow that default behavior, you can use this parameter to set your own custom location on S3 | No | `s3://mycustombucket/mycustompath` | #### Incremental models[​](#incremental-models "Direct link to Incremental models") dbt seeks to offer useful, intuitive modeling abstractions by means of its built-in configurations and materializations. For that reason, the dbt-glue plugin leans heavily on the [`incremental_strategy` config](https://docs.getdbt.com/docs/build/incremental-strategy.md). This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of three values: * **`append`**: Insert new records without updating or overwriting any existing data. * **`insert_overwrite`** (default): If `partition_by` is specified, overwrite partitions in the table with new data. If no `partition_by` is specified, overwrite the entire table with new data. * **`merge`** (Apache Hudi only): Match records based on a `unique_key`; update old records, insert new ones.
(If no `unique_key` is specified, all new data is inserted, similar to `append`.) Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block. **Note:** For dbt-glue, the default strategy is **`insert_overwrite`**. ##### The `append` strategy[​](#the-append-strategy "Direct link to the-append-strategy") Following the `append` strategy, dbt will perform an `insert into` statement with all new data. The appeal of this strategy is that it is straightforward and functional across all platforms, file types, connection methods, and Apache Spark versions. However, this strategy *cannot* update, overwrite, or delete existing data, so it is likely to insert duplicate records for many data sources. * Source code * Run code glue\_incremental.sql ```sql {{ config( materialized='incremental', incremental_strategy='append', ) }} -- All rows returned by this query will be appended to the existing table select * from {{ ref('events') }} {% if is_incremental() %} where event_ts > (select max(event_ts) from {{ this }}) {% endif %} ``` glue\_incremental.sql ```sql create view spark_incremental__dbt_tmp as select * from analytics.events where event_ts > (select max(event_ts) from analytics.spark_incremental) ; insert into table analytics.spark_incremental select * from spark_incremental__dbt_tmp ; drop view spark_incremental__dbt_tmp ``` ##### The `insert_overwrite` strategy[​](#the-insert_overwrite-strategy "Direct link to the-insert_overwrite-strategy") This strategy is most effective when specified alongside a `partition_by` clause in your model config. dbt will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/3.1.2/sql-ref-syntax-dml-insert-overwrite-table.html) that dynamically replaces all partitions included in your query.
Be sure to re-select *all* of the relevant data for a partition when using this incremental strategy. If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` + `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead. * Source code * Run code spark\_incremental.sql ```sql {{ config( materialized='incremental', partition_by=['date_day'], file_format='parquet' ) }} /* Every partition returned by this query will be overwritten when this model runs */ with new_events as ( select * from {{ ref('events') }} {% if is_incremental() %} where date_day >= date_add(current_date, -1) {% endif %} ) select date_day, count(*) as users from new_events group by 1 ``` spark\_incremental.sql ```sql create view spark_incremental__dbt_tmp as with new_events as ( select * from analytics.events where date_day >= date_add(current_date, -1) ) select date_day, count(*) as users from new_events group by 1 ; insert overwrite table analytics.spark_incremental partition (date_day) select `date_day`, `users` from spark_incremental__dbt_tmp ; drop view spark_incremental__dbt_tmp ``` Specifying `insert_overwrite` as the incremental strategy is optional, since it's the default strategy used when none is specified. ##### The `merge` strategy[​](#the-merge-strategy "Direct link to the-merge-strategy") **Usage notes:** The `merge` incremental strategy requires: * `file_format: hudi` * AWS Glue runtime 2 with hudi libraries as extra jars You can add hudi libraries as extra jars in the classpath using the `extra_jars` option in your profiles.yml.
Here is an example: ```yml extra_jars: "s3://dbt-glue-hudi/Dependencies/hudi-spark.jar,s3://dbt-glue-hudi/Dependencies/spark-avro_2.11-2.4.4.jar" ``` dbt will run an [atomic `merge` statement](https://hudi.apache.org/docs/writing_data#spark-datasource-writer) which looks nearly identical to the default merge behavior on Snowflake and BigQuery. If a `unique_key` is specified (recommended), dbt will update old records with values from new records that match on the key column. If a `unique_key` is not specified, dbt will forgo match criteria and simply insert all new records (similar to the `append` strategy). * Source code hudi\_incremental.sql ```sql {{ config( materialized='incremental', incremental_strategy='merge', unique_key='user_id', file_format='hudi' ) }} with new_events as ( select * from {{ ref('events') }} {% if is_incremental() %} where date_day >= date_add(current_date, -1) {% endif %} ) select user_id, max(date_day) as last_seen from new_events group by 1 ``` #### Persisting model descriptions[​](#persisting-model-descriptions "Direct link to Persisting model descriptions") Relation-level docs persistence is inherited from dbt-spark. For more details, check [Apache Spark model configuration](https://docs.getdbt.com/reference/resource-configs/spark-configs.md#persisting-model-descriptions). #### Always `schema`, never `database`[​](#always-schema-never-database "Direct link to always-schema-never-database") This section is also inherited from dbt-spark. For more details, check [Apache Spark model configuration](https://docs.getdbt.com/reference/resource-configs/spark-configs.md#always-schema-never-database).
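The "Persisting model descriptions" section above relies on the standard dbt `persist_docs` config; a minimal sketch of enabling it for a single model (the model body is illustrative):

```sql
{{ config(
    materialized='table',
    persist_docs={'relation': true, 'columns': true}
) }}

select * from {{ ref('events') }}
```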
--- ### batch_size 💡Did you know... Available from dbt v1.9 or with the [dbt "Latest" release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md). #### Definition[​](#definition "Direct link to Definition") The `batch_size` config determines how large batches are when running a [microbatch incremental model](https://docs.getdbt.com/docs/build/incremental-microbatch.md). Accepted values are `hour`, `day`, `month`, or `year`. You can configure `batch_size` for a [model](https://docs.getdbt.com/docs/build/models.md) in your project YAML file (`dbt_project.yml`), properties YAML file, or config block. #### Examples[​](#examples "Direct link to Examples") The following examples set `day` as the `batch_size` for the `user_sessions` model. Example of the `batch_size` config in the `dbt_project.yml` file: dbt\_project.yml ```yml models: my_project: user_sessions: +batch_size: day ``` Example in a property file: models/properties.yml ```yml models: - name: user_sessions config: batch_size: day ``` Example in a config block for a model: models/user\_sessions.sql ```sql {{ config( materialized='incremental', batch_size='day' ) }} ``` --- ### begin 💡Did you know... Available from dbt v1.9 or with the [dbt "Latest" release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md). #### Definition[​](#definition "Direct link to Definition") Set the `begin` config to the timestamp value at which your [microbatch incremental model](https://docs.getdbt.com/docs/build/incremental-microbatch.md) data should begin — at the point the data becomes relevant for the microbatch model.
You can configure `begin` for a [model](https://docs.getdbt.com/docs/build/models.md) in your project YAML file (`dbt_project.yml`), properties YAML file, or SQL file config. The value for `begin` must be a string representing an ISO-formatted date, *or* date and time, *or* [relative dates](#set-begin-to-use-relative-dates). Check out the [examples](#examples) in the next section for more details. #### Examples[​](#examples "Direct link to Examples") The following examples set `2024-01-01 00:00:00` as the `begin` config for the `user_sessions` model. ###### Example in the `dbt_project.yml` file[​](#example-in-the-dbt_projectyml-file "Direct link to example-in-the-dbt_projectyml-file") dbt\_project.yml ```yml models: my_project: user_sessions: +begin: "2024-01-01 00:00:00" ``` ###### Example in a property YAML file[​](#example-in-a-property-yaml-file "Direct link to Example in a property YAML file") models/properties.yml ```yml models: - name: user_sessions config: begin: "2024-01-01 00:00:00" ``` ###### Example in a SQL config block for a model[​](#example-in-a-sql-config-block-for-a-model "Direct link to Example in a SQL config block for a model") models/user\_sessions.sql ```sql {{ config( begin='2024-01-01 00:00:00' ) }} ``` ###### Set `begin` to use relative dates[​](#set-begin-to-use-relative-dates "Direct link to set-begin-to-use-relative-dates") To configure `begin` to use relative dates, you can use modules variables [`modules.datetime`](https://docs.getdbt.com/reference/dbt-jinja-functions/modules.md#datetime) and [`modules.pytz`](https://docs.getdbt.com/reference/dbt-jinja-functions/modules.md#pytz) to dynamically specify relative timestamps, such as yesterday's date or the start of the current week. 
For example, to set `begin` to yesterday's date: ```sql {{ config( materialized = 'incremental', incremental_strategy='microbatch', unique_key = 'run_id', begin=(modules.datetime.datetime.now() - modules.datetime.timedelta(1)).isoformat(), event_time='created_at', batch_size='day', ) }} ``` --- ### Behavior changes Behavior change flags let you control when to adopt new runtime behaviors in dbt. They're configured in your dbt\_project.yml file. How this relates to other changes Since behavior change flags are different from other dbt changes, it's important to understand the difference: * [Deprecation warnings](https://docs.getdbt.com/reference/deprecations.md) — Features in your project code that will stop working (behavior flags often control when these become errors) * [Deprecated CLI flags](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md#deprecated-flags) — Command-line flags being removed in dbt Fusion See the [Changes overview](https://docs.getdbt.com/reference/changes-overview.md) for a quick comparison. If you're upgrading to [dbt Fusion](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion.md), all behavior change flags are removed and the new behavior is always enabled. Most flags exist to configure runtime behaviors with multiple valid choices. The right choice may vary based on the environment, user preference, or the specific invocation. Another category of flags provides existing projects with a migration window for runtime behaviors that are changing in newer releases of dbt.
These flags help us achieve a balance between these goals, which can otherwise be in tension, by: * Providing a better, more sensible, and more consistent default behavior for new users/projects. * Providing a migration window for existing users/projects — nothing changes overnight without warning. * Providing maintainability of dbt software. Every fork in behavior requires additional testing & cognitive overhead that slows future development. These flags exist to facilitate migration from "current" to "better," not to stick around forever. These flags go through three phases of development: 1. **Introduction (disabled by default):** dbt adds logic to support both 'old' and 'new' behaviors. The 'new' behavior is gated behind a flag, disabled by default, preserving the old behavior. 2. **Maturity (enabled by default):** The default value of the flag is switched, from `false` to `true`, enabling the new behavior by default. Users can preserve the 'old' behavior and opt out of the 'new' behavior by setting the flag to `false` in their projects. They may see deprecation warnings when they do so. 3. **Removal (generally enabled):** After marking the flag for deprecation, we remove it along with the 'old' behavior it supported from the dbt codebases. We aim to support most flags indefinitely, but we're not committed to supporting them forever. If we choose to remove a flag, we'll offer significant advance notice. #### What is a behavior change?[​](#what-is-a-behavior-change "Direct link to What is a behavior change?") The same dbt project code and the same dbt commands return one result before the behavior change, and they return a different result after the behavior change. Examples of behavior changes: * dbt begins raising a validation *error* that it didn't previously. * dbt changes the signature of a built-in macro. Your project has a custom reimplementation of that macro. 
This could lead to errors, because your custom reimplementation will be passed arguments it cannot accept. * A dbt adapter renames or removes a method that was previously available on the `{{ adapter }}` object in the dbt-Jinja context. * dbt makes a breaking change to contracted metadata artifacts by deleting a required field, changing the name or type of an existing field, or removing the default value of an existing field ([README](https://github.com/dbt-labs/dbt-core/blob/37d382c8e768d1e72acd767e0afdcb1f0dc5e9c5/core/dbt/artifacts/README.md#breaking-changes)). * dbt removes one of the fields from [structured logs](https://docs.getdbt.com/reference/events-logging.md#structured-logging). The following are **not** behavior changes: * Fixing a bug where the previous behavior was defective, undesirable, or undocumented. * dbt begins raising a *warning* that it didn't previously. * dbt updates the language of human-friendly messages in log events. * dbt makes a non-breaking change to contracted metadata artifacts by adding a new field with a default, or deleting a field with a default ([README](https://github.com/dbt-labs/dbt-core/blob/37d382c8e768d1e72acd767e0afdcb1f0dc5e9c5/core/dbt/artifacts/README.md#non-breaking-changes)). The vast majority of changes are not behavior changes. Because introducing these changes does not require any action on the part of users, they are included in continuous releases of dbt and patch releases of dbt Core. By contrast, behavior change migrations happen slowly, over the course of months, facilitated by behavior change flags. The flags are loosely coupled to the specific dbt runtime version. By setting flags, users have control over opting in (and later opting out) of these changes. #### Behavior change flags[​](#behavior-change-flags "Direct link to Behavior change flags") These flags *must* be set in the `flags` dictionary in `dbt_project.yml`. 
They configure behaviors closely tied to project code, which means they should be defined in version control and modified through pull or merge requests, with the same testing and peer review. The following example displays the current flags and their current default values in the latest dbt and dbt Core versions. To opt out of a specific behavior change, set the value of the flag to `False` in `dbt_project.yml`. You will continue to see warnings for legacy behaviors you've opted out of, until you either: * Resolve the issue (by switching the flag to `True`) * Silence the warnings using the `warn_error_options.silence` flag Here's an example of the available behavior change flags with their default values: dbt\_project.yml ```yml flags: require_explicit_package_overrides_for_builtin_materializations: True require_resource_names_without_spaces: True source_freshness_run_project_hooks: True skip_nodes_if_on_run_start_fails: False state_modified_compare_more_unrendered_values: False require_yaml_configuration_for_mf_time_spines: False require_batched_execution_for_custom_microbatch_strategy: False require_nested_cumulative_type_params: False validate_macro_args: False require_all_warnings_handled_by_warn_error: False require_generic_test_arguments_property: True require_unique_project_resource_names: False require_ref_searches_node_package_before_root: False require_valid_schema_from_generate_schema_name: False enable_truthy_nulls_equals_macro: False require_sql_header_in_test_configs: False ``` ###### dbt Core behavior changes[​](#dbt-core-behavior-changes "Direct link to dbt Core behavior changes") This table outlines which month of the **Latest** release track in dbt and which version of dbt Core contains the behavior change's introduction (disabled by default) or maturity (enabled by default).
| Flag | dbt **Latest**: Intro | dbt **Latest**: Maturity | dbt Core: Intro | dbt Core: Maturity | Removed in Fusion | | ----------------------------------------------------------------------------------------------------------------------- | --------------------- | ------------------------ | --------------- | ------------------ | ----------------- | | [require\_explicit\_package\_overrides\_for\_builtin\_materializations](#package-override-for-built-in-materialization) | 2024.04 | 2024.06 | 1.6.14, 1.7.14 | 1.8.0 | ✅ | | [require\_resource\_names\_without\_spaces](#no-spaces-in-resource-names) | 2024.05 | 2025.05 | 1.8.0 | 1.10.0 | ✅ | | [source\_freshness\_run\_project\_hooks](#project-hooks-with-source-freshness) | 2024.03 | 2025.05 | 1.8.0 | 1.10.0 | ✅ | | [skip\_nodes\_if\_on\_run\_start\_fails](#failures-in-on-run-start-hooks) | 2024.10 | TBD\* | 1.9.0 | TBD\* | ✅ | | [state\_modified\_compare\_more\_unrendered\_values](#source-definitions-for-state) | 2024.10 | TBD\* | 1.9.0 | TBD\* | ✅ | | [require\_yaml\_configuration\_for\_mf\_time\_spines](#metricflow-time-spine-yaml) | 2024.10 | TBD\* | 1.9.0 | TBD\* | ✅ | | [require\_batched\_execution\_for\_custom\_microbatch\_strategy](#custom-microbatch-strategy) | 2024.11 | TBD\* | 1.9.0 | TBD\* | ✅ | | [require\_nested\_cumulative\_type\_params](#cumulative-metrics) | 2024.11 | TBD\* | 1.9.0 | TBD\* | - | | [enable\_truthy\_nulls\_equals\_macro](#null-safe-equality) | 2025.02 | TBD\* | 1.9.0 | TBD\* | - | | [validate\_macro\_args](#macro-argument-validation) | 2025.03 | TBD\* | 1.10.0 | TBD\* | - | | [require\_all\_warnings\_handled\_by\_warn\_error](#warn-error-handler-for-all-warnings) | 2025.06 | TBD\* | 1.10.0 | TBD\* | - | | [require\_generic\_test\_arguments\_property](#generic-test-arguments-property) | 2025.07 | 2025.08 | 1.10.5 | 1.10.8 | - | | [require\_unique\_project\_resource\_names](#unique-project-resource-names) | 2025.12 | TBD\* | 1.11.0 | TBD\* | - | | 
[require\_ref\_searches\_node\_package\_before\_root](#package-ref-search-order) | 2025.12 | TBD\* | 1.11.0 | TBD\* | - | | [require\_valid\_schema\_from\_generate\_schema\_name](#valid-schema-from-generate_schema_name) | 2026.1 | TBD\* | 1.12.0a1 | TBD\* | - | | [require\_sql\_header\_in\_test\_configs](#sql_header-in-data-tests) | 2026.3 | TBD\* | 1.12.0 | TBD\* | - | ###### dbt adapter behavior changes[​](#dbt-adapter-behavior-changes "Direct link to dbt adapter behavior changes") This table outlines which version of the dbt adapter contains the behavior change's introduction (disabled by default) or maturity (enabled by default). | Flag | dbt-ADAPTER: Intro | dbt-ADAPTER: Maturity | Removed in Fusion | | --- | --- | --- | --- | | [use\_info\_schema\_for\_columns](https://docs.getdbt.com/reference/global-configs/databricks-changes.md#use-information-schema-for-columns) | Databricks 1.9.0 | TBD | ✅ | | [use\_user\_folder\_for\_python](https://docs.getdbt.com/reference/global-configs/databricks-changes.md#use-users-folder-for-python-model-notebooks) | Databricks 1.9.0 | TBD | ✅ | | [use\_managed\_iceberg](https://docs.getdbt.com/reference/global-configs/databricks-changes.md#use-managed-iceberg) | Databricks 1.11.0 | 1.12.0 | - | | [use\_materialization\_v2](https://docs.getdbt.com/reference/global-configs/databricks-changes.md#use-restructured-materializations) | Databricks 1.10.0 | TBD | - | | [use\_replace\_on\_for\_insert\_overwrite](https://docs.getdbt.com/reference/global-configs/databricks-changes.md#use-replace-on-for-insert_overwrite-strategy) | Databricks 1.11.0 | 1.11.0 | - | |
[redshift\_skip\_autocommit\_transaction\_statements](https://docs.getdbt.com/reference/global-configs/redshift-changes.md#redshift_skip_autocommit_transaction_statements-flag) | Redshift 1.12.0 | TBD | - | | [bigquery\_use\_batch\_source\_freshness](https://docs.getdbt.com/reference/global-configs/bigquery-changes.md#bigquery-use-batch-source-freshness) | BigQuery 1.11.0rc2 | TBD | - | | [bigquery\_reject\_wildcard\_metadata\_source\_freshness](https://docs.getdbt.com/reference/global-configs/bigquery-changes.md#the-bigquery_reject_wildcard_metadata_source_freshness-flag) | BigQuery 1.12.0 | TBD | - | | [snowflake\_default\_transient\_dynamic\_tables](https://docs.getdbt.com/reference/global-configs/snowflake-changes.md#the-snowflake_default_transient_dynamic_tables-flag) | Snowflake 1.12.0 | TBD | - | When the dbt Maturity is "TBD," it means we have not yet determined the exact date when these flags' default values will change. Affected users will see deprecation warnings in the meantime, and they will receive emails providing advance warning ahead of the maturity date. In the meantime, if you are seeing a deprecation warning, you can either: * Migrate your project to support the new behavior, and then set the flag to `True` to stop seeing the warnings. * Set the flag to `False`. You will continue to see warnings, and you will retain the legacy behavior even after the maturity date (when the default value changes). ##### Failures in on-run-start hooks[​](#failures-in-on-run-start-hooks "Direct link to Failures in on-run-start hooks") The flag is `False` by default. Set the `skip_nodes_if_on_run_start_fails` flag to `True` to skip all selected resources from running if there is a failure on an `on-run-start` hook.
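Opting in is a one-line addition to the `flags` dictionary shown earlier:

```yml
# dbt_project.yml
flags:
  skip_nodes_if_on_run_start_fails: True
```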
##### Source definitions for state:modified[​](#source-definitions-for-state "Direct link to source-definitions-for-state") info You need to build the state directory using dbt v1.9 or higher, or [the dbt "Latest" release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md), and you need to set `state_modified_compare_more_unrendered_values` to `true` within your dbt\_project.yml. If the state directory was built with an older dbt version or if the `state_modified_compare_more_unrendered_values` behavior change flag was either not set or set to `false`, you need to rebuild the state directory to avoid false positives during state comparison with `state:modified`. The flag is `False` by default. Set `state_modified_compare_more_unrendered_values` to `True` to reduce false positives during `state:modified` checks (especially when configs differ by target environment like `prod` vs. `dev`). Setting the flag to `True` changes the `state:modified` comparison from using rendered values to unrendered values instead. It accomplishes this by persisting `unrendered_config` during model parsing and `unrendered_database` and `unrendered_schema` configs during source parsing. ##### Package override for built-in materialization[​](#package-override-for-built-in-materialization "Direct link to Package override for built-in materialization") Setting the `require_explicit_package_overrides_for_builtin_materializations` flag to `True` prevents this automatic override. We have deprecated the behavior where installed packages could override built-in materializations without your explicit opt-in. When this flag is set to `True`, a materialization defined in a package that matches the name of a built-in materialization will no longer be included in the search and resolution order. Unlike macros, materializations don't use the `search_order` defined in the project `dispatch` config. 
The built-in materializations are `'view'`, `'table'`, `'incremental'`, `'materialized_view'` for models as well as `'test'`, `'unit'`, `'snapshot'`, `'seed'`, and `'clone'`. You can still explicitly override built-in materializations, in favor of a materialization defined in a package, by reimplementing the built-in materialization in your root project and wrapping the package implementation. macros/materialization\_view.sql ```sql {% materialization view, snowflake %} {{ return(my_installed_package_name.materialization_view_snowflake()) }} {% endmaterialization %} ``` In the future, we may extend the project-level [`dispatch` configuration](https://docs.getdbt.com/reference/project-configs/dispatch-config.md) to support a list of authorized packages for overriding built-in materialization. ##### No spaces in resource names[​](#no-spaces-in-resource-names "Direct link to No spaces in resource names") The `require_resource_names_without_spaces` flag enforces using resource names without spaces. The names of dbt resources (for example, models) should contain letters, numbers, and underscores. We highly discourage the use of other characters, especially spaces. To that end, we have deprecated support for spaces in resource names. When the `require_resource_names_without_spaces` flag is set to `True`, dbt will raise an exception (instead of a deprecation warning) if it detects a space in a resource name. models/model name with spaces.sql ```sql -- This model file should be renamed to model_name_with_underscores.sql ``` ##### Project hooks with source freshness[​](#project-hooks-with-source-freshness "Direct link to Project hooks with source freshness") Set the `source_freshness_run_project_hooks` flag to include/exclude "project hooks" ([`on-run-start` / `on-run-end`](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md)) in the `dbt source freshness` command execution. The flag is set to `True` (include) by default. 
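If you instead want to exclude project hooks from `dbt source freshness` runs entirely, one option is to set the flag to `False` in `dbt_project.yml`; a minimal sketch:

```yaml
flags:
  source_freshness_run_project_hooks: false
```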
If you have specific project [`on-run-start` / `on-run-end`](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md) hooks that should not run before/after the `source freshness` command, you can add a conditional check to those hooks:

dbt\_project.yml

```yaml
on-run-start:
  - "{{ ... if flags.WHICH != 'freshness' }}"
```

##### MetricFlow time spine YAML[​](#metricflow-time-spine-yaml "Direct link to MetricFlow time spine YAML")

The `require_yaml_configuration_for_mf_time_spines` flag is set to `False` by default. In previous versions (dbt Core 1.8 and earlier), the MetricFlow time spine configuration was stored in a `metricflow_time_spine.sql` file. When the flag is set to `False`, dbt will continue to support the SQL file configuration. When the flag is set to `True`, dbt will raise a deprecation warning if it detects a MetricFlow time spine configured in a config block in a SQL file. The MetricFlow properties YAML file should have the `time_spine:` field. Refer to [MetricFlow timespine](https://docs.getdbt.com/docs/build/metricflow-time-spine.md) for more details.

##### Custom microbatch strategy[​](#custom-microbatch-strategy "Direct link to Custom microbatch strategy")

The `require_batched_execution_for_custom_microbatch_strategy` flag is set to `False` by default and is only relevant if you already have a custom microbatch macro in your project. If you don't have a custom microbatch macro, you don't need to set this flag, as dbt will handle microbatching automatically for any model using the [microbatch strategy](https://docs.getdbt.com/docs/build/incremental-microbatch.md#how-microbatch-compares-to-other-incremental-strategies).

Set the flag to `True` if you have a custom microbatch macro set up in your project. When the flag is set to `True`, dbt will execute the custom microbatch strategy in batches. If you have a custom microbatch macro and the flag is left as `False`, dbt will issue a deprecation warning.
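Opting in looks like the following in `dbt_project.yml` (a minimal sketch):

```yaml
flags:
  require_batched_execution_for_custom_microbatch_strategy: true
```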
Previously, users needed to set the `DBT_EXPERIMENTAL_MICROBATCH` environment variable to `True` to prevent unintended interactions with existing custom incremental strategies. This is no longer necessary, as setting `DBT_EXPERIMENTAL_MICROBATCH` will no longer have an effect on runtime functionality.

##### Cumulative metrics[​](#cumulative-metrics "Direct link to Cumulative metrics")

[Cumulative-type metrics](https://docs.getdbt.com/docs/build/cumulative.md#parameters) are nested under the `cumulative_type_params` field in [the dbt **Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md), dbt Core v1.9 and newer. Currently, dbt will warn users if they have cumulative metrics improperly nested. To enforce the new format (resulting in an error instead of a warning), set `require_nested_cumulative_type_params` to `True`.

As an example, consider the following metric configured with the pre-v1.9 syntax:

```yaml
type: cumulative
type_params:
  measure: order_count
  window: 7 days
```

If you run `dbt parse` with that syntax on Core v1.9 or [the dbt **Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md), you will receive a warning like:

```bash
15:36:22 [WARNING]: Cumulative fields `type_params.window` and `type_params.grain_to_date` has been moved and will soon be deprecated. Please nest those values under `type_params.cumulative_type_params.window` and `type_params.cumulative_type_params.grain_to_date`. See documentation on behavior changes: https://docs.getdbt.com/reference/global-configs/behavior-changes
```

If you set `require_nested_cumulative_type_params` to `True` and re-run `dbt parse`, you will now receive an error like:

```bash
21:39:18 Cumulative fields `type_params.window` and `type_params.grain_to_date` should be nested under `type_params.cumulative_type_params.window` and `type_params.cumulative_type_params.grain_to_date`. Invalid metrics: orders_last_7_days.
See documentation on behavior changes: https://docs.getdbt.com/reference/global-configs/behavior-changes. ``` Once the metric is updated, it will work as expected: ```yaml type: cumulative type_params: measure: name: order_count cumulative_type_params: window: 7 days ``` ##### Null-safe equality (equals macro)[​](#null-safe-equality "Direct link to Null-safe equality (equals macro)") The `enable_truthy_nulls_equals_macro` flag is `False` by default. Setting it to `True` in your `dbt_project.yml` enables null-safe equality in the dbt [equals](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros.md#equals) macro, which is used in incremental and snapshot materializations. By default, the `equals()` macro follows SQL's [three-valued logic (3VL)](https://modern-sql.com/concept/three-valued-logic), so `NULL = NULL` evaluates to `UNKNOWN` rather than `TRUE`. When the `enable_truthy_nulls_equals_macro` flag is enabled, the `equals()` macro uses the semantics of the [`IS NOT DISTINCT FROM`](https://modern-sql.com/feature/is-distinct-from) operator with two `NULL` values treated as equal. To enable the flag, add it under `flags` in `dbt_project.yml`: dbt\_project.yml ```yml flags: enable_truthy_nulls_equals_macro: true ``` ##### Macro argument validation[​](#macro-argument-validation "Direct link to Macro argument validation") dbt supports optional validation for macro arguments using the `validate_macro_args` flag. By default, the `validate_macro_args` flag is set to `False`, which means that dbt won't validate the names or types of documented macro arguments. In the past, dbt didn't enforce a standard vocabulary for the [`type`](https://docs.getdbt.com/reference/resource-properties/arguments.md#type) field on macro arguments in YAML. 
Because of this, the `type` field was used for documentation only, and dbt didn't check that:

* the argument names matched those in your macro
* the argument types were valid or consistent with the macro's Jinja definition

Here's an example of a documented macro (with placeholder values):

macros/filename.yml

```yaml
macros:
  - name: <macro_name>
    arguments:
      - name: <argument_name>
        type: <argument_type>
```

When you set the `validate_macro_args` flag to `True`, dbt will:

* Validate macro arguments during project parsing.
* Check that all argument names in your YAML match those in the macro definition.
* Raise warnings if the names or types don't match.
* Validate that the [`type` values follow the supported format](https://docs.getdbt.com/reference/resource-properties/arguments.md#supported-types).
* Infer arguments from the macro if none are documented in the YAML, and include them in the [`manifest.json` file](https://docs.getdbt.com/reference/artifacts/manifest-json.md).

When does validation occur? Macro argument validation runs during project parsing, not during macro execution. Any dbt command that parses the project will trigger validation if you enable the `validate_macro_args` flag.

* In dbt Core:
  * Validation runs as part of parsing for most commands (`parse`, `build`, `run`, `test`, `seed`, `snapshot`, `compile`).
  * With a full parse, dbt validates all macros.
  * With partial parsing (the default), dbt validates only macros affected by changed files.
  * Use `--no-partial-parse` to force validation of all macros.

dbt Fusion engine will support macro argument validation in a future release.

##### Warn-error handler for all warnings[​](#warn-error-handler-for-all-warnings "Direct link to Warn-error handler for all warnings")

By default, the `require_all_warnings_handled_by_warn_error` flag is set to `False`. When you set `require_all_warnings_handled_by_warn_error` to `True`, all warnings raised during a run are routed through the `--warn-error` / `--warn-error-options` handler.
This ensures consistent behavior when promoting warnings to errors or silencing them. When the flag is `False`, only some warnings are processed by the handler while others may bypass it. Note that enabling this for projects that use `--warn-error` (or `--warn-error-options='{"error":"all"}'`) may cause builds to fail on warnings that were previously ignored. We recommend enabling it gradually.  Recommended steps to enable the flag We recommend the following rollout plan when setting the `require_all_warnings_handled_by_warn_error` flag to `True`: 1. Run a full build without partial parsing to surface parse-time warnings, and confirm it finishes successfully: ```bash dbt build --no-partial-parse ``` * Some warnings are only emitted at parse time. * If the build fails because warnings are already treated as errors (via `--warn-error` or `--warn-error-options`), fix those first and re-run. 2. Review the logs: * If you have any warnings at this point, it means they weren't handled by `--warn-error`/`--warn-error-options`. Continue to the next step. * If there are no warnings, enable the flag in all environments and that's it! 3. Enable `require_all_warnings_handled_by_warn_error` in your development environment and fix any warnings that now surface as errors. 4. Enable the flag in your CI environment (if you have one) and ensure builds pass. 5. Enable the flag in your production environment. ##### Generic test arguments property[​](#generic-test-arguments-property "Direct link to Generic test arguments property") dbt supports parsing key-value arguments that are inputs to generic tests when specified under the `arguments` property. In the past, dbt didn't support a way to clearly disambiguate between properties that were inputs to generic tests and framework configurations, and only accepted arguments as top-level properties. In **Latest**, the `require_generic_test_arguments_property` flag is set to `True` by default. 
In dbt Core versions prior to 1.10.8, the default value is `False`. Using the `arguments` property in test definitions is optional in either case. If you do use `arguments` while the flag is `False`, dbt will recognize it but raise the `ArgumentsPropertyInGenericTestDeprecation` warning. This warning lets you know that the flag will eventually default to `True` across all releases, at which point the key-value pairs under `arguments` will be parsed as keyword arguments to the data test.

Here's an example using the new `arguments` property:

model.yml

```yaml
models:
  - name: my_model_with_generic_test
    data_tests:
      - dbt_utils.expression_is_true:
          arguments:
            expression: "order_items_subtotal = subtotal"
```

Here's an example using the alternative `test_name` format:

model.yml

```yaml
models:
  - name: my_model_with_generic_test
    data_tests:
      - name: arbitrary_name
        test_name: dbt_utils.expression_is_true
        arguments:
          expression: "order_items_subtotal = subtotal"
        config:
          where: "1=1"
```

When you set the `require_generic_test_arguments_property` flag to `True`, dbt will:

* Parse any key-value pairs under `arguments` in generic tests as inputs to the generic test macro.
* Raise a `MissingArgumentsPropertyInGenericTestDeprecation` warning if additional non-config arguments are specified outside of the `arguments` property.

##### Unique project resource names[​](#unique-project-resource-names "Direct link to Unique project resource names")

The `require_unique_project_resource_names` flag enforces uniqueness of resource names within the same package. dbt resources such as models, seeds, snapshots, analyses, tests, and functions share a common namespace. When two resources in the same package have the same name, dbt must decide which one a `ref()` or `source()` refers to. Previously, this check was not always enforced, which meant duplicate names could result in dbt referencing the wrong resource. The `require_unique_project_resource_names` flag is set to `False` by default.
With this setting, if two unversioned resources in the same package share the same name, dbt continues to run and raises a [`DuplicateNameDistinctNodeTypesDeprecation`](https://docs.getdbt.com/reference/deprecations.md#duplicatenamedistinctnodetypesdeprecation) warning. When set to `True`, dbt raises a `DuplicateResourceNameError` error. For example, if your project contains a model and a seed named `sales`: ```text models/sales.sql seeds/sales.csv ``` And a model contains: ```sql select * from {{ ref('sales') }} ``` When the flag is set to `True`, dbt will raise: ```text DuplicateResourceNameError: Found resources with the same name 'sales' in package 'project': 'model.project.sales' and 'seed.project.sales'. Please update one of the resources to have a unique name. ``` When this error is raised, you should rename one of the resources, or refactor the project structure to avoid name conflicts. ##### Package `ref` search order[​](#package-ref-search-order "Direct link to package-ref-search-order") The `require_ref_searches_node_package_before_root` flag controls the search order when dbt resolves `ref()` calls defined within a package. The flag is set to `False` by default in **Latest** and dbt Core v1.11. When dbt resolves a `ref()` in a package model, it searches for the referenced model in the root project *first*, then in the package where the model is defined. For example, the following model in the package `my_package` is imported by the project `my_project`: my\_package/model\_downstream.sql ```sql select * from {{ ref('model_upstream') }} ``` By default, dbt searches for `model_upstream` in this order: 1. First in `my_project` (root project) 2. Then in `my_package` (where the model is defined) When you set the `require_ref_searches_node_package_before_root` flag to `True`, dbt searches the package where the model is defined *before* searching the root project. Using the same example, dbt searches for `model_upstream` in this order: 1. 
First in `my_package` (where the model is defined) 2. Then in `my_project` (root project) The current default behavior is considered a [bug in dbt-core](https://github.com/dbt-labs/dbt-core/issues/11351) because it can *potentially* lead to unexpected dependency cycles. However, because this is long-standing behavior, changing the default requires setting `require_ref_searches_node_package_before_root` to `True` to avoid breaking existing projects. ##### Valid schema from `generate_schema_name`[​](#valid-schema-from-generate_schema_name "Direct link to valid-schema-from-generate_schema_name") The `generate_schema_name` macro determines the schema where dbt creates models and other resources. Returning a `null` value from this macro can result in invalid schema names and lead to unpredictable behavior during dbt runs. The `require_valid_schema_from_generate_schema_name` behavior flag is set to `False` by default. When `False`, dbt raises the [`GenerateSchemaNameNullValueDeprecation`](https://docs.getdbt.com/reference/deprecations.md#generateschemanamenullvaluedeprecation) warning when a custom `generate_schema_name` macro returns a `null` value. When `require_valid_schema_from_generate_schema_name` is set to `True`, dbt enforces stricter validation and raises a parsing error. For example, if your project has a custom `generate_schema_name` macro that returns `null`: macros/get\_custom\_schema.sql ```sql {% macro generate_schema_name(custom_schema_name, node) -%} {%- if custom_schema_name is none -%} {{ return(none) }} {%- else -%} {{ custom_schema_name | trim }} {%- endif -%} {%- endmacro %} ``` With the default behavior, dbt raises a deprecation warning. When `require_valid_schema_from_generate_schema_name` is set to `True`, dbt raises an error. 
To resolve this, update your macro to return a valid schema name (`target.schema` in this example):

macros/get\_custom\_schema.sql

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ return(target.schema) }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

##### `sql_header` in data tests[​](#sql_header-in-data-tests "Direct link to sql_header-in-data-tests")

Set the `require_sql_header_in_test_configs` flag to `True` to enable support for the [`sql_header`](https://docs.getdbt.com/reference/resource-configs/sql_header.md) config for generic data tests. When enabled, you can set `sql_header` in the `config` of a generic data test at the model or column level in your `properties.yml` file.

You can use `sql_header` to define SQL that should run before the test executes (for example, to create temporary functions, to set session parameters, or to declare variables required by the test query). dbt runs this SQL before executing the test. For example:

models/properties.yml

```yaml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - not_null:
              name: not_null_orders_order_id
              config:
                sql_header: "-- SQL_HEADER_TEST_MARKER"
```

For more information, refer to [Data test configurations](https://docs.getdbt.com/reference/data-test-configs.md).

---

### BigQuery adapter behavior changes

#### The `bigquery_use_batch_source_freshness` flag[​](#the-bigquery_use_batch_source_freshness-flag "Direct link to the-bigquery_use_batch_source_freshness-flag")

The `bigquery_use_batch_source_freshness` flag is `False` by default.
Setting it to `True` in your `dbt_project.yml` file enables dbt to compute `source freshness` results with a single batched query to BigQuery's [`INFORMATION_SCHEMA.TABLE_STORAGE`](https://cloud.google.com/bigquery/docs/information-schema-table-storage) view, as opposed to sending a metadata request for each source. Setting this flag to `True` significantly improves the performance of the `source freshness` command, especially when a project contains a large number (1,000+) of sources.

---

### BigQuery configurations

#### Use `project` and `dataset` in configurations[​](#use-project-and-dataset-in-configurations "Direct link to use-project-and-dataset-in-configurations")

* `schema` is interchangeable with the BigQuery concept `dataset`
* `database` is interchangeable with the BigQuery concept of `project`

For our reference documentation, you can declare `project` in place of `database`. This will allow you to read and write from multiple BigQuery projects. The same applies to `dataset` in place of `schema`.

#### Using table partitioning and clustering[​](#using-table-partitioning-and-clustering "Direct link to Using table partitioning and clustering")

##### Partition clause[​](#partition-clause "Direct link to Partition clause")

BigQuery supports the use of a [partition by](https://cloud.google.com/bigquery/docs/data-definition-language#specifying_table_partitioning_options) clause to easily partition a table by a column or expression. This option can help decrease latency and cost when querying large tables.
Note that partition pruning [only works](https://cloud.google.com/bigquery/docs/querying-partitioned-tables#use_a_constant_filter_expression) when partitions are filtered using literal values (so selecting partitions using a subquery won't improve performance).

The `partition_by` config can be supplied as a dictionary with the following format (placeholder values shown in angle brackets):

```python
{
  "field": "<field name>",
  "data_type": "<data type>",
  "granularity": "<granularity>",

  # Only required if data_type is "int64"
  "range": {
    "start": <int>,
    "end": <int>,
    "interval": <int>
  }
}
```

###### Partitioning by a date or timestamp[​](#partitioning-by-a-date-or-timestamp "Direct link to Partitioning by a date or timestamp")

When using a `datetime` or `timestamp` column to partition data, you can create partitions with a granularity of hour, day, month, or year. A `date` column supports granularity of day, month, and year. Daily partitioning is the default for all column types. If the `data_type` is specified as a `date` and the granularity is day, dbt will supply the field as-is when configuring table partitioning.

* Source code
* Compiled code

bigquery\_table.sql

```sql
{{ config(
    materialized='table',
    partition_by={
      "field": "created_at",
      "data_type": "timestamp",
      "granularity": "day"
    }
)}}

select
  user_id,
  event_name,
  created_at

from {{ ref('events') }}
```

bigquery\_table.sql

```sql
create table `projectname`.`analytics`.`bigquery_table`
partition by timestamp_trunc(created_at, day)
as (

  select
    user_id,
    event_name,
    created_at

  from `analytics`.`events`

)
```

###### Partitioning by an "ingestion" date or timestamp[​](#partitioning-by-an-ingestion-date-or-timestamp "Direct link to Partitioning by an \"ingestion\" date or timestamp")

BigQuery supports an [older mechanism of partitioning](https://cloud.google.com/bigquery/docs/partitioned-tables#ingestion_time) based on the time when each row was ingested.
While we recommend using the newer and more ergonomic approach to partitioning whenever possible, for very large datasets, there can be some performance improvements to using this older, more mechanistic approach. [Read more about the `insert_overwrite` incremental strategy below](#copying-ingestion-time-partitions). dbt will always instruct BigQuery to partition your table by the values of the column specified in `partition_by.field`. By configuring your model with `partition_by.time_ingestion_partitioning` set to `True`, dbt will use that column as the input to a `_PARTITIONTIME` pseudocolumn. Unlike with newer column-based partitioning, you must ensure that the values of your partitioning column match exactly the time-based granularity of your partitions. * Source code * Compiled code bigquery\_table.sql ```sql {{ config( materialized="incremental", partition_by={ "field": "created_date", "data_type": "timestamp", "granularity": "day", "time_ingestion_partitioning": true } ) }} select user_id, event_name, created_at, -- values of this column must match the data type + granularity defined above timestamp_trunc(created_at, day) as created_date from {{ ref('events') }} ``` bigquery\_table.sql ```sql create table `projectname`.`analytics`.`bigquery_table` (`user_id` INT64, `event_name` STRING, `created_at` TIMESTAMP) partition by timestamp_trunc(_PARTITIONTIME, day); insert into `projectname`.`analytics`.`bigquery_table` (_partitiontime, `user_id`, `event_name`, `created_at`) select created_date as _partitiontime, * EXCEPT(created_date) from ( select user_id, event_name, created_at, -- values of this column must match granularity defined above timestamp_trunc(created_at, day) as created_date from `projectname`.`analytics`.`events` ); ``` ###### Partitioning with integer buckets[​](#partitioning-with-integer-buckets "Direct link to Partitioning with integer buckets") If the `data_type` is specified as `int64`, then a `range` key must also be provided in the 
`partition_by` dict. dbt will use the values provided in the `range` dict to generate the partitioning clause for the table.

* Source code
* Compiled code

bigquery\_table.sql

```sql
{{ config(
    materialized='table',
    partition_by={
      "field": "user_id",
      "data_type": "int64",
      "range": {
        "start": 0,
        "end": 100,
        "interval": 10
      }
    }
)}}

select
  user_id,
  event_name,
  created_at

from {{ ref('events') }}
```

bigquery\_table.sql

```sql
create table analytics.bigquery_table
partition by range_bucket(
  user_id,
  generate_array(0, 100, 10)
)
as (

  select
    user_id,
    event_name,
    created_at

  from analytics.events

)
```

###### Additional partition configs[​](#additional-partition-configs "Direct link to Additional partition configs")

If your model has `partition_by` configured, you may optionally specify two additional configurations:

* `require_partition_filter` (boolean): If set to `true`, anyone querying this model *must* specify a partition filter, otherwise their query will fail. This is recommended for very large tables with obvious partitioning schemes, such as event streams grouped by day. Note that this will affect other dbt models or tests that try to select from this model, too.
* `partition_expiration_days` (integer): If set for date- or timestamp-type partitions, the partition will expire that many days after the date it represents. For example, a partition representing `2021-01-01`, set to expire after 7 days, will no longer be queryable as of `2021-01-08`; its storage costs will be zeroed out, and its contents will eventually be deleted. Note that [table expiration](#controlling-table-expiration) will take precedence if specified.
bigquery\_table.sql ```sql {{ config( materialized = 'table', partition_by = { "field": "created_at", "data_type": "timestamp", "granularity": "day" }, require_partition_filter = true, partition_expiration_days = 7 )}} ``` ##### Clustering clause[​](#clustering-clause "Direct link to Clustering clause") BigQuery tables can be [clustered](https://cloud.google.com/bigquery/docs/clustered-tables) to colocate related data. Clustering on a single column: bigquery\_table.sql ```sql {{ config( materialized = "table", cluster_by = "order_id", ) }} select * from ... ``` Clustering on multiple columns: bigquery\_table.sql ```sql {{ config( materialized = "table", cluster_by = ["customer_id", "order_id"], ) }} select * from ... ``` #### Managing KMS encryption[​](#managing-kms-encryption "Direct link to Managing KMS encryption") [Customer managed encryption keys](https://cloud.google.com/bigquery/docs/customer-managed-encryption) can be configured for BigQuery tables using the `kms_key_name` model configuration. ##### Using KMS encryption[​](#using-kms-encryption "Direct link to Using KMS encryption") To specify the KMS key name for a model (or a group of models), use the `kms_key_name` model configuration. The following example sets the `kms_key_name` for all of the models in the `encrypted/` directory of your dbt project. dbt\_project.yml ```yaml name: my_project version: 1.0.0 ... models: my_project: encrypted: +kms_key_name: 'projects/PROJECT_ID/locations/global/keyRings/test/cryptoKeys/quickstart' ``` #### Labels and tags[​](#labels-and-tags "Direct link to Labels and tags") ##### Specifying labels[​](#specifying-labels "Direct link to Specifying labels") dbt supports the specification of BigQuery labels for the tables and views that it creates. These labels can be specified using the `labels` model config. The `labels` config can be provided in a model config, or in the `dbt_project.yml` file, as shown below. 
BigQuery key-value pair entries for labels larger than 63 characters are truncated.

**Configuring labels in a model file**

model.sql

```sql
{{
  config(
    materialized = "table",
    labels = {'contains_pii': 'yes', 'contains_pie': 'no'}
  )
}}

select * from {{ ref('another_model') }}
```

**Configuring labels in dbt\_project.yml**

dbt\_project.yml

```yaml
models:
  my_project:
    snowplow:
      +labels:
        domain: clickstream
    finance:
      +labels:
        domain: finance
```

![Viewing labels in the BigQuery console](/img/docs/building-a-dbt-project/building-models/73eaa8a-Screen_Shot_2020-01-20_at_12.12.54_PM.png?v=2 "Viewing labels in the BigQuery console")

##### Applying labels to jobs[​](#applying-labels-to-jobs "Direct link to Applying labels to jobs")

While the `labels` configuration applies labels to the tables and views created by dbt, you can also apply labels to the BigQuery *jobs* that dbt runs. Job labels are useful for tracking query costs, monitoring job performance, and organizing your BigQuery job history by dbt metadata. By default, labels are not applied to jobs directly. However, you can enable job labeling through query comments by following these steps:

###### Step 1[​](#step-1 "Direct link to Step 1")

Define the `query_comment` macro to add labels to your queries via the query comment:

```sql
-- macros/query_comment.sql

{% macro query_comment(node) %}
    {%- set comment_dict = {} -%}
    {%- do comment_dict.update(
        app='dbt',
        dbt_version=dbt_version,
        profile_name=target.get('profile_name'),
        target_name=target.get('target_name'),
    ) -%}
    {%- if node is not none -%}
      {%- do comment_dict.update(node.config.get("labels", {})) -%}
    {% else %}
      {%- do comment_dict.update(node_id='internal') -%}
    {%- endif -%}
    {% do return(tojson(comment_dict)) %}
{% endmacro %}
```

This macro creates a JSON comment containing dbt metadata (app, version, profile, target) and merges in any model-specific labels you've configured.
###### Step 2[​](#step-2 "Direct link to Step 2") Enable job labeling in your `dbt_project.yml` by setting `comment: "{{ query_comment(node) }}"` and `job-label: true` in the `query-comment` configuration: ```yaml # dbt_project.yml name: analytics profile: bq version: "1.0.0" models: analytics: +materialized: table query-comment: comment: "{{ query_comment(node) }}" job-label: true ``` When enabled, BigQuery will parse the JSON comment and apply the key-value pairs as labels to each job. You can then filter and analyze jobs in the BigQuery console or via the INFORMATION\_SCHEMA.JOBS view using these labels. ##### Specifying tags[​](#specifying-tags "Direct link to Specifying tags") BigQuery table and view *tags* can be created by supplying an empty string for the label value. model.sql ```sql {{ config( materialized = "table", labels = {'contains_pii': ''} ) }} select * from {{ ref('another_model') }} ``` You can create a new label with no value or remove a value from an existing label key. A label with a key that has an empty value can also be referred to as a [tag](https://cloud.google.com/bigquery/docs/adding-labels#adding_a_label_without_a_value) in BigQuery. However, this is different from a [BigQuery tag](https://cloud.google.com/bigquery/docs/tags), which conditionally applies IAM policies to BigQuery tables and datasets. For more information, see the [Tags documentation](https://cloud.google.com/resource-manager/docs/tags/tags-overview). ##### Resource tags[​](#resource-tags "Direct link to Resource tags") [BigQuery tags](https://cloud.google.com/bigquery/docs/tags) enable conditional IAM access control for BigQuery tables and views. You can apply these BigQuery tags using the `resource_tags` configuration. This section contains guidelines for using the `resource_tags` configuration parameter. Resource tags are key-value pairs that must follow BigQuery's tag format: `{google_cloud_project_id}/{key_name}: value`. 
Unlike labels, BigQuery tags are primarily designed for IAM access control using conditional policies, allowing organizations to:

* **Implement conditional access control**: Apply IAM policies conditionally based on BigQuery tags (for example, granting access only to tables tagged with `environment:production`).
* **Enforce data governance**: Use BigQuery tags with IAM policies to protect sensitive data.
* **Control access at scale**: Manage access patterns consistently across different projects and environments.

###### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* [Create tag keys and values](https://cloud.google.com/bigquery/docs/tags#create_tag_keys_and_values) in advance before using them in dbt.
* Grant the [required IAM permissions](https://cloud.google.com/bigquery/docs/tags#required_permissions) to apply tags to resources.

###### Configuring tags in a model file[​](#configuring-tags-in-a-model-file "Direct link to Configuring tags in a model file")

To configure tags in a model file, refer to the following example:

model.sql

```sql
{{
  config(
    materialized = "table",
    resource_tags = {
      "my-project-id/environment": "production",
      "my-project-id/data_classification": "sensitive",
      "my-project-id/access_level": "restricted"
    }
  )
}}

select * from {{ ref('another_model') }}
```

###### Configuring tags in `dbt_project.yml`[​](#configuring-tags-in-dbt_projectyml "Direct link to configuring-tags-in-dbt_projectyml")

To configure tags in a `dbt_project.yml` file, refer to the following example:

dbt\_project.yml

```yaml
models:
  my_project:
    production:
      +resource_tags:
        my-project-id/environment: production
        my-project-id/data_classification: sensitive
    staging:
      +resource_tags:
        my-project-id/environment: staging
        my-project-id/data_classification: internal
```

###### Using both dbt tags and BigQuery tags[​](#using-both-dbt-tags-and-bigquery-tags "Direct link to Using both dbt tags and BigQuery tags")

You can use dbt's existing `tags` configuration alongside
BigQuery's `resource_tags`:

model.sql

```sql
-- dbt `tags` organize models internally; BigQuery `resource_tags` drive IAM access control
{{
  config(
    materialized = "materialized_view",
    tags = ["reporting", "daily"],
    resource_tags = {
      "my-project-id/environment": "production",
      "my-project-id/data_classification": "sensitive"
    }
  )
}}

select * from {{ ref('my_table') }}
```

For more information on setting up IAM conditional policies with BigQuery tags, see BigQuery's documentation on [tags](https://cloud.google.com/bigquery/docs/tags).

##### Policy tags[​](#policy-tags "Direct link to Policy tags")

BigQuery enables [column-level security](https://cloud.google.com/bigquery/docs/column-level-security-intro) by setting [policy tags](https://cloud.google.com/bigquery/docs/best-practices-policy-tags) on specific columns. dbt enables this feature as a column resource property, `policy_tags` (*not* a node config).

models/\<filename\>.yml

```yaml
models:
  - name: policy_tag_table
    columns:
      - name: field
        policy_tags:
          - 'projects/<project-id>/locations/<location>/taxonomies/<taxonomy-id>/policyTags/<policy-tag-id>'
```

Note that for policy tags to take effect, [column-level `persist_docs`](https://docs.getdbt.com/reference/resource-configs/persist_docs.md) must be enabled for the model, seed, or snapshot. Consider using [variables](https://docs.getdbt.com/docs/build/project-variables.md) to manage taxonomies and make sure to add the required security [roles](https://cloud.google.com/bigquery/docs/column-level-security-intro#roles) to your BigQuery service account key.

#### Merge behavior (incremental models)[​](#merge-behavior-incremental-models "Direct link to Merge behavior (incremental models)")

The [`incremental_strategy` config](https://docs.getdbt.com/docs/build/incremental-strategy.md) controls how dbt builds incremental models. dbt uses a [merge statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax) on BigQuery to refresh incremental tables.
The `incremental_strategy` config can be set to one of the following values:

* `merge` (default)
* `insert_overwrite`
* [`microbatch`](https://docs.getdbt.com/docs/build/incremental-microbatch.md)

##### Change history[​](#change-history "Direct link to Change history")

The `enable_change_history` parameter enables [BigQuery's change history feature](https://cloud.google.com/bigquery/docs/change-history), which tracks changes made to a BigQuery table. When enabled, you can use the change history to audit and debug the behavior of your incremental models. `enable_change_history` accepts a boolean value and is set to `false` by default.

##### Performance and cost[​](#performance-and-cost "Direct link to Performance and cost")

The operations performed by dbt while building a BigQuery incremental model can be made cheaper and faster by using a [clustering clause](#clustering-clause) in your model configuration. See [this guide](https://discourse.getdbt.com/t/benchmarking-incremental-strategies-on-bigquery/981) for more information on performance tuning for BigQuery incremental models.

**Note:** These performance and cost benefits are applicable to incremental models built with either the `merge` or the `insert_overwrite` incremental strategy.

##### The `merge` strategy[​](#the-merge-strategy "Direct link to the-merge-strategy")

The `merge` incremental strategy will generate a `merge` statement that looks something like:

```sql
merge into {{ destination_table }} DEST
using ({{ model_sql }}) SRC
on SRC.{{ unique_key }} = DEST.{{ unique_key }}

when matched then update ...
when not matched then insert ...
```

The `merge` approach automatically updates new data in the destination incremental table but requires scanning all source tables referenced in the model SQL, as well as destination tables. This can be slow and expensive for large data volumes. [Partitioning and clustering](#using-table-partitioning-and-clustering) techniques mentioned earlier can help mitigate these issues.
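To make this concrete, a minimal `merge`-strategy model might look like the following sketch (the model, key, and timestamp column names are illustrative):

```sql
{{
  config(
    materialized = 'incremental',
    incremental_strategy = 'merge',
    unique_key = 'event_id'
  )
}}

select * from {{ ref('events') }}

{% if is_incremental() %}
  -- only scan source rows that arrived since the last successful run
  where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

On each incremental run, rows matching an existing `event_id` are updated in place and new `event_id`s are inserted.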
**Note:** The `unique_key` configuration is required when the `merge` incremental strategy is selected.

##### The `insert_overwrite` strategy[​](#the-insert_overwrite-strategy "Direct link to the-insert_overwrite-strategy")

The `insert_overwrite` strategy generates a merge statement that replaces entire partitions in the destination table. **Note:** this configuration requires that the model is configured with a [Partition clause](#partition-clause). The `merge` statement that dbt generates when the `insert_overwrite` strategy is selected looks something like:

```sql
/*
  Create a temporary table from the model SQL
*/
create temporary table {{ model_name }}__dbt_tmp as (
  {{ model_sql }}
);

/*
  If applicable, determine the partitions to overwrite by
  querying the temp table.
*/
declare dbt_partitions_for_replacement array<date>;
set (dbt_partitions_for_replacement) = (
    select as struct
        array_agg(distinct date(max_tstamp))
    from `my_project`.`my_dataset`.{{ model_name }}__dbt_tmp
);

/*
  Overwrite partitions in the destination table which match
  the partitions in the temporary table
*/
merge into {{ destination_table }} DEST
using {{ model_name }}__dbt_tmp SRC
on FALSE

when not matched by source
  and {{ partition_column }} in unnest(dbt_partitions_for_replacement)
  then delete

when not matched then insert ...
```

For a complete writeup on the mechanics of this approach, see [this explainer post](https://discourse.getdbt.com/t/bigquery-dbt-incremental-changes/982).

###### Determining partitions to overwrite[​](#determining-partitions-to-overwrite "Direct link to Determining partitions to overwrite")

dbt is able to determine the partitions to overwrite dynamically from the values present in the temporary table, or statically using a user-supplied configuration. The "dynamic" approach is simplest (and the default), but the "static" approach will reduce costs by eliminating multiple queries in the model build script.
###### Static partitions[​](#static-partitions "Direct link to Static partitions")

To supply a static list of partitions to overwrite, use the `partitions` configuration.

models/session.sql

```sql
{% set partitions_to_replace = [
  'timestamp(current_date)',
  'timestamp(date_sub(current_date, interval 1 day))'
] %}

{{
  config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    partition_by = {'field': 'session_start', 'data_type': 'timestamp'},
    partitions = partitions_to_replace
  )
}}

with events as (

    select * from {{ref('events')}}

    {% if is_incremental() %}
        -- recalculate yesterday + today
        where timestamp_trunc(event_timestamp, day) in ({{ partitions_to_replace | join(',') }})
    {% endif %}

),

... rest of model ...
```

This example model serves to replace the data in the destination table for both *today* and *yesterday* every day that it is run. It is the fastest and cheapest way to incrementally update a table using dbt. If we wanted this to run more dynamically (say, always for the past 3 days), we could leverage dbt's baked-in [datetime macros](https://github.com/dbt-labs/dbt-core/blob/dev/octavius-catto/core/dbt/include/global_project/macros/etc/datetime.sql) and write a few of our own.

Think of this as "full control" mode. You must ensure that expressions or literal values in the `partitions` config have proper quoting when templated, and that they match the `partition_by.data_type` (`timestamp`, `datetime`, `date`, or `int64`). Otherwise, the filter in the incremental `merge` statement will raise an error.

###### Dynamic partitions[​](#dynamic-partitions "Direct link to Dynamic partitions")

If no `partitions` configuration is provided, dbt will instead:

1. Create a temporary table for your model SQL
2. Query the temporary table to find the distinct partitions to be overwritten
3.
Query the destination table to find the *max* partition in the database

When building your model SQL, you can take advantage of the introspection performed by dbt to filter for only *new* data. The maximum value in the partitioned field in the destination table will be available using the `_dbt_max_partition` BigQuery scripting variable. **Note:** this is a BigQuery SQL variable, not a dbt Jinja variable, so no Jinja brackets are required to access this variable.

**Example model SQL:**

```sql
{{
  config(
    materialized = 'incremental',
    partition_by = {'field': 'session_start', 'data_type': 'timestamp'},
    incremental_strategy = 'insert_overwrite'
  )
}}

with events as (

    select * from {{ref('events')}}

    {% if is_incremental() %}
        -- recalculate latest day's data + previous
        -- NOTE: The _dbt_max_partition variable is used to introspect the destination table
        where date(event_timestamp) >= date_sub(date(_dbt_max_partition), interval 1 day)
    {% endif %}

),

... rest of model ...
```

###### Copying partitions[​](#copying-partitions "Direct link to Copying partitions")

If you are replacing entire partitions in your incremental runs, you can opt to do so with the [copy table API](https://cloud.google.com/bigquery/docs/managing-tables#copy-table) and partition decorators rather than a `merge` statement. While this mechanism doesn't offer the same visibility and ease of debugging as the SQL `merge` statement, it can yield significant savings in time and cost for large datasets because the copy table API does not incur any costs for inserting the data - it's equivalent to the `bq cp` gcloud command line interface (CLI) command.

You can enable this by switching on `copy_partitions: True` in the `partition_by` configuration. This approach works only in combination with "dynamic" partition replacement.
bigquery\_table.sql

```sql
{{
  config(
    materialized="incremental",
    incremental_strategy="insert_overwrite",
    partition_by={
      "field": "created_date",
      "data_type": "timestamp",
      "granularity": "day",
      "time_ingestion_partitioning": true,
      "copy_partitions": true
    }
  )
}}

select
  user_id,
  event_name,
  created_at,
  -- values of this column must match the data type + granularity defined above
  timestamp_trunc(created_at, day) as created_date

from {{ ref('events') }}
```

logs/dbt.log

```text
...
[0m16:03:13.017641 [debug] [Thread-3 (]: BigQuery adapter: Copying table(s) "/projects/projectname/datasets/analytics/tables/bigquery_table__dbt_tmp$20230112" to "/projects/projectname/datasets/analytics/tables/bigquery_table$20230112" with disposition: "WRITE_TRUNCATE"
...
```

#### Controlling table expiration[​](#controlling-table-expiration "Direct link to Controlling table expiration")

By default, dbt-created tables never expire. You can configure certain model(s) to expire after a set number of hours by setting `hours_to_expiration`.

**Note:** The `hours_to_expiration` config only applies to the initial creation of the underlying table. It doesn't reset on subsequent incremental runs.

dbt\_project.yml

```yml
models:
  <resource-path>:
    +hours_to_expiration: 6
```

models/\<model\_name\>.sql

```sql
{{ config(
    hours_to_expiration = 6
) }}

select ...
```

#### Authorized views[​](#authorized-views "Direct link to Authorized views")

If the `grant_access_to` config is specified for a model materialized as a view, dbt will grant the view model access to select from the list of datasets provided. See [BQ docs on authorized views](https://cloud.google.com/bigquery/docs/share-access-views) for more details.

**Note:** The `grants` config and the `grant_access_to` config are distinct.

* **`grant_access_to`:** Enables you to set up authorized views. When configured, dbt provides an authorized view access to show partial information from other datasets, without providing end users with full access to those underlying datasets.
For more information, see ["BigQuery configurations: Authorized views"](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md#authorized-views)
* **`grants`:** Provides specific permissions to users, groups, or service accounts for managing access to datasets you're producing with dbt. For more information, see ["Resource configs: grants"](https://docs.getdbt.com/reference/resource-configs/grants.md)

You can use the two features together: "authorize" a view model with the `grant_access_to` configuration, and then add `grants` to that view model to share its query results (and *only* its query results) with other users, groups, or service accounts.

dbt\_project.yml

```yml
models:
  <resource-path>:
    +grant_access_to:
      - project: project_1
        dataset: dataset_1
      - project: project_2
        dataset: dataset_2
```

models/\<model\_name\>.sql

```sql
{{ config(
    grant_access_to=[
      {'project': 'project_1', 'dataset': 'dataset_1'},
      {'project': 'project_2', 'dataset': 'dataset_2'}
    ]
) }}
```

Views with this configuration will be able to select from objects in `project_1.dataset_1` and `project_2.dataset_2`, even when they are located elsewhere and queried by users who do not otherwise have access to `project_1.dataset_1` and `project_2.dataset_2`.
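As a sketch of combining the two configs (the project, dataset, role, and user values are placeholders, and the exact `grants` privilege names depend on your warehouse setup):

```sql
{{
  config(
    materialized = 'view',
    grant_access_to = [
      {'project': 'project_1', 'dataset': 'dataset_1'}
    ],
    grants = {'roles/bigquery.dataViewer': ['user:someone@example.com']}
  )
}}

select ...
```

Here the view is authorized to read from `project_1.dataset_1`, while the `grants` entry shares only the view's query results with the named user.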
#### Materialized views[​](#materialized-views "Direct link to Materialized views")

The BigQuery adapter supports [materialized views](https://cloud.google.com/bigquery/docs/materialized-views-intro) with the following configuration parameters:

| Parameter | Type | Required | Default | Change Monitoring Support |
| --- | --- | --- | --- | --- |
| [`on_configuration_change`](https://docs.getdbt.com/reference/resource-configs/on_configuration_change.md) | `<string>` | no | `apply` | n/a |
| [`cluster_by`](#clustering-clause) | `[<string>]` | no | `none` | drop/create |
| [`partition_by`](#partition-clause) | `{<dictionary>}` | no | `none` | drop/create |
| [`enable_refresh`](#auto-refresh) | `<boolean>` | no | `true` | alter |
| [`refresh_interval_minutes`](#auto-refresh) | `<float>` | no | `30` | alter |
| [`max_staleness`](#auto-refresh) (in Preview) | `<interval>` | no | `none` | alter |
| [`description`](https://docs.getdbt.com/reference/resource-properties/description.md) | `<string>` | no | `none` | alter |
| [`labels`](#specifying-labels) | `{<label>: <value>}` | no | `none` | alter |
| [`resource_tags`](#resource-tags) | `{<tag>: <value>}` | no | `none` | alter |
| [`hours_to_expiration`](#controlling-table-expiration) | `<integer>` | no | `none` | alter |
| [`kms_key_name`](#using-kms-encryption) | `<string>` | no | `none` | alter |
* Project file
* Property file
* SQL file config

dbt\_project.yml

```yaml
models:
  <resource-path>:
    +materialized: materialized_view
    +on_configuration_change: apply | continue | fail
    +cluster_by: <field-name> | [<field-name>]
    +partition_by:
      - field: <field-name>
      - data_type: timestamp | date | datetime | int64
      # only if `data_type` is not 'int64'
      - granularity: hour | day | month | year
      # only if `data_type` is 'int64'
      - range:
          - start: <integer>
          - end: <integer>
          - interval: <integer>
    +enable_refresh: true | false
    +refresh_interval_minutes: <float>
    +max_staleness: <interval>
    +description: <string>
    +labels: {<label-name>: <label-value>}
    +resource_tags: {<tag-name>: <tag-value>}
    +hours_to_expiration: <integer>
    +kms_key_name: <path-to-key>
```

models/properties.yml

```yaml
models:
  - name: [<model-name>]
    config:
      materialized: materialized_view
      on_configuration_change: apply | continue | fail
      cluster_by: <field-name> | [<field-name>]
      partition_by:
        - field: <field-name>
        - data_type: timestamp | date | datetime | int64
        # only if `data_type` is not 'int64'
        - granularity: hour | day | month | year
        # only if `data_type` is 'int64'
        - range:
            - start: <integer>
            - end: <integer>
            - interval: <integer>
      enable_refresh: true | false
      refresh_interval_minutes: <float>
      max_staleness: <interval>
      description: <string>
      labels: {<label-name>: <label-value>}
      resource_tags: {<tag-name>: <tag-value>}
      hours_to_expiration: <integer>
      kms_key_name: <path-to-key>
```

models/\<model\_name\>.sql

```jinja
{{ config(
    materialized='materialized_view',
    on_configuration_change="apply" | "continue" | "fail",
    cluster_by="<field-name>" | ["<field-name>"],
    partition_by={
        "field": "<field-name>",
        "data_type": "timestamp" | "date" | "datetime" | "int64",

        # only if `data_type` is not 'int64'
        "granularity": "hour" | "day" | "month" | "year",

        # only if `data_type` is 'int64'
        "range": {
            "start": <integer>,
            "end": <integer>,
            "interval": <integer>,
        }
    },

    # auto-refresh options
    enable_refresh= true | false,
    refresh_interval_minutes=<float>,
    max_staleness="<interval>",

    # additional options
    description="<description>",
    labels={
        "<label-name>": "<label-value>",
    },
    resource_tags={
        "<tag-name>": "<tag-value>",
    },
    hours_to_expiration=<integer>,
    kms_key_name="<path-to-key>",
) }}
```

Many of these parameters correspond to their table counterparts and have been linked above. The set of parameters unique to materialized views covers [auto-refresh functionality](#auto-refresh).
Learn more about these parameters in BigQuery's docs:

* [CREATE MATERIALIZED VIEW statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_materialized_view_statement)
* [materialized\_view\_option\_list](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#materialized_view_option_list)

##### Auto-refresh[​](#auto-refresh "Direct link to Auto-refresh")

| Parameter | Type | Required | Default | Change Monitoring Support |
| --- | --- | --- | --- | --- |
| `enable_refresh` | `<boolean>` | no | `true` | alter |
| `refresh_interval_minutes` | `<float>` | no | `30` | alter |
| `max_staleness` (in Preview) | `<interval>` | no | `none` | alter |

BigQuery supports [automatic refresh](https://cloud.google.com/bigquery/docs/materialized-views-manage#automatic_refresh) configuration for materialized views. By default, a materialized view will automatically refresh within 5 minutes of changes in the base table, but not more frequently than once every 30 minutes. BigQuery only officially supports the configuration of the frequency (the "once every 30 minutes" frequency); however, there is a feature in preview that allows for the configuration of the staleness (the "5 minutes" refresh). dbt will monitor these parameters for changes and apply them using an `ALTER` statement.

Learn more about these parameters in BigQuery's docs:

* [materialized\_view\_option\_list](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#materialized_view_option_list)
* [max\_staleness](https://cloud.google.com/bigquery/docs/materialized-views-create#max_staleness)

##### Limitations[​](#limitations "Direct link to Limitations")

As with most data platforms, there are limitations associated with materialized views.
Some worth noting include:

* Materialized view SQL has a [limited feature set](https://cloud.google.com/bigquery/docs/materialized-views-create#supported-mvs).
* Materialized view SQL cannot be updated; the materialized view must go through a `--full-refresh` (DROP/CREATE).
* The `partition_by` clause on a materialized view must match that of the underlying base table.
* While materialized views can have descriptions, materialized view *columns* cannot.
* Recreating/dropping the base table requires recreating/dropping the materialized view.

Find more information about materialized view limitations in Google's BigQuery [docs](https://cloud.google.com/bigquery/docs/materialized-views-intro#limitations).

#### Python model configuration[​](#python-model-configuration "Direct link to Python model configuration")

**Submission methods:** BigQuery supports a few different mechanisms to submit Python code, each with relative advantages. The `dbt-bigquery` adapter uses BigQuery DataFrames (BigFrames) or Dataproc. This process reads data from BigQuery, computes it either natively with BigQuery DataFrames or Dataproc, and writes the results back to BigQuery.

* BigQuery DataFrames
* Dataproc

BigQuery DataFrames can execute pandas and scikit-learn code. There's no need to manage infrastructure, and it leverages BigQuery's distributed query engine. It's great for analysts, data scientists, and machine learning engineers who want to manipulate big data using a pandas-like syntax.

**Note:** BigQuery DataFrames models run on Google Colab's default runtime. If no `default` runtime template is available, the adapter will automatically create one for you and mark it as `default` for subsequent use (assuming it has the right permissions).
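As a sketch, a BigFrames Python model is an ordinary dbt Python model with the `bigframes` submission method; the model, table, and column names below are illustrative, and the exact pandas-style operations available depend on the BigQuery DataFrames library version:

```python
def model(dbt, session):
    # Run this model's Python code via BigQuery DataFrames (BigFrames)
    dbt.config(submission_method="bigframes")

    # dbt.ref() returns a DataFrame backed by the upstream BigQuery table
    events = dbt.ref("events")

    # pandas-like transformations are pushed down to BigQuery's query engine
    daily_counts = (
        events.groupby("user_id", as_index=False)
              .agg({"event_id": "count"})
    )

    # the returned DataFrame is written back to BigQuery as this model's table
    return daily_counts
```

Because the computation runs inside BigQuery, no Spark cluster or other infrastructure needs to be provisioned for this model.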
**BigQuery DataFrames setup:**

```bash
# IAM permissions if using a service account

# Create the service account
gcloud iam service-accounts create dbt-bigframes-sa

# Grant the BigQuery User role
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:dbt-bigframes-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
  --role=roles/bigquery.user

# Grant the BigQuery Data Editor role. This can be restricted at the dataset level
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:dbt-bigframes-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
  --role=roles/bigquery.dataEditor

# Grant the Service Account User role
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:dbt-bigframes-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
  --role=roles/iam.serviceAccountUser

# Grant the Colab Enterprise User role
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
  --member=serviceAccount:dbt-bigframes-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
  --role=roles/aiplatform.colabEnterpriseUser
```

dbt\_project.yml

```yaml
models:
  my_dbt_project:
    submission_method: bigframes
```

profiles.yml

```yaml
my_dbt_project_sa:
  outputs:
    dev:
      compute_region: us-central1
      dataset: <dataset-id>
      gcs_bucket: <gcs-bucket>
      job_execution_timeout_seconds: 300
      job_retries: 1
      keyfile: <path-to-keyfile.json>
      location: US
      method: service-account
      priority: interactive
      project: <project-id>
      threads: 1
      type: bigquery
  target: dev
```

Dataproc (`serverless` or pre-configured `cluster`) can execute Python models as PySpark jobs, reading from and writing to BigQuery. `serverless` is simpler but slower, with limited configuration and pre-installed packages (`pandas`, `numpy`, `scikit-learn`), while `cluster` offers full control and faster runtimes. It is a good fit for complex, long-running batch pipelines and legacy Hadoop/Spark workflows, but often slower for ad-hoc or interactive workloads.

**Dataproc setup:**

* Create or use an existing [Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets).
* Enable Dataproc APIs for your project and region.
* If using the `cluster` submission method: Create or use an existing [Dataproc cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster) with the [Spark BigQuery connector initialization action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors#bigquery-connectors). (Google recommends copying the action into your own Cloud Storage bucket, rather than using the example version shown in the screenshot.)

![Add the Spark BigQuery connector as an initialization action](/img/docs/building-a-dbt-project/building-models/python-models/dataproc-connector-initialization.png?v=2 "Add the Spark BigQuery connector as an initialization action")

The following configurations are needed to run Python models on Dataproc. You can add these to your [BigQuery profile](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#running-python-models-on-dataproc) or configure them on specific Python models:

* `gcs_bucket`: Storage bucket to which dbt will upload your model's compiled PySpark code.
* `dataproc_region`: GCP region in which you have enabled Dataproc (for example `us-central1`).
* `dataproc_cluster_name`: Name of the Dataproc cluster to use for running the Python model (executing the PySpark job). Only required if `submission_method: cluster`.

```python
def model(dbt, session):
    dbt.config(
        submission_method="cluster",
        dataproc_cluster_name="my-favorite-cluster"
    )
    ...
```

```yml
models:
  - name: my_python_model
    config:
      submission_method: serverless
```

Python models running on Dataproc Serverless can be further configured in your [BigQuery profile](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#running-python-models-on-dataproc).
Any user or service account that runs dbt Python models will need the following permissions, in addition to the required BigQuery permissions:

```text
dataproc.batches.create
dataproc.clusters.use
dataproc.jobs.create
dataproc.jobs.get
dataproc.operations.get
dataproc.operations.list
storage.buckets.get
storage.objects.create
storage.objects.delete
```

For more information, refer to [Dataproc IAM roles and permissions](https://cloud.google.com/dataproc/docs/concepts/iam/iam).

**Installing packages:** Installation of third-party packages on Dataproc varies depending on whether it's a [cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster) or [serverless](https://cloud.google.com/dataproc-serverless/docs).

* **Dataproc Cluster** — Google recommends installing Python packages while creating the cluster via initialization actions:
  * [How initialization actions are used](https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/README.md#how-initialization-actions-are-used)
  * [Actions for installing via `pip` or `conda`](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/python)

  You can also install packages at cluster creation time by [defining cluster properties](https://cloud.google.com/dataproc/docs/tutorials/python-configuration#image_version_20): `dataproc:pip.packages` or `dataproc:conda.packages`.

* **Dataproc Serverless** — Google recommends using a [custom docker image](https://cloud.google.com/dataproc-serverless/docs/guides/custom-containers) to install third-party packages. The image needs to be hosted in [Google Artifact Registry](https://cloud.google.com/artifact-registry/docs).
It can then be used by providing the image path in dbt profiles:

profiles.yml

```yml
my-profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: abc-123
      dataset: my_dataset

      # for dbt Python models to be run on Dataproc Serverless
      gcs_bucket: dbt-python
      dataproc_region: us-central1
      submission_method: serverless
      dataproc_batch:
        runtime_config:
          container_image: {HOSTNAME}/{PROJECT_ID}/{IMAGE}:{TAG}
```

![Adding packages to install via pip at cluster startup](/img/docs/building-a-dbt-project/building-models/python-models/dataproc-pip-packages.png?v=2 "Adding packages to install via pip at cluster startup")

##### Additional parameters[​](#additional-parameters "Direct link to Additional parameters")

The BigQuery Python models also have the following additional configuration parameters:

| Parameter | Type | Required | Default | Valid values |
| --- | --- | --- | --- | --- |
| `enable_list_inference` | `<boolean>` | no | `True` | `True`, `False` |
| `intermediate_format` | `<string>` | no | `parquet` | `parquet`, `orc` |
| `submission_method` | `<string>` | no | | `serverless`, `bigframes`, `cluster` |
| `notebook_template_id` | `<string>` | no | | `<string>` |
| `compute_region` | `<string>` | no | | `<string>` |
| `gcs_bucket` | `<string>` | no | | `<string>` |
| `packages` | `<list>` | no | | `['numpy<=1.1.1', 'pandas', 'mlflow']` |
| `timeout` | `<int>` | no | | `<int>` |

* The `enable_list_inference` parameter enables a PySpark data frame to read multiple records in the same operation. By default, this is set to `True` to support the default `intermediate_format` of `parquet`.
* The `intermediate_format` parameter specifies which file format to use when writing records to a table. The default is `parquet`.
* The `submission_method` parameter specifies whether the job will run on BigQuery DataFrames or Serverless Spark. `submission_method` is not required when `dataproc_cluster_name` is declared.
* The `notebook_template_id` parameter specifies the runtime template in Colab Enterprise.
* The `compute_region` parameter specifies the region of the job.
* The `gcs_bucket` parameter specifies the GCS bucket used for storing artifacts for the job.
* The `timeout` parameter specifies the maximum execution time in seconds for the Python model. This is particularly useful for BigFrames models that may require longer execution times for complex data processing or machine learning workloads. If not specified, the model will use the default timeout configured for the execution environment.

**Related docs:**

* [Dataproc overview](https://cloud.google.com/dataproc/docs/concepts/overview)
* [Create a Dataproc cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster)
* [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets)
* [PySpark DataFrame syntax](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html)

#### Unit test limitations[​](#unit-test-limitations "Direct link to Unit test limitations")

You must specify all fields in a BigQuery `STRUCT` for [unit tests](https://docs.getdbt.com/docs/build/unit-tests.md). You cannot use only a subset of fields in a `STRUCT`.
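For example, when unit testing a model whose input has a `STRUCT` column, the fixture must spell out every field of the struct. A rough sketch follows (the model, column, and field names are hypothetical, and the exact fixture syntax for struct values may vary by dbt version):

```yaml
unit_tests:
  - name: test_full_struct_specified
    model: customer_summary
    given:
      - input: ref('customers')
        rows:
          # every field of the `customer` STRUCT must be present
          - {customer: {id: 1, name: "Ada", region: "us"}}
    expect:
      rows:
        - {customer_id: 1, customer_region: "us"}
```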
---

### Cache

##### Cache population[​](#cache-population "Direct link to Cache population")

At the start of runs, dbt caches metadata about all the objects in all the schemas where it might materialize resources (such as models). By default, dbt populates the relational cache with information on all schemas related to the project. There are two ways to optionally modify this behavior:

* `POPULATE_CACHE` (default: `True`): Whether to populate the cache at all. To skip cache population entirely, use the `--no-populate-cache` flag or `DBT_POPULATE_CACHE: False`. Note that this does not *disable* the cache; missed cache lookups will run queries and update the cache afterward.
* `CACHE_SELECTED_ONLY` (default `False`): Whether to limit cache population to just the resources selected in the current run. This can offer significant speed improvements when running a small subset of a large project, while still providing the benefit of caching upfront.

For example, to quickly compile a model that requires no database metadata or introspective queries:

```text
dbt compile --no-populate-cache --select my_model_name
```

Or, to improve speed and performance while focused on developing Salesforce models, which are materialized into their own dedicated schema, you could select those models and pass the `cache-selected-only` flag:

```text
dbt run --cache-selected-only --select salesforce
```

##### Logging relational cache events[​](#logging-relational-cache-events "Direct link to Logging relational cache events")

The `LOG_CACHE_EVENTS` config enables detailed logging of relational cache events, which is disabled by default.

```text
dbt compile --log-cache-events
```
---

### Catalog JSON file

**Current schema**: [`v1`](https://schemas.getdbt.com/dbt/catalog/v1.json)

**Produced by:**

This file contains information from your data warehouse about the tables and views produced and defined by the resources in your project. Today, dbt uses this file to populate metadata, such as column types and table statistics, in the [docs site](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md).

##### Top-level keys

* [`metadata`](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md#common-metadata)
* `nodes`: Dictionary containing information about database objects corresponding to dbt models, seeds, and snapshots.
* `sources`: Dictionary containing information about database objects corresponding to dbt sources.
* `errors`: Errors received while running metadata queries.

##### Resource details

Within `sources` and `nodes`, each dictionary key is a resource `unique_id`. Each nested resource contains:

* `unique_id`: `<resource_type>.<package>.<resource_name>`, same as the dictionary key; maps to `nodes` and `sources` in the [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md)
* `metadata`
  * `type`: table, view, etc.
  * `database`
  * `schema`
  * `name`
  * `comment`
  * `owner`
* `columns` (array)
  * `name`
  * `type`: data type
  * `comment`
  * `index`: ordinal position
* `stats`: differs by database and relation type
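As a minimal sketch of working with this layout, the helper below maps each node and source `unique_id` to its column data types. It assumes the v1 shape described above, and defensively handles `columns` whether it is serialized as a dict keyed by column name or as an array; the `target/catalog.json` path in the usage comment is the conventional location, not something this page specifies:

```python
import json  # stdlib; used when loading a real catalog.json from disk


def column_types(catalog):
    """Map each node/source unique_id to a {column_name: data_type} dict.

    Assumes the catalog v1 layout: top-level `nodes` and `sources` dicts,
    each resource carrying a `columns` collection of {name, type, ...} entries.
    """
    result = {}
    for section in ("nodes", "sources"):
        for unique_id, resource in catalog.get(section, {}).items():
            cols = resource.get("columns", {})
            if isinstance(cols, dict):  # tolerate dict-keyed serialization
                cols = cols.values()
            result[unique_id] = {c["name"]: c["type"] for c in cols}
    return result


# Typical usage against a real artifact (path is an assumption):
# with open("target/catalog.json") as f:
#     print(column_types(json.load(f)))
```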
---

### Caveats to state comparison

The [`state:` selection method](https://docs.getdbt.com/reference/node-selection/methods.md#state) is a powerful feature with a lot of underlying complexity. Below are a handful of considerations when setting up automated jobs that leverage state comparison.

##### Seeds

dbt stores a file hash of seed files that are <1 MiB in size. If the contents of these seeds are modified, the seed will be included in `state:modified`.

If a seed file is >1 MiB in size, dbt cannot compare its contents and will raise a warning saying so. Instead, dbt will use only the seed's file path to detect changes. If the file path has changed, the seed will be included in `state:modified`; if it hasn't, it won't.

##### Macros

dbt will mark as modified any resource that depends on a changed macro, or on a macro that depends on a changed macro.

##### Vars

If a model uses a `var` or `env_var` in its definition, dbt cannot trace that lineage well enough to include the model in `state:modified` just because the `var` or `env_var` value has changed. The model is likely to be marked as modified, however, if the change in the variable results in a different configuration.
##### Tests

The command `dbt test -s state:modified` will include both:

* tests that select from a new/modified resource
* tests that are themselves new or modified

As long as you're adding or changing tests at the same time that you're adding or changing the resources (models, seeds, snapshots) they select from, all should work the way you expect with "simple" state selection:

```shell
dbt run -s "state:modified"
dbt test -s "state:modified"
```

This can get complicated, however. If you add a new test without modifying its underlying model, or add a test that selects from a new model and an old unmodified one, you may need to test a model without having first run it.

You can defer upstream references when testing. For example, if a test selects from a model that doesn't exist as a database object in your current environment, dbt will look to the other environment instead (the one defined in your state manifest). This enables you to use "simple" state selection without risk of query failure, but it may have some surprising consequences for tests with multiple parents. For instance, if you have a `relationships` test that depends on one modified model and one unmodified model, the test query will select from data "across" two different environments. If you limit or sample your data in development and CI, it may not make much sense to test for referential integrity, knowing there's a good chance of mismatch.

If you're a frequent user of `relationships` tests or data tests, or frequently find yourself adding tests without modifying their underlying models, consider tweaking the selection criteria of your CI job.
For instance:

```shell
dbt run -s "state:modified"
dbt test -s "state:modified" --exclude "test_name:relationships"
```

##### Overwrites the `manifest.json`

dbt overwrites the `manifest.json` file during parsing, which means when you reference `--state` from the `target/` directory, you may encounter a warning indicating that the saved manifest wasn't found.

![Saved manifest not found error](/img/docs/reference/saved-manifest-not-found.png?v=2 "Saved manifest not found error")

During the next job run, dbt follows a sequence of steps that leads to the issue: first, it overwrites `target/manifest.json` before it can be used for change detection; then, when dbt tries to read `target/manifest.json` again to detect changes, it finds none because the previous state has already been overwritten.

Avoid setting `--state` and `--target-path` to the same path with state-dependent features like `--defer` and `state:modified`, as it can lead to non-idempotent behavior and won't work as expected.

###### Recommendation

To prevent the `manifest.json` from being overwritten before dbt reads it for change detection, update your workflow using one of these methods:

* Move the `manifest.json` to a dedicated folder (for example, `state/`) after dbt generates it in the `target/` folder. This makes sure dbt references the correct saved state instead of comparing the current state with the just-overwritten version. It also avoids issues caused by setting `--state` and `--target-path` to the same location, which can lead to non-idempotent behavior.
* Write the manifest to a different `--target-path` in the build stage (where dbt would generate the `target/manifest.json`), before it gets overwritten during job execution. This allows dbt to detect changes instead of comparing the current state with the just-overwritten version.
* Pass the `--no-write-json` flag during the reproduction stage: `dbt ls --no-write-json --select state:modified --state target`.

##### False positives

##### Final note

State comparison is complex. We hope to reach eventual consistency between all configuration options, as well as to provide users with the control they need to reliably return all modified resources, and only the ones they expect. If you're interested in learning more, read [open issues tagged "state"](https://github.com/dbt-labs/dbt-core/issues?q=is%3Aopen+is%3Aissue+label%3Astate) in the dbt repository.

#### Related docs

* [About state in dbt](https://docs.getdbt.com/reference/node-selection/state-selection.md)
* [Configure state selection](https://docs.getdbt.com/reference/node-selection/configure-state.md)

---

### check_cols

dbt_project.yml

```yml
snapshots:
  <resource-path>:
    +strategy: check
    +check_cols: [column_name] | all
```

#### Description

A list of columns within the results of your snapshot query to check for changes. Alternatively, check all columns using the `all` value (however, this may be less performant). This parameter is **required if using the `check` [strategy](https://docs.getdbt.com/reference/resource-configs/strategy.md)**.

#### Default

No default is provided.
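For illustration, hedged sketches of both `check_cols` modes follow; the snapshot names and column names are hypothetical:

```yml
snapshots:
  my_project:
    orders_snapshot:          # hypothetical snapshot checking specific columns
      +strategy: check
      +check_cols: [status, is_returned]

    customers_snapshot:       # hypothetical snapshot checking every column
      +strategy: check
      +check_cols: all
```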
#### Examples

##### Check a list of columns for changes

##### Check all columns for changes

---

### Checking version compatibility

For the first several years of dbt Core's development, breaking changes were more common. For this reason, we encouraged setting [dbt version requirements](https://docs.getdbt.com/reference/project-configs/require-dbt-version.md), especially for projects that use features which are newer or which may break in future versions of dbt Core. By default, if you run a project with an incompatible dbt version, dbt will raise an error. You can use the `VERSION_CHECK` config to disable this check and suppress the error message:

```text
$ dbt run --no-version-check
Running with dbt=1.0.0
Found 13 models, 2 tests, 1 archives, 0 analyses, 204 macros, 2 operations....
```

**dbt release tracks**

Starting in 2024, when you select a [release track in dbt](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to receive ongoing dbt version upgrades, dbt will ignore the `require-dbt-version` config. dbt Labs is committed to zero breaking changes for code in dbt projects, with ongoing releases to dbt and new versions of dbt Core.
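For reference, a version requirement is pinned in `dbt_project.yml` with the `require-dbt-version` key; the range shown here is illustrative:

```yaml
# dbt_project.yml — accepts an exact version, a range, or a list of specifiers
require-dbt-version: [">=1.0.0", "<2.0.0"]
```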
We also recommend these best practices:

**Installing dbt packages**

If you install dbt packages for use in your project, whether the package is maintained by your colleagues or a member of the open source dbt community, we recommend pinning the package to a specific revision or `version` boundary. dbt manages this out of the box by *locking* the version/revision of packages in development in order to guarantee predictable builds in production. To learn more, refer to [Predictable package installs](https://docs.getdbt.com/reference/commands/deps.md#predictable-package-installs).

**Maintaining dbt packages**

If you maintain dbt packages, whether on behalf of your colleagues or members of the open source community, we recommend writing defensive code that verifies that other required packages and global macros are available. For example, if your package depends on the availability of a `date_spine` macro in the global `dbt` namespace, you can write:

models/some_days.sql

```sql
{% macro a_few_days_in_september() %}

    {% if not dbt.get('date_spine') %}
        {{ exceptions.raise_compiler_error("Expected to find the dbt.date_spine macro, but it could not be found") }}
    {% endif %}

    {{ date_spine("day", "cast('2020-01-01' as date)", "cast('2030-12-31' as date)") }}

{% endmacro %}
```

---

### clean-targets

dbt_project.yml

```yml
clean-targets: [directorypath]
```

#### Definition

Optionally specify a custom list of directories to be removed by the `dbt clean` [command](https://docs.getdbt.com/reference/commands/clean.md). As such, you should only include directories containing artifacts (e.g. compiled files, logs, installed packages) in this list.

#### Default

If this configuration is not included in your `dbt_project.yml` file, the `clean` command will remove files in your [target-path](https://docs.getdbt.com/reference/global-configs/json-artifacts.md).

#### Examples

##### Remove packages and compiled files as part of `dbt clean` (preferred)

To remove packages as well as compiled files, include the value of your [packages-install-path](https://docs.getdbt.com/reference/project-configs/packages-install-path.md) configuration in your `clean-targets` configuration.

dbt_project.yml

```yml
clean-targets:
  - target
  - dbt_packages
```

Now, run `dbt clean`. Both the `target` and `dbt_packages` directories will be removed.

Note: this is the configuration in the dbt [starter project](https://github.com/dbt-labs/dbt-starter-project/blob/HEAD/dbt_project.yml), which is generated by the `init` command.

##### Remove `logs` when running `dbt clean`

dbt_project.yml

```yml
clean-targets: [target, dbt_packages, logs]
```
---

### ClickHouse configurations

##### View materialization

A dbt model can be created as a [ClickHouse view](https://clickhouse.com/docs/en/sql-reference/table-functions/view/) and configured using the following syntax:

**Project YAML file**

dbt_project.yml

```yaml
models:
  <resource-path>:
    +materialized: view
```

**SQL file config**

models/<model_name>.sql

```jinja
{{ config(materialized = "view") }}
```

##### Table materialization

A dbt model can be created as a [ClickHouse table](https://clickhouse.com/docs/en/operations/system-tables/tables/) and configured using the following syntax:

**Project YAML file**

dbt_project.yml

```yaml
models:
  <resource-path>:
    +materialized: table
    +order_by: [ <column-name>, ... ]
    +engine: <engine-type>
    +partition_by: [ <column-name>, ... ]
```

**SQL file config**

models/<model_name>.sql

```jinja
{{ config(
    materialized = "table",
    engine = "<engine-type>",
    order_by = [ "<column-name>", ... ],
    partition_by = [ "<column-name>", ... ]
) }}
```

###### Table configuration

| Option | Description | Required? |
| --- | --- | --- |
| `materialized` | How the model will be materialized into ClickHouse. Must be `table` to create a table model. | Required |
| `engine` | The table engine to use when creating tables. See the list of supported engines below. | Optional (default: `MergeTree()`) |
| `order_by` | A tuple of column names or arbitrary expressions. This allows you to create a small sparse index that helps find data faster. | Optional (default: `tuple()`) |
| `partition_by` | A partition is a logical combination of records in a table by a specified criterion. The partition key can be any expression from the table columns. | Optional |

For the complete list of configuration options, see the [ClickHouse documentation](https://clickhouse.com/docs/integrations/dbt).

##### Incremental materialization

A table model is reconstructed on each dbt execution. This may be infeasible and extremely costly for larger result sets or complex transformations. To address this challenge and reduce the build time, a dbt model can be created as an incremental ClickHouse table, configured using the following syntax:

**Project YAML file**

dbt_project.yml

```yaml
models:
  <resource-path>:
    +materialized: incremental
    +order_by: [ <column-name>, ... ]
    +engine: <engine-type>
    +partition_by: [ <column-name>, ... ]
    +unique_key: [ <column-name>, ... ]
    +inserts_only: [ True|False ]
```

**SQL file config**

models/<model_name>.sql

```jinja
{{ config(
    materialized = "incremental",
    engine = "<engine-type>",
    order_by = [ "<column-name>", ... ],
    partition_by = [ "<column-name>", ... ],
    unique_key = [ "<column-name>", ... ],
    inserts_only = [ True|False ]
) }}
```

###### Incremental table configuration

| Option | Description | Required? |
| --- | --- | --- |
| `materialized` | How the model will be materialized into ClickHouse. Must be `incremental` to create an incremental model. | Required |
| `unique_key` | A tuple of column names that uniquely identify rows. For more details on uniqueness constraints, see [here](https://docs.getdbt.com/docs/build/incremental-models.md#defining-a-unique-key-optional). | Required. If not provided, altered rows will be added twice to the incremental table. |
| `engine` | The table engine to use when creating tables. See the list of supported engines below. | Optional (default: `MergeTree()`) |
| `order_by` | A tuple of column names or arbitrary expressions. This allows you to create a small sparse index that helps find data faster. | Optional (default: `tuple()`) |
| `partition_by` | A partition is a logical combination of records in a table by a specified criterion. The partition key can be any expression from the table columns. | Optional |
| `inserts_only` | (Deprecated; see the `append` materialization strategy.) If `True`, incremental updates will be inserted directly into the target incremental table without creating an intermediate table. | Optional (default: `False`) |
| `incremental_strategy` | The strategy to use for incremental materialization. `delete+insert`, `append`, and `insert_overwrite` (experimental) are supported. For additional details on strategies, see [here](https://github.com/ClickHouse/dbt-clickhouse#incremental-model-strategies). | Optional (default: `default`) |
| `incremental_predicates` | Incremental predicate clause to be applied to `delete+insert` materializations. | Optional |

For the complete list of configuration options, see the [ClickHouse documentation](https://clickhouse.com/docs/integrations/dbt).

#### Snapshot

dbt snapshots allow a record to be made of changes to a mutable model over time. This in turn allows point-in-time queries on models, where analysts can "look back in time" at the previous state of a model.
This functionality is supported by the ClickHouse connector. For more information on configuration, check out the [snapshot configs](https://docs.getdbt.com/reference/snapshot-configs.md) reference page.

#### Learn more

The `dbt-clickhouse` adapter supports most dbt-native features like tests, snapshots, helper macros, and more. For a complete overview of supported features and best practices, see the [ClickHouse documentation](https://clickhouse.com/docs/integrations/dbt).

---

### Cloudera Hive configurations

#### Configuring tables

When materializing a model as `table`, you may include several optional configs that are specific to the dbt-hive plugin, in addition to the standard [model configs](https://docs.getdbt.com/reference/model-configs.md).

| Option | Description | Required? | Example |
| --- | --- | --- | --- |
| `partition_by` | Partition by a column; typically a directory per partition is created | No | `partition_by=['name']` |
| `clustered_by` | Second-level division of a partitioned column | No | `clustered_by=['age']` |
| `file_format` | Underlying storage format of the table | No | `file_format='PARQUET'` |
| `location` | Storage location, typically an HDFS path | No | `LOCATION='/user/etl/destination'` |
| `comment` | Comment for the table | No | `comment='this is the cleanest model'` |
| `external` | Whether this is an external table (true/false) | No | `external=true` |
| `tbl_properties` | Any metadata can be stored as key/value pairs with the table | No | `tbl_properties="('dbt_test'='1')"` |
| `table_type` | Indicates the type of the table | No | `table_type="iceberg"` |

#### Incremental models

Supported modes for incremental models:

* **`append`** (default): Insert new records without updating or overwriting any existing data.
* **`insert_overwrite`**: For new records, insert data. When used along with a partition clause, update data for changed records and insert data for new records.
#### Example: Using the partition_by config option

hive_partition_by.sql

```sql
{{ config(
    materialized='table',
    unique_key='id',
    partition_by=['city'],
) }}

with source_data as (
    select 1 as id, "Name 1" as name, "City 1" as city
    union all
    select 2 as id, "Name 2" as name, "City 2" as city
    union all
    select 3 as id, "Name 3" as name, "City 2" as city
    union all
    select 4 as id, "Name 4" as name, "City 1" as city
)

select * from source_data
```

In the above example, a sample table is created with `partition_by` and other config options. One thing to note when using the `partition_by` option is that the select query should always have the column name used in `partition_by` as the last one, as can be seen for the `city` column in the above query. If the `partition_by` clause does not match the last column in the select statement, Hive will flag an error when trying to create the model.

---

### Cloudera Impala configurations

#### Configuring tables

When materializing a model as `table`, you may include several optional configs that are specific to the dbt-impala plugin, in addition to the standard [model configs](https://docs.getdbt.com/reference/model-configs.md).

| Option | Description | Required? | Example |
| --- | --- | --- | --- |
| `partition_by` | Partition by a column; typically a directory per partition is created | No | `partition_by=['name']` |
| `sort_by` | Sort by a column | No | `sort_by=['age']` |
| `row_format` | Format to be used when storing individual rows | No | `row_format='delimited'` |
| `stored_as` | Underlying storage format of the table | No | `stored_as='PARQUET'` |
| `location` | Storage location, typically an HDFS path | No | `LOCATION='/user/etl/destination'` |
| `comment` | Comment for the table | No | `comment='this is the cleanest model'` |
| `serde_properties` | SerDe ([de-]serialization) properties of the table | No | `serde_properties="('quoteChar'=''', 'escapeChar'='\\')"` |
| `tbl_properties` | Any metadata can be stored as key/value pairs with the table | No | `tbl_properties="('dbt_test'='1')"` |
| `is_cached` | Whether this table is cached (true/false) | No | `is_cached=false` (default) |
| `cache_pool` | Cache pool name to use if `is_cached` is set to true | No | |
| `replication_factor` | Cache replication factor to use if `is_cached` is set to true | No | |
| `external` | Whether this is an external table (true/false) | No | `external=true` |
| `table_type` | Indicates the type of the table (iceberg/kudu) | No | `table_type="iceberg"` |

For Cloudera-specific options for the above parameters, see the documentation of CREATE TABLE.

#### Incremental models

Supported modes for incremental models:

* **`append`** (default): Insert new records without updating or overwriting any existing data.
* **`insert_overwrite`**: For new records, insert data. When used along with a partition clause, update data for changed records and insert data for new records.
Unsupported modes:

* **`unique_key`**: This is not a supported option for incremental models in dbt-impala.
* **`merge`**: Merge is not supported by the underlying warehouse, and hence not supported by dbt-impala.

#### Example: Using the partition_by config option

impala_partition_by.sql

```sql
{{ config(
    materialized='table',
    unique_key='id',
    partition_by=['city'],
) }}

with source_data as (
    select 1 as id, "Name 1" as name, "City 1" as city
    union all
    select 2 as id, "Name 2" as name, "City 2" as city
    union all
    select 3 as id, "Name 3" as name, "City 2" as city
    union all
    select 4 as id, "Name 4" as name, "City 1" as city
)

select * from source_data
```

In the above example, a sample table is created with `partition_by` and other config options. One thing to note when using the `partition_by` option is that the select query should always have the column name used in `partition_by` as the last one, as can be seen for the `city` column in the above query. If the `partition_by` clause does not match the last column in the select statement, Impala will flag an error when trying to create the model.

---

### column_types

#### Description

Optionally specify the database type of columns in a [seed](https://docs.getdbt.com/docs/build/seeds.md), by providing a dictionary where the keys are the column names and the values are valid datatypes (these vary across databases). Without specifying this, dbt will infer the datatype based on the column values in your seed file.
#### Usage

Specify column types in your `dbt_project.yml` file:

dbt_project.yml

```yml
seeds:
  jaffle_shop:
    country_codes:
      +column_types:
        country_code: varchar(2)
        country_name: varchar(32)
```

Or:

seeds/properties.yml

```yml
seeds:
  - name: country_codes
    config:
      column_types:
        country_code: varchar(2)
        country_name: varchar(32)
```

If you have previously run `dbt seed`, you'll need to run `dbt seed --full-refresh` for the changes to take effect.

Note that you will need to use the full directory path of a seed when configuring `column_types`. For example, for a seed file at `seeds/marketing/utm_mappings.csv`, you will need to configure it like so:

dbt_project.yml

```yml
seeds:
  jaffle_shop:
    marketing:
      utm_mappings:
        +column_types: ...
```

#### Examples

##### Use a varchar column type to preserve leading zeros in a zipcode

dbt_project.yml

```yml
seeds:
  jaffle_shop: # you must include the project name
    warehouse_locations:
      +column_types:
        zipcode: varchar(5)
```

#### Recommendation

Use this configuration only when required, i.e., when type inference is not working as expected. Otherwise, you can omit this configuration.

#### Troubleshooting

Note: The `column_types` configuration is case-sensitive, regardless of quoting configuration. If you specify a column as `Country_Name` in your seed, you should reference it as `Country_Name`, not `country_name`.
---

### columns

models/<filename>.yml

```yml
models:
  - name: <model_name>
    columns:
      - name: <column_name>
        data_type: <string>
        description: <markdown_string>
        quote: true | false
        data_tests: ...
        config:
          tags: ...
          meta: ...
      - name: ...
```

models/<filename>.yml

```yml
sources:
  - name: <source_name>
    tables:
      - name: <table_name>
        columns:
          - name: <column_name>
            description: <markdown_string>
            data_type: <string>
            quote: true | false
            data_tests: ...
            config:
              tags: ...
              meta: ...
          - name: ...
```

seeds/<filename>.yml

```yml
seeds:
  - name: <seed_name>
    columns:
      - name: <column_name>
        description: <markdown_string>
        data_type: <string>
        quote: true | false
        data_tests: ...
        config:
          tags: ...
          meta: ...
      - name: ...
```

snapshots/<filename>.yml

```yml
snapshots:
  - name: <snapshot_name>
    columns:
      - name: <column_name>
        description: <markdown_string>
        data_type: <string>
        quote: true | false
        data_tests: ...
        config:
          tags: ...
          meta: ...
      - name: ...
```

analyses/<filename>.yml

```yml
analyses:
  - name: <analysis_name>
    columns:
      - name: <column_name>
        description: <markdown_string>
        data_type: <string>
      - name: ...
```

Columns are not resources in and of themselves. Instead, they are child properties of another resource type. They can define sub-properties that are similar to properties defined at the resource level:

* `tags`
* `meta`
* `data_tests`
* `description`

Because columns are not resources, their `tags` and `meta` properties are not true configurations even when nested under a `config` block. They do not inherit the `tags` or `meta` values of their parent resources. However, you can select a generic test defined on a column using tags applied to its column or top-level resource; see [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md#run-tests-on-tagged-columns).
Columns may optionally define a `data_type`, which is necessary for: * Enforcing a model [contract](https://docs.getdbt.com/reference/resource-configs/contract.md) * Use in other packages or plugins, such as the [`external`](https://docs.getdbt.com/reference/resource-properties/external.md) property of sources and [`dbt-external-tables`](https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/) ##### `quote`[​](#quote "Direct link to quote") The `quote` field can be used to enable or disable quoting for column names. * Models * Sources * Seeds * Snapshots * Analyses models/schema.yml ```yml models: - name: model_name columns: - name: column_name quote: true | false ``` models/schema.yml ```yml sources: - name: source_name tables: - name: table_name columns: - name: column_name quote: true | false ``` seeds/schema.yml ```yml seeds: - name: seed_name columns: - name: column_name quote: true | false ``` snapshots/schema.yml ```yml snapshots: - name: snapshot_name columns: - name: column_name quote: true | false ``` analyses/schema.yml ```yml analyses: - name: analysis_name columns: - name: column_name quote: true | false ``` ##### Default[​](#default "Direct link to Default") The default quoting value is `false`. ##### Explanation[​](#explanation "Direct link to Explanation") This is particularly relevant to those using Snowflake, where quoting can be fickle. This property is useful when: * A source table has a column that needs to be quoted to be selected, for example, to preserve column casing * A seed was created with `quote_columns: true` ([docs](https://docs.getdbt.com/reference/resource-configs/quote_columns.md)) on Snowflake * A model uses quotes in the SQL, potentially to work around the use of reserved words ```sql select user_group as "group" ``` Without setting `quote: true`: * [Data tests](https://docs.getdbt.com/docs/build/data-tests.md) applied to this column may fail due to invalid SQL * Documentation may not render correctly, e.g.
`group` and `"group"` may not be matched as the same column name. ##### Example[​](#example "Direct link to Example") ###### Add data tests to a quoted column in a source table[​](#add-data-tests-to-a-quoted-column-in-a-source-table "Direct link to Add data tests to a quoted column in a source table") This is especially relevant if using Snowflake: ```yml sources: - name: stripe tables: - name: payment columns: - name: orderID quote: true data_tests: - not_null ``` Without `quote: true`, the following error will occur: ```text $ dbt test -s source:stripe.* Running with dbt=0.16.1 Found 7 models, 22 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 0 seed files, 4 sources 13:33:37 | Concurrency: 4 threads (target='learn') 13:33:37 | 13:33:37 | 1 of 1 START test source_not_null_stripe_payment_order_id............ [RUN] 13:33:39 | 1 of 1 ERROR source_not_null_stripe_payment_order_id................. [ERROR in 1.89s] 13:33:39 | 13:33:39 | Finished running 1 tests in 6.43s. Completed with 1 error and 0 warnings: Database Error in test source_not_null_stripe_payment_order_id (models/staging/stripe/src_stripe.yml) 000904 (42000): SQL compilation error: error line 3 at position 6 invalid identifier 'ORDERID' compiled SQL at target/compiled/jaffle_shop/schema_test/source_not_null_stripe_payment_orderID.sql ``` This is because dbt is trying to run: ```sql select count(*) from raw.stripe.payment where orderID is null ``` Instead of: ```sql select count(*) from raw.stripe.payment where "orderID" is null ```
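The failure above comes down to identifier case folding. Here's a small Python sketch of the Snowflake-style resolution rule; `resolve` is a hypothetical helper for illustration, not dbt or Snowflake code.

```python
def resolve(identifier: str, quoted: bool) -> str:
    """Sketch of Snowflake-style identifier resolution: unquoted
    identifiers fold to uppercase, while quoted identifiers keep
    their exact case (and become case-sensitive)."""
    return identifier if quoted else identifier.upper()

# Without quote: true, dbt emits the bare name, which Snowflake folds,
# so the lookup misses the exact-cased "orderID" column:
print(resolve("orderID", quoted=False))  # ORDERID

# With quote: true, the exact-cased column is found:
print(resolve("orderID", quoted=True))   # orderID
```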
--- ### Command line options For consistency, command-line interface (CLI) flags should come right after the `dbt` prefix and its subcommands. This includes "global" flags (supported for all commands). For a list of all dbt CLI flags you can set, refer to [Available flags](https://docs.getdbt.com/reference/global-configs/about-global-configs.md#available-flags). When set, CLI flags override [environment variables](https://docs.getdbt.com/reference/global-configs/environment-variable-configs.md) and [project flags](https://docs.getdbt.com/reference/global-configs/project-flags.md). Environment variables contain a `DBT_` prefix. For example, instead of using: ```bash dbt --no-populate-cache run ``` You should use: ```bash dbt run --no-populate-cache ``` Historically, passing flags (such as "global flags") *before* the subcommand was supported, but this is legacy functionality that dbt Labs can remove at any time. We do not support using the same flag before and after the subcommand. #### Using boolean and non-boolean flags[​](#using-boolean-and-non-boolean-flags "Direct link to Using boolean and non-boolean flags") You can construct your commands with boolean flags, which enable or disable a setting, or with non-boolean flags, which take specific values such as strings. * Non-boolean config * Boolean config Use this non-boolean config structure: * Replace `` with the command this config applies to, * `` with the config you are setting, and * `` with the new value for the config. CLI flags ```text --= ``` ##### Example[​](#example "Direct link to Example") CLI flags ```text dbt run --printer-width=80 dbt test --indirect-selection=eager ``` To enable or disable boolean configs: * Use ``, the command this config applies to, * followed by `--` to turn it on, or `--no-` to turn it off.
* Replace `` with the config you are enabling or disabling. CLI flags ```text dbt -- dbt --no- ``` ##### Example[​](#example-1 "Direct link to Example") CLI flags ```text dbt run --version-check dbt run --no-version-check ``` #### Config precedence[​](#config-precedence "Direct link to Config precedence") There are multiple ways of setting flags, which depend on the use case: * **[Project-level `flags` in `dbt_project.yml`](https://docs.getdbt.com/reference/global-configs/project-flags.md):** Define version-controlled defaults for everyone running this project. Also, opt in or opt out of [behavior changes](https://docs.getdbt.com/reference/global-configs/behavior-changes.md) to manage your migration off legacy functionality. * **[Environment variables](https://docs.getdbt.com/reference/global-configs/environment-variable-configs.md):** Define different behavior in different runtime environments (development vs. production vs. [continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md)), or different behavior for different users in development, based on personal preferences. * **[CLI options](https://docs.getdbt.com/reference/global-configs/command-line-options.md):** Define behavior specific to *this invocation*. Supported for all dbt commands. The most specific setting "wins." If you set the same flag in all three places, the CLI option will take precedence, followed by the environment variable, and finally, the value in `dbt_project.yml`. If you set the flag in none of those places, it will use the default value defined within dbt.
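The precedence order described above can be sketched in a few lines of Python. This is a simplified illustration: `resolve_flag` is a hypothetical helper that only models boolean flags, not dbt's actual implementation.

```python
import os

def resolve_flag(name: str, cli_args: dict, project_flags: dict, default):
    """Sketch of dbt's flag precedence: CLI option > environment
    variable (DBT_<NAME>) > dbt_project.yml flags > built-in default."""
    if name in cli_args:                      # most specific: this invocation
        return cli_args[name]
    env = os.environ.get(f"DBT_{name.upper()}")
    if env is not None:                       # next: runtime environment
        return env.lower() in ("1", "true", "yes")
    if name in project_flags:                 # next: version-controlled default
        return project_flags[name]
    return default                            # finally: dbt's built-in default

os.environ["DBT_FAIL_FAST"] = "1"

# fail_fast set in all three places: the CLI option wins.
print(resolve_flag("fail_fast", {"fail_fast": False}, {"fail_fast": True}, False))  # False

# No CLI flag: the environment variable beats dbt_project.yml.
print(resolve_flag("fail_fast", {}, {"fail_fast": False}, False))  # True
```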
Most flags can be set in all three places: ```yaml # dbt_project.yml flags: # set default for running this project -- anywhere, anytime, by anyone fail_fast: true ``` ```bash # set this environment variable to 'True' (bash syntax) export DBT_FAIL_FAST=1 dbt run ``` ```bash dbt run --fail-fast # set to True for this specific invocation dbt run --no-fail-fast # set to False ``` There are two categories of exceptions: 1. **Flags setting file paths:** Flags for file paths that are relevant to runtime execution (for example, `--log-path` or `--state`) cannot be set in `dbt_project.yml`. To override defaults, pass CLI options or set environment variables (`DBT_LOG_PATH`, `DBT_STATE`). Flags that tell dbt where to find project resources (for example, `model-paths`) are set in `dbt_project.yml`, but as a top-level key, outside the `flags` dictionary; these configs are expected to be fully static and never vary based on the command or execution environment. 2. **Opt-in flags:** Flags opting in or out of [behavior changes](https://docs.getdbt.com/reference/global-configs/behavior-changes.md) can *only* be defined in `dbt_project.yml`. These are intended to be set in version control and migrated via pull/merge request. Their values should not diverge indefinitely across invocations, environments, or users. --- ### concurrent_batches 💡Did you know... Available from dbt v1.9 or with the [dbt "Latest" release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).
* Project YAML file * SQL file config dbt\_project.yml ```yaml models: +concurrent_batches: true ``` models/my\_model.sql ```sql {{ config( materialized='incremental', concurrent_batches=true, incremental_strategy='microbatch', ... ) }} select ... ``` #### Definition[​](#definition "Direct link to Definition") `concurrent_batches` is an override that lets you decide whether batches run in parallel or sequentially (one at a time). For more information, refer to [how batch execution works](https://docs.getdbt.com/docs/build/parallel-batch-execution.md#how-parallel-batch-execution-works). #### Example[​](#example "Direct link to Example") By default, dbt auto-detects whether batches can run in parallel for microbatch models. However, you can override dbt's detection by setting the `concurrent_batches` config to `false` in your `dbt_project.yml` or model `.sql` file, provided you meet these conditions: * You've configured a [microbatch incremental strategy](https://docs.getdbt.com/docs/build/incremental-microbatch.md). * You're working with cumulative metrics or any logic that depends on batch order. Set the `concurrent_batches` config to `false` to ensure batches are processed sequentially. For example: dbt\_project.yml ```yaml models: my_project: cumulative_metrics_model: +concurrent_batches: false ``` models/my\_model.sql ```sql {{ config( materialized='incremental', incremental_strategy='microbatch', concurrent_batches=false ) }} select ... ``` --- ### config-version The `config-version:` tag is optional.
dbt\_project.yml ```yml config-version: 2 ``` #### Definition[​](#definition "Direct link to Definition") Specify your `dbt_project.yml` as using the v2 structure. #### Default[​](#default "Direct link to Default") Without this configuration, dbt will assume your `dbt_project.yml` uses the version 2 syntax. Version 1 has been deprecated. --- ### Configurations and properties, what are they? Understand the difference between properties and configurations in dbt: properties describe resources, while configurations control how dbt builds them in the warehouse. Resources in your project—models, snapshots, seeds, tests, and the rest—can have a number of declared *properties*. Resources can also define *configurations* (configs), which are a special kind of property that bring extra abilities. What's the distinction? * Properties are declared for resources one-by-one in `properties.yml` files. Configs can be defined there, nested under a `config` property. They can also be set one-by-one via a `config()` macro (right within `.sql` files), and for many resources at once in `dbt_project.yml`. * Because configs can be set in multiple places, they are also applied hierarchically. An individual resource might *inherit* or *override* configs set elsewhere. * You can select resources based on their config values using the `config:` selection method, but not the values of non-config properties. * There are slightly different naming conventions for properties and configs depending on the file type. Refer to [naming convention](https://docs.getdbt.com/reference/dbt_project.yml.md#naming-convention) for more details.
A rule of thumb: properties declare things *about* your project resources; configs go the extra step of telling dbt *how* to build those resources in your warehouse. This is generally true, but not always, so it's always good to check! For example, you can use resource **properties** to: * Describe models, snapshots, seed files, and their columns * Assert "truths" about a model, in the form of [data tests](https://docs.getdbt.com/docs/build/data-tests.md), e.g. "this `id` column is unique" * Define official downstream uses of your data models, in the form of [exposures](https://docs.getdbt.com/docs/build/exposures.md), and assert an exposure's "type" Whereas you can use **configurations** to: * Change how a model will be materialized (table, view, incremental, etc.) * Declare where a seed will be created in the database (`..`) * Declare whether a resource should persist its descriptions as comments in the database * Apply tags and meta to a resource --- ### Configure state selection State and [defer](https://docs.getdbt.com/reference/node-selection/defer.md) can be set by environment variables as well as CLI flags: * `--state` or `DBT_STATE`: file path * `--defer` or `DBT_DEFER`: boolean * `--defer-state` or `DBT_DEFER_STATE`: file path to use for deferral only (optional) If `--defer-state` is not specified, deferral will use the artifacts supplied by `--state`. This enables more granular control in cases where you want to compare against logical state from one environment or past point in time, and defer to applied state from a different environment or point in time.
If both the flag and env var are provided, the flag takes precedence. ###### Notes[​](#notes "Direct link to Notes") * The `--state` artifacts must be of schema versions that are compatible with the currently running dbt version. * These are powerful, complex features. Read about [known caveats and limitations](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats.md) to state comparison. Syntax deprecated In dbt v1.5, we deprecated the original syntax for state (`DBT_ARTIFACT_STATE_PATH`) and defer (`DBT_DEFER_TO_STATE`). Although dbt supports backward compatibility with the old syntax, we will remove it in a future release that we have not yet determined. ##### The "result" status[​](#the-result-status "Direct link to The \"result\" status") Another element of job state is the `result` of a prior dbt invocation. After executing a `dbt run`, for example, dbt creates the `run_results.json` artifact which contains execution times and success / error status for dbt models. You can read more about `run_results.json` on the ['run results'](https://docs.getdbt.com/reference/artifacts/run-results-json.md) page. The following dbt commands produce `run_results.json` artifacts whose results can be referenced in subsequent dbt invocations: * `dbt run` * `dbt test` * `dbt build` * `dbt seed` After issuing one of the above commands, you can reference the results by adding a selector to a subsequent command as follows: ```bash # You can also set the DBT_STATE environment variable instead of the --state flag. dbt run --select "result:" --defer --state path/to/prod/artifacts ``` The available options depend on the resource (node) type: | `result:` | model | seed | snapshot | test | | ------------------ | ----- | ---- | -------- | ---- | | `result:error` | ✅ | ✅ | ✅ | ✅ | | `result:success` | ✅ | ✅ | ✅ | | | `result:skipped` | ✅ | | ✅ | ✅ | | `result:fail` | | | | ✅ | | `result:warn` | | | | ✅ | | `result:pass` | | | | ✅ |
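Conceptually, a `result:` selector is a filter over the statuses recorded in the prior run's `run_results.json`. A minimal Python sketch, using a trimmed-down, hypothetical artifact payload (the real file carries many more fields per result):

```python
import json

# Hypothetical, trimmed-down run_results.json payload.
run_results = json.loads("""
{"results": [
  {"unique_id": "model.jaffle_shop.orders",    "status": "success"},
  {"unique_id": "model.jaffle_shop.customers", "status": "error"},
  {"unique_id": "test.jaffle_shop.not_null_x", "status": "fail"}
]}
""")

def select_result(artifact: dict, status: str) -> list:
    """Sketch of result-status selection: pick the nodes whose prior
    status matches, so a retry can target just those nodes."""
    return [r["unique_id"] for r in artifact["results"] if r["status"] == status]

print(select_result(run_results, "error"))  # ['model.jaffle_shop.customers']
print(select_result(run_results, "fail"))   # ['test.jaffle_shop.not_null_x']
```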
##### Combining `state` and `result` selectors[​](#combining-state-and-result-selectors "Direct link to combining-state-and-result-selectors") The state and result selectors can also be combined in a single invocation of dbt to capture errors from a previous run OR any new or modified models. ```bash dbt run --select "result:+" state:modified+ --defer --state ./ ``` ##### The "source\_status" status[​](#the-source_status-status "Direct link to The \"source_status\" status") Another element of job state is the `source_status` of a prior dbt invocation. After executing `dbt source freshness`, for example, dbt creates the `sources.json` artifact which contains execution times and `max_loaded_at` dates for dbt sources. You can read more about `sources.json` on the ['sources'](https://docs.getdbt.com/reference/artifacts/sources-json.md) page. The `dbt source freshness` command produces a `sources.json` artifact whose results can be referenced in subsequent dbt invocations. When a job is selected, dbt will surface the artifacts from that job's most recent successful run. dbt will then use those artifacts to determine the set of fresh sources. In your job commands, you can signal dbt to run and test only on the fresher sources and their children by including the `source_status:fresher+` argument. This requires both the previous and current states to have the `sources.json` artifact available. In other words, both job states need to have run `dbt source freshness`. After issuing the `dbt source freshness` command, you can reference the source freshness results by adding a selector to a subsequent command: ```bash # You can also set the DBT_STATE environment variable instead of the --state flag.
dbt source freshness # must be run again to compare current to previous state dbt build --select "source_status:fresher+" --state path/to/prod/artifacts ``` For more example commands, refer to [Pro-tips for workflows](https://docs.getdbt.com/best-practices/best-practice-workflows.md#pro-tips-for-workflows). #### Related docs[​](#related-docs "Direct link to Related docs") * [About state in dbt](https://docs.getdbt.com/reference/node-selection/state-selection.md) * [State comparison caveats](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats.md) --- ### Configuring quoting in projects dbt\_project.yml ```yml quoting: database: true | false schema: true | false identifier: true | false snowflake_ignore_case: true | false # Fusion-only config. Aligns with Snowflake's session parameter QUOTED_IDENTIFIERS_IGNORE_CASE behavior. # Ignored by dbt Core and other adapters. ``` #### Definition[​](#definition "Direct link to Definition") You can optionally enable quoting in a dbt project to control whether dbt wraps database, schema, or identifier names in quotes when generating SQL. dbt uses this configuration when: * Creating relations (for example, tables or views) * Resolving a `ref()` function to a direct relation reference BigQuery terminology Note that for BigQuery quoting configuration, `database` and `schema` should be used here, though these configs will apply to `project` and `dataset` names respectively. #### Default[​](#default "Direct link to Default") The default values vary by database. * Default * Snowflake For most adapters, quoting is set to `true` by default. Why?
It's equally easy to select from relations with quoted or unquoted identifiers. Quoting allows you to use reserved words and special characters in those identifiers, though we recommend avoiding them whenever possible. dbt\_project.yml ```yml quoting: database: true schema: true identifier: true ``` For Snowflake, quoting is set to `false` by default. Creating relations with quoted identifiers also makes those identifiers case sensitive. It's much more difficult to select from them. You can re-enable quoting for relation identifiers that are case sensitive, reserved words, or contain special characters, but we recommend you avoid this as much as possible. dbt\_project.yml ```yml quoting: database: false schema: false identifier: false snowflake_ignore_case: false # Fusion-only config. Aligns with Snowflake's session parameter QUOTED_IDENTIFIERS_IGNORE_CASE behavior. # Ignored by dbt Core and other adapters. ``` #### Examples[​](#examples "Direct link to Examples") Set quoting to `false` for a project: dbt\_project.yml ```yml quoting: database: false schema: false identifier: false snowflake_ignore_case: false # Fusion-only config. Aligns with Snowflake's session parameter QUOTED_IDENTIFIERS_IGNORE_CASE behavior. # Ignored by dbt Core and other adapters. ``` dbt will then create relations without quotes: ```sql create table analytics.dbt_alice.dim_customers ``` #### Recommendations[​](#recommendations "Direct link to Recommendations") ##### Snowflake[​](#snowflake "Direct link to Snowflake") If you're using Snowflake, we recommend: * Setting all quoting configs to `False` in your [`dbt_project.yml`](https://docs.getdbt.com/reference/dbt_project.yml.md) to avoid quoting model and column names unnecessarily and to help prevent case sensitivity issues. * Setting all quoting configs to `False` also means you cannot use reserved words as identifiers, such as model or table names.
We recommend you avoid using these reserved words anyway. * If you're using Fusion and your Snowflake environment sets the session parameter `QUOTED_IDENTIFIERS_IGNORE_CASE = true` (for example, in an orchestrator or pre-hook), you should also enable quoting and `snowflake_ignore_case` in your `dbt_project.yml` to preserve the exact case of database, schema, and identifier names: ```yml quoting: database: true schema: true identifier: true snowflake_ignore_case: true # Fusion-only config. Aligns with Snowflake's session parameter QUOTED_IDENTIFIERS_IGNORE_CASE behavior. # Ignored by dbt Core and other adapters. ``` Setting `snowflake_ignore_case: true` ensures that the column and identifier names dbt compiles match Snowflake’s behavior at runtime, preserving parity between compile-time and runtime logic. Without this, you may encounter "column not found" errors. Quoting a source If a Snowflake source table uses a quoted database, schema, or table identifier, you can configure this in the source.yml file. Refer to [configuring quoting](https://docs.getdbt.com/reference/resource-properties/quoting.md) for more information. ###### Explanation[​](#explanation "Direct link to Explanation") dbt skips quoting on Snowflake so lowercase model names work seamlessly in downstream queries and BI tools without worrying about case or quotes. Unlike most databases (which lowercase unquoted identifiers), Snowflake uppercases them. When you quote identifiers, Snowflake preserves their case and makes them case-sensitive. This means a table created with quoted, lowercase identifiers must always be referenced with quotes and the exact same case, which can easily break downstream queries in BI tools or ad-hoc SQL. Because dbt conventions use lowercase model and file names, quoting them in Snowflake invites exactly this breakage.
If dbt instead used uppercase names by convention, the safe defaults for other databases would be at risk of breaking downstream queries. snowflake\_casing.sql ```sql /* Run these queries to understand how Snowflake handles casing and quoting. */ -- This is the output of an example `orders.sql` model with quoting enabled create table "analytics"."orders" as ( select 1 as id ); /* These queries WILL NOT work! Since the table above was created with quotes, Snowflake created the orders table with a lowercase schema and identifier. Since unquoted identifiers are automatically uppercased, both of the following queries are equivalent, and neither will work correctly. */ select * from analytics.orders; select * from ANALYTICS.ORDERS; /* To query this table, you'll need to quote the schema and table. This query should indeed complete without error. */ select * from "analytics"."orders"; /* To avoid this quoting madness, you can disable quoting for schemas and identifiers in your dbt_project.yml file. This means that you won't be able to use reserved words as model names, but you should avoid that anyway! Assuming schema and identifier quoting is disabled, the following query would indeed work: */ select * from analytics.orders; ``` ##### Other warehouses[​](#other-warehouses "Direct link to Other warehouses") Leave the default values for your warehouse.
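The per-part quoting configs amount to a simple rendering rule: each part of the relation name is wrapped in double quotes only when its config is enabled. A minimal Python sketch (illustrative only; `render_relation` is a hypothetical helper, not dbt's actual relation renderer):

```python
def render_relation(database: str, schema: str, identifier: str, quoting: dict) -> str:
    """Sketch of per-part quoting: wrap each name part in double quotes
    when its quoting config is true (the default for most adapters)."""
    parts = [("database", database), ("schema", schema), ("identifier", identifier)]
    return ".".join(f'"{v}"' if quoting.get(k, True) else v for k, v in parts)

# All quoting configs disabled (the Snowflake recommendation):
print(render_relation("analytics", "dbt_alice", "dim_customers",
                      {"database": False, "schema": False, "identifier": False}))
# analytics.dbt_alice.dim_customers

# Default behavior for most adapters -- every part quoted:
print(render_relation("analytics", "dbt_alice", "dim_customers", {}))
# "analytics"."dbt_alice"."dim_customers"
```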
--- ### Configuring quoting in sources models/\.yml ```yml sources: - name: jaffle_shop quoting: database: true | false schema: true | false identifier: true | false tables: - name: orders quoting: database: true | false schema: true | false identifier: true | false ``` #### Definition[​](#definition "Direct link to Definition") Optionally configure whether dbt should quote databases, schemas, and identifiers when resolving a `{{ source() }}` function to a direct relation reference. This config can be specified for all tables in a source, or for a specific source table. Quoting configs defined for a specific source table override the quoting configs specified for the top-level source. BigQuery Terminology Note that for BigQuery quoting configuration, `database` and `schema` should be used here, though these configs will apply to `project` and `dataset` names respectively. #### Default[​](#default "Direct link to Default") The default values vary by database. For most adapters, quoting is set to *true* by default. Why? It's equally easy to select from relations with quoted or unquoted identifiers. Quoting allows you to use reserved words and special characters in those identifiers, though we recommend avoiding this whenever possible. On Snowflake, quoting is set to *false* by default. Creating relations with quoted identifiers also makes those identifiers case sensitive. It's much more difficult to select from them. You can re-enable quoting for relation identifiers that are case sensitive, reserved words, or contain special characters, but we recommend you avoid this as much as possible. #### Example[​](#example "Direct link to Example") models/\.yml ```yaml sources: - name: jaffle_shop database: raw quoting: database: true schema: true identifier: true tables: - name: orders - name: customers # This overrides the `jaffle_shop` quoting config quoting: identifier: false ``` In a downstream model: models/\.sql ```sql select ...
-- this should be quoted from {{ source('jaffle_shop', 'orders') }} -- here, the identifier should be unquoted left join {{ source('jaffle_shop', 'customers') }} using (order_id) ``` This will get compiled to: ```sql select ... -- this should be quoted from "raw"."jaffle_shop"."orders" -- here, the identifier should be unquoted left join "raw"."jaffle_shop".customers using (order_id) ``` --- ### constraints Constraints are a feature of many data platforms. When specified, the platform will perform additional validation on data as it is being populated in a new table or inserted into a preexisting table. If the validation fails, the table creation or update fails, the operation is rolled back, and you will see a clear error message. When enforced, a constraint guarantees that you will never see invalid data in the table materialized by your model. Enforcement varies significantly by data platform. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") Before using constraints, ensure the following requirements are met: * **You use supported materializations** — Constraints only work on `table` and `incremental` models. Constraints are never applied on `ephemeral` models or those materialized as `view`. * **You enforce a contract** — To use constraints, your model must declare and enforce a [contract](https://docs.getdbt.com/reference/resource-configs/contract.md). This means you need to explicitly define the `data_type` for every column in your model's schema configuration.
#### Defining constraints[​](#defining-constraints "Direct link to Defining constraints") Constraints may be defined for a single column, or at the model level for one or more columns. As a general rule, we recommend defining single-column constraints directly on those columns. If you define multiple `primary_key` constraints for a single model, those *must* be defined at the model level. Defining multiple `primary_key` constraints at the column level is not supported. The structure of a constraint is: * `type` (required): one of `not_null`, `unique`, `primary_key`, `foreign_key`, `check`, `custom` * `expression`: Free text input to qualify the constraint. Required for certain constraint types, and optional for others. * `name` (optional): Human-friendly name for this constraint. Supported by some data platforms. * `columns` (model-level only): List of column names to apply the constraint over. #### Platform-specific support[​](#platform-specific-support "Direct link to Platform-specific support") In transactional databases, it is possible to define "constraints" on the allowed values of certain columns, stricter than just the data type of those values. For example, Postgres supports and enforces all the constraints in the ANSI SQL standard (`not null`, `unique`, `primary key`, `foreign key`), plus a flexible row-level `check` constraint that evaluates to a boolean expression. Most analytical data platforms support and enforce a `not null` constraint, but they either do not support or do not enforce the rest. It is sometimes still desirable to add an "informational" constraint, knowing it is *not* enforced, for the purpose of integrating with legacy data catalog or entity-relation diagram tools ([dbt-core#3295](https://github.com/dbt-labs/dbt-core/issues/3295)). Some data platforms can optionally use primary or foreign key constraints for query optimization if you specify an additional keyword. 
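As a mental model, column-level constraints built from the structure above end up appended to that column's DDL fragment. A minimal Python sketch (illustrative only; `render_column` is a hypothetical helper, and real adapters template this per platform, applying their own support and enforcement rules):

```python
def render_column(name: str, data_type: str, constraints: list) -> str:
    """Sketch of how a column's constraints (type / expression entries,
    per the structure described above) become a DDL fragment."""
    rendered = []
    for c in constraints:
        if c["type"] == "not_null":
            rendered.append("not null")
        elif c["type"] == "primary_key":
            rendered.append("primary key")
        elif c["type"] == "check":
            # check constraints require an expression
            rendered.append(f"check ({c['expression']})")
    return " ".join([name, data_type] + rendered)

print(render_column("id", "integer", [
    {"type": "not_null"},
    {"type": "primary_key"},
    {"type": "check", "expression": "id > 0"},
]))  # id integer not null primary key check (id > 0)
```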
To that end, there are two optional fields you can specify on any constraint: * `warn_unenforced: False` to skip warning on constraints that are supported, but not enforced, by this data platform. The constraint will be included in templated DDL. * `warn_unsupported: False` to skip warning on constraints that aren't supported by this data platform, and therefore won't be included in templated DDL. - Postgres - Redshift - Snowflake - BigQuery - Databricks * PostgreSQL constraints documentation: [here](https://www.postgresql.org/docs/current/ddl-constraints.html#id-1.5.4.6.6) models/constraints\_example.sql ```sql {{ config( materialized = "table" ) }} select 1 as id, 'My Favorite Customer' as customer_name, cast('2019-01-01' as date) as first_transaction_date ``` models/schema.yml ```yml models: - name: dim_customers config: contract: enforced: true columns: - name: id data_type: int constraints: - type: not_null - type: primary_key - type: check expression: "id > 0" - name: customer_name data_type: text - name: first_transaction_date data_type: date ``` Expected DDL to enforce constraints: target/run/.../constraints\_example.sql ```sql create table "database_name"."schema_name"."constraints_example__dbt_tmp" ( id integer not null primary key check (id > 0), customer_name text, first_transaction_date date ) ; insert into "database_name"."schema_name"."constraints_example__dbt_tmp" ( id, customer_name, first_transaction_date ) ( select 1 as id, 'My Favorite Customer' as customer_name, cast('2019-01-01' as date) as first_transaction_date ); ``` Redshift currently only enforces `not null` constraints; all other constraints are metadata only. Additionally, Redshift does not allow column checks at the time of table creation. See more in the Redshift documentation [here](https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html).
models/constraints\_example.sql ```sql {{ config( materialized = "table" ) }} select 1 as id, 'My Favorite Customer' as customer_name, cast('2019-01-01' as date) as first_transaction_date ``` models/schema.yml ```yml models: - name: dim_customers config: contract: enforced: true columns: - name: id data_type: integer constraints: - type: not_null - type: primary_key # not enforced -- will warn & include - type: check # not supported -- will warn & skip expression: "id > 0" data_tests: - unique # primary_key constraint is not enforced - name: customer_name data_type: varchar - name: first_transaction_date data_type: date ``` Note that Redshift limits the maximum length of the `varchar` values to 256 characters by default (or when specified without a length). This means that any string data exceeding 256 characters might get truncated *or* return a "value too long for character type" error. To allow the maximum length, use `varchar(max)`. For example, `data_type: varchar(max)`. Expected DDL to enforce constraints: target/run/.../constraints\_example.sql ```sql create table "database_name"."schema_name"."constraints_example__dbt_tmp" ( id integer not null, customer_name varchar, first_transaction_date date, primary key(id) ) ; insert into "database_name"."schema_name"."constraints_example__dbt_tmp" ( select 1 as id, 'My Favorite Customer' as customer_name, cast('2019-01-01' as date) as first_transaction_date ); ``` * Snowflake constraints documentation: [here](https://docs.snowflake.com/en/sql-reference/constraints-overview.html) * Snowflake data types: [here](https://docs.snowflake.com/en/sql-reference/intro-summary-data-types.html) Snowflake supports four types of constraints: `unique`, `not null`, `primary key`, and `foreign key`. It is important to note that only the `not null` (and the `not null` property of `primary key`) are actually checked at present. The rest of the constraints are purely metadata, not verified when inserting data. 
Although Snowflake does not validate `unique`, `primary`, or `foreign_key` constraints, you may optionally instruct Snowflake to use them for query optimization by specifying [`rely`](https://docs.snowflake.com/en/user-guide/join-elimination) in the constraint `expression` field. Currently, Snowflake doesn't support the `check` syntax and dbt will skip the `check` config and raise a warning message if it is set on some models in the dbt project. models/constraints\_example.sql ```sql {{ config( materialized = "table" ) }} select 1 as id, 'My Favorite Customer' as customer_name, cast('2019-01-01' as date) as first_transaction_date ``` models/schema.yml ```yml models: - name: dim_customers config: contract: enforced: true columns: - name: id data_type: integer description: hello constraints: - type: not_null - type: primary_key # not enforced -- will warn & include - type: check # not supported -- will warn & skip expression: "id > 0" data_tests: - unique # need this test because primary_key constraint is not enforced - name: customer_name data_type: text - name: first_transaction_date data_type: date ``` Expected DDL to enforce constraints: target/run/.../constraints\_example.sql ```sql create or replace transient table ..constraints_model ( id integer not null primary key, customer_name text, first_transaction_date date ) as ( select 1 as id, 'My Favorite Customer' as customer_name, cast('2019-01-01' as date) as first_transaction_date ); ``` BigQuery allows defining and enforcing `not null` constraints, and defining (but *not* enforcing) `primary key` and `foreign key` constraints (which can be used for query optimization). BigQuery does not support defining or enforcing other constraints. 
For more information, refer to [Platform constraint support](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md#platform-constraint-support).

models/constraints\_example.sql

```sql
{{
  config(
    materialized = "table"
  )
}}

select
  1 as id,
  'My Favorite Customer' as customer_name,
  cast('2019-01-01' as date) as first_transaction_date
```

models/schema.yml

```yml
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: int
        constraints:
          - type: not_null
          - type: primary_key # not enforced -- will warn & include
          - type: check       # not supported -- will warn & skip
            expression: "id > 0"
        data_tests:
          - unique # primary_key constraint is not enforced
      - name: customer_name
        data_type: string
      - name: first_transaction_date
        data_type: date
```

##### Column-level constraint on nested column:[​](#column-level-constraint-on-nested-column "Direct link to Column-level constraint on nested column:")

models/nested\_column\_constraints\_example.sql

```sql
{{
  config(
    materialized = "table"
  )
}}

select
  'string' as a,
  struct(
    1 as id,
    'name' as name,
    struct(2 as id, struct('test' as again, '2' as even_more) as another) as double_nested
  ) as b
```

models/nested\_fields.yml

```yml
models:
  - name: nested_column_constraints_example
    config:
      contract:
        enforced: true
    columns:
      - name: a
        data_type: string
      - name: b.id
        data_type: integer
        constraints:
          - type: not_null
      - name: b.name
        description: test description
        data_type: string
      - name: b.double_nested.id
        data_type: integer
      - name: b.double_nested.another.again
        data_type: string
      - name: b.double_nested.another.even_more
        data_type: integer
        constraints:
          - type: not_null
```

##### Expected DDL to enforce constraints:[​](#expected-ddl-to-enforce-constraints "Direct link to Expected DDL to enforce constraints:")

target/run/.../constraints\_example.sql

```sql
create or replace table ``.``.`constraints_model`
(
    id integer not null,
    customer_name string,
    first_transaction_date date
)
as
(
    select
        1 as id,
        'My Favorite Customer' as customer_name,
        cast('2019-01-01' as date) as first_transaction_date
);
```

Databricks allows you to define:

* a `not null` constraint
* and/or additional `check` constraints, with conditional expressions including one or more columns

Because Databricks does not support transactions and does not allow using `create or replace table` with a column schema, the table is first created without a schema, and `alter` statements are then executed to add the different constraints. This means that:

* The names and order of columns are checked, but not their types
* If the `constraints` and/or `constraint_check` fail, the table with the failing data will still exist in the warehouse

See [this page](https://docs.databricks.com/tables/constraints.html) for more details about the support of constraints on Databricks.

models/constraints\_example.sql

```sql
{{
  config(
    materialized = "table"
  )
}}

select
  1 as id,
  'My Favorite Customer' as customer_name,
  cast('2019-01-01' as date) as first_transaction_date
```

models/schema.yml

```yml
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: int
        constraints:
          - type: not_null
          - type: primary_key # not enforced -- will warn & include
          - type: check       # not supported -- will warn & skip
            expression: "id > 0"
        data_tests:
          - unique # primary_key constraint is not enforced
      - name: customer_name
        data_type: text
      - name: first_transaction_date
        data_type: date
```

Expected DDL to enforce constraints:

target/run/.../constraints\_example.sql

```sql
create or replace table schema_name.my_model
  using delta
  as
select
  1 as id,
  'My Favorite Customer' as customer_name,
  cast('2019-01-01' as date) as first_transaction_date
```

Followed by the statements:

```sql
alter table schema_name.my_model change column id set not null;
alter table schema_name.my_model add constraint 472394792387497234 check (id > 0);
```

#### Custom constraints[​](#custom-constraints "Direct link to Custom constraints")

In
dbt and dbt Core, you can use custom constraints on models for the advanced configuration of tables. Different data warehouses support different syntax and capabilities.

Custom constraints allow you to add configuration to specific columns. For example:

* Set [masking policies](https://docs.snowflake.com/en/user-guide/security-column-intro#what-are-masking-policies) in Snowflake when using a Create Table As Select (CTAS).
* Other data warehouses (such as [Databricks](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html) and [BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#column_name_and_column_schema)) have their own set of parameters that can be set for columns in their CTAS statements.

You can implement constraints in a couple of different ways:

Custom constraints with tags

Here's an example of how to implement tag-based masking policies with contracts and constraints using the following syntax:

models/constraints\_example.yml

```yaml
models:
  - name: my_model
    config:
      contract:
        enforced: true
      materialized: table
    columns:
      - name: id
        data_type: int
        constraints:
          - type: custom
            expression: "tag (my_tag = 'my_value')" # A custom SQL expression used to enforce a specific constraint on a column.
```

Using this syntax requires configuring all the columns and their types, as it's the only way to send a `create or replace mytable as ...` statement. It's not possible to do it with just a partial list of columns. This means making sure the `columns` and `constraints` fields are fully defined. To generate a YAML with all the columns, you can use `generate_model_yaml` from [dbt-codegen](https://github.com/dbt-labs/dbt-codegen/tree/0.12.1/?tab=readme-ov-file#generate_model_yaml-source).
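For the tag example above, the templated DDL would look roughly like the following sketch, following the same CTAS pattern shown in the Snowflake section. The database, schema, and model names are placeholders, and the exact rendering depends on your adapter version:

```sql
-- Sketch only: assumed Snowflake rendering for a contracted table materialization.
-- The custom constraint expression is emitted verbatim after the column definition.
create or replace transient table database_name.schema_name.my_model (
    id int tag (my_tag = 'my_value')
) as (
    select 1 as id  -- the model's compiled SQL goes here
);
```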
Custom constraints without tags

Alternatively, you can add a masking policy without tags:

models/constraints\_example.yml

```yaml
models:
  - name: my_model
    config:
      contract:
        enforced: true
      materialized: table
    columns:
      - name: id
        data_type: int
        constraints:
          - type: custom
            expression: "masking policy my_policy"
```

---

### contract

When the `contract` configuration is enforced, dbt will ensure that your model's returned dataset exactly matches the attributes you have defined in yaml:

* `name` and `data_type` for every column
* Additional [`constraints`](https://docs.getdbt.com/reference/resource-properties/constraints.md), as supported for this materialization and data platform

This is to ensure that the people querying your model downstream—both inside and outside dbt—have a predictable and consistent set of columns to use in their analyses. Even a subtle change in data type, such as from `boolean` (`true`/`false`) to `integer` (`0`/`1`), could cause queries to fail in surprising ways.

Contracts give you control over how schemas are enforced, whether that’s on a single model or consistently across many models in a project.
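As a minimal sketch of the idea (the model and column names here are hypothetical), enforcing a contract means setting the config and declaring each column's name and data type:

```yml
models:
  - name: dim_example # hypothetical model name
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: int
      - name: is_active
        data_type: boolean # changing this type later would fail the contract rather than silently breaking downstream queries
```

With this in place, a `dbt run` fails with a compilation error if the model's SQL returns different columns or types than declared.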
#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

**These places support model contracts:**

* `dbt_project.yml` file
* `properties.yml` file
* SQL models
* Models materialized as one of the following:
  * `table`
  * `view` — views offer limited support for column names and data types, but not `constraints`
  * `incremental` — with `on_schema_change: append_new_columns` or `on_schema_change: fail`
* Certain data platforms, but the supported and [enforced `constraints`](https://docs.getdbt.com/reference/resource-properties/constraints.md) vary by platform

**These places do *NOT* support model contracts:**

* Python models
* SQL models materialized as `materialized_view` or `ephemeral`
* Custom materializations (unless added by the author)
* Models with recursive CTEs in BigQuery
* Other resource types, such as `sources`, `seeds`, `snapshots`, and so on

Refer to the [Examples](https://docs.getdbt.com/reference/resource-configs/contract.md#examples) to see how to apply contracts in your project.

#### Data type aliasing[​](#data-type-aliasing "Direct link to Data type aliasing")

dbt uses built-in type aliasing for the `data_type` defined in your YAML. For example, you can specify `string` in your contract, and on Postgres/Redshift, dbt will convert it to `text`. If dbt doesn't recognize the `data_type` name among its known aliases, it will pass it through as-is. This is enabled by default, but you can opt out by setting `alias_types` to `false`. Example for disabling:

FOLDER\_NAME/FILE\_NAME.yml

```yml
models:
  - name: my_model
    config:
      contract:
        enforced: true
        alias_types: false # true by default
```

#### Size, precision, and scale[​](#size-precision-and-scale "Direct link to Size, precision, and scale")

When dbt compares data types, it will not compare granular details such as size, precision, or scale.
We don't think you should sweat the difference between `varchar(256)` and `varchar(257)`, because it doesn't really affect the experience of downstream queriers. You can accomplish a more-precise assertion by [writing or using a custom test](https://docs.getdbt.com/best-practices/writing-custom-generic-tests.md). Note that you need to specify a varchar size or numeric scale, otherwise dbt relies on default values. For example, if a `numeric` type defaults to a precision of 38 and a scale of 0, then the numeric column stores 0 digits to the right of the decimal (it only stores whole numbers), which might cause it to fail contract enforcement. To avoid this implicit coercion, specify your `data_type` with a nonzero scale, like `numeric(38, 6)`. dbt Core 1.7 and higher provides a warning if you don't specify precision and scale when providing a numeric data type. ##### Examples[​](#examples "Direct link to Examples") models/dim\_customers.yml ```yml models: - name: dim_customers config: materialized: table contract: enforced: true columns: - name: customer_id data_type: int constraints: - type: not_null - name: customer_name data_type: string - name: non_integer data_type: numeric(38,3) ``` Let's say your model is defined as: models/dim\_customers.sql ```sql select 'abc123' as customer_id, 'My Best Customer' as customer_name ``` When you `dbt run` your model, *before* dbt has materialized it as a table in the database, you will see this error: ```txt 20:53:45 Compilation Error in model dim_customers (models/dim_customers.sql) 20:53:45 This model has an enforced contract that failed. 20:53:45 Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition. 
20:53:45 
20:53:45 | column_name | definition_type | contract_type | mismatch_reason    |
20:53:45 | ----------- | --------------- | ------------- | ------------------ |
20:53:45 | customer_id | TEXT            | INT           | data type mismatch |
20:53:45 
20:53:45 
20:53:45   > in macro assert_columns_equivalent (macros/materializations/models/table/columns_spec_ddl.sql)
```

* Project YAML
* Properties YAML
* SQL file config

Enforce contracts consistently across multiple models in your `dbt_project.yml`:

```yml
models:
  property_management: # replace with your dbt project name
    +contract:
      enforced: true
```

Define a model’s contract in a `properties.yml` by specifying the expected columns and data types:

```yml
models:
  - name: stg_rental_applications # replace with your model name
    config:
      contract:
        enforced: true
    columns:
      - name: column_1_id # example id column. Replace with your column
        data_type: int # replace with your column's data type
      - name: column_2_created_at # example column tracking when something was created
        data_type: timestamp
      - name: column_3_status # example status column, which typically stores text values ("active", "pending", "completed", etc.)
        data_type: string
```

Enforce a contract in a model SQL file when you want to apply it to a single model and maintain fine-grained control:

```sql
{{
  config(
    contract = { "enforced": true } -- Enables contract enforcement for this model
  )
}}

select
  column_1_id, -- replace with your column
  column_2_created_at, -- replace with your column
  column_3_status -- replace with your column
from {{ source('property_management', 'rental_applications') }} -- replace with your source name and table
```

Refer to [General configurations](https://docs.getdbt.com/reference/model-configs.md#general-configurations) for more information on the supported configs available for model SQL files, `dbt_project.yml` and `properties.yml`.
##### Incremental models and `on_schema_change`[​](#incremental-models-and-on_schema_change "Direct link to incremental-models-and-on_schema_change")

Why require that incremental models also set [`on_schema_change`](https://docs.getdbt.com/docs/build/incremental-models.md#what-if-the-columns-of-my-incremental-model-change), and why to `append_new_columns` or `fail`? Imagine:

* You add a new column to both the SQL and the YAML spec
* You don't set `on_schema_change`, or you set `on_schema_change: 'ignore'`
* dbt doesn't actually add that new column to the existing table — and the upsert/merge still succeeds, because it does that upsert/merge on the basis of the already-existing "destination" columns only (this is long-established behavior)
* The result is a delta between the yaml-defined contract and the actual table in the database, which means the contract is now incorrect!

Why `append_new_columns` (or `fail`) rather than `sync_all_columns`? Because removing existing columns is a breaking change for contracted models! `sync_all_columns` works like `append_new_columns` but also removes deleted columns, which you're not supposed to do with contracted models unless you upgrade the version.

#### Related documentation[​](#related-documentation "Direct link to Related documentation")

* [What is a model contract?](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md)
* [Defining `columns`](https://docs.getdbt.com/reference/resource-properties/columns.md)
* [Defining `constraints`](https://docs.getdbt.com/reference/resource-properties/constraints.md)
---

### Cross-database macros

These macros benefit three different user groups:

* If you maintain a package, your package is more likely to work on other adapters by using these macros (rather than a specific database's SQL syntax)
* If you maintain an adapter, your adapter is more likely to support more packages by implementing (and testing) these macros
* If you're an end user, more packages and adapters are likely to "just work" for you (without you having to do anything)

Note: Please make sure to take a look at the [SQL expressions section](#sql-expressions) to understand quoting syntax for string values and date literals.

#### All functions (alphabetical)[​](#all-functions-alphabetical "Direct link to All functions (alphabetical)")

* [Data type functions](#data-type-functions)
  * [type\_bigint](#type_bigint)
  * [type\_boolean](#type_boolean)
  * [type\_float](#type_float)
  * [type\_int](#type_int)
  * [type\_numeric](#type_numeric)
  * [type\_string](#type_string)
  * [type\_timestamp](#type_timestamp)
  * [current\_timestamp](#current_timestamp)
* [Set functions](#set-functions)
  * [except](#except)
  * [intersect](#intersect)
* [Array functions](#array-functions)
  * [array\_append](#array_append)
  * [array\_concat](#array_concat)
  * [array\_construct](#array_construct)
* [String functions](#string-functions)
  * [concat](#concat)
  * [hash](#hash)
  * [length](#length)
  * [position](#position)
  * [replace](#replace)
  * [right](#right)
  * [split\_part](#split_part)
* [String literal functions](#string-literal-functions)
  * [escape\_single\_quotes](#escape_single_quotes)
  * [string\_literal](#string_literal)
* [Aggregate and window functions](#aggregate-and-window-functions)
  * [any\_value](#any_value)
  * [bool\_or](#bool_or)
  * [listagg](#listagg)
* [Cast functions](#cast-functions)
  * [cast](#cast)
  * [cast\_bool\_to\_text](#cast_bool_to_text)
  * [safe\_cast](#safe_cast)
* [Comparison functions](#comparison-functions)
  * [equals](#equals)
* [Date and time functions](#date-and-time-functions)
  * [date](#date)
  * [dateadd](#dateadd)
  * [datediff](#datediff)
  * [date\_trunc](#date_trunc)
  * [last\_day](#last_day)
* [Date and time parts](#date-and-time-parts)
* [SQL expressions](#sql-expressions)

#### Data type functions[​](#data-type-functions "Direct link to Data type functions")

##### type\_bigint[​](#type_bigint "Direct link to type_bigint")

**Args**:

* None

This macro yields the database-specific data type for a `BIGINT`.
**Usage**: ```sql {{ dbt.type_bigint() }} ``` **Sample Output (PostgreSQL)**: ```sql bigint ``` ##### type\_boolean[​](#type_boolean "Direct link to type_boolean") **Args**: * None This macro yields the database-specific data type for a `BOOLEAN`. **Usage**: ```sql {{ dbt.type_boolean() }} ``` **Sample Output (PostgreSQL)**: ```sql BOOLEAN ``` ##### type\_float[​](#type_float "Direct link to type_float") **Args**: * None This macro yields the database-specific data type for a `FLOAT`. **Usage**: ```sql {{ dbt.type_float() }} ``` **Sample Output (PostgreSQL)**: ```sql FLOAT ``` ##### type\_int[​](#type_int "Direct link to type_int") **Args**: * None This macro yields the database-specific data type for an `INT`. **Usage**: ```sql {{ dbt.type_int() }} ``` **Sample Output (PostgreSQL)**: ```sql INT ``` ##### type\_numeric[​](#type_numeric "Direct link to type_numeric") **Args**: * None This macro yields the database-specific data type for a `NUMERIC`. **Usage**: ```sql {{ dbt.type_numeric() }} ``` **Sample Output (PostgreSQL)**: ```sql numeric(28,6) ``` ##### type\_string[​](#type_string "Direct link to type_string") **Args**: * None This macro yields the database-specific data type for `TEXT`. **Usage**: ```sql {{ dbt.type_string() }} ``` **Sample Output (PostgreSQL)**: ```sql TEXT ``` ##### type\_timestamp[​](#type_timestamp "Direct link to type_timestamp") **Args**: * None This macro yields the database-specific data type for a `TIMESTAMP` (which may or may not match the behavior of `TIMESTAMP WITHOUT TIMEZONE` from ANSI SQL-92). **Usage**: ```sql {{ dbt.type_timestamp() }} ``` **Sample Output (PostgreSQL)**: ```sql TIMESTAMP ``` ##### current\_timestamp[​](#current_timestamp "Direct link to current_timestamp") This macro returns the current date and time for the system. Depending on the adapter: * The result may be an aware or naive timestamp. * The result may correspond to the start of the statement or the start of the transaction. 
**Args**

* None

**Usage**

* You can use the `current_timestamp()` macro within your dbt SQL files like this:

```sql
{{ dbt.current_timestamp() }}
```

**Sample output (PostgreSQL)**

```sql
now()
```

#### Set functions[​](#set-functions "Direct link to Set functions")

##### except[​](#except "Direct link to except")

**Args**:

* None

`except` is one of the set operators specified in ANSI SQL-92 (along with `union` and `intersect`) and is akin to [set difference](https://en.wikipedia.org/wiki/Complement_\(set_theory\)#Relative_complement).

**Usage**:

```sql
{{ dbt.except() }}
```

**Sample Output (PostgreSQL)**:

```sql
except
```

##### intersect[​](#intersect "Direct link to intersect")

**Args**:

* None

`intersect` is one of the set operators specified in ANSI SQL-92 (along with `union` and `except`) and is akin to [set intersection](https://en.wikipedia.org/wiki/Intersection_\(set_theory\)).

**Usage**:

```sql
{{ dbt.intersect() }}
```

**Sample Output (PostgreSQL)**:

```sql
intersect
```

#### Array functions[​](#array-functions "Direct link to Array functions")

##### array\_append[​](#array_append "Direct link to array_append")

**Args**:

* `array` (required): The array to append to.
* `new_element` (required): The element to be appended. This element must *match the data type of the existing elements* in the array in order to match PostgreSQL functionality and *not null* to match BigQuery functionality.

This macro appends an element to the end of an array and returns the appended array.

**Usage**:

```sql
{{ dbt.array_append("array_column", "element_column") }}
{{ dbt.array_append("array_column", "5") }}
{{ dbt.array_append("array_column", "'blue'") }}
```

**Sample Output (PostgreSQL)**:

```sql
array_append(array_column, element_column)
array_append(array_column, 5)
array_append(array_column, 'blue')
```

##### array\_concat[​](#array_concat "Direct link to array_concat")

**Args**:

* `array_1` (required): The array to append to.
* `array_2` (required): The array to be appended to `array_1`. This array must match the data type of `array_1` in order to match PostgreSQL functionality.

This macro returns the concatenation of two arrays.

**Usage**:

```sql
{{ dbt.array_concat("array_column_1", "array_column_2") }}
```

**Sample Output (PostgreSQL)**:

```sql
array_cat(array_column_1, array_column_2)
```

##### array\_construct[​](#array_construct "Direct link to array_construct")

**Args**:

* `inputs` (optional): The list of array contents. If not provided, this macro will create an empty array. All inputs must be the *same data type* in order to match PostgreSQL functionality and *not null* to match BigQuery functionality.
* `data_type` (optional): Specifies the data type of the constructed array. This is only relevant when creating an empty array (will otherwise use the data type of the inputs). If `inputs` and `data_type` are both not provided, this macro will create an empty array of type integer.

This macro returns an array constructed from a set of inputs.

**Usage**:

```sql
{{ dbt.array_construct(["column_1", "column_2", "column_3"]) }}
{{ dbt.array_construct([], "integer") }}
{{ dbt.array_construct([1, 2, 3, 4]) }}
{{ dbt.array_construct(["'blue'", "'green'"]) }}
```

**Sample Output (PostgreSQL)**:

```sql
array[ column_1 , column_2 , column_3 ]
array[]::integer[]
array[ 1 , 2 , 3 , 4 ]
array[ 'blue' , 'green' ]
```

#### String functions[​](#string-functions "Direct link to String functions")

##### concat[​](#concat "Direct link to concat")

**Args**:

* `fields`: Jinja array of [attribute names or expressions](#sql-expressions).

This macro combines a list of strings together.
**Usage**: ```sql {{ dbt.concat(["column_1", "column_2"]) }} {{ dbt.concat(["year_column", "'-'" , "month_column", "'-'" , "day_column"]) }} {{ dbt.concat(["first_part_column", "'.'" , "second_part_column"]) }} {{ dbt.concat(["first_part_column", "','" , "second_part_column"]) }} ``` **Sample Output (PostgreSQL)**: ```sql column_1 || column_2 year_column || '-' || month_column || '-' || day_column first_part_column || '.' || second_part_column first_part_column || ',' || second_part_column ``` ##### hash[​](#hash "Direct link to hash") **Args**: * `field`: [attribute name or expression](#sql-expressions). This macro provides a hash (such as [MD5](https://en.wikipedia.org/wiki/MD5)) of an [expression](#sql-expressions) cast as a string. **Usage**: ```sql {{ dbt.hash("column") }} {{ dbt.hash("'Pennsylvania'") }} ``` **Sample Output (PostgreSQL)**: ```sql md5(cast(column as varchar )) md5(cast('Pennsylvania' as varchar )) ``` ##### length[​](#length "Direct link to length") **Args**: * `expression`: string [expression](#sql-expressions). This macro calculates the number of characters in a string. **Usage**: ```sql {{ dbt.length("column") }} ``` **Sample Output (PostgreSQL)**: ```sql length( column ) ``` ##### position[​](#position "Direct link to position") **Args**: * `substring_text`: [attribute name or expression](#sql-expressions). * `string_text`: [attribute name or expression](#sql-expressions). This macro searches for the first occurrence of `substring_text` within `string_text` and returns the 1-based position if found. **Usage**: ```sql {{ dbt.position("substring_column", "text_column") }} {{ dbt.position("'-'", "text_column") }} ``` **Sample Output (PostgreSQL)**: ```sql position( substring_column in text_column ) position( '-' in text_column ) ``` ##### replace[​](#replace "Direct link to replace") **Args**: * `field`: [attribute name or expression](#sql-expressions). * `old_chars`: [attribute name or expression](#sql-expressions). 
* `new_chars`: [attribute name or expression](#sql-expressions). This macro updates a string and replaces all occurrences of one substring with another. The precise behavior may vary slightly from one adapter to another. **Usage**: ```sql {{ dbt.replace("string_text_column", "old_chars_column", "new_chars_column") }} {{ dbt.replace("string_text_column", "'-'", "'_'") }} ``` **Sample Output (PostgreSQL)**: ```sql replace( string_text_column, old_chars_column, new_chars_column ) replace( string_text_column, '-', '_' ) ``` ##### right[​](#right "Direct link to right") **Args**: * `string_text`: [attribute name or expression](#sql-expressions). * `length_expression`: numeric [expression](#sql-expressions). This macro returns the N rightmost characters from a string. **Usage**: ```sql {{ dbt.right("string_text_column", "length_column") }} {{ dbt.right("string_text_column", "3") }} ``` **Sample Output (PostgreSQL)**: ```sql right( string_text_column, length_column ) right( string_text_column, 3 ) ``` ##### split\_part[​](#split_part "Direct link to split_part") **Args**: * `string_text` (required): Text to be split into parts. * `delimiter_text` (required): Text representing the delimiter to split by. * `part_number` (required): Requested part of the split (1-based). If the value is negative, the parts are counted backward from the end of the string. This macro splits a string of text using the supplied delimiter and returns the supplied part number (1-indexed). **Usage**: When referencing a column, use one pair of quotes. When referencing a string, use single quotes enclosed in double quotes. 
```sql {{ dbt.split_part(string_text='column_to_split', delimiter_text='delimiter_column', part_number=1) }} {{ dbt.split_part(string_text="'1|2|3'", delimiter_text="'|'", part_number=1) }} ``` **Sample Output (PostgreSQL)**: ```sql split_part( column_to_split, delimiter_column, 1 ) split_part( '1|2|3', '|', 1 ) ``` #### String literal functions[​](#string-literal-functions "Direct link to String literal functions") ##### escape\_single\_quotes[​](#escape_single_quotes "Direct link to escape_single_quotes") **Args**: * `value`: Jinja string literal value This macro adds escape characters for any single quotes within the provided string literal. Note: if given a column, it will only operate on the column *name*, not the values within the column. To escape quotes for column values, consider a macro like [replace](#replace) or a regular expression replace. **Usage**: ```sql {{ dbt.escape_single_quotes("they're") }} {{ dbt.escape_single_quotes("ain't ain't a word") }} ``` **Sample Output (PostgreSQL)**: ```sql they''re ain''t ain''t a word ``` ##### string\_literal[​](#string_literal "Direct link to string_literal") **Args**: * `value`: Jinja string value This macro converts a Jinja string into a SQL string literal. To cast column values to a string, consider a macro like [safe\_cast](#safe_cast) or an ordinary cast. **Usage**: ```sql select {{ dbt.string_literal("Pennsylvania") }} ``` **Sample Output (PostgreSQL)**: ```sql select 'Pennsylvania' ``` #### Aggregate and window functions[​](#aggregate-and-window-functions "Direct link to Aggregate and window functions") ##### any\_value[​](#any_value "Direct link to any_value") **Args**: * `expression`: an [expression](#sql-expressions). This macro returns some value of the expression from the group. The selected value is non-deterministic (rather than random). 
**Usage**:

```sql
{{ dbt.any_value("column_name") }}
```

**Sample Output (PostgreSQL)**:

```sql
any(column_name)
```

##### bool\_or

**Args**:

* `expression`: [attribute name or expression](#sql-expressions).

This macro returns the logical `OR` of all non-`NULL` expressions -- `true` if at least one record in the group evaluates to `true`.

**Usage**:

```sql
{{ dbt.bool_or("boolean_column") }}
{{ dbt.bool_or("integer_column = 3") }}
{{ dbt.bool_or("string_column = 'Pennsylvania'") }}
{{ dbt.bool_or("column1 = column2") }}
```

**Sample Output (PostgreSQL)**:

```sql
bool_or(boolean_column)
bool_or(integer_column = 3)
bool_or(string_column = 'Pennsylvania')
bool_or(column1 = column2)
```

##### listagg

**Args**:

* `measure` (required): The [attribute name or expression](#sql-expressions) that determines the values to be concatenated. To include only distinct values, add the keyword `DISTINCT` to the beginning of the expression (for example, `'DISTINCT column_to_agg'`).
* `delimiter_text` (required): Text representing the delimiter to separate concatenated values by.
* `order_by_clause` (optional): An expression (typically one or more column names separated by commas) that determines the order of the concatenated values.
* `limit_num` (optional): Specifies the maximum number of values to be concatenated.

This macro returns the concatenated input values from a group of rows separated by a specified delimiter.

**Usage**:

Note: If there are instances of `delimiter_text` within your `measure`, you cannot include a `limit_num`.
```sql
{{ dbt.listagg(measure="column_to_agg", delimiter_text="','", order_by_clause="order by order_by_column", limit_num=10) }}
```

**Sample Output (PostgreSQL)**:

```sql
array_to_string(
    (array_agg(
        column_to_agg
        order by order_by_column
    ))[1:10],
    ','
)
```

#### Cast functions

##### cast

**Availability**: dbt v1.8 or higher.

**Args**:

* `field`: [attribute name or expression](#sql-expressions).
* `type`: data type to convert to

This macro casts a value to the specified data type. Unlike [safe\_cast](#safe_cast), this macro will raise an error when the cast fails.

**Usage**:

```sql
{{ dbt.cast("column_1", api.Column.translate_type("string")) }}
{{ dbt.cast("column_2", api.Column.translate_type("integer")) }}
{{ dbt.cast("'2016-03-09'", api.Column.translate_type("date")) }}
```

**Sample Output (PostgreSQL)**:

```sql
cast(column_1 as TEXT)
cast(column_2 as INT)
cast('2016-03-09' as date)
```

##### cast\_bool\_to\_text

**Args**:

* `field`: boolean [attribute name or expression](#sql-expressions).

This macro casts a boolean value to a string.

**Usage**:

```sql
{{ dbt.cast_bool_to_text("boolean_column_name") }}
{{ dbt.cast_bool_to_text("false") }}
{{ dbt.cast_bool_to_text("true") }}
{{ dbt.cast_bool_to_text("0 = 1") }}
{{ dbt.cast_bool_to_text("1 = 1") }}
{{ dbt.cast_bool_to_text("null") }}
```

**Sample Output (PostgreSQL)**:

```sql
cast(boolean_column_name as varchar)
cast(false as varchar)
cast(true as varchar)
cast(0 = 1 as varchar)
cast(1 = 1 as varchar)
cast(null as varchar)
```

##### safe\_cast

**Args**:

* `field`: [attribute name or expression](#sql-expressions).
* `type`: data type to convert to

For databases that support it, this macro will return `NULL` when the cast fails (instead of raising an error).

**Usage**:

```sql
{{ dbt.safe_cast("column_1", api.Column.translate_type("string")) }}
{{ dbt.safe_cast("column_2", api.Column.translate_type("integer")) }}
{{ dbt.safe_cast("'2016-03-09'", api.Column.translate_type("date")) }}
```

**Sample Output (PostgreSQL)**:

```sql
cast(column_1 as TEXT)
cast(column_2 as INT)
cast('2016-03-09' as date)
```

#### Comparison functions

Comparison functions are macros that compare two SQL expressions and return a boolean SQL expression (for example, `TRUE`, `FALSE`, or `UNKNOWN`).

##### equals

**Args**:

* `a`: [attribute name or expression](#sql-expressions).
* `b`: [attribute name or expression](#sql-expressions).

This macro compares two expressions for equality. By default, the `equals()` macro follows SQL's [three-valued logic (3VL)](https://modern-sql.com/concept/three-valued-logic), so `NULL = NULL` evaluates to `UNKNOWN` rather than `TRUE`. When the [`enable_truthy_nulls_equals_macro`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#null-safe-equality-equals-macro) flag is enabled, `equals()` behaves like the [`IS NOT DISTINCT FROM`](https://modern-sql.com/feature/is-distinct-from) SQL operator and treats two `NULL` values as the same.
**Usage**:

```sql
{{ dbt.equals("column_a", "column_b") }}
{{ dbt.equals("id", "previous_id") }}
```

**Sample output (PostgreSQL with [`enable_truthy_nulls_equals_macro`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#null-safe-equality-equals-macro) enabled)**:

```sql
(column_a IS NOT DISTINCT FROM column_b)
(id IS NOT DISTINCT FROM previous_id)
```

#### Date and time functions

##### date

**Availability**: dbt v1.8 or later.

**Args**:

* `year`: an integer
* `month`: an integer
* `day`: an integer

This macro converts the `year`, `month`, and `day` into a SQL `DATE` type.

**Usage**:

```sql
{{ dbt.date(2023, 10, 4) }}
```

**Sample output (PostgreSQL)**:

```sql
to_date('2023-10-04', 'YYYY-MM-DD')
```

##### dateadd

**Args**:

* `datepart`: [date or time part](#date-and-time-parts).
* `interval`: integer count of the `datepart` to add (can be positive or negative)
* `from_date_or_timestamp`: date/time [expression](#sql-expressions).

This macro adds a time/day interval to the supplied date/timestamp. Note: The `datepart` argument is database-specific.

**Usage**:

```sql
{{ dbt.dateadd(datepart="day", interval=1, from_date_or_timestamp="'2016-03-09'") }}
{{ dbt.dateadd(datepart="month", interval=-2, from_date_or_timestamp="'2016-03-09'") }}
```

**Sample Output (PostgreSQL)**:

```sql
'2016-03-09' + ((interval '1 day') * (1))
'2016-03-09' + ((interval '1 month') * (-2))
```

##### datediff

**Args**:

* `first_date`: date/time [expression](#sql-expressions).
* `second_date`: date/time [expression](#sql-expressions).
* `datepart`: [date or time part](#date-and-time-parts).

This macro calculates the difference between two dates.
**Usage**:

```sql
{{ dbt.datediff("column_1", "column_2", "day") }}
{{ dbt.datediff("column", "'2016-03-09'", "month") }}
{{ dbt.datediff("'2016-03-09'", "column", "year") }}
```

**Sample Output (PostgreSQL)**:

```sql
((column_2)::date - (column_1)::date)

((date_part('year', ('2016-03-09')::date) - date_part('year', (column)::date)) * 12 + date_part('month', ('2016-03-09')::date) - date_part('month', (column)::date))

(date_part('year', (column)::date) - date_part('year', ('2016-03-09')::date))
```

##### date\_trunc

**Args**:

* `datepart`: [date or time part](#date-and-time-parts).
* `date`: date/time [expression](#sql-expressions).

This macro truncates / rounds a timestamp to the first instant of the given [date or time part](#date-and-time-parts).

**Usage**:

```sql
{{ dbt.date_trunc("day", "updated_at") }}
{{ dbt.date_trunc("month", "updated_at") }}
{{ dbt.date_trunc("year", "'2016-03-09'") }}
```

**Sample Output (PostgreSQL)**:

```sql
date_trunc('day', updated_at)
date_trunc('month', updated_at)
date_trunc('year', '2016-03-09')
```

##### last\_day

**Args**:

* `date`: date/time [expression](#sql-expressions).
* `datepart`: [date or time part](#date-and-time-parts).

This macro gets the last day for a given date and datepart.

**Usage**:

* The `datepart` argument is database-specific.
* This macro currently only supports dateparts of `month` and `quarter`.
```sql
{{ dbt.last_day("created_at", "month") }}
{{ dbt.last_day("'2016-03-09'", "year") }}
```

**Sample Output (PostgreSQL)**:

```sql
cast(
    date_trunc('month', created_at) +
        ((interval '1 month') * (1)) +
        ((interval '1 day') * (-1))
    as date)

cast(
    date_trunc('year', '2016-03-09') +
        ((interval '1 year') * (1)) +
        ((interval '1 day') * (-1))
    as date)
```

#### Date and time parts

Often supported date and time parts (case insensitive):

* `year`
* `quarter`
* `month`
* `week`
* `day`
* `hour`
* `minute`
* `second`
* `millisecond`
* `microsecond`
* `nanosecond`

This list is not exhaustive: some date and time parts may not be supported by a particular adapter, some macros may not support all date and time parts, and adapters may differ in the precision they support.

#### SQL expressions

A SQL expression may take forms like the following:

* function
* column name
* date literal
* string literal
* other literal value (a number, etc.)
* `NULL`

Example: Suppose there is an `orders` table with a column named `order_date`. The following shows several different types of expressions:

```sql
select
    date_trunc(month, order_date) as expression_function,
    order_date as expression_column_name,
    '2016-03-09' as expression_date_literal,
    'Pennsylvania' as expression_string_literal,
    3 as expression_number_literal,
    NULL as expression_null
from orders
```

Note that the string literal example includes single quotes. (The string literal character may vary per database; for this example, we suppose a single quote.) To refer to a SQL string literal in Jinja, surrounding double quotes are required. So within Jinja, the string values would be:

* `"date_trunc(month, order_date)"`
* `"order_date"`
* `"'2016-03-09'"`
* `"'Pennsylvania'"`
* `"NULL"`
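Putting these quoting rules into practice, a model might combine several of the macros from this page. The following is a minimal sketch; the `orders` model and its columns are illustrative, not from any real project:

```sql
-- models/order_summary.sql (illustrative model and column names)
select
    order_id,
    -- column reference: one pair of double quotes
    {{ dbt.date_trunc("month", "order_date") }} as order_month,
    -- string literal: single quotes wrapped in double quotes
    {{ dbt.split_part(string_text="region_code", delimiter_text="'-'", part_number=2) }} as region_part,
    {{ dbt.safe_cast("order_total", api.Column.translate_type("string")) }} as order_total_text
from {{ ref('orders') }}
```

Because each macro compiles to the adapter's native SQL, the same model runs unchanged across warehouses.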
---

### Data test configurations

#### Related documentation

* [Data tests](https://docs.getdbt.com/docs/build/data-tests.md)

Data tests can be configured in a few different ways:

1. Properties within a `.yml` definition (generic tests only; see [test properties](https://docs.getdbt.com/reference/resource-properties/data-tests.md) for the full syntax)
2. A `config()` block within the test's SQL definition
3. In `dbt_project.yml`

Data test configs are applied hierarchically, in the order of specificity outlined above. In the case of a singular test, the `config()` block within the SQL definition takes precedence over configs in the project YAML file. In the case of a specific instance of a generic test, the test's `.yml` properties would take precedence over any values set in its generic SQL definition's `config()`, which in turn would take precedence over values set in the project YAML file (`dbt_project.yml`).

#### Available configurations

Click the link on each configuration option to read more about what it can do.

##### Data test-specific configurations

Resource-specific configurations are applicable to only one dbt resource type rather than multiple resource types. You can define these settings in the project file (`dbt_project.yml`), a property file (`models/properties.yml` for models, similarly for other resources), or within the resource's file using the `{{ config() }}` macro.
The following resource-specific configurations are only available to data tests:

* Project file
* SQL file config
* Property file

dbt\_project.yml

```yaml
data_tests:
  <resource-path>:
    +fail_calc: <expression>
    +limit: <integer>
    +severity: error | warn
    +error_if: <condition>
    +warn_if: <condition>
    +store_failures: true | false
    +where: <string>
```

```jinja
{{ config(
    fail_calc = "<expression>",
    limit = <integer>,
    severity = "error | warn",
    error_if = "<condition>",
    warn_if = "<condition>",
    store_failures = true | false,
    where = "<string>"
) }}
```

```yaml
<resource_type>:
  - name: <resource_name>
    data_tests:
      - <test_name>: # Actual name of the test. For example, dbt_utils.equality
          name: <custom_test_name> # Human friendly name for the test. For example, equality_fct_test_coverage
          description: "markdown formatting"
          arguments: # Available in v1.10.5 and higher. Older versions can set each argument as a top-level property.
            <argument_name>: <argument_value>
          config:
            fail_calc: <expression>
            limit: <integer>
            severity: error | warn
            error_if: <condition>
            warn_if: <condition>
            store_failures: true | false
            where: <string>
            # Available in v1.12 and higher. Requires enabling the `require_sql_header_in_test_configs` flag.
            sql_header: <string>
    columns:
      - name: <column_name>
        data_tests:
          - <test_name>:
              name: <custom_test_name>
              description: "markdown formatting"
              arguments: # Available in v1.10.5 and higher. Older versions can set each argument as a top-level property.
                <argument_name>: <argument_value>
              config:
                fail_calc: <expression>
                limit: <integer>
                severity: error | warn
                error_if: <condition>
                warn_if: <condition>
                store_failures: true | false
                where: <string>
                # Available in v1.12 and higher. Requires enabling the `require_sql_header_in_test_configs` flag.
                sql_header: <string>
```

This configuration mechanism is supported for specific instances of generic tests only. To configure a specific singular test, you should use the `config()` macro in its SQL definition.

Starting in dbt Core v1.12, you can set [`sql_header`](https://docs.getdbt.com/reference/resource-configs/sql_header.md) in the `config` of a generic data test at the model or column level of your `properties.yml`. Enable the [`require_sql_header_in_test_configs`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#sql_header-in-test-configs) flag to use `config.sql_header` in your data tests.
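As a sketch of the v1.12 `sql_header` support, the following shows the flag plus a test-level header; the model name, column name, and session statement are hypothetical, and the exact header you need depends on your warehouse:

dbt\_project.yml

```yaml
flags:
  require_sql_header_in_test_configs: true
```

models/properties.yml

```yaml
models:
  - name: my_model
    columns:
      - name: id
        data_tests:
          - unique:
              config:
                # hypothetical session statement executed before the test query
                sql_header: "set query_tag = 'data_tests';"
```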
##### General configurations

General configurations provide broader operational settings applicable across multiple resource types. Like resource-specific configurations, these can also be set in the project file, property files, or within resource-specific files.

* Project file
* SQL file config
* Property file

dbt\_project.yml

```yaml
data_tests:
  <resource-path>:
    +enabled: true | false
    +tags: <string> | [<string>]
    +meta: {dictionary}
    # relevant for store_failures only
    +database: <string>
    +schema: <string>
    +alias: <string>
```

```jinja
{{ config(
    enabled=true | false,
    tags="<string>" | ["<string>"],
    meta={dictionary},
    database="<string>",
    schema="<string>",
    alias="<string>",
) }}
```

```yaml
<resource_type>:
  - name: <resource_name>
    data_tests:
      - <test_name>: # Actual name of the test. For example, dbt_utils.equality
          name: <custom_test_name> # Human friendly name for the test. For example, equality_fct_test_coverage
          description: "markdown formatting"
          arguments: # Available in v1.10.5 and higher. Older versions can set each argument as a top-level property.
            <argument_name>: <argument_value>
          config:
            enabled: true | false
            tags: <string> | [<string>]
            meta: {dictionary}
            # relevant for store_failures only
            database: <string>
            schema: <string>
            alias: <string>
    columns:
      - name: <column_name>
        data_tests:
          - <test_name>:
              name: <custom_test_name>
              description: "markdown formatting"
              arguments: # Available in v1.10.5 and higher. Older versions can set each argument as a top-level property.
                <argument_name>: <argument_value>
              config:
                enabled: true | false
                tags: <string> | [<string>]
                meta: {dictionary}
                # relevant for store_failures only
                database: <string>
                schema: <string>
                alias: <string>
```

This configuration mechanism is supported for specific instances of generic data tests only. To configure a specific singular test, you should use the `config()` macro in its SQL definition.
##### Examples

###### Add a tag to one test

For a specific instance of a generic data test:

models/\<filename\>.yml

```yml
models:
  - name: my_model
    columns:
      - name: id
        data_tests:
          - unique:
              config:
                tags: ['my_tag'] # changed to config in v1.10
```

For a singular data test:

tests/\<filename\>.sql

```sql
{{ config(tags = ['my_tag']) }}

select ...
```

###### Set the default severity for all instances of a generic data test

macros/\<filename\>.sql

```sql
{% test my_test() %}

    {{ config(severity = 'warn') }}

    select ...

{% endtest %}
```

###### Disable all data tests from a package

dbt\_project.yml

```yml
data_tests:
  package_name:
    +enabled: false
```

###### Specify custom configurations for generic data tests

Beginning in dbt v1.9, you can use any custom config key to specify custom configurations for data tests. For example, the following specifies the `snowflake_warehouse` custom config that dbt should use when executing the `accepted_values` data test:

```yml
models:
  - name: my_model
    columns:
      - name: color
        data_tests:
          - accepted_values:
              arguments: # available in v1.10.5 and higher. Older versions can set each argument as a top-level property.
                values: ['blue', 'red']
              config:
                severity: warn
                snowflake_warehouse: my_warehouse
```

Given this config, the data test runs on a different Snowflake virtual warehouse than the one in your default connection, enabling better price-performance through a different warehouse size, or more granular cost allocation and visibility.
###### Add a description to generic and singular tests

Starting from dbt v1.9 (also available to dbt [release tracks](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md)), you can add [descriptions](https://docs.getdbt.com/reference/resource-properties/data-tests.md#description) to both generic and singular tests.

For a generic test, add the description in line with the existing YAML:

models/staging/\<filename\>.yml

```yml
models:
  - name: my_model
    columns:
      - name: delivery_status
        data_tests:
          - accepted_values:
              arguments: # available in v1.10.5 and higher. Older versions can set each argument as a top-level property.
                values: ['delivered', 'pending', 'failed']
              description: "This test checks whether there are unexpected delivery statuses. If it fails, check with logistics team"
```

You can also add descriptions to the Jinja macro that provides the core logic of a generic data test. Refer to [Add description to generic data test logic](https://docs.getdbt.com/best-practices/writing-custom-generic-tests.md#add-description-to-generic-data-test-logic) for more information.

For a singular test, define it in the test's directory:

tests/my\_custom\_test.yml

```yml
data_tests:
  - name: my_custom_test
    description: "This test checks whether the rolling average of returns is inside of expected bounds. If it isn't, flag to customer success team"
```

For more information, refer to [Add a description to a data test](https://docs.getdbt.com/reference/resource-properties/description.md#add-a-description-to-a-data-test).
---

### database

* Models
* Seeds
* Snapshots
* Tests

Specify a custom database for a model in your `dbt_project.yml` file. For example, if you have a model that you want to load into a database other than the target database, you can configure it like this:

dbt\_project.yml

```yml
models:
  your_project:
    sales_metrics:
      +database: reporting
```

This would result in the generated relation being located in the `reporting` database, so the full relation name would be `reporting.finance.sales_metrics` instead of the default target database.

Configure a database in your `dbt_project.yml` file. For example, to load a seed into a database called `staging` instead of the target database, you can configure it like this:

dbt\_project.yml

```yml
seeds:
  your_project:
    product_categories:
      +database: staging
```

This would result in the generated relation being located in the `staging` database, so the full relation name would be `staging.finance.product_categories`.

Customize the database for storing test results in your `dbt_project.yml` file. For example, to save test results in a specific database, you can configure it like this:

dbt\_project.yml

```yml
data_tests:
  +store_failures: true
  +database: test_results
```

This would result in the test results being stored in the `test_results` database.

#### Definition

Optionally specify a custom database for a [model](https://docs.getdbt.com/docs/build/sql-models.md), [seed](https://docs.getdbt.com/docs/build/seeds.md), [snapshot](https://docs.getdbt.com/docs/build/snapshots.md), or [data test](https://docs.getdbt.com/docs/build/data-tests.md).

When dbt creates a relation (table/view) in a database, it creates it as: `{{ database }}.{{ schema }}.{{ identifier }}`, e.g. `analytics.finance.payments`

The standard behavior of dbt is:

* If a custom database is *not* specified, the database of the relation is the target database (`{{ target.database }}`).
* If a custom database is specified, the database of the relation is the `{{ database }}` value.

To learn more about changing the way that dbt generates a relation's `database`, read [Using Custom Databases](https://docs.getdbt.com/docs/build/custom-databases.md).

#### Warehouse specific information

* BigQuery: `project` and `database` are interchangeable
* Databricks: `catalog` and `database` are interchangeable

---

### Database permissions

Database permissions are access rights and privileges granted to users or roles within a database or data platform. They help you specify what actions users or roles can perform on various database objects, like tables, views, schemas, or even the entire database.

##### Why are they useful

* Database permissions are essential for security and data access control.
* They ensure that only authorized users can perform specific actions.
* They help maintain data integrity, prevent unauthorized changes, and limit exposure to sensitive data.
* Permissions also support compliance with data privacy regulations and auditing.

##### How to use them

* Users and administrators can grant and manage permissions at various levels (such as table, schema, and so on) using SQL statements or through the database system's interface.
* Assign permissions to individual users or roles (groups of users) based on their responsibilities.
* Typical permissions include "SELECT" (read), "INSERT" (add data), "UPDATE" (modify data), "DELETE" (remove data), and administrative rights like "CREATE" and "DROP."
* Users should be assigned permissions that ensure they have the necessary access to perform their tasks without overextending privileges.

Something to note is that each data platform provider might have different approaches and names for privileges. Refer to their documentation for more details.

##### Examples

Refer to the following database permission pages for more info on examples and how to set up database permissions:

* [Databricks](https://docs.getdbt.com/reference/database-permissions/databricks-permissions.md)
* [Postgres](https://docs.getdbt.com/reference/database-permissions/postgres-permissions.md)
* [Redshift](https://docs.getdbt.com/reference/database-permissions/redshift-permissions.md)
* [Snowflake](https://docs.getdbt.com/reference/database-permissions/snowflake-permissions.md)
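As a concrete sketch, a Snowflake-style grant sequence for a dbt service role might look like the following. The role, database, and schema names here are hypothetical, and privilege names and syntax vary by platform, so treat this as an illustration rather than a recipe:

```sql
-- Hypothetical role and object names; adjust for your platform's syntax
create role transformer;
grant usage on database analytics to role transformer;
grant usage on schema analytics.raw to role transformer;
grant select on all tables in schema analytics.raw to role transformer;
grant create table on schema analytics.prod to role transformer;
```

Granting to a role rather than to individual users keeps the permission set auditable and easy to reassign as team membership changes.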
---

### Databricks adapter behavior changes

The following are the current [behavior change flags](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#behavior-change-flags) that are specific to `dbt-databricks`:

| Flag | `dbt-databricks`: Intro | `dbt-databricks`: Maturity | Status |
| --- | --- | --- | --- |
| [`use_info_schema_for_columns`](#use-information-schema-for-columns) | 1.9.0 | N/A | **Removed in 1.11.0** |
| [`use_user_folder_for_python`](#use-users-folder-for-python-model-notebooks) | 1.9.0 | 1.11.0 | Default changed to `True` |
| [`use_materialization_v2`](#use-restructured-materializations) | 1.10.0 | TBD | Active |
| [`use_managed_iceberg`](#use-managed-iceberg) | 1.11.0 | 1.12.0 | Active |
| [`use_replace_on_for_insert_overwrite`](#use-replace-on-for-insert_overwrite-strategy) | 1.11.0 | 1.12.0 | Active, defaults to `True` |

#### Use information schema for columns

Removed in v1.11.0

The `use_info_schema_for_columns` flag has been **removed** as of dbt-databricks v1.11.0. The adapter now uses [`DESCRIBE EXTENDED ... AS JSON`](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-aux-describe-table) (available in DBR 16.2+) to efficiently retrieve complex type information, eliminating the need for this flag.

If you're still using this flag in your project configuration, you can safely remove it. The new approach provides better performance and doesn't require the `REPAIR TABLE` operations that were needed with `information_schema`.
##### Legacy documentation

*This applies to dbt-databricks versions older than v1.11*

The `use_info_schema_for_columns` flag was `False` by default in versions 1.9 and 1.10. Setting this flag to `True` would use `information_schema` rather than `describe extended` to get column metadata for Unity Catalog tables. This setting helped avoid issues where `describe extended` truncates information when the type is a complex struct.

For complex types: if your complex type comes from processing JSON using `from_json`, you have an alternative — use [`parse_json` to create the column as the `variant` type](https://docs.databricks.com/aws/en/sql/language-manual/functions/parse_json). The `variant` type might be a reasonable alternative in terms of performance, while avoiding type truncation issues.

#### Use user's folder for Python model notebooks

Default changed in v1.11.0

As of dbt-databricks v1.11.0, the `use_user_folder_for_python` flag defaults to **`True`**.

The `use_user_folder_for_python` flag controls where uploaded Python model notebooks are stored in Databricks:

* **`True` (default in v1.11+)**: Notebooks are written to `/Users/{{current user}}/{{catalog}}/{{schema}}/`.
* **`False` (default in v1.9-v1.10)**: Notebooks are written to `/Shared/dbt_python_models/{{schema}}/`.

Databricks deprecated writing to the `Shared` folder as it doesn't align with governance best practices. Using user-specific folders provides better isolation and access control, and aligns with Unity Catalog security models.
To preserve the legacy behavior for backward compatibility, you can explicitly set this flag to `False` in your `dbt_project.yml`:

```yaml
flags:
  use_user_folder_for_python: false
```

#### Use restructured materializations

The `use_materialization_v2` flag is `False` by default and guards significant rewrites of the core materializations in `dbt-databricks` while they are still in an experimental stage. When set to `True`, `dbt-databricks` uses the updated logic for all model types (views, tables, incremental, seeds). It also enables additional, optional config options for more fine-tuned control:

* `view_update_via_alter` — When enabled, this config attempts to update the view in place using `alter view`, instead of replacing it with `create or replace`.
* `use_safer_relation_operation` — When enabled (and if `view_update_via_alter` isn't set), this config makes dbt model updates safer by staging relations and using rename operations to ensure the live version of the table or view is not disrupted by failures.

These configs aren't required to receive the core benefits of this flag — like better performance and column/constraint functionality — but they are gated behind the flag because they introduce more significant changes to how materializations behave.

In v1.11.0, this flag will stay set to `False` by default. Based on feedback about the new materialization's lack of atomicity (all-or-nothing updates), we won't enable it automatically; instead, we'll investigate other ways to provide the same benefits while maintaining atomicity.
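Conversely, to opt in to the experimental rewrite while it remains off by default, set the flag in `dbt_project.yml` using the same `flags` syntax:

```yaml
flags:
  use_materialization_v2: true
```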
##### Changes to the Seed materialization

The seed materialization has the smallest difference between the old and new implementations: the primary change is removing calls to methods that Databricks doesn't support, such as transaction operations.

##### Changes to the View materialization

With the `use_materialization_v2` flag set to `True`, there are two model configuration options that customize how the view materialization handles an existing relation detected at the target location.

* `view_update_via_alter` — Updates the view in place using `alter view`, instead of replacing it with `create or replace`. This allows continuity of history for the view, keeps the metadata, and helps with Unity Catalog compatibility. Here's an example of how to configure this:

schema.yml

```yaml
models:
  - name: market_summary
    config:
      materialized: view
      view_update_via_alter: true
    columns:
      - name: country
        data_tests:
          - unique
          - not_null
    ...
```

There is currently no support for altering the comment on a view via Databricks SQL. As such, dbt must replace the view whenever you change its description.

* `use_safer_relation_operations` — When enabled (and if `view_update_via_alter` isn't set), this config makes dbt model updates safer by creating a new relation in a staging location, swapping it with the existing relation, and deleting the old relation afterward. The following example shows how to configure this:

schema.yml

```yaml
models:
  - name: market_summary
    config:
      materialized: view
      use_safer_relation_operations: true
    columns:
      - name: country
        data_tests:
          - unique
          - not_null
    ...
```

This configuration option may increase costs and disrupt Unity Catalog history.
While this approach is equivalent to the default dbt view materialization, it creates additional UC objects compared to the alternatives. Since this config does not use an atomic `create or replace ...` for any materialization, the history of the object in Unity Catalog may not behave as you expect. Consider carefully before using this model config broadly.

##### Changes to the Table materialization[​](#changes-to-the-table-materialization "Direct link to Changes to the Table materialization")

This flag may increase storage costs for tables. As with views, these materialization changes could increase costs: more temporary objects are used, consistent with other dbt adapters' materializations. We consider these changes experimental in part because we do not yet have enough data to quantify the price impact of this change. The benefits are improvements in performance and safety, and unblocking features that cannot be delivered with the existing materialization.

When `use_materialization_v2` is set to `True`, all materialization paths are updated. The key change is that table creation is separated from inserting rows into the table. This separation greatly improves performance for setting table comments, since adding comments at create time is faster than using separate `alter table` statements. It also resolves compatibility issues in Databricks, where creating and inserting in one step prevents setting comments. Additionally, this change makes it possible to support other column features — like column-level masks — that aren't compatible with inserting data during creation. While these features aren't included in version 1.10.0, they can now be added in future releases.
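To make the comment-handling change above concrete, here is a minimal sketch of a model whose descriptions are written to Databricks as table and column comments. The model name and descriptions are hypothetical; `persist_docs` is the standard dbt config that controls whether descriptions are persisted as comments:

```yaml
models:
  - name: market_summary
    description: "Daily market rollup"       # persisted as the table comment
    config:
      materialized: table
      persist_docs:
        relation: true    # persist the model description as a table comment
        columns: true     # persist column descriptions as column comments
    columns:
      - name: country
        description: "ISO country code"
```

With the v2 materialization, comments like these can be applied at table creation rather than through follow-up `alter table` statements.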
###### Constraints[​](#constraints "Direct link to Constraints")

For several feature releases now, dbt-databricks has supported both dbt's [constraints](https://docs.getdbt.com/reference/resource-properties/constraints.md) implementation and our own earlier alternative called `persist_constraints`. With the `use_materialization_v2` flag, we're beginning to deprecate `persist_constraints` and shifting fully to dbt's native constraint support. One new enhancement is support for the `expression` field on primary and foreign keys, which lets you pass additional Databricks options — like using [`RELY` to tell the Databricks optimizer that it may exploit the constraint to rewrite queries](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-table-constraint).

Separating `create` and `insert` also changes how constraints behave. Previously, we would create a table with data and then apply constraints. If the new data violated a constraint, the run would fail — but by then, it had already replaced the valid table from the previous run. As with views, you can select between performance and safety with the [`use_safer_relation_operations` flag](#use_safer_relation_operations), but regardless of the setting, the new materialization approach ensures constraint violations don't make it into the target table.

###### `use_safer_relation_operations`[​](#use_safer_relation_operations "Direct link to use_safer_relation_operations")

When using this model configuration with tables, we first create a staging table. After successfully inserting data into the table, we rename it to replace the target materialization. Since Databricks doesn't support rollbacks, this is a safer approach — if something fails before the rename, the original table stays intact. That gives you time to troubleshoot without worrying that exposures or work streams relying on that table are broken in the meantime.
If this config is set to `false` (the default), the target table will still never contain constraint-violating data, but it might end up empty if the insert fails due to the constraint. The key difference is whether we replace the target directly or use a staging-and-rename approach.

This configuration option may increase costs and disrupt Unity Catalog history. As with views, there is a cost to using additional temporary objects, in the form of creating more UC objects with their own history. Consider carefully whether you need this behavior.

##### Changes to the Incremental materialization[​](#changes-to-the-incremental-materialization "Direct link to Changes to the Incremental materialization")

All the changes made to the [Table materialization section](#changes-to-the-table-materialization) also apply to Incremental materializations. We've also added a new config: `incremental_apply_config_changes`. This config lets you control whether dbt should apply changes to things like `tags`, `tblproperties`, and comments during incremental runs. Many users wanted the capability to configure table metadata in Databricks — like AI-generated comments — without dbt overwriting them. Previously, dbt-databricks always applied detected changes during incremental runs. With the V2 materialization, you can now set `incremental_apply_config_changes` to `False` to stop that behavior. (It defaults to `True` to match the previous behavior.) The following example shows how to configure this:

schema.yml

```yaml
models:
  - name: incremental_market_updates
    config:
      materialized: incremental
      incremental_apply_config_changes: false
...
```

#### Use managed Iceberg[​](#use-managed-iceberg "Direct link to Use managed Iceberg")

When you set `table_format` to `iceberg`, the `use_managed_iceberg` flag controls how the table is created.
By default, this flag is set to `False` and dbt creates a [UniForm](https://www.databricks.com/blog/delta-uniform-universal-format-lakehouse-interoperability) table. When set to `True`, dbt creates a [managed Iceberg](https://docs.databricks.com/aws/en/tables/managed) table.

#### Use `replace on` for `insert_overwrite` strategy[​](#use-replace-on-for-insert_overwrite-strategy "Direct link to use-replace-on-for-insert_overwrite-strategy")

The `use_replace_on_for_insert_overwrite` flag controls which SQL syntax dbt generates for incremental models using the `insert_overwrite` strategy. This flag defaults to `True`, which results in using the [`insert into ... replace on`](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-on) syntax to perform dynamic partition/cluster overwrites — the same behavior as on cluster computes. When the flag is set to `False`, `insert_overwrite` truncates the entire table when used with SQL warehouses. The flag is not relevant for cluster computes, where `insert_overwrite` has always performed dynamic partition/cluster overwrites.

| Flag value | SQL generated | Description |
| --- | --- | --- |
| `True` (default) | [`INSERT INTO ... REPLACE ON`](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-on) | Uses the latest, recommended Databricks syntax to replace matching partitions. |
| `False` | `INSERT OVERWRITE` | Uses the older Spark syntax to overwrite partitions. Depends on Spark session settings. |
If you previously relied on this behavior to get full table replacement without dropping existing metadata, that behavior continues to exist with the flag set to `True`, provided you do not use any partitions or liquid clustering. These data layout optimizations only tend to have a significant effect for tables that are approximately 1 TB or larger, at which point regularly replacing all of the data is probably not the best approach.

---

### Databricks configurations

#### Configuring tables[​](#configuring-tables "Direct link to Configuring tables")

When materializing a model as `table`, you may include several optional configs that are specific to the dbt-databricks plugin, in addition to the standard [model configs](https://docs.getdbt.com/reference/model-configs.md).

dbt-databricks v1.9 adds support for the `table_format: iceberg` config. Try it now on the [dbt **Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md). All other table configurations were also supported in 1.8.

| Option | Description | Required? | Model support | Example |
| --- | --- | --- | --- | --- |
| table\_format | Whether or not to provision [Iceberg](https://docs.databricks.com/en/delta/uniform.html) compatibility for the materialization | Optional | SQL, Python | `iceberg` |
| file\_format † | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
| location\_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
| include\_full\_name\_in\_path | Whether to use the full table path to qualify the location root. If this is set, the database, schema, and table alias are all appended to the location root. | Optional | SQL, Python | `true` |
| partition\_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
| liquid\_clustered\_by ^ | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` |
| auto\_liquid\_cluster \+ | The created table is [automatically clustered by Databricks](https://docs.databricks.com/aws/en/delta/clustering#automatic-liquid-clustering). Available since dbt-databricks 1.10.0. | Optional | SQL, Python | `auto_liquid_cluster: true` |
| clustered\_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
| buckets | The number of buckets to create while clustering. | Required if `clustered_by` is specified | SQL, Python | `8` |
| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python \* | `{'this.is.my.key': 12}` |
| databricks\_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL ‡, Python ‡ | `{'my_tag': 'my_value'}` |
| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |

\* We do not yet have a PySpark API to set tblproperties at table creation, so this feature primarily allows users to annotate their Python-derived tables with tblproperties.

† When `table_format` is `iceberg`, `file_format` must be `delta`.

‡ `databricks_tags` are applied via `ALTER` statements. Tags cannot be removed via dbt-databricks once applied. To remove tags, use Databricks directly or a post-hook.

^ When `liquid_clustered_by` is enabled, dbt-databricks issues an `OPTIMIZE` (Liquid Clustering) operation after each run. To disable this behavior, set the `databricks_skip_optimize` variable, which can be passed into the dbt run command (`dbt run --vars "{'databricks_skip_optimize': true}"`) or set as the environment variable `DATABRICKS_SKIP_OPTIMIZE=true`. See [issue #802](https://github.com/databricks/dbt-databricks/issues/802).

\+ Do not use `liquid_clustered_by` and `auto_liquid_cluster` on the same model.

In dbt-databricks v1.10, there are several new model configuration options gated behind the `use_materialization_v2` flag. For details, see the [documentation of Databricks behavior flags](https://docs.getdbt.com/reference/global-configs/databricks-changes.md).
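To illustrate how several of the table configs above combine, here's a sketch of a table model file. The model name and query are hypothetical; the config keys and example values come from the table above:

```sql
{{
    config(
        materialized='table',
        file_format='delta',
        location_root='/mnt/root',
        partition_by='date_day',
        tblproperties={'this.is.my.key': 12}
    )
}}

select
    date_day,
    count(*) as users
from {{ ref('events') }}
group by 1
```

Remember that `partition_by` and liquid clustering (`liquid_clustered_by` / `auto_liquid_cluster`) are alternative layout strategies; pick one per model.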
##### Python submission methods[​](#python-submission-methods "Direct link to Python submission methods")

*Available in versions 1.9 or higher*

In dbt-databricks v1.9 (try it now on [the dbt **Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md)), you can use these four options for `submission_method`:

* `all_purpose_cluster`: Executes the Python model either directly using the [Command API](https://docs.databricks.com/api/workspace/commandexecution) or by uploading a notebook and creating a one-off job run
* `job_cluster`: Creates a new job cluster to execute an uploaded notebook as a one-off job run
* `serverless_cluster`: Uses a [serverless cluster](https://docs.databricks.com/en/jobs/run-serverless-jobs.html) to execute an uploaded notebook as a one-off job run
* `workflow_job`: Creates or updates a reusable workflow and uploaded notebook for execution on all-purpose, job, or serverless clusters.

caution

This approach gives you maximum flexibility, but will create persistent artifacts in Databricks (the workflow) that users could run outside of dbt.

We are currently in a transitional period where there is a disconnect between the old submission methods (which were grouped by compute) and the logically distinct submission methods (command, job run, workflow).
As such, the supported config matrix is somewhat complicated:

| Config | Use | Default | `all_purpose_cluster` \* | `job_cluster` | `serverless_cluster` | `workflow_job` |
| --- | --- | --- | --- | --- | --- | --- |
| `create_notebook` | if false, use Command API, otherwise upload notebook and use job run | `false` | ✅ | ❌ | ❌ | ❌ |
| `timeout` | maximum time to wait for command/job to run | `0` (No timeout) | ✅ | ✅ | ✅ | ✅ |
| `job_cluster_config` | configures a [new cluster](https://docs.databricks.com/api/workspace/jobs/submit#tasks-new_cluster) for running the model | `{}` | ❌ | ✅ | ❌ | ✅ |
| `access_control_list` | directly configures [access control](https://docs.databricks.com/api/workspace/jobs/submit#access_control_list) for the job | `{}` | ✅ | ✅ | ✅ | ✅ |
| `packages` | list of packages to install on the executing cluster | `[]` | ✅ | ✅ | ✅ | ✅ |
| `index_url` | url to install `packages` from | `None` (uses pypi) | ✅ | ✅ | ✅ | ✅ |
| `additional_libs` | directly configures [libraries](https://docs.databricks.com/api/workspace/jobs/submit#tasks-libraries) | `[]` | ✅ | ✅ | ✅ | ✅ |
| `python_job_config` | additional configuration for jobs/workflows (see table below) | `{}` | ✅ | ✅ | ✅ | ✅ |
| `cluster_id` | id of existing all-purpose cluster to execute against | `None` | ✅ | ❌ | ❌ | ✅ |
| `http_path` | path to existing all-purpose cluster to execute against | `None` | ✅ | ❌ | ❌ | ❌ |
\* Only `timeout` and `cluster_id`/`http_path` are supported when `create_notebook` is false.

With the introduction of the `workflow_job` submission method, we chose to segregate further configuration of Python model submission under a top-level configuration named `python_job_config`. This keeps configuration options for jobs and workflows namespaced in such a way that they do not interfere with other model config, allowing us to be much more flexible with what is supported for job execution.

The support matrix for this feature is divided into `workflow_job` and all others (assuming `all_purpose_cluster` with `create_notebook` == `true`). Each config option listed must be nested under `python_job_config`:

| Config | Use | Default | `workflow_job` | All others |
| --- | --- | --- | --- | --- |
| `name` | The name to give (or used to look up) the created workflow | `None` | ✅ | ❌ |
| `grants` | A simplified way to specify access control for the workflow | `{}` | ✅ | ✅ |
| `existing_job_id` | Id to use to look up the created workflow (in place of `name`) | `None` | ✅ | ❌ |
| `post_hook_tasks` | [Tasks](https://docs.databricks.com/api/workspace/jobs/create#tasks) to include after the model notebook execution | `[]` | ✅ | ❌ |
| `additional_task_settings` | Additional [task config](https://docs.databricks.com/api/workspace/jobs/create#tasks) to include in the model task | `{}` | ✅ | ❌ |
| [Other job run settings](https://docs.databricks.com/api/workspace/jobs/submit) | Config will be copied into the request, outside of the model task | `None` | ❌ | ✅ |
| [Other workflow settings](https://docs.databricks.com/api/workspace/jobs/create) | Config will be copied into the request, outside of the model task | `None` | ✅ | ❌ |
This example uses the new configuration options in the previous table:

schema.yml

```yaml
models:
  - name: my_model
    config:
      submission_method: workflow_job

      # Define a job cluster to create for running this workflow
      # Alternatively, specify cluster_id to use an existing cluster, or provide neither to use a serverless cluster
      job_cluster_config:
        spark_version: "15.3.x-scala2.12"
        node_type_id: "rd-fleet.2xlarge"
        runtime_engine: "{{ var('job_cluster_defaults.runtime_engine') }}"
        data_security_mode: "{{ var('job_cluster_defaults.data_security_mode') }}"
        autoscale: { "min_workers": 1, "max_workers": 4 }

      python_job_config:
        # These settings are passed in, as is, to the request
        email_notifications: { on_failure: ["me@example.com"] }
        max_retries: 2

        name: my_workflow_name

        # Override settings for your model's dbt task. For instance, you can
        # change the task key
        additional_task_settings: { "task_key": "my_dbt_task" }

        # Define tasks to run before/after the model
        # This example assumes you have already uploaded a notebook to /my_notebook_path to perform optimize and vacuum
        post_hook_tasks:
          [
            {
              "depends_on": [{ "task_key": "my_dbt_task" }],
              "task_key": "OPTIMIZE_AND_VACUUM",
              "notebook_task":
                { "notebook_path": "/my_notebook_path", "source": "WORKSPACE" },
            },
          ]

        # Simplified structure, rather than having to specify permission separately for each user
        grants:
          view: [{ "group_name": "marketing-team" }]
          run: [{ "user_name": "other_user@example.com" }]
          manage: []
```

#### Configuring columns[​](#configuring-columns "Direct link to Configuring columns")

*Available in versions 1.10 or higher*

When materializing models of various types, you may include several optional column-level configs that are specific to the dbt-databricks plugin, in addition to the standard [column configs](https://docs.getdbt.com/reference/resource-properties/columns.md).
Support for column tags and column masks was added in dbt-databricks v1.10.4.

| Option | Description | Required? | Model support | Materialization support | Example |
| --- | --- | --- | --- | --- | --- |
| databricks\_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on individual columns | Optional | SQL †, Python † | Table, Incremental, Materialized View, Streaming Table | `{'data_classification': 'pii'}` |
| column\_mask | [Column mask](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-column-mask) configuration for dynamic data masking. Accepts `function` and optional `using_columns` properties \* | Optional | SQL, Python | Table, Incremental, Streaming Table | `{'function': 'my_catalog.my_schema.mask_email'}` |

\* `using_columns` supports all parameter types listed in [Databricks column mask parameters](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-column-mask#parameters).

† `databricks_tags` are applied via `ALTER` statements. Tags cannot be removed via dbt-databricks once applied. To remove tags, use Databricks directly or a post-hook.
This example uses the column-level configurations in the previous table:

schema.yml

```yaml
models:
  - name: customers
    columns:
      - name: customer_id
        databricks_tags:
          data_classification: "public"
      - name: email
        databricks_tags:
          data_classification: "pii"
        column_mask:
          function: my_catalog.my_schema.mask_email
          using_columns: "customer_id, 'literal string'"
```

#### Incremental models[​](#incremental-models "Direct link to Incremental models")

*Available in versions 1.9 or higher*

Breaking change in v1.11.0

dbt-databricks v1.11.0 requires Databricks Runtime 12.2 LTS or higher for incremental models.

This version introduces a fix for column order mismatches in incremental models by using Databricks' `INSERT BY NAME` syntax (available since DBR 12.2). This prevents data corruption that could occur when column order changed in models using `on_schema_change: sync_all_columns`. If you're using an older runtime:

* Pin your `dbt-databricks` version to `1.10.x`
* Or upgrade to DBR 12.2 LTS or higher

This breaking change affects all incremental strategies: `append`, `insert_overwrite`, `replace_where`, `delete+insert`, and `merge` (via intermediate table creation). For more details on v1.11.0 changes, see the [dbt-databricks v1.11.0 changelog](https://github.com/databricks/dbt-databricks/blob/main/CHANGELOG.md).

The dbt-databricks plugin leans heavily on the [`incremental_strategy` config](https://docs.getdbt.com/docs/build/incremental-strategy.md). This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of six values:

* `append`: Insert new records without updating or overwriting any existing data.
* `insert_overwrite`: If `partition_by` is specified, overwrite partitions in the table with new data. If no `partition_by` is specified, overwrite the entire table with new data.
* `merge` (default; Delta and Hudi file formats only): Match records based on a `unique_key`, updating old records and inserting new ones.
(If no `unique_key` is specified, all new data is inserted, similar to `append`.)
* `replace_where` (Delta file format only): Match records based on `incremental_predicates`, replacing all records that match the predicates from the existing table with records matching the predicates from the new data. (If no `incremental_predicates` are specified, all new data is inserted, similar to `append`.)
* `delete+insert` (Delta file format only, available in v1.11+): Match records based on a required `unique_key`, delete matching records, and insert new records. Optionally filter using `incremental_predicates`.
* `microbatch` (Delta file format only): Implements the [microbatch strategy](https://docs.getdbt.com/docs/build/incremental-microbatch.md) using `replace_where` with predicates generated based on `event_time`.

Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block.

##### The `append` strategy[​](#the-append-strategy "Direct link to the-append-strategy")

Following the `append` strategy, dbt will perform an `insert into` statement with all new data. The appeal of this strategy is that it is straightforward and functional across all platforms, file types, connection methods, and Apache Spark versions. However, this strategy *cannot* update, overwrite, or delete existing data, so it is likely to insert duplicate records for many data sources.
* Source code
* Run code

databricks\_incremental.sql

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='append',
    )
}}

-- All rows returned by this query will be appended to the existing table

select * from {{ ref('events') }}
{% if is_incremental() %}
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

databricks\_incremental.sql

```sql
create temporary view databricks_incremental__dbt_tmp as
    select * from analytics.events
    where event_ts > (select max(event_ts) from analytics.databricks_incremental)
;

insert into table analytics.databricks_incremental
    select * from databricks_incremental__dbt_tmp
```

##### The `insert_overwrite` strategy[​](#the-insert_overwrite-strategy "Direct link to the-insert_overwrite-strategy")

The `insert_overwrite` strategy updates data in a table by replacing existing records instead of just adding new ones. This strategy is most effective when specified alongside a `partition_by` or `liquid_clustered_by` clause in your model config, which helps identify the specific partitions or clusters affected by your query. dbt will run an [atomic `insert into ... replace on` statement](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-on) that dynamically replaces all partitions/clusters included in your query, instead of rebuilding the entire table.

**Important!** Be sure to re-select *all* of the relevant data for a partition or cluster when using this incremental strategy. When using `liquid_clustered_by`, the `replace on` keys used will be the same as the `liquid_clustered_by` keys (matching the `partition_by` behavior).

When you set [`use_replace_on_for_insert_overwrite`](https://docs.getdbt.com/reference/global-configs/databricks-changes.md#use-replace-on-for-insert_overwrite-strategy) to `True` (in SQL warehouses or when using cluster computes), dbt dynamically overwrites partitions and only replaces the partitions or clusters returned by your model query.
dbt runs a [partitionOverwriteMode='dynamic' `insert overwrite` statement](https://docs.databricks.com/aws/en/delta/selective-overwrite#dynamic-partition-overwrites-with-partitionoverwritemode-legacyl), which helps reduce unnecessary overwrites and improves performance.

When you set [`use_replace_on_for_insert_overwrite`](https://docs.getdbt.com/reference/global-configs/databricks-changes.md#use-replace-on-for-insert_overwrite-strategy) to `False` in SQL warehouses, dbt truncates (empties) the entire table before inserting new data. This replaces all rows in the table each time the model runs, which can increase run time and cost for large datasets.

If you don't specify `partition_by` or `liquid_clustered_by`, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` and `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead.
* Source code
* Run code

databricks\_incremental.sql

```sql
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by=['date_day'],
        file_format='parquet'
    )
}}

/*
  Every partition returned by this query will be overwritten
  when this model runs
*/

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    date_day,
    count(*) as users

from new_events
group by 1
```

databricks\_incremental.sql

```sql
create temporary view databricks_incremental__dbt_tmp as
    with new_events as (

        select * from analytics.events

        where date_day >= date_add(current_date, -1)

    )

    select
        date_day,
        count(*) as users

    from new_events
    group by 1
;

insert overwrite table analytics.databricks_incremental
    partition (date_day)
    select `date_day`, `users` from databricks_incremental__dbt_tmp
```

##### The `merge` strategy[​](#the-merge-strategy "Direct link to the-merge-strategy")

The `merge` incremental strategy requires:

* `file_format: delta or hudi`
* Databricks Runtime 5.1 and above for delta file format
* Apache Spark for hudi file format

The Databricks adapter will run an [atomic `merge` statement](https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html) similar to the default merge behavior on Snowflake and BigQuery. If a `unique_key` is specified (recommended), dbt will update old records with values from new records that match on the key column. If a `unique_key` is not specified, dbt will forgo match criteria and simply insert all new records (similar to the `append` strategy). Specifying `merge` as the incremental strategy is optional since it's the default strategy used when none is specified.
* Source code
* Run code

merge\_incremental.sql

```sql
{{
    config(
        materialized='incremental',
        file_format='delta',
        unique_key='user_id',
        incremental_strategy='merge'
    )
}}

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    user_id,
    max(date_day) as last_seen

from new_events
group by 1
```

target/run/merge\_incremental.sql

```sql
create temporary view merge_incremental__dbt_tmp as
    with new_events as (

        select * from analytics.events

        where date_day >= date_add(current_date, -1)

    )

    select
        user_id,
        max(date_day) as last_seen

    from new_events
    group by 1
;

merge into analytics.merge_incremental as DBT_INTERNAL_DEST
    using merge_incremental__dbt_tmp as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.user_id = DBT_INTERNAL_DEST.user_id
    when matched then update set *
    when not matched then insert *
```

Beginning with 1.9, `merge` behavior can be modified with the following additional configuration options:

* `target_alias`, `source_alias`: Aliases for the target and source to allow you to describe your merge conditions more naturally. These default to `DBT_INTERNAL_DEST` and `DBT_INTERNAL_SOURCE`, respectively.
* `skip_matched_step`: If set to `true`, the 'matched' clause of the merge statement will not be included.
* `skip_not_matched_step`: If set to `true`, the 'not matched' clause will not be included.
* `matched_condition`: Condition to apply to the `WHEN MATCHED` clause. You should use the `target_alias` and `source_alias` to write a conditional expression, such as `DBT_INTERNAL_DEST.col1 = hash(DBT_INTERNAL_SOURCE.col2, DBT_INTERNAL_SOURCE.col3)`. This condition further restricts the matched set of rows.
* `not_matched_condition`: Condition to apply to the `WHEN NOT MATCHED [BY TARGET]` clause. This condition further restricts the set of rows in the target that do not match the source that will be inserted into the merged table.
* `not_matched_by_source_condition`: Condition used to further filter the `WHEN NOT MATCHED BY SOURCE` clause. Only used in conjunction with `not_matched_by_source_action`.
* `not_matched_by_source_action`: The action to apply when the condition is met. Configure as an expression. For example: `not_matched_by_source_action: "update set t.attr1 = 'deleted', t.tech_change_ts = current_timestamp()"`.
* `merge_with_schema_evolution`: If set to `true`, the merge statement includes the `WITH SCHEMA EVOLUTION` clause.

For more details on the meaning of each merge clause, please see [the Databricks documentation](https://docs.databricks.com/en/sql/language-manual/delta-merge-into.html). The following is an example demonstrating the use of these new options: * Source code * Run code merge\_incremental\_options.sql

```sql
{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    incremental_strategy='merge',
    target_alias='t',
    source_alias='s',
    matched_condition='t.tech_change_ts < s.tech_change_ts',
    not_matched_condition='s.attr1 IS NOT NULL',
    not_matched_by_source_condition='t.tech_change_ts < current_timestamp()',
    not_matched_by_source_action='delete',
    merge_with_schema_evolution=true
  )
}}

select
  id,
  attr1,
  attr2,
  tech_change_ts
from
  {{ ref('source_table') }} as s
```

target/run/merge\_incremental\_options.sql

```sql
create temporary view merge_incremental__dbt_tmp as

  select
    id,
    attr1,
    attr2,
    tech_change_ts
  from
    upstream.source_table

;

merge
  with schema evolution
into
  target_table as t
using (
  select
    id,
    attr1,
    attr2,
    tech_change_ts
  from
    source_table as s
)
on
  t.id <=> s.id
when matched
  and t.tech_change_ts < s.tech_change_ts
  then update set
    id = s.id,
    attr1 = s.attr1,
    attr2 = s.attr2,
    tech_change_ts = s.tech_change_ts
when not matched
  and s.attr1 IS NOT NULL
  then insert (
    id,
    attr1,
    attr2,
    tech_change_ts
  ) values (
    s.id,
    s.attr1,
    s.attr2,
    s.tech_change_ts
  )
when not matched by source
  and t.tech_change_ts < current_timestamp()
  then delete
```

##### The `replace_where` strategy[​](#the-replace_where-strategy "Direct link to the-replace_where-strategy")

The `replace_where` incremental strategy requires:

* `file_format: delta`
* Databricks Runtime 12.0 and above

dbt will run an [atomic `replace where` statement](https://docs.databricks.com/en/delta/selective-overwrite.html#arbitrary-selective-overwrite-with-replacewhere) which selectively overwrites data matching one or more `incremental_predicates` specified as a string or array. Only rows matching the predicates will be inserted. If no `incremental_predicates` are specified, dbt will perform an atomic insert, as with `append`.

caution

`replace_where` inserts data into columns in the order provided, rather than by column name. If you reorder columns and the data is compatible with the existing schema, you may silently insert values into an unexpected column. If the incoming data is incompatible with the existing schema, you will instead receive an error.

* Source code * Run code replace\_where\_incremental.sql

```sql
{{
  config(
    materialized='incremental',
    file_format='delta',
    incremental_strategy = 'replace_where',
    incremental_predicates = 'user_id >= 10000' # Never replace users with ids < 10000
  )
}}

with new_events as (

  select * from {{ ref('events') }}

  {% if is_incremental() %}
  where date_day >= date_add(current_date, -1)
  {% endif %}

)

select
  user_id,
  max(date_day) as last_seen

from new_events
group by 1
```

target/run/replace\_where\_incremental.sql

```sql
create temporary view replace_where__dbt_tmp as

  with new_events as (

    select * from analytics.events

    where date_day >= date_add(current_date, -1)

  )

  select
    user_id,
    max(date_day) as last_seen

  from new_events
  group by 1

;

insert into analytics.replace_where_incremental
  replace where user_id >= 10000
  table `replace_where__dbt_tmp`
```

##### The `delete+insert` strategy[​](#the-deleteinsert-strategy "Direct link to the-deleteinsert-strategy") *Available in versions 1.11 or higher* The `delete+insert` incremental strategy
requires: * `file_format: delta` * A required `unique_key` configuration * Databricks Runtime 12.2 LTS or higher The `delete+insert` strategy is a simpler alternative to the `merge` strategy for cases where you want to replace matching records without the complexity of updating specific columns. This strategy works in two steps: 1. **Delete**: Remove all rows from the target table where the `unique_key` matches rows in the new data. 2. **Insert**: Insert all new rows from the staging data. This strategy is particularly useful when: * You want to replace entire records rather than update specific columns * Your business logic requires a clean "remove and replace" approach * You need a simpler incremental strategy than `merge` for full record replacement When using Databricks Runtime 17.1 or higher, dbt uses the efficient [`INSERT INTO ... REPLACE ON` syntax](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-on) to perform this operation atomically. For older runtime versions, dbt executes separate `DELETE` and `INSERT` statements. You can optionally use `incremental_predicates` to further filter which records are processed, providing more control over which rows are deleted and inserted. 
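The two-step, replace-by-key semantics described above can be sketched in plain Python. This is an illustration only — dbt generates SQL for this strategy, and the function name and row shapes here are hypothetical:

```python
# Hypothetical illustration of delete+insert semantics: rows in `target`
# whose unique key appears in `staging` are removed, then every staging
# row is inserted. Records are replaced wholesale, never partially updated.
def delete_insert(target, staging, unique_key):
    """Return the target rows after a delete+insert by `unique_key`."""
    incoming_keys = {row[unique_key] for row in staging}
    # Step 1: delete target rows whose key matches an incoming row
    kept = [row for row in target if row[unique_key] not in incoming_keys]
    # Step 2: insert all new rows from the staging data
    return kept + list(staging)

target = [
    {"user_id": 1, "last_seen": "2024-01-01"},
    {"user_id": 2, "last_seen": "2024-01-05"},
]
staging = [{"user_id": 2, "last_seen": "2024-01-10"}]

# user 2's record is replaced wholesale; user 1 is untouched
print(delete_insert(target, staging, "user_id"))
```

Contrast this with `merge`, which can update individual columns of a matched row; `delete+insert` always swaps the entire record.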
* Source code * Run code (DBR 17.1+) * Run code (DBR < 17.1) delete\_insert\_incremental.sql ```sql {{ config( materialized='incremental', file_format='delta', incremental_strategy='delete+insert', unique_key='user_id' ) }} with new_events as ( select * from {{ ref('events') }} {% if is_incremental() %} where date_day >= date_add(current_date, -1) {% endif %} ) select user_id, max(date_day) as last_seen from new_events group by 1 ``` target/run/delete\_insert\_incremental.sql ```sql create temporary view delete_insert_incremental__dbt_tmp as with new_events as ( select * from analytics.events where date_day >= date_add(current_date, -1) ) select user_id, max(date_day) as last_seen from new_events group by 1 ; insert into table analytics.delete_insert_incremental as target replace on (target.user_id <=> temp.user_id) (select `user_id`, `last_seen` from delete_insert_incremental__dbt_tmp where date_day >= date_add(current_date, -1)) as temp ``` target/run/delete\_insert\_incremental.sql ```sql create temporary view delete_insert_incremental__dbt_tmp as with new_events as ( select * from analytics.events where date_day >= date_add(current_date, -1) ) select user_id, max(date_day) as last_seen from new_events group by 1 ; -- Step 1: Delete matching rows delete from analytics.delete_insert_incremental where analytics.delete_insert_incremental.user_id IN (SELECT user_id FROM delete_insert_incremental__dbt_tmp) and date_day >= date_add(current_date, -1); -- Step 2: Insert new rows insert into analytics.delete_insert_incremental by name select `user_id`, `last_seen` from delete_insert_incremental__dbt_tmp where date_day >= date_add(current_date, -1) ``` ##### The `microbatch` strategy[​](#the-microbatch-strategy "Direct link to the-microbatch-strategy") *Available in versions 1.9 or higher* The Databricks adapter implements the `microbatch` strategy using `replace_where`. Note the requirements and caution statements for `replace_where` above. 
For more information about this strategy, see the [microbatch reference page](https://docs.getdbt.com/docs/build/incremental-microbatch.md). In the following example, the upstream table `events` has been annotated with an `event_time` column called `ts` in its schema file. * Source code * Run code microbatch\_incremental.sql

```sql
{{
  config(
    materialized='incremental',
    file_format='delta',
    incremental_strategy = 'microbatch',
    event_time='date' # Use 'date' as the grain for this microbatch table
  )
}}

with new_events as (

  select * from {{ ref('events') }}

)

select
  user_id,
  date,
  count(*) as visits

from new_events
group by 1, 2
```

target/run/replace\_where\_incremental.sql

```sql
create temporary view replace_where__dbt_tmp as

  with new_events as (

    select * from (
      select * from analytics.events
      where ts >= '2024-10-01' and ts < '2024-10-02'
    )

  )

  select
    user_id,
    date,
    count(*) as visits

  from new_events
  group by 1, 2

;

insert into analytics.replace_where_incremental
  replace where
    CAST(date as TIMESTAMP) >= '2024-10-01' and CAST(date as TIMESTAMP) < '2024-10-02'
  table `replace_where__dbt_tmp`
```

#### Python model configuration[​](#python-model-configuration "Direct link to Python model configuration")

The Databricks adapter supports Python models. Databricks uses PySpark as the processing framework for these models. **Submission methods:** Databricks supports a few different mechanisms to submit PySpark code, each with relative advantages. Some are better for supporting iterative development, while others are better for supporting lower-cost production deployments. The options are:

* `all_purpose_cluster` (default): dbt will run your Python model using the cluster ID configured as `cluster` in your connection profile or for this specific model. These clusters are more expensive but also much more responsive. We recommend using an interactive all-purpose cluster for quicker iteration in development.
* `create_notebook: True`: dbt will upload your model's compiled PySpark code to a notebook in the namespace `/Shared/dbt_python_model/{schema}`, where `{schema}` is the configured schema for the model, and execute that notebook to run using the all-purpose cluster. The appeal of this approach is that you can easily open the notebook in the Databricks UI for debugging or fine-tuning right after running your model. Remember to copy any changes into your dbt `.py` model code before re-running. * `create_notebook: False` (default): dbt will use the [Command API](https://docs.databricks.com/dev-tools/api/1.2/index.html#run-a-command), which is slightly faster. * `job_cluster`: dbt will upload your model's compiled PySpark code to a notebook in the namespace `/Shared/dbt_python_model/{schema}`, where `{schema}` is the configured schema for the model, and execute that notebook to run using a short-lived jobs cluster. For each Python model, Databricks will need to spin up the cluster, execute the model's PySpark transformation, and then spin down the cluster. As such, job clusters take longer before and after model execution, but they're also less expensive, so we recommend these for longer-running Python models in production. To use the `job_cluster` submission method, your model must be configured with `job_cluster_config`, which defines key-value properties for `new_cluster`, as defined in the [JobRunsSubmit API](https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsRunsSubmit). You can configure each model's `submission_method` in all the standard ways you supply configuration: ```python def model(dbt, session): dbt.config( submission_method="all_purpose_cluster", create_notebook=True, cluster_id="abcd-1234-wxyz" ) ... ``` ```yml models: - name: my_python_model config: submission_method: job_cluster job_cluster_config: spark_version: ... node_type_id: ... 
``` ```yml # dbt_project.yml models: project_name: subfolder: # set defaults for all .py models defined in this subfolder +submission_method: all_purpose_cluster +create_notebook: False +cluster_id: abcd-1234-wxyz ``` If not configured, `dbt-spark` will use the built-in defaults: the all-purpose cluster (based on `cluster` in your connection profile) without creating a notebook. The `dbt-databricks` adapter will default to the cluster configured in `http_path`. We encourage explicitly configuring the clusters for Python models in Databricks projects. **Installing packages:** When using all-purpose clusters, we recommend installing packages which you will be using to run your Python models. **Related docs:** * [PySpark DataFrame syntax](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html) * [Databricks: Introduction to DataFrames - Python](https://docs.databricks.com/spark/latest/dataframes-datasets/introduction-to-dataframes-python.html) #### Selecting compute per model[​](#selecting-compute-per-model "Direct link to Selecting compute per model") Beginning in version 1.7.2, you can assign which compute resource to use on a per-model basis. For SQL models, you can select a SQL Warehouse (serverless or provisioned) or an all purpose cluster. For details on how this feature interacts with python models, see [Specifying compute for Python models](#specifying-compute-for-python-models). note This is an optional setting. If you do not configure this as shown below, we will default to the compute specified by http\_path in the top level of the output section in your profile. This is also the compute that will be used for tasks not associated with a particular model, such as gathering metadata for all tables in a schema. 
To take advantage of this capability, you will need to add compute blocks to your profile: profile.yml ```yaml profile-name: target: target-name # this is the default target outputs: target-name: type: databricks catalog: optional catalog name if you are using Unity Catalog schema: schema name # Required host: yourorg.databrickshost.com # Required ### This path is used as the default compute http_path: /sql/your/http/path # Required ### New compute section compute: ### Name that you will use to refer to an alternate compute Compute1: http_path: '/sql/your/http/path' # Required of each alternate compute ### A third named compute, use whatever name you like Compute2: http_path: '/some/other/path' # Required of each alternate compute ... target-name: # additional targets ... ### For each target, you need to define the same compute, ### but you can specify different paths compute: ### Name that you will use to refer to an alternate compute Compute1: http_path: '/sql/your/http/path' # Required of each alternate compute ### A third named compute, use whatever name you like Compute2: http_path: '/some/other/path' # Required of each alternate compute ... ``` The new compute section is a map of user chosen names to objects with an http\_path property. Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models. We recommend choosing a name that is easily recognized as the compute resources you're using, such as the name of the compute resource inside the Databricks UI. note You need to use the same set of names for compute across your outputs, though you may supply different http\_paths, allowing you to use different computes in different deployment scenarios. 
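The note above — that every output must define the same set of compute names — can be checked mechanically. Here is a hypothetical Python sketch over a parsed `profiles.yml`; this is not a dbt feature, and the function name and dictionary layout are assumptions for illustration:

```python
# Illustrative check (not part of dbt): verify every target in a parsed
# profiles.yml defines the same set of compute names, even if the
# http_path values behind those names differ per target.
def compute_names_consistent(outputs):
    """`outputs` maps target name -> output dict, as parsed from profiles.yml."""
    name_sets = [frozenset(output.get("compute", {})) for output in outputs.values()]
    # Consistent when all targets expose exactly the same compute names
    return len(set(name_sets)) <= 1

outputs = {
    "dev": {
        "http_path": "/sql/dev/path",
        "compute": {
            "Compute1": {"http_path": "/sql/dev/warehouse1"},
            "Compute2": {"http_path": "/sql/dev/warehouse2"},
        },
    },
    "prod": {
        "http_path": "/sql/prod/path",
        "compute": {
            "Compute1": {"http_path": "/sql/prod/warehouse1"},
            "Compute2": {"http_path": "/sql/prod/warehouse2"},
        },
    },
}

print(compute_names_consistent(outputs))  # both targets define Compute1 and Compute2
```

A check like this could run in CI so that a model configured with `databricks_compute: "Compute2"` never hits a target where that name is undefined.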
To configure this inside of dbt, use the [extended attributes feature](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes-) on the desired environments:

```yaml
compute:
  Compute1:
    http_path: /SOME/OTHER/PATH
  Compute2:
    http_path: /SOME/OTHER/PATH
```

##### Specifying the compute for models[​](#specifying-the-compute-for-models "Direct link to Specifying the compute for models")

As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`. In your `dbt_project.yml`, the selected compute can be specified for all the models in a given directory: dbt\_project.yml

```yaml
...
models:
  +databricks_compute: "Compute1"     # use the `Compute1` warehouse/cluster for all models in the project...
  my_project:
    clickstream:
      +databricks_compute: "Compute2" # ...except for the models in the `clickstream` folder, which will use `Compute2`.

snapshots:
  +databricks_compute: "Compute1"     # all Snapshot models are configured to use `Compute1`.
```

For an individual model, the compute can be specified in the model config in your schema file. schema.yml

```yaml
models:
  - name: table_model
    config:
      databricks_compute: Compute1
    columns:
      - name: id
        data_type: int
```

Alternatively, the compute can be specified in the config of a model's SQL file. model.sql

```sql
{{
  config(
    materialized='table',
    databricks_compute='Compute1'
  )
}}

select * from {{ ref('seed') }}
```

To validate that the specified compute is being used, look for lines in your dbt.log like:

```text
Databricks adapter ... using default compute resource.
```

or

```text
Databricks adapter ... using compute resource <name of compute>.
```

##### Specifying compute for Python models[​](#specifying-compute-for-python-models "Direct link to Specifying compute for Python models")

Materializing a python model requires execution of SQL as well as python.
Specifically, if your python model is incremental, the current execution pattern involves executing python to create a staging table that is then merged into your target table using SQL. The python code needs to run on an all purpose cluster (or serverless cluster, see [Python Submission Methods](#python-submission-methods)), while the SQL code can run on an all purpose cluster or a SQL Warehouse. When you specify your `databricks_compute` for a python model, you are currently only specifying which compute to use when running the model-specific SQL. If you wish to use a different compute for executing the python itself, you must specify an alternate compute in the config for the model. For example: model.py ```python def model(dbt, session): dbt.config( http_path="sql/protocolv1/..." ) ``` If your default compute is a SQL Warehouse, you will need to specify an all purpose cluster `http_path` in this way. #### Persisting model descriptions[​](#persisting-model-descriptions "Direct link to Persisting model descriptions") Relation-level docs persistence is supported. For more information on configuring docs persistence, see [the docs](https://docs.getdbt.com/reference/resource-configs/persist_docs.md). When the `persist_docs` option is configured appropriately, you'll be able to see model descriptions in the `Comment` field of `describe [table] extended` or `show table extended in [database] like '*'`. #### Query tags[​](#query-tags "Direct link to Query tags") *Available in versions 1.11 or higher* [Query tags](https://docs.databricks.com/aws/en/sql/user/queries/query-tags) are a Databricks feature that allows you to attach custom key-value metadata to SQL queries. This metadata appears in system tables and query history, making it useful for tracking query costs, debugging, and auditing. Feature availability Query tags may not yet be available in all Databricks workspaces. 
Check the [Databricks documentation](https://docs.databricks.com/aws/en/sql/user/queries/query-tags) for the latest information on feature availability. dbt-databricks supports setting query tags at both the connection level (in your profile) and the model level (in model configs). When you run dbt, it automatically includes default tags containing dbt metadata, such as the model name and dbt version.

##### Default query tags[​](#default-query-tags "Direct link to Default query tags")

dbt-databricks automatically adds the following tags to every query:

| Tag key | Description |
| -------------------------- | --------------------------------------------------------------- |
| `@@dbt_model_name` | The name of the model being executed |
| `@@dbt_core_version` | The version of dbt-core being used |
| `@@dbt_databricks_version` | The version of dbt-databricks being used |
| `@@dbt_materialized` | The materialization type (table, view, incremental, and so on) |

These reserved keys cannot be overridden by user-defined tags.

##### Configuring query tags[​](#configuring-query-tags "Direct link to Configuring query tags")

You can set query tags at the connection level in your profile or at the model level in your model config. Model-level tags take precedence over connection-level tags.
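The precedence and reserved-key rules can be sketched as a simple dictionary merge. This is illustrative only — dbt-databricks applies these rules internally, and the function name and sample tags here are hypothetical:

```python
# Reserved keys from the table above; user-defined tags may not shadow them.
RESERVED_KEYS = {
    "@@dbt_model_name",
    "@@dbt_core_version",
    "@@dbt_databricks_version",
    "@@dbt_materialized",
}

def merge_query_tags(default_tags, connection_tags, model_tags):
    """Merge tags lowest-precedence first: defaults < connection < model."""
    merged = dict(default_tags)
    for tags in (connection_tags, model_tags):
        for key, value in tags.items():
            if key not in RESERVED_KEYS:  # reserved dbt tags cannot be overridden
                merged[key] = value
    return merged

default_tags = {"@@dbt_model_name": "my_model", "@@dbt_materialized": "table"}
connection_tags = {"team": "analytics", "env": "dev"}
model_tags = {"env": "prod", "cost_center": "marketing"}

# model-level "env" wins over connection-level; reserved keys are kept
print(merge_query_tags(default_tags, connection_tags, model_tags))
```

Later sections cover where each layer is configured (`profiles.yml` for connection-level tags, `config()` for model-level tags).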
###### Connection-level query tags[​](#connection-level-query-tags "Direct link to Connection-level query tags")

To set query tags for all queries in a connection, add the `query_tags` parameter to your `profiles.yml` file as a JSON string: \~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: my_catalog
      schema: my_schema
      host: yourorg.databrickshost.com
      http_path: /sql/your/http/path
      token: dapiXXXXXXXXXXXXXXXXXXXXXXX
      query_tags: '{"team": "analytics", "project": "customer_360"}'
```

###### Model-level query tags[​](#model-level-query-tags "Direct link to Model-level query tags")

To set query tags for a specific model, use the `query_tags` config: models/my\_model.sql

```sql
{{
  config(
    query_tags = {'cost_center': 'marketing', 'priority': 'high'}
  )
}}

select * from {{ ref('upstream_model') }}
```

You can also configure query tags in your `dbt_project.yml` for groups of models: dbt\_project.yml

```yaml
models:
  my_project:
    marketing:
      +query_tags: {'department': 'marketing'}
    finance:
      +query_tags: {'department': 'finance'}
```

##### Tag precedence and merging[​](#tag-precedence-and-merging "Direct link to Tag precedence and merging")

When query tags are defined at multiple levels, they are merged with the following precedence (highest to lowest):

1. Model-level tags (from `config()` or schema.yml)
2. Connection-level tags (from `profiles.yml`)
3. Default dbt tags (automatically added)

If the same key appears at multiple levels, the higher-precedence value wins. Why connection-level tags? Due to how dbt merges configs, specifying `query_tags` at the model level in `config()` or `schema.yml` will **replace** any `query_tags` you defined in `dbt_project.yml` rather than merging them. This is standard dbt behavior for dictionary configs. To work around this limitation, dbt-databricks accepts `query_tags` in your connection profile (`profiles.yml`).
Connection-level tags are always merged with model-level tags, allowing you to define common tags once in your profile and selectively add or override specific keys at the model level. **Recommended pattern:** * Define shared tags (team, project, environment) in your profile's `query_tags` * Use model-level `query_tags` when you need to add model-specific tags ##### Limitations[​](#limitations "Direct link to Limitations") * **Maximum 20 tags**: The total number of query tags (including default tags) cannot exceed 20. * **Value length**: Tag values must be at most 128 characters. Default tag values that exceed this limit are automatically truncated. * **Special characters**: Backslash (`\`), comma (`,`), and colon (`:`) characters in tag values are automatically escaped. A warning is logged when escaping occurs. * **Reserved keys**: The keys `@@dbt_model_name`, `@@dbt_core_version`, `@@dbt_databricks_version`, and `@@dbt_materialized` are reserved and cannot be used in user-defined tags. ##### Viewing query tags[​](#viewing-query-tags "Direct link to Viewing query tags") Query tags appear in Databricks system tables and query history. For information on how to query and analyze query tags, see the [Databricks query tags documentation](https://docs.databricks.com/aws/en/sql/user/queries/query-tags). #### Default file format configurations[​](#default-file-format-configurations "Direct link to Default file format configurations") To access advanced incremental strategies features, such as [snapshots](https://docs.getdbt.com/reference/commands/snapshot.md) and the `merge` incremental strategy, you will want to use the Delta or Hudi file format as the default file format when materializing models as tables. 
It's quite convenient to do this by setting a top-level configuration in your project file: dbt\_project.yml ```yml models: +file_format: delta # or hudi seeds: +file_format: delta # or hudi snapshots: +file_format: delta # or hudi ``` #### Materialized views and streaming tables[​](#materialized-views-and-streaming-tables "Direct link to Materialized views and streaming tables") [Materialized views](https://docs.databricks.com/en/sql/user/materialized-views.html) and [streaming tables](https://docs.databricks.com/en/sql/load-data-streaming-table.html) are alternatives to incremental tables that are powered by [Delta Live Tables](https://docs.databricks.com/en/delta-live-tables/index.html). See [What are Delta Live Tables?](https://docs.databricks.com/en/delta-live-tables/index.html#what-are-delta-live-tables-datasets) for more information and use cases. In order to adopt these materialization strategies, you will need a workspace that is enabled for Unity Catalog and serverless SQL Warehouses. materialized\_view.sql ```sql {{ config( materialized = 'materialized_view' ) }} ``` or streaming\_table.sql ```sql {{ config( materialized = 'streaming_table' ) }} ``` We support [on\_configuration\_change](https://docs.getdbt.com/reference/resource-configs/on_configuration_change.md) for most available properties of these materializations. 
The following table summarizes our configuration support: | Databricks Concept | Config Name | MV/ST support | Version | | ------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- | ------------------------------------- | ------- | | [PARTITIONED BY](https://docs.databricks.com/en/sql/language-manual/sql-ref-partition.html#partitioned-by) | `partition_by` | MV/ST | All | | [CLUSTER BY](https://docs.databricks.com/en/delta/clustering.html) | `liquid_clustered_by` | MV/ST | v1.11+ | | COMMENT | [`description`](https://docs.getdbt.com/reference/resource-properties/description.md) | MV/ST | All | | [TBLPROPERTIES](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html#tblproperties) | `tblproperties` | MV/ST | All | | [TAGS](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) | `databricks_tags` | MV/ST | v1.11+ | | [SCHEDULE CRON](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-materialized-view.html#parameters) | `schedule: { 'cron': '\', 'time_zone_value': '\
resume recluster` query after building the target table. The `automatic_clustering` config can be specified in the `dbt_project.yml` file, or in a model `config()` block. dbt\_project.yml ```yaml models: +automatic_clustering: true ``` #### Python model configuration[​](#python-model-configuration "Direct link to Python model configuration") The Snowflake adapter supports Python models. Snowflake uses its own framework, Snowpark, which has many similarities to PySpark. **Additional setup:** You will need to [acknowledge and accept Snowflake Third Party Terms](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#getting-started) to use Anaconda packages. **Installing packages:** Snowpark supports several popular packages via Anaconda. Refer to the [complete list](https://repo.anaconda.com/pkgs/snowflake/) for more details. Packages are installed when your model is run. Different models can have different package dependencies. If you use third-party packages, Snowflake recommends using a dedicated virtual warehouse for best performance rather than one with many concurrent users. **Python version:** To specify a different Python version, use the following configuration: ```python def model(dbt, session): dbt.config( materialized = "table", python_version="3.11" ) ``` You can use the `python_version` config to run a Snowpark model with [Python versions](https://docs.snowflake.com/en/developer-guide/snowpark/python/setup) 3.9, 3.10, or 3.11. **External access integrations and secrets**: To query external APIs within dbt Python models, use Snowflake’s [external access](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview) together with [secrets](https://docs.snowflake.com/en/developer-guide/external-network-access/secret-api-reference). 
Here are some additional configurations you can use: ```python import pandas import snowflake.snowpark as snowpark def model(dbt, session: snowpark.Session): dbt.config( materialized="table", secrets={"secret_variable_name": "test_secret"}, external_access_integrations=["test_external_access_integration"], ) import _snowflake return session.create_dataframe( pandas.DataFrame( [{"secret_value": _snowflake.get_generic_secret_string('secret_variable_name')}] ) ) ``` **Docs:** ["Developer Guide: Snowpark Python"](https://docs.snowflake.com/en/developer-guide/snowpark/python/index.html) ##### Third-party Snowflake packages[​](#third-party-snowflake-packages "Direct link to Third-party Snowflake packages") To use a third-party Snowflake package that isn't available in Snowflake Anaconda, upload your package by following [this example](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages#importing-packages-through-a-snowflake-stage), and then configure the `imports` setting in the dbt Python model to reference to the zip file in your Snowflake staging. 
Here’s a complete example configuration using a zip file, including using `imports` in a Python model:

```python
import pandas as pd

def model(dbt, session):
    # Configure the model
    dbt.config(
        materialized="table",
        imports=["@mystage/mycustompackage.zip"],  # Specify the external package location
    )

    # The staged package becomes importable once `imports` is configured
    import some_external_package

    # Example data transformation using the imported package
    # (Assuming `some_external_package` has a function we can call)
    data = {
        "name": ["Alice", "Bob", "Charlie"],
        "score": [85, 90, 88]
    }
    df = pd.DataFrame(data)

    # Process data with the external package
    df["adjusted_score"] = df["score"].apply(lambda x: some_external_package.adjust_score(x))

    # Return the DataFrame as the model output
    return df
```

For more information on using this configuration, refer to [Snowflake's documentation](https://community.snowflake.com/s/article/how-to-use-other-python-packages-in-snowpark) on uploading and using other python packages in Snowpark not published on Snowflake's Anaconda channel.

#### Configuring virtual warehouses[​](#configuring-virtual-warehouses "Direct link to Configuring virtual warehouses")

The default warehouse that dbt uses can be configured in your [Profile](https://docs.getdbt.com/docs/local/profiles.yml.md) for Snowflake connections. To override the warehouse that is used for specific models (or groups of models), use the `snowflake_warehouse` model configuration. This configuration can be used to specify a larger warehouse for certain models in order to control Snowflake costs and project build times. [Tests](https://docs.getdbt.com/docs/build/data-tests.md) also support the `snowflake_warehouse` configuration. This can be useful when you want to run tests on a different Snowflake virtual warehouse than the one used to build models, for example, using a smaller warehouse for lightweight data tests while models run on a larger warehouse.
* Project file * Property file * SQL file config The following example changes the warehouse for a group of models with a config argument in the YAML. dbt\_project.yml ```yaml name: my_project version: 1.0.0 ... models: +snowflake_warehouse: "EXTRA_SMALL" # default Snowflake virtual warehouse for all models in the project. my_project: clickstream: +snowflake_warehouse: "EXTRA_LARGE" # override the default Snowflake virtual warehouse for all models under the `clickstream` directory. snapshots: +snowflake_warehouse: "EXTRA_LARGE" # all Snapshot models are configured to use the `EXTRA_LARGE` warehouse. data_tests: +snowflake_warehouse: "EXTRA_SMALL" # all data tests are configured to use the `EXTRA_SMALL` warehouse. ``` The following example overrides the Snowflake warehouse for a single model and a specific test using a config argument in the property file. models/my\_model.yml ```yaml models: - name: my_model config: snowflake_warehouse: "EXTRA_LARGE" # override the Snowflake virtual warehouse just for this model columns: - name: id data_tests: - unique: config: snowflake_warehouse: "EXTRA_SMALL" # use a smaller warehouse for this test ``` The following example changes the warehouse for a single model with a config() block in the SQL model. 
models/events/sessions.sql

```sql
-- override the Snowflake virtual warehouse for just this model
{{
  config(
    materialized='table',
    snowflake_warehouse='EXTRA_LARGE'
  )
}}

with

aggregated_page_events as (

    select
        session_id,
        min(event_time) as session_start,
        max(event_time) as session_end,
        count(*) as count_page_views
    from {{ source('snowplow', 'event') }}
    group by 1

),

index_sessions as (

    select
        *,
        row_number() over (
            partition by session_id
            order by session_start
        ) as page_view_in_session_index
    from aggregated_page_events

)

select * from index_sessions
```

#### Copying grants[​](#copying-grants "Direct link to Copying grants")

When the `copy_grants` config is set to `true`, dbt will add the `copy grants` DDL qualifier when rebuilding tables and views. The default value is `false`. dbt\_project.yml

```yaml
models:
  +copy_grants: true
```

#### Secure views[​](#secure-views "Direct link to Secure views")

To create a Snowflake [secure view](https://docs.snowflake.net/manuals/user-guide/views-secure.html), use the `secure` config for view models. Secure views can be used to limit access to sensitive data. Note: secure views may incur a performance penalty, so you should only use them if you need them. The following example configures the models in the `sensitive/` folder to be configured as secure views. dbt\_project.yml

```yaml
name: my_project
version: 1.0.0

models:
  my_project:
    sensitive:
      +materialized: view
      +secure: true
```

#### Source freshness known limitation[​](#source-freshness-known-limitation "Direct link to Source freshness known limitation")

Snowflake calculates source freshness using information from the `LAST_ALTERED` column, meaning it relies on a field updated whenever any object undergoes modification, not only data updates. No action is required, but analytics teams should note this caveat.
Per the [Snowflake documentation](https://docs.snowflake.com/en/sql-reference/info-schema/tables#usage-notes): > The `LAST_ALTERED` column is updated when the following operations are performed on an object: > > * DDL operations. > * DML operations (for tables only). > * Background maintenance operations on metadata performed by Snowflake. --- ### Snowflake permissions In Snowflake, permissions are used to control who can perform certain actions on different database objects. Use SQL statements to manage permissions in a Snowflake database. #### Set up Snowflake account[​](#set-up-snowflake-account "Direct link to Set up Snowflake account") This section explains how to set up permissions and roles within Snowflake. In Snowflake, you would perform these actions using SQL commands and set up your data warehouse and access control within Snowflake's ecosystem. 1. Set up databases ```text use role sysadmin; create database raw; create database analytics; ``` 2. Set up warehouses ```text create warehouse loading warehouse_size = xsmall auto_suspend = 3600 auto_resume = false initially_suspended = true; create warehouse transforming warehouse_size = xsmall auto_suspend = 60 auto_resume = true initially_suspended = true; create warehouse reporting warehouse_size = xsmall auto_suspend = 60 auto_resume = true initially_suspended = true; ``` 3. Set up roles and warehouse permissions ```text use role securityadmin; create role loader; grant all on warehouse loading to role loader; create role transformer; grant all on warehouse transforming to role transformer; create role reporter; grant all on warehouse reporting to role reporter; ``` 4. 
Create users, assigning them to their roles

   Every person and application gets a separate user and is assigned to the correct role.

   ```sql
   create user stitch_user -- or fivetran_user
       password = '_generate_this_'
       default_warehouse = loading
       default_role = loader;

   create user claire -- or amy, jeremy, etc.
       password = '_generate_this_'
       default_warehouse = transforming
       default_role = transformer
       must_change_password = true;

   create user dbt_cloud_user
       password = '_generate_this_'
       default_warehouse = transforming
       default_role = transformer;

   create user looker_user -- or mode_user etc.
       password = '_generate_this_'
       default_warehouse = reporting
       default_role = reporter;

   -- then grant these roles to each user
   grant role loader to user stitch_user; -- or fivetran_user
   grant role transformer to user dbt_cloud_user;
   grant role transformer to user claire; -- or amy, jeremy
   grant role reporter to user looker_user; -- or mode_user, periscope_user
   ```

5. Let loader load data

   Give the role full permissions on the raw database.

   ```sql
   use role sysadmin;
   grant all on database raw to role loader;
   ```

6. Let transformer transform data

   The transformer role needs to be able to read raw data. If you do this before you have any data loaded, you can run:

   ```sql
   grant usage on database raw to role transformer;
   grant usage on future schemas in database raw to role transformer;
   grant select on future tables in database raw to role transformer;
   grant select on future views in database raw to role transformer;
   ```

   If you already have data loaded in the raw database, make sure you also run the following to update the permissions:

   ```sql
   grant usage on all schemas in database raw to role transformer;
   grant select on all tables in database raw to role transformer;
   grant select on all views in database raw to role transformer;
   ```

   transformer also needs to be able to create in the analytics database:

   ```sql
   grant all on database analytics to role transformer;
   ```

7.
Let reporter read the transformed data

   A previous version of this article recommended this be implemented through hooks in dbt, but this approach lets you get away with a one-off statement.

   ```sql
   grant usage on database analytics to role reporter;
   grant usage on future schemas in database analytics to role reporter;
   grant select on future tables in database analytics to role reporter;
   grant select on future views in database analytics to role reporter;
   ```

   Again, if you already have data in your analytics database, make sure you run:

   ```sql
   grant usage on all schemas in database analytics to role reporter;
   grant select on all tables in database analytics to role reporter;
   grant select on all views in database analytics to role reporter;
   ```

8. Maintain

   When new users are added, make sure you add them to the right role! Everything else should be inherited automatically thanks to those `future` grants.

For more discussion and legacy information, refer to [this Discourse article](https://discourse.getdbt.com/t/setting-up-snowflake-the-exact-grant-statements-we-run/439).

#### Example Snowflake permissions[​](#example-snowflake-permissions "Direct link to Example Snowflake permissions")

The following example provides the SQL statements you can use to manage permissions. **Note** that `warehouse_name`, `database_name`, and `role_name` are placeholders; replace them as needed for your organization's naming convention.
```sql
grant all on warehouse warehouse_name to role role_name;

grant usage on database database_name to role role_name;
grant create schema on database database_name to role role_name;

grant usage on schema database.an_existing_schema to role role_name;
grant create table on schema database.an_existing_schema to role role_name;
grant create view on schema database.an_existing_schema to role role_name;

grant usage on future schemas in database database_name to role role_name;
grant monitor on future schemas in database database_name to role role_name;
grant select on future tables in database database_name to role role_name;
grant select on future views in database database_name to role role_name;

grant usage on all schemas in database database_name to role role_name;
grant monitor on all schemas in database database_name to role role_name;
grant select on all tables in database database_name to role role_name;
grant select on all views in database database_name to role role_name;
```

---

### Source configurations

#### Available configurations[​](#available-configurations "Direct link to Available configurations")

##### General configurations[​](#general-configurations "Direct link to General configurations")

General configurations provide broader operational settings applicable across multiple resource types. Like resource-specific configurations, these can also be set in the project file, property files, or within resource-specific files.
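For instance, a general configuration such as `enabled` can be set either in the project file or inline in a properties file. In this sketch, the project and source names are hypothetical:

dbt\_project.yml

```yml
sources:
  my_project: # hypothetical project name
    +enabled: true
```

models/properties.yml

```yml
sources:
  - name: my_source # hypothetical source name
    config:
      enabled: true
```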
#### Configuring sources[​](#configuring-sources "Direct link to Configuring sources")

Sources can be configured via a `config:` block within their `.yml` definitions, or from the `dbt_project.yml` file under the `sources:` key. This configuration is most useful for configuring sources imported from [a package](https://docs.getdbt.com/docs/build/packages.md). You can disable sources imported from a package to prevent them from rendering in the documentation, or to prevent [source freshness checks](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness) from running on source tables imported from packages.

* **Note**: To disable a source table nested in a properties YAML file in a subfolder, you will need to supply the subfolder(s) within the path to that properties YAML file, as well as the source name and the table name, in the project YAML file (`dbt_project.yml`).
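For example, assuming a properties file at `models/staging/sources.yml` (the subfolder, source, and table names here are hypothetical), the project file entry would look like:

dbt\_project.yml

```yml
sources:
  my_project:
    staging: # subfolder containing the properties YAML file
      my_source: # source name
        my_table: # table name
          +enabled: false
```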

##### Examples[​](#examples "Direct link to Examples")

The following examples show how to configure sources in your dbt project.

* [Disable all sources imported from a package](#disable-all-sources-imported-from-a-package)
* [Conditionally enable a single source](#conditionally-enable-a-single-source)
* [Disable a single source from a package](#disable-a-single-source-from-a-package)
* [Configure a source with an `event_time`](#configure-a-source-with-an-event_time)
* [Configure meta to a source](#configure-meta-to-a-source)
* [Configure source freshness](#configure-source-freshness)
###### Disable all sources imported from a package[​](#disable-all-sources-imported-from-a-package "Direct link to Disable all sources imported from a package")

To apply a configuration to all sources included from a [package](https://docs.getdbt.com/docs/build/packages.md), state your configuration under the [project name](https://docs.getdbt.com/reference/project-configs/name.md) in the `sources:` config as a part of the resource path.

dbt\_project.yml

```yml
sources:
  events:
    +enabled: false
```

###### Conditionally enable a single source[​](#conditionally-enable-a-single-source "Direct link to Conditionally enable a single source")

When defining a source, you can disable the entire source, or specific source tables, using the inline `config` property. You can also specify `database` and `schema` to override the target database and schema:

models/sources.yml

```yml
sources:
  - name: my_source
    database: raw
    schema: my_schema
    config:
      enabled: true
    tables:
      - name: my_source_table # enabled
      - name: ignore_this_one # not enabled
        config:
          enabled: false
```

You can configure specific source tables, and use [variables](https://docs.getdbt.com/reference/dbt-jinja-functions/var.md) as the input to that configuration:

models/sources.yml

```yml
sources:
  - name: my_source
    tables:
      - name: my_source_table
        config:
          enabled: "{{ var('my_source_table_enabled', false) }}"
```

###### Disable a single source from a package[​](#disable-a-single-source-from-a-package "Direct link to Disable a single source from a package")

To disable a specific source from another package, qualify the resource path for your configuration with both a package name and a source name. In this case, we're disabling the `clickstream` source from the `events` package.
dbt\_project.yml

```yml
sources:
  events:
    clickstream:
      +enabled: false
```

Similarly, you can disable a specific table from a source by qualifying the resource path with a package name, source name, and table name:

dbt\_project.yml

```yml
sources:
  events:
    clickstream:
      pageviews:
        +enabled: false
```

###### Configure a source with an `event_time`[​](#configure-a-source-with-an-event_time "Direct link to configure-a-source-with-an-event_time")

###### Configure meta to a source[​](#configure-meta-to-a-source "Direct link to Configure meta to a source")

Use the `meta` field to assign metadata information to sources. This is useful for tracking additional context, documentation, logging, and more.

For example, you can add `meta` information to a `clickstream` source to include information about the data source system:

dbt\_project.yml

```yaml
sources:
  events:
    clickstream:
      +meta:
        source_system: "Google analytics"
        data_owner: "marketing_team"
```

###### Configure source freshness[​](#configure-source-freshness "Direct link to Configure source freshness")

Use a `freshness` block to define expectations about how frequently a table is updated with new data, and to raise warnings and errors when those expectations are not met. dbt compares the most recently updated timestamp calculated from a column, warehouse metadata, or custom query against the current timestamp when the freshness check is running.

You can provide one or both of the `warn_after` and `error_after` parameters. If neither is provided, then dbt will not calculate freshness snapshots for the tables in this source. For more information, see [freshness](https://docs.getdbt.com/reference/resource-properties/freshness.md).
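For instance, a source properties file might set both thresholds together. In this sketch, the source name and `loaded_at_field` column are hypothetical:

models/sources.yml

```yml
sources:
  - name: my_source # hypothetical source name
    config:
      freshness:
        loaded_at_field: _etl_loaded_at # hypothetical column name
        warn_after:
          count: 12
          period: hour
        error_after:
          count: 1
          period: day
```

With this configuration, dbt warns when the source's newest record is more than 12 hours old and errors when it is more than a day old.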
See the following example of a `dbt_project.yml` file using the `freshness` config:

dbt\_project.yml

```yml
sources:
  <source_name>:
    +freshness:
      warn_after:
        count: 4
        period: hour
```

#### Example source configuration[​](#example-source-configuration "Direct link to Example source configuration")

The following is a valid source configuration for a project with:

* `name: jaffle_shop`
* A package called `events` containing multiple source tables

dbt\_project.yml

```yml
name: jaffle_shop
config-version: 2
...

sources:
  # project names
  jaffle_shop:
    +enabled: true

  events:
    # source names
    clickstream:
      # table names
      pageviews:
        +enabled: false

      link_clicks:
        +enabled: true
```

---

### Source properties

#### Related documentation[​](#related-documentation "Direct link to Related documentation")

* [Using sources](https://docs.getdbt.com/docs/build/sources.md)
* [Declaring resource properties](https://docs.getdbt.com/reference/configs-and-properties.md)

#### Overview[​](#overview "Direct link to Overview")

Source properties can be declared in any `properties.yml` file in your `models/` directory (as defined by the [`model-paths` config](https://docs.getdbt.com/reference/project-configs/model-paths.md)).

Source properties are "special properties" in that you can't configure them in the `dbt_project.yml` file or using `config()` blocks. Refer to [Configs and properties](https://docs.getdbt.com/reference/define-properties#which-properties-are-not-also-configs) for more info.
You can name these files `whatever_you_want.yml`, and nest them arbitrarily deeply in subfolders within the `models/` directory:

models/\<filename\>.yml

```yml
sources:
  - name: <string> # required
    description: <markdown_string>
    database: <database_name>
    schema: <schema_name>
    loader: <string> # requires v1.1+
    config:
      <source_config>: <config_value>
    freshness: # changed to config in v1.10
      loaded_at_field: <column_name>
      warn_after:
        count: <positive_integer>
        period: minute | hour | day
      error_after:
        count: <positive_integer>
        period: minute | hour | day
      filter: <where_condition>
    meta: {<dictionary>} # changed to config in v1.10
    tags: [<string>] # changed to config in v1.10
    overrides: <string> # deprecated in v1.10
    quoting:
      database: true | false
      schema: true | false
      identifier: true | false
    tables:
      - name: <string> # required
        description: <markdown_string>
        identifier: <table_name>
        data_tests:
          - <test>
          - ... # declare additional tests
        config:
          loaded_at_field: <column_name>
          meta: {<dictionary>}
          tags: [<string>]
          freshness:
            warn_after:
              count: <positive_integer>
              period: minute | hour | day
            error_after:
              count: <positive_integer>
              period: minute | hour | day
            filter: <where_condition>
        quoting:
          database: true | false
          schema: true | false
          identifier: true | false
        external: {<dictionary>}
        columns:
          - name: <column_name> # required
            description: <markdown_string>
            quote: true | false
            data_tests:
              - <test>
              - ... # declare additional tests
            config:
              meta: {<dictionary>}
              tags: [<string>]
          - name: ... # declare properties of additional columns
      - name: ... # declare properties of additional source tables
  - name: ...
  # declare properties of additional sources
```

#### Example[​](#example "Direct link to Example")

models/\<filename\>.yml

```yaml
sources:
  - name: jaffle_shop
    database: raw
    schema: public
    loader: emr # informational only (free text)

    config: # changed to config in v1.10
      loaded_at_field: _loaded_at # configure for all sources

      # meta fields are rendered in auto-generated documentation
      meta:
        contains_pii: true
        owner: "@alice"

      # Add tags to this source
      tags:
        - ecom
        - pii

    quoting:
      database: false
      schema: false
      identifier: false

    tables:
      - name: orders
        identifier: Orders_
        config: # changed to config in v1.10
          loaded_at_field: updated_at # override source defaults
        columns:
          - name: id
            data_tests:
              - unique
          - name: price_in_usd
            data_tests:
              - not_null

      - name: customers
        quoting:
          identifier: true # override source defaults
        columns:
          - name: id
            data_tests:
              - unique
```

---

### Sources JSON file

**Current schema:** [`v3`](https://schemas.getdbt.com/dbt/sources/v3/index.html)

**Produced by:** [`source freshness`](https://docs.getdbt.com/reference/commands/source.md)

This file contains information about [sources with freshness checks](https://docs.getdbt.com/docs/build/sources.md#checking-source-freshness). Today, dbt uses this file to power its [Source Freshness visualization](https://docs.getdbt.com/docs/build/sources.md#source-data-freshness).

##### Top-level keys[​](#top-level-keys "Direct link to Top-level keys")

* [`metadata`](https://docs.getdbt.com/reference/artifacts/dbt-artifacts.md#common-metadata)
* `elapsed_time`: Total invocation time in seconds.
* `results`: Array of freshness-check execution details.
Each entry in `results` is a dictionary with the following keys:

* `unique_id`: Unique source node identifier, which maps results to `sources` in the [manifest](https://docs.getdbt.com/reference/artifacts/manifest-json.md).
* `max_loaded_at`: Max value of the `loaded_at_field` timestamp in the source table when queried.
* `snapshotted_at`: Current timestamp when querying.
* `max_loaded_at_time_ago_in_s`: Interval between `max_loaded_at` and `snapshotted_at`, calculated in Python to handle timezone complexity.
* `criteria`: The freshness threshold(s) for this source, defined in the project.
* `status`: The freshness status of this source, based on `max_loaded_at_time_ago_in_s` + `criteria`, reported on the CLI. One of `pass`, `warn`, or `error` if the query succeeds, or `runtime error` if the query fails.
* `execution_time`: Total time spent checking freshness for this source.
* `timing`: Array that breaks down execution time into steps (`compile` + `execute`).
* `adapter_response`: Dictionary of metadata returned from the database, which varies by adapter. For example, success `code`, number of `rows_affected`, total `bytes_processed`, and so on. Not applicable for [data tests](https://docs.getdbt.com/docs/build/data-tests.md).
  * `rows_affected` returns the number of rows modified by the last statement executed. In cases where the query's row count can't be determined or isn't applicable (such as when creating a view), a [standard value](https://peps.python.org/pep-0249/#rowcount) of `-1` is returned for `rowcount`.
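For illustration, a single `results` entry might look like the following sketch. All identifiers and values here are invented, not taken from a real artifact, and some keys are omitted:

```json
{
  "unique_id": "source.my_project.my_source.my_table",
  "max_loaded_at": "2024-01-01T10:00:00+00:00",
  "snapshotted_at": "2024-01-01T14:30:00+00:00",
  "max_loaded_at_time_ago_in_s": 16200.0,
  "criteria": {
    "warn_after": {"count": 4, "period": "hour"},
    "error_after": {"count": 1, "period": "day"}
  },
  "status": "warn",
  "execution_time": 0.42
}
```

Here the newest record is 16,200 seconds (4.5 hours) old, which exceeds the 4-hour `warn_after` threshold but not the 1-day `error_after` threshold, so the status is `warn`.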
---

### sql_header

`sql_header` does not support Jinja or macros like `ref` and `source`

Unlike [pre-hooks and post-hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md), macros like [`ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md), [`source`](https://docs.getdbt.com/reference/dbt-jinja-functions/source.md), and references like [`{{ this }}`](https://docs.getdbt.com/reference/dbt-jinja-functions/this.md) aren't supported.

The primary function of `set_sql_header` is fairly limited. It's intended to:

* [Create UDFs](https://docs.getdbt.com/reference/resource-configs/sql_header.md#create-a-bigquery-temporary-udf)
* [Set script variables](https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language) (BigQuery)
* [Set temporary session parameters](https://docs.getdbt.com/reference/resource-configs/sql_header.md#set-snowflake-session-parameters-for-a-particular-model) (Snowflake)

- Models
- Seeds
- Snapshots
- Property file

models/\<modelname\>.sql

```sql
{{ config(
    sql_header="<sql-statement>"
) }}

select ...
```

dbt\_project.yml

```yml
config-version: 2

models:
  <resource-path>:
    +sql_header: <sql-statement>
```

This config is not implemented for seeds.

snapshots/\<filename\>.sql

```sql
{% snapshot snapshot_name %}

{{ config(
    sql_header="<sql-statement>"
) }}

select ...

{% endsnapshot %}
```

dbt\_project.yml

```yml
snapshots:
  <resource-path>:
    +sql_header: <sql-statement>
```

Setting `sql_header` in the `config` of a [generic data test](https://docs.getdbt.com/docs/build/data-tests.md) is available starting in dbt Core v1.12. Enable the [`require_sql_header_in_test_configs`](https://docs.getdbt.com/reference/global-configs/behavior-changes.md#sql_header-in-data-tests) flag to use `sql_header` in `properties.yml` for generic data tests.
Here's an example of a model-level configuration:

models/properties.yml

```yaml
models:
  - name: orders
    data_tests:
      - unique:
          name: unique_orders_order_id
          arguments:
            column_name: order_id
          config:
            sql_header: "-- SQL_HEADER_TEST_MARKER"
```

You can also use `sql_header` for column-level data tests:

models/properties.yml

```yaml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - not_null:
              name: not_null_orders_order_id
              config:
                sql_header: "-- SQL_HEADER_TEST_MARKER"
```

#### Definition[​](#definition "Direct link to Definition")

An optional configuration to inject SQL above the `create table as` and `create view as` statements that dbt executes when building models and snapshots.

`sql_header`s can be set using the config, or by `call`-ing the `set_sql_header` macro (example below).

#### Comparison to pre-hooks[​](#comparison-to-pre-hooks "Direct link to Comparison to pre-hooks")

[Pre-hooks](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md) also provide an opportunity to execute SQL before model creation, as a *preceding* query. In comparison, SQL in a `sql_header` is run in the same *query* as the `create table|view as` statement. As a result, this makes it more useful for [Snowflake session parameters](https://docs.snowflake.com/en/sql-reference/parameters.html) and [BigQuery Temporary UDFs](https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions#sql-udf-examples).
#### Examples[​](#examples "Direct link to Examples")

##### Set Snowflake session parameters for a particular model[​](#set-snowflake-session-parameters-for-a-particular-model "Direct link to Set Snowflake session parameters for a particular model")

This uses the config block syntax:

models/my\_model.sql

```sql
{{ config(
    sql_header="alter session set timezone = 'Australia/Sydney';"
) }}

select * from {{ ref('other_model') }}
```

##### Set Snowflake session parameters for all models[​](#set-snowflake-session-parameters-for-all-models "Direct link to Set Snowflake session parameters for all models")

dbt\_project.yml

```yml
config-version: 2

models:
  +sql_header: "alter session set timezone = 'Australia/Sydney';"
```

##### Create a BigQuery Temporary UDF[​](#create-a-bigquery-temporary-udf "Direct link to Create a BigQuery Temporary UDF")

This example calls the `set_sql_header` macro. This macro is a convenience wrapper which you may choose to use if you have a multi-line SQL statement to inject. You do not need to use the `sql_header` configuration key in this case.

models/my\_model.sql

```sql
-- Supply a SQL header:
{% call set_sql_header(config) %}
  CREATE TEMPORARY FUNCTION yes_no_to_boolean(answer STRING)
  RETURNS BOOLEAN AS (
    CASE
      WHEN LOWER(answer) = 'yes' THEN True
      WHEN LOWER(answer) = 'no' THEN False
      ELSE NULL
    END
  );
{%- endcall %}

-- Supply your model code:
select yes_no_to_boolean(yes_no) from {{ ref('other_model') }}
```
---

### Starburst/Trino configurations

#### Cluster requirements[​](#cluster-requirements "Direct link to Cluster requirements")

The designated cluster must have an attached catalog where objects such as tables and views can be created, renamed, altered, and dropped. Any user connecting to the cluster with dbt must also have these same permissions for the target catalog.

#### Session properties[​](#session-properties "Direct link to Session properties")

With a Starburst Enterprise, Starburst Galaxy, or Trino cluster, you can [set session properties](https://trino.io/docs/current/sql/set-session.html) to modify the current configuration for your user session.

The standard way to define session properties is with the `session_properties` field of your `profiles.yml`. This ensures that all dbt connections use these settings by default.

However, to temporarily adjust these session properties for a specific dbt model or group of models, you can use a [dbt hook](https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook.md) to set session properties on a specific dbt model. For example:

```sql
{{ config( pre_hook="set session query_max_run_time='10m'" ) }}
```

#### Connector properties[​](#connector-properties "Direct link to Connector properties")

You can use Starburst/Trino table properties to configure how you want your data to be represented. For details on what's supported for each data source, refer to either the [Trino Connectors](https://trino.io/docs/current/connector.html) or [Starburst Catalog](https://docs.starburst.io/starburst-galaxy/catalogs/) documentation.

##### Hive catalogs[​](#hive-catalogs "Direct link to Hive catalogs")

A target catalog that uses the Hive connector and a metastore service (HMS) is typical when working with Starburst and dbt. The following settings are recommended for working with dbt. The intent is to ensure that dbt can perform the frequently executed `DROP` and `RENAME` statements.
```ini
hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval=5s
```

#### File format configuration[​](#file-format-configuration "Direct link to File format configuration")

When using file-based connectors such as Hive, a user can customize aspects of the connector, such as the format that is used, as well as the type of materialization. The following configures the table to be materialized as a set of partitioned [Parquet](https://spark.apache.org/docs/latest/sql-data-sources-parquet.html) files:

```sql
{{ config(
    materialized='table',
    properties= {
      "format": "'PARQUET'",
      "partitioning": "ARRAY['bucket(id, 2)']",
    }
) }}
```

#### Seeds and prepared statements[​](#seeds-and-prepared-statements "Direct link to Seeds and prepared statements")

The [dbt seed](https://docs.getdbt.com/docs/build/seeds.md) command makes use of prepared statements in [Starburst](https://docs.starburst.io/latest/sql/prepare.html)/[Trino](https://trino.io/docs/current/sql/prepare.html).

Prepared statements are templated SQL statements that you can execute repeatedly with high efficiency. The values are sent in a separate field rather than hard coded in the SQL string itself. This is often how application frontends structure their record `INSERT` statements in the OLTP database backend. Because of this, it's common for prepared statements to have as many placeholder variables (parameters) as there are columns in the destination table.

Most seed files have more than one row, and often thousands of rows. This means the size of the client request grows with the number of parameters.

##### Header line length limit in Python HTTP client[​](#header-line-length-limit-in-python-http-client "Direct link to Header line length limit in Python HTTP client")

You might run into an error message about the header line limit if your prepared statements have too many parameters, because the header line limit in Python's HTTP client is `65536` bytes.
You can avoid this upper limit by converting the large prepared statement into smaller statements. dbt already does this by batching an entire seed file into groups of rows, one group for each batch of rows in the CSV.

Let's say you have a seed file with 20 columns, 600 rows, and 12,000 parameters. Instead of creating a single prepared statement for this, you can have dbt create four prepared `INSERT` statements, each with 150 rows and 3,000 parameters.

There's a drawback to grouping your table rows. When there are many columns (parameters) in a seed file, the batch size needs to be very small.

For the `dbt-trino` adapter, the macro for batch size is `trino__get_batch_size()` and its default value is `1000`. To change this default behavior, you can add this macro to your dbt project:

macros/YOUR\_MACRO\_NAME.sql

```sql
{% macro trino__get_batch_size() %}
  {{ return(10000) }} -- Adjust this number as you see fit
{% endmacro %}
```

Another way to avoid the header line length limit is to set `prepared_statements_enabled` to `false` in your dbt profile; however, this is considered legacy behavior and may be removed in a future release.

#### Materializations[​](#materializations "Direct link to Materializations")

##### Table[​](#table "Direct link to Table")

The `dbt-trino` adapter supports these modes in `table` materialization (and [full-refresh runs](https://docs.getdbt.com/reference/commands/run.md#refresh-incremental-models) in `incremental` materialization), which you can configure with `on_table_exists`:

* `rename` — Creates an intermediate table, renames the target table to the backup one, and renames the intermediate table to the target one.
* `drop` — Drops and re-creates a table. This overcomes the table rename limitation in AWS Glue.
* `replace` — Replaces a table using a CREATE OR REPLACE clause. Support for table replacement varies across connectors. Refer to the connector documentation for details.
* `skip` — Skips table materialization altogether using a CREATE TABLE IF NOT EXISTS clause.

If CREATE OR REPLACE is supported by the underlying connector, `replace` is the recommended option. Otherwise, the recommended `table` materialization uses `on_table_exists = 'rename'` and is also the default.

You can change this default configuration by editing *one* of these files:

* the SQL file for your model
* the `dbt_project.yml` configuration file

The following examples configure `table` materialization to be `drop`:

models/YOUR\_MODEL\_NAME.sql

```sql
{{ config(
    materialized = 'table',
    on_table_exists = 'drop'
) }}
```

dbt\_project.yml

```yaml
models:
  path:
    materialized: table
    +on_table_exists: drop
```

If you use `table` materialization and `on_table_exists = 'rename'` with AWS Glue, you might encounter this error message. You can overcome the table rename limitation by using `drop`:

```sh
TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Table rename is not yet supported by Glue service")
```

##### View[​](#view "Direct link to View")

The `dbt-trino` adapter supports these security modes in `view` materialization, which you can configure with `view_security`:

* `definer`
* `invoker`

For more details about security modes in views, see [Security](https://trino.io/docs/current/sql/create-view.html#security) in the Trino docs.

By default, `view` materialization uses `view_security = 'definer'`.
You can change this default configuration by editing *one* of these files:

* the SQL file for your model
* the `dbt_project.yml` configuration file

For example, these configure the security mode to `invoker`:

models/YOUR\_MODEL\_NAME.sql

```sql
{{ config(
    materialized = 'view',
    view_security = 'invoker'
) }}
```

dbt\_project.yml

```yaml
models:
  path:
    materialized: view
    +view_security: invoker
```

##### Incremental[​](#incremental "Direct link to Incremental")

Using an incremental model limits the amount of data that needs to be transformed, which greatly reduces the runtime of your transformations. This improves performance and reduces compute costs.

```sql
{{ config(
    materialized = 'incremental',
    unique_key='<unique_key_column>',
    incremental_strategy='<incremental_strategy>',
) }}

select * from {{ ref('events') }}
{% if is_incremental() %}
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

Use the `+on_schema_change` property to define how dbt-trino should handle column changes. For more details about this property, see [column changes](https://docs.getdbt.com/docs/build/incremental-models.md#what-if-the-columns-of-my-incremental-model-change).

If your connector doesn't support views, set the `+views_enabled` property to `false`.

You can decide how the model should be rebuilt in a `full-refresh` run by specifying the `on_table_exists` config. Options are the same as described in the [table materialization section](https://docs.getdbt.com/reference/resource-configs/trino-configs.md#table).

###### append strategy[​](#append-strategy "Direct link to append strategy")

The default incremental strategy is `append`. `append` only adds new records based on the condition specified in the `is_incremental()` conditional block.
```sql
{{ config(
    materialized = 'incremental'
) }}

select * from {{ ref('events') }}
{% if is_incremental() %}
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

###### delete+insert strategy[​](#deleteinsert-strategy "Direct link to delete+insert strategy")

With the `delete+insert` incremental strategy, you can instruct dbt to use a two-step incremental approach. First, it deletes the records detected through the configured `is_incremental()` block, then re-inserts them.

```sql
{{ config(
    materialized = 'incremental',
    unique_key='user_id',
    incremental_strategy='delete+insert',
) }}

select * from {{ ref('users') }}
{% if is_incremental() %}
  where updated_ts > (select max(updated_ts) from {{ this }})
{% endif %}
```

###### merge strategy[​](#merge-strategy "Direct link to merge strategy")

With the `merge` incremental strategy, dbt-trino constructs a [Trino MERGE statement](https://trino.io/docs/current/sql/merge.html) to `insert` new records and `update` existing records, based on the `unique_key` property. If `unique_key` is not unique, you can use the `delete+insert` strategy instead.

```sql
{{ config(
    materialized = 'incremental',
    unique_key='user_id',
    incremental_strategy='merge',
) }}

select * from {{ ref('users') }}
{% if is_incremental() %}
  where updated_ts > (select max(updated_ts) from {{ this }})
{% endif %}
```

Be aware that some Trino connectors don't support `MERGE` or have limited support.
###### Incremental overwrite on Hive models[​](#incremental-overwrite-on-hive-models "Direct link to Incremental overwrite on Hive models")

If there's a [Hive connector](https://trino.io/docs/current/connector/hive.html) accessing your target incremental model, you can simulate an `INSERT OVERWRITE` statement by using the `insert-existing-partitions-behavior` setting on the Hive connector configuration in Trino:

```ini
.insert-existing-partitions-behavior=OVERWRITE
```

Below is an example Hive configuration that sets the `OVERWRITE` functionality for a Hive connector called `minio`:

```yaml
trino-incremental-hive:
  target: dev
  outputs:
    dev:
      type: trino
      method: none
      user: admin
      password:
      catalog: minio
      schema: tiny
      host: localhost
      port: 8080
      http_scheme: http
      session_properties:
        minio.insert_existing_partitions_behavior: OVERWRITE
      threads: 1
```

`dbt-trino` overwrites existing partitions in the target model that match the staged data. It appends the remaining partitions to the target model. This functionality works on incremental models that use partitioning. For example:

```sql
{{
  config(
    materialized = 'incremental',
    properties={
      "format": "'PARQUET'",
      "partitioned_by": "ARRAY['day']",
    }
  )
}}
```

##### Materialized view[​](#materialized-view "Direct link to Materialized view")

The `dbt-trino` adapter supports [materialized views](https://trino.io/docs/current/sql/create-materialized-view.html) and refreshes them for every subsequent `dbt run` that you execute. For more information, see [REFRESH MATERIALIZED VIEW](https://trino.io/docs/current/sql/refresh-materialized-view.html) in the Trino docs.

You can also define custom properties for the materialized view through the `properties` config.

This materialization supports the [full\_refresh](https://docs.getdbt.com/reference/resource-configs/full_refresh.md) config and flag. Whenever you want to rebuild your materialized view (for example, when changing the underlying SQL query), run `dbt run --full-refresh`.
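The `full_refresh` behavior described above can also be pinned per model in configuration, so the CLI flag's effect is fixed. A minimal sketch (the project and model names are illustrative):

```yaml
models:
  my_project:
    reporting_mv:
      +materialized: materialized_view
      # illustrative: never rebuild this materialized view,
      # even when dbt run --full-refresh is passed
      +full_refresh: false
```

Setting `+full_refresh: false` protects an expensive materialized view from accidental rebuilds; omit the config to let the `--full-refresh` flag decide.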
You can create a materialized view by editing *one* of these files: * the SQL file for your model * the `dbt_project.yml` configuration file The following examples create a materialized view in Parquet format: models/YOUR\_MODEL\_NAME.sql ```sql {{ config( materialized = 'materialized_view', properties = { 'format': "'PARQUET'" }, ) }} ``` dbt\_project.yml ```yaml models: path: materialized: materialized_view properties: format: "'PARQUET'" ``` #### Snapshots[​](#snapshots "Direct link to Snapshots") [Snapshots in dbt](https://docs.getdbt.com/docs/build/snapshots.md) depend on the `current_timestamp` macro, which returns a timestamp with millisecond precision (3 digits) by default. There are some connectors for Trino that don't support this timestamp precision (`TIMESTAMP(3) WITH TIME ZONE`), like Iceberg. To change timestamp precision, you can define your own [macro](https://docs.getdbt.com/docs/build/jinja-macros.md). For example, this defines a new `trino__current_timestamp()` macro with microsecond precision (6 digits): macros/YOUR\_MACRO\_NAME.sql ```sql {% macro trino__current_timestamp() %} current_timestamp(6) {% endmacro %} ``` #### Grants[​](#grants "Direct link to Grants") Use [grants](https://docs.getdbt.com/reference/resource-configs/grants.md) to manage access to the datasets you're producing with dbt. You can use grants with [Starburst Enterprise](https://docs.starburst.io/latest/security/biac-overview.html), [Starburst Galaxy](https://docs.starburst.io/starburst-galaxy/security/access-control.html), and Hive ([sql-standard](https://trino.io/docs/current/connector/hive-security.html)). To implement access permissions, define grants as resource configs on each model, seed, and snapshot. Define the default grants that apply to the entire project in your `dbt_project.yml` and define model-specific grants within each model's SQL or YAML file. 
dbt\_project.yml

```yaml
models:
  - name: NAME_OF_YOUR_MODEL
    config:
      grants:
        select: ['reporter', 'bi']
```

#### Model contracts[​](#model-contracts "Direct link to Model contracts")

The `dbt-trino` adapter supports [model contracts](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md). Currently, only [constraints](https://docs.getdbt.com/reference/resource-properties/constraints.md) with `type` as `not_null` are supported. Before using `not_null` constraints in your model, make sure the underlying connector supports `not null`, to avoid running into errors.

#### Was this page helpful?

YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply.

---

### Starrocks configurations

#### Model Configuration[​](#model-configuration "Direct link to Model Configuration")

A dbt model can be configured using the following syntax:

* Project YAML file
* Properties YAML file
* SQL file config

dbt\_project.yml

```yaml
models:
  :
    materialized: table       # table or view or materialized_view
    keys: ['id', 'name', 'some_date']
    table_type: 'PRIMARY'     # PRIMARY or DUPLICATE or UNIQUE
    distributed_by: ['id']
    buckets: 3                # default 10
    partition_by: ['some_date']
    partition_by_init: ["PARTITION p1 VALUES [('1971-01-01 00:00:00'), ('1991-01-01 00:00:00')),PARTITION p1972 VALUES [('1991-01-01 00:00:00'), ('1999-01-01 00:00:00'))"]
    properties: [{"replication_num":"1", "in_memory": "true"}]
    refresh_method: 'async'   # only for materialized view, default manual
```

models/properties.yml

```yaml
models:
  - name:
    config:
      materialized: table       # table or view or materialized_view
      keys: ['id', 'name', 'some_date']
      table_type: 'PRIMARY'     # PRIMARY or DUPLICATE or UNIQUE
      distributed_by: ['id']
      buckets: 3                # default 10
      partition_by: ['some_date']
      partition_by_init: ["PARTITION p1 VALUES [('1971-01-01 00:00:00'), ('1991-01-01 00:00:00')),PARTITION p1972 VALUES [('1991-01-01 00:00:00'), ('1999-01-01 00:00:00'))"]
      properties: [{"replication_num":"1", "in_memory": "true"}]
      refresh_method: 'async'   # only for materialized view, default manual
```

models/\.sql

```jinja
{{ config(
    materialized = 'table',
    keys=['id', 'name', 'some_date'],
    table_type='PRIMARY',
    distributed_by=['id'],
    buckets=3,
    partition_by=['some_date'],
    ....
) }}
```

##### Configuration Description[​](#configuration-description "Direct link to Configuration Description")

| Option | Description |
| --- | --- |
| `materialized` | How the model will be materialized into Starrocks. Supports view, table, incremental, ephemeral, and materialized\_view. |
| `keys` | Which columns serve as keys. |
| `table_type` | Table type, supported are PRIMARY or DUPLICATE or UNIQUE. |
| `distributed_by` | Specifies the column of data distribution. If not specified, it defaults to random. |
| `buckets` | The bucket number in one partition. If not specified, it will be automatically inferred. |
| `partition_by` | The partition column list. |
| `partition_by_init` | The partition rule or some real partitions item. |
| `properties` | The table properties configuration of Starrocks. ([Starrocks table properties](https://docs.starrocks.io/en-us/latest/sql-reference/sql-statements/data-definition/CREATE_TABLE#properties)) |
| `refresh_method` | How to refresh materialized views. |

#### Read From Catalog[​](#read-from-catalog "Direct link to Read From Catalog")

First, you need to add the catalog to StarRocks. The following is an example for Hive.
```sql
CREATE EXTERNAL CATALOG `hive_catalog`
PROPERTIES (
    "hive.metastore.uris" = "thrift://127.0.0.1:8087",
    "type" = "hive"
);
```

How to add other types of catalogs can be found in the [Catalog Overview](https://docs.starrocks.io/en-us/latest/data_source/catalog/catalog_overview) documentation.

Then write the sources.yaml file:

```yaml
sources:
  - name: external_example
    schema: hive_catalog.hive_db
    tables:
      - name: hive_table_name
```

Finally, you can reference the source with the `source` macro:

```jinja
{{ source('external_example', 'hive_table_name') }}
```

---

### Static analysis

Use the `--static-analysis` flag to override model-level `static_analysis` behavior for a single run. This flag applies to the dbt Fusion engine only; it is ignored by dbt Core.

Values:

* `baseline` (default): Statically analyze SQL for all models in the run. This is the recommended starting point for users transitioning from dbt Core.
* `strict` (previously `on`): Statically analyze all SQL before execution begins. Provides maximum validation guarantees — nothing runs until the entire project is proven valid.
* `off`: Disable static analysis for all models in the run.

Deprecated values

The `on` and `unsafe` values are deprecated and will be removed in May 2026. Use `strict` instead.

If not set, Fusion defaults to `baseline` mode, which provides a smooth transition from dbt Core while still catching most SQL errors. See [Configuring `static_analysis`](https://docs.getdbt.com/docs/fusion/new-concepts.md#configuring-static_analysis) for more information on incrementally opting in to stricter analysis.
Usage

```shell
dbt run --static-analysis strict
dbt run --static-analysis baseline
dbt run --static-analysis off
```

#### Related docs[​](#related-docs "Direct link to Related docs")

Also check out the model-level [`static_analysis` (resource config)](https://docs.getdbt.com/reference/resource-configs/static-analysis.md) and [About flags](https://docs.getdbt.com/reference/global-configs/about-global-configs.md) pages for more details.

---

### static_analysis

info

The `static_analysis` config is available in the dbt Fusion engine only. It isn't available in dbt Core and will be ignored. To upgrade to Fusion, refer to [Get started with Fusion](https://docs.getdbt.com/docs/fusion/get-started-fusion.md).

* Project YAML file
* Properties YAML file
* SQL file config

dbt\_project.yml

```yml
models:
  resource-path:
    +static_analysis: strict | baseline | off
```

models/filename.yml

```yml
models:
  - name: model_name
    config:
      static_analysis: strict | baseline | off
```

models/model\_name.sql

```sql
{{ config(static_analysis='strict' | 'baseline' | 'off') }}

select
    user_id,
    my_cool_udf(ip_address) as cleaned_ip
from {{ ref('my_model') }}
```

#### Definition[​](#definition "Direct link to Definition")

You can configure if and when the dbt Fusion engine performs static SQL analysis for a model. Configure the `static_analysis` config in your project YAML file (`dbt_project.yml`), model properties YAML file, or in a SQL config block in your model file. Refer to [Principles of static analysis](https://docs.getdbt.com/docs/fusion/new-concepts.md?version=1.12#principles-of-static-analysis) for more information on the different modes of static analysis.
The following values are available for `static_analysis`:

* `baseline` (default): Statically analyze SQL. This is the recommended starting point for users transitioning from dbt Core, providing a smooth migration experience while still catching most SQL errors. You can incrementally opt in to stricter analysis over time.
* `strict` (previously `on`): Statically analyze all SQL before execution begins. Use this for maximum validation guarantees — nothing runs until the entire project is proven valid.
* `off`: Skip SQL analysis for this model and its descendants.

Deprecated values

The `on` and `unsafe` values are deprecated and will be removed in May 2026. Use `strict` instead.

##### How static analysis modes cascade[​](#how-static-analysis-modes-cascade "Direct link to How static analysis modes cascade")

Two rules determine how `static_analysis` modes apply in a lineage:

* Eligibility rule: A model is eligible for static analysis only if all of its "parents" are eligible (by parents, we mean the models that are upstream of the current model in the lineage).
* Strictness rule: A "child" model cannot be stricter than its parent (by child, we mean the models that are downstream of the current model in the lineage).

The static analysis configuration cascades from most strict to least strict. Here's the strictness hierarchy: `strict` → `baseline` → `off`

**Allowed downstream by parent mode**
When going downstream in your lineage, you can keep the same mode or relax it, but you cannot make a child stricter than its parent. The following table shows the allowed downstream modes by parent mode:

| Parent mode | Child can be |
| --- | --- |
| `strict` | `strict`, `baseline`, or `off` |
| `baseline` | `baseline` or `off` (not `strict`) |
| `off` | `off` only |

For example, for the lineage Model A → Model B → Model C:

* If Model A is `baseline`, you *cannot* set Model B to `strict`
* If Model A is `strict`, you *can* set Model B to `baseline`

This makes sure that stricter validation requirements don't apply downstream when parent models haven't met those requirements.

Refer to the Fusion concepts page for deeper discussion and visuals: [New concepts](https://docs.getdbt.com/docs/fusion/new-concepts.md). For more info on the JSON schema, refer to the [dbt-jsonschema file](https://github.com/dbt-labs/dbt-jsonschema/blob/1e2c1536fbdd421e49c8b65c51de619e3cd313ff/schemas/latest_fusion/dbt_project-latest-fusion.json#L4689).

#### CLI override[​](#cli-override "Direct link to CLI override")

You can override model-level configuration for a run using the following CLI flags. For example, to disable static analysis for a run:

```bash
dbt run --static-analysis off       # disable static analysis for all models
dbt run --static-analysis baseline  # use baseline analysis for all models
```

See [static analysis CLI flag](https://docs.getdbt.com/reference/global-configs/static-analysis-flag.md).

#### Examples[​](#examples "Direct link to Examples")

The following examples show how to disable static analysis for all models in a package, for a single model, and for a model that uses a custom UDF.
* [Disable static analysis for all models in a package](#disable-static-analysis-for-all-models-in-a-package) * [Disable static analysis in YAML for a single model](#disable-static-analysis-in-yaml-for-a-single-model) * [Disable static analysis in SQL for a model using a custom UDF](#disable-static-analysis-in-sql-for-a-model-using-a-custom-udf) ###### Disable static analysis for all models in a package[​](#disable-static-analysis-for-all-models-in-a-package "Direct link to Disable static analysis for all models in a package") This example shows how to disable static analysis for all models in a package. The [`+` prefix](https://docs.getdbt.com/reference/resource-configs/plus-prefix.md) applies the config to all models in the package. dbt\_project.yml ```yml name: jaffle_shop models: jaffle_shop: marts: +materialized: table a_package_with_introspective_queries: +static_analysis: off ``` ###### Disable static analysis in YAML for a single model[​](#disable-static-analysis-in-yaml-for-a-single-model "Direct link to Disable static analysis in YAML for a single model") This example shows how to disable static analysis for a single model in YAML. models/my\_udf\_using\_model.yml ```yml models: - name: model_with_static_analysis_off config: static_analysis: off ``` ###### Disable static analysis in SQL for a model using a custom UDF[​](#disable-static-analysis-in-sql-for-a-model-using-a-custom-udf "Direct link to Disable static analysis in SQL for a model using a custom UDF") This example shows how to disable static analysis for a model using a custom [user-defined function (UDF)](https://docs.getdbt.com/docs/build/udfs.md) in a SQL file. 
models/my\_udf\_using\_model.sql

```sql
{{ config(static_analysis='off') }}

select
    user_id,
    my_cool_udf(ip_address) as cleaned_ip
from {{ ref('my_model') }}
```

#### Considerations[​](#considerations "Direct link to Considerations")

* Disabling static analysis means that features of the VS Code extension that depend on SQL comprehension will be unavailable.
* Static analysis might fail in some cases (for example, dynamic SQL constructs or unrecognized UDFs) and may require setting `static_analysis: off`. For more examples, refer to [When should I turn static analysis off?](https://docs.getdbt.com/docs/fusion/new-concepts.md#when-should-i-turn-static-analysis-off).

---

### store_failures

The configured test(s) will store their failures when `dbt test --store-failures` is invoked. If you set this configuration as `false` but [`store_failures_as`](https://docs.getdbt.com/reference/resource-configs/store_failures_as.md) is configured, it will be overridden.

#### Description[​](#description "Direct link to Description")

Optionally set a test to always or never store its failures in the database.

* If specified as `true` or `false`, the `store_failures` config will take precedence over the presence or absence of the `--store-failures` flag.
* If the `store_failures` config is `none` or omitted, the resource will use the value of the `--store-failures` flag.
* When true, `store_failures` saves all records (up to [limit](https://docs.getdbt.com/reference/resource-configs/limit.md)) that failed the test. Failures are saved in a new table with the name of the test.
* A test's results will always **replace** previous failures for the same test, even if that test results in no failures.
* By default, `store_failures` uses a schema named `{{ profile.schema }}_dbt_test__audit`, but you can [configure](https://docs.getdbt.com/reference/resource-configs/schema.md#tests) the schema to a different value. Ensure you have the authorization to create or access schemas for your work. For more details, refer to the [FAQ](#faqs).

This logic is encoded in the [`should_store_failures()`](https://github.com/dbt-labs/dbt-adapters/blob/60005a0a2bd33b61cb65a591bc1604b1b3fd25d5/dbt/include/global_project/macros/materializations/configs.sql#L15) macro.

* Specific test
* Singular test
* Generic test block
* Project level

Configure a specific instance of a generic (schema) test:

models/\.yml

```yaml
models:
  - name: my_model
    columns:
      - name: my_column
        data_tests:
          - unique:
              config:
                store_failures: true  # always store failures
          - not_null:
              config:
                store_failures: false # never store failures
```

Configure a singular (data) test:

tests/\.sql

```sql
{{ config(store_failures = true) }}

select ...
```

Set the default for all instances of a generic (schema) test, by setting the config inside its test block (definition):

macros/\.sql

```sql
{% test (model, column_name) %}

{{ config(store_failures = false) }}

select ...

{% endtest %}
```

Set the default for all tests in a package or project:

dbt\_project.yml

```yaml
data_tests:
  +store_failures: true  # all tests
  :
    +store_failures: false # tests in
```

#### FAQs[​](#faqs "Direct link to FAQs")

Receiving a 'permissions denied for schema' error

If you're receiving an `Adapter name adapter: Adapter_name error: permission denied for schema dev_username_dbt_test__audit` error, this is most likely due to your user not having permission to create new schemas, despite having owner access to your own development schema. To resolve this, you need proper authorization to create or access custom schemas.
Run the following SQL command in your respective data platform environment. Note that the exact authorization query may differ from one data platform to another:

```sql
create schema if not exists dev_username_dbt_test__audit authorization username;
```

*Replace `dev_username` with your specific development schema name and `username` with the appropriate user who should have the permissions.*

This command grants the appropriate permissions to create and access the `dbt_test__audit` schema, which is often used with the `store_failures` configuration.

---

### store_failures_as

For the `test` resource type, `store_failures_as` is an optional config that specifies how test failures should be stored in the database. If [`store_failures`](https://docs.getdbt.com/reference/resource-configs/store_failures.md) is also configured, `store_failures_as` takes precedence.

The three supported values are:

* `ephemeral` — nothing stored in the database (default)
* `table` — test failures stored as a database table
* `view` — test failures stored as a database view

You can configure it in all the same places as `store_failures`, including singular tests (.sql files), generic tests (.yml files), and dbt\_project.yml.
##### Examples[​](#examples "Direct link to Examples")

###### Singular test[​](#singular-test "Direct link to Singular test")

[Singular test](https://docs.getdbt.com/docs/build/data-tests.md#singular-data-tests) in `tests/singular/check_something.sql` file

```sql
{{ config(store_failures_as="table") }}

-- custom singular test
select 1 as id
where 1=0
```

###### Generic test[​](#generic-test "Direct link to Generic test")

[Generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests) in `models/_models.yml` file

```yaml
models:
  - name: my_model
    columns:
      - name: id
        data_tests:
          - not_null:
              config:
                store_failures_as: view
          - unique:
              config:
                store_failures_as: ephemeral
```

###### Project level[​](#project-level "Direct link to Project level")

Config in `dbt_project.yml`

```yaml
name: "my_project"
version: "1.0.0"
config-version: 2
profile: "sandcastle"

data_tests:
  my_project:
    +store_failures_as: table
    my_subfolder_1:
      +store_failures_as: view
    my_subfolder_2:
      +store_failures_as: ephemeral
```

##### "Clobbering" configs[​](#clobbering-configs "Direct link to \"Clobbering\" configs")

As with most other configurations, `store_failures_as` is "clobbered" when applied hierarchically. Whenever a more specific value is available, it will completely replace the less specific value.

Additional resources:

* [Data test configurations](https://docs.getdbt.com/reference/data-test-configs.md#related-documentation)
* [Data test-specific configurations](https://docs.getdbt.com/reference/data-test-configs.md#test-data-specific-configurations)
* [Configuring directories of models in dbt\_project.yml](https://docs.getdbt.com/reference/model-configs.md#configuring-directories-of-models-in-dbt_projectyml)
* [Config inheritance](https://docs.getdbt.com/reference/define-configs.md#config-inheritance)
---

### strategy

* timestamp
* check

dbt\_project.yml

```yml
snapshots:
  :
    +strategy: timestamp
    +updated_at: column_name
```

dbt\_project.yml

```yml
snapshots:
  :
    +strategy: check
    +check_cols: [column_name] | all
```

#### Description[​](#description "Direct link to Description")

The snapshot strategy dbt should use to detect record changes. Read the guide to [snapshots](https://docs.getdbt.com/docs/build/snapshots.md#detecting-row-changes) to understand the differences between the two.

#### Default[​](#default "Direct link to Default")

This is a **required configuration**. There is no default value.

#### Examples[​](#examples "Direct link to Examples")

##### Use the timestamp strategy[​](#use-the-timestamp-strategy "Direct link to Use the timestamp strategy")

##### Use the check strategy[​](#use-the-check-strategy "Direct link to Use the check strategy")

##### Advanced: define and use custom snapshot strategy[​](#advanced-define-and-use-custom-snapshot-strategy "Direct link to Advanced: define and use custom snapshot strategy")

Behind the scenes, snapshot strategies are implemented as macros, named `snapshot_<strategy>_strategy`:

* [Source code](https://github.com/dbt-labs/dbt-adapters/blob/60005a0a2bd33b61cb65a591bc1604b1b3fd25d5/dbt/include/global_project/macros/materializations/snapshots/strategies.sql#L52) for the timestamp strategy
* [Source code](https://github.com/dbt-labs/dbt-adapters/blob/60005a0a2bd33b61cb65a591bc1604b1b3fd25d5/dbt/include/global_project/macros/materializations/snapshots/strategies.sql#L136) for the check strategy

It's possible to implement your own snapshot strategy by adding a macro with the same naming pattern to your project.
For example, you might choose to create a strategy which records hard deletes, named `timestamp_with_deletes`.

1. Create a macro named `snapshot_timestamp_with_deletes_strategy`. Use the existing code as a guide and adjust as needed.
2. Use this strategy via the `strategy` configuration.

---

### Supported data formats for unit tests

Currently, mock data for unit testing in dbt supports three formats:

* `dict` (default): Inline dictionary values.
* `csv`: Inline CSV values or a CSV file.
* `sql`: Inline SQL query or a SQL file. Note: For this format you must supply mock data for *all columns*.

#### dict[​](#dict "Direct link to dict")

The `dict` data format is the default if no `format` is defined.
`dict` requires an inline dictionary for `rows`:

models/schema.yml

```yml
unit_tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_model_a')
        format: dict
        rows:
          - {id: 1, name: gerda}
          - {id: 2, name: michelle}
```

#### CSV[​](#csv "Direct link to CSV")

When using the `csv` format, you can use either an inline CSV string for `rows`:

models/schema.yml

```yml
unit_tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_model_a')
        format: csv
        rows: |
          id,name
          1,gerda
          2,michelle
```

Or, you can provide the name of a CSV file in the `tests/fixtures` directory (or the configured `test-paths` location) of your project for `fixture`:

models/schema.yml

```yml
unit_tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_model_a')
        format: csv
        fixture: my_model_a_fixture
```

#### sql[​](#sql "Direct link to sql")

Using this format:

* Provides more flexibility for the types of data you can unit test
* Allows you to unit test a model that depends on an ephemeral model

However, when using `format: sql` you must supply mock data for *all columns*.

When using the `sql` format, you can use either an inline SQL query for `rows`:

models/schema.yml

```yml
unit_tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_model_a')
        format: sql
        rows: |
          select 1 as id, 'gerda' as name, null as loaded_at union all
          select 2 as id, 'michelle' as name, null as loaded_at
```

Or, you can provide the name of a SQL file in the `tests/fixtures` directory (or the configured `test-paths` location) of your project for `fixture`:

models/schema.yml

```yml
unit_tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_model_a')
        format: sql
        fixture: my_model_a_fixture
```

**Note:** Jinja is unsupported in SQL fixtures for unit tests.
---

### Syntax overview

dbt's node selection syntax makes it possible to run only specific resources in a given invocation of dbt. This selection syntax is used for the following subcommands:

| command | argument(s) |
| --- | --- |
| [run](https://docs.getdbt.com/reference/commands/run.md) | `--select`, `--exclude`, `--selector`, `--defer` |
| [test](https://docs.getdbt.com/reference/commands/test.md) | `--select`, `--exclude`, `--selector`, `--defer` |
| [seed](https://docs.getdbt.com/reference/commands/seed.md) | `--select`, `--exclude`, `--selector` |
| [snapshot](https://docs.getdbt.com/reference/commands/snapshot.md) | `--select`, `--exclude`, `--selector` |
| [ls (list)](https://docs.getdbt.com/reference/commands/list.md) | `--select`, `--exclude`, `--selector`, `--resource-type` |
| [compile](https://docs.getdbt.com/reference/commands/compile.md) | `--select`, `--exclude`, `--selector`, `--inline` |
| [freshness](https://docs.getdbt.com/reference/commands/source.md) | `--select`, `--exclude`, `--selector` |
| [build](https://docs.getdbt.com/reference/commands/build.md) | `--select`, `--exclude`, `--selector`, `--resource-type`, `--defer` |
| [docs generate](https://docs.getdbt.com/reference/commands/cmd-docs.md) | `--select`, `--exclude`, `--selector` |

Nodes and resources

We use the terms ["nodes"](https://en.wikipedia.org/wiki/Vertex_\(graph_theory\)) and "resources" interchangeably.
These encompass all the models, tests, sources, seeds, snapshots, exposures, and analyses in your project. They are the objects that make up dbt's DAG (directed acyclic graph). The `--select` and `--selector` arguments are similar in that they both allow you to select resources. To understand the difference, see [Differences between `--select` and `--selector`](https://docs.getdbt.com/reference/node-selection/yaml-selectors.md#difference-between---select-and---selector). #### Specifying resources[​](#specifying-resources "Direct link to Specifying resources") By default, `dbt run` executes *all* of the models in the dependency graph; `dbt seed` creates all seeds, `dbt snapshot` performs every snapshot. The `--select` flag is used to specify a subset of nodes to execute. To follow [POSIX standards](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html) and make things easier to understand, we recommend CLI users use quotes when passing arguments to the `--select` or `--exclude` option (including single or multiple space-delimited, or comma-delimited arguments). Not using quotes might not work reliably on all operating systems, terminals, and user interfaces. For example, `dbt run --select "my_dbt_project_name"` runs all models in your project. ##### How does selection work?[​](#how-does-selection-work "Direct link to How does selection work?") 1. dbt gathers all the resources that are matched by one or more of the `--select` criteria, in the order of [selection methods](https://docs.getdbt.com/reference/node-selection/methods.md) (e.g. `tag:`), then [graph operators](https://docs.getdbt.com/reference/node-selection/graph-operators.md) (e.g. `+`), then finally set operators ([unions](https://docs.getdbt.com/reference/node-selection/set-operators.md#unions), [intersections](https://docs.getdbt.com/reference/node-selection/set-operators.md#intersections), [exclusions](https://docs.getdbt.com/reference/node-selection/exclude.md)). 
tip

You can combine multiple selector methods in one `--select` command by separating them with commas (`,`) without whitespace (for example, `dbt run --select "marts.finance,tag:nightly"`). This only selects resources that satisfy *all* arguments. In this example, the command runs models that are in the `marts/finance` subdirectory and tagged `nightly`. For more information, see [Set operators](https://docs.getdbt.com/reference/node-selection/set-operators.md).

2. The selected resources may be models, sources, seeds, snapshots, or tests. (Tests can also be selected "indirectly" via their parents; see [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) for details.)
3. dbt now has a list of still-selected resources of varying types. As a final step, it tosses away any resource that does not match the resource type of the current task. (Only seeds are kept for `dbt seed`, only models for `dbt run`, only tests for `dbt test`, and so on.)

#### Shorthand

Select resources to build (run, test, seed, snapshot) or check freshness: `--select`, `-s`

##### Examples

By default, `dbt run` will execute *all* of the models in the dependency graph. During development (and deployment), it is useful to specify only a subset of models to run. Use the `--select` flag with `dbt run` to select a subset of models to run. Note that the following arguments (`--select`, `--exclude`, and `--selector`) also apply to other dbt tasks, such as `test` and `build`.

The `--select` flag accepts one or more arguments. Each argument can be one of: 1. a package name 2. a model name 3. a fully-qualified path to a directory of models 4.
a selection method (`path:`, `tag:`, `config:`, `test_type:`, `test_name:`)

Examples:

```bash
dbt run --select "my_dbt_project_name"   # runs all models in your project
dbt run --select "my_dbt_model"          # runs a specific model
dbt run --select "path/to/my/models"     # runs all models in a specific directory
dbt run --select "my_package.some_model" # runs a specific model in a specific package
dbt run --select "tag:nightly"           # runs models with the "nightly" tag
dbt run --select "path/to/models"        # runs models contained in path/to/models
dbt run --select "path/to/my_model.sql"  # runs a specific model by its path
```

dbt supports a shorthand language for defining subsets of nodes. This language uses the following characters:

* plus operator [(`+`)](https://docs.getdbt.com/reference/node-selection/graph-operators.md#the-plus-operator)
* at operator [(`@`)](https://docs.getdbt.com/reference/node-selection/graph-operators.md#the-at-operator)
* asterisk operator (`*`)
* comma operator (`,`)

Examples:

```bash
# multiple arguments can be provided to --select
dbt run --select "my_first_model my_second_model"

# select my_model and all of its children
dbt run --select "my_model+"

# select my_model, its children, and the parents of its children
dbt run --select "@my_model"

# these arguments can be projects, models, directory paths, tags, or sources
dbt run --select "tag:nightly my_model finance.base.*"

# use methods and intersections for more complex selectors
dbt run --select "path:marts/finance,tag:nightly,config.materialized:table"
```

As your selection logic gets more complex, and becomes unwieldy to type out as command-line arguments, consider using a [yaml selector](https://docs.getdbt.com/reference/node-selection/yaml-selectors.md). You can use a predefined definition with the `--selector` flag. Note that when you're using `--selector`, most other flags (namely `--select` and `--exclude`) will be ignored.
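For instance, the intersection example above could be captured as a reusable YAML selector. A minimal sketch, assuming a `selectors.yml` file in the project root (the selector name `nightly_finance` is hypothetical):

```yaml
# selectors.yml (project root) -- hypothetical example
selectors:
  - name: nightly_finance            # hypothetical selector name
    description: Finance marts tagged for the nightly run
    definition:
      intersection:                  # all criteria must match
        - method: path
          value: marts/finance
        - method: tag
          value: nightly
```

You would then invoke it with `dbt run --selector nightly_finance`, keeping the selection logic version-controlled instead of retyping it on the command line.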
##### Troubleshoot with the `ls` command

Constructing and debugging your selection syntax can be challenging. To get a "preview" of what will be selected, we recommend using the [`list` command](https://docs.getdbt.com/reference/commands/list.md). This command, when combined with your selection syntax, will output a list of the nodes that meet that selection criteria. The `dbt ls` command supports all types of selection syntax arguments, for example:

```bash
dbt ls --select "path/to/my/models"                      # Lists all models in a specific directory.
dbt ls --select "source_status:fresher+"                 # Shows sources updated since the last dbt source freshness run.
dbt ls --select "state:modified+"                        # Displays nodes modified in comparison to a previous state.
dbt ls --select "result:<status>+" state:modified+ --state ./  # Lists nodes that match certain result statuses and are modified.
```
---

### tags

* Models
* Seeds
* Snapshots
* Saved queries
* Sources
* Exposures
* Tests

dbt\_project.yml models/properties.yml

```yaml
models:
  - name: model_name
    config:
      tags: <string> | [<string>]
    columns:
      - name: column_name
        config:
          tags: <string> | [<string>] # changed to config in v1.10 and backported to 1.9
        data_tests:
          - test_name:
              config:
                tags: <string> | [<string>]
```

models/\<modelname\>.sql

```sql
{{ config(
    tags="<string>" | ["<string>"]
) }}

select ...
```

dbt\_project.yml seeds/properties.yml

```yaml
seeds:
  - name: seed_name
    config:
      tags: <string> | [<string>]
    columns:
      - name: column_name
        config:
          tags: <string> | [<string>] # changed to config in v1.10 and backported to 1.9
        data_tests:
          - test_name:
              config:
                tags: <string> | [<string>]
```

dbt\_project.yml snapshots/\<filename\>.sql

```sql
{% snapshot snapshot_name %}

{{ config(
    tags="<string>" | ["<string>"]
) }}

select ...

{% endsnapshot %}
```

dbt\_project.yml models/semantic\_models.yml

```yaml
saved_queries:
  - name: saved_query_name
    config:
      tags: <string> | [<string>]
```

dbt\_project.yml models/properties.yml

```yaml
sources:
  - name: source_name
    config:
      tags: <string> | [<string>] # changed to config in v1.10
    tables:
      - name: table_name
        config:
          tags: <string> | [<string>] # changed to config in v1.10
        columns:
          - name: column_name
            config:
              tags: <string> | [<string>] # changed to config in v1.10 and backported to 1.9
            data_tests:
              - test_name:
                  config:
                    tags: <string> | [<string>]
```

Note that for backwards compatibility, `tags` is supported as a top-level key for sources, but without the capabilities of config inheritance.

dbt\_project.yml models/exposures.yml

```yaml
exposures:
  - name: exposure_name
    config:
      tags: <string> | [<string>] # changed to config in v1.10
```

Note that for backwards compatibility, `tags` is supported as a top-level key for exposures, but without the capabilities of config inheritance.

dbt\_project.yml models/properties.yml

```yaml
models:
  - name: model_name
    columns:
      - name: column_name
        data_tests:
          - test_name:
              config:
                tags: <string> | [<string>]
```

tests/\<filename\>.sql

```sql
{% test test_name() %}

{{ config(
    tags="<string>" | ["<string>"]
) }}

select ...
{% endtest %}
```

#### Definition

Apply a tag (or list of tags) to a resource. These tags can be used as part of the [resource selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md), when running the following commands:

* `dbt run --select tag:my_tag` — Run all models tagged with a specific tag.
* `dbt build --select tag:my_tag` — Build all resources tagged with a specific tag.
* `dbt seed --select tag:my_tag` — Seed all resources tagged with a specific tag.
* `dbt snapshot --select tag:my_tag` — Snapshot all resources tagged with a specific tag.
* `dbt test --select tag:my_tag` — Indirectly runs all tests associated with the models that are tagged.

###### Using tags with the `+` operator

You can use the [`+` operator](https://docs.getdbt.com/reference/node-selection/graph-operators.md#the-plus-operator) to include upstream or downstream dependencies in your `tag` selection:

* `dbt run --select tag:my_tag+` — Run models tagged with `my_tag` and all their downstream dependencies.
* `dbt run --select +tag:my_tag` — Run models tagged with `my_tag` and all their upstream dependencies.
* `dbt run --select +tag:my_tag+` — Run models tagged with `my_tag`, their upstream dependencies, and their downstream dependencies.
* `dbt run --select tag:my_tag+ --exclude tag:exclude_tag` — Run models tagged with `my_tag` and their downstream dependencies, and exclude models tagged with `exclude_tag`, regardless of their dependencies.

Usage notes about tags

When using tags, consider the following:

* Each individual tag must be a string.
* Tags are additive across project hierarchy.
* Some resource types (like sources, exposures) require tags at the top level. Refer to [usage notes](#usage-notes) for more information.
#### Examples

The following examples show how to apply tags to resources in your project. You can configure tags in the `dbt_project.yml`, property files, or SQL files.

##### Use tags to run parts of your project

Apply tags in your `dbt_project.yml` as a single string or a list of strings. In the following example, one of the models, the `jaffle_shop` model, is tagged with `contains_pii`.

dbt\_project.yml

```yml
models:
  jaffle_shop:
    +tags: "contains_pii"

    staging:
      +tags:
        - "hourly"

    marts:
      +tags:
        - "hourly"
        - "published"

    metrics:
      +tags:
        - "daily"
        - "published"
```

##### Apply tags to models

This section demonstrates applying tags to models in the `dbt_project.yml`, `schema.yml`, and SQL files.

To apply tags to a model in your `dbt_project.yml` file, you would add the following:

dbt\_project.yml

```yaml
models:
  jaffle_shop:
    +tags: finance # jaffle_shop model is tagged with 'finance'.
```

To apply tags to a model in your `models/` directory YAML property file, you would add the following using the `config` property:

models/stg\_customers.yml

```yaml
models:
  - name: stg_customers
    description: Customer data with basic cleaning and transformation applied, one row per customer.
    config:
      tags: ['santi'] # stg_customers.yml model is tagged with 'santi'.
    columns:
      - name: customer_id
        description: The unique key for each customer.
        data_tests:
          - not_null
          - unique
```

To apply tags to a model in your SQL file, you would add the following:

models/staging/stg\_payments.sql

```sql
{{ config(
    tags=["finance"] # stg_payments.sql model is tagged with 'finance'.
) }}

select ...
```

Run resources with specific tags (or exclude resources with specific tags) using the following commands:

```shell
# Run all models tagged "daily"
dbt run --select tag:daily

# Run all models tagged "daily", except those that are tagged hourly
dbt run --select tag:daily --exclude tag:hourly
```

##### Apply tags to seeds

dbt\_project.yml

```yml
seeds:
  jaffle_shop:
    utm_mappings:
      +tags: marketing
```

dbt\_project.yml

```yml
seeds:
  jaffle_shop:
    utm_mappings:
      +tags:
        - marketing
        - hourly
```

##### Apply tags to saved queries

The following example shows how to apply a tag to a saved query in the `dbt_project.yml` file. The saved query is then tagged with `order_metrics`.

dbt\_project.yml

```yml
saved-queries:
  jaffle_shop:
    customer_order_metrics:
      +tags: order_metrics
```

Then run resources with a specific tag using the following commands:

```shell
# Run all resources tagged "order_metrics"
dbt run --select tag:order_metrics
```

The second example shows how to apply multiple tags to a saved query in the `semantic_model.yml` file. The saved query is then tagged with `order_metrics` and `hourly`.

semantic\_model.yml

```yaml
saved_queries:
  - name: test_saved_query
    description: "{{ doc('saved_query_description') }}"
    label: Test saved query
    config:
      tags:
        - order_metrics
        - hourly
```

Run resources with multiple tags using the following commands:

```shell
# Run all resources tagged "order_metrics" and "hourly"
dbt build --select tag:order_metrics tag:hourly
```

#### Usage notes

##### Tags must be strings

Each individual tag must be a string value (for example, `marketing` or `daily`). In the following example, `my_tag: "my_value"` is invalid because it is a key-value pair.
```yml
sources:
  - name: ecom
    schema: raw
    description: E-commerce data for the Jaffle Shop
    config:
      tags:
        my_tag: "my_value" # invalid
    tables:
      - name: raw_customers
        config:
          tags:
            my_tag: "my_value" # invalid
```

A warning is raised when the `tags` value is not a string. For example:

```text
Field config.tags: {'my_tag': 'my_value'} is not valid for source (ecom)
```

##### Tags are additive

Tags accumulate hierarchically. The [earlier example](https://docs.getdbt.com/reference/resource-configs/tags.md#use-tags-to-run-parts-of-your-project) would result in:

| Model | Tags |
| --- | --- |
| models/staging/stg\_customers.sql | `contains_pii`, `hourly` |
| models/staging/stg\_payments.sql | `contains_pii`, `hourly`, `finance` |
| models/marts/dim\_customers.sql | `contains_pii`, `hourly`, `published` |
| models/metrics/daily\_metrics.sql | `contains_pii`, `daily`, `published` |

##### Applying tags to specific columns and tests

You can also apply tags to specific columns in a resource, and to tests.
models/properties.yml

```yml
models:
  - name: my_model
    columns:
      - name: column_name
        config:
          tags: ['column_level'] # changed to config in v1.10 and backported to 1.9
        data_tests:
          - unique:
              config:
                tags: ['test_level'] # changed to config in v1.10
```

In the example above, the `unique` test would be selected by either of these tags:

```bash
dbt test --select tag:column_level
dbt test --select tag:test_level
```

##### Backwards compatibility for sources and exposures

For backwards compatibility, `tags` is supported as a top-level key for sources and exposures (prior to dbt v1.10), but without the capabilities of config inheritance.

models/properties.yml

```yml
exposures:
  - name: my_exposure
    tags: ['exposure_tag'] # top-level key (legacy)
    # OR use config (v1.10+)
    config:
      tags: ['exposure_tag']

sources:
  - name: source_name
    tags: ['top_level'] # top-level key (legacy)
    # OR use config (v1.10+)
    config:
      tags: ['top_level']
    tables:
      - name: table_name
        tags: ['table_level'] # top-level key (legacy)
        # OR use config (v1.10+)
        config:
          tags: ['table_level']
        columns:
          - name: column_name
            config:
              tags: ['column_level'] # changed to config in v1.10 and backported to 1.9
```

---

### target_database

note

Starting in dbt Core v1.9+, this functionality is no longer utilized. Use the [database](https://docs.getdbt.com/reference/resource-configs/database.md) config as an alternative to define a custom database while still respecting the `generate_database_name` macro.
Try it now in the [dbt **Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).

dbt\_project.yml

```yml
snapshots:
  <resource-path>:
    +target_database: string
```

snapshots/\<filename\>.sql

```jinja2
{{ config(
    target_database="string"
) }}
```

#### Description

The database that dbt should build a [snapshot](https://docs.getdbt.com/docs/build/snapshots.md) table into. Notes:

* The specified database must already exist.
* On **BigQuery**, this is analogous to a `project`.
* On **Redshift**, cross-database queries are not possible. If you use this parameter, you will receive the following error. As such, **do not use** this parameter on Redshift:

```text
Encountered an error:
Runtime Error
  Cross-db references not allowed in redshift (raw vs analytics)
```

#### Default

By default, dbt will use the [target](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md) database associated with your profile/connection.

#### Examples

##### Build all snapshots in a database named `snapshots`

dbt\_project.yml

```yml
snapshots:
  +target_database: snapshots
```

##### Use a target-aware database

Use the [`{{ target }}` variable](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md) to change which database a snapshot table is built in. Note: consider whether this use-case is right for you, as downstream `refs` will select from the `dev` version of a snapshot, which can make it hard to validate models that depend on snapshots.
dbt\_project.yml

```yml
snapshots:
  +target_database: "{% if target.name == 'dev' %}dev{% else %}{{ target.database }}{% endif %}"
```

##### Use the same database-naming behavior as models

Leverage the [`generate_database_name` macro](https://docs.getdbt.com/docs/build/custom-databases.md) to build snapshots in databases that follow the same naming behavior as your models. Notes:

* This macro is not available when configuring from the `dbt_project.yml` file, so it must be configured in a snapshot SQL file config.
* Consider whether this use-case is right for you, as downstream `refs` will select from the `dev` version of a snapshot, which can make it hard to validate models that depend on snapshots.

snapshots/orders\_snapshot.sql

```sql
{{ config(
    target_database=generate_database_name('snapshots')
) }}
```

---

### target_schema

note

Starting in dbt Core v1.9+, this functionality is no longer utilized. Use the [schema](https://docs.getdbt.com/reference/resource-configs/schema.md) config as an alternative to define a custom schema while still respecting the `generate_schema_name` macro.

Try it now in the [dbt **Latest** release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).

dbt\_project.yml

```yml
snapshots:
  <resource-path>:
    +target_schema: string
```

snapshots/\<filename\>.sql

```jinja2
{{ config(
    target_schema="string"
) }}
```

#### Description

The schema that dbt should build a [snapshot](https://docs.getdbt.com/docs/build/snapshots.md) table into.
When `target_schema` is provided, snapshots build into the same `target_schema`, no matter who is running them. On **BigQuery**, this is analogous to a `dataset`.

#### Default

#### Examples

##### Build all snapshots in a schema named `snapshots`

dbt\_project.yml

```yml
snapshots:
  +target_schema: snapshots
```

---

### Teradata configurations

#### General

* *Set `quote_columns`* - to prevent a warning, make sure to explicitly set a value for `quote_columns` in your `dbt_project.yml`. See the [doc on quote\_columns](https://docs.getdbt.com/reference/resource-configs/quote_columns.md) for more information.

```yaml
seeds:
  +quote_columns: false # or `true` if you have CSV column headers with spaces
```

#### Models

##### table

* `table_kind` - define the table kind. Legal values are `MULTISET` (default for ANSI transaction mode required by `dbt-teradata`) and `SET`, e.g.:

* in SQL materialization definition file:

```sql
{{
  config(
      materialized="table",
      table_kind="SET"
  )
}}
```

* in seed configuration:

```yaml
seeds:
  <project-name>:
    table_kind: "SET"
```

For details, see [CREATE TABLE documentation](https://docs.teradata.com/r/76g1CuvvQlYBjb2WPIuk3g/B6Js16DRQVwPDjgJ8rz7hg).

* `table_option` - defines table options. The config supports multiple statements. The definition below uses the Teradata syntax definition to explain what statements are allowed.
Square brackets `[]` denote optional parameters. The pipe symbol `|` separates statements. Use commas to combine multiple statements as shown in the examples below:

```text
{ MAP = map_name [COLOCATE USING colocation_name] |
  [NO] FALLBACK [PROTECTION] |
  WITH JOURNAL TABLE = table_specification |
  [NO] LOG |
  [ NO | DUAL ] [BEFORE] JOURNAL |
  [ NO | DUAL | LOCAL | NOT LOCAL ] AFTER JOURNAL |
  CHECKSUM = { DEFAULT | ON | OFF } |
  FREESPACE = integer [PERCENT] |
  mergeblockratio |
  datablocksize |
  blockcompression |
  isolated_loading
}
```

where:

* mergeblockratio:

```text
{ DEFAULT MERGEBLOCKRATIO |
  MERGEBLOCKRATIO = integer [PERCENT] |
  NO MERGEBLOCKRATIO
}
```

* datablocksize:

```text
DATABLOCKSIZE = {
  data_block_size [ BYTES | KBYTES | KILOBYTES ] |
  { MINIMUM | MAXIMUM | DEFAULT } DATABLOCKSIZE
}
```

* blockcompression:

```text
BLOCKCOMPRESSION = { AUTOTEMP | MANUAL | ALWAYS | NEVER | DEFAULT }
  [, BLOCKCOMPRESSIONALGORITHM = { ZLIB | ELZS_H | DEFAULT } ]
  [, BLOCKCOMPRESSIONLEVEL = { value | DEFAULT } ]
```

* isolated\_loading:

```text
WITH [NO] [CONCURRENT] ISOLATED LOADING [ FOR { ALL | INSERT | NONE } ]
```

Examples:

* In SQL materialization definition file:

```sql
{{
  config(
      materialized="table",
      table_option="NO FALLBACK"
  )
}}
```

```sql
{{
  config(
      materialized="table",
      table_option="NO FALLBACK, NO JOURNAL"
  )
}}
```

```sql
{{
  config(
      materialized="table",
      table_option="NO FALLBACK, NO JOURNAL, CHECKSUM = ON, NO MERGEBLOCKRATIO, WITH CONCURRENT ISOLATED LOADING FOR ALL"
  )
}}
```

* in seed configuration:

```yaml
seeds:
  <project-name>:
    table_option: "NO FALLBACK"
```

```yaml
seeds:
  <project-name>:
    table_option: "NO FALLBACK, NO JOURNAL"
```

```yaml
seeds:
  <project-name>:
    table_option: "NO FALLBACK, NO JOURNAL, CHECKSUM = ON, NO MERGEBLOCKRATIO, WITH CONCURRENT ISOLATED LOADING FOR ALL"
```

For details, see [CREATE TABLE documentation](https://docs.teradata.com/r/76g1CuvvQlYBjb2WPIuk3g/B6Js16DRQVwPDjgJ8rz7hg).

* `with_statistics` - should statistics be copied from the base table.
For example:

```sql
{{
  config(
      materialized="table",
      with_statistics="true"
  )
}}
```

For details, see [CREATE TABLE documentation](https://docs.teradata.com/r/76g1CuvvQlYBjb2WPIuk3g/B6Js16DRQVwPDjgJ8rz7hg).

* `index` - defines table indices:

```text
[UNIQUE] PRIMARY INDEX [index_name] ( index_column_name [,...] ) |
NO PRIMARY INDEX |
PRIMARY AMP [INDEX] [index_name] ( index_column_name [,...] ) |
PARTITION BY { partitioning_level | ( partitioning_level [,...] ) } |
UNIQUE INDEX [ index_name ] [ ( index_column_name [,...] ) ] [loading] |
INDEX [index_name] [ALL] ( index_column_name [,...] ) [ordering] [loading]
[,...]
```

where:

* partitioning\_level:

```text
{ partitioning_expression |
  COLUMN [ [NO] AUTO COMPRESS |
  COLUMN [ [NO] AUTO COMPRESS ] [ ALL BUT ] column_partition ]
} [ ADD constant ]
```

* ordering:

```text
ORDER BY [ VALUES | HASH ] [ ( order_column_name ) ]
```

* loading:

```text
WITH [NO] LOAD IDENTITY
```

Examples:

* In SQL materialization definition file:

```sql
{{
  config(
      materialized="table",
      index="UNIQUE PRIMARY INDEX ( GlobalID )"
  )
}}
```

> ℹ️ Note, unlike in `table_option`, there are no commas between index statements!

```sql
{{
  config(
      materialized="table",
      index="PRIMARY INDEX(id) PARTITION BY RANGE_N(create_date BETWEEN DATE '2020-01-01' AND DATE '2021-01-01' EACH INTERVAL '1' MONTH)"
  )
}}
```

```sql
{{
  config(
      materialized="table",
      index="PRIMARY INDEX(id) PARTITION BY RANGE_N(create_date BETWEEN DATE '2020-01-01' AND DATE '2021-01-01' EACH INTERVAL '1' MONTH) INDEX index_attrA (attrA) WITH LOAD IDENTITY"
  )
}}
```

* in seed configuration:

```yaml
seeds:
  <project-name>:
    index: "UNIQUE PRIMARY INDEX ( GlobalID )"
```

> ℹ️ Note, unlike in `table_option`, there are no commas between index statements!
```yaml
seeds:
  <project-name>:
    index: "PRIMARY INDEX(id) PARTITION BY RANGE_N(create_date BETWEEN DATE '2020-01-01' AND DATE '2021-01-01' EACH INTERVAL '1' MONTH)"
```

```yaml
seeds:
  <project-name>:
    index: "PRIMARY INDEX(id) PARTITION BY RANGE_N(create_date BETWEEN DATE '2020-01-01' AND DATE '2021-01-01' EACH INTERVAL '1' MONTH) INDEX index_attrA (attrA) WITH LOAD IDENTITY"
```

#### Seeds

Using seeds to load raw data

As explained in [dbt seeds documentation](https://docs.getdbt.com/docs/build/seeds.md), seeds should not be used to load raw data (for example, large CSV exports from a production database). Since seeds are version controlled, they are best suited to files that contain business-specific logic, for example a list of country codes or user IDs of employees. Loading CSVs using dbt's seed functionality is not performant for large files. Consider using a different tool to load these CSVs into your data warehouse.

* `use_fastload` - use [fastload](https://github.com/Teradata/python-driver#FastLoad) when handling the `dbt seed` command. This option will likely speed up loading when your seed files have hundreds of thousands of rows. You can set this seed configuration option in your `dbt_project.yml` file, e.g.:

```yaml
seeds:
  <project-name>:
    +use_fastload: true
```

#### Snapshots

Snapshots use the [HASHROW function](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Functions-Expressions-and-Predicates/Hash-Related-Functions/HASHROW/HASHROW-Function-Syntax) of the Teradata database to generate a unique hash value for the `dbt_scd_id` column. To use your own hash UDF, there is a configuration option in the snapshot model called `snapshot_hash_udf`, which defaults to HASHROW. You can provide a value like `<database_name>.hash_udf_name`. If you only provide `hash_udf_name`, it uses the same schema as the model runs in.
For example, in the `snapshots/snapshot_example.sql` file:

```sql
{% snapshot snapshot_example %}

{{
  config(
      target_schema='snapshots',
      unique_key='id',
      strategy='check',
      check_cols=["c2"],
      snapshot_hash_udf='GLOBAL_FUNCTIONS.hash_md5'
  )
}}

select * from {{ ref('order_payments') }}

{% endsnapshot %}
```

###### Grants

Grants are supported in the dbt-teradata adapter with release version 1.2.0 and above. You can use grants to manage access to the datasets you're producing with dbt. To implement these permissions, define grants as resource configs on each model, seed, or snapshot. Define the default grants that apply to the entire project in your `dbt_project.yml`, and define model-specific grants within each model's SQL or property file. For example:

models/schema.yml

```yaml
models:
  - name: model_name
    config:
      grants:
        select: ['user_a', 'user_b']
```

Another example for adding multiple grants:

```yaml
models:
  - name: model_name
    config:
      materialized: table
      grants:
        select: ["user_b"]
        insert: ["user_c"]
```

> ℹ️ `copy_grants` is not supported in Teradata.

Refer to [grants](https://docs.getdbt.com/reference/resource-configs/grants.md) for more information on grants.

#### Query band

Query band in dbt-teradata can be set on three levels:

1. Profiles level: In the `profiles.yml` file, the user can provide `query_band` using the following example:

```yaml
query_band: 'application=dbt;'
```

2. Project level: In the `dbt_project.yml` file, the user can provide `query_band` using the following example:

```yaml
models:
  Project_name:
    +query_band: "app=dbt;model={model};"
```

3. Model level: It can be set on the model SQL file or model level configuration on YAML files:

```sql
{{ config( query_band='sql={model};' ) }}
```

Users can set `query_band` at any level or on all levels.
With a profiles-level `query_band`, dbt-teradata sets the query band once at the start of the session; thereafter, project-level and model-level `query_band` configurations update it with their respective values. If a user sets a key-value pair whose value is `'{model}'`, this `'{model}'` placeholder is internally replaced with the model name, which can be useful for telemetry tracking in SQL/DBQL logging.

```yaml
models:
  Project_name:
    +query_band: "app=dbt;model={model};"
```

* For example, if the model the user is running is `stg_orders`, `{model}` will be replaced with `stg_orders` at runtime.
* If no `query_band` is set by the user, the default query\_band used will be: `org=teradata-internal-telem;appname=dbt;`

#### Unit testing

* Unit testing is supported in dbt-teradata, allowing users to write and execute unit tests using the dbt test command.
* For detailed guidance, refer to the [dbt unit tests documentation](https://docs.getdbt.com/docs/build/documentation.md).

> In Teradata, reusing the same alias across multiple common table expressions (CTEs) or subqueries within a single model is not permitted, as it results in parsing errors; therefore, it is essential to assign unique aliases to each CTE or subquery to ensure proper query execution.

#### valid\_history incremental materialization strategy

*This is available in early access*

This strategy is designed to manage historical data efficiently within a Teradata environment, leveraging dbt features to ensure data quality and optimal resource usage. In temporal databases, valid time is crucial for applications like historical reporting, ML training datasets, and forensic analysis.
```sql
{{
  config(
      materialized='incremental',
      unique_key='id',
      on_schema_change='fail',
      incremental_strategy='valid_history',
      valid_period='valid_period_col',
      use_valid_to_time='no',
  )
}}
```

The `valid_history` incremental strategy requires the following parameters:

* `unique_key`: The primary key of the model (excluding the valid time components), specified as a column name or list of column names.
* `valid_period`: Name of the model column indicating the period for which the record is considered to be valid. The datatype must be `PERIOD(DATE)` or `PERIOD(TIMESTAMP)`.
* `use_valid_to_time`: Whether the end bound value of the valid period in the input is considered by the strategy when building the valid timeline. Use `no` if you consider your record to be valid until changed (and supply any value greater than the begin bound for the end bound of the period; a typical convention is `9999-12-31` or `9999-12-31 23:59:59.999999`). Use `yes` if you know until when the record is valid (typically this is a correction in the history timeline).

The valid\_history strategy in dbt-teradata involves several critical steps to ensure the integrity and accuracy of historical data management:

* Remove duplicates and conflicting values from the source data:
  * This step ensures that the data is clean and ready for further processing by eliminating any redundant or conflicting records.
  * Primary key duplicates (two or more records with the same value for the `unique_key` and the BEGIN() bound of the `valid_period` fields) in the dataset produced by the model are removed. If such duplicates exist, the row with the lowest value is retained for all non-primary-key fields (in the order specified in the model). Full-row duplicates are always de-duplicated.
* Identify and adjust overlapping time slices:
  * Overlapping or adjacent time periods in the data are corrected to maintain a consistent and non-overlapping timeline.
To achieve this, the macro adjusts the valid period end bound of a record to align with the begin bound of the next record (if they overlap or are adjacent) within the same `unique_key` group. If `use_valid_to_time = 'yes'`, the valid period end bound provided in the source data is used. Otherwise, a default end date is applied for missing bounds, and adjustments are made accordingly. * Manage records needing to be adjusted, deleted, or split based on the source and target data: * This involves handling scenarios where records in the source data overlap with or need to replace records in the target data, ensuring that the historical timeline remains accurate. * Compact history: * Normalize and compact the history by merging records of adjacent time periods with the same value, optimizing database storage and performance. We use the function TD\_NORMALIZE\_MEET for this purpose. * Delete existing overlapping records from the target table: * Before inserting new or updated records, any existing records in the target table that overlap with the new data are removed to prevent conflicts. * Insert the processed data into the target table: * Finally, the cleaned and adjusted data is inserted into the target table, ensuring that the historical data is up-to-date and accurately reflects the intended timeline. These steps collectively ensure that the valid\_history strategy effectively manages historical data, maintaining its integrity and accuracy while optimizing performance. 
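As a rough illustration, the de-duplication, end-bound adjustment, and compaction steps above can be modeled in a few lines of Python over `(unique_key, valid_from, value)` tuples. This is a simplified sketch of the `use_valid_to_time='no'` case, not the adapter's macro; `DEFAULT_END` stands in for the `9999-12-31` convention:

```python
from collections import defaultdict

DEFAULT_END = "9999-12-31"  # "valid until changed" end-bound convention

def build_valid_timeline(rows):
    """rows: (unique_key, valid_from, value) tuples -> (key, begin, end, value)."""
    # Step 1: de-duplicate -- one row per (key, begin), keeping the lowest value.
    groups = defaultdict(list)
    for key, begin, value in rows:
        groups[(key, begin)].append(value)
    deduped = sorted((kb, min(vs)) for kb, vs in groups.items())
    timeline = []
    for i, ((key, begin), value) in enumerate(deduped):
        # Step 2: close each period at the next record's begin bound.
        nxt = deduped[i + 1][0] if i + 1 < len(deduped) else None
        end = nxt[1] if nxt and nxt[0] == key else DEFAULT_END
        # Step 3: compact -- extend the previous period when the value repeats
        # (the analogue of TD_NORMALIZE_MEET for adjacent equal-value periods).
        if timeline and timeline[-1][0] == key and timeline[-1][3] == value:
            timeline[-1] = (key, timeline[-1][1], end, value)
        else:
            timeline.append((key, begin, end, value))
    return timeline

rows = [(2, "2024-03-01", "A"), (2, "2024-03-12", "C"), (2, "2024-03-12", "D"),
        (2, "2024-03-13", "C"), (2, "2024-03-14", "C")]
print(build_valid_timeline(rows))
# [(2, '2024-03-01', '2024-03-12', 'A'), (2, '2024-03-12', '9999-12-31', 'C')]
```

The output reproduces the collapsed two-row timeline for `pk = 2` shown in the illustration: the `C`/`D` conflict at `2024-03-12` is resolved to the lowest value, and the repeated `C` periods are merged into one.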
An illustration demonstrating the source sample data and its corresponding target data:

```sql
-- Source data
pk | valid_from               | value_txt1 | value_txt2
=======================================================
1  | 2024-03-01 00:00:00.0000 | A          | x1
1  | 2024-03-12 00:00:00.0000 | B          | x1
1  | 2024-03-12 00:00:00.0000 | B          | x2
1  | 2024-03-25 00:00:00.0000 | A          | x2
2  | 2024-03-01 00:00:00.0000 | A          | x1
2  | 2024-03-12 00:00:00.0000 | C          | x1
2  | 2024-03-12 00:00:00.0000 | D          | x1
2  | 2024-03-13 00:00:00.0000 | C          | x1
2  | 2024-03-14 00:00:00.0000 | C          | x1

-- Target data
pk | valid_period                                                       | value_txt1 | value_txt2
=================================================================================================
1  | PERIOD(TIMESTAMP)[2024-03-01 00:00:00.0, 2024-03-12 00:00:00.0]    | A          | x1
1  | PERIOD(TIMESTAMP)[2024-03-12 00:00:00.0, 2024-03-25 00:00:00.0]    | B          | x1
1  | PERIOD(TIMESTAMP)[2024-03-25 00:00:00.0, 9999-12-31 23:59:59.9999] | A          | x2
2  | PERIOD(TIMESTAMP)[2024-03-01 00:00:00.0, 2024-03-12 00:00:00.0]    | A          | x1
2  | PERIOD(TIMESTAMP)[2024-03-12 00:00:00.0, 9999-12-31 23:59:59.9999] | C          | x1
```

#### Common Teradata-specific tasks[​](#common-teradata-specific-tasks "Direct link to Common Teradata-specific tasks")

* *collect statistics* - when a table is created or modified significantly, you might need to tell Teradata to collect statistics for the optimizer. This can be done with the `COLLECT STATISTICS` command. You can perform this step using dbt's `post-hook`, e.g.:

```sql
{{ config(
  post_hook=[
    "COLLECT STATISTICS ON {{ this }} COLUMN (column_1, column_2 ...);"
  ]
)}}
```

See the [Collecting Statistics documentation](https://docs.teradata.com/r/76g1CuvvQlYBjb2WPIuk3g/RAyUdGfvREwbO9J0DMNpLw) for more information.
#### The external tables package[​](#the-external-tables-package "Direct link to The external tables package") The [dbt-external-tables](https://github.com/dbt-labs/dbt-external-tables) package is supported with the dbt-teradata adapter from v1.9.3 onwards. Under the hood, dbt-teradata uses the concept of foreign tables to create tables from external sources. More information can be found in the [Teradata documentation](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Data-Definition-Language-Syntax-and-Examples/Table-Statements/CREATE-FOREIGN-TABLE). You need to add the `dbt-external-tables` package as a dependency: ```yaml packages: - package: dbt-labs/dbt_external_tables version: [">=0.9.0", "<1.0.0"] ``` You need to add the dispatch config for the project to pick the overridden macros from the dbt-teradata package: ```yaml dispatch: - macro_namespace: dbt_external_tables search_order: ['dbt', 'dbt_external_tables'] ``` To define `STOREDAS` and `ROWFORMAT` for external tables, one of the following options can be used: * You can use the standard dbt-external-tables config `file_format` and `row_format` respectively. * Or you can add it in the `USING` config as mentioned in the [Teradata documentation](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Data-Definition-Language-Syntax-and-Examples/Table-Statements/CREATE-FOREIGN-TABLE/CREATE-FOREIGN-TABLE-Syntax-Elements/USING-Clause). For the external sources, which require authentication, you need to create an authentication object and pass it in `tbl_properties` as `EXTERNAL SECURITY` object. For more information on authentication objects, check out the [Teradata documentation](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Data-Definition-Language-Syntax-and-Examples/Authorization-Statements-for-External-Routines/CREATE-AUTHORIZATION-and-REPLACE-AUTHORIZATION). 
The following are examples of external sources configured for Teradata:

```yaml
sources:
  - name: teradata_external
    schema: "{{ target.schema }}"
    loader: S3
    tables:
      - name: people_csv_partitioned
        external:
          location: "/s3/s3.amazonaws.com/dbt-external-tables-testing/csv/"
          file_format: "TEXTFILE"
          row_format: '{"field_delimiter":",","record_delimiter":"\n","character_set":"LATIN"}'
          using: |
            PATHPATTERN ('$var1/$section/$var3')
          tbl_properties: |
            MAP = TD_MAP1
            ,EXTERNAL SECURITY MyAuthObj
          partitions:
            - name: section
              data_type: CHAR(1)
        columns:
          - name: id
            data_type: int
          - name: first_name
            data_type: varchar(64)
          - name: last_name
            data_type: varchar(64)
          - name: email
            data_type: varchar(64)
```

```yaml
sources:
  - name: teradata_external
    schema: "{{ target.schema }}"
    loader: S3
    tables:
      - name: people_json_partitioned
        external:
          location: '/s3/s3.amazonaws.com/dbt-external-tables-testing/json/'
          using: |
            STOREDAS('TEXTFILE')
            ROWFORMAT('{"record_delimiter":"\n", "character_set":"cs_value"}')
            PATHPATTERN ('$var1/$section/$var3')
          tbl_properties: |
            MAP = TD_MAP1
            ,EXTERNAL SECURITY MyAuthObj
          partitions:
            - name: section
              data_type: CHAR(1)
```

##### `temporary_metadata_generation_schema` (previously `fallback_schema`)[​](#temporary_metadata_generation_schema-previously-fallback_schema "Direct link to temporary_metadata_generation_schema-previously-fallback_schema")

The dbt-teradata adapter internally creates temporary tables to fetch the metadata of views for manifest and catalog creation. If you lack permission to create tables on the schema you are working with, you can define a `temporary_metadata_generation_schema` (to which you have the proper `create`/`drop` privileges) in the `dbt_project.yml` as a variable.

```yaml
vars:
  temporary_metadata_generation_schema:
```
---

### Test selection examples

Test selection works a little differently from other resource selection. This makes it very easy to:

* run tests on a particular model
* run tests on all models in a subdirectory
* run tests on all models upstream / downstream of a model, etc.

Like all resource types, tests can be selected **directly**, by methods and operators that capture one of their attributes: their name, properties, tags, etc.

Unlike other resource types, tests can also be selected *indirectly* through relationships in your DAG. If a selection method or operator includes a model that a test depends on, dbt will also select that test. For example, when you run `dbt test --select model_b`, dbt includes tests defined on `model_b` as well as tests on related models (like `model_a`) that reference `model_b`. [See the next section](#indirect-selection) for more details on controlling this behavior.

Test selection is powerful, and we know it can be tricky. To that end, we've included lots of examples below:

##### Direct selection[​](#direct-selection "Direct link to Direct selection")

Run generic tests only:

```bash
dbt test --select "test_type:generic"
```

Run singular tests only:

```bash
dbt test --select "test_type:singular"
```

In both cases, `test_type` checks a property of the test itself. These are forms of "direct" test selection.

##### Indirect selection[​](#indirect-selection "Direct link to Indirect selection")

Indirect selection modes control which tests run based on the models you select and their relationships in your DAG.
These modes determine how dbt handles tests that reference your selected models, either directly or through upstream/downstream relationships. You can use the following modes (with `eager` as the default). Test exclusion is always greedy: if ANY parent is explicitly excluded, the test will be excluded as well. Building subsets of a DAG The `buildable` and `cautious` modes can be useful when you're only building a subset of your DAG, and you want to avoid test failures in `eager` mode caused by unbuilt resources. You can also achieve this with [deferral](https://docs.getdbt.com/reference/node-selection/defer.md). ###### Eager mode (default)[​](#eager-mode "Direct link to Eager mode (default)") Most inclusive and runs tests if *any* of the parent nodes are selected, regardless of whether all dependencies are met. This includes *any* tests that reference the selected nodes, even if they also reference other unselected nodes. For example, if you run `dbt test --select model_b`, eager mode will run: * Tests directly on `model_b` * Tests in upstream models (like `model_a`) that reference `model_b` * Tests in downstream models that reference `model_b` dbt builds models that depend on the selected model. In this mode, any tests depending on unbuilt resources will raise an error. ###### Buildable mode[​](#buildable-mode "Direct link to Buildable mode") Buildable mode is a middle ground between `cautious` and `eager`, running only tests that reference selected nodes (or their ancestors). This mode is slightly more inclusive than `cautious` by including tests whose references are each within the selected nodes (or their ancestors). This mode is useful when a test depends on a model *and* a direct ancestor of that model, like confirming an aggregation has the same totals as its input. 
###### Cautious mode[​](#cautious-mode "Direct link to Cautious mode")

Cautious is the most exclusive mode; it ensures that tests are executed and models are built only when all necessary dependencies of the selected models are met. It restricts tests to only those that exclusively reference selected nodes. Tests will only be executed if all the nodes they depend on are selected, which prevents tests from running if one or more of their parent nodes are unselected and, consequently, unbuilt.

###### Empty mode[​](#empty-mode "Direct link to Empty mode")

Empty mode runs no tests and restricts the build to the selected node, ignoring all indirect dependencies. It doesn't execute any tests, whether they are directly attached to the selected node or not. The empty mode is automatically used for [interactive compilation](https://docs.getdbt.com/reference/commands/compile.md#interactive-compile).

##### Indirect selection examples[​](#indirect-selection-examples "Direct link to Indirect selection examples")

To visualize these methods, suppose you have `model_a`, `model_b`, and `model_c` and associated data tests. The following illustrates which tests will be run when you execute `dbt build` with the various indirect selection modes:

![dbt build](/img/docs/reference/indirect-selection-dbt-build.png?v=2 "dbt build")
![Eager (default)](/img/docs/reference/indirect-selection-eager.png?v=2 "Eager (default)")
![Buildable](/img/docs/reference/indirect-selection-buildable.png?v=2 "Buildable")
![Cautious](/img/docs/reference/indirect-selection-cautious.png?v=2 "Cautious")
![Empty](/img/docs/reference/indirect-selection-empty.png?v=2 "Empty")

* Eager mode (default)
* Buildable mode
* Cautious mode
* Empty mode

In this example, during the build process, any test that depends on the selected "orders" model or its dependent models will be executed, even if it depends on other models as well.
```shell dbt test --select "orders" dbt build --select "orders" ``` In this example, dbt executes tests that reference "orders" within the selected nodes (or their ancestors). ```shell dbt test --select "orders" --indirect-selection=buildable dbt build --select "orders" --indirect-selection=buildable ``` In this example, only tests that depend *exclusively* on the "orders" model will be executed: ```shell dbt test --select "orders" --indirect-selection=cautious dbt build --select "orders" --indirect-selection=cautious ``` This mode does not execute any tests, whether they are directly attached to the selected node or not. ```shell dbt test --select "orders" --indirect-selection=empty dbt build --select "orders" --indirect-selection=empty ``` ##### Test selection syntax examples[​](#test-selection-syntax-examples "Direct link to Test selection syntax examples") Setting `indirect_selection` can also be specified in a [yaml selector](https://docs.getdbt.com/reference/node-selection/yaml-selectors.md#indirect-selection). The following examples use *eager* mode by default for indirect selection, unless you specify another mode (like `--indirect-selection=cautious`). The selection operators (`+`, `tags`, and so on) determine which models are selected; the indirect selection mode determines which tests run for those models. ```bash # Run tests on a model (indirect selection) dbt test --select "customers" # Run tests on two or more specific models (indirect selection) dbt test --select "customers orders" # Run tests on all models in the models/staging/jaffle_shop directory (indirect selection) dbt test --select "staging.jaffle_shop" # Run tests downstream of a model (note this will select those tests directly!) 
dbt test --select "stg_customers+" # Run tests upstream of a model (indirect selection) dbt test --select "+stg_customers" # Run tests on all models with a particular tag (direct + indirect) dbt test --select "tag:my_model_tag" # Run tests on all models with a particular materialization (indirect selection) dbt test --select "config.materialized:table" # To change the indirect selection mode, add the flag: dbt test --select "customers" --indirect-selection=cautious ``` The same principle can be extended to tests defined on other resource types. In these cases, we will execute all tests defined on certain sources via the `source:` selection method: ```bash # tests on all sources dbt test --select "source:*" # tests on one source dbt test --select "source:jaffle_shop" # tests on two or more specific sources dbt test --select "source:jaffle_shop source:raffle_bakery" # tests on one source table dbt test --select "source:jaffle_shop.customers" # tests on everything _except_ sources dbt test --exclude "source:*" ``` ##### More complex selection[​](#more-complex-selection "Direct link to More complex selection") Through the combination of direct and indirect selection, there are many ways to accomplish the same outcome. Let's say we have a data test named `assert_total_payment_amount_is_positive` that depends on a model named `payments`. All of the following would manage to select and execute that test specifically: ```bash dbt test --select "assert_total_payment_amount_is_positive" # directly select the test by name dbt test --select "payments,test_type:singular" # indirect selection, v1.2 ``` As long as you can select a common property of a group of resources, indirect selection allows you to execute all the tests on those resources, too. In the example above, we saw it was possible to test all table-materialized models. 
This principle can be extended to other resource types, too:

```bash
# Run tests on all models with a particular materialization
dbt test --select "config.materialized:table"

# Run tests on all seeds, which use the 'seed' materialization
dbt test --select "config.materialized:seed"

# Run tests on all snapshots, which use the 'snapshot' materialization
dbt test --select "config.materialized:snapshot"
```

Note that this functionality may change in future versions of dbt.

##### Run tests on tagged columns[​](#run-tests-on-tagged-columns "Direct link to Run tests on tagged columns")

Because the column `order_id` is tagged `my_column_tag`, the test itself also receives the tag `my_column_tag`. Because of that, this is an example of direct selection.

models/\<filename>.yml

```yml
models:
  - name: orders
    columns:
      - name: order_id
        config:
          tags: [my_column_tag] # changed to config in v1.10 and backported to 1.9
        data_tests:
          - unique
```

```bash
dbt test --select "tag:my_column_tag"
```

Currently, tests "inherit" tags applied to columns, sources, and source tables. They do *not* inherit tags applied to models, seeds, or snapshots. In all likelihood, those tests would still be selected indirectly, because the tag selects its parent. This is a subtle distinction, and it may change in future versions of dbt.

##### Run tagged tests only[​](#run-tagged-tests-only "Direct link to Run tagged tests only")

This is an even clearer example of direct selection: the test itself is tagged `my_test_tag`, and selected accordingly.

models/\<filename>.yml

```yml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique:
              config:
                tags: [my_test_tag] # changed to config in v1.10
```

```bash
dbt test --select "tag:my_test_tag"
```
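Stepping back to the indirect selection modes covered earlier, their semantics can be summarized in a short Python sketch. This is a hedged illustration of the assumed mode semantics, not dbt's implementation: tests are keyed by the models they reference, and each mode filters them differently given the selected models and their ancestors.

```python
# Hedged sketch of the four indirect selection modes (assumed semantics,
# not dbt's implementation).
def select_tests(tests, selected, ancestors, mode="eager"):
    """tests: {test_name: set of referenced models}; selected: set of selected
    models; ancestors: {model: set of its ancestor models}."""
    if mode == "empty":
        return set()  # no indirectly selected tests at all
    chosen = set()
    # nodes that are selected, or are ancestors of a selected node
    buildable = selected.union(*(ancestors.get(m, set()) for m in selected))
    for name, refs in tests.items():
        if not refs & selected:
            continue  # a test is only considered if it touches a selected node
        if mode == "eager":
            chosen.add(name)  # any selected parent is enough
        elif mode == "buildable" and refs <= buildable:
            chosen.add(name)  # every parent is selected or an ancestor of one
        elif mode == "cautious" and refs <= selected:
            chosen.add(name)  # every parent must itself be selected
    return chosen

# model_a -> model_b -> model_c; select only model_b
tests = {"t_b": {"model_b"}, "t_ab": {"model_a", "model_b"}, "t_bc": {"model_b", "model_c"}}
anc = {"model_b": {"model_a"}, "model_c": {"model_a", "model_b"}}
print(select_tests(tests, {"model_b"}, anc, "eager"))      # all three tests
print(select_tests(tests, {"model_b"}, anc, "buildable"))  # t_b and t_ab
print(select_tests(tests, {"model_b"}, anc, "cautious"))   # t_b only
```

Under eager mode every test touching `model_b` runs; buildable drops `t_bc` because `model_c` is neither selected nor an ancestor; cautious keeps only the test whose parents are all selected.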
---

### test-paths

dbt\_project.yml

```yml
test-paths: [directorypath]
```

#### Definition[​](#definition "Direct link to Definition")

Optionally specify a custom list of directories where [singular tests](https://docs.getdbt.com/docs/build/data-tests.md#singular-data-tests) and [custom generic tests](https://docs.getdbt.com/docs/build/data-tests.md#generic-data-tests) are located.

#### Default[​](#default "Direct link to Default")

Without specifying this config, dbt will search for tests in the `tests` directory, i.e. `test-paths: ["tests"]`. Specifically, it will look for `.sql` files containing:

* Generic test definitions in the `tests/generic` subdirectory
* Singular tests (all other files)

Paths specified in `test-paths` must be relative to the location of your `dbt_project.yml` file. Avoid absolute paths like `/Users/username/project/test`, as they will lead to unexpected behavior and outcomes.

* ✅ **Do**
  * Use a relative path:
    ```yml
    test-paths: ["test"]
    ```
* ❌ **Don't**
  * Avoid absolute paths:
    ```yml
    test-paths: ["/Users/username/project/test"]
    ```

#### Examples[​](#examples "Direct link to Examples")

##### Use a subdirectory named `custom_tests` instead of `tests` for data tests[​](#use-a-subdirectory-named-custom_tests-instead-of-tests-for-data-tests "Direct link to use-a-subdirectory-named-custom_tests-instead-of-tests-for-data-tests")

dbt\_project.yml

```yml
test-paths: ["custom_tests"]
```
---

### type

💡Did you know...

Available from dbt v1.11 or with the [dbt "Latest" release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).

functions/\<filename>.yml

```yml
functions:
  - name: function_name
    config:
      type: scalar | aggregate
```

In the future, we're considering adding support for a `table` type. Refer to [this issue](https://github.com/dbt-labs/dbt-core/issues/11917) to track the progress and provide any feedback.

#### Definition[​](#definition "Direct link to Definition")

The `type` config specifies the type of user-defined function (UDF) you're creating. This config is optional and defaults to `scalar` if not specified.

#### Supported function types[​](#supported-function-types "Direct link to Supported function types")

The following function types are supported:

* [scalar (default)](#scalar-default)
* [aggregate](#aggregate)

Support for `type` differs based on the warehouse and language (SQL or Python) you're using:

| Adapter | scalar SQL | scalar Python | aggregate SQL | aggregate Python |
| -------------- | ---------- | ------------- | ------------- | ---------------- |
| dbt-bigquery | ✅ | ✅ | ✅ | ❌ |
| dbt-snowflake | ✅ | ✅ | ❌ | ✅ |
| dbt-databricks | ✅ | ❌ | ❌ | ❌ |
| dbt-postgres | ✅ | ❌ | ❌ | ❌ |
| dbt-redshift | ✅ | ❌ | ❌ | ❌ |

##### scalar (default)[​](#scalar-default "Direct link to scalar (default)")

A scalar function returns a single value for each row of input. This is the most common type of UDF.
**Example use cases:**

* Data validation (checking if a string matches a pattern)
* Data transformation (converting formats, cleaning strings)
* Custom calculations (complex mathematical operations)

functions/schema.yml

```yml
functions:
  - name: is_positive_int
    description: Determines if a string represents a positive integer
    config:
      type: scalar
    arguments:
      - name: input_string
        data_type: STRING
    returns:
      data_type: BOOLEAN
```

##### aggregate[​](#aggregate "Direct link to aggregate")

Aggregate functions operate on multiple rows and return a single value — for example, they sum values or calculate an average for a group. Queries use these functions in `GROUP BY` operations.

Aggregate functions are currently supported only for:

* Python functions on Snowflake
* SQL functions on BigQuery

**Example use cases:**

* Calculating totals or averages for groups of data (for example, total sales per customer)
* Aggregating data over time (for example, daily, monthly, or yearly totals)

functions/schema.yml

```yml
functions:
  - name: double_total
    description: Sums values and doubles the result
    config:
      type: aggregate
    arguments:
      - name: values
        data_type: FLOAT
        description: A sequence of numbers to aggregate
    returns:
      data_type: FLOAT
```

#### Related documentation[​](#related-documentation "Direct link to Related documentation")

* [User-defined functions](https://docs.getdbt.com/docs/build/udfs.md)
* [Function properties](https://docs.getdbt.com/reference/function-properties.md)
* [Function configurations](https://docs.getdbt.com/reference/function-configs.md)
* [Volatility](https://docs.getdbt.com/reference/resource-configs/volatility.md)
* [Arguments](https://docs.getdbt.com/reference/resource-properties/function-arguments.md)
* [Returns](https://docs.getdbt.com/reference/resource-properties/returns.md)
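Outside of dbt, the scalar-versus-aggregate distinction is easy to see with SQLite's Python driver. This standalone illustration (not dbt; the function names simply mirror the YAML examples above) registers one function of each type and calls them from SQL:

```python
# Standalone SQLite illustration of scalar vs. aggregate UDFs.
import sqlite3

def is_positive_int(s):
    """Scalar: one result per input row."""
    return s is not None and s.isdigit() and int(s) > 0

class DoubleTotal:
    """Aggregate: one result per group -- sums values and doubles the result."""
    def __init__(self):
        self.total = 0.0
    def step(self, value):
        self.total += value
    def finalize(self):
        return self.total * 2

con = sqlite3.connect(":memory:")
con.create_function("is_positive_int", 1, is_positive_int)
con.create_aggregate("double_total", 1, DoubleTotal)
con.execute("create table t (txt text, val real)")
con.executemany("insert into t values (?, ?)", [("3", 1.5), ("-2", 2.5), ("x", 4.0)])
print(con.execute("select txt, is_positive_int(txt) from t").fetchall())
print(con.execute("select double_total(val) from t").fetchone()[0])  # 16.0
```

The scalar function produces a value per row, while the aggregate collapses the whole `val` column into a single result, just as the `GROUP BY` note above describes.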
---

### unique_key

unique\_key identifies records for incremental models or snapshots, ensuring changes are captured or updated correctly.

* Models
* Snapshots

Configure the `unique_key` in the `config` block of your [incremental model's](https://docs.getdbt.com/docs/build/incremental-models.md) SQL file, in your `models/properties.yml` file, or in your `dbt_project.yml` file.

models/my\_incremental\_model.sql

```sql
{{
    config(
        materialized='incremental',
        unique_key='id'
    )
}}
```

models/properties.yml

```yaml
models:
  - name: my_incremental_model
    description: "An incremental model example with a unique key."
    config:
      materialized: incremental
      unique_key: id
```

dbt\_project.yml

```yaml
name: jaffle_shop
models:
  jaffle_shop:
    staging:
      +unique_key: id
```

dbt\_project.yml

```yml
snapshots:
  <resource-path>:
    +unique_key: column_name_or_expression
```

#### Description[​](#description "Direct link to Description")

A column name or expression that uniquely identifies each record in the inputs of a snapshot or incremental model. dbt uses this key to match incoming records to existing records in the target table (either a snapshot or an incremental model) so that changes can be captured or updated correctly:

* In an incremental model, dbt replaces the old row (like a merge key or upsert).
* In a snapshot, dbt keeps history, storing multiple rows for that same `unique_key` as it evolves over time.

In the dbt **Latest** release track and from dbt v1.9, [snapshots](https://docs.getdbt.com/docs/build/snapshots.md) are defined and configured in YAML files within your `snapshots/` directory. You can specify one or multiple `unique_key` values within your snapshot YAML file's `config` key.
caution

Providing a non-unique key will result in unexpected snapshot results. dbt **will not** test the uniqueness of this key; consider [testing](https://docs.getdbt.com/blog/primary-key-testing#how-to-test-primary-keys-with-dbt) the source data to ensure that this key is indeed unique.

#### Default[​](#default "Direct link to Default")

This parameter is optional. If you don't provide a `unique_key`, your adapter will default to using `incremental_strategy: append`. If you leave out the `unique_key` parameter and use strategies like `merge`, `insert_overwrite`, `delete+insert`, or `microbatch`, the adapter will fall back to using `incremental_strategy: append`.

This is different for BigQuery:

* For `incremental_strategy = merge`, you must provide a `unique_key`; leaving it out leads to ambiguous or failing behavior.
* For `insert_overwrite` or `microbatch`, `unique_key` is not required because they work by partition replacement rather than row-level upserts.

#### Examples[​](#examples "Direct link to Examples")

##### Use an `id` column as a unique key[​](#use-an-id-column-as-a-unique-key "Direct link to use-an-id-column-as-a-unique-key")

* Models
* Snapshots

In this example, the `id` column is the unique key for an incremental model.

models/my\_incremental\_model.sql

```sql
{{
    config(
        materialized='incremental',
        unique_key='id'
    )
}}

select * from ..
```

In this example, the `id` column is used as a unique key for a snapshot. You can also specify configurations in your `dbt_project.yml` file if multiple snapshots share the same `unique_key`:

dbt\_project.yml

```yml
snapshots:
  <resource-path>:
    +unique_key: id
```
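The two behaviors described above can be contrasted with a toy Python sketch. This is a simplified illustration using hypothetical row dicts, not dbt's implementation: the incremental path upserts on the key, while the snapshot path keeps every prior row.

```python
# Toy contrast (hypothetical rows; not dbt's implementation) of how
# unique_key drives an incremental model vs. a snapshot.
def incremental_upsert(target, incoming, key="id"):
    """Merge/upsert: an incoming row replaces the target row with the same key."""
    by_key = {row[key]: row for row in target}
    for row in incoming:
        by_key[row[key]] = row
    return list(by_key.values())

def snapshot_append(history, incoming, key="id"):
    """Snapshot: prior rows are kept; each incoming row adds a new version."""
    out = list(history)
    for row in incoming:
        version = sum(1 for h in out if h[key] == row[key]) + 1
        out.append(dict(row, version=version))
    return out

old = [{"id": 1, "status": "placed"}]
new = [{"id": 1, "status": "shipped"}, {"id": 2, "status": "placed"}]
print(incremental_upsert(old, new))                     # one row per id, latest wins
print(snapshot_append([dict(old[0], version=1)], new))  # history preserved
```

The upsert leaves exactly one row per `id`, whereas the snapshot ends up with both the `placed` and `shipped` versions of `id = 1`.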
---

### Unit test overrides

When configuring your unit test, you can override the output of [macros](https://docs.getdbt.com/docs/build/jinja-macros.md#macros), [project variables](https://docs.getdbt.com/docs/build/project-variables.md), or [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) for a given unit test.

models/schema.yml

```yml
- name: test_my_model_overrides
  model: my_model
  given:
    - input: ref('my_model_a')
      rows:
        - {id: 1, a: 1}
    - input: ref('my_model_b')
      rows:
        - {id: 1, b: 2}
        - {id: 2, b: 2}
  overrides:
    macros:
      type_numeric: override
      invocation_id: 123
    vars:
      my_test: var_override
    env_vars:
      MY_TEST: env_var_override
  expect:
    rows:
      - {macro_call: override, var_call: var_override, env_var_call: env_var_override, invocation_id: 123}
```

#### Macros[​](#macros "Direct link to Macros")

You can override the output of any macro in your unit test definition. If the model you're unit testing uses these macros, you must override them:

* [`is_incremental`](https://docs.getdbt.com/docs/build/incremental-models.md#understand-the-is_incremental-macro): If you're unit testing an incremental model, you must explicitly set `is_incremental` to `true` or `false`. See more docs on unit testing incremental models [here](https://docs.getdbt.com/docs/build/unit-tests.md#unit-testing-incremental-models).

models/schema.yml

```yml
unit_tests:
  - name: my_unit_test
    model: my_incremental_model
    overrides:
      macros:
        # unit test this model in "full refresh" mode
        is_incremental: false
    ...
```

* [`dbt_utils.star`](https://docs.getdbt.com/blog/star-sql-love-letter): If you're unit testing a model that uses the `star` macro, you must explicitly set `star` to a list of columns.
This is because the `star` macro only accepts a [relation](https://docs.getdbt.com/reference/dbt-classes.md#relation) for the `from` argument; the unit test's mock input data is injected directly into the model SQL, replacing the `ref('')` or `source('')` function, causing the `star` macro to fail unless overridden.

models/schema.yml

```yml
unit_tests:
  - name: my_other_unit_test
    model: my_model_that_uses_star
    overrides:
      macros:
        # explicitly set star to the relevant list of columns
        dbt_utils.star: col_a,col_b,col_c
    ...
```

---

### Unit testing versioned SQL models

If your model has multiple versions, the default unit test will run on *all* versions of your model. To specify version(s) of your model to unit test, use `include` or `exclude` for the desired versions in your model versions config:

models/schema.yml

```yaml
# my test_is_valid_email_address unit test will run on all versions of my_model
unit_tests:
  - name: test_is_valid_email_address
    model: my_model
    ...

# my test_is_valid_email_address unit test will run on ONLY version 2 of my_model
unit_tests:
  - name: test_is_valid_email_address
    model: my_model
    versions:
      include:
        - 2
    ...

# my test_is_valid_email_address unit test will run on all versions EXCEPT 1 of my_model
unit_tests:
  - name: test_is_valid_email_address
    model: my_model
    versions:
      exclude:
        - 1
    ...
```
---

### updated_at

dbt\_project.yml

```yml
snapshots:
  <resource-path>:
    +strategy: timestamp
    +updated_at: column_name
```

#### Description[​](#description "Direct link to Description")

A column within the results of your snapshot query that represents when the record row was last updated. This parameter is **required if using the `timestamp` [strategy](https://docs.getdbt.com/reference/resource-configs/strategy.md)**.

The `updated_at` field may support ISO date strings and unix epoch integers, depending on the data platform you use.

#### Default[​](#default "Direct link to Default")

No default is provided.

#### Examples[​](#examples "Direct link to Examples")

##### Use a column name `updated_at`[​](#use-a-column-name-updated_at "Direct link to use-a-column-name-updated_at")

##### Coalesce two columns to create a reliable `updated_at` column[​](#coalesce-two-columns-to-create-a-reliable-updated_at-column "Direct link to coalesce-two-columns-to-create-a-reliable-updated_at-column")

Consider a data source that only has an `updated_at` column filled in when a record is updated (so a `null` value indicates that the record hasn't been updated after it was created). Since the `updated_at` configuration only takes a column name, rather than an expression, you should update your snapshot query to include the coalesced column.
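As a sketch of that approach (the source, table, and column names here are hypothetical, not from the original example), the snapshot query can emit the coalesced column and the snapshot config can point `updated_at` at it:

```sql
-- Hypothetical snapshot query: fall back to created_at for rows that have
-- never been updated, so the timestamp column is never null.
select
    *,
    coalesce(updated_at, created_at) as updated_at_processed
from {{ source('jaffle_shop', 'users') }}
```

The snapshot's config would then set `+updated_at: updated_at_processed`.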
---

### Upsolver configurations

#### Supported Upsolver SQLake functionality

| COMMAND | STATE | MATERIALIZED |
| --- | --- | --- |
| SQL compute cluster | not supported | - |
| SQL connections | supported | connection |
| SQL copy job | supported | incremental |
| SQL merge job | supported | incremental |
| SQL insert job | supported | incremental |
| SQL materialized views | supported | materializedview |
| Expectations | supported | incremental |

#### Configs materialization

| Config | Required | Materialization | Description | Example |
| --- | --- | --- | --- | --- |
| connection_type | Yes | connection | Connection identifier: S3/GLUE_CATALOG/KINESIS | connection_type='S3' |
| connection_options | Yes | connection | Dictionary of options supported by the selected connection | connection_options={ 'aws_role': 'aws_role', 'external_id': 'SAMPLES', 'read_only': True } |
| incremental_strategy | No | incremental | Define one of the incremental strategies: merge/copy/insert. Default: copy | incremental_strategy='merge' |
| source | No | incremental | Define the source to copy from: S3/KAFKA/KINESIS | source = 'S3' |
| target_type | No | incremental | Define the target type REDSHIFT/ELASTICSEARCH/S3/SNOWFLAKE/POSTGRES. Default None for Data lake | target_type='Snowflake' |
| target_prefix | No | incremental | Define PREFIX for the ELASTICSEARCH target type | target_prefix = 'orders' |
| target_location | No | incremental | Define LOCATION for the S3 target type | target_location = 's3://your-bucket-name/path/to/folder/' |
| schema | Yes/No | incremental | Define the target schema. Required if target_type, no table created in a metastore connection | schema = 'target_schema' |
| database | Yes/No | incremental | Define the target connection. Required if target_type, no table created in a metastore connection | database = 'target_connection' |
| alias | Yes/No | incremental | Define the target table. Required if target_type, no table created in a metastore connection | alias = 'target_table' |
| delete_condition | No | incremental | Records that match the ON condition and a delete condition can be deleted | delete_condition='nettotal > 1000' |
| partition_by | No | incremental | List of dictionaries to define partition_by for the target metastore table | partition_by=[{'field':'$field_name'}] |
| primary_key | No | incremental | List of dictionaries to define primary_key for the target metastore table | primary_key=[{'field':'customer_email', 'type':'string'}] |
| map_columns_by_name | No | incremental | Maps columns from the SELECT statement to the table. Boolean. Default: False | map_columns_by_name=True |
| sync | No | incremental/materializedview | Boolean option to define if the job is synchronized or non-synchronized. Default: False | sync=True |
| options | No | incremental/materializedview | Dictionary of job options | options={ 'START_FROM': 'BEGINNING', 'ADD_MISSING_COLUMNS': True } |

#### SQL connection

Connections are used to provide Upsolver with the proper credentials to bring your data into SQLake, as well as to write out your transformed data to various services. For more details, see ["Upsolver SQL connections"](https://docs.upsolver.com/sqlake/sql-command-reference/sql-connections).

As a dbt model, a connection is a model with `materialized='connection'`:

```sql
{{ config(
    materialized='connection',
    connection_type={ 'S3' | 'GLUE_CATALOG' | 'KINESIS' | 'KAFKA' | 'SNOWFLAKE' },
    connection_options={}
) }}
```

Running this model will compile CREATE CONNECTION (or ALTER CONNECTION, if it already exists) SQL and send it to the Upsolver engine. The connection will be named after the model.

#### SQL copy job

A COPY FROM job allows you to copy your data from a given source into a table created in a metastore connection. This table then serves as your staging table and can be used with SQLake transformation jobs to write to various target locations. For more details, see ["Upsolver SQL copy-from"](https://docs.upsolver.com/sqlake/sql-command-reference/sql-jobs/create-job/copy-from).

As a dbt model, a copy job is a model with `materialized='incremental'`:

```sql
{{ config(
    materialized='incremental',
    sync=True|False,
    source = 'S3' | 'KAFKA' | ... ,
    options={
        'option_name': 'option_value'
    },
    partition_by=[{}]
) }}
SELECT * FROM {{ ref() }}
```

Running this model will compile CREATE TABLE SQL for the Data lake target type (or ALTER TABLE, if it already exists) and CREATE COPY JOB (or ALTER COPY JOB) SQL, and send them to the Upsolver engine. The table will be named after the model; the job will be named after the model plus '_job'.

#### SQL insert job

An INSERT job defines a query that pulls in a set of data based on the given SELECT statement and inserts it into the designated target.
This query is then run periodically based on the RUN_INTERVAL defined within the job. For more details, see ["Upsolver SQL insert"](https://docs.upsolver.com/sqlake/sql-command-reference/sql-jobs/create-job/sql-transformation-jobs/insert).

As a dbt model, an insert job is a model with `materialized='incremental'` and `incremental_strategy='insert'`:

```sql
{{ config(
    materialized='incremental',
    sync=True|False,
    map_columns_by_name=True|False,
    incremental_strategy='insert',
    options={
        'option_name': 'option_value'
    },
    primary_key=[{}]
) }}
SELECT ...
FROM {{ ref() }}
WHERE ...
GROUP BY ...
HAVING COUNT(DISTINCT orderid::string) ...
```

Running this model will compile CREATE TABLE SQL for the Data lake target type (or ALTER TABLE, if it already exists) and CREATE INSERT JOB (or ALTER INSERT JOB) SQL, and send them to the Upsolver engine. The table will be named after the model; the job will be named after the model plus '_job'.

#### SQL merge job

A MERGE job defines a query that pulls in a set of data based on the given SELECT statement and inserts into, replaces, or deletes the data from the designated target based on the job definition. This query is then run periodically based on the RUN_INTERVAL defined within the job. For more details, see ["Upsolver SQL merge"](https://docs.upsolver.com/sqlake/sql-command-reference/sql-jobs/create-job/sql-transformation-jobs/merge).

As a dbt model, a merge job is a model with `materialized='incremental'` and `incremental_strategy='merge'`:

```sql
{{ config(
    materialized='incremental',
    sync=True|False,
    map_columns_by_name=True|False,
    incremental_strategy='merge',
    options={
        'option_name': 'option_value'
    },
    primary_key=[{}]
) }}
SELECT ...
FROM {{ ref() }}
WHERE ...
GROUP BY ...
HAVING COUNT ...
```

Running this model will compile CREATE TABLE SQL for the Data lake target type (or ALTER TABLE, if it already exists) and CREATE MERGE JOB (or ALTER MERGE JOB) SQL, and send them to the Upsolver engine. The table will be named after the model; the job will be named after the model plus '_job'.

#### SQL materialized views

When transforming your data, you may find that you need data from multiple source tables in order to achieve your desired result. In such a case, you can create a materialized view from one SQLake table in order to join it with your other table (which in this case is considered the main table). For more details, see ["Upsolver SQL materialized views"](https://docs.upsolver.com/sqlake/sql-command-reference/sql-jobs/create-job/sql-transformation-jobs/sql-materialized-views).

As a dbt model, a materialized view is a model with `materialized='materializedview'`:

```sql
{{ config(
    materialized='materializedview',
    sync=True|False,
    options={'option_name': 'option_value'}
) }}
SELECT ...
FROM {{ ref() }}
WHERE ...
GROUP BY ...
```

Running this model will compile CREATE MATERIALIZED VIEW SQL (or ALTER MATERIALIZED VIEW, if it already exists) and send it to the Upsolver engine. The materialized view will be named after the model.

#### Expectations/constraints

Data quality conditions can be added to your job to drop a row or trigger a warning when a column violates a predefined condition.
```sql
WITH EXPECTATION EXPECT ON VIOLATION WARN
```

Expectations can be implemented with dbt constraints. Supported constraints: `check` and `not_null`.

```yaml
models:
  - name: # required
    config:
      contract:
        enforced: true
    # model-level constraints
    constraints:
      - type: check
        columns: ['', '']
        expression: "column1 <= column2"
        name:
      - type: not_null
        columns: ['column1', 'column2']
        name:
    columns:
      - name:
        data_type: string
        # column-level constraints
        constraints:
          - type: not_null
          - type: check
            expression: "REGEXP_LIKE(, '^[0-9]{4}[a-z]{5}$')"
            name:
```

#### Projects examples

> projects examples link: [github.com/dbt-upsolver/examples/](https://github.com/Upsolver/dbt-upsolver/tree/main/examples)

#### Connection options

| Option | Storage | Editable | Optional | Config Syntax |
| --- | --- | --- | --- | --- |
| aws_role | s3 | True | True | 'aws_role': `''` |
| external_id | s3 | True | True | 'external_id': `''` |
| aws_access_key_id | s3 | True | True | 'aws_access_key_id': `''` |
| aws_secret_access_key | s3 | True | True | 'aws_secret_access_key': `''` |
| path_display_filter | s3 | True | True | 'path_display_filter': `''` |
| path_display_filters | s3 | True | True | 'path_display_filters': (`''`, ...) |
| read_only | s3 | True | True | 'read_only': True/False |
| encryption_kms_key | s3 | True | True | 'encryption_kms_key': `''` |
| encryption_customer_managed_key | s3 | True | True | 'encryption_customer_kms_key': `''` |
| comment | s3 | True | True | 'comment': `''` |
| host | kafka | False | False | 'host': `''` |
| hosts | kafka | False | False | 'hosts': (`''`, ...) |
| consumer_properties | kafka | True | True | 'consumer_properties': `''` |
| version | kafka | False | True | 'version': `''` |
| require_static_ip | kafka | True | True | 'require_static_ip': True/False |
| ssl | kafka | True | True | 'ssl': True/False |
| topic_display_filter | kafka | True | True | 'topic_display_filter': `''` |
| topic_display_filters | kafka | True | True | 'topic_display_filters': (`''`, ...) |
| comment | kafka | True | True | 'comment': `''` |
| aws_role | glue_catalog | True | True | 'aws_role': `''` |
| external_id | glue_catalog | True | True | 'external_id': `''` |
| aws_access_key_id | glue_catalog | True | True | 'aws_access_key_id': `''` |
| aws_secret_access_key | glue_catalog | True | True | 'aws_secret_access_key': `''` |
| default_storage_connection | glue_catalog | False | False | 'default_storage_connection': `''` |
| default_storage_location | glue_catalog | False | False | 'default_storage_location': `''` |
| region | glue_catalog | False | True | 'region': `''` |
| database_display_filter | glue_catalog | True | True | 'database_display_filter': `''` |
| database_display_filters | glue_catalog | True | True | 'database_display_filters': (`''`, ...) |
| comment | glue_catalog | True | True | 'comment': `''` |
| aws_role | kinesis | True | True | 'aws_role': `''` |
| external_id | kinesis | True | True | 'external_id': `''` |
| aws_access_key_id | kinesis | True | True | 'aws_access_key_id': `''` |
| aws_secret_access_key | kinesis | True | True | 'aws_secret_access_key': `''` |
| region | kinesis | False | False | 'region': `''` |
| read_only | kinesis | False | True | 'read_only': True/False |
| max_writers | kinesis | True | True | 'max_writers': `` |
| stream_display_filter | kinesis | True | True | 'stream_display_filter': `''` |
| stream_display_filters | kinesis | True | True | 'stream_display_filters': (`''`, ...) |
| comment | kinesis | True | True | 'comment': `''` |
| connection_string | snowflake | True | False | 'connection_string': `''` |
| user_name | snowflake | True | False | 'user_name': `''` |
| password | snowflake | True | False | 'password': `''` |
| max_concurrent_connections | snowflake | True | True | 'max_concurrent_connections': `` |
| comment | snowflake | True | True | 'comment': `''` |
| connection_string | redshift | True | False | 'connection_string': `''` |
| user_name | redshift | True | False | 'user_name': `''` |
| password | redshift | True | False | 'password': `''` |
| max_concurrent_connections | redshift | True | True | 'max_concurrent_connections': `` |
| comment | redshift | True | True | 'comment': `''` |
| connection_string | mysql | True | False | 'connection_string': `''` |
| user_name | mysql | True | False | 'user_name': `''` |
| password | mysql | True | False | 'password': `''` |
| comment | mysql | True | True | 'comment': `''` |
| connection_string | postgres | True | False | 'connection_string': `''` |
| user_name | postgres | True | False | 'user_name': `''` |
| password | postgres | True | False | 'password': `''` |
| comment | postgres | True | True | 'comment': `''` |
| connection_string | elasticsearch | True | False | 'connection_string': `''` |
| user_name | elasticsearch | True | False | 'user_name': `''` |
| password | elasticsearch | True | False | 'password': `''` |
| comment | elasticsearch | True | True | 'comment': `''` |
| connection_string | mongodb | True | False | 'connection_string': `''` |
| user_name | mongodb | True | False | 'user_name': `''` |
| password | mongodb | True | False | 'password': `''` |
| timeout | mongodb | True | True | 'timeout': "INTERVAL 'N' SECONDS" |
| comment | mongodb | True | True | 'comment': `''` |
| connection_string | mssql | True | False | 'connection_string': `''` |
| user_name | mssql | True | False | 'user_name': `''` |
| password | mssql | True | False | 'password': `''` |
| comment | mssql | True | True | 'comment': `''` |

#### Target options

| Option | Storage | Editable | Optional | Config Syntax |
| --- | --- | --- | --- | --- |
| globally_unique_keys | datalake | False | True | 'globally_unique_keys': True/False |
| storage_connection | datalake | False | True | 'storage_connection': `''` |
| storage_location | datalake | False | True | 'storage_location': `''` |
| compute_cluster | datalake | True | True | 'compute_cluster': `''` |
| compression | datalake | True | True | 'compression': 'SNAPPY/GZIP' |
| compaction_processes | datalake | True | True | 'compaction_processes': `` |
| disable_compaction | datalake | True | True | 'disable_compaction': True/False |
| retention_date_partition | datalake | False | True | 'retention_date_partition': `''` |
| table_data_retention | datalake | True | True | 'table_data_retention': `''` |
| column_data_retention | datalake | True | True | 'column_data_retention': ({'COLUMN' : `''`,'DURATION': `''`}) |
| comment | datalake | True | True | 'comment': `''` |
| storage_connection | materialized_view | False | True | 'storage_connection': `''` |
| storage_location | materialized_view | False | True | 'storage_location': `''` |
| max_time_travel_duration | materialized_view | True | True | 'max_time_travel_duration': `''` |
| compute_cluster | materialized_view | True | True | 'compute_cluster': `''` |
| column_transformations | snowflake | False | True | 'column_transformations': {`''` : `''` , ...} |
| deduplicate_with | snowflake | False | True | 'deduplicate_with': {'COLUMNS' : ['col1', 'col2'],'WINDOW': 'N HOURS'} |
| exclude_columns | snowflake | False | True | 'exclude_columns': (`''`, ...) |
| create_table_if_missing | snowflake | False | True | 'create_table_if_missing': True/False |
| run_interval | snowflake | False | True | 'run_interval': `''` |

#### Transformation options

| Option | Storage | Editable | Optional | Config Syntax |
| --- | --- | --- | --- | --- |
| run_interval | s3 | False | True | 'run_interval': `''` |
| start_from | s3 | False | True | 'start_from': `'/NOW/BEGINNING'` |
| end_at | s3 | True | True | 'end_at': `'/NOW'` |
| compute_cluster | s3 | True | True | 'compute_cluster': `''` |
| comment | s3 | True | True | 'comment': `''` |
| skip_validations | s3 | False | True | 'skip_validations': ('ALLOW_CARTESIAN_PRODUCT', ...) |
| skip_all_validations | s3 | False | True | 'skip_all_validations': True/False |
| aggregation_parallelism | s3 | True | True | 'aggregation_parallelism': `` |
| run_parallelism | s3 | True | True | 'run_parallelism': `` |
| file_format | s3 | False | False | 'file_format': '(type = ``)' |
| compression | s3 | False | True | 'compression': 'SNAPPY/GZIP ...' |
| date_pattern | s3 | False | True | 'date_pattern': `''` |
| output_offset | s3 | False | True | 'output_offset': `''` |
| run_interval | elasticsearch | False | True | 'run_interval': `''` |
| routing_field_name | elasticsearch | True | True | 'routing_field_name': `''` |
| start_from | elasticsearch | False | True | 'start_from': `'/NOW/BEGINNING'` |
| end_at | elasticsearch | True | True | 'end_at': `'/NOW'` |
| compute_cluster | elasticsearch | True | True | 'compute_cluster': `''` |
| skip_validations | elasticsearch | False | True | 'skip_validations': ('ALLOW_CARTESIAN_PRODUCT', ...) |
| skip_all_validations | elasticsearch | False | True | 'skip_all_validations': True/False |
| aggregation_parallelism | elasticsearch | True | True | 'aggregation_parallelism': `` |
| run_parallelism | elasticsearch | True | True | 'run_parallelism': `` |
| bulk_max_size_bytes | elasticsearch | True | True | 'bulk_max_size_bytes': `` |
| index_partition_size | elasticsearch | True | True | 'index_partition_size': 'HOURLY/DAILY ...' |
| comment | elasticsearch | True | True | 'comment': `''` |
| custom_insert_expressions | snowflake | True | True | 'custom_insert_expressions': {'INSERT_TIME' : 'CURRENT_TIMESTAMP()','MY_VALUE': `''`} |
| custom_update_expressions | snowflake | True | True | 'custom_update_expressions': {'UPDATE_TIME' : 'CURRENT_TIMESTAMP()','MY_VALUE': `''`} |
| keep_existing_values_when_null | snowflake | True | True | 'keep_existing_values_when_null': True/False |
| add_missing_columns | snowflake | False | True | 'add_missing_columns': True/False |
| run_interval | snowflake | False | True | 'run_interval': `''` |
| commit_interval | snowflake | True | True | 'commit_interval': `''` |
| start_from | snowflake | False | True | 'start_from': `'/NOW/BEGINNING'` |
| end_at | snowflake | True | True | 'end_at': `'/NOW'` |
| compute_cluster | snowflake | True | True | 'compute_cluster': `''` |
| skip_validations | snowflake | False | True | 'skip_validations': ('ALLOW_CARTESIAN_PRODUCT', ...) |
| skip_all_validations | snowflake | False | True | 'skip_all_validations': True/False |
| aggregation_parallelism | snowflake | True | True | 'aggregation_parallelism': `` |
| run_parallelism | snowflake | True | True | 'run_parallelism': `` |
| comment | snowflake | True | True | 'comment': `''` |
| add_missing_columns | datalake | False | True | 'add_missing_columns': True/False |
| run_interval | datalake | False | True | 'run_interval': `''` |
| start_from | datalake | False | True | 'start_from': `'/NOW/BEGINNING'` |
| end_at | datalake | True | True | 'end_at': `'/NOW'` |
| compute_cluster | datalake | True | True | 'compute_cluster': `''` |
| skip_validations | datalake | False | True | 'skip_validations': ('ALLOW_CARTESIAN_PRODUCT', ...) |
| skip_all_validations | datalake | False | True | 'skip_all_validations': True/False |
| aggregation_parallelism | datalake | True | True | 'aggregation_parallelism': `` |
| run_parallelism | datalake | True | True | 'run_parallelism': `` |
| comment | datalake | True | True | 'comment': `''` |
| run_interval | redshift | False | True | 'run_interval': `''` |
| start_from | redshift | False | True | 'start_from': `'/NOW/BEGINNING'` |
| end_at | redshift | True | True | 'end_at': `'/NOW'` |
| compute_cluster | redshift | True | True | 'compute_cluster': `''` |
| skip_validations | redshift | False | True | 'skip_validations': ('ALLOW_CARTESIAN_PRODUCT', ...) |
| skip_all_validations | redshift | False | True | 'skip_all_validations': True/False |
| aggregation_parallelism | redshift | True | True | 'aggregation_parallelism': `` |
| run_parallelism | redshift | True | True | 'run_parallelism': `` |
| skip_failed_files | redshift | False | True | 'skip_failed_files': True/False |
| fail_on_write_error | redshift | False | True | 'fail_on_write_error': True/False |
| comment | redshift | True | True | 'comment': `''` |
| run_interval | postgres | False | True | 'run_interval': `''` |
| start_from | postgres | False | True | 'start_from': `'/NOW/BEGINNING'` |
| end_at | postgres | True | True | 'end_at': `'/NOW'` |
| compute_cluster | postgres | True | True | 'compute_cluster': `''` |
| skip_validations | postgres | False | True | 'skip_validations': ('ALLOW_CARTESIAN_PRODUCT', ...) |
| skip_all_validations | postgres | False | True | 'skip_all_validations': True/False |
| aggregation_parallelism | postgres | True | True | 'aggregation_parallelism': `` |
| run_parallelism | postgres | True | True | 'run_parallelism': `` |
| comment | postgres | True | True | 'comment': `''` |

#### Copy options

| Option | Storage | Category | Editable | Optional | Config Syntax |
| --- | --- | --- | --- | --- | --- |
| topic | kafka | source_options | False | False | 'topic': `''` |
| exclude_columns | kafka | job_options | False | True | 'exclude_columns': (`''`, ...) |
| deduplicate_with | kafka | job_options | False | True | 'deduplicate_with': {'COLUMNS' : ['col1', 'col2'],'WINDOW': 'N HOURS'} |
| consumer_properties | kafka | job_options | True | True | 'consumer_properties': `''` |
| reader_shards | kafka | job_options | True | True | 'reader_shards': `` |
| store_raw_data | kafka | job_options | False | True | 'store_raw_data': True/False |
| start_from | kafka | job_options | False | True | 'start_from': 'BEGINNING/NOW' |
| end_at | kafka | job_options | True | True | 'end_at': `'/NOW'` |
| compute_cluster | kafka | job_options | True | True | 'compute_cluster': `''` |
| run_parallelism | kafka | job_options | True | True | 'run_parallelism': `` |
| content_type | kafka | job_options | True | True | 'content_type': 'AUTO/CSV/...' |
| compression | kafka | job_options | False | True | 'compression': 'AUTO/GZIP/...' |
| column_transformations | kafka | job_options | False | True | 'column_transformations': {`''` : `''` , ...} |
| commit_interval | kafka | job_options | True | True | 'commit_interval': `''` |
| skip_validations | kafka | job_options | False | True | 'skip_validations': ('MISSING_TOPIC') |
| skip_all_validations | kafka | job_options | False | True | 'skip_all_validations': True/False |
| comment | kafka | job_options | True | True | 'comment': `''` |
| table_include_list | mysql | source_options | True | True | 'table_include_list': (`''`, ...) |
| column_exclude_list | mysql | source_options | True | True | 'column_exclude_list': (`''`, ...) |
| exclude_columns | mysql | job_options | False | True | 'exclude_columns': (`''`, ...) |
| column_transformations | mysql | job_options | False | True | 'column_transformations': {`''` : `''` , ...} |
| skip_snapshots | mysql | job_options | True | True | 'skip_snapshots': True/False |
| end_at | mysql | job_options | True | True | 'end_at': `'/NOW'` |
| compute_cluster | mysql | job_options | True | True | 'compute_cluster': `''` |
| snapshot_parallelism | mysql | job_options | True | True | 'snapshot_parallelism': `` |
| ddl_filters | mysql | job_options | False | True | 'ddl_filters': (`''`, ...) |
| comment | mysql | job_options | True | True | 'comment': `''` |
| table_include_list | postgres | source_options | False | False | 'table_include_list': (`''`, ...) |
| column_exclude_list | postgres | source_options | False | True | 'column_exclude_list': (`''`, ...) |
| heartbeat_table | postgres | job_options | False | True | 'heartbeat_table': `''` |
| skip_snapshots | postgres | job_options | False | True | 'skip_snapshots': True/False |
| publication_name | postgres | job_options | False | False | 'publication_name': `''` |
| end_at | postgres | job_options | True | True | 'end_at': `'/NOW'` |
| compute_cluster | postgres | job_options | True | True | 'compute_cluster': `''` |
| comment | postgres | job_options | True | True | 'comment': `''` |
| parse_json_columns | postgres | job_options | False | False | 'parse_json_columns': True/False |
| column_transformations | postgres | job_options | False | True | 'column_transformations': {`''` : `''` , ...} |
| snapshot_parallelism | postgres | job_options | True | True | 'snapshot_parallelism': `` |
| exclude_columns | postgres | job_options | False | True | 'exclude_columns': (`''`, ...) |
| location | s3 | source_options | False | False | 'location': `''` |
| date_pattern | s3 | job_options | False | True | 'date_pattern': `''` |
| file_pattern | s3 | job_options | False | True | 'file_pattern': `''` |
| initial_load_pattern | s3 | job_options | False | True | 'initial_load_pattern': `''` |
| initial_load_prefix | s3 | job_options | False | True | 'initial_load_prefix': `''` |
| delete_files_after_load | s3 | job_options | False | True | 'delete_files_after_load': True/False |
| deduplicate_with | s3 | job_options | False | True | 'deduplicate_with': {'COLUMNS' : ['col1', 'col2'],'WINDOW': 'N HOURS'} |
| end_at | s3 | job_options | True | True | 'end_at': `'/NOW'` |
| start_from | s3 | job_options | False | True | 'start_from': `'/NOW/BEGINNING'` |
| compute_cluster | s3 | job_options | True | True | 'compute_cluster': `''` |
| run_parallelism | s3 | job_options | True | True | 'run_parallelism': `` |
| content_type | s3 | job_options | True | True | 'content_type': 'AUTO/CSV...' |
| compression | s3 | job_options | False | True | 'compression': 'AUTO/GZIP...' |
| comment | s3 | job_options | True | True | 'comment': `''` |
| column_transformations | s3 | job_options | False | True | 'column_transformations': {`''` : `''` , ...} |
| commit_interval | s3 | job_options | True | True | 'commit_interval': `''` |
| skip_validations | s3 | job_options | False | True | 'skip_validations': ('EMPTY_PATH') |
| skip_all_validations | s3 | job_options | False | True | 'skip_all_validations': True/False |
| exclude_columns | s3 | job_options | False | True | 'exclude_columns': (`''`, ...) |
| stream | kinesis | source_options | False | False | 'stream': `''` |
| reader_shards | kinesis | job_options | True | True | 'reader_shards': `` |
| store_raw_data | kinesis | job_options | False | True | 'store_raw_data': True/False |
| start_from | kinesis | job_options | False | True | 'start_from': `'/NOW/BEGINNING'` |
| end_at | kinesis | job_options | False | True | 'end_at': `'/NOW'` |
| compute_cluster | kinesis | job_options | True | True | 'compute_cluster': `''` |
| run_parallelism | kinesis | job_options | False | True | 'run_parallelism': `` |
| content_type | kinesis | job_options | True | True | 'content_type': 'AUTO/CSV...' |
| compression | kinesis | job_options | False | True | 'compression': 'AUTO/GZIP...' |
| comment | kinesis | job_options | True | True | 'comment': `''` |
| column_transformations | kinesis | job_options | True | True | 'column_transformations': {`''` : `''` , ...} |
| deduplicate_with | kinesis | job_options | False | True | 'deduplicate_with': {'COLUMNS' : ['col1', 'col2'],'WINDOW': 'N HOURS'} |
| commit_interval | kinesis | job_options | True | True | 'commit_interval': `''` |
| skip_validations | kinesis | job_options | False | True | 'skip_validations': ('MISSING_STREAM') |
| skip_all_validations | kinesis | job_options | False | True | 'skip_all_validations': True/False |
| exclude_columns | kinesis | job_options | False | True | 'exclude_columns': (`''`, ...) |
| table_include_list | mssql | source_options | True | True | 'table_include_list': (`''`, ...) |
| column_exclude_list | mssql | source_options | True | True | 'column_exclude_list': (`''`, ...) |
| exclude_columns | mssql | job_options | False | True | 'exclude_columns': (`''`, ...) |
| column_transformations | mssql | job_options | False | True | 'column_transformations': {`''` : `''` , ...} |
| skip_snapshots | mssql | job_options | True | True | 'skip_snapshots': True/False |
| end_at | mssql | job_options | True | True | 'end_at': `'/NOW'` |
| compute_cluster | mssql | job_options | True | True | 'compute_cluster': `''` |
| snapshot_parallelism | mssql | job_options | True | True | 'snapshot_parallelism': `` |
| parse_json_columns | mssql | job_options | False | False | 'parse_json_columns': True/False |
| comment | mssql | job_options | True | True | 'comment': `''` |
| collection_include_list | mongodb | source_options | True | True | 'collection_include_list': (`''`, ...) |
| exclude_columns | mongodb | job_options | False | True | 'exclude_columns': (`''`, ...) |
| column_transformations | mongodb | job_options | False | True | 'column_transformations': {`''` : `''` , ...} |
| skip_snapshots | mongodb | job_options | True | True | 'skip_snapshots': True/False |
| end_at | mongodb | job_options | True | True | 'end_at': `'/NOW'` |
| compute_cluster | mongodb | job_options | True | True | 'compute_cluster': `''` |
| snapshot_parallelism | mongodb | job_options | True | True | 'snapshot_parallelism': `` |
| comment | mongodb | job_options | True | True | 'comment': `''` |

---

### Using the + prefix
The `+` prefix is a dbt syntax feature that helps disambiguate between [resource paths](https://docs.getdbt.com/reference/resource-configs/resource-path.md) and configs in [`dbt_project.yml` files](https://docs.getdbt.com/reference/dbt_project.yml.md).

* It is not compatible with `dbt_project.yml` files that use [`config-version`](https://docs.getdbt.com/reference/project-configs/config-version.md) 1.
* It doesn't apply to:
  * the `config()` Jinja macro within a resource file
  * the `config` property in a `.yml` file

For example:

dbt_project.yml

```yml
name: jaffle_shop
config-version: 2
...

models:
  +materialized: view
  jaffle_shop:
    marts:
      +materialized: table
```

Throughout this documentation, we've tried to be consistent in using the `+` prefix in `dbt_project.yml` files. However, the leading `+` is in fact *only required* when you need to disambiguate between resource paths and configs. For example, when:

* A config accepts a dictionary as its input, such as the [`persist_docs` config](https://docs.getdbt.com/reference/resource-configs/persist_docs.md).
* A config shares a key with part of a resource path, for example, if you had a directory of models named `tags`.

dbt has deprecated specifying configurations without [the `+` prefix](https://docs.getdbt.com/reference/dbt_project.yml#the--prefix) in `dbt_project.yml`. Only folder and file names can be specified without the `+` prefix within resource configurations in `dbt_project.yml`.

dbt_project.yml

```yml
name: jaffle_shop
config-version: 2
...

models:
  +persist_docs: # this config is a dictionary, so needs a + prefix
    relation: true
    columns: true
  jaffle_shop:
    schema: my_schema # a plus prefix is optional here
    +tags: # this is the tag config
      - "hello"
    config: # changed to config in v1.10
      tags: # whereas this is the tag resource path
        # The below config applies to models in the
        # models/tags/ directory.
        # Note: you don't _need_ a leading + here,
        # but it wouldn't hurt.
        materialized: view
```

**Note:** The use of the `+` prefix in `dbt_project.yml` is distinct from the use of `+` to control config merge behavior (clobber vs. add) in other config settings (specific resource `.yml` and `.sql` files). Currently, the only config which supports `+` for controlling config merge behavior is [`grants`](https://docs.getdbt.com/reference/resource-configs/grants.md#grant-config-inheritance).

---

### version

Model versions, dbt_project.yml versions, and .yml versions

The word "version" appears in multiple places on the docs site, with different meanings:

* [Model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md) — A dbt Mesh feature that enables better governance and data model management by allowing you to track changes and updates to models over time.
* [dbt_project.yml version](https://docs.getdbt.com/reference/project-configs/version.md#dbt_projectyml-versions) (optional) — The `dbt_project.yml` version is unrelated to Mesh and refers to the compatibility of the dbt project with a specific version of dbt.
* [.yml property file version](https://docs.getdbt.com/reference/project-configs/version.md#yml-property-file-versions) (optional) — Version numbers within .yml property files inform how dbt parses those YAML files. Unrelated to Mesh.

dbt projects have two distinct types of `version` tags. This field has a different meaning depending on its location.

#### `dbt_project.yml` versions

The version tag in a `dbt_project.yml` file represents the version of your dbt project.
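For instance, a project cutting its first stable release might declare the following (the project name and version number are illustrative, not prescribed by dbt):

```yml
name: jaffle_shop
version: 1.0.0 # semantic version: major.minor.patch
```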
Starting in dbt version 1.5, `version` in the `dbt_project.yml` is an *optional parameter*. If used, the version must be in a [semantic version](https://semver.org/) format, such as `1.0.0`. The default value is `None` if not specified. For users on dbt version 1.4 or lower, this tag is required, though it isn't currently used meaningfully by dbt. For more on Core versions, see [About dbt Core versions](https://docs.getdbt.com/docs/dbt-versions/core.md).

dbt_project.yml

```yml
version: version
```

#### `.yml` property file versions

A version tag in a `.yml` property file provides the control tag, which informs how dbt processes property files. Starting from version 1.5, dbt no longer requires this configuration in your resource `.yml` files. If you want to know more about why this tag was previously required, refer to the [FAQs](#faqs). For users on dbt version 1.4 or lower, this tag is required. For more on property files, see their general [documentation](https://docs.getdbt.com/reference/define-properties.md).

* Resource property file with version specified
* Resource property file without version specified

.yml

```yml
version: 2 # Only 2 is accepted by dbt versions up to 1.4.latest.

models:
  ...
```

.yml

```yml
models:
  ...
```

#### FAQs

Why do model and source YAML files always start with `version: 2`?

Once upon a time, the structure of these `.yml` files was very different (s/o to anyone who was using dbt back then!). Adding `version: 2` allowed us to make this structure more extensible.

From dbt Core v1.5, the top-level `version:` key is optional in all resource YAML files. If present, only `version: 2` is supported. Also starting in v1.5, both [`config-version: 2`](https://docs.getdbt.com/reference/project-configs/config-version.md) and the top-level `version:` key in the `dbt_project.yml` are optional.
Resource YAML files do not currently require this config. We only support `version: 2` if it's specified. Although we do not expect to update YAML files to `version: 3` soon, having this config will make it easier for us to introduce new structures in the future.

---

### versions

Model versions, dbt_project.yml versions, and .yml versions

The word "version" appears in multiple places on the docs site, with different meanings:

* [Model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md) — A dbt Mesh feature that enables better governance and data model management by allowing you to track changes and updates to models over time.
* [dbt_project.yml version](https://docs.getdbt.com/reference/project-configs/version.md#dbt_projectyml-versions) (optional) — The `dbt_project.yml` version is unrelated to Mesh and refers to the compatibility of the dbt project with a specific version of dbt.
* [.yml property file version](https://docs.getdbt.com/reference/project-configs/version.md#yml-property-file-versions) (optional) — Version numbers within .yml property files inform how dbt parses those YAML files. Unrelated to Mesh.

models/.yml

```yml
models:
  - name: model_name
    versions:
      - v: # required
        defined_in: # optional -- default is _v
        columns:
          # specify all columns, or include/exclude columns from the top-level model YAML definition
          - include:
            exclude:
          # specify additional columns
          - name: # required
      - v: ...
    # optional
    latest_version:
```

The standard convention for naming model versions is `_v`.
This holds both for the file where dbt expects to find the model's definition (SQL or Python) and for the alias it will use by default when materializing the model in the database.

##### `v`

The version identifier for a version of a model. This value can be numeric (integer or float) or any string. The version identifier is used to order versions of a model relative to one another. If a versioned model does *not* explicitly configure a [`latest_version`](https://docs.getdbt.com/reference/resource-properties/latest_version.md), the highest version number is used as the latest version when resolving `ref` calls to the model without a `version` argument.

In general, we recommend a simple "major versioning" scheme for your models: `1`, `2`, `3`, and so on, where each version reflects a breaking change from previous versions. You can use other versioning schemes; dbt will sort your version identifiers alphabetically if the values are not all numeric. You should **not** include the letter `v` in the version identifier, as dbt will do that for you.

To run a model with multiple versions, use the [`--select` flag](https://docs.getdbt.com/reference/node-selection/syntax.md). Refer to [Model versions](https://docs.getdbt.com/docs/mesh/govern/model-versions.md#run-a-model-with-multiple-versions) for more information and syntax.

##### `defined_in`

The name of the model file (excluding the file extension, e.g. `.sql` or `.py`) where the model version is defined. If `defined_in` is not specified, dbt searches for the definition of a versioned model in a model file named `_v`. The **latest** version of a model may also be defined in a file named ``, without the version suffix. Model file names must be globally unique, even when defining versioned implementations of a model with a different name.
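As an illustration of the `defined_in` behavior described above, a minimal sketch (the model name `customers` is hypothetical) of keeping the latest version in an unsuffixed file:

```yml
models:
  - name: customers
    latest_version: 2
    versions:
      - v: 2
        defined_in: customers # latest version lives in models/customers.sql
      - v: 1                  # by default, dbt looks for models/customers_v1.sql
```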
##### `alias`

The default resolved `alias` for a versioned model is `_v`. The logic for this is encoded in the `generate_alias_name` macro. This default can be overridden in two ways:

* Configuring a custom `alias` within the version YAML or the versioned model's definition
* Overwriting dbt's `generate_alias_name` macro to use different behavior based on `node.version`

See ["Custom aliases"](https://docs.getdbt.com/docs/build/custom-aliases.md) for more details. Note that the value of `defined_in` and the `alias` configuration of a model are not coordinated, except by convention. The two are declared and determined independently.

##### `include`

The specification of which columns from a model's top-level `columns` property to include or exclude in a versioned implementation of that model.

* `include` is either:
  * a list of specific column names to include, or
  * `'*'` or `'all'`, indicating that **all** columns from the top-level `columns` property should be included in the versioned model
* `exclude` is a list of column names to exclude. It can only be declared if `include` is set to `'*'` or `'all'`.

tip: Not to be confused with the `--select`/`--exclude` [syntax](https://docs.getdbt.com/reference/node-selection/exclude.md), which is used for model selection.

The `columns` list of a versioned model can have *at most one* `include`/`exclude` element. However, if none of your model versions specify columns, you don't need to define columns at all and can omit the `columns`/`include`/`exclude` keys from the versioned model. In this case, dbt will automatically use all top-level columns for all versions.

You may declare additional columns within the version's `columns` list. If a version-specific column's `name` matches a column included from the top level, the version-specific entry overrides that column for that version.
models/.yml

```yml
models:
  # top-level model properties
  - name:
    columns:
      - name: # required

    # versions of this model
    versions:
      - v: # required
        columns:
          - include: '*' | 'all' | [, ...]
            exclude:
              -
              - ... # declare additional column names to exclude
          # declare more columns -- can be overrides from top-level, or in addition
          - name:
            ...
```

By default, `include` is "all" and `exclude` is the empty list. This has the effect of including all columns from the base model in the versioned model.

###### Example

models/customers.yml

```yml
models:
  - name: customers
    columns:
      - name: customer_id
        description: Unique identifier for this table
        data_type: text
        constraints:
          - type: not_null
        data_tests:
          - unique
      - name: customer_country
        data_type: text
        description: "Country where the customer currently lives"
      - name: first_purchase_date
        data_type: date
    versions:
      - v: 4
      - v: 3
        columns:
          - include: "*"
          - name: customer_country
            data_type: text
            description: "Country where the customer first lived at time of first purchase"
      - v: 2
        columns:
          - include: "*"
            exclude:
              - customer_country
      - v: 1
        columns:
          - include: []
          - name: id
            data_type: int
```

Because `v4` has not specified any `columns`, it will include all of the top-level `columns`. Each other version declares a modification from the top-level property:

* `v3` includes all columns, but it reimplements the `customer_country` column with a different `description`.
* `v2` includes all columns *except* `customer_country`.
* `v1` doesn't include *any* of the top-level `columns`. Instead, it declares only a single integer column named `id`.

##### Our recommendations

* Follow a consistent naming convention for model versions and aliases.
* Use `defined_in` and `alias` only if you have good reason.
* Create a view that always points to the latest version of your model.
You can automate this for all versioned models in your project with an `on-run-end` hook. For more details, read the full docs on ["Model versions"](https://docs.getdbt.com/docs/mesh/govern/model-versions.md#configuring-database-location-with-alias).

##### Detecting breaking changes

When you use the `state:modified` selection method in Slim CI, dbt will detect changes to versioned model contracts and raise an error if any of those changes could be breaking for downstream consumers. Breaking changes include:

* Removing an existing column
* Changing the `data_type` of an existing column
* Removing or modifying one of the `constraints` on an existing column (dbt v1.6 or higher)
* Changing unversioned, contracted models.
  * dbt also warns if a model has or had a contract but isn't versioned.

- Example message for unversioned models
- Example message for versioned models

```text
Breaking Change to Unversioned Contract for contracted_model (models/contracted_models/contracted_model.sql)
While comparing to previous project state, dbt detected a breaking change to an unversioned model.
  - Contract enforcement was removed: Previously, this model's configuration included contract: {enforced: true}. It is no longer configured to enforce its contract, and this is a breaking change.
  - Columns were removed:
    - color
    - date_day
  - Enforced column level constraints were removed:
    - id (ConstraintType.not_null)
    - id (ConstraintType.primary_key)
  - Enforced model level constraints were removed:
    - ConstraintType.check -> ['id']
  - Materialization changed with enforced constraints:
    - table -> view
```

```text
Breaking Change to Contract Error in model sometable (models/sometable.sql)
While comparing to previous project state, dbt detected a breaking change to an enforced contract.

The contract's enforcement has been disabled.

Columns were removed:
 - order_name

Columns with data_type changes:
 - order_id (number -> int)

Consider making an additive (non-breaking) change instead, if possible.
Otherwise, create a new model version: https://docs.getdbt.com/docs/mesh/govern/model-versions
```

Additive changes are **not** considered breaking:

* Adding a new column to a contracted model
* Adding new `constraints` to an existing column in a contracted model

---

### Vertica configurations

#### Configuration of Incremental Models

##### Using the on_schema_change config parameter

You can use the `on_schema_change` parameter with the values `ignore`, `fail`, and `append_new_columns`. The value `sync_all_columns` is not supported at this time.
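These values can also be set for a whole folder of models in `dbt_project.yml`. A minimal sketch, assuming a hypothetical project named `my_project` with incremental models under `staging/`:

```yml
models:
  my_project:
    staging:
      +materialized: incremental
      +on_schema_change: append_new_columns # ignore (default) | fail | append_new_columns
```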
###### Configuring the `ignore` (default) parameter

* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(materialized = 'incremental', on_schema_change='ignore') }}

select * from {{ ref('seed_added') }}
```

vertica_incremental.sql

```sql
insert into "VMart"."public"."merge" ("id", "name", "some_date")
(
  select "id", "name", "some_date"
  from "merge__dbt_tmp"
)
```

###### Configuring the `fail` parameter

* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(materialized = 'incremental', on_schema_change='fail') }}

select * from {{ ref('seed_added') }}
```

vertica_incremental.sql

```text
The source and target schemas on this incremental model are out of sync!
They can be reconciled in several ways:
  - set the `on_schema_change` config to either append_new_columns or sync_all_columns, depending on your situation.
  - Re-run the incremental model with `full_refresh: True` to update the target schema.
  - update the schema manually and re-run the process.

Additional troubleshooting context:
  Source columns not in target: {{ schema_changes_dict['source_not_in_target'] }}
  Target columns not in source: {{ schema_changes_dict['target_not_in_source'] }}
  New column types: {{ schema_changes_dict['new_target_types'] }}
```

###### Configuring the `append_new_columns` parameter

* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(
    materialized='incremental',
    on_schema_change='append_new_columns') }}

select * from public.seed_added
```

vertica_incremental.sql

```sql
insert into "VMart"."public"."over" ("id", "name", "some_date", "w", "w1", "t1", "t2", "t3")
(
  select "id", "name", "some_date", "w", "w1", "t1", "t2", "t3"
  from "over__dbt_tmp"
)
```

##### Using the `incremental_strategy` config parameter

**The `append` strategy (default)**: Insert new records without updating or overwriting any existing data. `append` only adds the new records based on the condition specified in the `is_incremental()` conditional block.
* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

select * from public.product_dimension
{% if is_incremental() %}
  where product_key > (select max(product_key) from {{ this }})
{% endif %}
```

vertica_incremental.sql

```sql
insert into "VMart"."public"."samp"
  ("product_key", "product_version", "product_description", "sku_number", "category_description",
   "department_description", "package_type_description", "package_size", "fat_content", "diet_type",
   "weight", "weight_units_of_measure", "shelf_width", "shelf_height", "shelf_depth", "product_price",
   "product_cost", "lowest_competitor_price", "highest_competitor_price", "average_competitor_price",
   "discontinued_flag")
(
  select "product_key", "product_version", "product_description", "sku_number", "category_description",
         "department_description", "package_type_description", "package_size", "fat_content", "diet_type",
         "weight", "weight_units_of_measure", "shelf_width", "shelf_height", "shelf_depth", "product_price",
         "product_cost", "lowest_competitor_price", "highest_competitor_price", "average_competitor_price",
         "discontinued_flag"
  from "samp__dbt_tmp"
)
```

**The `merge` strategy**: Match records based on a `unique_key`; update old records, insert new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.) The `unique_key` config parameter is required for the merge strategy; it accepts a single table column.
* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'merge',
    unique_key='promotion_key'
) }}

select * from public.promotion_dimension
```

vertica_incremental.sql

```sql
merge into "VMart"."public"."samp" as DBT_INTERNAL_DEST
using "samp__dbt_tmp" as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_DEST."promotion_key" = DBT_INTERNAL_SOURCE."promotion_key"
when matched then update set
  "promotion_key" = DBT_INTERNAL_SOURCE."promotion_key",
  "price_reduction_type" = DBT_INTERNAL_SOURCE."price_reduction_type",
  "promotion_media_type" = DBT_INTERNAL_SOURCE."promotion_media_type",
  "display_type" = DBT_INTERNAL_SOURCE."display_type",
  "coupon_type" = DBT_INTERNAL_SOURCE."coupon_type",
  "ad_media_name" = DBT_INTERNAL_SOURCE."ad_media_name",
  "display_provider" = DBT_INTERNAL_SOURCE."display_provider",
  "promotion_cost" = DBT_INTERNAL_SOURCE."promotion_cost",
  "promotion_begin_date" = DBT_INTERNAL_SOURCE."promotion_begin_date",
  "promotion_end_date" = DBT_INTERNAL_SOURCE."promotion_end_date"
when not matched then insert
  ("promotion_key", "price_reduction_type", "promotion_media_type", "display_type", "coupon_type",
   "ad_media_name", "display_provider", "promotion_cost", "promotion_begin_date", "promotion_end_date")
values
  (DBT_INTERNAL_SOURCE."promotion_key", DBT_INTERNAL_SOURCE."price_reduction_type",
   DBT_INTERNAL_SOURCE."promotion_media_type", DBT_INTERNAL_SOURCE."display_type",
   DBT_INTERNAL_SOURCE."coupon_type", DBT_INTERNAL_SOURCE."ad_media_name",
   DBT_INTERNAL_SOURCE."display_provider", DBT_INTERNAL_SOURCE."promotion_cost",
   DBT_INTERNAL_SOURCE."promotion_begin_date", DBT_INTERNAL_SOURCE."promotion_end_date")
```

###### Using the `merge_update_columns` config parameter

The `merge_update_columns` config parameter is passed to update only the specified columns; it accepts a list of table columns.
* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy='merge',
    unique_key = 'id',
    merge_update_columns = ["names", "salary"]
) }}

select * from {{ ref('seed_tc1') }}
```

vertica_incremental.sql

```sql
merge into "VMart"."public"."test_merge" as DBT_INTERNAL_DEST
using "test_merge__dbt_tmp" as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_DEST."id" = DBT_INTERNAL_SOURCE."id"
when matched then update set
  "names" = DBT_INTERNAL_SOURCE."names",
  "salary" = DBT_INTERNAL_SOURCE."salary"
when not matched then insert
  ("id", "names", "salary")
values
  (DBT_INTERNAL_SOURCE."id", DBT_INTERNAL_SOURCE."names", DBT_INTERNAL_SOURCE."salary")
```

**The `delete+insert` strategy**: Through the `delete+insert` incremental strategy, you can instruct dbt to use a two-step incremental approach: it first deletes the records detected through the configured `is_incremental()` block, then re-inserts them. The `unique_key` is a required parameter for the `delete+insert` strategy; it specifies how to update the records when there is duplicate data, and it accepts a single table column.
* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'delete+insert',
    unique_key='date_key'
) }}

select * from public.date_dimension
```

vertica_incremental.sql

```sql
delete from "VMart"."public"."samp"
where (date_key) in (select (date_key) from "samp__dbt_tmp");

insert into "VMart"."public"."samp"
  ("date_key", "date", "full_date_description", "day_of_week", "day_number_in_calendar_month",
   "day_number_in_calendar_year", "day_number_in_fiscal_month", "day_number_in_fiscal_year",
   "last_day_in_week_indicator", "last_day_in_month_indicator", "calendar_week_number_in_year",
   "calendar_month_name", "calendar_month_number_in_year", "calendar_year_month", "calendar_quarter",
   "calendar_year_quarter", "calendar_half_year", "calendar_year", "holiday_indicator",
   "weekday_indicator", "selling_season")
(
  select "date_key", "date", "full_date_description", "day_of_week", "day_number_in_calendar_month",
         "day_number_in_calendar_year", "day_number_in_fiscal_month", "day_number_in_fiscal_year",
         "last_day_in_week_indicator", "last_day_in_month_indicator", "calendar_week_number_in_year",
         "calendar_month_name", "calendar_month_number_in_year", "calendar_year_month", "calendar_quarter",
         "calendar_year_quarter", "calendar_half_year", "calendar_year", "holiday_indicator",
         "weekday_indicator", "selling_season"
  from "samp__dbt_tmp"
);
```

**The `insert_overwrite` strategy**: The `insert_overwrite` strategy does not use a full-table scan to delete records; instead of deleting records, it drops entire partitions. This strategy may accept `partition_by_string` and `partitions` parameters. You provide these parameters when you want to overwrite a part of the table. `partition_by_string` accepts an expression on which partitioning of the table takes place; this is the `PARTITION BY` clause in Vertica. `partitions` accepts a list of values in the partition column. The config parameter `partitions` must be used carefully.
Two situations to consider:

* Fewer partitions in the `partitions` parameter than in the where clause: the destination table ends up with duplicates.
* More partitions in the `partitions` parameter than in the where clause: the destination table ends up missing rows (fewer rows in the destination than in the source).

To understand more about the `PARTITION BY` clause, check [here](https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Statements/partition-clause.htm).

Note: The `partitions` parameter is optional. If it is not provided, the partitions in the where clause are dropped from the destination and inserted back from the source, so if you use a where clause, you might not need the `partitions` parameter. The where clause condition is also optional, but if it is not provided, all data in the source is inserted into the destination. If neither a where clause condition nor a `partitions` parameter is provided, all partitions are dropped from the table and inserted again. If the `partitions` parameter is provided but no where clause is, the destination table ends up with duplicates, because the partitions in the `partitions` parameter are dropped but all data in the source table (no where clause) is inserted into the destination.

The `partition_by_string` config parameter is also optional. If it is not provided, the strategy behaves like `delete+insert`: it deletes all records from the destination and then inserts all records from the source, without using or dropping partitions. If neither the `partition_by_string` nor the `partitions` parameter is provided, the `insert_overwrite` strategy truncates the target table and inserts the source table data into the target. If you want to use the `partitions` parameter, you must also partition the table by passing the `partition_by_string` parameter.
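Given the pitfalls above, one way to stay safe is to keep the `partitions` list and the where clause aligned. A hedged sketch, reusing the `online_sales.call_center_dimension` table from this page's examples, with an illustrative where clause added:

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    partition_by_string = 'YEAR(cc_open_date)',
    partitions = ['2023']
) }}

-- The where clause matches the partitions list exactly, so the rows
-- dropped (partition 2023) are the same rows that are re-inserted:
-- no duplicates, no missing rows.
select * from online_sales.call_center_dimension
where YEAR(cc_open_date) = 2023
```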
* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    partition_by_string='YEAR(cc_open_date)',
    partitions=['2023']
) }}

select * from online_sales.call_center_dimension
```

vertica_incremental.sql

```sql
select PARTITION_TABLE('online_sales.update_call_center_dimension');
SELECT DROP_PARTITIONS('online_sales.update_call_center_dimension', '2023', '2023');
SELECT PURGE_PARTITION('online_sales.update_call_center_dimension', '2023');

insert into "VMart"."online_sales"."update_call_center_dimension"
  ("call_center_key", "cc_closed_date", "cc_open_date", "cc_name", "cc_class", "cc_employees",
   "cc_hours", "cc_manager", "cc_address", "cc_city", "cc_state", "cc_region")
(
  select "call_center_key", "cc_closed_date", "cc_open_date", "cc_name", "cc_class", "cc_employees",
         "cc_hours", "cc_manager", "cc_address", "cc_city", "cc_state", "cc_region"
  from "update_call_center_dimension__dbt_tmp"
);
```

#### Optimization options for table materialization

There are multiple optimizations that can be used when materializing models as tables. Each config parameter applies a Vertica-specific clause in the generated `CREATE TABLE` DDL. For more information, see the [Vertica](https://www.vertica.com/docs/12.0.x/HTML/Content/Authoring/SQLReferenceManual/Statements/CREATETABLE.htm) options for table optimization. You can configure these optimizations in your model SQL file as described in the examples below.

##### Configuring the `ORDER BY` clause

To leverage the `ORDER BY` clause of the `CREATE TABLE` statement, use the `order_by` config param in your model.
###### Using the `order_by` config parameter

* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(
    materialized='table',
    order_by='product_key') }}

select * from public.product_dimension
```

vertica_incremental.sql

```sql
create table "VMart"."public"."order_s__dbt_tmp" as
  (select * from public.product_dimension)
order by product_key;
```

##### Configuring the `SEGMENTED BY` clause

To leverage the `SEGMENTED BY` clause of the `CREATE TABLE` statement, use the `segmented_by_string` or `segmented_by_all_nodes` config parameters in your model. By default, ALL NODES are used to segment tables, so the ALL NODES clause is added to the SQL statement when using the `segmented_by_string` config parameter. You can disable ALL NODES using the `no_segmentation` parameter. To learn more about the segmented-by clause, check [here](https://www.vertica.com/docs/12.0.x/HTML/Content/Authoring/SQLReferenceManual/Statements/hash-segmentation-clause.htm).

###### Using the `segmented_by_string` config parameter

The `segmented_by_string` config parameter can be used to segment projection data using a SQL expression, like hash segmentation.
* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(
    materialized='table',
    segmented_by_string='product_key'
) }}

select * from public.product_dimension
```

vertica_incremental.sql

```sql
create table "VMart"."public"."segmented_by__dbt_tmp" as
  (select * from public.product_dimension)
segmented by product_key ALL NODES;
```

###### Using the `segmented_by_all_nodes` config parameter

The `segmented_by_all_nodes` config parameter can be used to segment projection data for distribution across all cluster nodes. Note: If you want to pass the `segmented_by_all_nodes` parameter, you must also segment the table by passing the `segmented_by_string` parameter.

* Source code
* Run code

vertica_incremental.sql

```sql
{{ config(
    materialized='table',
    segmented_by_string='product_key',
    segmented_by_all_nodes='True'
) }}

select * from public.product_dimension
```

vertica_incremental.sql

```sql
create table "VMart"."public"."segmented_by__dbt_tmp" as
  (select * from public.product_dimension)
segmented by product_key ALL NODES;
```

##### Configuring the UNSEGMENTED ALL NODES clause

To leverage the `UNSEGMENTED ALL NODES` clause of the `CREATE TABLE` statement, use the `no_segmentation` config parameter in your model.
###### Using the `no_segmentation` config parameter[​](#using-the-no_segmentation-config-parameter "Direct link to using-the-no_segmentation-config-parameter") * Source code * Run code vertica\_incremental.sql ```sql {{ config( materialized='table', no_segmentation='true' ) }} select * from public.product_dimension ``` vertica\_incremental.sql ```sql create table "VMart"."public"."ww__dbt_tmp" INCLUDE SCHEMA PRIVILEGES as ( select * from public.product_dimension ) UNSEGMENTED ALL NODES ; ``` ##### Configuring the `PARTITION BY` clause[​](#configuring-the-partition-by-clause "Direct link to configuring-the-partition-by-clause") To leverage the `PARTITION BY` clause of the `CREATE TABLE` statement, use the `partition_by_string`, `partition_by_active_count`, or `partition_by_group_by_string` config parameters in your model. To learn more about the `PARTITION BY` clause, see the [Vertica documentation](https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Statements/partition-clause.htm). ###### Using the `partition_by_string` config parameter[​](#using-the-partition_by_string-config-parameter "Direct link to using-the-partition_by_string-config-parameter") `partition_by_string` (optional) accepts a string value naming the single `column_name` on which the table data is partitioned. * Source code * Run code vertica\_incremental.sql ```sql {{ config( materialized='table', partition_by_string='employee_age' )}} select * FROM public.employee_dimension ``` vertica\_incremental.sql ```sql create table "VMart"."public"."test_partition__dbt_tmp" as ( select * FROM public.employee_dimension); alter table "VMart"."public"."test_partition__dbt_tmp" partition BY employee_age ``` ###### Using the `partition_by_active_count` config parameter[​](#using-the-partition_by_active_count-config-parameter "Direct link to using-the-partition_by_active_count-config-parameter") `partition_by_active_count` (optional) specifies how many partitions are active for this table.
It accepts an integer value. Note: To use the `partition_by_active_count` parameter, you must also partition the table by passing the `partition_by_string` parameter. * Source code * Run code vertica\_incremental.sql ```sql {{ config( materialized='table', partition_by_string='employee_age', partition_by_group_by_string=""" CASE WHEN employee_age < 5 THEN 1 WHEN employee_age > 50 THEN 2 ELSE 3 END""", partition_by_active_count = 2) }} select * FROM public.employee_dimension ``` vertica\_incremental.sql ```sql create table "VMart"."public"."test_partition__dbt_tmp" as ( select * FROM public.employee_dimension ); alter table "VMart"."public"."test_partition__dbt_tmp" partition BY employee_age group by CASE WHEN employee_age < 5 THEN 1 WHEN employee_age > 50 THEN 2 ELSE 3 END SET ACTIVEPARTITIONCOUNT 2 ; ``` ###### Using the `partition_by_group_by_string` config parameter[​](#using-the-partition_by_group_by_string-config-parameter "Direct link to using-the-partition_by_group_by_string-config-parameter") `partition_by_group_by_string` (optional) accepts a string in which you specify all of the group cases as a single string. The groups are derived from the `partition_by_string` value, and the parameter is used to merge partitions into separate partition groups. Note: To use the `partition_by_group_by_string` parameter, you must also partition the table by passing the `partition_by_string` parameter.
* Source code * Run code vertica\_incremental.sql ```sql {{ config( materialized='table', partition_by_string='number_of_children', partition_by_group_by_string=""" CASE WHEN number_of_children <= 2 THEN 'small_family' ELSE 'big_family' END""") }} select * from public.customer_dimension ``` vertica\_incremental.sql ```sql create table "VMart"."public"."test_partition__dbt_tmp" INCLUDE SCHEMA PRIVILEGES as ( select * from public.customer_dimension ) ; alter table "VMart"."public"."test_partition__dbt_tmp" partition BY number_of_children group by CASE WHEN number_of_children <= 2 THEN 'small_family' ELSE 'big_family' END ; ``` ##### Configuring the KSAFE clause[​](#configuring-the-ksafe-clause "Direct link to Configuring the KSAFE clause") To leverage the `KSAFE` clause of the `CREATE TABLE` statement, use the `ksafe` config parameter in your model. * Source code * Run code vertica\_incremental.sql ```sql {{ config( materialized='table', ksafe='1' ) }} select * from public.product_dimension ``` vertica\_incremental.sql ```sql create table "VMart"."public"."segmented_by__dbt_tmp" as (select * from public.product_dimension ) ksafe 1; ``` --- ### volatility 💡Did you know... Available from dbt v1.11 or with the [dbt "Latest" release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md).
functions/\.yml ```yml functions: - name: config: volatility: deterministic | stable | non-deterministic ``` #### Definition[​](#definition "Direct link to Definition") You can optionally use the [`volatility` config](https://docs.getdbt.com/reference/resource-configs/volatility.md) for SQL or Python UDFs to describe how predictable the function output is by using `deterministic`, `stable`, or `non-deterministic`. Warehouses use this information to decide if results can be cached, reordered, or inlined. Setting the appropriate volatility helps prevent incorrect results when a function isn’t safe to cache or reorder. For example: * A function that returns a random number (`random()`) should be set as `non-deterministic` because its output changes every time it’s called. * A function that returns today’s date (`current_date()`) is `stable`; its value remains consistent within a single query execution but may change between queries. If it were configured as `deterministic`, a warehouse might incorrectly cache the value and reuse it on subsequent days. By default, dbt does not specify a volatility value. If you don’t set volatility, dbt generates a `CREATE` statement without a volatility keyword, and the warehouse’s default behavior applies — except in Redshift. In Redshift, dbt sets `non-deterministic` (`VOLATILE`) by default if no volatility is specified, because Redshift requires an explicit volatility and `VOLATILE` is the safest assumption.  
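For instance, a hypothetical UDF that wraps `random()` would be configured as `non-deterministic` so the warehouse never caches its result. This is an illustrative sketch; the function name `random_pct` is an assumption, not from the dbt docs:

```yml
functions:
  - name: random_pct  # hypothetical UDF returning a random percentage
    config:
      volatility: non-deterministic  # output changes on every call, so never cache
```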
Warehouse-specific volatility keywords Different warehouses expose volatility controls with different keywords and default values: | Warehouse | `deterministic` | `stable` | `non-deterministic` | | --------- | --------------- | -------- | ------------------- | | [Snowflake](https://docs.snowflake.com/en/sql-reference/sql/create-function#sql-handler) | `IMMUTABLE` | Not supported | `VOLATILE` (default) | | [Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_FUNCTION.html#r_CREATE_FUNCTION-synopsis) | `IMMUTABLE` | `STABLE` | `VOLATILE` (default) | | [Databricks](https://docs.databricks.com/aws/en/udf/unity-catalog#set-deterministic-if-your-function-produces-consistent-results) | `DETERMINISTIC` | Not supported | Assumed `non-deterministic` unless declared | | [Postgres](https://www.postgresql.org/docs/current/xfunc-volatility.html) | `IMMUTABLE` | `STABLE` | `VOLATILE` (default) | BigQuery does not support explicitly setting volatility. Instead, BigQuery infers volatility based on the functions and expressions used within the UDF.
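As a rough sketch of where these keywords land, a UDF configured with `volatility: deterministic` would carry `IMMUTABLE` in Redshift or Postgres DDL. The function below is illustrative only; dbt's exact rendered statement varies by adapter and version:

```sql
-- Illustrative only: approximate shape of the DDL an adapter might emit
-- for a UDF configured with volatility: deterministic.
create or replace function is_weekend(d date)
returns boolean
immutable   -- maps from dbt's `deterministic`
as $$
  select extract(dow from d) in (0, 6)
$$ language sql;
```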
#### Supported volatility types[​](#supported-volatility-types "Direct link to Supported volatility types") In dbt, you can use the following values for the `volatility` config: | Value | Description | Example | | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `deterministic` | Always returns the same output for the same input. Safe for aggressive optimizations and caching. | `substr()` — Produces the same substring when given the same string and parameters. | | `stable` | Returns the same value within a single query execution, but may change across executions. Not supported by all warehouses. For more information, see [Warehouse-specific volatility keywords](https://docs.getdbt.com/reference/resource-configs/volatility.md#warehouse-specific-volatility-keywords). | `now()` — Returns the current timestamp the moment a query starts; constant within a single query but different across runs. | | `non-deterministic` | May return different results for the same inputs. Warehouses shouldn't cache or reorder assuming stable results. | `first()` — May return different rows depending on query plan or ordering.
`random()` — Produces a random number that varies with each call, even with identical inputs. | #### Example[​](#example "Direct link to Example") In this example, we're using the `deterministic` volatility for the `is_positive_int` function: functions/schema.yml ```yaml functions: - name: is_positive_int description: Check whether a string is a positive integer config: volatility: deterministic # Optional: stable | non-deterministic | deterministic arguments: - name: a_string data_type: string returns: data_type: boolean ``` #### Related documentation[​](#related-documentation "Direct link to Related documentation") * [User-defined functions](https://docs.getdbt.com/docs/build/udfs.md) * [Function properties](https://docs.getdbt.com/reference/function-properties.md) * [Function configurations](https://docs.getdbt.com/reference/function-configs.md) * [Type](https://docs.getdbt.com/reference/resource-configs/type.md) * [Arguments](https://docs.getdbt.com/reference/resource-properties/function-arguments.md) * [Returns](https://docs.getdbt.com/reference/resource-properties/returns.md) --- ### Warnings Use the `--warn-error` flag to promote all warnings to errors, or `--warn-error-options` for granular control through options. #### Use `--warn-error` to promote all warnings to errors[​](#use---warn-error-to-promote-all-warnings-to-errors "Direct link to use---warn-error-to-promote-all-warnings-to-errors") Enabling the `WARN_ERROR` config or setting the `--warn-error` flag converts *all* dbt warnings into errors.
Any time dbt would normally warn, it will instead raise an error. Examples include `--select` criteria that select no resources, deprecations, configurations with no associated models, invalid test configurations, or tests and freshness checks that are configured to return warnings. Usage ```text dbt run --warn-error ``` Proceed with caution in production environments Using the `--warn-error` flag or `--warn-error-options '{"error": "all"}'` will treat *all* current and future warnings as errors. This means that if a new warning is introduced in a future version of dbt Core, your production job may start failing unexpectedly. We recommend proceeding with caution when doing this in production environments, and explicitly listing only the warnings you want to treat as errors in production. #### Use `--warn-error-options` for targeted warnings[​](#use---warn-error-options-for-targeted-warnings "Direct link to use---warn-error-options-for-targeted-warnings") In some cases, you may want to convert *all* warnings to errors. However, when you want *some* warnings to stay as warnings and only promote or silence specific warnings, you can instead use `--warn-error-options`. The `WARN_ERROR_OPTIONS` config or `--warn-error-options` flag gives you more granular control over *exactly which types of warnings* are treated as errors. `WARN_ERROR` and `WARN_ERROR_OPTIONS` are mutually exclusive `WARN_ERROR` and `WARN_ERROR_OPTIONS` are mutually exclusive. You can only specify one, even when you're specifying the config in multiple places (like an env var or a flag); otherwise, you'll see a usage error. Warnings that should be treated as errors can be specified through the `error` parameter. Warning names can be found in: * [dbt-core's types.py file](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/events/types.py), where each class name that inherits from `WarnLevel` corresponds to a warning name (e.g. `AdapterDeprecationWarning`, `NoNodesForSelectionCriteria`).
* Using the `--log-format json` flag. The `error` parameter can be set to `"all"` or `"*"` to treat all warnings as errors (this behavior is the same as using the `--warn-error` flag), or to a list of specific warning names to treat as errors. * When `error` is set to `"all"` or `"*"`, the optional `warn` parameter can be set to exclude specific warnings from being treated as errors. * Use the `silence` parameter to ignore warnings. To silence certain warnings you want to ignore, specify them in the `silence` parameter. This is useful in large projects where certain warnings aren't critical and can be ignored to keep the noise low and logs clean. Here's how you can use the [`--warn-error-options`](#use---warn-error-options-for-targeted-warnings) flag to promote *specific* warnings to errors: * [Test warnings](https://docs.getdbt.com/reference/resource-configs/severity.md) with the `--warn-error-options '{"error": ["LogTestResult"]}'` flag. * Jinja [exception warnings](https://docs.getdbt.com/reference/dbt-jinja-functions/exceptions.md#warn) with `--warn-error-options '{"error": ["JinjaLogWarning"]}'`. * No nodes selected with `--warn-error-options '{"error": ["NoNodesForSelectionCriteria"]}'`. * Deprecation warnings with `--warn-error-options '{"error": ["Deprecations"]}'` (new in v1.10). ##### Configuration[​](#configuration "Direct link to Configuration") You can configure which warnings are treated as errors, and which are silenced, by setting warn error options through a command flag, an environment variable, or `dbt_project.yml`. You can choose to: * Promote all warnings to errors using `{"error": "all"}` or the `--warn-error` flag. * Promote specific warnings to errors using `error`, and optionally exclude others from being treated as errors, with the `--warn-error-options` flag. `warn` tells dbt to continue treating the warnings as warnings. * Ignore warnings using `silence` with the `--warn-error-options` flag.
In the following example, we're silencing the [`NoNodesForSelectionCriteria` warning](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/events/types.py#L1227) in the `dbt_project.yml` file by adding it to the `silence` parameter: dbt\_project.yml ```yaml ... flags: warn_error_options: error: # Previously called "include" warn: # Previously called "exclude" silence: # To silence or ignore warnings - NoNodesForSelectionCriteria ``` ##### Examples[​](#examples "Direct link to Examples") Here are some examples that show you how to configure `warn_error_options` using flags or file-based configuration. ###### Target specific warnings[​](#target-specific-warnings "Direct link to Target specific warnings") Some of the examples use `NoNodesForSelectionCriteria`, which is a specific warning that occurs when your `--select` flag doesn't match any nodes/resources in your dbt project: * This command promotes all warnings to errors, except for `NoNodesForSelectionCriteria`: ```text dbt run --warn-error-options '{"error": "all", "warn": ["NoNodesForSelectionCriteria"]}' ``` * This command promotes all warnings to errors, except for deprecation warnings: ```text dbt run --warn-error-options '{"error": "all", "warn": ["Deprecations"]}' ``` * This command promotes only `NoNodesForSelectionCriteria` as an error: ```text dbt run --warn-error-options '{"error": ["NoNodesForSelectionCriteria"]}' ``` * This promotes only `NoNodesForSelectionCriteria` as an error, using an environment variable: ```text DBT_WARN_ERROR_OPTIONS='{"error": ["NoNodesForSelectionCriteria"]}' dbt run ``` Values for `error`, `warn`, and/or `silence` should be passed on as arrays. For example, `dbt run --warn-error-options '{"error": "all", "warn": ["NoNodesForSelectionCriteria"]}'` not `dbt run --warn-error-options '{"error": "all", "warn": "NoNodesForSelectionCriteria"}'`. 
The following example shows how to promote all warnings to errors, except for the `NoNodesForSelectionCriteria` warning using the `silence` and `warn` parameters in the `dbt_project.yml` file: dbt\_project.yml ```yaml ... flags: warn_error_options: error: all # Previously called "include" warn: # Previously called "exclude" - NoNodesForSelectionCriteria silence: # To silence or ignore warnings - NoNodesForSelectionCriteria ``` ###### Promote all warnings to errors[​](#promote-all-warnings-to-errors "Direct link to Promote all warnings to errors") Some examples of how to promote all warnings to errors: ###### using dbt command flags[​](#using-dbt-command-flags "Direct link to using dbt command flags") ```bash dbt run --warn-error dbt run --warn-error-options '{"error": "all"}' dbt run --warn-error-options '{"error": "*"}' ``` ###### using environment variables[​](#using-environment-variables "Direct link to using environment variables") ```bash WARN_ERROR=true dbt run DBT_WARN_ERROR_OPTIONS='{"error": "all"}' dbt run DBT_WARN_ERROR_OPTIONS='{"error": "*"}' dbt run ``` caution Note, using `warn_error_options: error: "all"` will treat all current and future warnings as errors. This means that if a new warning is introduced in a future version of dbt Core, your production job may start failing unexpectedly. We recommend proceeding with caution when doing this in production environments, and explicitly listing only the warnings you want to treat as errors in production. --- ### where ##### Definition[​](#definition "Direct link to Definition") Filter the resource being tested (model, source, seed, or snapshot).
The `where` condition is templated into the test query by replacing the resource reference with a subquery. For instance, a `not_null` test may look like: ```sql select * from my_model where my_column is null ``` If the `where` config is set to `date_column = current_date`, then the test query will be updated to: ```sql select * from (select * from my_model where date_column = current_date) dbt_subquery where my_column is null ``` ##### Examples[​](#examples "Direct link to Examples") * Specific test * One-off test * Generic test block * Project level Configure a specific instance of a generic (schema) test: models/\.yml ```yaml models: - name: large_table columns: - name: my_column data_tests: - accepted_values: arguments: # available in v1.10.5 and higher. Older versions can set `values` as a top-level property. values: ["a", "b", "c"] config: where: "date_column = current_date" - name: other_column data_tests: - not_null: config: where: "date_column < current_date" ``` This config is ignored for one-off tests. Set the default for all instances of a generic (schema) test by setting the config inside its test block (definition): macros/\.sql ```sql {% test (model, column_name) %} {{ config(where = "date_column = current_date") }} select ... {% endtest %} ``` Set the default for all tests in a package or project: dbt\_project.yml ```yaml data_tests: +where: "date_column = current_date" : +where: > date_column = current_date and another_column is not null ``` ##### Custom logic[​](#custom-logic "Direct link to Custom logic") The rendering context for the `where` config is the same as for all configurations defined in `.yml` files. You have access to `{{ var() }}` and `{{ env_var() }}`, but you **do not** have access to custom macros for setting this config. If you do want to use custom macros to template out the `where` filter for certain tests, there is a workaround. dbt defines a `get_where_subquery` macro.
dbt replaces `{{ model }}` in generic test definitions with `{{ get_where_subquery(relation) }}`, where `relation` is a `ref()` or `source()` for the resource being tested. The default implementation of this macro returns: * `{{ relation }}` when the `where` config is not defined (`ref()` or `source()`) * `(select * from {{ relation }} where {{ where }}) dbt_subquery` when the `where` config is defined You can override this behavior by: * Defining a custom `get_where_subquery` in your root project * Defining a custom `<adapter>__get_where_subquery` [dispatch candidate](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch.md) in your package or adapter plugin Within this macro definition, you can reference whatever custom macros you want, based on static inputs from the configuration. At its simplest, this enables you to DRY up code that you'd otherwise need to repeat across many different `.yml` files. Because the `get_where_subquery` macro is resolved at runtime, your custom macros can also include [fetching the results of introspective database queries](https://docs.getdbt.com/reference/dbt-jinja-functions/run_query.md). ###### Example[​](#example "Direct link to Example") Filter your test to the past N days of data, using dbt's cross-platform [`dateadd()`](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros.md#dateadd) utility macro. You can set the number of days in the placeholder string.
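Because `modules.re` exposes Python's `re` module inside dbt Jinja, the placeholder-parsing logic can be prototyped outside a macro first. This sketch is illustrative only: the `where` value and the plain-SQL `dateadd` replacement are assumptions (a real macro would render the adapter-specific expression via `dbt.dateadd()`):

```python
import re

# Hypothetical `where` config value containing the placeholder string
where_string = "date_column > __7_days_ago__"

# Same pattern idea as a __N_days_ago__ placeholder convention
pattern = r"__(\d+)_days_ago__"
match = re.search(pattern, where_string)
days = int(match.group(1)) if match else 3  # default to 3 days when no number is found

# Substitute a SQL expression for the placeholder (plain SQL shown here)
rendered = re.sub(pattern, f"dateadd('day', -{days}, current_timestamp())", where_string)
```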
models/config.yml ```yml models: - name: my_model columns: - name: id data_tests: - unique: config: where: "date_column > __3_days_ago__" # placeholder string for static config ``` macros/custom\_get\_where\_subquery.sql ```sql {% macro get_where_subquery(relation) -%} {% set where = config.get('where') %} {% if where %} {% if "_days_ago__" in where %} {# replace placeholder string with result of custom macro #} {% set where = replace_days_ago(where) %} {% endif %} {%- set filtered -%} (select * from {{ relation }} where {{ where }}) dbt_subquery {%- endset -%} {% do return(filtered) %} {%- else -%} {% do return(relation) %} {%- endif -%} {%- endmacro %} {% macro replace_days_ago(where_string) %} {# Use regex to search the pattern for the number days #} {# Default to 3 days when no number found #} {% set re = modules.re %} {% set days = 3 %} {% set pattern = '__(\d+)_days_ago__' %} {% set match = re.search(pattern, where_string) %} {% if match %} {% set days = match.group(1) | int %} {% endif %} {% set n_days_ago = dbt.dateadd('day', -days, current_timestamp()) %} {% set result = re.sub(pattern, n_days_ago, where_string) %} {{ return(result) }} {% endmacro %} ``` --- ### YAML Selectors Write resource selectors in YAML, save them with a human-friendly name, and reference them using the `--selector` flag.
Recording selectors in a top-level `selectors.yml` file provides: * **Legibility:** complex selection criteria are composed of dictionaries and arrays * **Version control:** selector definitions are stored in the same git repository as the dbt project * **Reusability:** selectors can be referenced in multiple job definitions, and their definitions are extensible (via YAML anchors) Selectors live in a top-level file named `selectors.yml`. Each must have a `name` and a `definition`, and can optionally define a `description` and [`default` flag](#default). selectors.yml ```yml selectors: - name: nodes_to_joy definition: ... - name: nodes_to_a_grecian_urn description: Attic shape with a fair attitude default: true definition: ... ``` #### Definitions[​](#definitions "Direct link to Definitions") Each `definition` is composed of one or more arguments, which can be one of the following: * **CLI-style:** strings, representing CLI-style arguments * **Key-value:** pairs in the form `method: value` * **Full YAML:** fully specified dictionaries with items for `method`, `value`, operator-equivalent keywords, and support for `exclude` Use the `union` and `intersection` operator-equivalent keywords to organize multiple arguments. ##### CLI-style[​](#cli-style "Direct link to CLI-style") ```yml definition: 'tag:nightly' ``` This simple syntax supports use of the `+`, `@`, and `*` [graph](https://docs.getdbt.com/reference/node-selection/graph-operators.md) operators, [set](https://docs.getdbt.com/reference/node-selection/set-operators.md) operators, and `exclude`. ##### Key-value[​](#key-value "Direct link to Key-value") ```yml definition: tag: nightly ``` This simple syntax does not support any [graph](https://docs.getdbt.com/reference/node-selection/graph-operators.md) or [set](https://docs.getdbt.com/reference/node-selection/set-operators.md) operators or `exclude`.
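To make the contrast concrete, here is one hypothetical tag selection written in each style (the selector names are illustrative); only the CLI-style and full-YAML forms can also apply graph operators such as `+`:

```yml
selectors:
  - name: nightly_cli_style
    definition: 'tag:nightly+'      # CLI-style: supports +, @, * and exclude
  - name: nightly_key_value
    definition:
      tag: nightly                  # key-value: no graph/set operators or exclude
  - name: nightly_full_yaml
    definition:
      method: tag                   # full YAML: operators expressed as keywords
      value: nightly
      children: true                # equivalent to the trailing + above
```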
##### Full YAML[​](#full-yaml "Direct link to Full YAML") This is the most thorough syntax, which can include the operator-equivalent keywords for [graph](https://docs.getdbt.com/reference/node-selection/graph-operators.md) and [set](https://docs.getdbt.com/reference/node-selection/set-operators.md) operators. Review [methods](https://docs.getdbt.com/reference/node-selection/methods.md) for the available list. ```yml definition: method: tag value: nightly # Optional keywords map to the `+` and `@` graph operators: children: true | false parents: true | false children_depth: 1 # if children: true, degrees to include parents_depth: 1 # if parents: true, degrees to include childrens_parents: true | false # @ operator indirect_selection: eager | cautious | buildable | empty # include all tests selected indirectly? eager by default ``` The `*` operator to select all nodes can be written as: ```yml definition: method: fqn value: "*" ``` ###### Exclude[​](#exclude "Direct link to Exclude") The `exclude` keyword is only supported by fully-qualified dictionaries. It may be passed as an argument to each dictionary, or as an item in a `union`. The following are equivalent: ```yml - method: tag value: nightly exclude: - "@tag:daily" ``` ```yml - union: - method: tag value: nightly - exclude: - method: tag value: daily ``` Note: The `exclude` argument in YAML selectors is subtly different from the `--exclude` CLI argument. Here, `exclude` *always* returns a [set difference](https://en.wikipedia.org/wiki/Complement_\(set_theory\)), and it is always applied *last* within its scope. When more than one "yeslist" (`--select`) is passed, they are treated as a [union](https://docs.getdbt.com/reference/node-selection/set-operators.md#unions) rather than an [intersection](https://docs.getdbt.com/reference/node-selection/set-operators.md#intersections). Same thing when there is more than one "nolist" (`--exclude`). 
###### Indirect selection[​](#indirect-selection "Direct link to Indirect selection") As a general rule, dbt will indirectly select *all* tests if they touch *any* resource that you're selecting directly. We call this "eager" indirect selection. You can optionally switch the indirect selection mode to "cautious", "buildable", or "empty" by setting `indirect_selection` for a specific criterion: ```yml - union: - method: fqn value: model_a indirect_selection: eager # default: will include all tests that touch model_a - method: fqn value: model_b indirect_selection: cautious # will not include tests touching model_b # if they have other unselected parents - method: fqn value: model_c indirect_selection: buildable # will not include tests touching model_c # if they have other unselected parents (unless they have an ancestor that is selected) - method: fqn value: model_d indirect_selection: empty # will include tests for only the selected node and ignore all tests attached to model_d ``` If provided, a YAML selector's `indirect_selection` value will take precedence over the CLI flag `--indirect-selection`. Because `indirect_selection` is defined separately for *each* selection criterion, it's possible to mix eager/cautious/buildable/empty modes within the same definition, to achieve the exact behavior that you need. Remember that you can always test out your criteria with `dbt ls --selector`. See [test selection examples](https://docs.getdbt.com/reference/node-selection/test-selection-examples.md) for more details about indirect selection.
#### Example[​](#example "Direct link to Example") Here are two ways to represent: ```bash $ dbt run --select @source:snowplow,tag:nightly models/export --exclude package:snowplow,config.materialized:incremental export_performance_timing ``` * CLI-style * Full YML selectors.yml ```yml selectors: - name: nightly_diet_snowplow description: "Non-incremental Snowplow models that power nightly exports" definition: # Optional `union` and `intersection` keywords map to the ` ` and `,` set operators: union: - intersection: - '@source:snowplow' - 'tag:nightly' - 'models/export' - exclude: - intersection: - 'package:snowplow' - 'config.materialized:incremental' - export_performance_timing ``` selectors.yml ```yml selectors: - name: nightly_diet_snowplow description: "Non-incremental Snowplow models that power nightly exports" definition: # Optional `union` and `intersection` keywords map to the ` ` and `,` set operators: union: - intersection: - method: source value: snowplow childrens_parents: true - method: tag value: nightly - method: path value: models/export - exclude: - intersection: - method: package value: snowplow - method: config.materialized value: incremental - method: fqn value: export_performance_timing ``` Then in our job definition: ```bash dbt run --selector nightly_diet_snowplow ``` #### Default[​](#default "Direct link to Default") Selectors may define a boolean `default` property. If a selector has `default: true`, dbt will use this selector's criteria when tasks do not define their own selection criteria. Let's say we define a default selector that only selects resources defined in our root project: ```yml selectors: - name: root_project_only description: > Only resources from the root project. Excludes resources defined in installed packages. 
default: true definition: method: package value: ``` If I run an "unqualified" command, dbt will use the selection criteria defined in `root_project_only`—that is, dbt will only build / freshness check / generate compiled SQL for resources defined in my root project. ```text dbt build dbt source freshness dbt docs generate ``` If I run a command that defines its own selection criteria (via `--select`, `--exclude`, or `--selector`), dbt will ignore the default selector and use the flag criteria instead. It will not try to combine the two. ```bash dbt run --select "model_a" dbt run --exclude model_a ``` Only one selector may set `default: true` for a given invocation; otherwise, dbt will return an error. You may use a Jinja expression to adjust the value of `default` depending on the environment, however: ```yml selectors: - name: default_for_dev default: "{{ target.name == 'dev' | as_bool }}" definition: ... - name: default_for_prod default: "{{ target.name == 'prod' | as_bool }}" definition: ... ``` ##### Selector inheritance[​](#selector-inheritance "Direct link to Selector inheritance") Selectors can reuse and extend definitions from other selectors, via the `selector` method. ```yml selectors: - name: foo_and_bar definition: intersection: - tag: foo - tag: bar - name: foo_bar_less_buzz definition: intersection: # reuse the definition from above - method: selector value: foo_and_bar # with a modification! - exclude: - method: tag value: buzz ``` **Note:** While selector inheritance allows the logic from another selector to be *reused*, it doesn't allow the logic from that selector to be *modified* by means of `parents`, `children`, `indirect_selection`, and so on. The `selector` method returns the complete set of nodes returned by the named selector. 
#### Difference between `--select` and `--selector`[​](#difference-between---select-and---selector "Direct link to difference-between---select-and---selector")

In dbt, [`select`](https://docs.getdbt.com/reference/node-selection/syntax.md#how-does-selection-work) and `selector` are related concepts used for choosing specific models, tests, or resources. The following table explains the differences and when each is best used:

| Feature | `--select` | `--selector` |
| ----------- | ---------- | ------------ |
| Definition | Ad hoc, specified directly in the command. | Predefined in the `selectors.yml` file. |
| Usage | One-time or task-specific filtering. | Reusable across multiple executions. |
| Complexity | Requires manual entry of selection criteria. | Can encapsulate complex logic for reuse. |
| Flexibility | Flexible; less reusable. | Flexible; focuses on reusable and structured logic. |
| Example | `dbt run --select my_model+` (runs `my_model` and all downstream dependencies with the `+` operator). | `dbt run --selector nightly_diet_snowplow` (runs models defined by the `nightly_diet_snowplow` selector in `selectors.yml`). |

Notes:

* You can combine `--select` with `--exclude` for ad hoc selection of nodes.
* The `--select` and `--selector` syntax provide the same overall node-selection functionality. Using [graph operators](https://docs.getdbt.com/reference/node-selection/graph-operators.md) (such as `+` and `@`) and [set operators](https://docs.getdbt.com/reference/node-selection/set-operators.md) (such as `union` and `intersection`) in `--select` is equivalent to the YAML-based configs used with `--selector`.

For additional examples, check out [this GitHub Gist](https://gist.github.com/jeremyyeo/1aeca767e2a4f157b07955d58f8078f7).

---

### Yellowbrick configurations

#### Incremental materialization strategies[​](#incremental-materialization-strategies "Direct link to Incremental materialization strategies")

The dbt-yellowbrick adapter supports the following incremental materialization strategies:

* `append` (default when `unique_key` is not defined)
* `delete+insert` (default when `unique_key` is defined)

Both strategies are inherited from the dbt-postgres adapter.

#### Performance optimizations[​](#performance-optimizations "Direct link to Performance optimizations")

To improve query performance, tables in Yellowbrick Data support several optimizations that can be defined as model-level configurations in dbt. These are applied to the `CREATE TABLE` DDL statements generated at compile or run time.
Note that these settings have no effect on models materialized as `view` or `ephemeral`.

dbt-yellowbrick supports the following Yellowbrick-specific features when defining tables:

* `dist` - applies a single-column distribution key, or sets the distribution to `RANDOM` or `REPLICATE`
* `sort_col` - applies the `SORT ON (column)` clause that names a single column to sort on before data is stored on media
* `cluster_cols` - applies the `CLUSTER ON (column, column, ...)` clause that names up to four columns to cluster on before data is stored on media

A table with sorted or clustered columns facilitates the skipping of blocks when tables are scanned with restrictions applied in the query. Further details can be found in the [Yellowbrick Data Warehouse](https://docs.yellowbrick.com/latest/ybd_sqlref/clustered_tables.html#clustered-tables) documentation.

##### Some example model configurations[​](#some-example-model-configurations "Direct link to Some example model configurations")

* `DISTRIBUTE REPLICATE` with a `SORT` column...

```sql
{{
  config(
    materialized = "table",
    dist = "replicate",
    sort_col = "stadium_capacity"
  )
}}

select
    hash(stg.name) as team_key
  , stg.name as team_name
  , stg.nickname as team_nickname
  , stg.city as home_city
  , stg.stadium as stadium_name
  , stg.capacity as stadium_capacity
  , stg.avg_att as average_game_attendance
  , current_timestamp as md_create_timestamp
from {{ source('premdb_public','team') }} stg
where stg.name is not null
```

gives the following model output:

```sql
create table if not exists marts.dim_team as (
  select
      hash(stg.name) as team_key
    , stg.name as team_name
    , stg.nickname as team_nickname
    , stg.city as home_city
    , stg.stadium as stadium_name
    , stg.capacity as stadium_capacity
    , stg.avg_att as average_game_attendance
    , current_timestamp as md_create_timestamp
  from premdb.public.team stg
  where stg.name is not null
)
distribute REPLICATE
sort on (stadium_capacity);
```
* `DISTRIBUTE` on a single column and define up to four `CLUSTER` columns...

```sql
{{
  config(
    materialized = 'table',
    dist = 'match_key',
    cluster_cols = ['season_key', 'match_date_key', 'home_team_key', 'away_team_key']
  )
}}

select
    hash(concat_ws('||', lower(trim(s.season_name)), translate(left(m.match_ts,10), '-', ''),
         lower(trim(h."name")), lower(trim(a."name")))) as match_key
  , hash(lower(trim(s.season_name))) as season_key
  , cast(translate(left(m.match_ts,10), '-', '') as integer) as match_date_key
  , hash(lower(trim(h."name"))) as home_team_key
  , hash(lower(trim(a."name"))) as away_team_key
  , m.htscore
  , split_part(m.htscore, '-', 1) as home_team_goals_half_time
  , split_part(m.htscore, '-', 2) as away_team_goals_half_time
  , m.ftscore
  , split_part(m.ftscore, '-', 1) as home_team_goals_full_time
  , split_part(m.ftscore, '-', 2) as away_team_goals_full_time
from {{ source('premdb_public','match') }} m
inner join {{ source('premdb_public','team') }} h on (m.htid = h.htid)
inner join {{ source('premdb_public','team') }} a on (m.atid = a.atid)
inner join {{ source('premdb_public','season') }} s on (m.seasonid = s.seasonid)
```

gives the following model output:

```sql
create table if not exists marts.fact_match as (
  select
      hash(concat_ws('||', lower(trim(s.season_name)), translate(left(m.match_ts,10), '-', ''),
           lower(trim(h."name")), lower(trim(a."name")))) as match_key
    , hash(lower(trim(s.season_name))) as season_key
    , cast(translate(left(m.match_ts,10), '-', '') as integer) as match_date_key
    , hash(lower(trim(h."name"))) as home_team_key
    , hash(lower(trim(a."name"))) as away_team_key
    , m.htscore
    , split_part(m.htscore, '-', 1) as home_team_goals_half_time
    , split_part(m.htscore, '-', 2) as away_team_goals_half_time
    , m.ftscore
    , split_part(m.ftscore, '-', 1) as home_team_goals_full_time
    , split_part(m.ftscore, '-', 2) as away_team_goals_full_time
  from premdb.public.match m
  inner join premdb.public.team h on (m.htid = h.htid)
  inner join premdb.public.team a on (m.atid = a.atid)
  inner join premdb.public.season s on (m.seasonid = s.seasonid)
)
distribute on (match_key)
cluster on (season_key, match_date_key, home_team_key, away_team_key);
```

#### Cross-database materializations[​](#cross-database-materializations "Direct link to Cross-database materializations")

Yellowbrick supports cross-database queries, and the dbt-yellowbrick adapter permits cross-database reads into a specific target on the same appliance instance.

#### Limitations[​](#limitations "Direct link to Limitations")

This initial implementation of the dbt adapter for Yellowbrick Data Warehouse may not support some use cases. We strongly advise validating all records or transformations resulting from the adapter output.

---

## Set up dbt

### About environments

In software engineering, environments are used to enable engineers to develop and test code without impacting the users of their software.

Typically, there are two types of environments in dbt:

* **Deployment or Production** (or *prod*) — Refers to the environment that end users interact with.
* **Development** (or *dev*) — Refers to the environment that engineers work in.

This means that engineers can work iteratively when writing and testing new code in *development*. Once they are confident in these changes, they can deploy their code to *production*.

In traditional software engineering, different environments often use completely separate architecture. For example, the dev and prod versions of a website may use different servers and databases. Data warehouses can also be designed to have separate environments — the *production* environment refers to the relations (for example, schemas, tables, and views) that your end users query (often through a BI tool).

Configure environments to tell dbt or dbt Core how to build and execute your project in development and production:

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/dbt-cloud-environments.md)

###### [Environments in dbt](https://docs.getdbt.com/docs/dbt-cloud-environments.md)

[Seamlessly configure development and deployment environments in dbt to control how your project runs in the Studio IDE, the dbt CLI, and dbt jobs.](https://docs.getdbt.com/docs/dbt-cloud-environments.md)

[![](/img/icons/command-line.svg)](https://docs.getdbt.com/docs/local/dbt-core-environments.md)

###### [Environments in dbt Core](https://docs.getdbt.com/docs/local/dbt-core-environments.md)

[Set up and maintain separate deployment and development environments through the use of targets within a profile file.](https://docs.getdbt.com/docs/local/dbt-core-environments.md)
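As a sketch of the pattern above, a `profiles.yml` for dbt Core might define both environments as targets. The profile name, hosts, schemas, and credentials here are all illustrative:

```yml
my_project:
  target: dev                 # default target used while developing
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dev_user
      password: "{{ env_var('DBT_DEV_PASSWORD') }}"
      dbname: analytics
      schema: dbt_dev_user    # development objects build here
      threads: 4
    prod:
      type: postgres
      host: prod-db.example.com
      port: 5432
      user: prod_user
      password: "{{ env_var('DBT_PROD_PASSWORD') }}"
      dbname: analytics
      schema: analytics       # relations your end users query
      threads: 4
```

With this setup, `dbt run` builds into the development schema, while `dbt run --target prod` deploys to the production schema.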
#### Related docs[​](#related-docs "Direct link to Related docs")

* [dbt environment best practices](https://docs.getdbt.com/guides/set-up-ci.md)
* [Deployment environments](https://docs.getdbt.com/docs/deploy/deploy-environments.md)
* [About dbt Core versions](https://docs.getdbt.com/docs/dbt-versions/core.md)
* [Set environment variables in dbt](https://docs.getdbt.com/docs/build/environment-variables.md#special-environment-variables)
* [Use environment variables in Jinja](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md)

---

### dbt setup and installation

dbt compiles and runs your analytics code against your data platform, enabling you and your team to collaborate on a single source of truth for metrics, insights, and business definitions. There are two options for deploying dbt:

* **dbt platform** (formerly dbt Cloud) runs the dbt Fusion engine or dbt Core in a hosted (single-tenant or multi-tenant) environment with a browser-based interface. The intuitive user interface aids you in setting up the various components. dbt comes equipped with turnkey support for scheduling jobs, CI/CD, hosting documentation, monitoring, and alerting. It also offers an integrated development environment (Studio IDE) and allows you to develop and run dbt commands from your local command line (CLI) or code editor.
* **dbt Core** is an open-source command line tool that you can install locally in your environment; communication with databases is facilitated through adapters.
If you're not sure which is the right solution for you, read our [What is dbt?](https://docs.getdbt.com/docs/introduction.md) and [dbt features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) articles to help you decide. If you still have questions, don't hesitate to [contact us](https://www.getdbt.com/contact/).

To begin configuring dbt now, select the option that is right for you.

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/about-cloud-setup.md)

###### [dbt platform setup](https://docs.getdbt.com/docs/cloud/about-cloud-setup.md)

[Learn how to connect to a data platform, integrate with secure authentication methods, and configure a sync with a git repo.](https://docs.getdbt.com/docs/cloud/about-cloud-setup.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/local/install-dbt.md)

###### [dbt local setup](https://docs.getdbt.com/docs/local/install-dbt.md)

[Learn how to set up dbt locally using the dbt VS Code extension or CLI.](https://docs.getdbt.com/docs/local/install-dbt.md)

---

### Run your dbt projects

You can run your dbt projects locally or using the [dbt platform](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md) with the dbt framework.
#### Common commands[​](#common-commands "Direct link to Common commands")

In dbt, the commands you commonly use are:

* [dbt run](https://docs.getdbt.com/reference/commands/run.md) — Run the models you defined in your project
* [dbt build](https://docs.getdbt.com/reference/commands/build.md) — Build and test your selected resources such as models, seeds, snapshots, and tests
* [dbt test](https://docs.getdbt.com/reference/commands/test.md) — Execute the tests you defined for your project

For all dbt commands and their arguments (flags), see the [dbt command reference](https://docs.getdbt.com/reference/dbt-commands.md). To list all dbt commands from the command line, run `dbt --help`. To list a specific command's arguments, run `dbt COMMAND_NAME --help`.

New to the command line?

1. Open your computer's terminal application (such as Terminal or iTerm) to access the command line.
2. Make sure you navigate to your dbt project directory before running any dbt commands.
3. These terminal commands help you navigate your file system: `cd` (change directory), `ls` (list directory contents), and `pwd` (print working directory).

#### Where to run dbt[​](#where-to-run-dbt "Direct link to Where to run dbt")

Use the dbt framework to quickly and collaboratively transform data and deploy analytics code following software engineering best practices like version control, modularity, portability, CI/CD, and documentation. This means anyone on the data team familiar with SQL can safely contribute to production-grade data pipelines.

The dbt framework is composed of a *language* and an *engine*:

* The *dbt language* is the code you write in your dbt project — SQL select statements, Jinja templating, YAML configs, tests, and more. It's the standard for the data industry and the foundation of the dbt framework.
* The *dbt engine* compiles your project, executes your transformation graph, and produces metadata.
dbt supports two engines, which you can use depending on your needs:

* The dbt Core engine, which renders Jinja and runs your models.
* The dbt Fusion engine, which goes beyond Jinja rendering to statically analyze your SQL — validating syntax and logic before your SQL is sent to the database (saving compute resources) — and supports LSP features.

##### dbt platform[​](#dbt-platform "Direct link to dbt platform")

The dbt platform is a fully managed service that gives you a complete environment to build, test, deploy, and collaborate on dbt projects. You can develop in the browser or locally using the dbt Fusion engine or dbt Core engine.

* [Develop in your browser using the Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md)
* [Seamless drag-and-drop development with Canvas](https://docs.getdbt.com/docs/cloud/canvas.md)
* [Run dbt commands from your local command line](#dbt-local-development) using the dbt VS Code extension or dbt CLI (both of which integrate seamlessly with your dbt platform projects).

For more details, see [About dbt plans](https://www.getdbt.com/pricing).

##### dbt local development[​](#dbt-local-development "Direct link to dbt local development")

You can run dbt locally with the dbt Fusion engine or the dbt Core engine:

* [Install the dbt VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md) — Combines dbt Fusion engine performance with visual features like autocomplete, inline errors, and lineage. Includes [LSP features](https://docs.getdbt.com/docs/about-dbt-lsp.md) and is suitable for users with dbt platform projects or running dbt locally without a dbt platform project. *Recommended for local development.*
* [Install the Fusion CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) — Runs the dbt Fusion engine from the command line, but doesn't include LSP features.
* [Install the dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) — The dbt platform CLI, which allows you to run dbt commands against your dbt platform development environment from your local command line. Requires a dbt platform project.
* [Install dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md) — The open-source, Python-based CLI that uses the dbt Core engine. Doesn't include LSP features.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [About the dbt VS Code extension](https://docs.getdbt.com/docs/about-dbt-extension.md)
* [dbt features](https://docs.getdbt.com/docs/cloud/about-cloud/dbt-cloud-features.md)
* [Model selection syntax](https://docs.getdbt.com/reference/node-selection/syntax.md)
* [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md)
* [Studio IDE features](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#ide-features)
* [Does dbt offer extract and load functionality?](https://docs.getdbt.com/faqs/Project/transformation-tool.md)
* [Why does dbt compile need a data platform connection?](https://docs.getdbt.com/faqs/Warehouse/db-connection-dbt-compile.md)

---

### Using threads

When dbt runs, it creates a directed acyclic graph (DAG) of links between models. The number of threads represents the maximum number of paths through the graph dbt may work on at once – increasing the number of threads can reduce the run time of your project.

For example, if you specify `threads: 1`, dbt will start building only one model, and finish it, before moving on to the next.
Specifying `threads: 8` means that dbt will work on *up to* 8 models at once without violating dependencies – the actual number of models it can work on will likely be constrained by the available paths through the dependency graph.

There's no set limit on the number of threads you can use – while increasing the number of threads generally decreases execution time, there are a few things to consider:

* Increasing the number of threads increases the load on your warehouse, which may impact other tools in your data stack. For example, if your BI tool uses the same compute resources as dbt, its queries may get queued during a dbt run.
* The number of concurrent queries your database allows may be a limiting factor in how many models can be actively built – some models may queue while waiting for an available query slot.

Generally, the optimal number of threads depends on your data warehouse and its configuration. It's best to test different values to find the best number of threads for your project. We recommend starting with 4.

You can use a different number of threads than the value defined in your target by using the `--threads` option when executing a dbt command. You will define the number of threads in your `profiles.yml` file (when developing locally with dbt Core or the dbt Fusion engine), your dbt job definition, and your dbt development credentials under your profile.

#### Fusion engine thread optimization[​](#fusion-engine-thread-optimization "Direct link to Fusion engine thread optimization")

The dbt Fusion engine handles threading differently than dbt Core. Rather than treating `threads` as a strict limit, Fusion manages parallelism based on each adapter's characteristics.
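For reference, `threads` is set per target in `profiles.yml` and can be overridden per invocation. This is a minimal sketch — the profile name and connection details are illustrative, and as noted, the Fusion engine may override or ignore this value depending on the adapter:

```yml
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account     # illustrative connection details
      user: my_user
      database: analytics
      schema: dbt_dev
      threads: 4              # build up to 4 models concurrently
```

Override at runtime with, for example, `dbt run --threads 8`.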
##### Adapter-specific behavior[​](#adapter-specific-behavior "Direct link to Adapter-specific behavior")

| Adapter | Behavior |
| ------- | -------- |
| **Snowflake** | Fusion ignores user-set threads and automatically optimizes parallelism for maximum performance. The only supported override is `threads: 1`, which can also help resolve timeout issues. |
| **Databricks** | Fusion ignores user-set threads and automatically optimizes parallelism for maximum performance. The only supported override is `threads: 1`, which can also help resolve timeout issues. |
| **BigQuery** | Fusion respects user-set threads to manage API rate limits. Setting `--threads 0` (or omitting the setting) allows Fusion to dynamically optimize parallelism. |
| **Redshift** | Fusion respects user-set threads to manage concurrency limits. Setting `--threads 0` (or omitting the setting) allows Fusion to dynamically optimize parallelism. |

For more information about Fusion's approach to parallelism, refer to [the dbt Fusion engine](https://docs.getdbt.com/docs/fusion.md) page.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [About profiles.yml](https://docs.getdbt.com/docs/local/profiles.yml.md)
* [dbt job scheduler](https://docs.getdbt.com/docs/deploy/job-scheduler.md)

---

### dbt local installation

#### dbt environments

dbt makes it easy to maintain separate production and development environments through the use of [targets](https://docs.getdbt.com/reference/dbt-jinja-functions/target.md) within a [profile](https://docs.getdbt.com/docs/local/profiles.yml.md). A typical profile, when using dbt locally (for example, running from your command line), will have a target named `dev` set as the default. This means that while making changes, your objects will be built in your *development* target without affecting production queries made by your end users. Once you are confident in your changes, you can deploy the code to *production* by running your dbt project with a *prod* target.

Running dbt in production

You can learn more about different ways to run dbt in production in [this article](https://docs.getdbt.com/docs/deploy/deployments.md).

Targets offer the flexibility to decide how to implement your separate environments – whether you want to use separate schemas, databases, or entirely different clusters altogether!
We recommend using *different schemas within one database* to separate your environments. This is the easiest to set up and the most cost-effective solution in a modern cloud-based data stack. In practice, this means that most of the details in a target will be consistent across all targets, except for the `schema` and user credentials.

If you have multiple dbt users writing code, it often makes sense for *each user* to have their own *development* environment. A pattern we've found useful is to set your dev target schema to be `dbt_`. User credentials should also differ across targets so that each dbt user is using their own data warehouse user.

---

#### Install dbt

dbt enables data teams to transform data using analytics engineering best practices. You can run dbt locally through a command line interface (CLI) to build, test, and deploy your data transformations.

#### dbt Fusion engine (recommended)[​](#dbt-fusion-engine-recommended "Direct link to dbt Fusion engine (recommended)")

For the best local development experience, we recommend the dbt Fusion engine. Built in Rust, Fusion delivers:

* **Faster performance** — Up to 10x faster parsing, compilation, and execution.
* **SQL comprehension** — Dialect-aware validation catches errors before they reach your warehouse.
* **Column-level lineage** — Trace data flow across your entire project.
[Install Fusion now!](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#installation)

##### dbt VS Code extension[​](#dbt-vs-code-extension "Direct link to dbt VS Code extension")

The [dbt VS Code extension](https://docs.getdbt.com/docs/dbt-extension-features.md) combines Fusion's performance with powerful LSP editor features:

* **IntelliSense** — Autocomplete for models, macros, and columns.
* **Inline errors** — See SQL errors as you type.
* **Hover insights** — View model definitions and column info without leaving your code.
* **Refactoring tools** — Rename models and columns across your project.

This is the fastest way to get started with dbt locally.

[Install Fusion with the dbt VS Code extension](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#installation)

#### dbt Core[​](#dbt-core "Direct link to dbt Core")

[dbt Core](https://docs.getdbt.com/docs/local/install-dbt.md?version=1) is the original Python-based dbt engine. dbt Core changed data transformation forever and includes a rich set of features:

* **Apache License 2.0** — dbt Core is open source now and forever.
* **Community adapters** — An amazing community of contributors has built adapters for a vast [catalog of data warehouses](https://docs.getdbt.com/docs/supported-data-platforms.md).
* **Code editor support** — Build your dbt project in popular editors like VS Code or Cursor.
* **Command line interface** — Run your project from the terminal using macOS Terminal, iTerm, or the integrated terminal in your code editor.

[Install dbt Core now!](https://docs.getdbt.com/docs/local/install-dbt.md?version=1#installation)

#### Installation[​](#installation "Direct link to Installation")

---

#### Connect data platform

##### About data platform connections

dbt connects to your data platform to run SQL transformations against your data. The connection setup depends on which dbt engine you use:

* [dbt Fusion engine](https://docs.getdbt.com/docs/local/connect-data-platform/about-dbt-connections.md?version=2)
* [dbt Core](https://docs.getdbt.com/docs/local/connect-data-platform/about-dbt-connections.md?version=1)

#### Next steps[​](#next-steps "Direct link to Next steps")

For step-by-step setup instructions with demo project data, see our [Quickstart guides](https://docs.getdbt.com/guides.md).

---

##### BigQuery
--- ##### Connect AlloyDB to dbt Core * **Maintained by**: dbt Labs * **Authors**: dbt Labs * **GitHub repo**: [dbt-labs/dbt-adapters](https://github.com/dbt-labs/dbt-adapters) [![](https://img.shields.io/github/stars/dbt-labs/dbt-adapters?style=for-the-badge)](https://github.com/dbt-labs/dbt-adapters) * **PyPI package**: `dbt-postgres` [![](https://badge.fury.io/py/dbt-postgres.svg)](https://badge.fury.io/py/dbt-postgres) * **Slack channel**: [#db-postgres](https://getdbt.slack.com/archives/C0172G2E273) * **Supported dbt Core version**: v1.0.0 and newer * **dbt support**: Not Supported * **Minimum data platform version**: ? #### Installing dbt-postgres Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation: `python -m pip install dbt-core dbt-postgres` #### Configuring dbt-postgres For AlloyDB-specific configuration, please refer to [AlloyDB configs.](https://docs.getdbt.com/reference/resource-configs/postgres-configs.md) #### Profile Configuration[​](#profile-configuration "Direct link to Profile Configuration") AlloyDB targets are configured exactly the same as [Postgres targets](https://docs.getdbt.com/docs/local/connect-data-platform/postgres-setup.md#profile-configuration).
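Because AlloyDB reuses the Postgres adapter, a target can be sketched exactly like a Postgres one. The profile below is a minimal illustration; the project name, host, credentials, and schema are all placeholder values, not defaults:

```yaml
# Hypothetical AlloyDB profile -- identical in shape to a Postgres target.
alloydb_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: 10.0.0.5        # private IP of the AlloyDB instance
      port: 5432
      user: dbt_user
      password: "<password>"
      dbname: analytics
      schema: dbt_dev
      threads: 4
```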
--- ##### Connect Apache Spark to dbt Core `profiles.yml` file is for dbt Core and dbt Fusion only If you're using the dbt platform, you don't need to create a `profiles.yml` file. This file is only necessary when you use dbt Core or dbt Fusion locally. To learn more about Fusion prerequisites, refer to [Supported features](https://docs.getdbt.com/docs/fusion/supported-features.md). To connect your data platform to dbt, refer to [About data platforms](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md). If you're using Databricks, use `dbt-databricks` If you're using Databricks, the `dbt-databricks` adapter is recommended over `dbt-spark`. If you're still using dbt-spark with Databricks, consider [migrating from the dbt-spark adapter to the dbt-databricks adapter](https://docs.getdbt.com/guides/migrate-from-spark-to-databricks.md). For the Databricks version of this page, refer to [Databricks setup](#databricks-setup). * **Maintained by**: dbt Labs * **Authors**: core dbt maintainers * **GitHub repo**: [dbt-labs/dbt-adapters](https://github.com/dbt-labs/dbt-adapters) [![](https://img.shields.io/github/stars/dbt-labs/dbt-adapters?style=for-the-badge)](https://github.com/dbt-labs/dbt-adapters) * **PyPI package**: `dbt-spark` [![](https://badge.fury.io/py/dbt-spark.svg)](https://badge.fury.io/py/dbt-spark) * **Slack channel**: [db-databricks-and-spark](https://getdbt.slack.com/archives/CNGCW8HKL) * **Supported dbt Core version**: v0.15.0 and newer * **dbt support**: Supported * **Minimum data platform version**: n/a #### Installing dbt-spark Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation: `python -m pip install dbt-core dbt-spark` #### Configuring dbt-spark For Spark-specific configuration, please refer to [Spark configs.](https://docs.getdbt.com/reference/resource-configs/spark-configs.md) Connecting to Databricks via the ODBC driver requires `pyodbc`. Depending on your system, you can install it separately or via pip. See the [`pyodbc` wiki](https://github.com/mkleehammer/pyodbc/wiki/Install) for OS-specific installation details. Connecting to a Spark cluster via the generic thrift or http methods requires `PyHive`. ```zsh # odbc connections $ python -m pip install "dbt-spark[ODBC]" # thrift or http connections $ python -m pip install "dbt-spark[PyHive]" ``` ```zsh # session connections $ python -m pip install "dbt-spark[session]" ``` For further info, refer to the GitHub repository: [dbt-labs/dbt-adapters](https://github.com/dbt-labs/dbt-adapters) #### Connection methods[​](#connection-methods "Direct link to Connection methods") dbt-spark can connect to Spark clusters using four different methods: * [`odbc`](#odbc) is the preferred method when connecting to Databricks. It supports connecting to a SQL Endpoint or an all-purpose interactive cluster. * [`thrift`](#thrift) connects directly to the lead node of a cluster, either locally hosted / on premise or in the cloud (for example, Amazon EMR). * [`http`](#http) is a more generic method for connecting to a managed service that provides an HTTP endpoint. Currently, this includes connections to a Databricks interactive cluster. * [`session`](#session) connects to a pySpark session, running locally or on a remote machine. Advanced functionality The `session` connection method is intended for advanced users and experimental dbt development.
This connection method is not supported by dbt. ##### ODBC[​](#odbc "Direct link to ODBC") Use the `odbc` connection method if you are connecting to a Databricks SQL endpoint or interactive cluster via ODBC driver. (Download the latest version of the official driver [here](https://databricks.com/spark/odbc-driver-download).) \~/.dbt/profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: spark method: odbc driver: [path/to/driver] schema: [database/schema name] host: [yourorg.sparkhost.com] organization: [org id] # Azure Databricks only token: [abc123] # one of: endpoint: [endpoint id] cluster: [cluster id] # optional port: [port] # default 443 user: [user] server_side_parameters: "spark.driver.memory": "4g" ``` ##### Thrift[​](#thrift "Direct link to Thrift") Use the `thrift` connection method if you are connecting to a Thrift server sitting in front of a Spark cluster, for example, a cluster running locally or on Amazon EMR. \~/.dbt/profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: spark method: thrift schema: [database/schema name] host: [hostname] # optional port: [port] # default 10001 user: [user] auth: [for example, KERBEROS] kerberos_service_name: [for example, hive] use_ssl: [true|false] # value of hive.server2.use.SSL, default false server_side_parameters: "spark.driver.memory": "4g" ``` ##### HTTP[​](#http "Direct link to HTTP") Use the `http` method if your Spark provider supports generic connections over HTTP (for example, Databricks interactive cluster). 
\~/.dbt/profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: spark method: http schema: [database/schema name] host: [yourorg.sparkhost.com] organization: [org id] # Azure Databricks only token: [abc123] cluster: [cluster id] # optional port: [port] # default: 443 user: [user] connect_timeout: 60 # default 10 connect_retries: 5 # default 0 server_side_parameters: "spark.driver.memory": "4g" ``` Databricks interactive clusters can take several minutes to start up. You may include the optional profile configs `connect_timeout` and `connect_retries`, and dbt will periodically retry the connection. ##### Session[​](#session "Direct link to Session") Use the `session` method if you want to run `dbt` against a pySpark session. \~/.dbt/profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: spark method: session schema: [database/schema name] host: NA # not used, but required by `dbt-core` server_side_parameters: "spark.driver.memory": "4g" ``` #### Optional configurations[​](#optional-configurations "Direct link to Optional configurations") ##### Retries[​](#retries "Direct link to Retries") Intermittent errors can crop up unexpectedly while running queries against Apache Spark. If `retry_all` is enabled, dbt-spark will naively retry any query that fails, based on the configuration supplied by `connect_timeout` and `connect_retries`. It does not attempt to determine if the query failure was transient or likely to succeed on retry. This configuration is recommended in production environments, where queries ought to be succeeding. 
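The naive retry behaviour described above can be sketched in plain Python. `run_with_retries` is an illustrative name, not dbt-spark's actual internals; only the `connect_retries` and `connect_timeout` parameters mirror the real profile configs:

```python
import time

def run_with_retries(run_query, sql, connect_retries=0, connect_timeout=10):
    """Naively retry any failing query, waiting connect_timeout seconds
    between attempts -- the retry_all=true behaviour described above."""
    for attempt in range(connect_retries + 1):
        try:
            return run_query(sql)
        except Exception:
            if attempt == connect_retries:
                raise  # retries exhausted: surface the last error
            time.sleep(connect_timeout)

# Example: a query that fails twice before succeeding.
attempts = {"n": 0}
def flaky_query(sql):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

result = run_with_retries(flaky_query, "select 1", connect_retries=3, connect_timeout=0)
```

Note that the loop makes no attempt to classify the failure; like `retry_all`, it retries everything, which is why the docs caution about using it where queries fail for non-transient reasons.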
For instance, this will instruct dbt to retry all failed queries up to 3 times, with a 5-second delay between each retry: \~/.dbt/profiles.yml ```yaml retry_all: true connect_timeout: 5 connect_retries: 3 ``` ##### Server side configuration[​](#server-side-configuration "Direct link to Server side configuration") Spark can be customized using [Application Properties](https://spark.apache.org/docs/latest/configuration.html). Using these properties, the execution can be customized, for example, to allocate more memory to the driver process. The Spark SQL runtime can also be set through these properties; for example, this allows the user to [set Spark catalogs](https://spark.apache.org/docs/latest/configuration.html#spark-sql). #### Caveats[​](#caveats "Direct link to Caveats") When facing difficulties, run `poetry run dbt debug --log-level=debug`. The logs are saved at `logs/dbt.log`. ##### Usage with EMR[​](#usage-with-emr "Direct link to Usage with EMR") To connect to Apache Spark running on an Amazon EMR cluster, you will need to run `sudo /usr/lib/spark/sbin/start-thriftserver.sh` on the master node of the cluster to start the Thrift server (see [the docs](https://aws.amazon.com/premiumsupport/knowledge-center/jdbc-connection-emr/) for more information). You will also need to connect to port 10001, which will connect to the Spark backend Thrift server; port 10000 will instead connect to a Hive backend, which will not work correctly with dbt. ##### Supported functionality[​](#supported-functionality "Direct link to Supported functionality") Most dbt Core functionality is supported, but some features are only available on Delta Lake (Databricks). Delta-only features: 1. Incremental model updates by `unique_key` instead of `partition_by` (see [`merge` strategy](https://docs.getdbt.com/reference/resource-configs/spark-configs.md#the-merge-strategy)) 2. [Snapshots](https://docs.getdbt.com/docs/build/snapshots.md) 3.
[Persisting](https://docs.getdbt.com/reference/resource-configs/persist_docs.md) column-level descriptions as database comments ##### Default namespace with Thrift connection method[​](#default-namespace-with-thrift-connection-method "Direct link to Default namespace with Thrift connection method") To run metadata queries in dbt, you need to have a namespace named `default` in Spark when connecting with Thrift. You can check available namespaces by using Spark's `pyspark` and running `spark.sql("SHOW NAMESPACES").show()`. If the default namespace doesn't exist, create it by running `spark.sql("CREATE NAMESPACE default").show()`. If there's a network connection issue, your logs will display an error like `Could not connect to any of [('127.0.0.1', 10000)]` (or something similar). --- ##### Connect Athena to dbt Core * **Maintained by**: dbt Labs * **Authors**: dbt Labs * **GitHub repo**: [dbt-labs/dbt-adapters](https://github.com/dbt-labs/dbt-adapters) [![](https://img.shields.io/github/stars/dbt-labs/dbt-adapters?style=for-the-badge)](https://github.com/dbt-labs/dbt-adapters) * **PyPI package**: `dbt-athena` [![](https://badge.fury.io/py/dbt-athena.svg)](https://badge.fury.io/py/dbt-athena) * **Slack channel**: [#db-athena](https://getdbt.slack.com/archives/C013MLFR7BQ) * **Supported dbt Core version**: v1.3.0 and newer * **dbt support**: Supported * **Minimum data platform version**: engine version 2 and 3 #### Installing dbt-athena Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies.
Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation: `python -m pip install dbt-core dbt-athena` #### Configuring dbt-athena For Athena-specific configuration, please refer to [Athena configs.](https://docs.getdbt.com/reference/resource-configs/athena-configs.md) `dbt-athena` vs `dbt-athena-community` `dbt-athena-community` was the community-maintained adapter until dbt Labs took over maintenance in late 2024. Both `dbt-athena` and `dbt-athena-community` are maintained by dbt Labs, but `dbt-athena-community` is now simply a wrapper on `dbt-athena`, published for backwards compatibility. #### Connecting to Athena with dbt-athena[​](#connecting-to-athena-with-dbt-athena "Direct link to Connecting to Athena with dbt-athena") This plugin does not accept any credentials directly. Instead, [credentials are determined automatically](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html) based on AWS CLI/boto3 conventions and stored login info. You can configure the AWS profile name to use via aws\_profile\_name. Check out the dbt profile configuration below for details. \~/.dbt/profiles.yml ```yaml default: outputs: dev: type: athena s3_staging_dir: [s3_staging_dir] s3_data_dir: [s3_data_dir] s3_data_naming: [table_unique] # the type of naming convention used when writing to S3 region_name: [region_name] database: [database name] schema: [dev_schema] aws_profile_name: [optional profile to use from your AWS shared credentials file.] threads: [1 or more] num_retries: [0 or more] # number of retries performed by the adapter. 
Defaults to 5 target: dev ``` ##### Example Config[​](#example-config "Direct link to Example Config") profiles.yml ```yaml default: outputs: dev: type: athena s3_staging_dir: s3://dbt_demo_bucket/athena-staging/ s3_data_dir: s3://dbt_demo_bucket/dbt-data/ s3_data_naming: schema_table region_name: us-east-1 database: warehouse schema: dev aws_profile_name: demo threads: 4 num_retries: 3 target: dev ``` --- ##### Connect AWS Glue to dbt Core Community plugin Some core functionality may be limited. If you're interested in contributing, check out the source code for each repository listed below. * **Maintained by**: Community * **Authors**: Benjamin Menuet, Moshir Mikael, Armando Segnini and Amine El Mallem * **GitHub repo**: [aws-samples/dbt-glue](https://github.com/aws-samples/dbt-glue) [![](https://img.shields.io/github/stars/aws-samples/dbt-glue?style=for-the-badge)](https://github.com/aws-samples/dbt-glue) * **PyPI package**: `dbt-glue` [![](https://badge.fury.io/py/dbt-glue.svg)](https://badge.fury.io/py/dbt-glue) * **Slack channel**: [#db-glue](https://getdbt.slack.com/archives/C02R4HSMBAT) * **Supported dbt Core version**: v0.24.0 and newer * **dbt support**: Not Supported * **Minimum data platform version**: Glue 2.0 #### Installing dbt-glue Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation: `python -m pip install dbt-core dbt-glue` #### Configuring dbt-glue For AWS Glue-specific configuration, please refer to [AWS Glue configs.](https://docs.getdbt.com/reference/resource-configs/glue-configs.md) For further (and more likely up-to-date) info, see the [README](https://github.com/aws-samples/dbt-glue#readme). #### Connection Methods[​](#connection-methods "Direct link to Connection Methods") ##### Configuring your AWS profile for Glue Interactive Session[​](#configuring-your-aws-profile-for-glue-interactive-session "Direct link to Configuring your AWS profile for Glue Interactive Session") There are two IAM principals used with interactive sessions. * Client principal: The principal (either user or role) calling the AWS APIs (Glue, Lake Formation, Interactive Sessions) from the local client. This is the principal configured in the AWS CLI, and is typically the same identity you use to run dbt locally. * Service role: The IAM role that AWS Glue uses to execute your session. This is the same service role that AWS Glue ETL jobs use. Read [this documentation](https://docs.aws.amazon.com/glue/latest/dg/glue-is-security.html) to configure these principals. Below is a least-privilege policy that enables all features of the **`dbt-glue`** adapter.
Update the variables between **`<>`**; the arguments are explained below: | Args | Description | | ------------------- | ------------------------------------------------------------------------------------------------------------------------------ | | region | The region where your Glue database is stored | | AWS Account | The AWS account where you run your pipeline | | dbt output database | The database updated by dbt (this is the schema configured in the profiles.yml of your dbt environment) | | dbt source database | All databases used as source | | dbt output bucket | The bucket name where the data will be generated by dbt (the location configured in the profiles.yml of your dbt environment) | | dbt source bucket | The bucket name of source databases (if they are not managed by Lake Formation) | sample\_IAM\_Policy.yml ```yml { "Version": "2012-10-17", "Statement": [ { "Sid": "Read_and_write_databases", "Action": [ "glue:SearchTables", "glue:BatchCreatePartition", "glue:CreatePartitionIndex", "glue:DeleteDatabase", "glue:GetTableVersions", "glue:GetPartitions", "glue:DeleteTableVersion", "glue:UpdateTable", "glue:DeleteTable", "glue:DeletePartitionIndex", "glue:GetTableVersion", "glue:UpdateColumnStatisticsForTable", "glue:CreatePartition", "glue:UpdateDatabase", "glue:CreateTable", "glue:GetTables", "glue:GetDatabases", "glue:GetTable", "glue:GetDatabase", "glue:GetPartition", "glue:UpdateColumnStatisticsForPartition", "glue:CreateDatabase", "glue:BatchDeleteTableVersion", "glue:BatchDeleteTable", "glue:DeletePartition", "glue:GetUserDefinedFunctions", "lakeformation:ListResources", "lakeformation:BatchGrantPermissions", "lakeformation:ListPermissions", "lakeformation:GetDataAccess", "lakeformation:GrantPermissions", "lakeformation:RevokePermissions", "lakeformation:BatchRevokePermissions", "lakeformation:AddLFTagsToResource",
"lakeformation:RemoveLFTagsFromResource", "lakeformation:GetResourceLFTags", "lakeformation:ListLFTags", "lakeformation:GetLFTag" ], "Resource": [ "arn:aws:glue:<region>:<AWS Account>:catalog", "arn:aws:glue:<region>:<AWS Account>:table/<dbt output database>/*", "arn:aws:glue:<region>:<AWS Account>:database/<dbt output database>" ], "Effect": "Allow" }, { "Sid": "Read_only_databases", "Action": [ "glue:SearchTables", "glue:GetTableVersions", "glue:GetPartitions", "glue:GetTableVersion", "glue:GetTables", "glue:GetDatabases", "glue:GetTable", "glue:GetDatabase", "glue:GetPartition", "lakeformation:ListResources", "lakeformation:ListPermissions" ], "Resource": [ "arn:aws:glue:<region>:<AWS Account>:table/<dbt source database>/*", "arn:aws:glue:<region>:<AWS Account>:database/<dbt source database>", "arn:aws:glue:<region>:<AWS Account>:database/default", "arn:aws:glue:<region>:<AWS Account>:database/global_temp" ], "Effect": "Allow" }, { "Sid": "Storage_all_buckets", "Action": [ "s3:GetBucketLocation", "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::<dbt output bucket>", "arn:aws:s3:::<dbt source bucket>" ], "Effect": "Allow" }, { "Sid": "Read_and_write_buckets", "Action": [ "s3:PutObject", "s3:PutObjectAcl", "s3:GetObject", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::<dbt output bucket>/*" ], "Effect": "Allow" }, { "Sid": "Read_only_buckets", "Action": [ "s3:GetObject" ], "Resource": [ "arn:aws:s3:::<dbt source bucket>/*" ], "Effect": "Allow" } ] } ``` ##### Configuration of the local environment[​](#configuration-of-the-local-environment "Direct link to Configuration of the local environment") The **`dbt`** and **`dbt-glue`** adapters require Python 3.9 or higher, so first check your Python version: ```bash $ python3 --version ``` Configure a Python virtual environment to isolate package versions and code dependencies: ```bash $ sudo yum install git $ python3 -m venv dbt_venv $ source dbt_venv/bin/activate $ python3 -m pip install --upgrade pip ``` Install the latest version of the AWS CLI: ```bash $ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" $ unzip awscliv2.zip $ sudo ./aws/install ``` Install the boto3 package: ```bash $ sudo yum install gcc krb5-devel.x86_64 python3-devel.x86_64 -y $ pip3 install --upgrade boto3 ```
Install the package: ```bash $ pip3 install dbt-glue ``` ##### Example config[​](#example-config "Direct link to Example config") profiles.yml ```yml type: glue query-comment: This is a glue dbt example role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole region: us-east-1 workers: 2 worker_type: G.1X idle_timeout: 10 schema: "dbt_demo" session_provisioning_timeout_in_seconds: 120 location: "s3://dbt_demo_bucket/dbt_demo_data" ``` The table below describes all the options. | Option | Description | Mandatory | | ------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- | --------- | | project\_name | The dbt project name. This must be the same as the one configured in the dbt project. | yes | | type | The driver to use. | yes | | query-comment | A string to inject as a comment in each query that dbt runs. | no | | role\_arn | The ARN of the glue interactive session IAM role. | yes | | region | The AWS Region where you run the data pipeline. | yes | | workers | The number of workers of a defined workerType that are allocated when a job runs. | yes | | worker\_type | The type of predefined worker that is allocated when a job runs. Accepts a value of Standard, G.1X, or G.2X. | yes | | schema | The schema used to organize data stored in Amazon S3. It is also the database in AWS Lake Formation that stores metadata tables in the Data Catalog. | yes | | session\_provisioning\_timeout\_in\_seconds | The timeout in seconds for AWS Glue interactive session provisioning. | yes | | location | The Amazon S3 location of your target data. | yes | | query\_timeout\_in\_minutes | The timeout in minutes for a single query.
Default is 300 | no | | idle\_timeout | The AWS Glue session idle timeout in minutes. (The session stops after being idle for the specified amount of time) | no | | glue\_version | The version of AWS Glue for this session to use. Currently, the only valid options are 2.0 and 3.0. The default value is 3.0. | no | | security\_configuration | The security configuration to use with this session. | no | | connections | A comma-separated list of connections to use in the session. | no | | conf | Specific configuration used at the startup of the Glue Interactive Session (arg --conf) | no | | extra\_py\_files | Extra Python libraries that can be used by the interactive session. | no | | delta\_athena\_prefix | A prefix used to create Athena-compatible tables for Delta tables (if not specified, then no Athena-compatible table will be created) | no | | tags | The map of key-value pairs (tags) belonging to the session. Ex: `KeyName1=Value1,KeyName2=Value2` | no | | seed\_format | By default `parquet`; can be a Spark-compatible format like `csv` or `json` | no | | seed\_mode | By default `overwrite`, the seed data will be overwritten; set it to `append` if you just want to add new data to your dataset | no | | default\_arguments | The map of key-value pair parameters belonging to the session. More information on [Job parameters used by AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html). Ex: `--enable-continuous-cloudwatch-log=true,--enable-continuous-log-filter=true` | no | | glue\_session\_id | Reuse a Glue session to run multiple dbt run commands: set the ID of the Glue session to use | no | | glue\_session\_reuse | Reuse the Glue session to run multiple dbt run commands: If set to true, the Glue session will not be closed for re-use.
If set to false, the session will be closed | no | | datalake\_formats | The ACID data lake format to use when performing merges; can be `hudi`, `iceberg` or `delta` | no | #### Configs[​](#configs "Direct link to Configs") ##### Configuring tables[​](#configuring-tables "Direct link to Configuring tables") When materializing a model as `table`, you may include several optional configs that are specific to the dbt-glue plugin, in addition to the standard [model configs](https://docs.getdbt.com/reference/model-configs.md). | Option | Description | Required? | Example | | ---------------- | ------------------------------------------------------------------------------------------------------------------- | --------------------------------------- | ------------------------------------------ | | file\_format | The file format to use when creating tables (`parquet`, `csv`, `json`, `text`, `jdbc` or `orc`). | Optional | `parquet` | | partition\_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | `date_day` | | clustered\_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | `country_code` | | buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | `8` | | custom\_location | By default, the adapter will store your data in the following path: `location path`/`schema`/`table`. If you don't want to follow that default behaviour, you can use this parameter to set your own custom location on S3 | No | `s3://mycustombucket/mycustompath` | | hudi\_options | When using file\_format `hudi`, gives the ability to overwrite any of the default configuration options.
| Optional | `{'hoodie.schema.on.read.enable': 'true'}` | #### Incremental models[​](#incremental-models "Direct link to Incremental models") dbt seeks to offer useful and intuitive modeling abstractions by means of its built-in configurations and materializations. For that reason, the dbt-glue plugin leans heavily on the [`incremental_strategy` config](https://docs.getdbt.com/docs/build/incremental-models.md). This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of three values: * **`append`**: Insert new records without updating or overwriting any existing data. * **`insert_overwrite`**: If `partition_by` is specified, overwrite partitions in the table with new data. If no `partition_by` is specified, overwrite the entire table with new data. * **`merge`** (Apache Hudi and Apache Iceberg only): Match records based on a `unique_key`; update old records, and insert new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.) Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block. **Notes:** The default strategy is **`insert_overwrite`**. ##### The `append` strategy[​](#the-append-strategy "Direct link to the-append-strategy") Following the `append` strategy, dbt will perform an `insert into` statement with all new data. The appeal of this strategy is that it is straightforward and functional across all platforms, file types, connection methods, and Apache Spark versions. However, this strategy *cannot* update, overwrite, or delete existing data, so it is likely to insert duplicate records for many data sources.
###### Source code[​](#source-code "Direct link to Source code") ```sql {{ config( materialized='incremental', incremental_strategy='append', ) }} -- All rows returned by this query will be appended to the existing table select * from {{ ref('events') }} {% if is_incremental() %} where event_ts > (select max(event_ts) from {{ this }}) {% endif %} ``` ###### Run Code[​](#run-code "Direct link to Run Code") ```sql create temporary view spark_incremental__dbt_tmp as select * from analytics.events where event_ts > (select max(event_ts) from analytics.spark_incremental) ; insert into table analytics.spark_incremental select * from spark_incremental__dbt_tmp ``` ##### The `insert_overwrite` strategy[​](#the-insert_overwrite-strategy "Direct link to the-insert_overwrite-strategy") This strategy is most effective when specified alongside a `partition_by` clause in your model config. dbt will run an [atomic `insert overwrite` statement](https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-overwrite-table.html) that dynamically replaces all partitions included in your query. Be sure to re-select *all* of the relevant data for a partition when using this incremental strategy. If no `partition_by` is specified, then the `insert_overwrite` strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances since it minimizes downtime while the table contents are overwritten. The operation is comparable to running `truncate` + `insert` on other databases. For atomic replacement of Delta-formatted tables, use the `table` materialization (which runs `create or replace`) instead.
###### Source Code[​](#source-code-1 "Direct link to Source Code") ```sql {{ config( materialized='incremental', partition_by=['date_day'], file_format='parquet' ) }} /* Every partition returned by this query will be overwritten when this model runs */ with new_events as ( select * from {{ ref('events') }} {% if is_incremental() %} where date_day >= date_add(current_date, -1) {% endif %} ) select date_day, count(*) as users from new_events group by 1 ``` ###### Run Code[​](#run-code-1 "Direct link to Run Code") ```sql create temporary view spark_incremental__dbt_tmp as with new_events as ( select * from analytics.events where date_day >= date_add(current_date, -1) ) select date_day, count(*) as users from new_events group by 1 ; insert overwrite table analytics.spark_incremental partition (date_day) select `date_day`, `users` from spark_incremental__dbt_tmp ``` Specifying `insert_overwrite` as the incremental strategy is optional since it's the default strategy used when none is specified. ##### The `merge` strategy[​](#the-merge-strategy "Direct link to the-merge-strategy") **Compatibility:** * Hudi: OK * Delta Lake: OK * Iceberg: OK * Lake Formation Governed Tables: in progress NB: * For Glue 3: you have to set up a [Glue connector](https://docs.aws.amazon.com/glue/latest/ug/connectors-chapter.html). * For Glue 4: use the `datalake_formats` option in your profiles.yml When using a connector, be sure that your IAM role has these policies: ```text { "Sid": "access_to_connections", "Action": [ "glue:GetConnection", "glue:GetConnections" ], "Resource": [ "arn:aws:glue:<region>:<AWS Account>:catalog", "arn:aws:glue:<region>:<AWS Account>:connection/*" ], "Effect": "Allow" } ``` and that the managed policy `AmazonEC2ContainerRegistryReadOnly` is attached. Be sure that you follow the getting started instructions [here](https://docs.aws.amazon.com/glue/latest/ug/setting-up.html#getting-started-min-privs-connectors).
This [blog post](https://aws.amazon.com/blogs/big-data/part-1-integrate-apache-hudi-delta-lake-apache-iceberg-datasets-at-scale-aws-glue-studio-notebook/) also explains how to set up and work with Glue connectors.

###### Hudi[​](#hudi "Direct link to Hudi")

**Usage notes:** The `merge` incremental strategy with Hudi requires you to:

* Add `file_format: hudi` to your table configuration.
* Add `datalake_formats: hudi` to your profile.
* Alternatively, add a connection to your profile: `connections: name_of_your_hudi_connector`.
* Add the Kryo serializer to your Interactive Session Config (in your profile): `conf: spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.hive.convertMetastoreParquet=false`

dbt will run an [atomic `merge` statement](https://hudi.apache.org/docs/writing_data#spark-datasource-writer) which looks nearly identical to the default merge behavior on Snowflake and BigQuery. If a `unique_key` is specified (recommended), dbt will update old records with values from new records that match the key column. If a `unique_key` is not specified, dbt will forgo match criteria and simply insert all new records (similar to the `append` strategy).
###### Profile config example[​](#profile-config-example "Direct link to Profile config example")

```yaml
test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      conf: spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.hive.convertMetastoreParquet=false
      datalake_formats: hudi
```

###### Source Code example[​](#source-code-example "Direct link to Source Code example")

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='user_id',
    file_format='hudi',
    hudi_options={
        'hoodie.datasource.write.precombine.field': 'eventtime',
    }
) }}

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    user_id,
    max(date_day) as last_seen

from new_events
group by 1
```

###### Delta[​](#delta "Direct link to Delta")

You can also use Delta Lake to get `merge` support on tables.

**Usage notes:** The `merge` incremental strategy with Delta requires you to:

* Add `file_format: delta` to your table configuration.
* Add `datalake_formats: delta` to your profile.
* Alternatively, add a connection to your profile: `connections: name_of_your_delta_connector`.
* Add the following config to your Interactive Session Config (in your profile): `conf: "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"`

**Athena:** Athena is not compatible with Delta tables by default, but you can configure the adapter to create Athena tables on top of your Delta table.
To do so, you need to configure the following options in your profile:

* For Delta Lake 2.1.0, supported natively in Glue 4.0: `extra_py_files: "/opt/aws_glue_connectors/selected/datalake/delta-core_2.12-2.1.0.jar"`
* For Delta Lake 1.0.0, supported natively in Glue 3.0: `extra_py_files: "/opt/aws_glue_connectors/selected/datalake/delta-core_2.12-1.0.0.jar"`
* `delta_athena_prefix: "the_prefix_of_your_choice"`
* If your table is partitioned, the addition of new partitions is not automatic; you need to run `MSCK REPAIR TABLE your_delta_table` after each new partition is added.

###### Profile config example[​](#profile-config-example-1 "Direct link to Profile config example")

```yaml
test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      datalake_formats: delta
      conf: "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
      extra_py_files: "/opt/aws_glue_connectors/selected/datalake/delta-core_2.12-2.1.0.jar"
      delta_athena_prefix: "delta"
```

###### Source Code example[​](#source-code-example-1 "Direct link to Source Code example")

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='user_id',
    partition_by=['dt'],
    file_format='delta'
) }}

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    user_id,
    max(date_day) as last_seen,
    current_date() as dt

from new_events
group by 1
```

###### Iceberg[​](#iceberg "Direct link to Iceberg")

**Usage notes:** The `merge` incremental strategy with Iceberg requires you to:

* Attach the AmazonEC2ContainerRegistryReadOnly managed
policy to your execution role.
* Add the following policy to your execution role to enable commit locking in a DynamoDB table (more info [here](https://iceberg.apache.org/docs/latest/aws/#dynamodb-lock-manager)). Note that the DynamoDB table specified in the `Resource` field of this policy should be the one mentioned in your dbt profile (`--conf spark.sql.catalog.glue_catalog.lock.table=myGlueLockTable`). By default, this table is named `myGlueLockTable` and is created automatically (with on-demand pricing) when running a dbt-glue model with incremental materialization and the Iceberg file format. If you want to name the table differently, or to create your own table without letting Glue do it on your behalf, provide the `iceberg_glue_commit_lock_table` parameter with your table name (e.g. `MyDynamoDbTable`) in your dbt profile:

```yaml
iceberg_glue_commit_lock_table: "MyDynamoDbTable"
```

* The latest Iceberg connector in AWS Marketplace uses version 0.14.0 for Glue 3.0 and version 1.2.1 for Glue 4.0, where Kryo serialization fails when writing Iceberg. Use `org.apache.spark.serializer.JavaSerializer` for `spark.serializer` instead; more info [here](https://github.com/apache/iceberg/pull/546).

Make sure you update your conf with `--conf spark.sql.catalog.glue_catalog.lock.table=` set to your table name, and that you update the IAM permission below with the correct table name.
```text
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CommitLockTable",
            "Effect": "Allow",
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:BatchGetItem",
                "dynamodb:BatchWriteItem",
                "dynamodb:ConditionCheckItem",
                "dynamodb:PutItem",
                "dynamodb:DescribeTable",
                "dynamodb:DeleteItem",
                "dynamodb:GetItem",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:::table/myGlueLockTable"
        }
    ]
}
```

* Add `file_format: iceberg` to your table configuration.
* Add `datalake_formats: iceberg` to your profile.
* Alternatively, add connections to your profile: `connections: name_of_your_iceberg_connector`
  * For Athena version 3: the adapter is compatible with the Iceberg Connector from AWS Marketplace, with Glue 3.0 as the fulfillment option and 0.14.0 (Oct 11, 2022) as the software version.
  * For Athena version 2: the adapter is compatible with the Iceberg Connector from AWS Marketplace, with Glue 3.0 as the fulfillment option and 0.12.0-2 (Feb 14, 2022) as the software version.
* Add the following config to your Interactive Session Config (in your profile):

```text
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer
--conf spark.sql.warehouse=s3://
--conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
--conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
--conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
--conf spark.sql.catalog.glue_catalog.lock-impl=org.apache.iceberg.aws.dynamodb.DynamoDbLockManager
--conf spark.sql.catalog.glue_catalog.lock.table=myGlueLockTable
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
```

* For Glue 3.0, set `spark.sql.catalog.glue_catalog.lock-impl` to `org.apache.iceberg.aws.glue.DynamoLockManager` instead.

dbt will run an [atomic `merge` statement](https://iceberg.apache.org/docs/latest/spark-writes/) which looks nearly identical to the default merge behavior on Snowflake and BigQuery. You need to provide a `unique_key` to perform the merge operation, otherwise it will fail. This key must be provided as a Python list, and it can contain multiple column names to create a composite `unique_key`.

###### Notes[​](#notes "Direct link to Notes")

* When using a `custom_location` with Iceberg, avoid a final trailing slash. A final trailing slash leads to improper handling of the location, and to issues when reading the data from query engines like Trino. The issue should be fixed for Iceberg versions > 0.13; the related GitHub issue can be found [here](https://github.com/apache/iceberg/issues/4582).
* Iceberg also supports the `insert_overwrite` and `append` strategies.
* The `warehouse` conf must be provided, but it is overwritten by the adapter `location` in your profile or `custom_location` in the model configuration.
* By default, this materialization has `iceberg_expire_snapshots` set to `'True'`; if you need historical auditable changes, set `iceberg_expire_snapshots='False'`.
* Currently, due to dbt internals, the Iceberg catalog used when running Glue interactive sessions with dbt-glue has the hardcoded name `glue_catalog`. This name is an alias pointing to the AWS Glue Catalog, but it is specific to each session. If you want to interact with your data in another session without using dbt-glue (from a Glue Studio notebook, for example), you can configure another alias (i.e. another name for the Iceberg catalog).
To illustrate this concept, you can set in your configuration file:

```text
--conf spark.sql.catalog.RandomCatalogName=org.apache.iceberg.spark.SparkCatalog
```

And then run, in an AWS Glue Studio notebook, a session with the following config:

```text
--conf spark.sql.catalog.AnotherRandomCatalogName=org.apache.iceberg.spark.SparkCatalog
```

In both cases, the underlying catalog is the AWS Glue Catalog, unique in your AWS account and Region, and you are able to work with the exact same data. Also, if you change the name of the Glue Catalog alias, make sure you change it in every other `--conf` where it's used:

```text
--conf spark.sql.catalog.RandomCatalogName=org.apache.iceberg.spark.SparkCatalog
--conf spark.sql.catalog.RandomCatalogName.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
...
--conf spark.sql.catalog.RandomCatalogName.lock-impl=org.apache.iceberg.aws.glue.DynamoLockManager
```

* A full reference to `table_properties` can be found [here](https://iceberg.apache.org/docs/latest/configuration/).
* Iceberg tables are natively supported by Athena, so you can query tables created and operated with the dbt-glue adapter from Athena.
* Incremental materialization with the Iceberg file format supports dbt snapshots: you can run a `dbt snapshot` command that queries an Iceberg table and creates a dbt-fashioned snapshot of it.
###### Profile config example[​](#profile-config-example-2 "Direct link to Profile config example")

```yaml
test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      datalake_formats: iceberg
      conf: --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.warehouse=s3://aws-dbt-glue-datalake-1234567890-eu-west-1/dbt_test_project --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.glue_catalog.lock-impl=org.apache.iceberg.aws.dynamodb.DynamoDbLockManager --conf spark.sql.catalog.glue_catalog.lock.table=myGlueLockTable
```

###### Source Code example[​](#source-code-example-2 "Direct link to Source Code example")

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key=['user_id'],
    file_format='iceberg',
    iceberg_expire_snapshots='False',
    partition_by=['status'],
    table_properties={'write.target-file-size-bytes': '268435456'}
) }}

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    user_id,
    max(date_day) as last_seen

from new_events
group by 1
```

###### Iceberg Snapshot source code example[​](#iceberg-snapshot-source-code-example "Direct link to Iceberg Snapshot source code example")

#### Monitoring your Glue Interactive Session[​](#monitoring-your-glue-interactive-session "Direct link to Monitoring your Glue Interactive Session")

Monitoring is an important part of maintaining the reliability, availability, and performance of AWS Glue and your other AWS solutions. AWS provides monitoring tools that you can use to watch AWS Glue, identify the number of workers required for your Glue Interactive Session, report when something is wrong, and take action automatically when appropriate. AWS Glue provides the Spark UI, as well as CloudWatch logs and metrics, for monitoring your AWS Glue jobs. More information: [Monitoring AWS Glue Spark jobs](https://docs.aws.amazon.com/glue/latest/dg/monitor-spark.html)

**Usage notes:** Monitoring requires you to:

* Add the following IAM policy to your IAM role:

```text
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CloudwatchMetrics",
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "cloudwatch:namespace": "Glue"
                }
            }
        },
        {
            "Sid": "CloudwatchLogs",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "logs:CreateLogStream",
                "logs:CreateLogGroup",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:/aws-glue/*",
                "arn:aws:s3:::bucket-to-write-sparkui-logs/*"
            ]
        }
    ]
}
```

* Add monitoring parameters in your Interactive Session Config (in your profile).
More information on [Job parameters used by AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html)

###### Profile config example[​](#profile-config-example-3 "Direct link to Profile config example")

```yaml
test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      default_arguments: "--enable-metrics=true, --enable-continuous-cloudwatch-log=true, --enable-continuous-log-filter=true, --enable-spark-ui=true, --spark-event-logs-path=s3://bucket-to-write-sparkui-logs/dbt/"
```

If you want to use the Spark UI, you can launch the Spark history server using an AWS CloudFormation template that hosts the server on an EC2 instance, or launch it locally using Docker. More information: [Launching the Spark history server](https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-history.html#monitor-spark-ui-history-local)

#### Enabling AWS Glue Auto Scaling[​](#enabling-aws-glue-auto-scaling "Direct link to Enabling AWS Glue Auto Scaling")

Auto Scaling is available in AWS Glue version 3.0 and later. More information in the AWS blog post ["Introducing AWS Glue Auto Scaling: Automatically resize serverless computing resources for lower cost with optimized Apache Spark"](https://aws.amazon.com/blogs/big-data/introducing-aws-glue-auto-scaling-automatically-resize-serverless-computing-resources-for-lower-cost-with-optimized-apache-spark/).

With Auto Scaling enabled, you get the following benefits:

* AWS Glue automatically adds and removes workers from the cluster depending on the parallelism at each stage or microbatch of the job run.
* It removes the need for you to experiment and decide on the number of workers to assign for your AWS Glue Interactive Sessions.
* Once you choose the maximum number of workers, AWS Glue will choose the right-size resources for the workload.
* You can see how the size of the cluster changes during the Glue Interactive Session run by looking at CloudWatch metrics. More information: [Monitoring your Glue Interactive Session](#monitoring-your-glue-interactive-session).

**Usage notes:** AWS Glue Auto Scaling requires you to:

* Use AWS Glue version 3.0 or later.
* Set the maximum number of workers (when Auto Scaling is enabled, the `workers` parameter sets the maximum number of workers).
* Set the `--enable-auto-scaling=true` parameter in your Glue Interactive Session Config (in your profile). More information on [Job parameters used by AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html)

###### Profile config example[​](#profile-config-example-4 "Direct link to Profile config example")

```yaml
test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "3.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      default_arguments: "--enable-auto-scaling=true"
```

#### Access Glue catalog in another AWS account[​](#access-glue-catalog-in-another-aws-account "Direct link to Access Glue catalog in another AWS account")

In many cases, you may need to run your dbt jobs to read from another AWS account. Review the following link to set up access policies in the source and target accounts. Add `"spark.hadoop.hive.metastore.glue.catalogid="` to the conf in your dbt profile; this way, you can have multiple outputs, one for each of the accounts that you have access to.
Note: cross-account access needs to be within the same AWS Region.

###### Profile config example[​](#profile-config-example-5 "Direct link to Profile config example")

```yaml
test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "3.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      conf: "--conf hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory --conf spark.hadoop.hive.metastore.glue.catalogid="
```

#### Persisting model descriptions[​](#persisting-model-descriptions "Direct link to Persisting model descriptions")

Relation-level docs persistence is supported. For more information on configuring docs persistence, see [the docs](https://docs.getdbt.com/reference/resource-configs/persist_docs.md). When the `persist_docs` option is configured appropriately, you'll be able to see model descriptions in the `Comment` field of `describe [table] extended` or `show table extended in [database] like '*'`.

#### Always `schema`, never `database`[​](#always-schema-never-database "Direct link to always-schema-never-database")

Apache Spark uses the terms "schema" and "database" interchangeably. dbt understands `database` to exist at a higher level than `schema`. As such, you should *never* use or set `database` as a node config or in the target profile when running dbt-glue. If you want to control the schema/database in which dbt will materialize models, use the `schema` config and `generate_schema_name` macro *only*. For more information, check the dbt documentation about [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md).
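As a minimal sketch of the `schema`-only guidance above — the project name `my_project` and the `marts` subfolder are placeholders, not from this page — you would steer the target Glue database with the `schema` config alone:

```yaml
# dbt_project.yml — illustrative sketch; `my_project` and `marts` are placeholder names
models:
  my_project:
    marts:
      +schema: marts_glue   # dbt-glue: control the target database via `schema`...
      # ...and never set `+database` here or in the target profile
```

With a custom `generate_schema_name` macro, the same `schema` value can be rendered into whatever naming scheme your warehouse uses, which is why it is the only knob you should touch.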
#### AWS Lake Formation integration[​](#aws-lakeformation-integration "Direct link to AWS Lakeformation integration")

The adapter supports AWS Lake Formation tags management, enabling you to associate tags defined outside of dbt-glue with database objects built by dbt-glue (databases, tables, views, snapshots, incremental models, seeds).

* You can enable or disable lf-tags management via config, at the model and dbt-project level (disabled by default).
* If enabled, lf-tags will be updated on every dbt run. There are table-level and column-level lf-tags configs.
* You can drop existing database-, table-, and column-level Lake Formation tags by setting the `drop_existing` config field to `True` (`False` by default, meaning existing tags are kept).
* Note that if the tag you want to associate with the table does not exist, the dbt-glue execution will throw an error.

The adapter also supports AWS Lake Formation data cell filtering.

* You can enable or disable data cell filtering via config, at the model and dbt-project level (disabled by default).
* If enabled, `data_cell_filters` will be updated on every dbt run.
* You can drop existing table data cell filters by setting the `drop_existing` config field to `True` (`False` by default, meaning existing filters are kept).
* You can leverage the `excluded_column_names` **or** `column_names` config fields to perform column-level security as well. **Note that you can use one or the other, but not both.**
* By default, if you don't specify any `column_names` or `excluded_column_names`, dbt-glue does not perform column-level filtering and lets the principal access all the columns.
The below configuration lets the specified principal (the `lf-data-scientist` IAM user) access rows that have `customer_lifetime_value > 15`, and all of the columns specified (`customer_id`, `first_order`, `most_recent_order`, `number_of_orders`):

```sql
lf_grants={
    'data_cell_filters': {
        'enabled': True,
        'drop_existing' : True,
        'filters': {
            'the_name_of_my_filter': {
                'row_filter': 'customer_lifetime_value>15',
                'principals': ['arn:aws:iam::123456789:user/lf-data-scientist'],
                'column_names': ['customer_id', 'first_order', 'most_recent_order', 'number_of_orders']
            }
        },
    }
}
```

The below configuration lets the specified principal (the `lf-data-scientist` IAM user) access rows that have `customer_lifetime_value > 15`, and all the columns *except* the one specified (`first_name`):

```sql
lf_grants={
    'data_cell_filters': {
        'enabled': True,
        'drop_existing' : True,
        'filters': {
            'the_name_of_my_filter': {
                'row_filter': 'customer_lifetime_value>15',
                'principals': ['arn:aws:iam::123456789:user/lf-data-scientist'],
                'excluded_column_names': ['first_name']
            }
        },
    }
}
```

See below some examples of how you can integrate LF tags management and data cell filtering into your configurations:

###### At model level[​](#at-model-level "Direct link to At model level")

This way of defining your Lake Formation rules is appropriate if you want to handle the tagging and filtering policy at the object level. Remember that it overrides any configuration defined at the dbt-project level.
```sql
{{ config(
    materialized='incremental',
    unique_key="customer_id",
    incremental_strategy='append',
    lf_tags_config={
        'enabled': True,
        'drop_existing' : False,
        'tags_database': {
            'name_of_my_db_tag': 'value_of_my_db_tag'
        },
        'tags_table': {
            'name_of_my_table_tag': 'value_of_my_table_tag'
        },
        'tags_columns': {
            'name_of_my_lf_tag': {
                'value_of_my_tag': ['customer_id', 'customer_lifetime_value', 'dt']
            }
        }
    },
    lf_grants={
        'data_cell_filters': {
            'enabled': True,
            'drop_existing' : True,
            'filters': {
                'the_name_of_my_filter': {
                    'row_filter': 'customer_lifetime_value>15',
                    'principals': ['arn:aws:iam::123456789:user/lf-data-scientist'],
                    'excluded_column_names': ['first_name']
                }
            },
        }
    }
) }}

select
    customers.customer_id,
    customers.first_name,
    customers.last_name,
    customer_orders.first_order,
    customer_orders.most_recent_order,
    customer_orders.number_of_orders,
    customer_payments.total_amount as customer_lifetime_value,
    current_date() as dt

from customers

left join customer_orders using (customer_id)

left join customer_payments using (customer_id)
```

###### At dbt-project level[​](#at-dbt-project-level "Direct link to At dbt-project level")

This way, you can specify tags and data filtering policy for a particular path in your dbt project (e.g. models, seeds, models/model\_group1, etc.). This is especially useful for seeds, for which you can't define configuration in the file directly.

```yml
seeds:
  +lf_tags_config:
    enabled: true
    tags_table:
      name_of_my_table_tag: 'value_of_my_table_tag'
    tags_database:
      name_of_my_database_tag: 'value_of_my_database_tag'
models:
  +lf_tags_config:
    enabled: true
    drop_existing: True
    tags_database:
      name_of_my_database_tag: 'value_of_my_database_tag'
    tags_table:
      name_of_my_table_tag: 'value_of_my_table_tag'
```

#### Tests[​](#tests "Direct link to Tests")

To perform a functional test:

1. Install dev requirements:

```bash
$ pip3 install -r dev-requirements.txt
```

2. Install dev locally:

```bash
$ python3 setup.py build && python3 setup.py install_lib
```

3.
Export variables:

```bash
$ export DBT_S3_LOCATION=s3://mybucket/myprefix
$ export DBT_ROLE_ARN=arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
```

4. Run the tests:

```bash
$ python3 -m pytest tests/functional
```

For more information, check the dbt documentation about [testing a new adapter](https://docs.getdbt.com/guides/adapter-creation.md).

#### Caveats[​](#caveats "Direct link to Caveats")

##### Supported Functionality[​](#supported-functionality "Direct link to Supported Functionality")

Most dbt Core functionality is supported, but some features are only available with Apache Hudi.

Apache Hudi-only features:

1. Incremental model updates by `unique_key` instead of `partition_by` (see the [`merge` strategy](https://docs.getdbt.com/reference/resource-configs/glue-configs.md#the-merge-strategy))

Some dbt features available on the core adapters are not yet supported on Glue:

1. [Persisting](https://docs.getdbt.com/reference/resource-configs/persist_docs.md) column-level descriptions as database comments
2. [Snapshots](https://docs.getdbt.com/docs/build/snapshots.md)

---

##### Connect ClickHouse to dbt Core

Some core functionality may be limited. If you're interested in contributing, check out the source code for the repository listed below.
* **Maintained by**: Community
* **Authors**: Geoff Genz & Bentsi Leviav
* **GitHub repo**: [ClickHouse/dbt-clickhouse](https://github.com/ClickHouse/dbt-clickhouse) [![](https://img.shields.io/github/stars/ClickHouse/dbt-clickhouse?style=for-the-badge)](https://github.com/ClickHouse/dbt-clickhouse)
* **PyPI package**: `dbt-clickhouse` [![](https://badge.fury.io/py/dbt-clickhouse.svg)](https://badge.fury.io/py/dbt-clickhouse)
* **Slack channel**: [#db-clickhouse](https://getdbt.slack.com/archives/C01DRQ178LQ)
* **Supported dbt Core version**: v0.19.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-clickhouse

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`, because adapters and dbt Core versions have been decoupled from each other and we no longer want to overwrite existing dbt-core installations. Use the following command for installation:

`python -m pip install dbt-core dbt-clickhouse`

#### Configuring dbt-clickhouse

For ClickHouse-specific configuration, please refer to [ClickHouse configs](https://docs.getdbt.com/reference/resource-configs/clickhouse-configs.md).

#### Connecting to ClickHouse[​](#connecting-to-clickhouse "Direct link to Connecting to ClickHouse")

To connect to ClickHouse from dbt, you'll need to add a [profile](https://docs.getdbt.com/docs/local/profiles.yml.md) to your `profiles.yml` configuration file. Follow the reference configuration below to set up a ClickHouse profile:

profiles.yml

```yaml
clickhouse-service:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: [ default ] # ClickHouse database for dbt models
      # optional
      host: [ ] # Your ClickHouse cluster url, for example abc123.clickhouse.cloud. Defaults to `localhost`.
      port: [ 8123 ] # Defaults to 8123, 8443, 9000, or 9440 depending on the secure and driver settings
      user: [ default ] # User for all database operations
      password: [ ] # Password for the user
      secure: [ False ] # Use TLS (native protocol) or HTTPS (http protocol)
```

For a complete list of configuration options, see the [ClickHouse documentation](https://clickhouse.com/docs/integrations/dbt).

##### Create a dbt project[​](#create-a-dbt-project "Direct link to Create a dbt project")

You can now use this profile in one of your existing projects or create a new one using:

```sh
dbt init project_name
```

Navigate to the `project_name` directory and update your `dbt_project.yml` file to use the profile you configured to connect to ClickHouse.

```yaml
profile: 'clickhouse-service'
```

##### Test connection[​](#test-connection "Direct link to Test connection")

Execute `dbt debug` with the CLI tool to confirm whether dbt is able to connect to ClickHouse. Confirm the response includes `Connection test: [OK connection ok]`, indicating a successful connection.

#### Supported features[​](#supported-features "Direct link to Supported features")

##### dbt features[​](#dbt-features "Direct link to dbt features")

| Type | Supported? | Details |
| --------------------- | ---------- | -------------------------- |
| Contracts | YES | |
| Docs generate | YES | |
| Most dbt-utils macros | YES | (now included in dbt-core) |
| Seeds | YES | |
| Sources | YES | |
| Snapshots | YES | |
| Tests | YES | |

##### Materializations[​](#materializations "Direct link to Materializations")

| Type | Supported? | Details |
| --------------------------------------- | ----------------- | ------- |
| Table | YES | Creates a [table](https://clickhouse.com/docs/en/operations/system-tables/tables/). See below for the list of supported engines. |
| View | YES | Creates a [view](https://clickhouse.com/docs/en/sql-reference/table-functions/view/). |
| Incremental | YES | Creates a table if it doesn't exist, and then writes only updates to it. |
| Microbatch incremental | YES | |
| Ephemeral materialization | YES | Creates an ephemeral/CTE materialization. This model is internal to dbt and does not create any database objects. |
| Materialized View | YES, Experimental | Creates a [materialized view](https://clickhouse.com/docs/en/materialized-view). |
| Distributed table materialization | YES, Experimental | Creates a [distributed table](https://clickhouse.com/docs/en/engines/table-engines/special/distributed). |
| Distributed incremental materialization | YES, Experimental | Incremental model based on the same idea as distributed table. Note that not all strategies are supported; visit [this page](https://github.com/ClickHouse/dbt-clickhouse?tab=readme-ov-file#distributed-incremental-materialization) for more info. |
| Dictionary materialization | YES, Experimental | Creates a [dictionary](https://clickhouse.com/docs/en/engines/table-engines/special/dictionary). |

**Note**: Community-developed features are labeled as experimental. Despite this designation, many of these features, like materialized views, are widely adopted and successfully used in production environments.
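As a minimal sketch of the incremental materialization listed above — the model, source, and column names are placeholders, and the `engine`/`order_by` configs are taken from the dbt-clickhouse README, so verify them against the adapter docs for your version:

```sql
-- Hypothetical model: models/events_incremental.sql
-- `raw.events` and the column names are illustrative, not from this page.
{{ config(
    materialized='incremental',
    unique_key='event_id',
    engine='MergeTree()',    -- assumed dbt-clickhouse config option
    order_by=['event_ts']    -- assumed dbt-clickhouse config option
) }}

select
    event_id,
    event_ts,
    payload
from {{ source('raw', 'events') }}
{% if is_incremental() %}
  -- only pull rows newer than what's already in the target table
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

On the first run this creates the MergeTree table in full; on subsequent runs only rows past the current high-water mark are processed, with `unique_key` governing how duplicates are handled by the chosen incremental strategy.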
#### Documentation[​](#documentation "Direct link to Documentation")

See the [ClickHouse documentation](https://clickhouse.com/docs/integrations/dbt) for more details on using the `dbt-clickhouse` adapter to manage your data model.

#### Contributing[​](#contributing "Direct link to Contributing")

We welcome contributions from the community to help improve the `dbt-clickhouse` adapter. Whether you're fixing a bug, adding a new feature, or enhancing the documentation, your efforts are greatly appreciated! Please take a moment to read our [Contribution Guide](https://github.com/ClickHouse/dbt-clickhouse/blob/main/CONTRIBUTING.md) to get started. The guide provides detailed instructions on setting up your environment, running tests, and submitting pull requests.

---

##### Connect Cloudera Hive to dbt Core

* **Maintained by**: Cloudera
* **Authors**: Cloudera
* **GitHub repo**: [cloudera/dbt-hive](https://github.com/cloudera/dbt-hive) [![](https://img.shields.io/github/stars/cloudera/dbt-hive?style=for-the-badge)](https://github.com/cloudera/dbt-hive)
* **PyPI package**: `dbt-hive` [![](https://badge.fury.io/py/dbt-hive.svg)](https://badge.fury.io/py/dbt-hive)
* **Slack channel**: [#db-hive](https://getdbt.slack.com/archives/C0401DTNSKW)
* **Supported dbt Core version**: v1.1.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-hive

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`.
This is because adapter and dbt Core versions have been decoupled, so installing an adapter no longer overwrites an existing `dbt-core` installation. Use the following command for installation:

`python -m pip install dbt-core dbt-hive`

#### Configuring dbt-hive

For Hive-specific configuration, please refer to [Hive configs.](https://docs.getdbt.com/reference/resource-configs/hive-configs.md)

#### Connection Methods[​](#connection-methods "Direct link to Connection Methods")

dbt-hive can connect to Apache Hive and Cloudera Data Platform clusters. The [Impyla](https://github.com/cloudera/impyla/) library is used to establish connections to Hive.

dbt-hive supports two transport mechanisms:

* binary
* HTTP(S)

The default mechanism is `binary`. To use HTTP transport, set the boolean option `use_http_transport: true`.

#### Authentication Methods[​](#authentication-methods "Direct link to Authentication Methods")

dbt-hive supports two authentication mechanisms:

* [`insecure`](#insecure) No authentication is used; only recommended for testing.
* [`ldap`](#ldap) Authentication via LDAP

##### Insecure[​](#insecure "Direct link to Insecure")

This method is only recommended if you have a local install of Hive and want to test out the dbt-hive adapter.

\~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: localhost
      port: PORT # default value: 10000
      schema: SCHEMA_NAME
```

##### LDAP[​](#ldap "Direct link to LDAP")

LDAP allows you to authenticate with a username and password when Hive is [configured with LDAP Auth](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2). LDAP is supported over both the binary and HTTP connection mechanisms. This is the recommended authentication mechanism to use with Cloudera Data Platform (CDP).
\~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: HOST_NAME
      http_path: YOUR/HTTP/PATH # optional, http path to Hive; default value: None
      port: PORT # default value: 10000
      auth_type: ldap
      use_http_transport: BOOLEAN # default value: true
      use_ssl: BOOLEAN # TLS should always be used with LDAP to ensure secure transmission of credentials; default value: true
      user: USERNAME
      password: PASSWORD
      schema: SCHEMA_NAME
```

Note: When creating a workload user in CDP, make sure the user has CREATE, SELECT, ALTER, INSERT, UPDATE, DROP, INDEX, READ, and WRITE permissions. If you need the user to execute GRANT statements, you should also configure the appropriate GRANT permissions for them. When using Apache Ranger, permissions for allowing GRANT are typically set using the "Delegate Admin" option. For more information, see [`grants`](https://docs.getdbt.com/reference/resource-configs/grants.md) and [on-run-start & on-run-end](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md).

##### Kerberos[​](#kerberos "Direct link to Kerberos")

The Kerberos authentication mechanism uses GSSAPI to share Kerberos credentials when Hive is [configured with Kerberos Auth](https://ambari.apache.org/1.2.5/installing-hadoop-using-ambari/content/ambari-kerb-2-3-3.html).
\~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: HOSTNAME
      port: PORT # default value: 10000
      auth_type: GSSAPI
      kerberos_service_name: KERBEROS_SERVICE_NAME # default value: None
      use_http_transport: BOOLEAN # default value: true
      use_ssl: BOOLEAN # TLS should always be used to ensure secure transmission of credentials; default value: true
      schema: SCHEMA_NAME
```

Note: A typical setup of Cloudera Private Cloud involves the following steps to set up Kerberos before you can execute dbt commands:

* Get the correct realm config file for your installation (`krb5.conf`).
* Set an environment variable to point to the config file (`export KRB5_CONFIG=/path/to/krb5.conf`).
* Set the correct permissions for the config file (`sudo chmod 644 /path/to/krb5.conf`).
* Obtain a Kerberos ticket using `kinit`.
* The ticket is valid for a certain period, after which you will need to run `kinit` again to renew it.
* The user needs CREATE, DROP, and INSERT permissions on the schema provided in `profiles.yml`.

##### Instrumentation[​](#instrumentation "Direct link to Instrumentation")

By default, the adapter collects instrumentation events to help improve functionality and understand bugs. If you want to switch this off, for instance in a production environment, explicitly set the flag `usage_tracking: false` in your `profiles.yml` file.

#### Installation and Distribution[​](#installation-and-distribution "Direct link to Installation and Distribution")

dbt's adapter for Cloudera Hive is managed in its own repository, [dbt-hive](https://github.com/cloudera/dbt-hive). To use it, you must install the `dbt-hive` plugin.

##### Using pip[​](#using-pip "Direct link to Using pip")

The following command installs the latest version of `dbt-hive` as well as the requisite versions of `dbt-core` and the `impyla` driver used for connections.
```text
python -m pip install dbt-hive
```

##### Supported Functionality[​](#supported-functionality "Direct link to Supported Functionality")

| Name | Supported |
| ----------------------------------------------- | --------- |
| Materialization: Table | Yes |
| Materialization: View | Yes |
| Materialization: Incremental - Append | Yes |
| Materialization: Incremental - Insert+Overwrite | Yes |
| Materialization: Incremental - Merge | No |
| Materialization: Ephemeral | No |
| Seeds | Yes |
| Tests | Yes |
| Snapshots | No |
| Documentation | Yes |
| Authentication: LDAP | Yes |
| Authentication: Kerberos | Yes |

---

##### Connect Cloudera Impala to dbt Core

* **Maintained by**: Cloudera
* **Authors**: Cloudera
* **GitHub repo**: [cloudera/dbt-impala](https://github.com/cloudera/dbt-impala) [![](https://img.shields.io/github/stars/cloudera/dbt-impala?style=for-the-badge)](https://github.com/cloudera/dbt-impala)
* **PyPI package**: `dbt-impala` [![](https://badge.fury.io/py/dbt-impala.svg)](https://badge.fury.io/py/dbt-impala)
* **Slack channel**: [#db-impala](https://getdbt.slack.com/archives/C01PWAH41A5)
* **Supported dbt Core version**: v1.1.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-impala

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`.
This is because adapter and dbt Core versions have been decoupled, so installing an adapter no longer overwrites an existing `dbt-core` installation. Use the following command for installation:

`python -m pip install dbt-core dbt-impala`

#### Configuring dbt-impala

For Impala-specific configuration, please refer to [Impala configs.](https://docs.getdbt.com/reference/resource-configs/impala-configs.md)

#### Connection Methods[​](#connection-methods "Direct link to Connection Methods")

dbt-impala can connect to Apache Impala and Cloudera Data Platform clusters. The [Impyla](https://github.com/cloudera/impyla/) library is used to establish connections to Impala.

Two transport mechanisms are supported:

* binary
* HTTP(S)

The default mechanism is `binary`. To use HTTP transport, set the boolean option `use_http_transport: [true / false]`.

#### Authentication Methods[​](#authentication-methods "Direct link to Authentication Methods")

dbt-impala supports three authentication mechanisms:

* [`insecure`](#insecure) No authentication is used; only recommended for testing.
* [`ldap`](#ldap) Authentication via LDAP
* [`kerberos`](#kerberos) Authentication via Kerberos (GSSAPI)

##### Insecure[​](#insecure "Direct link to Insecure")

This method is only recommended if you have a local install of Impala and want to test out the dbt-impala adapter.

\~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: impala
      host: [host] # default value: localhost
      port: [port] # default value: 21050
      dbname: [db name] # this should be the same as the schema name provided below; starting with 1.1.2 this parameter is optional
      schema: [schema name]
```

##### LDAP[​](#ldap "Direct link to LDAP")

LDAP allows you to authenticate with a username and password when Impala is [configured with LDAP Auth](https://impala.apache.org/docs/build/html/topics/impala_ldap.html). LDAP is supported over both the binary and HTTP connection mechanisms.
This is the recommended authentication mechanism to use with Cloudera Data Platform (CDP).

\~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: impala
      host: [host name]
      http_path: [optional, http path to Impala]
      port: [port] # default value: 21050
      auth_type: ldap
      use_http_transport: [true / false] # default value: true
      use_ssl: [true / false] # TLS should always be used with LDAP to ensure secure transmission of credentials; default value: true
      user: [username]
      password: [password]
      dbname: [db name] # this should be the same as the schema name provided below; starting with 1.1.2 this parameter is optional
      schema: [schema name]
      retries: [retries] # number of times Impala attempts to retry the connection to the warehouse; default value: 3
```

Note: When creating a workload user in CDP, ensure that the user has CREATE, SELECT, ALTER, INSERT, UPDATE, DROP, INDEX, READ, and WRITE permissions. If the user needs to execute GRANT statements, the appropriate GRANT permissions should also be configured; see [`grants`](https://docs.getdbt.com/reference/resource-configs/grants.md) and [on-run-start & on-run-end](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end.md). When using Apache Ranger, permissions for allowing GRANT are typically set using the "Delegate Admin" option.

##### Kerberos[​](#kerberos "Direct link to Kerberos")

The Kerberos authentication mechanism uses GSSAPI to share Kerberos credentials when Impala is [configured with Kerberos Auth](https://impala.apache.org/docs/build/html/topics/impala_kerberos.html).
\~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: impala
      host: [hostname]
      port: [port] # default value: 21050
      auth_type: [GSSAPI]
      kerberos_service_name: [kerberos service name] # default value: None
      use_http_transport: true # default value: true
      use_ssl: true # TLS should always be used to ensure secure transmission of credentials; default value: true
      dbname: [db name] # this should be the same as the schema name provided below; starting with 1.1.2 this parameter is optional
      schema: [schema name]
      retries: [retries] # number of times Impala attempts to retry the connection to the warehouse; default value: 3
```

Note: A typical setup of Cloudera EDH involves the following steps to set up Kerberos before you can execute dbt commands:

* Get the correct realm config file for your installation (`krb5.conf`).
* Set an environment variable to point to the config file (`export KRB5_CONFIG=/path/to/krb5.conf`).
* Set the correct permissions for the config file (`sudo chmod 644 /path/to/krb5.conf`).
* Obtain a Kerberos ticket using `kinit`.
* The ticket is valid for a certain period, after which you will need to run `kinit` again to renew it.

##### Instrumentation[​](#instrumentation "Direct link to Instrumentation")

By default, the adapter sends instrumentation events to Cloudera to help improve functionality and understand bugs. If you want to switch this off, for instance in a production environment, explicitly set the flag `usage_tracking: false` in your `profiles.yml` file. Relatedly, if you'd like to turn off dbt Labs' anonymous usage tracking, see [YAML Configurations: Send anonymous usage stats](https://docs.getdbt.com/reference/global-configs/about-global-configs.md#send-anonymous-usage-stats) for more info.
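Rather than committing the plain-text `user` and `password` values shown in the LDAP profile above, you can reference environment variables with dbt's built-in `env_var` function, which also works inside `profiles.yml`. A minimal sketch — the variable names `DBT_IMPALA_USER` and `DBT_IMPALA_PASSWORD` are illustrative:

```yaml
      user: "{{ env_var('DBT_IMPALA_USER') }}"
      password: "{{ env_var('DBT_IMPALA_PASSWORD') }}"
```

dbt resolves these at runtime, so credentials stay out of version control; the variables must be set in the shell that invokes dbt.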
##### Supported Functionality[​](#supported-functionality "Direct link to Supported Functionality")

| Name | Supported |
| ----------------------------------------------- | --------- |
| Materialization: Table | Yes |
| Materialization: View | Yes |
| Materialization: Incremental - Append | Yes |
| Materialization: Incremental - Insert+Overwrite | Yes |
| Materialization: Incremental - Merge | No |
| Materialization: Ephemeral | No |
| Seeds | Yes |
| Tests | Yes |
| Snapshots | Yes |
| Documentation | Yes |
| Authentication: LDAP | Yes |
| Authentication: Kerberos | Yes |

---

##### Connect CrateDB to dbt Core

* **Maintained by**: Crate.io, Inc.
* **Authors**: CrateDB maintainers
* **GitHub repo**: [crate/dbt-cratedb2](https://github.com/crate/dbt-cratedb2) [![](https://img.shields.io/github/stars/crate/dbt-cratedb2?style=for-the-badge)](https://github.com/crate/dbt-cratedb2)
* **PyPI package**: `dbt-cratedb2` [![](https://badge.fury.io/py/dbt-cratedb2.svg)](https://badge.fury.io/py/dbt-cratedb2)
* **Slack channel**: [Community Forum](https://community.cratedb.com/)
* **Supported dbt Core version**: v1.0.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-cratedb2

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`.
This is because adapter and dbt Core versions have been decoupled, so installing an adapter no longer overwrites an existing `dbt-core` installation. Use the following command for installation:

`python -m pip install dbt-core dbt-cratedb2`

#### Configuring dbt-cratedb2

For CrateDB-specific configuration, please refer to [CrateDB configs.](https://docs.getdbt.com/reference/resource-configs/no-configs.md)

[CrateDB](https://cratedb.com/database) is compatible with PostgreSQL, so its dbt adapter builds directly on dbt-postgres, documented at [PostgreSQL profile setup](https://docs.getdbt.com/docs/local/connect-data-platform/postgres-setup). CrateDB targets are configured exactly the same way (see also [PostgreSQL configuration](https://docs.getdbt.com/reference/resource-configs/postgres-configs)), with just a few CrateDB-specific considerations. Relevant details are outlined at [using dbt with CrateDB](https://cratedb.com/docs/guide/integrate/dbt/), which also includes up-to-date information.

#### Profile configuration[​](#profile-configuration "Direct link to Profile configuration")

CrateDB targets should be set up using a configuration like this minimal sample of settings in your [`profiles.yml`](https://docs.getdbt.com/docs/local/profiles.yml) file.

\~/.dbt/profiles.yml

```yaml
cratedb_analytics:
  target: dev
  outputs:
    dev:
      type: cratedb
      host: [clustername].aks1.westeurope.azure.cratedb.net
      port: 5432
      user: [username]
      pass: [password]
      dbname: crate # Do not change this value. CrateDB's only catalog is `crate`.
      schema: doc # Define the schema name. CrateDB's default schema is `doc`.
```
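Because dbt-cratedb2 builds on dbt-postgres, standard profile keys apply on top of the minimal sample above. As a sketch, `threads` (a core dbt setting for concurrent model execution) and `connect_timeout` (a dbt-postgres connection option, assumed here to be inherited by the CrateDB adapter) can be added:

```yaml
cratedb_analytics:
  target: dev
  outputs:
    dev:
      type: cratedb
      host: [clustername].aks1.westeurope.azure.cratedb.net
      port: 5432
      user: [username]
      pass: [password]
      dbname: crate
      schema: doc
      threads: 4 # core dbt setting: number of models built concurrently
      connect_timeout: 10 # dbt-postgres option (seconds); assumed inherited by dbt-cratedb2
```

Check the dbt-postgres profile documentation for the full list of inherited connection options.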
---

##### Connect Databend Cloud to dbt Core

Vendor-supported plugin

Some [core functionality](https://github.com/databendcloud/dbt-databend#supported-features) may be limited. If you're interested in contributing, check out the source code repository listed below.

* **Maintained by**: Databend Cloud
* **Authors**: Shanjie Han
* **GitHub repo**: [databendcloud/dbt-databend](https://github.com/databendcloud/dbt-databend) [![](https://img.shields.io/github/stars/databendcloud/dbt-databend?style=for-the-badge)](https://github.com/databendcloud/dbt-databend)
* **PyPI package**: `dbt-databend-cloud` [![](https://badge.fury.io/py/dbt-databend-cloud.svg)](https://badge.fury.io/py/dbt-databend-cloud)
* **Supported dbt Core version**: v1.0.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-databend-cloud

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapter and dbt Core versions have been decoupled, so installing an adapter no longer overwrites an existing `dbt-core` installation.
Use the following command for installation:

`python -m pip install dbt-core dbt-databend-cloud`

#### Configuring dbt-databend-cloud

For Databend Cloud-specific configuration, please refer to [Databend Cloud configs.](https://docs.getdbt.com/reference/resource-configs/no-configs.md)

#### Connecting to Databend Cloud with **dbt-databend-cloud**[​](#connecting-to-databend-cloud-with-dbt-databend-cloud "Direct link to connecting-to-databend-cloud-with-dbt-databend-cloud")

##### User / Password Authentication[​](#user--password-authentication "Direct link to User / Password Authentication")

Configure your dbt profile for using Databend Cloud:

###### Databend Cloud connection profile[​](#databend-cloud-connection-profile "Direct link to Databend Cloud connection profile")

profiles.yml

```yaml
dbt-databend-cloud:
  target: dev
  outputs:
    dev:
      type: databend
      host: databend-cloud-host
      port: 443
      schema: database_name
      user: username
      pass: password
```

###### Description of Profile Fields[​](#description-of-profile-fields "Direct link to Description of Profile Fields")

| Option | Description | Required? | Example |
| ------ | -------------------------------------------------- | --------- | --------------------------- |
| type | The specific adapter to use | Required | `databend` |
| host | The host (hostname) to connect to | Required | `yourorg.datafusecloud.com` |
| port | The port to use | Required | `443` |
| schema | Specify the schema (database) to build models into | Required | `default` |
| user | The username to use to connect to the host | Required | `dbt_admin` |
| pass | The password to use for authenticating to the host | Required | `awesome_password` |

#### Database User Privileges[​](#database-user-privileges "Direct link to Database User Privileges")

Your database user needs privileges to read and write, such as `SELECT`, `CREATE`, and so on.
You can find some help [here](https://docs.databend.com/using-databend-cloud/warehouses/connecting-a-warehouse) with Databend Cloud privileges management.

| Required Privilege |
| ---------------------- |
| SELECT |
| CREATE |
| CREATE TEMPORARY TABLE |
| CREATE VIEW |
| INSERT |
| DROP |
| SHOW DATABASE |
| SHOW VIEW |
| SUPER |

#### Supported features[​](#supported-features "Direct link to Supported features")

| Supported | Feature |
| --------- | --------------------------- |
| ✅ | Table materialization |
| ✅ | View materialization |
| ✅ | Incremental materialization |
| ❌ | Ephemeral materialization |
| ✅ | Seeds |
| ✅ | Sources |
| ✅ | Custom data tests |
| ✅ | Docs generate |
| ❌ | Snapshots |
| ✅ | Connection retry |

**Note:**

* Databend does not support `Ephemeral` materializations or `Snapshots`. You can find more detail [here](https://github.com/datafuselabs/databend/issues/8685).
---

##### Connect Databricks Lakebase to dbt Core

* **Maintained by**: dbt Labs
* **Authors**: dbt Labs
* **GitHub repo**: [dbt-labs/dbt-adapters](https://github.com/dbt-labs/dbt-adapters) [![](https://img.shields.io/github/stars/dbt-labs/dbt-adapters?style=for-the-badge)](https://github.com/dbt-labs/dbt-adapters)
* **PyPI package**: `dbt-postgres` [![](https://badge.fury.io/py/dbt-postgres.svg)](https://badge.fury.io/py/dbt-postgres)
* **Slack channel**: [#db-postgres](https://getdbt.slack.com/archives/C0172G2E273)
* **Supported dbt Core version**: v1.0.0 and newer
* **dbt support**: Supported
* **Minimum data platform version**: ?

#### Installing dbt-postgres

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapter and dbt Core versions have been decoupled, so installing an adapter no longer overwrites an existing `dbt-core` installation. Use the following command for installation:

`python -m pip install dbt-core dbt-postgres`

#### Configuring dbt-postgres

For Lakebase-specific configuration, please refer to [Lakebase configs.](https://docs.getdbt.com/reference/resource-configs/postgres-configs.md)

#### Profile Configuration[​](#profile-configuration "Direct link to Profile Configuration")

Databricks Lakebase targets are configured exactly the same as [Postgres targets](https://docs.getdbt.com/docs/local/connect-data-platform/postgres-setup.md#profile-configuration). Use these key parameters to connect to Databricks Lakebase:

* `host name`: Found in **Databricks** > **Compute** > **Database instances** > **Connect with PSQL**, using the format `instance-123abcdef456.database.cloud.databricks.com`
* `database name`: Use `databricks_postgres` by default
* Authentication: dbt-postgres only supports username/password.
You can generate a username/password by [enabling Native Postgres Role Login](https://docs.databricks.com/aws/en/oltp/oauth?language=UI#authenticate-with-databricks-identities) and using the role name as the username. To learn more about managing Postgres roles and privileges, check out the [docs](https://docs.databricks.com/aws/en/oltp/pg-roles#create-postgres-roles-and-grant-privileges-for-databricks-identities). Alternatively, you can [generate an OAuth token](https://docs.databricks.com/aws/en/oltp/oauth?language=UI#authenticate-with-databricks-identities) to use with your Databricks username; the token needs to be refreshed every hour.

---

##### Connect Decodable to dbt Core

Community plugin

Some core functionality may be limited. If you're interested in contributing, see the source code for the repository listed below.

* **Maintained by**: Decodable
* **Authors**: Decodable Team
* **GitHub repo**: [decodableco/dbt-decodable](https://github.com/decodableco/dbt-decodable) [![](https://img.shields.io/github/stars/decodableco/dbt-decodable?style=for-the-badge)](https://github.com/decodableco/dbt-decodable)
* **PyPI package**: `dbt-decodable` [![](https://badge.fury.io/py/dbt-decodable.svg)](https://badge.fury.io/py/dbt-decodable)
* **Slack channel**: [#general](https://decodablecommunity.slack.com)
* **Supported dbt Core version**: 1.3.1 and newer
* **dbt support**: Not supported
* **Minimum data platform version**: n/a

#### Installing dbt-decodable

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies.
Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation: `python -m pip install dbt-core dbt-decodable` #### Configuring dbt-decodable For Decodable-specific configuration, please refer to [Decodable configs.](https://docs.getdbt.com/reference/resource-configs/no-configs.md) #### Connecting to Decodable with **dbt-decodable**[​](#connecting-to-decodable-with-dbt-decodable "Direct link to connecting-to-decodable-with-dbt-decodable") Do the following steps to connect to Decodable with dbt. ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") In order to properly connect to Decodable, you must have the Decodable CLI installed and have used it to login to Decodable at least once. See [Install the Decodable CLI](https://docs.decodable.co/docs/setup#install-the-cli-command-line-interface) for more information. ##### Steps[​](#steps "Direct link to Steps") To connect to Decodable with dbt, you'll need to add a Decodable profile to your `profiles.yml` file. A Decodable profile has the following fields. \~/.dbt/profiles.yml ```yaml dbt-decodable: target: dev outputs: dev: type: decodable database: None schema: None account_name: [your account] profile_name: [name of the profile] materialize_tests: [true | false] timeout: [ms] preview_start: [earliest | latest] local_namespace: [namespace prefix] ``` ###### Description of Profile Fields[​](#description-of-profile-fields "Direct link to Description of Profile Fields") | Option | Description | Required? 
| Example | | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------------------------------ | | type | The specific adapter to use | Required | `decodable` | | database | Required but unused by this adapter. | Required | | | schema | Required but unused by this adapter. | Required | | | account\_name | The name of your Decodable account. | Required | `my_awesome_decodable_account` | | profile\_name | The name of your Decodable profile. | Required | `my_awesome_decodable_profile` | | materialize\_tests | Specify whether to materialize tests as a pipeline/stream pair. Defaults to false. | Optional | `false` | | timeout | The amount of time, in milliseconds, that a preview request runs. Defaults to 60000. | Optional | `60000` | | preview\_start | Specify where preview should start reading data from. If set to `earliest`, then preview will start reading from the earliest record possible. If set to `latest`, preview will start reading from the latest record. Defaults to `earliest`. | Optional | `latest` | | local\_namespace | Specify a prefix to add to all entities created on Decodable. Defaults to `none`, meaning that no prefix is added. | Optional | `none` | Search table... | | | | | | | ---------------- | - | - | - | - | | Loading table... 
#### Supported features[​](#supported-features "Direct link to Supported features")

| Name | Supported | Notes |
| --- | --- | --- |
| Table materialization | Yes | Only table materializations are supported. A dbt table model translates to a pipeline/stream pair in Decodable, both sharing the same name. Pipelines for models are automatically activated upon materialization. To materialize your models, run the `dbt run` command, which does the following:<br>1. Create a stream with the model's name and a schema inferred by Decodable from the model's SQL.<br>2. Create a pipeline that inserts the SQL's results into the newly created stream.<br>3. Activate the pipeline.<br>By default, the adapter does not tear down and recreate the model on Decodable if no changes to the model have been detected. Invoking dbt with the `--full-refresh` flag, or setting that configuration option for a specific model, causes the corresponding resources on Decodable to be destroyed and rebuilt from scratch. |
| View materialization | No | |
| Incremental materialization | No | |
| Ephemeral materialization | No | |
| Seeds | Yes | Running the `dbt seed` command performs the following steps for each specified seed:<br>1. Create a REST connection and an associated stream with the same name as the seed.<br>2. Activate the connection.<br>3. Send the data stored in the seed’s `.csv` file to the connection as events.<br>4. Deactivate the connection.<br>After the `dbt seed` command has finished running, you can access the seed's data on the newly created stream. |
| Tests | Yes | The `dbt test` command behaves differently depending on the `materialize_tests` option set for the specified target.<br>If `materialize_tests = false`, tests are only run after the preview job has completed and returned results. How long a preview job runs and which records it returns are controlled by the `timeout` and `preview_start` configurations, respectively.<br>If `materialize_tests = true`, dbt persists the specified tests as pipeline/stream pairs in Decodable. Use this configuration to allow for continuous testing of your models. You can run a preview on the created stream with the Decodable CLI or web interface to monitor the results. |
| Sources | No | Sources in dbt correspond to Decodable source connections. However, the `dbt source` command is not supported. |
| Docs generate | No | For details about your models, check your Decodable account. |
| Snapshots | No | Snapshots and the `dbt snapshot` command are not supported. |

#### Additional operations[​](#additional-operations "Direct link to Additional operations")

`dbt-decodable` provides a set of commands for managing the project’s resources on Decodable. These commands can be run using `dbt run-operation {name} --args {args}`. For example, the following command runs the `delete_streams` operation:

```text
dbt run-operation delete_streams --args '{streams: [stream1, stream2], skip_errors: True}'
```

**stop\_pipelines(pipelines)**

* `pipelines`: An optional list of pipeline names to deactivate. Defaults to none.

Deactivates all pipelines for resources defined within the project. If the `pipelines` argument is provided, only the specified pipelines are deactivated.
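As with the `delete_streams` example above, `stop_pipelines` follows the same `dbt run-operation` pattern. A hypothetical invocation that deactivates a single pipeline (the pipeline name is illustrative) might look like:

```text
dbt run-operation stop_pipelines --args '{pipelines: [my_model]}'
```

Omitting the `--args` flag deactivates every pipeline defined within the project.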

**delete\_pipelines(pipelines)**

* `pipelines`: An optional list of pipeline names to delete. Defaults to none.

Deletes all pipelines for resources defined within the project. If the `pipelines` argument is provided, only the specified pipelines are deleted.
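Note that `delete_pipelines` removes only the pipeline half of a model's pipeline/stream pair. Since `delete_streams` (below) skips streams that still have an associated pipeline, one way to remove both halves for a single model without running a full `cleanup` is to chain the two operations; this is a hedged sketch with an illustrative model name:

```text
dbt run-operation delete_pipelines --args '{pipelines: [my_model]}'
dbt run-operation delete_streams --args '{streams: [my_model], skip_errors: False}'
```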

**delete\_streams(streams, skip\_errors)**

* `streams`: An optional list of stream names to delete. Defaults to none.
* `skip_errors`: Specify whether to treat errors as warnings. When set to true, any stream deletion failures are reported as warnings. When set to false, the operation stops when a stream cannot be deleted. Defaults to true.

Deletes all streams for resources defined within the project. If a pipeline is associated with a stream, neither the pipeline nor the stream is deleted. See the `cleanup` operation for a complete removal of stream/pipeline pairs.

**cleanup(list, models, seeds, tests)**

* `list`: An optional list of resource entity names to delete. Defaults to none.
* `models`: Specify whether to include models during cleanup. Defaults to true.
* `seeds`: Specify whether to include seeds during cleanup. Defaults to true.
* `tests`: Specify whether to include tests during cleanup. Defaults to true.

Deletes all Decodable entities resulting from the materialization of the project’s resources, that is, connections, streams, and pipelines. If the `list` argument is provided, only the specified resource entities are deleted. The `models`, `seeds`, and `tests` arguments control whether entities for those resource types are included in the cleanup. Tests that have not been materialized are not included in the cleanup.

---

##### Connect DeltaStream to dbt Core

* **Maintained by**: Community
* **Authors**: DeltaStream Team
* **GitHub repo**: [deltastreaminc/dbt-deltastream](https://github.com/deltastreaminc/dbt-deltastream) [![](https://img.shields.io/github/stars/deltastreaminc/dbt-deltastream?style=for-the-badge)](https://github.com/deltastreaminc/dbt-deltastream)
* **PyPI package**: `dbt-deltastream` [![](https://badge.fury.io/py/dbt-deltastream.svg)](https://badge.fury.io/py/dbt-deltastream)
* **Slack channel**: [#db-deltastream]()
* **Supported dbt Core version**: v1.10.0 and newer
* **dbt support**: Not supported
* **Minimum data platform version**: ?

#### Installing dbt-deltastream

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation: `python -m pip install dbt-core dbt-deltastream`

#### Configuring dbt-deltastream

For DeltaStream-specific configuration, please refer to [DeltaStream configs.](https://docs.getdbt.com/reference/resource-configs/deltastream-configs.md)

#### Connecting to DeltaStream with **dbt-deltastream**[​](#connecting-to-deltastream-with-dbt-deltastream "Direct link to connecting-to-deltastream-with-dbt-deltastream")

To connect to DeltaStream from dbt, you'll need to add a [profile](https://docs.getdbt.com/docs/local/profiles.yml.md) to your `profiles.yml` file. A DeltaStream profile conforms to the following syntax, where `<profile-name>` and `<target-name>` are placeholders for names of your choosing:

profiles.yml

```yaml
<profile-name>:
  target: <target-name>
  outputs:
    <target-name>:
      type: deltastream
      # Required parameters
      token: [ your-api-token ] # Authentication token for DeltaStream API
      database: [ your-database ] # Target database name
      schema: [ your-schema ] # Target schema name
      organization_id: [ your-org-id ] # Organization identifier
      # Optional parameters
      url: [ https://api.deltastream.io/v2 ] # DeltaStream API URL, defaults to https://api.deltastream.io/v2
      timezone: [ UTC ] # Timezone for operations, defaults to UTC
      session_id: [ ] # Custom session identifier for debugging purposes
      role: [ ] # User role
      store: [ ] # Target store name
      compute_pool: [ ] # Compute pool to use, if any; otherwise the default compute pool is used
```

##### Description of DeltaStream profile fields[​](#description-of-deltastream-profile-fields "Direct link to Description of DeltaStream profile fields")

| Field | Required | Description |
| --- | --- | --- |
| `type` | ✅ | This must be included either in `profiles.yml` or in the `dbt_project.yml` file. Must be set to `deltastream`. |
| `token` | ✅ | Authentication token for DeltaStream API. This should be stored securely, preferably as an environment variable. |
| `database` | ✅ | Target default database name in DeltaStream where your dbt models will be created. |
| `schema` | ✅ | Target default schema name within the specified database. |
| `organization_id` | ✅ | Organization identifier that determines which DeltaStream organization you're connecting to. |
| `url` | ❌ | DeltaStream API URL. Defaults to `https://api.deltastream.io/v2` if not specified. |
| `timezone` | ❌ | Timezone for operations. Defaults to `UTC` if not specified. |
| `session_id` | ❌ | Custom session identifier for debugging purposes. Helps track operations in DeltaStream logs. |
| `role` | ❌ | User role within the DeltaStream organization. If not specified, uses the default role associated with the token. |
| `store` | ❌ | Target default store name. Stores represent external system connections (Kafka, PostgreSQL, etc.) in DeltaStream. |
| `compute_pool` | ❌ | Compute pool name to be used for models that require computational resources. If not specified, uses the default compute pool. |
#### Security best practices[​](#security-best-practices "Direct link to Security best practices")

When configuring your project for production, it is strongly recommended to use environment variables to store sensitive information such as the authentication token:

profiles.yml

```yaml
your_profile_name:
  target: prod
  outputs:
    prod:
      type: deltastream
      token: "{{ env_var('DELTASTREAM_API_TOKEN') }}"
      database: "{{ env_var('DELTASTREAM_DATABASE') }}"
      schema: "{{ env_var('DELTASTREAM_SCHEMA') }}"
      organization_id: "{{ env_var('DELTASTREAM_ORG_ID') }}"
```

#### Troubleshooting connections[​](#troubleshooting-connections "Direct link to Troubleshooting connections")

If you encounter issues connecting to DeltaStream from dbt, verify the following:

##### Authentication issues[​](#authentication-issues "Direct link to Authentication issues")

* Ensure your API token is valid and has not expired
* Verify the token has appropriate permissions for the target organization
* Check that the `organization_id` matches your DeltaStream organization
---

##### Connect Doris to dbt Core

* **Maintained by**: SelectDB
* **Authors**: catpineapple, JNSimba
* **GitHub repo**: [selectdb/dbt-doris](https://github.com/selectdb/dbt-doris) [![](https://img.shields.io/github/stars/selectdb/dbt-doris?style=for-the-badge)](https://github.com/selectdb/dbt-doris)
* **PyPI package**: `dbt-doris` [![](https://badge.fury.io/py/dbt-doris.svg)](https://badge.fury.io/py/dbt-doris)
* **Slack channel**: [#db-doris](https://www.getdbt.com/community)
* **Supported dbt Core version**: v1.3.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**:

#### Installing dbt-doris

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation: `python -m pip install dbt-core dbt-doris`

#### Configuring dbt-doris

For Apache Doris / SelectDB-specific configuration, please refer to [Apache Doris / SelectDB configs.](https://docs.getdbt.com/reference/resource-configs/doris-configs.md)

#### Connecting to Doris/SelectDB with **dbt-doris**[​](#connecting-to-dorisselectdb-with-dbt-doris "Direct link to connecting-to-dorisselectdb-with-dbt-doris")

##### User / Password Authentication[​](#user--password-authentication "Direct link to User / Password Authentication")

Configure your dbt profile for using Doris:

###### Doris connection profile[​](#doris-connection-profile "Direct link to Doris connection profile")

profiles.yml

```yaml
dbt-doris:
  target: dev
  outputs:
    dev:
      type: doris
      host: 127.0.0.1
      port: 9030
      schema: database_name
      username: username
      password: password
```

###### Description of Profile Fields[​](#description-of-profile-fields "Direct link to Description of Profile Fields")

| Option | Description | Required? | Example |
| --- | --- | --- | --- |
| type | The specific adapter to use | Required | `doris` |
| host | The hostname to connect to | Required | `127.0.0.1` |
| port | The port to use | Required | `9030` |
| schema | Specify the schema (database) to build models into. Doris does not have a schema as a separate collection of tables or views the way PostgreSQL does. | Required | `dbt` |
| username | The username to use to connect to the Doris server | Required | `root` |
| password | The password to use for authenticating to the Doris server | Required | `password` |

#### Database User Privileges[​](#database-user-privileges "Direct link to Database User Privileges")

Your Doris/SelectDB database user needs the following privileges to read and write.
You can find help with Doris privilege management [here](https://doris.apache.org/docs/admin-manual/privilege-ldap/user-privilege).

| Required Privilege |
| --- |
| Select\_priv |
| Load\_priv |
| Alter\_priv |
| Create\_priv |
| Drop\_priv |

---

##### Connect Dremio to dbt Core

* **Maintained by**: Dremio
* **Authors**: Dremio
* **GitHub repo**: [dremio/dbt-dremio](https://github.com/dremio/dbt-dremio) [![](https://img.shields.io/github/stars/dremio/dbt-dremio?style=for-the-badge)](https://github.com/dremio/dbt-dremio)
* **PyPI package**: `dbt-dremio` [![](https://badge.fury.io/py/dbt-dremio.svg)](https://badge.fury.io/py/dbt-dremio)
* **Slack channel**: [#db-dremio](https://getdbt.slack.com/archives/C049G61TKBK)
* **Supported dbt Core version**: v1.8.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: Dremio 22.0

#### Installing dbt-dremio

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation: `python -m pip install dbt-core dbt-dremio`

#### Configuring dbt-dremio

For Dremio-specific configuration, please refer to [Dremio configs.](https://docs.getdbt.com/reference/resource-configs/no-configs.md)

Follow the repository's link for OS dependencies.

note

[Model contracts](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md) are not supported.

#### Prerequisites for Dremio Cloud[​](#prerequisites-for-dremio-cloud "Direct link to Prerequisites for Dremio Cloud")

Before connecting from your project to Dremio Cloud, follow these prerequisite steps:

* Ensure that you have the ID of the Sonar project that you want to use. See [Obtaining the ID of a Project](https://docs.dremio.com/cloud/cloud-entities/projects/#obtaining-the-id-of-a-project).
* Ensure that you have a personal access token (PAT) for authenticating to Dremio Cloud. See [Creating a Token](https://docs.dremio.com/cloud/security/authentication/personal-access-token/#creating-a-token).
* Ensure that Python 3.9.x or later is installed on the system that you are running dbt on.

#### Prerequisites for Dremio Software[​](#prerequisites-for-dremio-software "Direct link to Prerequisites for Dremio Software")

* Ensure that you are using version 22.0 or later.
* Ensure that Python 3.9.x or later is installed on the system that you are running dbt on.
* If you want to use TLS to secure the connection between dbt and Dremio Software, configure full wire encryption in your Dremio cluster. For instructions, see [Configuring Wire Encryption](https://docs.dremio.com/software/deployment/wire-encryption-config/).

#### Initializing a Project[​](#initializing-a-project "Direct link to Initializing a Project")

1. Run the command `dbt init `.
2. Select `dremio` as the database to use.
3.
Select one of these options to generate a profile for your project:
* `dremio_cloud` for working with Dremio Cloud
* `software_with_username_password` for working with a Dremio Software cluster and authenticating to the cluster with a username and a password
* `software_with_pat` for working with a Dremio Software cluster and authenticating to the cluster with a personal access token

Next, configure the profile for your project.

#### Profiles[​](#profiles "Direct link to Profiles")

When you initialize a project, you create one of these three profiles. You must configure it before trying to connect to Dremio Cloud or Dremio Software.

* Profile for Dremio Cloud
* Profile for Dremio Software with Username/Password Authentication
* Profile for Dremio Software with Authentication Through a Personal Access Token

For descriptions of the configurations in these profiles, see [Configurations](#configurations).

Cloud:

```yaml
[project name]:
  outputs:
    dev:
      cloud_host: api.dremio.cloud
      cloud_project_id: [project ID]
      object_storage_source: [name]
      object_storage_path: [path]
      dremio_space: [name]
      dremio_space_folder: [path]
      pat: [personal access token]
      threads: [integer >= 1]
      type: dremio
      use_ssl: true
      user: [email address]
  target: dev
```

Software (Username/Password):

```yaml
[project name]:
  outputs:
    dev:
      password: [password]
      port: [port]
      software_host: [hostname or IP address]
      object_storage_source: [name]
      object_storage_path: [path]
      dremio_space: [name]
      dremio_space_folder: [path]
      threads: [integer >= 1]
      type: dremio
      use_ssl: [true|false]
      user: [username]
  target: dev
```

Software (Personal Access Token):

```yaml
[project name]:
  outputs:
    dev:
      pat: [personal access token]
      port: [port]
      software_host: [hostname or IP address]
      object_storage_source: [name]
      object_storage_path: [path]
      dremio_space: [name]
      dremio_space_folder: [path]
      threads: [integer >= 1]
      type: dremio
      use_ssl: [true|false]
      user: [username]
  target: dev
```

#### Configurations Common to Profiles for Dremio Cloud and
Dremio Software[​](#configurations-common-to-profiles-for-dremio-cloud-and-dremio-software "Direct link to Configurations Common to Profiles for Dremio Cloud and Dremio Software") | Configuration | Required? | Default Value | Description | | ----------------------- | --------- | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `type` | Yes | dremio | Auto-populated when creating a Dremio project. Do not change this value. | | `threads` | Yes | 1 | The number of threads the dbt project runs on. | | `object_storage_source` | No | $scratch | The name of the filesystem in which to create tables, materialized views, tests, and other objects. The dbt alias is `datalake`. This name corresponds to the name of a source in the **Object Storage** section of the Datasets page in Dremio, which is "Samples" in the following image: ![dbt samples path](/assets/images/dbt-Samples-91e0c8e2d2bf58047d38b76202049e8a.png) | | `object_storage_path` | No | `no_schema` | The path in the filesystem in which to create objects. The default is the root level of the filesystem. The dbt alias is `root_path`. Nested folders in the path are separated with periods. This value corresponds to the path in this location in the Datasets page in Dremio, which is "samples.dremio.com.Dremio University" in the following image: ![dbt samples path](/assets/images/dbt-SamplesPath-b7c23597a4bbb05010de7a9a1d938da4.png) | | `dremio_space` | No | `@\` | The value of the Dremio space in which to create views. The dbt alias is `database`. 
This value corresponds to the name in this location in the **Spaces** section of the Datasets page in Dremio: ![dbt spaces](/assets/images/dbt-Spaces-3b4197940bedfd0a179d74eef56df501.png) |
| `dremio_space_folder` | No | `no_schema` | The folder in the Dremio space in which to create views. The default is the top level in the space. The dbt alias is `schema`. Nested folders are separated with periods. This value corresponds to the path in this location in the Datasets page in Dremio, which is `Folder1.Folder2` in the following image: ![Folder1.Folder2](/assets/images/dbt-SpacesPath-6f785309a289a80a338c03c0bf2879ea.png) |

##### Configurations in Profiles for Dremio Cloud[​](#configurations-in-profiles-for-dremio-cloud "Direct link to Configurations in Profiles for Dremio Cloud")

| Configuration | Required? | Default Value | Description |
| --- | --- | --- | --- |
| `cloud_host` | Yes | `api.dremio.cloud` | US Control Plane: `api.dremio.cloud`
EU Control Plane: `api.eu.dremio.cloud` |
| `user` | Yes | None | Email address used as a username in Dremio Cloud |
| `pat` | Yes | None | The personal access token to use for authentication. See [Personal Access Tokens](https://docs.dremio.com/cloud/security/authentication/personal-access-token/) for instructions about obtaining a token. |
| `cloud_project_id` | Yes | None | The ID of the Sonar project in which to run transformations. |
| `use_ssl` | Yes | `true` | The value must be `true`. |

##### Configurations in Profiles for Dremio Software[​](#configurations-in-profiles-for-dremio-software "Direct link to Configurations in Profiles for Dremio Software")

| Configuration | Required? | Default Value | Description |
| --- | --- | --- | --- |
| `software_host` | Yes | None | The hostname or IP address of the coordinator node of the Dremio cluster. |
| `port` | Yes | `9047` | Port for Dremio Software cluster API endpoints. |
| `user` | Yes | None | The username of the account to use when logging into the Dremio cluster. |
| `password` | Yes, if you are not using the pat configuration. | None | The password of the account to use when logging into the Dremio cluster. |
| `pat` | Yes, if you are not using the user and password configurations. | None | The personal access token to use for authenticating to Dremio. See [Personal Access Tokens](https://docs.dremio.com/software/security/personal-access-tokens/) for instructions about obtaining a token.
The use of a personal access token takes precedence if values for the three configurations user, password, and pat are specified. |
| `use_ssl` | Yes | `true` | Acceptable values are `true` and `false`. If the value is set to true, ensure that full wire encryption is configured in your Dremio cluster. See [Prerequisites for Dremio Software](#prerequisites-for-dremio-software). |

---

##### Connect DuckDB to dbt Core

Community plugin

Some core functionality may be limited. If you're interested in contributing, check out the source code for each repository listed below.

* **Maintained by**: Community
* **Authors**: Josh Wills (https://github.com/jwills)
* **GitHub repo**: [duckdb/dbt-duckdb](https://github.com/duckdb/dbt-duckdb) [![](https://img.shields.io/github/stars/duckdb/dbt-duckdb?style=for-the-badge)](https://github.com/duckdb/dbt-duckdb)
* **PyPI package**: `dbt-duckdb` [![](https://badge.fury.io/py/dbt-duckdb.svg)](https://badge.fury.io/py/dbt-duckdb)
* **Slack channel**: [#db-duckdb](https://getdbt.slack.com/archives/C039D1J1LA2)
* **Supported dbt Core version**: v1.0.1 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: DuckDB 0.3.2

#### Installing dbt-duckdb

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`.
This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations. Use the following command for installation: `python -m pip install dbt-core dbt-duckdb`

#### Configuring dbt-duckdb

For DuckDB-specific configuration, please refer to [DuckDB configs.](https://docs.getdbt.com/reference/resource-configs/no-configs.md)

#### Connecting to DuckDB with dbt-duckdb[​](#connecting-to-duckdb-with-dbt-duckdb "Direct link to Connecting to DuckDB with dbt-duckdb")

[DuckDB](http://duckdb.org) is an embedded database, similar to SQLite, but designed for OLAP-style analytics instead of OLTP. The only configuration parameter that is required in your profile (in addition to `type: duckdb`) is the `path` field, which should refer to a path on your local filesystem where you would like the DuckDB database file (and its associated write-ahead log) to be written. You can also specify the `schema` parameter if you would like to use a schema besides the default (which is called `main`).

There is also a `database` field defined in the `DuckDBCredentials` class for consistency with the parent `Credentials` class, but it defaults to `main` and setting it to something else will likely cause strange, unpredictable behavior, so please avoid changing it.

As of version 1.2.3, you can load any supported [DuckDB extensions](https://duckdb.org/docs/extensions/overview) by listing them in the `extensions` field in your profile. You can also set any additional [DuckDB configuration options](https://duckdb.org/docs/sql/configuration) via the `settings` field, including options that are supported in any loaded extensions.
For example, to be able to connect to `s3` and read/write `parquet` files using an AWS access key and secret, your profile would look something like this:

profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: 'file_path/database_name.duckdb'
      extensions:
        - httpfs
        - parquet
      settings:
        s3_region: my-aws-region
        s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
```

---

##### Connect Exasol to dbt Core

* **Maintained by**: Exasol
* **Authors**: Torsten Glunde, Ilija Kutle
* **GitHub repo**: [exasol/dbt-exasol](https://github.com/exasol/dbt-exasol) [![](https://img.shields.io/github/stars/exasol/dbt-exasol?style=for-the-badge)](https://github.com/exasol/dbt-exasol)
* **PyPI package**: `dbt-exasol` [![](https://badge.fury.io/py/dbt-exasol.svg)](https://badge.fury.io/py/dbt-exasol)
* **Slack channel**: [n/a]()
* **Supported dbt Core version**: v1.8.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: Exasol 6.x

#### Installing dbt-exasol

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation: `python -m pip install dbt-core dbt-exasol`

#### Configuring dbt-exasol

For Exasol-specific configuration, please refer to [Exasol configs.](https://docs.getdbt.com/reference/resource-configs/exasol-configs.md)

##### Connecting to Exasol with **dbt-exasol**[​](#connecting-to-exasol-with-dbt-exasol "Direct link to connecting-to-exasol-with-dbt-exasol")

###### User / password authentication[​](#user--password-authentication "Direct link to User / password authentication")

Configure your dbt profile for using Exasol:

###### Exasol connection information[​](#exasol-connection-information "Direct link to Exasol connection information")

profiles.yml

```yaml
dbt-exasol:
  target: dev
  outputs:
    dev:
      type: exasol
      threads: 1
      dsn: HOST:PORT
      user: USERNAME
      password: PASSWORD
      dbname: db
      schema: SCHEMA
```

###### OpenID authentication (Exasol SaaS)[​](#open-id-authentication "Direct link to OpenID authentication (Exasol SaaS)")

For Exasol SaaS environments, you can authenticate using OpenID tokens instead of a username and password:

profiles.yml

```yaml
dbt-exasol:
  target: dev
  outputs:
    dev:
      type: exasol
      threads: 1
      dsn: HOST:PORT
      user: USERNAME
      access_token: YOUR_ACCESS_TOKEN # or use refresh_token
      dbname: db
      schema: SCHEMA
      encryption: True # required for SaaS
```

* **`access_token`** — Personal access token for OpenID authentication
* **`refresh_token`** — Refresh token for OpenID authentication (alternative to `access_token`)

info

Use either `access_token` or `refresh_token`, not both. TLS encryption is required when using OpenID authentication with Exasol SaaS.

###### Optional parameters[​](#optional-parameters "Direct link to Optional parameters")

* **`connection_timeout`** — defaults to the pyexasol default
* **`socket_timeout`** — defaults to the pyexasol default
* **`query_timeout`** — defaults to the pyexasol default
* **`compression`** — default: False
* **`encryption`** — default: True. Enables SSL/TLS encryption for secure connections.
Required for Exasol SaaS
* **`validate_server_certificate`** — default: True. Validates the SSL/TLS certificate when encryption is enabled. Set to False only for development/testing with self-signed certificates (not recommended for production)
* **`protocol_version`** — default: v3
* **`row_separator`** — default: CRLF for Windows, LF otherwise
* **`timestamp_format`** — default: `YYYY-MM-DDTHH:MI:SS.FF6`

SSL/TLS Certificate Validation

By default, dbt-exasol validates SSL/TLS certificates when `encryption=True`. For development/testing with self-signed certificates, you can either:

* Set `validate_server_certificate: False` (not recommended for production)
* Use a certificate fingerprint in the DSN: `dsn: myhost/FINGERPRINT:8563`
* Use `dsn: myhost/nocertcheck:8563` to skip validation (testing only)

For more information, see the [PyExasol security documentation](https://exasol.github.io/pyexasol/master/user_guide/configuration/security.html).

---

##### Connect Extrica to dbt Core

#### Overview of dbt-extrica

* **Maintained by**: Extrica, Trianz
* **Authors**: Gaurav Mittal, Viney Kumar, Mohammed Feroz, and Mrinal Mayank
* **GitHub repo**: [extricatrianz/dbt-extrica](https://github.com/extricatrianz/dbt-extrica)
* **PyPI package**: `dbt-extrica` [![](https://badge.fury.io/py/dbt-extrica.svg)](https://badge.fury.io/py/dbt-extrica)
* **Supported dbt Core version**: v1.7.2 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-extrica

Use `pip` to install the adapter, which automatically installs `dbt-core` and any additional dependencies.
Use the following command for installation:

`python -m pip install dbt-extrica`

#### Connecting to Extrica

###### Example profiles.yml

Here is an example of a dbt-extrica profile. At a minimum, you need to specify `type`, `method`, `username`, `password`, `host`, `port`, `schema`, `catalog`, and `threads`.

\~/.dbt/profiles.yml

```yaml
<profile-name>:
  outputs:
    dev:
      type: extrica
      method: jwt
      username: [username for jwt auth]
      password: [password for jwt auth]
      host: [extrica hostname]
      port: [port number]
      schema: [dev_schema]
      catalog: [catalog_name]
      threads: [1 or more]
    prod:
      type: extrica
      method: jwt
      username: [username for jwt auth]
      password: [password for jwt auth]
      host: [extrica hostname]
      port: [port number]
      schema: [prod_schema]
      catalog: [catalog_name]
      threads: [1 or more]
  target: dev
```

###### Description of Extrica Profile Fields

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| type | string | Specifies the type of dbt adapter (Extrica). |
| method | jwt | Authentication method for JWT authentication. |
| username | string | Username for JWT authentication. The obtained JWT token is used to initialize a `trino.auth.JWTAuthentication` object. |
| password | string | Password for JWT authentication. The obtained JWT token is used to initialize a `trino.auth.JWTAuthentication` object. |
| host | string | The hostname or IP address of Extrica's Trino server. |
| port | integer | The port number on which Extrica's Trino server is listening. |
| schema | string | Schema or database name for the connection. |
| catalog | string | Name of the catalog representing the data source. |
| threads | integer | Number of threads for parallel execution of queries (1 or more). |

---

##### Connect Firebolt to dbt Core

Some core functionality may be limited. If you're interested in contributing, check out the source code for the repository listed below.

* **Maintained by**: Firebolt
* **Authors**: Firebolt
* **GitHub repo**: [firebolt-db/dbt-firebolt](https://github.com/firebolt-db/dbt-firebolt) [![](https://img.shields.io/github/stars/firebolt-db/dbt-firebolt?style=for-the-badge)](https://github.com/firebolt-db/dbt-firebolt)
* **PyPI package**: `dbt-firebolt` [![](https://badge.fury.io/py/dbt-firebolt.svg)](https://badge.fury.io/py/dbt-firebolt)
* **Slack channel**: [#db-firebolt](https://getdbt.slack.com/archives/C03K2PTHHTP)
* **Supported dbt Core version**: v1.1.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-firebolt

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation:

`python -m pip install dbt-core dbt-firebolt`

#### Configuring dbt-firebolt

For Firebolt-specific configuration, please refer to [Firebolt configs](https://docs.getdbt.com/reference/resource-configs/firebolt-configs.md).

For other information including Firebolt feature support, see the [GitHub README](https://github.com/firebolt-db/dbt-firebolt/blob/main/README.md) and the [changelog](https://github.com/firebolt-db/dbt-firebolt/blob/main/CHANGELOG.md).

#### Connecting to Firebolt

To connect to Firebolt from dbt, you'll need to add a [profile](https://docs.getdbt.com/docs/local/profiles.yml.md) to your `profiles.yml` file. A Firebolt profile conforms to the following syntax:

profiles.yml

```yml
<profile-name>:
  target: <target-name>
  outputs:
    <target-name>:
      type: firebolt
      client_id: "<id>"
      client_secret: "<secret>"
      database: "<database-name>"
      engine_name: "<engine-name>"
      account_name: "<account-name>"
      schema: <tablename-prefix>
      threads: 1
      # optional fields
      host: "<hostname>"
```

###### Description of Firebolt Profile Fields

To specify values as environment variables, use the format `{{ env_var('<env-variable>') }}`. For example, `{{ env_var('DATABASE_NAME') }}`.

| Field | Description |
| ----- | ----------- |
| `type` | This must be included either in `profiles.yml` or in the `dbt_project.yml` file. Must be set to `firebolt`. |
| `client_id` | Required. Your [service account](https://docs.firebolt.io/godocs/Guides/managing-your-organization/service-accounts.html) id. |
| `client_secret` | Required. The secret associated with the specified `client_id`. |
| `database` | Required. The name of the Firebolt database to connect to. |
| `engine_name` | Required. The name (not the URL) of the Firebolt engine to use in the specified `database`. This must be a general purpose read-write engine, and the engine must be running. If omitted in earlier versions, the default engine for the specified `database` is used. |
| `account_name` | Required. Specifies the account name under which the specified `database` exists. |
| `schema` | Recommended. A string to add as a prefix to the names of generated tables when using the [custom schemas workaround](https://docs.getdbt.com/docs/local/connect-data-platform/firebolt-setup.md#supporting-concurrent-development). |
| `threads` | Required. Set to a higher number to improve performance. |
| `host` | Optional. The host name of the connection. For all customers it is `api.app.firebolt.io`, which will be used if omitted. |

###### Troubleshooting Connections

If you encounter issues connecting to Firebolt from dbt, make sure the following criteria are met:

* You must have adequate permissions to access the engine and the database.
* Your service account must be attached to a user.
* The engine must be running.

#### Supporting Concurrent Development

In dbt, database schemas are used to compartmentalize developer environments so that concurrent development does not cause table name collisions. Firebolt, however, does not currently support database schemas (it is on the roadmap). To work around this, we recommend that you add the following macro to your project. This macro will take the `schema` field of your `profiles.yml` file and use it as a table name prefix.
```sql
-- macros/generate_alias_name.sql
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
    {%- if custom_alias_name is none -%}
        {{ node.schema }}__{{ node.name }}
    {%- else -%}
        {{ node.schema }}__{{ custom_alias_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

For an example of how this works, let's say Shahar and Eric are both working on the same project. In her `.dbt/profiles.yml`, Shahar sets `schema=sh`, whereas Eric sets `schema=er` in his. When each runs the `customers` model, the models will land in the database as tables named `sh__customers` and `er__customers`, respectively. When running dbt in production, you would use yet another `profiles.yml` with a string of your choice.

---

##### Connect Greenplum to dbt Core

* **Maintained by**: Community
* **Authors**: Mark Poroshin, Dmitry Bevz
* **GitHub repo**: [markporoshin/dbt-greenplum](https://github.com/markporoshin/dbt-greenplum) [![](https://img.shields.io/github/stars/markporoshin/dbt-greenplum?style=for-the-badge)](https://github.com/markporoshin/dbt-greenplum)
* **PyPI package**: `dbt-greenplum` [![](https://badge.fury.io/py/dbt-greenplum.svg)](https://badge.fury.io/py/dbt-greenplum)
* **Slack channel**: [n/a](https://www.getdbt.com/community)
* **Supported dbt Core version**: v1.0.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: Greenplum 6.0

#### Installing dbt-greenplum

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-greenplum`

#### Configuring dbt-greenplum

For Greenplum-specific configuration, please refer to [Greenplum configs](https://docs.getdbt.com/reference/resource-configs/greenplum-configs.md).

For further (and more likely up-to-date) info, see the [README](https://github.com/markporoshin/dbt-greenplum#README.md).

#### Profile Configuration

Greenplum targets should be set up using the following configuration in your `profiles.yml` file.

\~/.dbt/profiles.yml

```yaml
company-name:
  target: dev
  outputs:
    dev:
      type: greenplum
      host: [hostname]
      user: [username]
      password: [password]
      port: [port]
      dbname: [database name]
      schema: [dbt schema]
      threads: [1 or more]
      keepalives_idle: 0 # default 0, indicating the system default. See below
      connect_timeout: 10 # default 10 seconds
      search_path: [optional, override the default postgres search_path]
      role: [optional, set the role dbt assumes when executing queries]
      sslmode: [optional, set the sslmode used to connect to the database]
```

##### Notes

This adapter strongly depends on dbt-postgres, so you can read more about configurations in the dbt-postgres [Profile Setup](https://docs.getdbt.com/docs/local/connect-data-platform/postgres-setup.md).
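Because `profiles.yml` often contains credentials, values in the Greenplum profile above can be supplied through dbt's built-in `env_var()` function instead of being hardcoded. A minimal sketch — the environment variable names are illustrative, and the `DBT_ENV_SECRET_` prefix tells dbt to scrub the value from logs:

```yaml
company-name:
  target: dev
  outputs:
    dev:
      type: greenplum
      host: "{{ env_var('GREENPLUM_HOST') }}"                   # illustrative variable name
      user: "{{ env_var('GREENPLUM_USER') }}"
      password: "{{ env_var('DBT_ENV_SECRET_GP_PASSWORD') }}"   # scrubbed from dbt logs
      port: 5432
      dbname: analytics
      schema: dbt_dev
      threads: 4
```

Export the variables in the shell that runs dbt before invoking `dbt run` or `dbt debug`.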
---

##### Connect Hologres to dbt Core

* **Maintained by**: Alibaba Cloud Hologres Team
* **Authors**: Alibaba Cloud Hologres Team
* **GitHub repo**: [aliyun/dbt-hologres](https://github.com/aliyun/dbt-hologres) [![](https://img.shields.io/github/stars/aliyun/dbt-hologres?style=for-the-badge)](https://github.com/aliyun/dbt-hologres)
* **PyPI package**: `dbt-alibaba-cloud-hologres` [![](https://badge.fury.io/py/dbt-alibaba-cloud-hologres.svg)](https://badge.fury.io/py/dbt-alibaba-cloud-hologres)
* **Supported dbt Core version**: v1.8.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-alibaba-cloud-hologres

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-alibaba-cloud-hologres`

#### Configuring dbt-alibaba-cloud-hologres

For Hologres-specific configuration, please refer to [Hologres configs](https://docs.getdbt.com/reference/resource-configs/no-configs.md).

#### Connecting to Hologres with dbt-alibaba-cloud-hologres

`dbt-alibaba-cloud-hologres` enables dbt to work with Alibaba Cloud Hologres, a real-time data warehouse compatible with PostgreSQL. Check out the dbt profile configuration below for details.
\~/.dbt/profiles.yml

```yaml
dbt-alibaba-cloud-hologres: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: hologres
      host: HOST_NAME
      port: 80
      user: USER_NAME
      password: PASSWORD
      database: DATABASE_NAME
      schema: SCHEMA_NAME
      threads: 4
```

##### Connection parameters

Currently the adapter supports the following parameters:

| **Field** | **Description** | **Required?** | **Default** | **Example** |
| --------- | --------------- | ------------- | ----------- | ----------- |
| `type` | Specifies the type of database connection; must be set to `hologres` for Hologres connections. | Required | - | `hologres` |
| `host` | The endpoint hostname for connecting to the Hologres instance. | Required | - | `hgxxx-xxx.hologres.aliyuncs.com` |
| `port` | Port number for the Hologres connection. | Optional | `80` | `80` |
| `user` | The username for authentication with Hologres (case-sensitive). | Required | - | `AccessKey ID` |
| `password` | The password for authentication with Hologres (case-sensitive). | Required | - | `AccessKey Secret` |
| `database` | The name of your Hologres database. | Required | - | `my_database` |
| `schema` | The default schema that the models will use in Hologres (use an empty string `""` if not needed). | Required | - | `public` |
| `threads` | Number of threads for parallel execution. | Optional | `1` | `4` |
| `connect_timeout` | Connection timeout in seconds. | Optional | `10` | `10` |
| `sslmode` | SSL mode for the connection. | Optional | `disable` | `disable` |
| `application_name` | Application identifier for connection tracking. | Optional | `dbt_hologres_{version}` | `my_dbt_app` |
| `retries` | Number of connection retries. | Optional | `1` | `3` |

#### Authentication configuration

`dbt-alibaba-cloud-hologres` uses the standard PostgreSQL-compatible authentication mechanism with a username and password (Access Key). Hologres supports using Alibaba Cloud AccessKey or RAM user credentials for authentication.

##### Access key

You can authenticate using your Alibaba Cloud account credentials. For security reasons, it is recommended to create a RAM sub-account with appropriate permissions rather than using the primary account AccessKey.

```yaml
jaffle_shop: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: hologres
      host: hgxxx-cn-shanghai.hologres.aliyuncs.com # Replace with your Hologres endpoint
      port: 80
      user: your_access_key_id # Replace with your AccessKeyId
      password: your_access_key_secret # Replace with your AccessKeySecret
      database: my_database # Replace with your database name
      schema: public # Replace with your schema name
      threads: 4
      connect_timeout: 10
      sslmode: disable
```

##### Important notes

1. **Case sensitivity**: Hologres usernames and passwords are case-sensitive. Make sure to enter them exactly as configured.
2. **Default port**: The default port for Hologres is `80`, which is different from the standard PostgreSQL port `5432`.
3. **SSL mode**: SSL is disabled by default for Hologres connections. You can enable it by setting `sslmode` to an appropriate value if required.

#### Testing your connection

After configuring your `profiles.yml`, you can verify your connection by running:

```bash
dbt debug
```

This [command](https://docs.getdbt.com/reference/commands/debug.md) will test the connection to your Hologres instance and report any configuration issues.
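Since SSL is disabled by default, here is a sketch of a target with encryption turned on. This assumes the PostgreSQL-compatible endpoint accepts the standard libpq-style `sslmode` values such as `require` — verify against your instance before relying on it:

```yaml
# Illustrative target with TLS enabled (sslmode value is an assumption)
dev_ssl:
  type: hologres
  host: hgxxx-cn-shanghai.hologres.aliyuncs.com
  port: 80
  user: your_access_key_id
  password: your_access_key_secret
  database: my_database
  schema: public
  threads: 4
  sslmode: require # assumption: standard PostgreSQL sslmode values are accepted
```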
#### Hologres-specific features

##### Dynamic tables in Hologres

Dynamic tables are Hologres's implementation of materialized views with automatic refresh. When refreshing data, multiple modes are supported, including full mode and incremental mode. For more information, please refer to the [reference manual](https://www.alibabacloud.com/help/en/hologres/user-guide/introduction-to-dynamic-table).

You can configure them in your dbt models:

```yaml
models:
  my_model:
    materialized: dynamic_table
    freshness: "30 minutes"
    auto_refresh_mode: auto
    computing_resource: serverless
```

Supported configurations for dynamic tables:

| **Configuration** | **Description** | **Example values** |
| ----------------- | --------------- | ------------------ |
| `freshness` | Data freshness requirement. | `"30 minutes"`, `"1 hours"` |
| `auto_refresh_mode` | Refresh mode for the dynamic table. | `auto`, `incremental`, `full` |
| `computing_resource` | Computing resource to use for refreshing. | `serverless`, `local`, warehouse name |
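The YAML above sets the options at the project level; dbt conventionally also accepts model configuration through an in-model `config()` block. A sketch under that assumption — the model name, source model, and columns are illustrative, so check the adapter's documentation before relying on these keys in `config()`:

```sql
-- models/orders_summary.sql (illustrative sketch, not verified against the adapter)
{{ config(
    materialized='dynamic_table',
    freshness='30 minutes',
    auto_refresh_mode='incremental',
    computing_resource='serverless'
) }}

select
    order_date,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by order_date
```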
##### Incremental models with dbt

`dbt-alibaba-cloud-hologres` supports multiple incremental strategies:

* `append`: simply append new records
* `delete+insert`: delete matching records and insert new ones
* `merge`: use a MERGE statement for upsert operations
* `microbatch`: process data in small batches

##### Constraints

Full support for database constraints, including:

* Primary keys
* Not null constraints

##### Table properties

Hologres supports the following table properties. For full details, see the [developer reference documentation](https://www.alibabacloud.com/help/en/hologres/developer-reference/create-tables).

| Property | Best practices |
| -------- | -------------- |
| `orientation` | Use `column` for OLAP workloads and `row` for key-value queries |
| `distribution_key` | Choose frequently joined or grouped columns; prefer a single column |
| `clustering_key` | Use for range filter columns; max 3 columns; follow the left-match principle |
| `event_time_column` | Set for time-series data (timestamp columns) |
| `bitmap_columns` | Use for equality filters |
| `dictionary_encoding_columns` | Use for low-cardinality string columns |

#### References

* [dbt-alibaba-cloud-hologres GitHub repository](https://github.com/aliyun/dbt-hologres)
* [Hologres documentation](https://www.alibabacloud.com/help/en/hologres/)
* [Hologres dynamic table guide](https://www.alibabacloud.com/help/en/hologres/user-guide/introduction-to-dynamic-table)
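To make the incremental strategies listed earlier concrete, here is a sketch of a model using `delete+insert`, following the standard dbt incremental pattern (`is_incremental()`, `{{ this }}`); the table and column names are illustrative:

```sql
-- models/events.sql (illustrative)
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='event_id'
) }}

select event_id, event_type, event_time
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what is already loaded
  where event_time > (select max(event_time) from {{ this }})
{% endif %}
```

On the first run dbt builds the full table; on subsequent runs, rows matching `unique_key` are deleted and re-inserted from the filtered selection.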
---

##### Connect IBM DB2 to dbt Core

Community plugin

Some core functionality may be limited. If you're interested in contributing, check out the source code for the repository listed below.

* **Maintained by**: Community
* **Authors**: Rasmus Nyberg ([aurany](https://github.com/aurany))
* **GitHub repo**: [aurany/dbt-ibmdb2](https://github.com/aurany/dbt-ibmdb2) [![](https://img.shields.io/github/stars/aurany/dbt-ibmdb2?style=for-the-badge)](https://github.com/aurany/dbt-ibmdb2)
* **PyPI package**: `dbt-ibmdb2` [![](https://badge.fury.io/py/dbt-ibmdb2.svg)](https://badge.fury.io/py/dbt-ibmdb2)
* **Slack channel**: [n/a](https://www.getdbt.com/community)
* **Supported dbt Core version**: v1.0.4 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: IBM DB2 V9fp2

#### Installing dbt-ibmdb2

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation:

`python -m pip install dbt-core dbt-ibmdb2`

#### Configuring dbt-ibmdb2

For IBM DB2-specific configuration, please refer to [IBM DB2 configs](https://docs.getdbt.com/reference/resource-configs/no-configs.md).

This is an experimental plugin:

* We have not tested it extensively
* Tested with [dbt-adapter-tests](https://pypi.org/project/pytest-dbt-adapter/) and DB2 LUW on Mac OS+RHEL8
* Compatibility with other [dbt packages](https://hub.getdbt.com/) (like [dbt\_utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/)) is only partially tested

#### Connecting to IBM DB2 with dbt-ibmdb2

IBM DB2 targets should be set up using the following configuration in your `profiles.yml` file.

Example:

\~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: ibmdb2
      schema: analytics
      database: test
      host: localhost
      port: 50000
      protocol: TCPIP
      username: my_username
      password: my_password
```

###### Description of IBM DB2 Profile Fields

| Option | Description | Required? | Example |
| ------ | ----------- | --------- | ------- |
| type | The specific adapter to use | Required | `ibmdb2` |
| schema | Specify the schema (database) to build models into | Required | `analytics` |
| database | Specify the database you want to connect to | Required | `testdb` |
| host | Hostname or IP address | Required | `localhost` |
| port | The port to use | Optional | `50000` |
| protocol | Protocol to use | Optional | `TCPIP` |
| username | The username to use to connect to the server | Required | `my-username` |
| password | The password to use for authenticating to the server | Required | `my-password` |

#### Supported features

| DB2 LUW | DB2 z/OS | Feature |
| ------- | -------- | ------- |
| ✅ | 🤷 | Table materialization |
| ✅ | 🤷 | View materialization |
| ✅ | 🤷 | Incremental materialization |
| ✅ | 🤷 | Ephemeral materialization |
| ✅ | 🤷 | Seeds |
| ✅ | 🤷 | Sources |
| ✅ | 🤷 | Custom data tests |
| ✅ | 🤷 | Docs generate |
| ✅ | 🤷 | Snapshots |

#### Notes

* dbt-ibmdb2 is built on the ibm\_db Python package, and there are some known encoding issues related to z/OS.

---

##### Connect IBM Netezza to dbt Core

The dbt-ibm-netezza adapter allows you to use dbt to transform and manage data on IBM Netezza, leveraging its distributed SQL query engine capabilities. Before proceeding, ensure you have the following:

* An active IBM Netezza engine with connection details (host, port, database, schema, etc.) in SaaS/PaaS.
* Authentication credentials: username and password.

Refer to [Configuring dbt-ibm-netezza](https://github.com/IBM/nz-dbt?tab=readme-ov-file#testing-sample-dbt-project) for guidance on obtaining and organizing these details.
* **Maintained by**: IBM
* **Authors**: Abhishek Jog, Sagar Soni, Ayush Mehrotra
* **GitHub repo**: [IBM/nz-dbt](https://github.com/IBM/nz-dbt) [![](https://img.shields.io/github/stars/IBM/nz-dbt?style=for-the-badge)](https://github.com/IBM/nz-dbt)
* **PyPI package**: `dbt-ibm-netezza` [![](https://badge.fury.io/py/dbt-ibm-netezza.svg)](https://badge.fury.io/py/dbt-ibm-netezza)
* **Supported dbt Core version**: v1.9.2 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: 11.2.3.4

#### Installing dbt-ibm-netezza

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-ibm-netezza`

#### Configuring dbt-ibm-netezza

For IBM Netezza-specific configuration, please refer to [IBM Netezza configs](https://docs.getdbt.com/reference/resource-configs/ibm-netezza-config.md).

#### Connecting to IBM Netezza

To connect dbt with IBM Netezza, you need to configure a profile in your `profiles.yml` file located in the `.dbt/` directory of your home folder. The following is an example configuration for connecting to IBM Netezza instances:

\~/.dbt/profiles.yml

```yaml
my_project:
  outputs:
    dev:
      type: netezza
      user: [user]
      password: [password]
      host: [hostname]
      database: [catalog name]
      schema: [schema name]
      port: 5480
      threads: [1 or more]
  target: dev
```

##### Setup external table options

You also need to configure the `et_options.yml` file located in your project directory. Make sure the file is correctly set up before running `dbt seed`. This ensures that data is inserted into your tables accurately, as specified in the external data file.

./et\_options.yml

```yaml
- !ETOptions
  SkipRows: "1"
  Delimiter: "','"
  DateDelim: "'-'"
  MaxErrors: " 0 "
```

Refer to the [Netezza external table options summary](https://www.ibm.com/docs/en/netezza?topic=eto-option-summary) for more options in the file.

#### Host parameters

The following profile fields are required to configure IBM Netezza connections.

| Option | Required/Optional | Description | Example |
| ------ | ----------------- | ----------- | ------- |
| `user` | Required | Username or email address for authentication. | `user` |
| `password` | Required | Password or API key for authentication. | `password` |
| `host` | Required | Hostname for connecting to Netezza. | `127.0.0.1` |
| `database` | Required | The catalog name in your Netezza instance. | `SYSTEM` |
| `schema` | Required | The schema name within your Netezza instance catalog. | `my_schema` |
| `port` | Required | The port for connecting to Netezza. | `5480` |

##### Schemas and databases

When selecting the database and the schema, make sure the user has read and write access to both. This selection does not limit your ability to query the database. Instead, it serves as the default location for where tables and views are materialized.

#### Notes

The `dbt-ibm-netezza` adapter is built on the IBM Netezza Python driver, [nzpy](https://pypi.org/project/nzpy/), which is a prerequisite installed along with the adapter.
---

##### Connect IBM watsonx.data Presto to dbt Core

The dbt-watsonx-presto adapter allows you to use dbt to transform and manage data on IBM watsonx.data Presto (Java), leveraging its distributed SQL query engine capabilities. Before proceeding, ensure you have the following:

* An active IBM watsonx.data Presto (Java) engine with connection details (host, port, catalog, schema) in SaaS/Software.
* Authentication credentials: username and password/API key.
* For watsonx.data instances, SSL verification is required for secure connections. If the instance host uses HTTPS, there is no need to specify the SSL certificate parameter. However, if the instance host uses an unsecured HTTP connection, ensure you provide the path to the SSL certificate file.

Refer to [Configuring dbt-watsonx-presto](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=presto-configuration-setting-up-your-profile) for guidance on obtaining and organizing these details.
* **Maintained by**: IBM
* **Authors**: Karnati Naga Vivek, Hariharan Ashokan, Biju Palliyath, Gopikrishnan Varadarajulu, Rohan Pednekar
* **GitHub repo**: [IBM/dbt-watsonx-presto](https://github.com/IBM/dbt-watsonx-presto) [![](https://img.shields.io/github/stars/IBM/dbt-watsonx-presto?style=for-the-badge)](https://github.com/IBM/dbt-watsonx-presto)
* **PyPI package**: `dbt-watsonx-presto` [![](https://badge.fury.io/py/dbt-watsonx-presto.svg)](https://badge.fury.io/py/dbt-watsonx-presto)
* **Slack channel**: [#db-watsonx-presto](https://getdbt.slack.com/archives/C08C7D53R40)
* **Supported dbt Core version**: v1.8.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-watsonx-presto

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-watsonx-presto`

#### Configuring dbt-watsonx-presto

For IBM watsonx.data-specific configuration, please refer to [IBM watsonx.data configs](https://docs.getdbt.com/reference/resource-configs/watsonx-presto-config.md).

#### Connecting to IBM watsonx.data Presto

To connect dbt with watsonx.data Presto (Java), you need to configure a profile in your `profiles.yml` file located in the `.dbt/` directory of your home folder.
The following is an example configuration for connecting to IBM watsonx.data SaaS and Software instances:

\~/.dbt/profiles.yml

```yaml
my_project:
  outputs:
    software:
      type: watsonx_presto
      method: BasicAuth
      user: [user]
      password: [password]
      host: [hostname]
      catalog: [catalog_name]
      schema: [your dbt schema]
      port: [port number]
      threads: [1 or more]
      ssl_verify: path/to/certificate
    saas:
      type: watsonx_presto
      method: BasicAuth
      user: [user]
      password: [api_key]
      host: [hostname]
      catalog: [catalog_name]
      schema: [your dbt schema]
      port: [port number]
      threads: [1 or more]
  target: software
```

#### Host parameters

The following profile fields are required to configure watsonx.data Presto (Java) connections. For IBM watsonx.data SaaS or Software instances, you can get the `hostname` and `port` details by clicking **View connect details** on the Presto (Java) engine details page.

| Option | Required/Optional | Description | Example |
| --- | --- | --- | --- |
| `method` | Required | Specifies the authentication method for secure connections. Use `BasicAuth` when connecting to IBM watsonx.data SaaS or Software instances. | `BasicAuth` |
| `user` | Required | Username or email address for authentication. | `user` |
| `password` | Required | Password or API key for authentication. | `password` |
| `host` | Required | Hostname for connecting to Presto. | `127.0.0.1` |
| `catalog` | Required | The catalog name in your Presto instance. | `Analytics` |
| `schema` | Required | The schema name within your Presto instance catalog. | `my_schema` |
| `port` | Required | The port for connecting to Presto. | `443` |
| `ssl_verify` | Optional (default: **true**) | Specifies the path to the SSL certificate or a boolean value. The SSL certificate path is required if the watsonx.data instance is not secure (HTTP). | `path/to/certificate` or `true` |

##### Schemas and databases

When selecting the catalog and the schema, make sure the user has read and write access to both. This selection does not limit your ability to query the catalog; instead, it serves as the default location for where tables and views are materialized. In addition, the Presto connector used in the catalog must support creating tables. This default can be changed later from within your dbt project.

##### SSL verification

* If the Presto instance uses an unsecured HTTP connection, you must set `ssl_verify` to the path of the SSL certificate file.
* If the instance uses HTTPS, this parameter is not required and can be omitted.

#### Additional parameters

The following profile fields are optional. They let you configure your instance session and dbt for your connection.

| Profile field | Description | Example |
| --- | --- | --- |
| `threads` | How many threads dbt should use (default is `1`). | `8` |
| `http_headers` | HTTP headers to send alongside requests to Presto, specified as a YAML dictionary of (header, value) pairs. | `X-Presto-Routing-Group: my-instance` |
| `http_scheme` | The HTTP scheme to use for requests to Presto (default: `http`, or `https` if using `BasicAuth`). | `https` or `http` |
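The SSL rules above reduce to a simple decision, sketched here in Python. This is illustrative only; `choose_ssl_verify` is a hypothetical helper, not part of the adapter:

```python
def choose_ssl_verify(host_url: str, cert_path: str = "path/to/certificate"):
    """Pick an ssl_verify value for profiles.yml following the rules above:
    HTTPS hosts can rely on the default (true, or omit the parameter);
    unsecured HTTP hosts must point ssl_verify at the certificate file.
    Hypothetical helper, not adapter code."""
    if host_url.startswith("https://"):
        return True  # the parameter may simply be omitted
    return cert_path

print(choose_ssl_verify("https://presto.example.com"))  # True
print(choose_ssl_verify("http://presto.example.com"))   # path/to/certificate
```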
---

##### Connect IBM watsonx.data Spark to dbt Core

* **Maintained by**: IBM
* **Authors**: Bayan Albunayan, Reema Alzaid, Manjot Sidhu
* **GitHub repo**: [IBM/dbt-watsonx-spark](https://github.com/IBM/dbt-watsonx-spark) [![](https://img.shields.io/github/stars/IBM/dbt-watsonx-spark?style=for-the-badge)](https://github.com/IBM/dbt-watsonx-spark)
* **PyPI package**: `dbt-watsonx-spark` [![](https://badge.fury.io/py/dbt-watsonx-spark.svg)](https://badge.fury.io/py/dbt-watsonx-spark)
* **Slack channel**: n/a
* **Supported dbt Core version**: v0.0.8 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

The `dbt-watsonx-spark` adapter allows you to use dbt to transform and manage data on IBM watsonx.data Spark, leveraging its distributed SQL query engine capabilities.

Before proceeding, ensure you have the following:

* An active IBM watsonx.data instance: see [IBM Cloud (SaaS)](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started) or [Software](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=installing-watsonxdata-developer-version).
* A provisioned **Native Spark engine** in watsonx.data: see [IBM Cloud (SaaS)](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-prov_nspark) or [Software](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=spark-native-engine).
* An active **Spark query server** in your **Native Spark engine**.

Read the official documentation for using **watsonx.data** with `dbt-watsonx-spark`:

* [Documentation for IBM Cloud and SaaS offerings](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-dbt_watsonx_spark_inst)
* [Documentation for IBM watsonx.data software](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=integration-data-build-tool-adapter-spark)

#### Installing dbt-watsonx-spark

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation:

`python -m pip install dbt-core dbt-watsonx-spark`

#### Configuring dbt-watsonx-spark

For IBM watsonx.data-specific configuration, refer to [IBM watsonx.data configs](https://docs.getdbt.com/reference/resource-configs/watsonx-spark-config.md).

#### Connecting to IBM watsonx.data Spark

To connect dbt with watsonx.data Spark, configure a profile in your `profiles.yml` file located in the `.dbt/` directory of your home folder.
The following is an example configuration for connecting to IBM watsonx.data SaaS and Software instances:

\~/.dbt/profiles.yml

```yaml
project_name:
  target: "dev"
  outputs:
    dev:
      type: watsonx_spark
      method: http
      schema: [schema name]
      host: [hostname]
      uri: [uri]
      catalog: [catalog name]
      use_ssl: false
      auth:
        instance: [watsonx.data instance ID]
        user: [username]
        apikey: [apikey]
```

#### Host parameters

For IBM watsonx.data SaaS or Software instances, click **View connect details** when the query server is in RUNNING status to get the profile details. The Connection details page opens with the profile configuration. Copy and paste the connection details into the `profiles.yml` file located in the `.dbt/` directory of your home folder.

The following profile fields are required to configure watsonx.data Spark connections:

| Option | Required/Optional | Description | Example |
| --- | --- | --- | --- |
| `method` | Required | Specifies the connection method to the Spark query server. Use `http`. | `http` |
| `schema` | Required | Choose an existing schema within the Spark engine or create a new one. | `spark_schema` |
| `host` | Required | Hostname of the watsonx.data console. For more information, see [Getting connection information](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=references-getting-connection-information#connection_info__conn_info_). | `https://dataplatform.cloud.ibm.com` |
| `uri` | Required | URI of your query server running on watsonx.data. For more information, see [Getting connection information](https://www.ibm.com/docs/en/watsonx/watsonxdata/2.1.x?topic=references-getting-connection-information#connection_info__conn_info_). | `/lakehouse/api/v2/spark_engines//query_servers//connect/cliservice` |
| `catalog` | Required | The catalog associated with the Spark engine. | `my_catalog` |
| `use_ssl` | Optional (default: **false**) | Specifies whether to use SSL. | `true` or `false` |
| `instance` | Required | For **SaaS**, set it to the CRN of watsonx.data. For **Software**, set it to the instance ID of watsonx.data. | `1726574045872688` |
| `user` | Required | Username for the watsonx.data instance. For SaaS, use your email address as the username. | `username` or `user@example.com` |
| `apikey` | Required | Your API key. For more information on generating API keys, see [SaaS](https://www.ibm.com/docs/en/software-hub/5.1.x?topic=started-generating-api-keys) or [Software](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui#manage-user-keys). | `API key` |

##### Schemas and catalogs

When selecting the catalog, ensure the user has read and write access. This selection does not limit your ability to query the specified or created schema; it also serves as the default location for materialized `tables`, `views`, and `incremental` models.

##### SSL verification

* If the Spark instance uses an unsecured HTTP connection, set `use_ssl` to `false`.
* If the instance uses HTTPS, set it to `true`.

#### Additional parameters

The following profile fields are optional. You can use them to configure the instance session and dbt for the connection.
| Profile field | Description | Example |
| --- | --- | --- |
| `threads` | How many threads dbt should use (default is `1`). | `8` |
| `retry_all` | Enables automatic retries for transient connection failures. | `true` |
| `connect_timeout` | Timeout for establishing a connection (in seconds). | `5` |
| `connect_retries` | Number of retry attempts for connection failures. | `3` |

#### Limitations and considerations

* **Supports only HTTP**: No support for ODBC, Thrift, or session-based connections.
* **Limited dbt support**: Not fully compatible with all dbt features.
* **Metadata persistence**: Some dbt features, such as column descriptions, may not persist in all table formats.

---

##### Connect Infer to dbt Core

Vendor-supported plugin

Certain core functionality may vary. If you would like to report a bug, request a feature, or contribute, you can check out the linked repository and open an issue.
* **Maintained by**: Infer
* **Authors**: Erik Mathiesen-Dreyfus, Ryan Garland
* **GitHub repo**: [inferlabs/dbt-infer](https://github.com/inferlabs/dbt-infer) [![](https://img.shields.io/github/stars/inferlabs/dbt-infer?style=for-the-badge)](https://github.com/inferlabs/dbt-infer)
* **PyPI package**: `dbt-infer` [![](https://badge.fury.io/py/dbt-infer.svg)](https://badge.fury.io/py/dbt-infer)
* **Slack channel**: n/a
* **Supported dbt Core version**: v1.2.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-infer

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation:

`python -m pip install dbt-core dbt-infer`

#### Configuring dbt-infer

For Infer-specific configuration, refer to [Infer configs](https://docs.getdbt.com/reference/resource-configs/infer-configs.md).

#### Connecting to Infer with dbt-infer

Infer allows you to perform advanced ML analytics within SQL as if native to your data warehouse. To do this, Infer uses a SQL variant called SQL-inf, which defines a set of primitive ML commands from which you can build advanced analyses for any business use case. Read more about SQL-inf and Infer in the [Infer documentation](https://docs.getinfer.io/).

The `dbt-infer` package allows you to use SQL-inf easily within your dbt models. You can read more about the `dbt-infer` package itself and how it connects to Infer in the [dbt-infer documentation](https://dbt.getinfer.io/).

The dbt-infer adapter is maintained on PyPI and installed with pip.
To install the latest dbt-infer package, run the following in the same shell where you run dbt:

```sh
pip install dbt-infer
```

Versioning of dbt-infer follows the standard dbt versioning scheme, meaning if you are using dbt 1.2, the corresponding dbt-infer will be named 1.2.x, where x is the latest minor version number.

Before using SQL-inf in your dbt models, you need to set up an Infer account and generate an API key for the connection. You can read how to do that in the [Getting Started Guide](https://docs.getinfer.io/docs/reference/integrations/dbt).

The profile configuration in `profiles.yml` for `dbt-infer` should look something like this:

\~/.dbt/profiles.yml

```yaml
<profile_name>:
  target: <target_name>
  outputs:
    <target_name>:
      type: infer
      url: "<infer_api_url>"
      username: "<infer_api_username>"
      apikey: "<infer_apikey>"
      data_config:
        [configuration for your underlying data warehouse]
```

Note that you also need to have installed the adapter package for your underlying data warehouse. For example, if your data warehouse is BigQuery, you need to also have installed the appropriate `dbt-bigquery` package. The configuration of this goes into the `data_config` field.

##### Description of Infer Profile Fields

| Field | Required | Description |
| --- | --- | --- |
| `type` | Yes | Must be set to `infer`. This must be included either in `profiles.yml` or in the `dbt_project.yml` file. |
| `url` | Yes | The host name of the Infer server to connect to. Typically this is `https://app.getinfer.io`. |
| `username` | Yes | Your Infer username, the one you use to log in. |
| `apikey` | Yes | Your Infer API key. |
| `data_config` | Yes | The configuration for your underlying data warehouse. The format of this follows the format of the configuration for your data warehouse adapter. |

##### Example of Infer configuration

To illustrate the above descriptions, here is an example of what a `dbt-infer` configuration might look like. In this case the underlying data warehouse is BigQuery, which we configure the adapter for inside the `data_config` field.

```yaml
infer_bigquery:
  apikey: 1234567890abcdef
  username: my_name@example.com
  url: https://app.getinfer.io
  type: infer
  data_config:
    dataset: my_dataset
    job_execution_timeout_seconds: 300
    job_retries: 1
    keyfile: bq-user-creds.json
    location: EU
    method: service-account
    priority: interactive
    project: my-bigquery-project
    threads: 1
    type: bigquery
```

#### Usage

You do not need to change anything in your existing dbt models when switching to use SQL-inf; they will all work the same as before, but you now have the ability to use SQL-inf commands as native SQL functions.

Infer supports a number of SQL-inf commands, including `PREDICT`, `EXPLAIN`, `CLUSTER`, `SIMILAR_TO`, `TOPICS`, and `SENTIMENT`. You can read more about SQL-inf and the commands it supports in the [SQL-inf Reference Guide](https://docs.getinfer.io/docs/category/commands).

To get you started, we will give a brief example here of what such a model might look like. You can find other, more complex examples in the [dbt-infer examples repo](https://github.com/inferlabs/dbt-infer-examples).

In our simple example, we will show how to use a previous model `user_features` to predict churn by predicting the column `has_churned`.

predict\_user\_churn.sql

```sql
{{
  config(
    materialized = "table"
  )
}}

with predict_user_churn_input as (
    select * from {{ ref('user_features') }}
)

SELECT * FROM predict_user_churn_input
PREDICT(has_churned, ignore=user_id)
```

Note that we ignore `user_id` from the prediction.
This is because the `user_id` should not influence our prediction of churn, so we remove it. We also use the convention of pulling together the inputs for our prediction in a CTE, named `predict_user_churn_input`.

---

##### Connect iomete to dbt Core

* **Maintained by**: iomete
* **Authors**: Namig Aliyev
* **GitHub repo**: [iomete/dbt-iomete](https://github.com/iomete/dbt-iomete) [![](https://img.shields.io/github/stars/iomete/dbt-iomete?style=for-the-badge)](https://github.com/iomete/dbt-iomete)
* **PyPI package**: `dbt-iomete` [![](https://badge.fury.io/py/dbt-iomete.svg)](https://badge.fury.io/py/dbt-iomete)
* **Slack channel**: [#db-iomete](https://getdbt.slack.com/archives/C03JFG22EP9)
* **Supported dbt Core version**: v0.18.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-iomete

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation:

`python -m pip install dbt-core dbt-iomete`

#### Configuring dbt-iomete

For iomete-specific configuration, refer to [iomete configs](https://docs.getdbt.com/reference/resource-configs/no-configs.md).

#### Setting up an iomete target

iomete targets should be set up using the following configuration in your `profiles.yml` file.
profiles.yml

```yaml
iomete:
  target: dev
  outputs:
    dev:
      type: iomete
      cluster: cluster_name
      host: dwh-<account_number>.iomete.com
      port: 443
      schema: database_name
      account_number: iomete_account_number
      user: iomete_user_name
      password: iomete_user_password
```

###### Description of Profile Fields

| Field | Description | Required | Example |
| --- | --- | --- | --- |
| type | The specific adapter to use. | Required | `iomete` |
| cluster | The cluster to connect to. | Required | `reporting` |
| host | The host name of the connection. It is a combination of the `account_number` with the prefix `dwh-` and the suffix `.iomete.com`. | Required | `dwh-12345.iomete.com` |
| port | The port to use. | Required | `443` |
| schema | Specify the schema (database) to build models into. | Required | `dbt_finance` |
| account\_number | The iomete account number, in single quotes. | Required | `'1234566789123'` |
| username | The iomete username to use to connect to the server. | Required | `dbt_user` |
| password | The iomete user password to use to connect to the server. | Required | `strong_password` |

#### Supported Functionality

Most dbt Core functionality is supported. Iceberg-specific improvements include:

1. Joining the results of `show tables` and `show views`.

---

##### Connect Layer to dbt Core

* **Maintained by**: Layer
* **Authors**: Mehmet Ecevit
* **GitHub repo**: [layerai/dbt-layer](https://github.com/layerai/dbt-layer) [![](https://img.shields.io/github/stars/layerai/dbt-layer?style=for-the-badge)](https://github.com/layerai/dbt-layer)
* **PyPI package**: `dbt-layer-bigquery` [![](https://badge.fury.io/py/dbt-layer-bigquery.svg)](https://badge.fury.io/py/dbt-layer-bigquery)
* **Slack channel**: [#tools-layer](https://getdbt.slack.com/archives/C03STA39TFE)
* **Supported dbt Core version**: v1.0.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-layer-bigquery

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies.
Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation:

`python -m pip install dbt-core dbt-layer-bigquery`

#### Configuring dbt-layer-bigquery

For Layer-specific configuration, refer to [Layer configs](https://docs.getdbt.com/reference/resource-configs/no-configs.md).

##### Profile Configuration

Layer BigQuery targets should be set up using the following sections in your `profiles.yml` file.

###### Layer Authentication

Add your `layer_api_key` to your `profiles.yml` to authenticate with Layer. To get your Layer API key:

* First, [create your free Layer account](https://app.layer.ai/login?returnTo=%2Fgetting-started).
* Go to [app.layer.ai](https://app.layer.ai) > **Settings** (cog icon by your profile photo) > **Developer** > **Create API key** to get your Layer API key.

###### BigQuery Authentication

You can use any [authentication method](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md) supported in the official dbt BigQuery adapter, since Layer uses the `dbt-bigquery` adapter to connect to your BigQuery instance.
A sample profile:

profiles.yml

```yaml
layer-profile:
  target: dev
  outputs:
    dev:
      # Layer authentication
      type: layer_bigquery
      layer_api_key: [the API key to access your Layer account (opt)]
      # BigQuery authentication
      method: service-account
      project: [GCP project id]
      dataset: [the name of your dbt dataset]
      threads: [1 or more]
      keyfile: [/path/to/bigquery/keyfile.json]
```

###### Description of Layer BigQuery Profile Fields

The profile supports the following fields:

| Parameter | Default | Type | Description |
| --- | --- | --- | --- |
| `type` | | string | Specifies the adapter you want to use. It should be `layer_bigquery`. |
| `layer_api_key` | | string (opt) | Specifies your Layer API key. If you only make predictions with public ML models from Layer, you don't need this key in your profile. It's required if you load ML models from your Layer account or train an AutoML model. |
| `layer_project` | | string (opt) | Specifies your target Layer project. If you don't specify one, Layer uses the project with the same name as your dbt project. |
| `method` | | string | Specifies the authentication type to connect to your BigQuery. |

The rest of the parameters depend on the BigQuery authentication method you specified.

#### Usage

##### AutoML

You can automatically build state-of-the-art ML models using your own dbt models with plain SQL.
To train an AutoML model, all you have to do is pass your model type, input data (features), and the target column you want to predict to `layer.automl()` in your SQL. Layer AutoML will pick the best performing model and enable you to call it by its dbt model name to make predictions.

*Syntax:*

```text
layer.automl("MODEL_TYPE", ARRAY[FEATURES], TARGET)
```

*Parameters:*

| Syntax | Description |
| --- | --- |
| `MODEL_TYPE` | Type of the model you want to train. There are two options: `classifier`, a model to predict classes/labels or categories such as spam detection, and `regressor`, a model to predict continuous outcomes such as CLV prediction. |
| `FEATURES` | Input column names as a list to train your AutoML model. |
| `TARGET` | Target column that you want to predict. |

*Requirements:*

* You need to put `layer_api_key` in your dbt profile to make AutoML work.

*Example:* Check out the [Order Review AutoML Project](https://github.com/layerai/dbt-layer/tree/mecevit/update-docs/examples/order_review_prediction):

```sql
SELECT order_id,
       layer.automl(
         -- This is a regression problem
         'regressor',
         -- Data (input features) to train our model
         ARRAY[
           days_between_purchase_and_delivery, order_approved_late,
           actual_delivery_vs_expectation_bucket, total_order_price,
           total_order_freight, is_multiItems_order, seller_shipped_late],
         -- Target column we want to predict
         review_score
       )
FROM {{ ref('training_data') }}
```

##### Prediction

You can make predictions using any Layer ML model within your dbt models. The Layer dbt adapter helps you score data residing in your warehouse within your dbt DAG using SQL.

*Syntax:*

```text
layer.predict("LAYER_MODEL_PATH", ARRAY[FEATURES])
```

*Parameters:*

| Syntax | Description |
| --- | --- |
| `LAYER_MODEL_PATH` | This is the Layer model path in the form of `/[organization_name]/[project_name]/models/[model_name]`. You can use only the model name if you want to use an AutoML model within the same dbt project. |
| `FEATURES` | These are the columns that this model requires to make a prediction. You should pass the columns as a list, like `ARRAY[column1, column2, column3]`. |
*Example:* Check out the [Cloth Detection Project](https://github.com/layerai/dbt-layer/tree/mecevit/update-docs/examples/cloth_detector):

```sql
SELECT id,
       layer.predict("layer/clothing/models/objectdetection", ARRAY[image])
FROM {{ ref("products") }}
```

---

##### Connect Materialize to dbt Core

Vendor-supported plugin

Certain core functionality may vary. If you would like to report a bug, request a feature, or contribute, you can check out the linked repository and open an issue.

* **Maintained by**: Materialize Inc.
* **Authors**: Materialize team
* **GitHub repo**: [MaterializeInc/materialize](https://github.com/MaterializeInc/materialize) [![](https://img.shields.io/github/stars/MaterializeInc/materialize?style=for-the-badge)](https://github.com/MaterializeInc/materialize)
* **PyPI package**: `dbt-materialize` [![](https://badge.fury.io/py/dbt-materialize.svg)](https://badge.fury.io/py/dbt-materialize)
* **Slack channel**: [#db-materialize](https://getdbt.slack.com/archives/C01PWAH41A5)
* **Supported dbt Core version**: v0.18.1 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: v0.28.0

#### Installing dbt-materialize

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation:

`python -m pip install dbt-core dbt-materialize`

#### Configuring dbt-materialize

For Materialize-specific configuration, refer to [Materialize configs](https://docs.getdbt.com/reference/resource-configs/materialize-configs.md).

#### Connecting to Materialize

Once you have set up a [Materialize account](https://materialize.com/register/), adapt your `profiles.yml` to connect to your instance using the following reference profile configuration:

\~/.dbt/profiles.yml

```yaml
materialize:
  target: dev
  outputs:
    dev:
      type: materialize
      host: [host]
      port: [port]
      user: [user@domain.com]
      pass: [password]
      dbname: [database]
      cluster: [cluster] # default 'default'
      schema: [dbt schema]
      sslmode: require
      keepalives_idle: 0 # default: 0, indicating the system default
      connect_timeout: 10 # default: 10 seconds
      retries: 1 # default: 1, retry on error/timeout when opening connections
```

##### Configurations

`cluster`: The default [cluster](https://materialize.com/docs/overview/key-concepts/#clusters) used to maintain materialized views or indexes. A [`default` cluster](https://materialize.com/docs/sql/show-clusters/#default-cluster) is pre-installed in every environment, but we recommend creating dedicated clusters to isolate the workloads in your dbt project (for example, `staging` and `data_mart`).

`keepalives_idle`: The number of seconds before sending a ping to keep the Materialize connection active. If you are encountering `SSL SYSCALL error: EOF detected`, you may want to lower the [keepalives\_idle](https://docs.getdbt.com/docs/local/connect-data-platform/postgres-setup.md#keepalives_idle) value to prevent the database from closing its connection.

To test the connection to Materialize, run:

```sh
dbt debug
```

If the output reads "All checks passed!", you're good to go!
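The `retries` setting above retries opening a connection on error or timeout. Its behavior can be sketched as follows; this is an illustrative sketch only, not the adapter's actual implementation:

```python
import time

def open_with_retries(connect, retries=1, delay=1.0):
    """Try connect(); on error, retry up to `retries` more times.
    Sketch of the profile's `retries` semantics, not adapter code."""
    for attempt in range(retries + 1):
        try:
            return connect()
        except OSError:
            if attempt == retries:
                raise  # retries exhausted; surface the error
            time.sleep(delay)

# Example: a connection attempt that fails once, then succeeds.
state = {"calls": 0}
def flaky_connect():
    state["calls"] += 1
    if state["calls"] == 1:
        raise OSError("timeout")
    return "connected"
```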
Check the [dbt and Materialize guide](https://materialize.com/docs/guides/dbt/) to learn more and get started.

#### Supported Features[​](#supported-features "Direct link to Supported Features")

##### Materializations[​](#materializations "Direct link to Materializations")

Because Materialize is optimized for transformations on streaming data and the core of dbt is built around batch, the `dbt-materialize` adapter implements a few custom materialization types:

| Type | Supported? | Details |
| --- | --- | --- |
| `source` | YES | Creates a [source](https://materialize.com/docs/sql/create-source/). |
| `view` | YES | Creates a [view](https://materialize.com/docs/sql/create-view/#main). |
| `materializedview` | YES | Creates a [materialized view](https://materialize.com/docs/sql/create-materialized-view/#main). |
| `table` | YES | Creates a [materialized view](https://materialize.com/docs/sql/create-materialized-view/#main). (Actual table support pending [#5266](https://github.com/MaterializeInc/materialize/issues/5266)) |
| `sink` | YES | Creates a [sink](https://materialize.com/docs/sql/create-sink/#main). |
| `ephemeral` | YES | Executes queries using CTEs. |
| `incremental` | NO | Use the `materializedview` materialization instead. Materialized views will always return up-to-date results without manual or configured refreshes. For more information, check out the [Materialize documentation](https://materialize.com/docs/). |
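To sketch how a model opts into one of these types, a model file sets `materialized` in its config block, just as with built-in dbt materializations. The model and column names below are hypothetical:

```sql
-- models/order_counts.sql (hypothetical model)
{{ config(materialized='materializedview') }}

SELECT
    customer_id,
    count(*) AS order_count
FROM {{ ref('orders') }}
GROUP BY customer_id
```

Because this is a materialized view, the counts stay up to date as new orders stream in, with no scheduled refresh.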
##### Indexes[​](#indexes "Direct link to Indexes")

Materialized views (`materializedview`), views (`view`), and sources (`source`) may have a list of [`indexes`](https://docs.getdbt.com/reference/resource-configs/materialize-configs.md#indexes) defined.

##### Seeds[​](#seeds "Direct link to Seeds")

Running [`dbt seed`](https://docs.getdbt.com/reference/commands/seed.md) will create a static materialized view from a CSV file. You will not be able to add to or update this view after it has been created.

##### Tests[​](#tests "Direct link to Tests")

Running [`dbt test`](https://docs.getdbt.com/reference/commands/test.md) with the optional `--store-failures` flag or the [`store_failures` config](https://docs.getdbt.com/reference/resource-configs/store_failures.md) will create a materialized view for each configured test that can keep track of failures over time.

#### Resources[​](#resources "Direct link to Resources")

* [dbt and Materialize guide](https://materialize.com/docs/guides/dbt/)
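As a sketch of the `store_failures` config mentioned above, a test can opt in from a properties file; the model and column names here are hypothetical:

```yaml
# models/schema.yml (hypothetical example)
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null:
              config:
                store_failures: true  # failures land in a materialized view
```

With this in place, the failure view updates continuously, so you can monitor data quality over time rather than only at test run time.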
---

##### Connect MaxCompute to dbt Core

* **Maintained by**: Alibaba Cloud MaxCompute Team
* **Authors**: Alibaba Cloud MaxCompute Team
* **GitHub repo**: [aliyun/dbt-maxcompute](https://github.com/aliyun/dbt-maxcompute) [![](https://img.shields.io/github/stars/aliyun/dbt-maxcompute?style=for-the-badge)](https://github.com/aliyun/dbt-maxcompute)
* **PyPI package**: `dbt-maxcompute` [![](https://badge.fury.io/py/dbt-maxcompute.svg)](https://badge.fury.io/py/dbt-maxcompute)
* **Supported dbt Core version**: v1.8.0 and newer
* **dbt support**: Not Supported

#### Installing dbt-maxcompute

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing `dbt-core` installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-maxcompute`

#### Configuring dbt-maxcompute

For MaxCompute-specific configuration, refer to [MaxCompute configs](https://docs.getdbt.com/reference/resource-configs/no-configs.md).

#### Connecting to MaxCompute with **dbt-maxcompute**[​](#connecting-to-maxcompute-with-dbt-maxcompute "Direct link to connecting-to-maxcompute-with-dbt-maxcompute")

Check out the dbt profile configuration below for details.

\~/.dbt/profiles.yml

```yaml
dbt-maxcompute: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: maxcompute
      project: PROJECT_ID
      schema: SCHEMA_NAME
      endpoint: ENDPOINT
      auth_type: access_key
      access_key_id: ACCESS_KEY_ID
      access_key_secret: ACCESS_KEY_SECRET
```

Currently, the adapter supports the following parameters:

| **Field** | **Description** | Required? | **Example** |
| --- | --- | --- | --- |
| `type` | Specifies the type of database connection; must be set to "maxcompute" for MaxCompute connections. | Required | `maxcompute` |
| `project` | The name of your MaxCompute project. | Required | `dbt-project` |
| `endpoint` | The endpoint URL for connecting to MaxCompute. | Required | `http://service.cn-shanghai.maxcompute.aliyun.com/api` |
| `schema` | The namespace schema that the models will use in MaxCompute. | Required | `default` |
| `auth_type` | Authentication type for accessing MaxCompute. | Required | `access_key` |
| `access_key_id` | The Access ID for authentication with MaxCompute. | Required | `XXX` |
| `access_key_secret` | The Access Key for authentication with MaxCompute. | Required | `XXX` |

See the sections below for other authentication types.

#### Authentication Configuration[​](#authentication-configuration "Direct link to Authentication Configuration")

`dbt-maxcompute` is a dbt adapter that allows you to seamlessly integrate with Alibaba Cloud's MaxCompute service, enabling you to build and manage your data transformations using dbt. To ensure secure and flexible access to MaxCompute, `dbt-maxcompute` leverages the [credentials-python](https://github.com/aliyun/credentials-python) library, which provides comprehensive support for the authentication methods offered by Alibaba Cloud. With `dbt-maxcompute`, you can use any of the authentication mechanisms provided by `credentials-python`, whether that's Access Keys, STS Tokens, RAM Roles, or other advanced methods.
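For local development, the simplest of these mechanisms is often plain environment variables, which the credential provider chain described later on this page picks up automatically. The values below are placeholders:

```shell
# Placeholders: substitute your own Alibaba Cloud credentials.
export ALIBABA_CLOUD_ACCESS_KEY_ID="your-access-key-id"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="your-access-key-secret"
# With `auth_type: chain` in profiles.yml, `dbt debug` should now
# report a successful connection using these credentials.
```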
##### Key Notes on Configuration[​](#key-notes-on-configuration "Direct link to Key Notes on Configuration")

To avoid ambiguity in configuration options, some parameter names have been adjusted compared to those used in `credentials-python`. Specifically:

* `type` becomes `auth_type`
* `policy` becomes `auth_policy`
* `host` becomes `auth_host`
* `timeout` becomes `auth_timeout`
* `connect_timeout` becomes `auth_connect_timeout`
* `proxy` becomes `auth_proxy`

These changes ensure clarity and consistency across different authentication methods while maintaining compatibility with the underlying `credentials-python` library.

#### Usage[​](#usage "Direct link to Usage")

Before you begin, you need to sign up for an Alibaba Cloud account and retrieve your [Credentials](https://usercenter.console.aliyun.com/#/manage/ak).

##### Credential Type[​](#credential-type "Direct link to Credential Type")

###### Access Key[​](#access-key "Direct link to Access Key")

Set up an access\_key credential through User Information Management. It has full authority over the account, so keep it safe. Sometimes, for security reasons, you cannot hand over a primary account AccessKey with full access to the developers of a project. In that case, you can create a RAM sub-account, grant it the necessary authorization, and use the AccessKey of the RAM sub-account.
```yaml
jaffle_shop: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: maxcompute
      project: dbt-example # Replace this with your project name
      schema: default # Replace this with schema name, for example, dbt_bilbo
      endpoint: http://service.cn-shanghai.maxcompute.aliyun.com/api # Replace this with your maxcompute endpoint
      auth_type: access_key # credential type, Optional, default is 'access_key'
      access_key_id: accessKeyId # AccessKeyId
      access_key_secret: accessKeySecret # AccessKeySecret
```

###### STS[​](#sts "Direct link to STS")

Create a temporary security credential by applying for Temporary Security Credentials (TSC) through the Security Token Service (STS).

```yaml
jaffle_shop: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: maxcompute
      project: dbt-example # Replace this with your project name
      schema: default # Replace this with schema name, for example, dbt_bilbo
      endpoint: http://service.cn-shanghai.maxcompute.aliyun.com/api # Replace this with your maxcompute endpoint
      auth_type: sts # credential type
      access_key_id: accessKeyId # AccessKeyId
      access_key_secret: accessKeySecret # AccessKeySecret
      security_token: securityToken # STS Token
```

###### RAM Role ARN[​](#ram-role-arn "Direct link to RAM Role ARN")

By specifying a RAM role, the credential can automatically request and maintain an STS Token. If you want to limit the permissions of the STS Token, you can assign a value to `auth_policy`.
```yaml
jaffle_shop: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: maxcompute
      project: dbt-example # Replace this with your project name
      schema: default # Replace this with schema name, for example, dbt_bilbo
      endpoint: http://service.cn-shanghai.maxcompute.aliyun.com/api # Replace this with your maxcompute endpoint
      auth_type: ram_role_arn # credential type
      access_key_id: accessKeyId # AccessKeyId
      access_key_secret: accessKeySecret # AccessKeySecret
      security_token: securityToken # STS Token
      role_arn: roleArn # Format: acs:ram::USER_ID:role/ROLE_NAME
      role_session_name: roleSessionName # Role Session Name
      auth_policy: policy # Not required, limit the permissions of STS Token
      role_session_expiration: 3600 # Not required, limit the Valid time of STS Token
```

###### OIDC Role ARN[​](#oidc-role-arn "Direct link to OIDC Role ARN")

By specifying an OIDC role, the credential can automatically request and maintain an STS Token. If you want to limit the permissions of the STS Token, you can assign a value to `auth_policy`.
```yaml
jaffle_shop: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: maxcompute
      project: dbt-example # Replace this with your project name
      schema: default # Replace this with schema name, for example, dbt_bilbo
      endpoint: http://service.cn-shanghai.maxcompute.aliyun.com/api # Replace this with your maxcompute endpoint
      auth_type: oidc_role_arn # credential type
      access_key_id: accessKeyId # AccessKeyId
      access_key_secret: accessKeySecret # AccessKeySecret
      security_token: securityToken # STS Token
      role_arn: roleArn # Format: acs:ram::USER_ID:role/ROLE_NAME
      oidc_provider_arn: oidcProviderArn # Format: acs:ram::USER_Id:oidc-provider/OIDC Providers
      oidc_token_file_path: /Users/xxx/xxx # can be replaced by setting environment variable: ALIBABA_CLOUD_OIDC_TOKEN_FILE
      role_session_name: roleSessionName # Role Session Name
      auth_policy: policy # Not required, limit the permissions of STS Token
      role_session_expiration: 3600 # Not required, limit the Valid time of STS Token
```

###### ECS RAM Role[​](#ecs-ram-role "Direct link to ECS RAM Role")

Both ECS and ECI instances support binding an instance RAM role. When the Credentials tool is used on an instance, the RAM role bound to the instance is obtained automatically, and the STS Token for that role is fetched from the metadata service to initialize the credential client. The instance metadata server supports two access modes: hardened mode (IMDSv2) and normal mode. The Credentials tool uses hardened mode by default to obtain access credentials. If an exception occurs in hardened mode, you can set `disable_imds_v1` to control the fallback behavior:

* When the value is `false` (the default), normal mode is used as a fallback to obtain access credentials.
* When the value is `true`, only hardened mode can be used to obtain access credentials, and an exception is thrown.
Whether the server supports IMDSv2 depends on your configuration of the server.

```yaml
jaffle_shop: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: maxcompute
      project: dbt-example # Replace this with your project name
      schema: default # Replace this with schema name, for example, dbt_bilbo
      endpoint: http://service.cn-shanghai.maxcompute.aliyun.com/api # Replace this with your maxcompute endpoint
      auth_type: ecs_ram_role # credential type
      role_name: roleName # `role_name` is optional. It will be retrieved automatically if not set. It is highly recommended to set it up to reduce requests.
      disable_imds_v1: True # Optional, whether to forcibly disable IMDSv1, that is, to use IMDSv2 hardening mode, which can be set by the environment variable ALIBABA_CLOUD_IMDSV1_DISABLED
```

###### Credentials URI[​](#credentials-uri "Direct link to Credentials URI")

By specifying a credentials URI, the credential is fetched from a local or remote URI and automatically refreshed to keep it up to date.

```yaml
jaffle_shop: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: maxcompute
      project: dbt-example # Replace this with your project name
      schema: default # Replace this with schema name, for example, dbt_bilbo
      endpoint: http://service.cn-shanghai.maxcompute.aliyun.com/api # Replace this with your maxcompute endpoint
      auth_type: credentials_uri # credential type
      credentials_uri: http://local_or_remote_uri/ # Credentials URI
```

###### Bearer[​](#bearer "Direct link to Bearer")

If a credential is required by the Cloud Call Centre (CCC), please apply for Bearer Token maintenance yourself.
```yaml
jaffle_shop: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: maxcompute
      project: dbt-example # Replace this with your project name
      schema: default # Replace this with schema name, for example, dbt_bilbo
      endpoint: http://service.cn-shanghai.maxcompute.aliyun.com/api # Replace this with your maxcompute endpoint
      auth_type: bearer # credential type
      bearer_token: bearerToken # BearerToken
```

##### Use the credential provider chain[​](#use-the-credential-provider-chain "Direct link to Use the credential provider chain")

```yaml
jaffle_shop: # this needs to match the profile in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: maxcompute
      project: dbt-example # Replace this with your project name
      schema: default # Replace this with schema name, for example, dbt_bilbo
      endpoint: http://service.cn-shanghai.maxcompute.aliyun.com/api # Replace this with your maxcompute endpoint
      auth_type: chain
```

The default credential provider chain looks for available credentials in the following order:

1. Environment credentials. The program looks for credentials in environment variables. If the `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` environment variables are defined and not empty, the program uses them to create default credentials. If `ALIBABA_CLOUD_ACCESS_KEY_ID`, `ALIBABA_CLOUD_ACCESS_KEY_SECRET`, and `ALIBABA_CLOUD_SECURITY_TOKEN` are all defined and not empty, the program uses them to create temporary security credentials (STS). Note that such a token has an expiration time, so it is recommended only for temporary environments.

2. Credentials file. If a default credentials file exists at `~/.alibabacloud/credentials.ini` (on Windows, `C:\Users\USER_NAME\.alibabacloud\credentials.ini`), the program automatically creates credentials with the specified type and name. The names of configuration items are lowercase. This configuration file can be shared between different projects and between different tools; because it lives outside of the project, it will not be accidentally committed to version control. The default file does not have to exist, but a parse error in it will throw an exception. The path to the default file can be changed by defining the `ALIBABA_CLOUD_CREDENTIALS_FILE` environment variable. If not configured, the configuration named `default` is used; you can also set the environment variable `ALIBABA_CLOUD_PROFILE` to select a named configuration.

```ini
[default]                        # default setting
enable = true                    # Enable, enabled by default if this option is not present
type = access_key                # Certification type: access_key
access_key_id = foo              # Key
access_key_secret = bar          # Secret

[client1]                        # configuration that is named as `client1`
type = ecs_ram_role              # Certification type: ecs_ram_role
role_name = EcsRamRoleTest       # Role Name

[client2]                        # configuration that is named as `client2`
enable = false                   # Disable
type = ram_role_arn              # Certification type: ram_role_arn
region_id = cn-test
policy = test                    # optional, specify permissions
access_key_id = foo
access_key_secret = bar
role_arn = role_arn
role_session_name = session_name # optional

[client3]                        # configuration that is named as `client3`
enable = false                   # Disable
type = oidc_role_arn             # Certification type: oidc_role_arn
region_id = cn-test
policy = test                    # optional, specify permissions
access_key_id = foo              # optional
access_key_secret = bar          # optional
role_arn = role_arn
oidc_provider_arn = oidc_provider_arn
oidc_token_file_path = /xxx/xxx  # can be replaced by setting environment variable: ALIBABA_CLOUD_OIDC_TOKEN_FILE
role_session_name = session_name # optional
```

3. Instance RAM role. If no higher-priority credential information is found, the Credentials tool reads the value of `ALIBABA_CLOUD_ECS_METADATA` (the ECS instance RAM role name) from the environment.
If the value of this variable exists, the program uses hardened mode (IMDSv2) to access the ECS metadata service and obtain the STS Token of the ECS instance RAM role as the default credential information. If an exception occurs in hardened mode, normal mode is used as a fallback to obtain access credentials. You can also set the environment variable `ALIBABA_CLOUD_IMDSV1_DISABLED` to control this behavior:

* When the value is `false`, normal mode is used as a fallback to obtain access credentials.
* When the value is `true`, only hardened mode can be used to obtain access credentials, and an exception is thrown.

Whether the server supports IMDSv2 depends on your configuration of the server.

4. Credentials URI. If the environment variable `ALIBABA_CLOUD_CREDENTIALS_URI` is defined and not empty, the program takes its value as the credentials URI to obtain temporary security credentials.

#### References[​](#references "Direct link to References")

* [Credentials Python](https://github.com/aliyun/credentials-python)

---

##### Connect Microsoft Azure Synapse Analytics to dbt Core

info

The following is a guide to using Azure Synapse Analytics dedicated SQL pools (formerly SQL DW). For more information, refer to [What is dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics?](https://learn.microsoft.com/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-what-is)
For Microsoft Fabric setup with dbt, refer to [Microsoft Fabric Data Warehouse](https://docs.getdbt.com/docs/local/connect-data-platform/fabric-setup.md).

* **Maintained by**: Microsoft
* **Authors**: [Microsoft](https://github.com/Microsoft)
* **GitHub repo**: [Microsoft/dbt-synapse](https://github.com/Microsoft/dbt-synapse) [![](https://img.shields.io/github/stars/Microsoft/dbt-synapse?style=for-the-badge)](https://github.com/Microsoft/dbt-synapse)
* **PyPI package**: `dbt-synapse` [![](https://badge.fury.io/py/dbt-synapse.svg)](https://badge.fury.io/py/dbt-synapse)
* **Slack channel**: [#db-synapse](https://getdbt.slack.com/archives/C01DRQ178LQ)
* **Supported dbt Core version**: v0.18.0 and newer
* **dbt support**: Supported
* **Minimum data platform version**: Azure Synapse 10

#### Installing dbt-synapse

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing `dbt-core` installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-synapse`

#### Configuring dbt-synapse

For Synapse-specific configuration, refer to [Synapse configs](https://docs.getdbt.com/reference/resource-configs/azuresynapse-configs.md).

Dedicated SQL only: Azure Synapse Analytics offers both dedicated SQL pools and serverless SQL pools. **Only dedicated SQL pools are supported by this adapter.**

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

On Debian/Ubuntu, make sure you have the ODBC header files before installing:

```bash
sudo apt install unixodbc-dev
```

Download and install the [Microsoft ODBC Driver 18 for SQL Server](https://docs.microsoft.com/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver15).
If you already have ODBC Driver 17 installed, then that one will work as well.

Default settings change in dbt-synapse v1.2 / ODBC Driver 18: Microsoft made several changes related to connection encryption. Read more about the changes [here](https://docs.getdbt.com/docs/local/connect-data-platform/mssql-setup.md).

##### Authentication methods[​](#authentication-methods "Direct link to Authentication methods")

This adapter is based on the adapter for Microsoft SQL Server. Therefore, the same authentication methods are supported. The configuration is the same except for one major difference: instead of specifying `type: sqlserver`, you specify `type: synapse`. Example:

profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: synapse
      driver: 'ODBC Driver 17 for SQL Server' # (The ODBC Driver installed on your system)
      server: workspacename.sql.azuresynapse.net # (Dedicated SQL endpoint of your workspace here)
      port: 1433
      database: exampledb
      schema: schema_name
      user: username
      password: password
```

You can find all the available options and documentation on how to configure them on [the documentation page for the dbt-sqlserver adapter](https://docs.getdbt.com/docs/local/connect-data-platform/mssql-setup.md).

---

##### Connect Microsoft Fabric Data Warehouse to dbt Core

The `profiles.yml` file is for dbt Core and dbt Fusion only: if you're using the dbt platform, you don't need to create a `profiles.yml` file. This file is only necessary when you use dbt Core or dbt Fusion locally. To learn more about Fusion prerequisites, refer to [Supported features](https://docs.getdbt.com/docs/fusion/supported-features.md).
To connect your data platform to dbt, refer to [About data platforms](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md).

Below is a guide for use with [Fabric Data Warehouse](https://learn.microsoft.com/en-us/fabric/data-warehouse/data-warehousing#synapse-data-warehouse), a new product within Microsoft Fabric. The adapter currently supports connecting to a warehouse. To learn how to set up dbt using Fabric Lakehouse, refer to [Microsoft Fabric Lakehouse](https://docs.getdbt.com/docs/local/connect-data-platform/fabricspark-setup.md). To learn how to set up dbt with Azure Synapse Analytics dedicated SQL pools, refer to [Microsoft Azure Synapse Analytics setup](https://docs.getdbt.com/docs/local/connect-data-platform/azuresynapse-setup.md).

* **Maintained by**: Microsoft
* **Authors**: Microsoft
* **GitHub repo**: [Microsoft/dbt-fabric](https://github.com/Microsoft/dbt-fabric) [![](https://img.shields.io/github/stars/Microsoft/dbt-fabric?style=for-the-badge)](https://github.com/Microsoft/dbt-fabric)
* **PyPI package**: `dbt-fabric` [![](https://badge.fury.io/py/dbt-fabric.svg)](https://badge.fury.io/py/dbt-fabric)
* **Supported dbt Core version**: 1.4.0 and newer
* **dbt support**: Supported

#### Installing dbt-fabric

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing `dbt-core` installations.
Use the following command for installation:

`python -m pip install dbt-core dbt-fabric`

#### Configuring dbt-fabric

For Microsoft Fabric-specific configuration, refer to [Microsoft Fabric configs](https://docs.getdbt.com/reference/resource-configs/fabric-configs.md).

##### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

On Debian/Ubuntu, make sure you have the ODBC header files before installing:

```bash
sudo apt install unixodbc-dev
```

Download and install the [Microsoft ODBC Driver 18 for SQL Server](https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver15). If you already have ODBC Driver 17 installed, then that one will work as well.

###### Supported configurations[​](#supported-configurations "Direct link to Supported configurations")

* The adapter is tested with Microsoft Fabric Data Warehouse (also referred to as warehouses).
* We test all combinations with Microsoft ODBC Driver 17 and Microsoft ODBC Driver 18.
* The collation we run our tests on is `Latin1_General_100_BIN2_UTF8`.

The adapter support is not limited to the matrix of configurations above. If you notice an issue with any other configuration, let us know by opening an issue on [GitHub](https://github.com/microsoft/dbt-fabric).

###### Unsupported configurations[​](#unsupported-configurations "Direct link to Unsupported configurations")

SQL analytics endpoints are read-only and so are not appropriate for transformation workloads; use a warehouse instead.
#### Authentication methods & profile configuration[​](#authentication-methods--profile-configuration "Direct link to Authentication methods & profile configuration")

Supported authentication methods: Microsoft Fabric supports two authentication types:

* Microsoft Entra service principal
* Microsoft Entra password

To better understand the authentication mechanisms, read the [Connect Microsoft Fabric](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-microsoft-fabric.md) page.

##### Common configuration[​](#common-configuration "Direct link to Common configuration")

For all the authentication methods, refer to the following configuration options that can be set in your `profiles.yml` file. A complete reference of all options can be found [at the end of this page](#reference-of-all-connection-options).

| Configuration option | Description | Type | Example |
| --- | --- | --- | --- |
| `driver` | The ODBC driver to use. | Required | `ODBC Driver 18 for SQL Server` |
| `server` | The server hostname. | Required | `localhost` |
| `port` | The server port. | Required | `1433` |
| `database` | The database name. | Required | Not applicable |
| `schema` | The schema name. | Required | `dbo` |
| `retries` | The number of times to automatically retry a query before failing. Defaults to `1`. Queries with syntax errors will not be retried. This setting can be used to overcome intermittent network issues. | Optional | Not applicable |
| `login_timeout` | The number of seconds used to establish a connection before failing. Defaults to `0`, which means that the timeout is disabled or uses the default system settings. | Optional | Not applicable |
| `query_timeout` | The number of seconds to wait for a query before failing. Defaults to `0`, which means that the timeout is disabled or uses the default system settings. | Optional | Not applicable |
| `schema_authorization` | Optionally set this to the principal who should own the schemas created by dbt. [Read more about schema authorization](#schema-authorization). | Optional | Not applicable |
| `encrypt` | Whether to encrypt the connection to the server. Defaults to `true`. Read more about [connection encryption](#connection-encryption). | Optional | Not applicable |
| `trust_cert` | Whether to trust the server certificate. Defaults to `false`. Read more about [connection encryption](#connection-encryption). | Optional | Not applicable |

##### Connection encryption[​](#connection-encryption "Direct link to Connection encryption")

Microsoft made several changes in the release of ODBC Driver 18 that affect how connection encryption is configured. To accommodate these changes, starting in dbt-sqlserver 1.2.0 or newer, the default values of `encrypt` and `trust_cert` have changed. Both of these settings will now **always** be included in the connection string to the server, regardless of whether you've left them out of your profile configuration or not.

* The default value of `encrypt` is `true`, meaning that connections are encrypted by default.
* The default value of `trust_cert` is `false`, meaning that the server certificate will be validated. By setting this to `true`, a self-signed certificate will be accepted.

More details about how these values affect your connection, and how they are used differently across versions of the ODBC driver, can be found in the [Microsoft documentation](https://learn.microsoft.com/en-us/sql/connect/odbc/dsn-connection-string-attribute?view=sql-server-ver16#encrypt).
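To make these defaults explicit, both options can be written into the profile directly. A sketch with illustrative values (the profile name, server, database, and schema are placeholders):

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: fabric
      driver: 'ODBC Driver 18 for SQL Server'
      server: hostname or IP of your server
      port: 1433
      database: exampledb
      schema: schema_name
      authentication: CLI
      encrypt: true      # the default: encrypt the connection
      trust_cert: false  # the default: validate the server certificate
```

Stating them explicitly makes the intended security posture visible in code review, even though omitting them yields the same behavior.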
##### Standard SQL Server authentication[​](#standard-sql-server-authentication "Direct link to Standard SQL Server authentication") SQL Server and windows authentication are not supported by Microsoft Fabric Data Warehouse. ##### Microsoft Entra ID authentication[​](#microsoft-entra-id-authentication "Direct link to Microsoft Entra ID authentication") Microsoft Entra ID (formerly Azure AD) authentication is a default authentication mechanism in Microsoft Fabric Data Warehouse. The following additional methods are available to authenticate to Azure SQL products: * Microsoft Entra ID username and password * Service principal * Environment-based authentication * Azure CLI authentication * VS Code authentication (available through the automatic option below) * Azure PowerShell module authentication (available through the automatic option below) * Automatic authentication The automatic authentication setting is in most cases the easiest choice and works for all of the above. * Microsoft Entra ID username & password * Service principal * Managed Identity * Environment-based * Azure CLI * Automatic profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: fabric driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: ActiveDirectoryPassword user: bill.gates@microsoft.com password: iheartopensource ``` Client ID is often also referred to as Application ID. profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: fabric driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: ServicePrincipal tenant_id: 00000000-0000-0000-0000-000000001234 client_id: 00000000-0000-0000-0000-000000001234 client_secret: S3cret! 
``` This authentication option allows you to dynamically select an authentication method depending on the available environment variables. [The Microsoft docs on EnvironmentCredential](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential?view=azure-python) explain the available combinations of environment variables you can use. profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: fabric driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: environment ``` First, install the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli), then log in: `az login` profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: fabric driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: CLI ``` This authentication option will automatically try to use all available authentication methods. The following methods are tried in order: 1. Environment-based authentication 2. Managed Identity authentication (*not supported at this time*) 3. Visual Studio authentication (*Windows only, ignored on other operating systems*) 4. Visual Studio Code authentication 5. Azure CLI authentication 6.
Azure PowerShell module authentication profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: fabric driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: auto ``` ###### Additional options for Microsoft Entra ID on Windows[​](#additional-options-for-microsoft-entra-id-on-windows "Direct link to Additional options for Microsoft Entra ID on Windows") On Windows systems, the following additional authentication methods are also available for Azure SQL: * Microsoft Entra ID interactive * Microsoft Entra ID integrated * Visual Studio authentication (available through the automatic option above) - Microsoft Entra ID interactive - Microsoft Entra ID integrated This setting can optionally show Multi-Factor Authentication prompts. profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: fabric driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: ActiveDirectoryInteractive user: bill.gates@microsoft.com ``` This uses the credentials you're logged in with on the current machine. profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: fabric driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: ActiveDirectoryIntegrated ``` ##### Automatic Microsoft Entra ID principal provisioning for grants[​](#automatic-microsoft-entra-id-principal-provisioning-for-grants "Direct link to Automatic Microsoft Entra ID principal provisioning for grants") Please note that automatic Microsoft Entra ID principal provisioning is not supported by Microsoft Fabric Data Warehouse at this time. 
Although dbt lets you use the [grants](https://docs.getdbt.com/reference/resource-configs/grants.md) config block to automatically grant and revoke permissions on your models to users or groups, Microsoft Fabric Data Warehouse does not support this feature at this time. Instead, you need to add the service principal or Microsoft Entra identity to a Fabric Workspace as an admin.

##### Schema authorization[​](#schema-authorization "Direct link to Schema authorization")

You can optionally set the principal who should own all schemas created by dbt. This is then used in the `CREATE SCHEMA` statement like so: ```sql CREATE SCHEMA [schema_name] AUTHORIZATION [schema_authorization] ``` A common use case is to use this when you are authenticating with a principal who has permissions based on a group, such as a Microsoft Entra ID group. When that principal creates a schema, the server will first try to create an individual login for this principal and then link the schema to that principal. If you are using Microsoft Entra ID in this case, this will fail because Azure SQL can't automatically create logins for individuals who are part of an AD group.

##### Reference of all connection options[​](#reference-of-all-connection-options "Direct link to Reference of all connection options")

| Configuration option | Description | Required | Default value | | --- | --- | --- | --- | | `driver` | The ODBC driver to use. | ✅ | | | `host` | The hostname of the database server. | ✅ | | | `port` | The port of the database server. | | `1433` | | `database` | The name of the database to connect to. | ✅ | | | `schema` | The schema to use. | ✅ | | | `authentication` | The authentication method to use. This is not required for Windows authentication. | | `'sql'` | | `UID` | Username used to authenticate.
This can be left out depending on the authentication method. | | | | `PWD` | Password used to authenticate. This can be left out depending on the authentication method. | | | | `tenant_id` | The tenant ID of the Microsoft Entra ID instance. This is only used when connecting to Azure SQL with a service principal. | | | | `client_id` | The client ID of the Microsoft Entra service principal. This is only used when connecting to Azure SQL with a Microsoft Entra service principal. | | | | `client_secret` | The client secret of the Microsoft Entra service principal. This is only used when connecting to Azure SQL with a Microsoft Entra service principal. | | | | `encrypt` | Set this to `false` to disable the use of encryption. See [above](#connection-encryption). | | `true` | | `trust_cert` | Set this to `true` to trust the server certificate. See [above](#connection-encryption). | | `false` | | `retries` | The number of times to retry a failed connection. | | `1` | | `schema_authorization` | Optionally set this to the principal who should own the schemas created by dbt. [Details above](#schema-authorization). | | | | `login_timeout` | The number of seconds to wait until a response from the server is received when establishing a connection. `0` means that the timeout is disabled. | | `0` | | `query_timeout` | The number of seconds to wait until a response from the server is received when executing a query. `0` means that the timeout is disabled. | | `0` |
Valid values for `authentication`: * `ActiveDirectoryPassword`: Active Directory authentication using username and password * `ActiveDirectoryInteractive`: Active Directory authentication using a username and MFA prompts * `ActiveDirectoryIntegrated`: Active Directory authentication using the current user's credentials * `ServicePrincipal`: Microsoft Entra ID authentication using a service principal * `CLI`: Microsoft Entra ID authentication using the account you're logged in with in the Azure CLI * `environment`: Microsoft Entra ID authentication using environment variables as documented [here](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential?view=azure-python) * `auto`: Microsoft Entra ID authentication trying the previous authentication methods until it finds one that works

---

##### Connect Microsoft Fabric Lakehouse to dbt Core

`profiles.yml` file is for dbt Core and dbt Fusion only

If you're using the dbt platform, you don't need to create a `profiles.yml` file. This file is only necessary when you use dbt Core or dbt Fusion locally. To learn more about Fusion prerequisites, refer to [Supported features](https://docs.getdbt.com/docs/fusion/supported-features.md). To connect your data platform to dbt, refer to [About data platforms](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md). Below is a guide for use with [Fabric Data Engineering](https://learn.microsoft.com/en-us/fabric/data-engineering/data-engineering-overview), a new product within Microsoft Fabric. This adapter currently supports connecting to a lakehouse endpoint.
To learn how to set up dbt using Fabric Warehouse, refer to [Microsoft Fabric Data Warehouse](https://docs.getdbt.com/docs/local/connect-data-platform/fabric-setup.md). * **Maintained by**: Microsoft * **Authors**: Microsoft * **GitHub repo**: [microsoft/dbt-fabricspark](https://github.com/microsoft/dbt-fabricspark) [![](https://img.shields.io/github/stars/microsoft/dbt-fabricspark?style=for-the-badge)](https://github.com/microsoft/dbt-fabricspark) * **PyPI package**: `dbt-fabricspark` [![](https://badge.fury.io/py/dbt-fabricspark.svg)](https://badge.fury.io/py/dbt-fabricspark) * **Slack channel**: [db-fabric-synapse](https://getdbt.slack.com/archives/C01DRQ178LQ) * **Supported dbt Core version**: v1.7 and newer * **dbt support**: Not supported * **Minimum data platform version**: n/a #### Installing dbt-fabricspark Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation: `python -m pip install dbt-core dbt-fabricspark` #### Configuring dbt-fabricspark For Microsoft Fabric-specific configuration, please refer to [Microsoft Fabric configs.](https://docs.getdbt.com/reference/resource-configs/fabricspark-configs.md) For further info, refer to the GitHub repository: [microsoft/dbt-fabricspark](https://github.com/microsoft/dbt-fabricspark) #### Authentication[​](#authentication "Direct link to Authentication") The Fabric Lakehouse adapter (`dbt-fabricspark`) connects to Fabric Spark through the Livy API. You can authenticate using Azure CLI, which allows dbt to use credentials from an active `az login` session. To use this method, set `authentication: CLI` in your `profiles.yml` file and run `az login`. 
When you authenticate, the Azure CLI may open a browser window or prompt you to visit the [Microsoft device login](https://microsoft.com/devicelogin) page and enter a one-time code to complete sign-in. Once authentication is successful, dbt automatically reuses the active Azure CLI session for subsequent commands. Refer to [`session-jobs`](https://docs.getdbt.com/docs/local/connect-data-platform/fabricspark-setup.md#session-jobs) for an example authentication configuration.

#### Connection methods[​](#connection-methods "Direct link to Connection methods")

`dbt-fabricspark` can connect to the Fabric Spark runtime using the Fabric Livy API. The Fabric Livy API allows submitting jobs in two different modes: * [`session-jobs`](#session-jobs): A Livy session job establishes a Spark session that remains active for the duration of the connection. A Spark session can run multiple jobs (each job is an action), sharing state and cached data between jobs. * Batch jobs entail submitting a Spark application for a single job execution. In contrast to a Livy session job, a batch job doesn't sustain an ongoing Spark session. With Livy batch jobs, each job initiates a new Spark session that ends when the job finishes.

Supported mode

To share the session state among jobs and reduce the overhead of session management, the `dbt-fabricspark` adapter supports only `session-jobs` mode.

##### session-jobs[​](#session-jobs "Direct link to session-jobs")

`session-jobs` is the preferred method when connecting to Fabric Lakehouse.
\~/.dbt/profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: fabricspark method: livy authentication: CLI endpoint: https://api.fabric.microsoft.com/v1 workspaceid: [Fabric Workspace GUID] lakehouseid: [Lakehouse GUID] lakehouse: [Lakehouse Name] schema: [Lakehouse Name] spark_config: name: [Application Name] # optional archives: - "example-archive.zip" conf: spark.executor.memory: "2g" spark.executor.cores: "2" tags: project: [Project Name] user: [User Email] driverMemory: "2g" driverCores: 2 executorMemory: "4g" executorCores: 4 numExecutors: 3 # optional connect_retries: 0 connect_timeout: 10 retry_all: true ``` #### Optional configurations[​](#optional-configurations "Direct link to Optional configurations") ##### Retries[​](#retries "Direct link to Retries") Intermittent errors can crop up unexpectedly while running queries against Fabric Spark. If `retry_all` is enabled, `dbt-fabricspark` will naively retry any queries that fail, based on the configuration supplied by `connect_timeout` and `connect_retries`. It does not attempt to determine if the query failure was transient or likely to succeed on retry. This configuration is recommended in production environments, where queries ought to be succeeding. The default `connect_retries` configuration is 2. For instance, this will instruct dbt to retry all failed queries up to 3 times, with a 5 second delay between each retry: \~/.dbt/profiles.yml ```yaml retry_all: true connect_timeout: 5 connect_retries: 3 ``` ##### Spark configuration[​](#spark-configuration "Direct link to Spark configuration") Spark can be customized using [Application Properties](https://spark.apache.org/docs/latest/configuration.html). Using these properties, the execution can be customized, for example, to allocate more memory to the driver process. Also, the Spark SQL runtime can be set through these properties.
For example, this allows the user to [set Spark catalogs](https://spark.apache.org/docs/latest/configuration.html#spark-sql).

##### Supported functionality[​](#supported-functionality "Direct link to Supported functionality")

Most dbt Core functionality is supported. Please refer to [Delta Lake interoperability](https://learn.microsoft.com/en-us/fabric/fundamentals/delta-lake-interoperability). Delta-only features: 1. Incremental model updates by `unique_key` instead of `partition_by` (see [`merge` strategy](https://docs.getdbt.com/reference/resource-configs/spark-configs.md#the-merge-strategy)) 2. [Snapshots](https://docs.getdbt.com/docs/build/snapshots.md) 3. [Persisting](https://docs.getdbt.com/reference/resource-configs/persist_docs.md) column-level descriptions as database comments

##### Limitations[​](#limitations "Direct link to Limitations")

1. Lakehouse schemas are not supported. Refer to [limitations](https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-schemas#public-preview-limitations) 2. Service principal authentication is not yet supported by the Livy API. 3. Only Delta, CSV and Parquet table data formats are supported by Fabric Lakehouse.

---

##### Connect Microsoft SQL Server to dbt Core

Community plugin

Some core functionality may be limited. If you're interested in contributing, check out the source code for each repository listed below.
* **Maintained by**: Community * **Authors**: Mikael Ene & dbt-msft community (https://github.com/dbt-msft) * **GitHub repo**: [dbt-msft/dbt-sqlserver](https://github.com/dbt-msft/dbt-sqlserver) [![](https://img.shields.io/github/stars/dbt-msft/dbt-sqlserver?style=for-the-badge)](https://github.com/dbt-msft/dbt-sqlserver) * **PyPI package**: `dbt-sqlserver` [![](https://badge.fury.io/py/dbt-sqlserver.svg)](https://badge.fury.io/py/dbt-sqlserver) * **Slack channel**: [#db-sqlserver](https://getdbt.slack.com/archives/CMRMDDQ9W) * **Supported dbt Core version**: v0.14.0 and newer * **dbt support**: Not Supported * **Minimum data platform version**: SQL Server 2016 #### Installing dbt-sqlserver Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation: `python -m pip install dbt-core dbt-sqlserver` #### Configuring dbt-sqlserver For SQL Server-specific configuration, please refer to [SQL Server configs.](https://docs.getdbt.com/reference/resource-configs/mssql-configs.md) Default settings change in dbt-sqlserver v1.2 / ODBC Driver 18 Microsoft made several changes related to connection encryption. Read more about the changes [below](#connection-encryption). ##### Prerequisites[​](#prerequisites "Direct link to Prerequisites") On Debian/Ubuntu make sure you have the ODBC header files before installing ```bash sudo apt install unixodbc-dev ``` Download and install the [Microsoft ODBC Driver 18 for SQL Server](https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver15). If you already have ODBC Driver 17 installed, then that one will work as well. 
###### Supported configurations[​](#supported-configurations "Direct link to Supported configurations")

* The adapter is tested with SQL Server 2017, SQL Server 2019, SQL Server 2022 and Azure SQL Database. * We test all combinations with Microsoft ODBC Driver 17 and Microsoft ODBC Driver 18. * The collations we run our tests on are `SQL_Latin1_General_CP1_CI_AS` and `SQL_Latin1_General_CP1_CS_AS`. Adapter support is not limited to the matrix of configurations above. If you notice an issue with any other configuration, let us know by opening an issue on [GitHub](https://github.com/dbt-msft/dbt-sqlserver).

#### Authentication methods & profile configuration[​](#authentication-methods--profile-configuration "Direct link to Authentication methods & profile configuration")

##### Common configuration[​](#common-configuration "Direct link to Common configuration")

For all the authentication methods, refer to the following configuration options that can be set in your `profiles.yml` file. A complete reference of all options can be found [at the end of this page](#reference-of-all-connection-options). | Configuration option | Description | Type | Example | | --- | --- | --- | --- | | `driver` | The ODBC driver to use | Required | `ODBC Driver 18 for SQL Server` | | `server` | The server hostname | Required | `localhost` | | `port` | The server port | Required | `1433` | | `database` | The database name | Required | Not applicable | | `schema` | The schema name | Required | `dbo` | | `retries` | The number of times to automatically retry a query before failing. Defaults to `1`. Queries with syntax errors will not be retried. This setting can be used to overcome intermittent network issues.
| Optional | Not applicable | | `login_timeout` | The number of seconds used to establish a connection before failing. Defaults to `0`, which means that the timeout is disabled or uses the default system settings. | Optional | Not applicable | | `query_timeout` | The number of seconds used to wait for a query before failing. Defaults to `0`, which means that the timeout is disabled or uses the default system settings. | Optional | Not applicable | | `schema_authorization` | Optionally set this to the principal who should own the schemas created by dbt. [Read more about schema authorization](#schema-authorization). | Optional | Not applicable | | `encrypt` | Whether to encrypt the connection to the server. Defaults to `true`. Read more about [connection encryption](#connection-encryption). | Optional | Not applicable | | `trust_cert` | Whether to trust the server certificate. Defaults to `false`. Read more about [connection encryption](#connection-encryption). | Optional | Not applicable |

##### Connection encryption[​](#connection-encryption "Direct link to Connection encryption")

Microsoft made several changes in the release of ODBC Driver 18 that affect how connection encryption is configured. To accommodate these changes, starting in dbt-sqlserver 1.2.0 the default values of `encrypt` and `trust_cert` have changed. Both of these settings are now **always** included in the connection string sent to the server, regardless of whether you've set them in your profile configuration. * The default value of `encrypt` is `true`, meaning that connections are encrypted by default. * The default value of `trust_cert` is `false`, meaning that the server certificate will be validated. By setting this to `true`, a self-signed certificate will be accepted.
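For a local development server that presents a self-signed certificate, the defaults can be overridden in the profile. A minimal sketch with placeholder values, not a recommended production setup:

```yaml
# Sketch only: keep encryption on, but accept the dev server's
# self-signed certificate by setting trust_cert to true.
your_profile_name:
  target: dev
  outputs:
    dev:
      type: sqlserver
      driver: 'ODBC Driver 18 for SQL Server'
      server: localhost
      port: 1433
      database: exampledb
      schema: schema_name
      user: username
      password: password
      encrypt: true     # stays on (the default)
      trust_cert: true  # accept the self-signed certificate
```

In production, leave `trust_cert` at its default of `false` so the server certificate is validated.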
More details about how these values affect your connection and how they are used differently in versions of the ODBC driver can be found in the [Microsoft documentation](https://learn.microsoft.com/en-us/sql/connect/odbc/dsn-connection-string-attribute?view=sql-server-ver16#encrypt). ##### Standard SQL Server authentication[​](#standard-sql-server-authentication "Direct link to Standard SQL Server authentication") SQL Server credentials are supported for on-premise servers as well as Azure, and it is the default authentication method for `dbt-sqlserver`. When running on Windows, you can also use your Windows credentials to authenticate. * SQL Server credentials * Windows credentials profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: sqlserver driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: database schema: schema_name user: username password: password ``` profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: sqlserver driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name windows_login: True ``` ##### Microsoft Entra ID authentication[​](#microsoft-entra-id-authentication "Direct link to Microsoft Entra ID authentication") While you can use the SQL username and password authentication as mentioned above, you might opt to use one of the authentication methods below for Azure SQL. 
The following additional methods are available to authenticate to Azure SQL products: * Microsoft Entra ID (formerly Azure AD) username and password * Service principal * Managed Identity * Environment-based authentication * Azure CLI authentication * VS Code authentication (available through the automatic option below) * Azure PowerShell module authentication (available through the automatic option below) * Automatic authentication The automatic authentication setting is in most cases the easiest choice and works for all of the above. * Microsoft Entra ID username & password * Service principal * Managed Identity * Environment-based * Azure CLI * Automatic profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: sqlserver driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: ActiveDirectoryPassword user: bill.gates@microsoft.com password: iheartopensource ``` Client ID is often also referred to as Application ID. profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: sqlserver driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: ServicePrincipal tenant_id: 00000000-0000-0000-0000-000000001234 client_id: 00000000-0000-0000-0000-000000001234 client_secret: S3cret! ``` Both system-assigned and user-assigned managed identities will work. profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: sqlserver driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: ActiveDirectoryMsi ``` This authentication option allows you to dynamically select an authentication method depending on the available environment variables. 
[The Microsoft docs on EnvironmentCredential](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential?view=azure-python) explain the available combinations of environment variables you can use. profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: sqlserver driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: environment ``` First, install the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli), then, log in: `az login` profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: sqlserver driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: CLI ``` This authentication option will automatically try to use all available authentication methods. The following methods are tried in order: 1. Environment-based authentication 2. Managed Identity authentication 3. Visual Studio authentication (*Windows only, ignored on other operating systems*) 4. Visual Studio Code authentication 5. Azure CLI authentication 6. 
Azure PowerShell module authentication profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: sqlserver driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: auto ``` ###### Additional options for Microsoft Entra ID on Windows[​](#additional-options-for-microsoft-entra-id-on-windows "Direct link to Additional options for Microsoft Entra ID on Windows") On Windows systems, the following additional authentication methods are also available for Azure SQL: * Microsoft Entra ID interactive * Microsoft Entra ID integrated * Visual Studio authentication (available through the automatic option above) - Microsoft Entra ID interactive - Microsoft Entra ID integrated This setting can optionally show Multi-Factor Authentication prompts. profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: sqlserver driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: ActiveDirectoryInteractive user: bill.gates@microsoft.com ``` This uses the credentials you're logged in with on the current machine. 
profiles.yml ```yaml your_profile_name: target: dev outputs: dev: type: sqlserver driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system) server: hostname or IP of your server port: 1433 database: exampledb schema: schema_name authentication: ActiveDirectoryIntegrated ``` ##### Automatic Microsoft Entra ID principal provisioning for grants[​](#automatic-microsoft-entra-id-principal-provisioning-for-grants "Direct link to Automatic Microsoft Entra ID principal provisioning for grants") In dbt 1.2 or newer you can use the [grants](https://docs.getdbt.com/reference/resource-configs/grants.md) config block to automatically grant/revoke permissions on your models to users or groups. This is fully supported in this adapter and comes with an additional feature. By setting `auto_provision_aad_principals` to `true` in your model configuration, you can automatically provision Microsoft Entra ID principals (users or groups) that don't exist yet. In Azure SQL, you can sign in using Microsoft Entra ID authentication, but to be able to grant a Microsoft Entra ID principal certain permissions, it needs to be linked in the database first. ([Microsoft documentation](https://learn.microsoft.com/en-us/azure/azure-sql/database/authentication-aad-configure?view=azuresql)) Note that principals will not be deleted automatically when they are removed from the `grants` block. ##### Schema authorization[​](#schema-authorization "Direct link to Schema authorization") You can optionally set the principal who should own all schemas created by dbt. This is then used in the `CREATE SCHEMA` statement like so: ```sql CREATE SCHEMA [schema_name] AUTHORIZATION [schema_authorization] ``` A common use case is to use this when you are authenticating with a principal who has permissions based on a group, such as a Microsoft Entra ID group. 
When that principal creates a schema, the server will first try to create an individual login for this principal and then link the schema to that principal. If you are using Microsoft Entra ID in this case, this will fail because Azure SQL can't automatically create logins for individuals who are part of an AD group.

##### Reference of all connection options[​](#reference-of-all-connection-options "Direct link to Reference of all connection options")

| Configuration option | Description | Required | Default value | | --- | --- | --- | --- | | `driver` | The ODBC driver to use. | ✅ | | | `host` | The hostname of the database server. | ✅ | | | `port` | The port of the database server. | | `1433` | | `database` | The name of the database to connect to. | ✅ | | | `schema` | The schema to use. | ✅ | | | `authentication` | The authentication method to use. This is not required for Windows authentication. | | `'sql'` | | `UID` | Username used to authenticate. This can be left out depending on the authentication method. | | | | `PWD` | Password used to authenticate. This can be left out depending on the authentication method. | | | | `windows_login` | Set this to `true` to use Windows authentication. This is only available for SQL Server. | | | | `tenant_id` | The tenant ID of the Microsoft Entra ID instance. This is only used when connecting to Azure SQL with a service principal. | | | | `client_id` | The client ID of the Microsoft Entra service principal. This is only used when connecting to Azure SQL with a Microsoft Entra service principal. | | | | `client_secret` | The client secret of the Microsoft Entra service principal. This is only used when connecting to Azure SQL with a Microsoft Entra service principal. | | | | `encrypt` | Set this to `false` to disable the use of encryption.
See [above](#connection-encryption). | | `true` | | `trust_cert` | Set this to `true` to trust the server certificate. See [above](#connection-encryption). | | `false` | | `retries` | The number of times to retry a failed connection. | | `1` | | `schema_authorization` | Optionally set this to the principal who should own the schemas created by dbt. [Details above](#schema-authorization). | | | | `login_timeout` | The number of seconds to wait until a response from the server is received when establishing a connection. `0` means that the timeout is disabled. | | `0` | | `query_timeout` | The number of seconds to wait until a response from the server is received when executing a query. `0` means that the timeout is disabled. | | `0` |

Valid values for `authentication`: * `sql`: SQL authentication using username and password * `ActiveDirectoryPassword`: Active Directory authentication using username and password * `ActiveDirectoryInteractive`: Active Directory authentication using a username and MFA prompts * `ActiveDirectoryIntegrated`: Active Directory authentication using the current user's credentials * `ServicePrincipal`: Microsoft Entra ID authentication using a service principal * `CLI`: Microsoft Entra ID authentication using the account you're logged in with in the Azure CLI * `ActiveDirectoryMsi`: Microsoft Entra ID authentication using a managed identity available on the system * `environment`: Microsoft Entra ID authentication using environment variables as documented [here](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential?view=azure-python) * `auto`: Microsoft Entra ID authentication trying the previous authentication methods until it finds one that works
YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply.

---

##### Connect MindsDB to dbt Core

Vendor-supported plugin

The dbt-mindsdb package allows dbt to connect to [MindsDB](https://github.com/mindsdb/mindsdb).

* **Maintained by**: MindsDB
* **Authors**: MindsDB team
* **GitHub repo**: [mindsdb/dbt-mindsdb](https://github.com/mindsdb/dbt-mindsdb) [![](https://img.shields.io/github/stars/mindsdb/dbt-mindsdb?style=for-the-badge)](https://github.com/mindsdb/dbt-mindsdb)
* **PyPI package**: `dbt-mindsdb` [![](https://badge.fury.io/py/dbt-mindsdb.svg)](https://badge.fury.io/py/dbt-mindsdb)
* **Slack channel**: [n/a](https://www.getdbt.com/community)
* **Supported dbt Core version**: v1.0.1 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: ?

#### Installing dbt-mindsdb

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation: `python -m pip install dbt-core dbt-mindsdb`

#### Configuring dbt-mindsdb

For MindsDB-specific configuration, please refer to [MindsDB configs.](https://docs.getdbt.com/reference/resource-configs/mindsdb-configs.md)

#### Configurations[​](#configurations "Direct link to Configurations")

Basic `profiles.yml` for connecting to MindsDB:

```yml
mindsdb:
  outputs:
    dev:
      database: 'mindsdb'
      host: '127.0.0.1'
      password: ''
      port: 47335
      schema: 'mindsdb'
      type: mindsdb
      username: 'mindsdb'
  target: dev
```

| Key | Required | Description | Example |
| --- | --- | --- | --- |
| type | ✔️ | The specific adapter to use | `mindsdb` |
| host | ✔️ | The MindsDB server (hostname) to connect to | `cloud.mindsdb.com` |
| port | ✔️ | The port to use | `3306` or `47335` |
| schema | ✔️ | Specify the schema (database) to build models into | The MindsDB datasource |
| username | ✔️ | The username to use to connect to the server | `mindsdb` or MindsDB cloud user |
| password | ✔️ | The password to use for authenticating to the server | `pass` |

---

##### Connect MySQL to dbt Core

Community plugin

Some core functionality may be limited. If you're interested in contributing, check out the source code for each repository listed below.
* **Maintained by**: Community
* **Authors**: [Doug Beatty](https://github.com/dbeatty10)
* **GitHub repo**: [dbeatty10/dbt-mysql](https://github.com/dbeatty10/dbt-mysql) [![](https://img.shields.io/github/stars/dbeatty10/dbt-mysql?style=for-the-badge)](https://github.com/dbeatty10/dbt-mysql)
* **PyPI package**: `dbt-mysql` [![](https://badge.fury.io/py/dbt-mysql.svg)](https://badge.fury.io/py/dbt-mysql)
* **Slack channel**: [#db-mysql-family](https://getdbt.slack.com/archives/C03BK0SHC64)
* **Supported dbt Core version**: v0.18.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: MySQL 5.7 and 8.0

#### Installing dbt-mysql

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation: `python -m pip install dbt-core dbt-mysql`

#### Configuring dbt-mysql

For MySQL-specific configuration, please refer to [MySQL configs.](https://docs.getdbt.com/reference/resource-configs/no-configs.md)

This is an experimental plugin:

* It has not been tested extensively.
* Storage engines other than the default of InnoDB are untested.
* It has only been tested with [dbt-adapter-tests](https://github.com/dbt-labs/dbt-adapter-tests) against the following versions:
  * MySQL 5.7
  * MySQL 8.0
  * MariaDB 10.5
* Compatibility with other [dbt packages](https://hub.getdbt.com/) (like [dbt_utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/)) is also untested.

Please read these docs carefully and use at your own risk. [Issues](https://github.com/dbeatty10/dbt-mysql/issues/new) and [PRs](https://github.com/dbeatty10/dbt-mysql/blob/main/CONTRIBUTING.rst#contributing) welcome!
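Given the experimental status, it can help to sanity-check a target's fields before invoking dbt. The sketch below is a hypothetical helper (not part of dbt-mysql); the required and optional field names follow the profile-fields table in the next section.

```python
# Hypothetical sanity check for a dbt-mysql target (not part of dbt-mysql).
REQUIRED = {"type", "server", "schema", "username", "password"}
DEFAULTS = {"port": 3306, "ssl_disabled": False}

def validate_mysql_target(target: dict) -> dict:
    """Return a normalized copy of the target, raising on missing keys."""
    missing = REQUIRED - target.keys()
    if missing:
        raise ValueError(f"missing required profile fields: {sorted(missing)}")
    if target["type"] not in ("mysql", "mysql5", "mariadb"):
        raise ValueError(f"unknown adapter type: {target['type']}")
    return {**DEFAULTS, **target}

dev = validate_mysql_target({
    "type": "mysql",
    "server": "localhost",
    "schema": "analytics",
    "username": "dbt_admin",
    "password": "correct-horse-battery-staple",
})
print(dev["port"])  # → 3306 (falls back to the MySQL default port)
```

Optional fields fall back to defaults only when absent, mirroring how dbt treats optional keys in `profiles.yml`.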
#### Connecting to MySQL with dbt-mysql[​](#connecting-to-mysql-with-dbt-mysql "Direct link to Connecting to MySQL with dbt-mysql")

MySQL targets should be set up using the following configuration in your `profiles.yml` file. Example:

~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: mysql
      server: localhost
      port: 3306
      schema: analytics
      username: your_mysql_username
      password: your_mysql_password
      ssl_disabled: True
```

###### Description of MySQL Profile Fields[​](#description-of-mysql-profile-fields "Direct link to Description of MySQL Profile Fields")

| Option | Description | Required? | Example |
| --- | --- | --- | --- |
| type | The specific adapter to use | Required | `mysql`, `mysql5` or `mariadb` |
| server | The server (hostname) to connect to | Required | `yourorg.mysqlhost.com` |
| port | The port to use | Optional | `3306` |
| schema | Specify the schema (database) to build models into | Required | `analytics` |
| username | The username to use to connect to the server | Required | `dbt_admin` |
| password | The password to use for authenticating to the server | Required | `correct-horse-battery-staple` |
| ssl_disabled | Set to enable or disable TLS connectivity to mysql5.x | Optional | `True` or `False` |

#### Supported features[​](#supported-features "Direct link to Supported features")

| MariaDB 10.5 | MySQL 5.7 | MySQL 8.0 | Feature |
| --- | --- | --- | --- |
| ✅ | ✅ | ✅ | Table materialization |
| ✅ | ✅ | ✅ | View materialization |
| ✅ | ✅ | ✅ | Incremental materialization |
| ✅ | ❌ | ✅ | Ephemeral materialization |
| ✅ | ✅ | ✅ | Seeds |
| ✅ | ✅ | ✅ | Sources |
| ✅ | ✅ | ✅ | Custom data tests |
| ✅ | ✅ | ✅ | Docs generate |
| 🤷 | 🤷 | ✅ | Snapshots |
#### Notes[​](#notes "Direct link to Notes")

* Ephemeral materializations rely upon [Common Table Expressions](https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL) (CTEs), which are not supported until MySQL 8.0.
* MySQL 5.7 has some configuration gotchas that can prevent dbt snapshots from working properly, due to [automatic initialization and updating for `TIMESTAMP`](https://dev.mysql.com/doc/refman/5.7/en/timestamp-initialization.html). This applies if the output of `SHOW VARIABLES LIKE 'sql_mode'` includes `NO_ZERO_DATE`. A solution is to include the following in a `*.cnf` file:

```text
[mysqld]
explicit_defaults_for_timestamp = true
sql_mode = "ALLOW_INVALID_DATES,{other_sql_modes}"
```

* Here `{other_sql_modes}` is the rest of the modes from the `SHOW VARIABLES LIKE 'sql_mode'` output.

---

##### Connect Oracle to dbt Core

* **Maintained by**: Oracle
* **Authors**: Oracle
* **GitHub repo**: [oracle/dbt-oracle](https://github.com/oracle/dbt-oracle) [![](https://img.shields.io/github/stars/oracle/dbt-oracle?style=for-the-badge)](https://github.com/oracle/dbt-oracle)
* **PyPI package**: `dbt-oracle` [![](https://badge.fury.io/py/dbt-oracle.svg)](https://badge.fury.io/py/dbt-oracle)
* **Slack channel**: [#db-oracle](https://getdbt.slack.com/archives/C01PWH4TXLY)
* **Supported dbt Core version**: v1.2.1 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: Oracle 12c and higher

#### Installing dbt-oracle

Use `pip` to install the adapter.
Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation: `python -m pip install dbt-core dbt-oracle`

#### Configuring dbt-oracle

For Oracle-specific configuration, please refer to [Oracle configs.](https://docs.getdbt.com/reference/resource-configs/oracle-configs.md)

##### Configure the Python driver mode[​](#configure-the-python-driver-mode "Direct link to Configure the Python driver mode")

[python-oracledb](https://oracle.github.io/python-oracledb/) makes it optional to install the Oracle Client libraries. The driver supports two modes:

1. **Thin mode (preferred)**: The Python process connects directly to the Oracle database. This mode does not need the Oracle Client libraries.
2. **Thick mode**: The Python process links with the Oracle Client libraries. Some advanced Oracle database functionality (for example, Advanced Queuing, LDAP connections, and scrollable cursors) is currently available only via the Oracle Client libraries.

You can configure the driver mode using the environment variable `ORA_PYTHON_DRIVER_TYPE`. Use **thin** mode where possible, as it vastly simplifies installation.

| Driver Mode | Oracle Client libraries required? | Configuration |
| --- | --- | --- |
| Thin | No | `ORA_PYTHON_DRIVER_TYPE=thin` |
| Thick | Yes | `ORA_PYTHON_DRIVER_TYPE=thick` |
The default value of `ORA_PYTHON_DRIVER_TYPE` is `thin`.

* Thin
* Thick

```bash
export ORA_PYTHON_DRIVER_TYPE=thin # default
```

```bash
export ORA_PYTHON_DRIVER_TYPE=thick
```

##### Install Oracle Instant Client libraries[​](#install-oracle-instant-client-libraries "Direct link to Install Oracle Instant Client libraries")

In thick mode, you will need the [Oracle Instant Client libraries](https://www.oracle.com/database/technologies/instant-client.html) installed. These provide the necessary network connectivity allowing dbt-oracle to access an Oracle Database instance. Oracle Client versions 23, 21, 19, 18, 12 and 11.2 are supported. It is recommended to use the latest client possible: Oracle's standard client-server version interoperability allows connection to both older and newer databases.

* Linux
* Windows
* MacOS

1. Download an Oracle 23, 21, 19, 18, 12, or 11.2 "Basic" or "Basic Light" zip file matching your Python 64-bit or 32-bit architecture:
   1. [x86-64 64-bit](https://www.oracle.com/database/technologies/instant-client/linux-x86-64-downloads.html)
   2. [x86 32-bit](https://www.oracle.com/database/technologies/instant-client/linux-x86-32-downloads.html)
   3. [ARM (aarch64) 64-bit](https://www.oracle.com/database/technologies/instant-client/linux-arm-aarch64-downloads.html)
2. Unzip the package into a single directory that is accessible to your application. For example:

```bash
mkdir -p /opt/oracle
cd /opt/oracle
unzip instantclient-basic-linux.x64-21.6.0.0.0.zip
```

3. Install the libaio package with sudo or as the root user. For example:

```bash
sudo yum install libaio
```

On some Linux distributions this package is called `libaio1` instead.

4. If there is no other Oracle software on the machine that will be impacted, permanently add Instant Client to the runtime link path.
For example, with sudo or as the root user:

```bash
sudo sh -c "echo /opt/oracle/instantclient_21_6 > /etc/ld.so.conf.d/oracle-instantclient.conf"
sudo ldconfig
```

Alternatively, set the environment variable `LD_LIBRARY_PATH`:

```bash
export LD_LIBRARY_PATH=/opt/oracle/instantclient_21_6:$LD_LIBRARY_PATH
```

5. If you use optional Oracle configuration files such as tnsnames.ora, sqlnet.ora, or oraaccess.xml with Instant Client, then put the files in an accessible directory and set the environment variable TNS_ADMIN to that directory name.

```bash
export TNS_ADMIN=/opt/oracle/your_config_dir
```

1. Download an Oracle 21, 19, 18, 12, or 11.2 "Basic" or "Basic Light" zip file: [64-bit](https://www.oracle.com/database/technologies/instant-client/winx64-64-downloads.html) or [32-bit](https://www.oracle.com/database/technologies/instant-client/microsoft-windows-32-downloads.html), matching your Python architecture.

   Windows 7 users: note that Oracle Client versions 21c and 19c are not supported on Windows 7.

2. Unzip the package into a directory that is accessible to your application. For example, unzip `instantclient-basic-windows.x64-19.11.0.0.0dbru.zip` to `C:\oracle\instantclient_19_11`.

3. Oracle Instant Client libraries require a Visual Studio redistributable with a 64-bit or 32-bit architecture to match Instant Client's architecture.
   1. For Instant Client 21 install [VS 2019](https://docs.microsoft.com/en-US/cpp/windows/latest-supported-vc-redist?view=msvc-170) or later
   2. For Instant Client 19 install [VS 2017](https://docs.microsoft.com/en-US/cpp/windows/latest-supported-vc-redist?view=msvc-170)
   3. For Instant Client 18 or 12.2 install [VS 2013](https://docs.microsoft.com/en-US/cpp/windows/latest-supported-vc-redist?view=msvc-170#visual-studio-2013-vc-120)
   4. For Instant Client 12.1 install [VS 2010](https://docs.microsoft.com/en-US/cpp/windows/latest-supported-vc-redist?view=msvc-170#visual-studio-2010-vc-100-sp1-no-longer-supported)
   5.
For Instant Client 11.2 install [VS 2005 64-bit](https://docs.microsoft.com/en-US/cpp/windows/latest-supported-vc-redist?view=msvc-170#visual-studio-2005-vc-80-sp1-no-longer-supported)

4. Add the Oracle Instant Client directory to the `PATH` environment variable. The directory must occur in `PATH` before any other Oracle directories. Restart any open command prompt windows.

```bash
SET PATH=C:\oracle\instantclient_19_11;%PATH%
```

Check the python-oracledb documentation for installation instructions on [MacOS ARM64](https://python-oracledb.readthedocs.io/en/latest/user_guide/installation.html#instant-client-scripted-installation-on-macos-arm64) or [MacOS Intel x86-64](https://python-oracledb.readthedocs.io/en/latest/user_guide/installation.html#instant-client-scripted-installation-on-macos-intel-x86-64)

#### Configure wallet for Oracle Autonomous Database (ADB-S) in Cloud[​](#configure-wallet-for-oracle-autonomous-database-adb-s-in-cloud "Direct link to Configure wallet for Oracle Autonomous Database (ADB-S) in Cloud")

dbt can connect to Oracle Autonomous Database (ADB-S) in Oracle Cloud using either TLS (Transport Layer Security) or mutual TLS (mTLS). TLS and mTLS provide enhanced security for authentication and encryption. A database username and password are still required for dbt connections, which can be configured as explained in the next section, [Connecting to Oracle Database](#connecting-to-oracle-database).

* TLS
* Mutual TLS

With TLS, dbt can connect to Oracle ADB without using a wallet. Both Thin and Thick modes of the python-oracledb driver support TLS.

info

In Thick mode, dbt can connect through TLS only when using Oracle Client library versions 19.14 (or later) or 21.5 (or later).
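The version constraint in the info box above can be encoded as a small pre-flight check. This is an illustrative helper, not part of dbt-oracle; it assumes the Oracle Client version is given as a dotted string such as `"19.14"` or `"21.5"`, and treats major versions newer than 21 (for example, 23) as supported.

```python
# Illustrative check (not part of dbt-oracle): thick-mode TLS to Oracle ADB
# requires Oracle Client 19.14+ or 21.5+, per the info box above.
def thick_mode_tls_supported(client_version: str) -> bool:
    major, minor = (int(p) for p in client_version.split(".")[:2])
    if major == 19:
        return minor >= 14
    if major == 21:
        return minor >= 5
    # Assumption: major versions newer than 21 (e.g. 23) support TLS.
    return major > 21

print(thick_mode_tls_supported("19.11"))  # → False
print(thick_mode_tls_supported("21.5"))   # → True
```

A check like this is only worthwhile in thick mode; thin mode supports TLS regardless of Oracle Client version, since no client libraries are involved.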
Refer to Oracle documentation to [connect to an ADB instance using TLS authentication](https://docs.oracle.com/en/cloud/paas/autonomous-database/adbsa/connecting-nodejs-tls.html#GUID-B3809B88-D2FB-4E08-8F9B-65A550F93A07) and the blog post [Easy wallet-less connections to Oracle Autonomous Databases in Python](https://blogs.oracle.com/opal/post/easy-way-to-connect-python-applications-to-oracle-autonomous-databases) to enable TLS for your Oracle ADB instance.

For mutual TLS connections, a wallet needs to be downloaded from the OCI console and the Python driver needs to be configured to use it.

###### Install the Wallet and Network Configuration Files[​](#install-the-wallet-and-network-configuration-files "Direct link to Install the Wallet and Network Configuration Files")

From the Oracle Cloud console for the database, download the wallet zip file using the `DB Connection` button. The zip contains the wallet and network configuration files.

Note

Keep wallet files in a secure location and share them only with authorized users.

Unzip the wallet zip file.

* Thin
* Thick

In Thin mode, only two files from the zip are needed:

* `tnsnames.ora` - Maps net service names used for application connection strings to your database services
* `ewallet.pem` - Enables SSL/TLS connections in Thin mode. Keep this file secure

After unzipping the files in a secure directory, set the **TNS_ADMIN** and **WALLET_LOCATION** environment variables to the directory name.
```bash
export WALLET_LOCATION=/path/to/directory_containing_ewallet.pem
export WALLET_PASSWORD=***
export TNS_ADMIN=/path/to/directory_containing_tnsnames.ora
```

Optionally, if the `ewallet.pem` file is encrypted with a wallet password, specify the password using the **WALLET_PASSWORD** environment variable.

In Thick mode, the following files from the zip are needed:

* `tnsnames.ora` - Maps net service names used for application connection strings to your database services
* `sqlnet.ora` - Configures Oracle Network settings
* `cwallet.sso` - Enables SSL/TLS connections

After unzipping the files in a secure directory, set the **TNS_ADMIN** environment variable to that directory name.

```bash
export TNS_ADMIN=/path/to/directory_containing_tnsnames.ora
```

Next, edit the `sqlnet.ora` file to point to the wallet directory.

sqlnet.ora

```text
WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/path/to/wallet/directory")))
SSL_SERVER_DN_MATCH=yes
```

#### Connecting to Oracle Database[​](#connecting-to-oracle-database "Direct link to Connecting to Oracle Database")

Define the following mandatory parameters as environment variables and reference them in the connection profile using the [env_var](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md) Jinja function. Optionally, you can also define these directly in the `profiles.yml` file, but this is not recommended.

```bash
export DBT_ORACLE_USER=
export DBT_ORACLE_PASSWORD=***
export DBT_ORACLE_SCHEMA=
export DBT_ORACLE_DATABASE=example_db2022adb
```

Use the following query to retrieve the database name:

```sql
SELECT SYS_CONTEXT('userenv', 'DB_NAME') FROM DUAL
```

An Oracle connection profile for dbt can be set using any one of the following methods:

* Using a TNS alias
* Using a connect string
* Using the database hostname

The `tnsnames.ora` file is a configuration file that contains network service names mapped to connect descriptors.
The directory location of the `tnsnames.ora` file can be specified using the `TNS_ADMIN` environment variable.

tnsnames.ora

```text
db2022adb_high = (description =
    (retry_count=20)(retry_delay=3)
    (address=(protocol=tcps)
    (port=1522)
    (host=adb.example.oraclecloud.com))
    (connect_data=(service_name=example_high.adb.oraclecloud.com))
    (security=(ssl_server_cert_dn="CN=adb.example.oraclecloud.com, OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))
```

The TNS alias `db2022adb_high` can be defined as an environment variable and referred to in `profiles.yml`:

```bash
export DBT_ORACLE_TNS_NAME=db2022adb_high
```

~/.dbt/profiles.yml

```yaml
dbt_test:
  target: dev
  outputs:
    dev:
      type: oracle
      user: "{{ env_var('DBT_ORACLE_USER') }}"
      pass: "{{ env_var('DBT_ORACLE_PASSWORD') }}"
      database: "{{ env_var('DBT_ORACLE_DATABASE') }}"
      tns_name: "{{ env_var('DBT_ORACLE_TNS_NAME') }}"
      schema: "{{ env_var('DBT_ORACLE_SCHEMA') }}"
      threads: 4
```

The connection string identifies which database service to connect to.
It can be one of the following:

* An [Oracle Easy Connect String](https://docs.oracle.com/en/database/oracle/oracle-database/21/netag/configuring-naming-methods.html#GUID-B0437826-43C1-49EC-A94D-B650B6A4A6EE)
* An Oracle Net Connect Descriptor String
* A Net Service Name mapping to a connect descriptor

```bash
export DBT_ORACLE_CONNECT_STRING="(description=(retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.example.oraclecloud.com))(connect_data=(service_name=example_high.adb.oraclecloud.com))(security=(ssl_server_cert_dn=\"CN=adb.example.oraclecloud.com, OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US\")))"
```

~/.dbt/profiles.yml

```yaml
dbt_test:
  target: "{{ env_var('DBT_TARGET', 'dev') }}"
  outputs:
    dev:
      type: oracle
      user: "{{ env_var('DBT_ORACLE_USER') }}"
      pass: "{{ env_var('DBT_ORACLE_PASSWORD') }}"
      database: "{{ env_var('DBT_ORACLE_DATABASE') }}"
      schema: "{{ env_var('DBT_ORACLE_SCHEMA') }}"
      connection_string: "{{ env_var('DBT_ORACLE_CONNECT_STRING') }}"
```

To connect using the database hostname or IP address, you need to specify the following:

* host
* port (1521 or 1522)
* protocol (tcp or tcps)
* service

```bash
export DBT_ORACLE_HOST=adb.example.oraclecloud.com
export DBT_ORACLE_SERVICE=example_high.adb.oraclecloud.com
```

~/.dbt/profiles.yml

```yaml
dbt_test:
  target: "{{ env_var('DBT_TARGET', 'dev') }}"
  outputs:
    dev:
      type: oracle
      user: "{{ env_var('DBT_ORACLE_USER') }}"
      pass: "{{ env_var('DBT_ORACLE_PASSWORD') }}"
      protocol: "tcps"
      host: "{{ env_var('DBT_ORACLE_HOST') }}"
      port: 1522
      service: "{{ env_var('DBT_ORACLE_SERVICE') }}"
      database: "{{ env_var('DBT_ORACLE_DATABASE') }}"
      schema: "{{ env_var('DBT_ORACLE_SCHEMA') }}"
      retry_count: 1
      retry_delay: 3
      threads: 4
```

Note

Starting with `dbt-oracle==1.0.2`, setting the `database` name in `profiles.yml` is **optional**. Starting with `dbt-oracle==1.8.0`, the `database` key in `profiles.yml` is **still optional for all but one** of the dbt-oracle workflows:
If `database` is missing in `profiles.yml`, the generated catalog used for project documentation will be empty. Starting with `dbt-oracle==1.8`, a missing `database` key in `profiles.yml` is detected and a warning is issued asking you to add it for catalog generation. The warning message also shows the database name that dbt-oracle expects, so users don't have to work out what the database name is or how to retrieve it.

##### Quoting configuration[​](#quoting-configuration "Direct link to Quoting configuration")

The default quoting configuration used by dbt-oracle is shown below:

dbt_project.yml

```yaml
quoting:
  database: false
  identifier: false
  schema: false
```

This is recommended and works for most cases.

##### Approximate relation match error[​](#approximate-relation-match-error "Direct link to Approximate relation match error")

Users have often reported an approximate relation match error like the following:

```text
Compilation Error in model
19:09:40 When searching for a relation, dbt found an approximate match. Instead of guessing
19:09:40 which relation to use, dbt will move on. Please delete , or rename it to be less ambiguous.
Searched for:
```

This has been reported in multiple channels:

* [StackOverflow: Approximate relation match error](https://stackoverflow.com/questions/75892325/approximate-relation-match-with-dbt-on-oracle)
* [GitHub issue #51](https://github.com/oracle/dbt-oracle/issues/51)
* [GitHub issue #143](https://github.com/oracle/dbt-oracle/issues/143)
* [GitHub issue #144](https://github.com/oracle/dbt-oracle/issues/144)

In all cases, the solution was to enable quoting only for the database.
To solve this `approximate match` issue, use the following quoting configuration:

dbt_project.yml

```yaml
quoting:
  database: true
```

#### Python models using Oracle Autonomous Database (ADB-S)[​](#python-models-using-oracle-autonomous-database-adb-s "Direct link to Python models using Oracle Autonomous Database (ADB-S)")

Oracle Autonomous Database Serverless (ADB-S) users can run dbt-py models using Oracle Machine Learning (OML4PY), which is available with no extra setup required.

##### Features[​](#features "Direct link to Features")

* User-defined Python functions run in an ADB-S-spawned Python 3.12.1 runtime
* Access to external Python packages available in the Python runtime, for example `numpy`, `pandas`, and `scikit_learn`
* Integration with Conda 24.x to create environments with custom Python packages
* Access to the database session in the Python function
* DataFrame read API to read `TABLES`, `VIEWS`, and ad-hoc `SELECT` queries as DataFrames
* DataFrame write API to write DataFrames as `TABLES`
* Supports both table and incremental materialization

##### Setup[​](#setup "Direct link to Setup")

###### Required roles[​](#required-roles "Direct link to Required roles")

* User must be non-ADMIN to execute the Python function
* User must be granted the `OML_DEVELOPER` role

###### OML Cloud Service URL[​](#oml-cloud-service-url "Direct link to OML Cloud Service URL")

The OML Cloud Service URL has the following format:

```text
https://tenant1-dbt.adb.us-sanjose-1.oraclecloudapps.com
```

In this example:

* `tenant1` is the tenancy ID
* `dbt` is the database name
* `us-sanjose-1` is the datacenter region
* `oraclecloudapps.com` is the root domain

Add `oml_cloud_service_url` to your existing `~/.dbt/profiles.yml`

~/.dbt/profiles.yml

```yaml
dbt_test:
  target: dev
  outputs:
    dev:
      type: oracle
      user: "{{ env_var('DBT_ORACLE_USER') }}"
      pass: "{{ env_var('DBT_ORACLE_PASSWORD') }}"
      database: "{{ env_var('DBT_ORACLE_DATABASE') }}"
      tns_name: "{{ env_var('DBT_ORACLE_TNS_NAME') }}"
      schema: "{{ env_var('DBT_ORACLE_SCHEMA') }}"
      oml_cloud_service_url: "https://tenant1-dbt.adb.us-sanjose-1.oraclecloudapps.com"
```

##### Python model configurations[​](#python-model-configurations "Direct link to Python model configurations")

| Configuration | Datatype | Examples |
| --- | --- | --- |
| Materialization | String | `dbt.config(materialized="incremental")` or `dbt.config(materialized="table")` |
| Service | String | `dbt.config(service="HIGH")`, `dbt.config(service="MEDIUM")`, or `dbt.config(service="LOW")` |
| Async mode | Boolean | `dbt.config(async_flag=True)` |
| Timeout in seconds, only to be used with **async** mode (`min: 1800` and `max: 43200`) | Integer | `dbt.config(timeout=1800)` |
| Conda environment | String | `dbt.config(conda_env_name="dbt_py_env")` |

In async mode, dbt-oracle schedules a Python job, polls the job's status, and waits for it to complete. Without async mode, dbt-oracle invokes the Python job immediately in a blocking manner.

Note

Use `dbt.config(async_flag=True)` for long-running Python jobs.
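The documented bounds on the async-mode timeout (`min: 1800`, `max: 43200` seconds) can be checked up front before scheduling a job. The helper below is hypothetical, not part of dbt-oracle; only the two bounds come from the configuration table above.

```python
# Hypothetical validation (not part of dbt-oracle) of the async-mode timeout,
# documented above as min: 1800 and max: 43200 seconds.
MIN_TIMEOUT, MAX_TIMEOUT = 1800, 43200

def validate_async_timeout(timeout: int) -> int:
    """Return the timeout unchanged, or raise if it is out of the documented range."""
    if not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT:
        raise ValueError(
            f"timeout must be between {MIN_TIMEOUT} and {MAX_TIMEOUT} seconds, got {timeout}"
        )
    return timeout

print(validate_async_timeout(1800))  # → 1800, matches dbt.config(timeout=1800)
```

A value below the minimum (say, 60) would raise before the job is ever scheduled, which is cheaper than waiting for a server-side rejection.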
##### Python model examples[​](#python-model-examples "Direct link to Python model examples")

###### Refer other model[​](#refer-other-model "Direct link to Refer other model")

Use `dbt.ref(model_name)` to refer to either a SQL or a Python model:

```python
def model(dbt, session):
    # Must be either table or incremental (view is not currently supported)
    dbt.config(materialized="table")
    # Returns an oml.core.DataFrame referring to the dbt model
    s_df = dbt.ref("sales_cost")
    return s_df
```

###### Refer a source[​](#refer-a-source "Direct link to Refer a source")

Use `dbt.source(source_schema, table_name)`:

```python
def model(dbt, session):
    # Must be either table or incremental (view is not currently supported)
    dbt.config(materialized="table")
    # oml.core.DataFrame representing a datasource
    s_df = dbt.source("sh_database", "channels")
    return s_df
```

###### Incremental materialization[​](#incremental-materialization "Direct link to Incremental materialization")

```python
def model(dbt, session):
    # Must be either table or incremental
    dbt.config(materialized="incremental")
    # oml.core.DataFrame representing a datasource
    sales_cost_df = dbt.ref("sales_cost")

    if dbt.is_incremental:
        cr = session.cursor()
        result = cr.execute(f"select max(cost_timestamp) from {dbt.this.identifier}")
        max_timestamp = result.fetchone()[0]
        # Filter new rows
        sales_cost_df = sales_cost_df[sales_cost_df["COST_TIMESTAMP"] > max_timestamp]

    return sales_cost_df
```

###### Concatenate a new column in a DataFrame[​](#concatenate-a-new-column-in-dataframe "Direct link to Concatenate a new column in Dataframe")

```python
def model(dbt, session):
    dbt.config(materialized="table")
    dbt.config(async_flag=True)
    dbt.config(timeout=1800)

    sql = f"""SELECT customer.cust_first_name,
        customer.cust_last_name,
        customer.cust_gender,
        customer.cust_marital_status,
        customer.cust_street_address,
        customer.cust_email,
        customer.cust_credit_limit,
        customer.cust_income_level
    FROM sh.customers customer, sh.countries country
    WHERE
    country.country_iso_code = 'US' AND customer.country_id = country.country_id"""

    # session.sync(query) runs the SQL query and returns an oml.core.DataFrame
    us_potential_customers = session.sync(query=sql)

    # Compute an ad-hoc anomaly score on the credit limit
    median_credit_limit = us_potential_customers["CUST_CREDIT_LIMIT"].median()
    mean_credit_limit = us_potential_customers["CUST_CREDIT_LIMIT"].mean()
    anomaly_score = (us_potential_customers["CUST_CREDIT_LIMIT"] - median_credit_limit) / (median_credit_limit - mean_credit_limit)

    # Add a new column "CUST_CREDIT_ANOMALY_SCORE"
    us_potential_customers = us_potential_customers.concat({"CUST_CREDIT_ANOMALY_SCORE": anomaly_score.round(3)})

    # Return the potential customers dataset as an oml.core.DataFrame
    return us_potential_customers
```

##### Use Custom Conda environment[​](#use-custom-conda-environment "Direct link to Use Custom Conda environment")

1. As the ADMIN user, create a Conda environment using the [OML4PY Conda Notebook](https://docs.oracle.com/en/database/oracle/machine-learning/oml4py/1/mlpug/administrative-task-create-and-conda-environments.html):

```bash
conda create -n dbt_py_env -c conda-forge --override-channels --strict-channel-priority python=3.12.1 nltk gensim
```

2. Save this environment using the following command from the OML4PY Conda Notebook:

```bash
conda upload --overwrite dbt_py_env -t application OML4PY
```

3.
Use the environment in dbt Python models:

```python
# Import custom packages from Conda environments
import nltk
import gensim

def model(dbt, session):
    dbt.config(materialized="table")
    dbt.config(conda_env_name="dbt_py_env")  # Refer to the Conda environment
    dbt.config(async_flag=True)  # Use async mode for long-running Python jobs
    dbt.config(timeout=900)
    # oml.core.DataFrame referencing a dbt-sql model
    promotion_cost = dbt.ref("direct_sales_channel_promo_cost")
    return promotion_cost
```

#### Supported features[​](#supported-features "Direct link to Supported features")

* Table materialization
* View materialization
* Materialized View
* Incremental materialization
* Seeds
* Data sources
* Singular tests
* Generic tests: not null, unique, accepted values, and relationships
* Operations
* Analyses
* Exposures
* Document generation
* Serve project documentation as a website
* Python models (from dbt-oracle version 1.5.1)
* Integration with Conda to use any Python packages from Anaconda's repository
* All dbt commands are supported

#### Not supported features[​](#not-supported-features "Direct link to Not supported features")

* Ephemeral materialization

---

##### Connect Postgres to dbt Core

`profiles.yml` file is for dbt Core and dbt Fusion only

If you're using dbt platform, you don't need to create a `profiles.yml` file. This file is only necessary when you use dbt Core or dbt Fusion locally. To learn more about Fusion prerequisites, refer to [Supported features](https://docs.getdbt.com/docs/fusion/supported-features.md).
To connect your data platform to dbt, refer to [About data platforms](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md).

* **Maintained by**: dbt Labs
* **Authors**: core dbt maintainers
* **GitHub repo**: [dbt-labs/dbt-adapters](https://github.com/dbt-labs/dbt-adapters) [![](https://img.shields.io/github/stars/dbt-labs/dbt-adapters?style=for-the-badge)](https://github.com/dbt-labs/dbt-adapters)
* **PyPI package**: `dbt-postgres` [![](https://badge.fury.io/py/dbt-postgres.svg)](https://badge.fury.io/py/dbt-postgres)
* **Slack channel**: [#db-postgres](https://getdbt.slack.com/archives/C0172G2E273)
* **Supported dbt Core version**: v0.4.0 and newer
* **dbt support**: Supported
* **Minimum data platform version**: n/a

#### Installing dbt-postgres

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-postgres`

#### Configuring dbt-postgres

For Postgres-specific configuration, refer to [Postgres configs](https://docs.getdbt.com/reference/resource-configs/postgres-configs.md).

#### Profile Configuration[​](#profile-configuration "Direct link to Profile Configuration")

Postgres targets should be set up using the following configuration in your `profiles.yml` file.

\~/.dbt/profiles.yml

```yaml
company-name:
  target: dev
  outputs:
    dev:
      type: postgres
      host: [hostname]
      user: [username]
      password: [password]
      port: [port]
      dbname: [database name] # or database instead of dbname
      schema: [dbt schema]
      threads: [optional, 1 or more]
      keepalives_idle: 0 # default 0, indicating the system default. See below
      connect_timeout: 10 # default 10 seconds
      retries: 1 # default 1 retry on error/timeout when opening connections
      search_path: [optional, override the default postgres search_path]
      role: [optional, set the role dbt assumes when executing queries]
      sslmode: [optional, set the sslmode used to connect to the database]
      sslcert: [optional, set the sslcert to control the certificate file location]
      sslkey: [optional, set the sslkey to control the location of the private key]
      sslrootcert: [optional, set the sslrootcert config value to a new file path in order to customize the file location that contains root certificates]
```

##### Configurations[​](#configurations "Direct link to Configurations")

###### search\_path[​](#search_path "Direct link to search_path")

The `search_path` config controls the Postgres "search path" that dbt configures when opening new connections to the database. By default, the Postgres search path is `"$user, public"`, meaning that unqualified table names are searched for in the `public` schema, or a schema with the same name as the logged-in user. **Note:** Setting the `search_path` to a custom value is not necessary or recommended for typical usage of dbt.

###### role[​](#role "Direct link to role")

The `role` config controls the Postgres role that dbt assumes when opening new connections to the database.

###### sslmode[​](#sslmode "Direct link to sslmode")

The `sslmode` config controls how dbt connects to Postgres databases using SSL. See [the Postgres docs](https://www.postgresql.org/docs/9.1/libpq-ssl.html) on `sslmode` for usage information. When unset, dbt connects using the Postgres default `sslmode`, `prefer`.

###### sslcert[​](#sslcert "Direct link to sslcert")

The `sslcert` config controls the location of the certificate file used to connect to Postgres when using client SSL connections. To use a certificate file that is not in the default location, set that file path using this value.
Without this config set, dbt uses the Postgres default locations. See [Client Certificates](https://www.postgresql.org/docs/current/libpq-ssl.html#LIBPQ-SSL-CLIENTCERT) in the Postgres SSL docs for the default paths.

###### sslkey[​](#sslkey "Direct link to sslkey")

The `sslkey` config controls the location of the private key for connecting to Postgres using client SSL connections. If this config is omitted, dbt uses the default key location for Postgres. See [Client Certificates](https://www.postgresql.org/docs/current/libpq-ssl.html#LIBPQ-SSL-CLIENTCERT) in the Postgres SSL docs for the default locations.

###### sslrootcert[​](#sslrootcert "Direct link to sslrootcert")

When connecting to a Postgres server using a client SSL connection, dbt verifies that the server provides an SSL certificate signed by a trusted root certificate. These root certificates are in the `~/.postgresql/root.crt` file by default. To customize the location of this file, set the `sslrootcert` config value to a new file path.

##### `keepalives_idle`[​](#keepalives_idle "Direct link to keepalives_idle")

If the database closes its connection while dbt is waiting for data, you may see the error `SSL SYSCALL error: EOF detected`. Lowering the [`keepalives_idle` value](https://www.postgresql.org/docs/9.3/libpq-connect.html) may prevent this, because the server will send a ping to keep the connection active more frequently. [dbt's default setting](https://github.com/dbt-labs/dbt-core/blob/main/plugins/postgres/dbt/adapters/postgres/connections.py#L28) is 0 (the server's default value), but it can be configured lower (perhaps 120 or 60 seconds), at the cost of a chattier network connection.

###### retries[​](#retries "Direct link to retries")

If `dbt-postgres` encounters an operational error or timeout when opening a new connection, it will retry up to the number of times configured by `retries`. The default value is 1 retry. If set to 2+ retries, dbt will wait 1 second before retrying.
If set to 0, dbt will not retry at all.

##### `psycopg2` vs `psycopg2-binary`[​](#psycopg2-vs-psycopg2-binary "Direct link to psycopg2-vs-psycopg2-binary")

`psycopg2-binary` is installed by default when installing `dbt-postgres`. It is a pre-built version of `psycopg2` that may not be optimized for your particular machine. This is ideal for development and testing workflows, where performance is less of a concern and speed and ease of installation matter more. Production environments, however, benefit from a version of `psycopg2` built from source for your particular operating system and architecture, where ongoing usage is the focus and ease of installation matters less.

To use `psycopg2`:

1. Install `dbt-postgres`
2. Uninstall `psycopg2-binary`
3. Install the equivalent version of `psycopg2`

```bash
pip install dbt-postgres
if [[ $(pip show psycopg2-binary) ]]; then
  PSYCOPG2_VERSION=$(pip show psycopg2-binary | grep Version | cut -d " " -f 2)
  pip uninstall -y psycopg2-binary && pip install psycopg2==$PSYCOPG2_VERSION
fi
```

Installing `psycopg2` often requires OS-level dependencies, which may vary across operating systems and architectures. For example, on Ubuntu you need to install `libpq-dev` and `python-dev`:

```bash
sudo apt-get update
sudo apt-get install libpq-dev python-dev
```

whereas on macOS you need to install `postgresql`:

```bash
brew install postgresql
pip install psycopg2
```

Your OS may have its own dependencies based on your particular scenario.
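The profile fields above largely mirror libpq connection keywords of the same names. As a rough illustration, the sketch below assembles a libpq-style DSN from a profile-shaped dict. The `pg_dsn` helper is hypothetical (it is not part of dbt-postgres or psycopg2); dbt-level fields such as `schema`, `threads`, `retries`, and `search_path` are handled by dbt itself and are deliberately excluded.

```python
# Hypothetical helper (not part of dbt-postgres): shows how the profile
# fields above line up with libpq connection keywords of the same names.
LIBPQ_KEYS = (
    "host", "port", "user", "password", "dbname",
    "connect_timeout", "keepalives_idle",
    "sslmode", "sslcert", "sslkey", "sslrootcert",
)

def pg_dsn(profile):
    """Build a libpq key/value DSN string from a dbt-style profile dict."""
    parts = []
    for key in LIBPQ_KEYS:
        value = profile.get(key)
        if value is None:
            continue
        value = str(value)
        # libpq requires quoting for empty values or values containing spaces.
        if value == "" or " " in value:
            value = "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
        parts.append(f"{key}={value}")
    return " ".join(parts)

dsn = pg_dsn({
    "host": "db.example.com", "port": 5432, "user": "dbt_user",
    "password": "secret", "dbname": "analytics", "schema": "dbt_dev",
    "sslmode": "require", "keepalives_idle": 60, "connect_timeout": 10,
})
print(dsn)
```

Note how `schema` from the profile does not appear in the DSN: it determines where dbt materializes objects, not how the connection is opened.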
---

##### Connect RisingWave to dbt Core

Vendor-supported plugin

Certain core functionality may vary. If you would like to report a bug, request a feature, or contribute, you can check out the linked repository and open an issue.

* **Maintained by**: RisingWave
* **Authors**: Dylan Chen
* **GitHub repo**: [risingwavelabs/dbt-risingwave](https://github.com/risingwavelabs/dbt-risingwave) [![](https://img.shields.io/github/stars/risingwavelabs/dbt-risingwave?style=for-the-badge)](https://github.com/risingwavelabs/dbt-risingwave)
* **PyPI package**: `dbt-risingwave` [![](https://badge.fury.io/py/dbt-risingwave.svg)](https://badge.fury.io/py/dbt-risingwave)
* **Slack channel**: [N/A](https://www.risingwave.com/slack)
* **Supported dbt Core version**: v1.6.1 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**:

#### Installing dbt-risingwave

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-risingwave`

#### Configuring dbt-risingwave

For RisingWave-specific configuration, refer to [RisingWave configs](https://docs.getdbt.com/reference/resource-configs/no-configs.md).

#### Connecting to RisingWave with dbt-risingwave[​](#connecting-to-risingwave-with-dbt-risingwave "Direct link to Connecting to RisingWave with dbt-risingwave")

Before connecting to RisingWave, ensure that RisingWave is installed and running. For more information about how to get RisingWave up and running, see the [RisingWave quick start guide](https://docs.risingwave.com/get-started/quickstart).
To connect to RisingWave with dbt, you need to add a RisingWave profile to your dbt profile file (`~/.dbt/profiles.yml`). Below is an example RisingWave profile. Revise the field values when necessary.

\~/.dbt/profiles.yml

```yaml
default:
  outputs:
    dev:
      type: risingwave
      host: [host name]
      user: [user name]
      pass: [password]
      dbname: [database name]
      port: [port]
      schema: [dbt schema]
  target: dev
```

| Field | Description |
| -------- | ------------------------------------------------------ |
| `host` | The host name or IP address of the RisingWave instance |
| `user` | The RisingWave database user you want to use |
| `pass` | The password of the database user |
| `dbname` | The RisingWave database name |
| `port` | The port number that RisingWave listens on |
| `schema` | The schema of the RisingWave database |

To test the connection to RisingWave, run:

```bash
dbt debug
```

#### Materializations[​](#materializations "Direct link to Materializations")

The dbt models for managing data transformations in RisingWave are similar to typical dbt SQL models. In the `dbt-risingwave` adapter, some of the materializations have been customized to align with the streaming data processing model of RisingWave.
| Materializations | Supported | Notes |
| ---------------------- | ----------------- | --- |
| `table` | Yes | Creates a [table](https://docs.risingwave.com/sql/commands/sql-create-table). To use this materialization, add `{{ config(materialized='table') }}` to your model SQL files. |
| `view` | Yes | Creates a [view](https://docs.risingwave.com/sql/commands/sql-create-view). To use this materialization, add `{{ config(materialized='view') }}` to your model SQL files. |
| `ephemeral` | Yes | This materialization uses [common table expressions](https://docs.risingwave.com/sql/query-syntax/with-clause) in RisingWave under the hood. To use this materialization, add `{{ config(materialized='ephemeral') }}` to your model SQL files. |
| `materializedview` | To be deprecated. | It is available only for backward compatibility purposes (for v1.5.1 of the dbt-risingwave adapter plugin). If you are using v1.6.0 and later versions of the dbt-risingwave adapter plugin, use `materialized_view` instead. |
| `materialized_view` | Yes | Creates a [materialized view](https://docs.risingwave.com/sql/commands/sql-create-mv). This materialization corresponds to the `incremental` one in dbt. To use this materialization, add `{{ config(materialized='materialized_view') }}` to your model SQL files. |
| `incremental` | No | Use `materialized_view` instead. Since RisingWave is designed to use materialized views to manage data transformation in an incremental way, you can just use the `materialized_view` materialization. |
| `source` | Yes | Creates a [source](https://docs.risingwave.com/sql/commands/sql-create-source). To use this materialization, add `{{ config(materialized='source') }}` to your model SQL files. You need to provide your create source statement as a whole in this model. See [Example model files](https://docs.risingwave.com/integrations/other/dbt#example-model-files) for details. |
| `table_with_connector` | Yes | Creates a table with connector settings. In RisingWave, a table with connector settings is similar to a source. The difference is that a table object with connector settings persists raw streaming data in the source, while a source object does not. To use this materialization, add `{{ config(materialized='table_with_connector') }}` to your model SQL files. You need to provide your create table with connector statement as a whole in this model (see [Example model files](https://docs.risingwave.com/integrations/other/dbt#example-model-files) for details). Because dbt tables have their own semantics, RisingWave uses `table_with_connector` to distinguish this materialization from a dbt table. |
| `sink` | Yes | Creates a [sink](https://docs.risingwave.com/sql/commands/sql-create-sink). To use this materialization, add `{{ config(materialized='sink') }}` to your SQL files. You need to provide your create sink statement as a whole in this model. See [Example model files](https://docs.risingwave.com/integrations/other/dbt#example-model-files) for details. |

#### Resources[​](#resources "Direct link to Resources")

* [RisingWave's guide about using dbt for data transformations](https://docs.risingwave.com/integrations/other/dbt)
* [A demo project using dbt to manage Nexmark benchmark queries in RisingWave](https://github.com/risingwavelabs/dbt_rw_nexmark)

---

##### Connect Rockset to dbt Core

Vendor-supported plugin

Certain core functionality may vary. If you would like to report a bug, request a feature, or contribute, you can check out the linked repository and open an issue.

* **Maintained by**: Rockset, Inc.
* **Authors**: Rockset, Inc.
* **GitHub repo**: [rockset/dbt-rockset](https://github.com/rockset/dbt-rockset) [![](https://img.shields.io/github/stars/rockset/dbt-rockset?style=for-the-badge)](https://github.com/rockset/dbt-rockset)
* **PyPI package**: `dbt-rockset` [![](https://badge.fury.io/py/dbt-rockset.svg)](https://badge.fury.io/py/dbt-rockset)
* **Slack channel**: [#dbt-rockset](https://getdbt.slack.com/archives/C02J7AZUAMN)
* **Supported dbt Core version**: v0.19.2 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: ?

#### Installing dbt-rockset

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation:

`python -m pip install dbt-core dbt-rockset`

#### Configuring dbt-rockset

For Rockset-specific configuration, refer to [Rockset configs](https://docs.getdbt.com/reference/resource-configs/no-configs.md).

#### Connecting to Rockset with **dbt-rockset**[​](#connecting-to-rockset-with-dbt-rockset "Direct link to connecting-to-rockset-with-dbt-rockset")

The dbt profile for Rockset is very simple and contains the following fields:

profiles.yml

```yaml
rockset:
  target: dev
  outputs:
    dev:
      type: rockset
      workspace: [schema]
      api_key: [api_key]
      api_server: [api_server] # (Default is api.rs2.usw2.rockset.com)
```

##### Materializations[​](#materializations "Direct link to Materializations")

| Type | Supported? | Details |
| ----------- | ---------- | --- |
| view | YES | Creates a [view](https://rockset.com/docs/views/#gatsby-focus-wrapper). |
| table | YES | Creates a [collection](https://rockset.com/docs/collections/#gatsby-focus-wrapper). |
| ephemeral | YES | Executes queries using CTEs. |
| incremental | YES | Creates a [collection](https://rockset.com/docs/collections/#gatsby-focus-wrapper) if it doesn't exist, and then writes results to it. |

#### Caveats[​](#caveats "Direct link to Caveats")

1. `unique_key` is not supported with incremental, unless it is set to [\_id](https://rockset.com/docs/special-fields/#the-_id-field), which acts as a natural `unique_key` in Rockset anyway.
2. The `table` materialization is slower in Rockset than most due to Rockset's architecture as a low-latency, real-time database. Creating new collections requires provisioning hot storage to index and serve fresh data, which takes about a minute.
3. Rockset queries have a two-minute timeout.
Any model which runs a query that takes longer than two minutes to execute will fail.

---

##### Connect SingleStore to dbt Core

Vendor-supported plugin

Certain core functionality may vary. If you would like to report a bug, request a feature, or contribute, you can check out the linked repository and open an issue.

* **Maintained by**: SingleStore, Inc.
* **Authors**: SingleStore, Inc.
* **GitHub repo**: [memsql/dbt-singlestore](https://github.com/memsql/dbt-singlestore) [![](https://img.shields.io/github/stars/memsql/dbt-singlestore?style=for-the-badge)](https://github.com/memsql/dbt-singlestore)
* **PyPI package**: `dbt-singlestore` [![](https://badge.fury.io/py/dbt-singlestore.svg)](https://badge.fury.io/py/dbt-singlestore)
* **Slack channel**: [db-singlestore](https://getdbt.slack.com/archives/C02V2QHFF7U)
* **Supported dbt Core version**: v1.0.0 and newer
* **dbt support**: Not supported
* **Minimum data platform version**: v7.5

#### Installing dbt-singlestore

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.
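Since the adapter and `dbt-core` are now installed separately, you can confirm that both packages ended up in your environment with a small standard-library check. This is a sketch, not a dbt command; the package names are the PyPI names used on this page.

```python
# Stdlib sketch: check that dbt-core and the adapter are both installed,
# since installing the adapter alone no longer pulls in dbt-core.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

for pkg in ("dbt-core", "dbt-singlestore"):
    print(pkg, installed_version(pkg) or "NOT INSTALLED")
```

If either line prints `NOT INSTALLED`, rerun the `pip install` command below with both package names.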
Use the following command for installation:

`python -m pip install dbt-core dbt-singlestore`

#### Configuring dbt-singlestore

For SingleStore-specific configuration, refer to [SingleStore configs](https://docs.getdbt.com/reference/resource-configs/singlestore-configs.md).

##### Set up a SingleStore Target[​](#set-up-a-singlestore-target "Direct link to Set up a SingleStore Target")

SingleStore targets should be set up using the following configuration in your `profiles.yml` file. If you are using SingleStore Managed Service, the required connection details can be found on your Cluster Page under the "Connect" -> "SQL IDE" tab.

\~/.dbt/profiles.yml

```yaml
singlestore:
  target: dev
  outputs:
    dev:
      type: singlestore
      host: [hostname] # optional, default localhost
      port: [port number] # optional, default 3306
      user: [user] # optional, default root
      password: [password] # optional, default empty
      database: [database name] # required
      schema: [prefix for tables that dbt will generate] # required
      threads: [1 or more] # optional, default 1
```

It is recommended to set the optional parameters as well.

##### Description of SingleStore Profile Fields[​](#description-of-singlestore-profile-fields "Direct link to Description of SingleStore Profile Fields")

| Field | Required | Description |
| ---------- | -------- | --- |
| `type` | Yes | Must be set to `singlestore`. This must be included either in `profiles.yml` or in the `dbt_project.yml` file. |
| `host` | No | The host name of the SingleStore server to connect to. |
| `user` | No | Your SingleStore database username. |
| `password` | No | Your SingleStore database password. |
| `database` | Yes | The name of your database. If you are using custom database names in your models config, they must be created prior to running those models. |
| `schema` | Yes | The string to prefix the names of generated tables if the `generate_alias_name` macro is added (see below). If you are using a custom schema name in your model config, it will be concatenated with the one specified in the profile using `_`. |
| `threads` | No | The number of threads available to dbt. |

#### Schema and Concurrent Development[​](#schema-and-concurrent-development "Direct link to Schema and Concurrent Development")

SingleStore doesn't have a concept of `schema` that corresponds to the one used in `dbt` (a namespace within a database). The `schema` field in your profile is still required for `dbt` to work correctly with your project metadata; for example, you will see it on the "dbt docs" page, even though it's not present in the database.

To support concurrent development, `schema` can be used to prefix the table names that `dbt` builds within your database. To enable this, add the following macro to your project. This macro takes the `schema` field of your `profiles.yml` file and uses it as a table name prefix.

```sql
-- macros/generate_alias_name.sql
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}
    {%- if custom_alias_name is none -%}
        {{ node.schema }}__{{ node.name }}
    {%- else -%}
        {{ node.schema }}__{{ custom_alias_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

Therefore, if you set `schema=dev` in your `.dbt/profiles.yml` file and run the `customers` model with the corresponding profile, `dbt` will create a table named `dev__customers` in your database.
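For illustration, the macro's naming behavior can be mirrored in plain Python. This is a sketch only: dbt evaluates the Jinja macro itself, and the function below is hypothetical, existing just to show how the prefix is formed.

```python
# Plain-Python sketch of what the generate_alias_name macro produces:
# the profile's schema is used as a prefix, joined to the model name with "__".
def generate_alias_name(schema, name, custom_alias_name=None):
    if custom_alias_name is None:
        return f"{schema}__{name}"
    # Mirrors the `| trim` filter in the Jinja macro.
    return f"{schema}__{custom_alias_name.strip()}"

print(generate_alias_name("dev", "customers"))  # dev__customers
```

With `schema=dev` in the profile, a developer's `customers` model lands in `dev__customers`, so concurrent developers with different `schema` values never collide on table names.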
---

##### Connect SQLite to dbt Core

Community plugin

Some core functionality may be limited. If you're interested in contributing, check out the source code for each repository listed below.

* **Maintained by**: Community
* **Authors**: [Jeff Chiu](https://github.com/codeforkjeff)
* **GitHub repo**: [codeforkjeff/dbt-sqlite](https://github.com/codeforkjeff/dbt-sqlite) [![](https://img.shields.io/github/stars/codeforkjeff/dbt-sqlite?style=for-the-badge)](https://github.com/codeforkjeff/dbt-sqlite)
* **PyPI package**: `dbt-sqlite` [![](https://badge.fury.io/py/dbt-sqlite.svg)](https://badge.fury.io/py/dbt-sqlite)
* **Slack channel**: [n/a](https://www.getdbt.com/community)
* **Supported dbt Core version**: v1.1.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: SQLite version 3.0

#### Installing dbt-sqlite

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-sqlite`

#### Configuring dbt-sqlite

For SQLite-specific configuration, refer to [SQLite configs](https://docs.getdbt.com/reference/resource-configs/no-configs.md).

Starting with the release of dbt Core 1.0.0, versions of dbt-sqlite are aligned to the same major+minor [version](https://semver.org/) of dbt Core.
* versions 1.1.x of this adapter work with dbt Core 1.1.x
* versions 1.0.x of this adapter work with dbt Core 1.0.x

#### Connecting to SQLite with dbt-sqlite[​](#connecting-to-sqlite-with-dbt-sqlite "Direct link to Connecting to SQLite with dbt-sqlite")

SQLite targets should be set up using the following configuration in your `profiles.yml` file. Example:

\~/.dbt/profiles.yml

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: sqlite
      threads: 1
      database: 'database'
      schema: 'main'
      schemas_and_paths:
        main: 'file_path/database_name.db'
      schema_directory: 'file_path'
      # optional fields
      extensions:
        - "/path/to/sqlean/crypto.so"
```

###### Description of SQLite Profile Fields[​](#description-of-sqlite-profile-fields "Direct link to Description of SQLite Profile Fields")

| Field | Description |
| ------------------- | --- |
| `type` | Required. Must be set to `sqlite`. |
| `threads` | Required. Must be set to `1`. SQLite locks the whole database on writes, so anything > 1 won't help. |
| `database` | Required, but the value is arbitrary: there is no 'database' portion of relation names in SQLite, so it gets stripped from the output of `ref()` and from SQL everywhere. It still needs to be set in the configuration and is used by dbt internally. |
| `schema` | The value of 'schema' must be defined in `schemas_and_paths` below. In most cases, this should be `main`. |
| `schemas_and_paths` | Connects schemas to paths: at least one of these must be 'main'. |
| `schema_directory` | Directory where all \*.db files are attached as schemas, using the base filename as the schema name, and where new schemas are created. This can overlap with the directories of files in `schemas_and_paths` as long as there are no conflicts. |
| `extensions` | Optional. List of file paths of SQLite extensions to load. `crypto.so` is needed for snapshots to work; see SQLite Extensions below. |

#### Caveats[​](#caveats "Direct link to Caveats")

* Schemas are implemented as attached database files. (SQLite conflates databases and schemas.)
* SQLite automatically assigns 'main' to the file you initially connect to, so this must be defined in your profile. Other schemas defined in your profile get attached when the database connection is created.
* If dbt needs to create a new schema, it will be created in `schema_directory` as `schema_name.db`. Dropping a schema results in dropping all its relations and detaching the database file from the session.
* Schema names are stored in view definitions, so when you access a non-'main' database file outside dbt, you'll need to attach it using the same name, or the views won't work.
* SQLite does not allow views in one schema (i.e. database file) to reference objects in another schema. You'll get this error from SQLite: "view \[someview] cannot reference objects in database \[somedatabase]". You must set `materialized='table'` in models that reference other schemas.
* Materializations are simplified: they drop and re-create the model, instead of doing the backup-and-swap-in of the new model that the other dbt database adapters support. This choice was made because SQLite doesn't support `DROP ... CASCADE` or `ALTER VIEW`, and doesn't provide information about relation dependencies in anything information\_schema-like. These limitations make it really difficult to make the backup-and-swap-in functionality work properly. Given how aggressively SQLite [locks](https://sqlite.org/lockingv3.html) the database anyway, it's probably not worth the effort.

#### SQLite Extensions[​](#sqlite-extensions "Direct link to SQLite Extensions")

For snapshots to work, you'll need the `crypto` module from SQLean to get an `md5()` function.
It's recommended that you install all the SQLean modules, as they provide many common SQL functions missing from SQLite. Precompiled binaries are available for download from the [SQLean GitHub repository page](https://github.com/nalgeon/sqlean). You can also compile them yourself if you want. Point to these module files in your profile config as shown in the example above.

macOS seems to ship with [SQLite libraries that do not have support for loading extensions compiled in](https://docs.python.org/3/library/sqlite3.html#f1), so this won't work "out of the box", and accordingly, snapshots won't work. If you need snapshot functionality, you'll need to compile SQLite/Python yourself or find a Python distribution for macOS with this support.

---

##### Connect Starburst/Trino to dbt Core

`profiles.yml` file is for dbt Core and dbt Fusion only

If you're using the dbt platform, you don't need to create a `profiles.yml` file. This file is only necessary when you use dbt Core or dbt Fusion locally. To learn more about Fusion prerequisites, refer to [Supported features](https://docs.getdbt.com/docs/fusion/supported-features.md).

To connect your data platform to dbt, refer to [About data platforms](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md).

* **Maintained by**: Starburst Data, Inc.
* **Authors**: Marius Grama, Przemek Denkiewicz, Michiel de Smet, Damian Owsianny
* **GitHub repo**: [starburstdata/dbt-trino](https://github.com/starburstdata/dbt-trino) [![](https://img.shields.io/github/stars/starburstdata/dbt-trino?style=for-the-badge)](https://github.com/starburstdata/dbt-trino)
* **PyPI package**: `dbt-trino` [![](https://badge.fury.io/py/dbt-trino.svg)](https://badge.fury.io/py/dbt-trino)
* **Slack channel**: [#db-starburst-and-trino](https://getdbt.slack.com/archives/CNNPBQ24R)
* **Supported dbt Core version**: v0.20.0 and newer
* **dbt support**: Supported
* **Minimum data platform version**: n/a

#### Installing dbt-trino

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-trino`

#### Configuring dbt-trino

For Starburst/Trino-specific configuration, refer to [Starburst/Trino configs](https://docs.getdbt.com/reference/resource-configs/trino-configs.md).

#### Connecting to Starburst/Trino[​](#connecting-to-starbursttrino "Direct link to Connecting to Starburst/Trino")

To connect to a data platform with dbt Core, create the appropriate *profile* and *target* YAML keys/values in the `profiles.yml` configuration file for your Starburst/Trino clusters. This dbt YAML file lives in the `.dbt/` directory of your user/home directory. For more information, refer to [Connection profiles](https://docs.getdbt.com/docs/local/profiles.yml.md) and [profiles.yml](https://docs.getdbt.com/docs/local/profiles.yml.md).

The parameters for setting up a connection apply to Starburst Enterprise, Starburst Galaxy, and Trino clusters.
Unless specified, "cluster" means any of these products' clusters.

#### Host parameters

The following profile fields are always required, except for `user`, which is also required unless you're using the `oauth`, `oauth_console`, `cert`, or `jwt` authentication methods.

| Field | Example | Description |
| ---------- | ------- | ----------- |
| `host` | `mycluster.mydomain.com`<br/><br/>Format for Starburst Galaxy:<br/>`mygalaxyaccountname-myclustername.trino.galaxy.starburst.io` | The hostname of your cluster. Don't include the `http://` or `https://` prefix. |
| `database` | `my_postgres_catalog` | The name of a catalog in your cluster. |
| `schema` | `my_schema` | The name of a schema within your cluster's catalog. It's *not recommended* to use schema names that have upper case or mixed case letters. |
| `port` | `443` | The port to connect to your cluster. By default, it's 443 for TLS enabled clusters. |
| `user` | Format for Starburst Enterprise or Trino:<br/>`user.name`<br/>`user.name@mydomain.com`<br/><br/>Format for Starburst Galaxy:<br/>`user.name@mydomain.com/role` | The username (of the account) to log in to your cluster. When connecting to Starburst Galaxy clusters, you must include the role of the user as a suffix to the username. |

##### Roles in Starburst Enterprise

If you're connecting to a Starburst Enterprise cluster with built-in access controls enabled, you must specify a role using the format detailed in [Additional parameters](#additional-parameters). If a role is not specified, the default role for the provided username is used.

##### Schemas and databases

When selecting the catalog and the schema, make sure the user has read and write access to both. This selection does not limit your ability to query the catalog; instead, it serves as the default location for where tables and views are materialized. In addition, the Trino connector used in the catalog must support creating tables. This *default* can be changed later from within your dbt project.

#### Additional parameters

The following profile fields are optional. They let you configure your cluster's session and dbt for your connection.

| Profile field | Example | Description |
| ------------- | ------- | ----------- |
| `threads` | `8` | How many threads dbt should use (default is `1`). |
| `roles` | `system: analyst` | Catalog roles, set using the format `catalog: role`. |
| `session_properties` | `query_max_run_time: 4h` | Sets Trino session properties used in the connection. Execute `SHOW SESSION` to see available options. |
| `prepared_statements_enabled` | `true` or `false` | Enable usage of Trino prepared statements (used in `dbt seed` commands) (default: `true`). |
| `retries` | `10` | Configure how many times database operations are retried when connection issues arise (default: `3`). |
| `timezone` | `Europe/Brussels` | The time zone for the Trino session (default: the client-side local timezone). |
| `http_headers` | `X-Trino-Client-Info: dbt-trino` | HTTP headers to send alongside requests to Trino, specified as a YAML dictionary of (header, value) pairs. |
| `http_scheme` | `https` or `http` | The HTTP scheme to use for requests to Trino (default: `http`, or `https` if `kerberos`, `ldap` or `jwt`). |

#### Authentication parameters

The authentication methods that dbt Core supports are:

* `ldap` — LDAP (username and password)
* `kerberos` — Kerberos
* `jwt` — JSON Web Token (JWT)
* `certificate` — Certificate-based authentication
* `oauth` — Open Authentication (OAuth)
* `oauth_console` — Open Authentication (OAuth) with the authentication URL printed to the console
* `none` — None, no authentication

Set the `method` field to the authentication method you intend to use for the connection. For a high-level introduction to authentication in Trino, see [Trino Security: Authentication types](https://trino.io/docs/current/security/authentication-types.html).

Each of the following authentication methods includes details on how to configure your connection profile, along with an example `profiles.yml` configuration file for you to review.

The following table lists the authentication parameters to set for LDAP.
For more information, refer to [LDAP authentication](https://trino.io/docs/current/security/ldap.html) in the Trino docs.

| Profile field | Example | Description |
| ------------- | ------- | ----------- |
| `method` | `ldap` | Set LDAP as the authentication method. |
| `user` | Format for Starburst Enterprise or Trino:<br/>`user.name`<br/>`user.name@mydomain.com`<br/><br/>Format for Starburst Galaxy:<br/>`user.name@mydomain.com/role` | The username (of the account) to log in to your cluster. When connecting to Starburst Galaxy clusters, you must include the role of the user as a suffix to the username. |
| `password` | `abc123` | Password for authentication. |
| `impersonation_user` (optional) | `impersonated_tom` | Override the provided username. This lets you impersonate another user. |
###### Example profiles.yml for LDAP

~/.dbt/profiles.yml

```yaml
trino:
  target: dev
  outputs:
    dev:
      type: trino
      method: ldap
      user: [user]
      password: [password]
      host: [hostname]
      database: [database name]
      schema: [your dbt schema]
      port: [port number]
      threads: [1 or more]
```

The following table lists the authentication parameters to set for Kerberos. For more information, refer to [Kerberos authentication](https://trino.io/docs/current/security/kerberos.html) in the Trino docs.

| Profile field | Example | Description |
| ------------- | ------- | ----------- |
| `method` | `kerberos` | Set Kerberos as the authentication method. |
| `user` | `commander` | Username for authentication. |
| `keytab` | `/tmp/trino.keytab` | Path to keytab. |
| `krb5_config` | `/tmp/krb5.conf` | Path to config. |
| `principal` | `trino@EXAMPLE.COM` | Principal. |
| `service_name` (optional) | `abc123` | Service name (default is `trino`). |
| `hostname_override` (optional) | `EXAMPLE.COM` | Kerberos hostname for a host whose DNS name doesn't match. |
| `mutual_authentication` (optional) | `false` | Boolean flag for mutual authentication. |
| `force_preemptive` (optional) | `false` | Boolean flag to preemptively initiate the Kerberos GSS exchange. |
| `sanitize_mutual_error_response` (optional) | `true` | Boolean flag to strip content and headers from error responses. |
| `delegate` (optional) | `false` | Boolean flag for credential delegation (`GSS_C_DELEG_FLAG`). |
###### Example profiles.yml for Kerberos

~/.dbt/profiles.yml

```yaml
trino:
  target: dev
  outputs:
    dev:
      type: trino
      method: kerberos
      user: commander
      keytab: /tmp/trino.keytab
      krb5_config: /tmp/krb5.conf
      principal: trino@EXAMPLE.COM
      host: trino.example.com
      port: 443
      database: analytics
      schema: public
```

The following table lists the authentication parameters to set for JSON Web Token. For more information, refer to [JWT authentication](https://trino.io/docs/current/security/jwt.html) in the Trino docs.

| Profile field | Example | Description |
| ------------- | ------- | ----------- |
| `method` | `jwt` | Set JWT as the authentication method. |
| `jwt_token` | `aaaaa.bbbbb.ccccc` | The JWT string. |
###### Example profiles.yml for JWT

~/.dbt/profiles.yml

```yaml
trino:
  target: dev
  outputs:
    dev:
      type: trino
      method: jwt
      jwt_token: [my_long_jwt_token_string]
      host: [hostname]
      database: [database name]
      schema: [your dbt schema]
      port: [port number]
      threads: [1 or more]
```

The following table lists the authentication parameters to set for certificates. For more information, refer to [Certificate authentication](https://trino.io/docs/current/security/certificate.html) in the Trino docs.

| Profile field | Example | Description |
| ------------- | ------- | ----------- |
| `method` | `certificate` | Set certificate-based authentication as the method. |
| `client_certificate` | `/tmp/tls.crt` | Path to client certificate. |
| `client_private_key` | `/tmp/tls.key` | Path to client private key. |
| `cert` | | The full path to a certificate file. |
###### Example profiles.yml for certificate

~/.dbt/profiles.yml

```yaml
trino:
  target: dev
  outputs:
    dev:
      type: trino
      method: certificate
      cert: [path/to/cert_file]
      client_certificate: [path/to/client/cert]
      client_private_key: [path to client key]
      database: [database name]
      schema: [your dbt schema]
      port: [port number]
      threads: [1 or more]
```

The only authentication parameter to set for OAuth 2.0 is `method: oauth`. If you're using Starburst Enterprise or Starburst Galaxy, you must enable OAuth 2.0 in Starburst before you can use this authentication method. For more information, refer to both [OAuth 2.0 authentication](https://trino.io/docs/current/security/oauth2.html) in the Trino docs and the [README](https://github.com/trinodb/trino-python-client#oauth2-authentication) for the Trino Python client.

It's recommended that you install `keyring` to cache the OAuth 2.0 token over multiple dbt invocations by running `python -m pip install 'trino[external-authentication-token-cache]'`. The `keyring` package is not installed by default.

###### Example profiles.yml for OAuth

```yaml
sandbox-galaxy:
  target: oauth
  outputs:
    oauth:
      type: trino
      method: oauth
      host: bunbundersders.trino.galaxy-dev.io
      catalog: dbt_target
      schema: dataders
      port: 443
```

The only authentication parameter to set for OAuth 2.0 with console authentication is `method: oauth_console`. If you're using Starburst Enterprise or Starburst Galaxy, you must enable OAuth 2.0 in Starburst before you can use this authentication method. For more information, refer to both [OAuth 2.0 authentication](https://trino.io/docs/current/security/oauth2.html) in the Trino docs and the [README](https://github.com/trinodb/trino-python-client#oauth2-authentication) for the Trino Python client.
The only difference between `oauth_console` and `oauth` is:

* `oauth` — An authentication URL automatically opens in a browser.
* `oauth_console` — A URL is printed to the console.

It's recommended that you install `keyring` to cache the OAuth 2.0 token over multiple dbt invocations by running `python -m pip install 'trino[external-authentication-token-cache]'`. The `keyring` package is not installed by default.

###### Example profiles.yml for OAuth (console)

```yaml
sandbox-galaxy:
  target: oauth_console
  outputs:
    oauth:
      type: trino
      method: oauth_console
      host: bunbundersders.trino.galaxy-dev.io
      catalog: dbt_target
      schema: dataders
      port: 443
```

With `method: none`, you don't need to set up authentication. However, dbt Labs strongly discourages using it in any real application; it's only suitable for experimentation, such as local examples that run Trino and dbt entirely within a single Docker container.

###### Example profiles.yml for no authentication

~/.dbt/profiles.yml

```yaml
trino:
  target: dev
  outputs:
    dev:
      type: trino
      method: none
      user: commander
      host: trino.example.com
      port: 443
      database: analytics
      schema: public
```
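The optional fields described under Additional parameters can be combined with any of the authentication methods above. The following sketch layers session tuning onto an LDAP profile; all values are illustrative placeholders, and the nested-mapping shape of `session_properties` and `http_headers` follows the YAML-dictionary format described in that section:

```yaml
trino:
  target: dev
  outputs:
    dev:
      type: trino
      method: ldap
      user: [user]
      password: [password]
      host: [hostname]
      database: [database name]
      schema: [your dbt schema]
      port: 443
      # Optional session and connection tuning
      threads: 8
      retries: 5
      timezone: Europe/Brussels
      http_scheme: https
      session_properties:
        query_max_run_time: 4h
      http_headers:
        X-Trino-Client-Info: dbt-trino
```

Run `SHOW SESSION` against your cluster to discover which session properties it accepts before setting them here.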
---

##### Connect StarRocks to dbt Core

#### Overview of dbt-starrocks

* **Maintained by**: StarRocks
* **Authors**: Astralidea
* **GitHub repo**: [StarRocks/dbt-starrocks](https://github.com/StarRocks/dbt-starrocks) [![](https://img.shields.io/github/stars/StarRocks/dbt-starrocks?style=for-the-badge)](https://github.com/StarRocks/dbt-starrocks)
* **PyPI package**: `dbt-starrocks` [![](https://badge.fury.io/py/dbt-starrocks.svg)](https://badge.fury.io/py/dbt-starrocks)
* **Slack channel**: [#db-starrocks](https://www.getdbt.com/community)
* **Supported dbt Core version**: v1.6.2 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: StarRocks 2.5

#### Installing dbt-starrocks

pip is the easiest way to install the adapter:

`python -m pip install dbt-starrocks`

Installing `dbt-starrocks` will also install `dbt-core` and any other dependencies.

#### Configuring dbt-starrocks

For StarRocks-specific configuration, refer to [StarRocks Configuration](https://docs.getdbt.com/reference/resource-configs/starrocks-configs.md). For further info, refer to the GitHub repository: [StarRocks/dbt-starrocks](https://github.com/StarRocks/dbt-starrocks)

#### Authentication Methods

##### User / Password Authentication

StarRocks can be configured using basic user/password authentication as shown below.

~/.dbt/profiles.yml

```yaml
my-starrocks-db:
  target: dev
  outputs:
    dev:
      type: starrocks
      host: localhost
      port: 9030
      schema: analytics

      # User/password auth
      username: your_starrocks_username
      password: your_starrocks_password
```

###### Description of Profile Fields

| Option | Description | Required? | Example |
| ------ | ----------- | --------- | ------- |
| type | The specific adapter to use | Required | `starrocks` |
| host | The hostname to connect to | Required | `192.168.100.28` |
| port | The port to use | Required | `9030` |
| schema | Specify the schema (database) to build models into | Required | `analytics` |
| username | The username to use to connect to the server | Required | `dbt_admin` |
| password | The password to use for authenticating to the server | Required | `correct-horse-battery-staple` |
| version | Let the plugin target a compatible StarRocks version | Optional | `3.1.0` |

#### Supported features

| StarRocks <= 2.5 | StarRocks 2.5 ~ 3.1 | StarRocks >= 3.1 | Feature |
| --- | --- | --- | --- |
| ✅ | ✅ | ✅ | Table materialization |
| ✅ | ✅ | ✅ | View materialization |
| ❌ | ❌ | ✅ | Materialized View materialization |
| ❌ | ✅ | ✅ | Incremental materialization |
| ❌ | ✅ | ✅ | Primary Key Model |
| ✅ | ✅ | ✅ | Sources |
| ✅ | ✅ | ✅ | Custom data tests |
| ✅ | ✅ | ✅ | Docs generate |
| ❌ | ❌ | ❌ | Kafka |

##### Notice

1. When the StarRocks version is < 2.5, `create table as` can only set `engine='OLAP'` and `table_type='DUPLICATE'`.
2. When the StarRocks version is >= 2.5, `create table as` supports `table_type='PRIMARY'`.
3. When the StarRocks version is < 3.1, `distributed_by` is required.

It is recommended to use the latest StarRocks and dbt-starrocks versions for the best experience.
---

##### Connect Teradata to dbt Core

Some core functionality may be limited. If you're interested in contributing, check out the source code in the repository listed in the next section.

The `profiles.yml` file is for dbt Core and dbt Fusion only

If you're using the dbt platform, you don't need to create a `profiles.yml` file. This file is only necessary when you use dbt Core or dbt Fusion locally. To learn more about Fusion prerequisites, refer to [Supported features](https://docs.getdbt.com/docs/fusion/supported-features.md). To connect your data platform to dbt, refer to [About data platforms](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md).

* **Maintained by**: Teradata
* **Authors**: Teradata
* **GitHub repo**: [Teradata/dbt-teradata](https://github.com/Teradata/dbt-teradata) [![](https://img.shields.io/github/stars/Teradata/dbt-teradata?style=for-the-badge)](https://github.com/Teradata/dbt-teradata)
* **PyPI package**: `dbt-teradata` [![](https://badge.fury.io/py/dbt-teradata.svg)](https://badge.fury.io/py/dbt-teradata)
* **Slack channel**: [#db-teradata](https://getdbt.slack.com/archives/C027B6BHMT3)
* **Supported dbt Core version**: v0.21.0 and newer
* **dbt support**: Supported
* **Minimum data platform version**: n/a

#### Installing dbt-teradata

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`, because adapter and dbt Core versions have been decoupled so that existing dbt-core installations are no longer overwritten.
Use the following command for installation:

`python -m pip install dbt-core dbt-teradata`

#### Configuring dbt-teradata

For Teradata-specific configuration, refer to [Teradata configs](https://docs.getdbt.com/reference/resource-configs/teradata-configs.md).

#### Python compatibility

| Plugin version | Python 3.9 | Python 3.10 | Python 3.11 | Python 3.12 | Python 3.13 |
| -------------- | ---------- | ----------- | ----------- | ----------- | ----------- |
| 1.0.0.x | ✅ | ❌ | ❌ | ❌ | ❌ |
| 1.1.x.x | ✅ | ✅ | ❌ | ❌ | ❌ |
| 1.2.x.x | ✅ | ✅ | ❌ | ❌ | ❌ |
| 1.3.x.x | ✅ | ✅ | ❌ | ❌ | ❌ |
| 1.4.x.x | ✅ | ✅ | ✅ | ❌ | ❌ |
| 1.5.x | ✅ | ✅ | ✅ | ❌ | ❌ |
| 1.6.x | ✅ | ✅ | ✅ | ❌ | ❌ |
| 1.7.x | ✅ | ✅ | ✅ | ❌ | ❌ |
| 1.8.x | ✅ | ✅ | ✅ | ✅ | ❌ |
| 1.9.x | ✅ | ✅ | ✅ | ✅ | ❌ |
| 1.10.x | ✅ | ✅ | ✅ | ✅ | ✅ |

#### dbt dependent packages version compatibility

| dbt-teradata | dbt Core | dbt-teradata-util | dbt-util |
| ------------ | -------- | ----------------- | -------------- |
| 1.2.x | 1.2.x | 0.1.0 | 0.9.x or below |
| 1.6.7 | 1.6.7 | 1.1.1 | 1.1.1 |
| 1.7.x | 1.7.x | 1.1.1 | 1.1.1 |
| 1.8.x | 1.8.x | 1.1.1 | 1.1.1 |
| 1.8.x | 1.8.x | 1.2.0 | 1.2.0 |
| 1.8.x | 1.8.x | 1.3.0 | 1.3.0 |
| 1.9.x | 1.9.x | 1.3.0 | 1.3.0 |
| 1.10.x | 1.10.x | 1.3.0 | 1.3.0 |

##### Connecting to Teradata

To connect to Teradata Vantage from dbt, you'll need to add a [profile](https://docs.getdbt.com/docs/local/profiles.yml.md) to your `profiles.yml` file.
A Teradata profile conforms to the following syntax (placeholders are shown in brackets; `host` is listed with the connection parameters below):

profiles.yml

```yaml
[profile name]:
  target: [target name]
  outputs:
    [target name]:
      type: teradata
      host: [hostname]
      user: [username]
      password: [password]
      schema: [database name]
      tmode: ANSI
      threads: [optional, 1 or more]
      # optional fields
```

###### Description of Teradata Profile Fields

The following fields are required:

| Parameter | Default | Type | Description |
| --------- | ------- | ---- | ----------- |
| `user` | | string | Specifies the database username. Equivalent to the Teradata JDBC Driver `USER` connection parameter. |
| `password` | | string | Specifies the database password. Equivalent to the Teradata JDBC Driver `PASSWORD` connection parameter. |
| `schema` | | string | Specifies the initial database to use after logon, instead of the user's default database. |
| `tmode` | `"ANSI"` | string | Specifies the transaction mode. Only `ANSI` mode is currently supported. |

The plugin also supports the following optional connection parameters:

| Parameter | Default | Type | Description |
| --------- | ------- | ---- | ----------- |
| `account` | | string | Specifies the database account. Equivalent to the Teradata JDBC Driver `ACCOUNT` connection parameter. |
| `browser` | | string | Specifies the command to open the browser for Browser Authentication when `logmech` is `BROWSER`. Browser Authentication is supported for Windows and macOS. Equivalent to the Teradata JDBC Driver `BROWSER` connection parameter. |
| `browser_tab_timeout` | `"5"` | quoted integer | Specifies the number of seconds to wait before closing the browser tab after Browser Authentication is completed. The default is 5 seconds. The behavior is under the browser's control, and not all browsers support automatic closing of browser tabs. |
| `browser_timeout` | `"180"` | quoted integer | Specifies the number of seconds that the driver will wait for Browser Authentication to complete. The default is 180 seconds (3 minutes). |
| `column_name` | `"false"` | quoted boolean | Controls the behavior of cursor `.description` sequence `name` items. Equivalent to the Teradata JDBC Driver `COLUMN_NAME` connection parameter. False specifies that a cursor `.description` sequence `name` item provides the AS-clause name if available, or the column name if available, or the column title. True specifies that a cursor `.description` sequence `name` item provides the column name if available, but has no effect when StatementInfo parcel support is unavailable. |
| `connect_timeout` | `"10000"` | quoted integer | Specifies the timeout in milliseconds for establishing a TCP socket connection. Specify 0 for no timeout. The default is 10 seconds (10000 milliseconds). |
| `cop` | `"true"` | quoted boolean | Specifies whether COP Discovery is performed. Equivalent to the Teradata JDBC Driver `COP` connection parameter. |
| `coplast` | `"false"` | quoted boolean | Specifies how COP Discovery determines the last COP hostname. Equivalent to the Teradata JDBC Driver `COPLAST` connection parameter. When `coplast` is `false` or omitted, or COP Discovery is turned off, then no DNS lookup occurs for the coplast hostname. When `coplast` is `true` and COP Discovery is turned on, then a DNS lookup occurs for a coplast hostname. |
| `port` | `"1025"` | quoted integer | Specifies the database port number. Equivalent to the Teradata JDBC Driver `DBS_PORT` connection parameter. |
| `encryptdata` | `"false"` | quoted boolean | Controls encryption of data exchanged between the driver and the database. Equivalent to the Teradata JDBC Driver `ENCRYPTDATA` connection parameter. |
| `fake_result_sets` | `"false"` | quoted boolean | Controls whether a fake result set containing statement metadata precedes each real result set. |
| `field_quote` | `"\""` | string | Specifies a single character string used to quote fields in a CSV file. |
| `field_sep` | `","` | string | Specifies a single character string used to separate fields in a CSV file. Equivalent to the Teradata JDBC Driver `FIELD_SEP` connection parameter. |
| `host` | | string | Specifies the database hostname. |
| `https_port` | `"443"` | quoted integer | Specifies the database port number for HTTPS/TLS connections. Equivalent to the Teradata JDBC Driver `HTTPS_PORT` connection parameter. |
| `lob_support` | `"true"` | quoted boolean | Controls LOB support. Equivalent to the Teradata JDBC Driver `LOB_SUPPORT` connection parameter. |
| `log` | `"0"` | quoted integer | Controls debug logging. Somewhat equivalent to the Teradata JDBC Driver `LOG` connection parameter. This parameter's behavior is subject to change in the future. The value is currently defined as an integer in which the 1-bit governs function and method tracing, the 2-bit governs debug logging, the 4-bit governs transmit and receive message hex dumps, and the 8-bit governs timing. Compose the value by adding together 1, 2, 4, and/or 8. |
| `logdata` | | string | Specifies extra data for the chosen logon authentication method. Equivalent to the Teradata JDBC Driver `LOGDATA` connection parameter. |
| `logon_timeout` | `"0"` | quoted integer | Specifies the logon timeout in seconds. Zero means no timeout. |
| `logmech` | `"TD2"` | string | Specifies the logon authentication method. Equivalent to the Teradata JDBC Driver `LOGMECH` connection parameter. Possible values are `TD2` (the default), `JWT`, `LDAP`, `BROWSER`, `KRB5` for Kerberos, or `TDNEGO`. |
| `max_message_body` | `"2097000"` | quoted integer | Specifies the maximum Response Message size in bytes. Equivalent to the Teradata JDBC Driver `MAX_MESSAGE_BODY` connection parameter. |
| `partition` | `"DBC/SQL"` | string | Specifies the database partition. Equivalent to the Teradata JDBC Driver `PARTITION` connection parameter. |
| `request_timeout` | `"0"` | quoted integer | Specifies the timeout for executing each SQL request. Zero means no timeout. |
| `retries` | `0` | integer | Allows the adapter to automatically retry when an attempt to open a new connection fails with a transient, infrequent error. The default value is 0; if `retries` is set to 3, the adapter will try to establish a new connection three times if an error occurs. The default wait period between connection attempts is one second; the `retry_timeout` (seconds) option adjusts this waiting period. |
| `runstartup` | `"false"` | quoted boolean | Controls whether the user's STARTUP SQL request is executed after logon. For more information, refer to User STARTUP SQL Request. Equivalent to the Teradata JDBC Driver `RUNSTARTUP` connection parameter. |
| `sessions` | | quoted integer | Specifies the number of data transfer connections for FastLoad or FastExport. The default (recommended) lets the database choose the appropriate number of connections. Equivalent to the Teradata JDBC Driver `SESSIONS` connection parameter. |
| `sip_support` | `"true"` | quoted boolean | Controls whether the StatementInfo parcel is used. Equivalent to the Teradata JDBC Driver `SIP_SUPPORT` connection parameter. |
| `sp_spl` | `"true"` | quoted boolean | Controls whether stored procedure source code is saved in the database when a SQL stored procedure is created. Equivalent to the Teradata JDBC Driver `SP_SPL` connection parameter. |
| `sslca` | | string | Specifies the file name of a PEM file that contains Certificate Authority (CA) certificates for use with `sslmode` values `VERIFY-CA` or `VERIFY-FULL`. Equivalent to the Teradata JDBC Driver `SSLCA` connection parameter. |
| `sslcrc` | `"ALLOW"` | string | Equivalent to the Teradata JDBC Driver `SSLCRC` connection parameter. Values are case-insensitive.<br/>• `ALLOW` provides "soft fail" behavior such that communication failures are ignored during certificate revocation checking.<br/>• `REQUIRE` mandates that certificate revocation checking must succeed. |
| `sslcapath` | | string | Specifies a directory of PEM files that contain Certificate Authority (CA) certificates for use with `sslmode` values `VERIFY-CA` or `VERIFY-FULL`. Only files with an extension of `.pem` are used; other files in the specified directory are not used. Equivalent to the Teradata JDBC Driver `SSLCAPATH` connection parameter. |
| `sslcipher` | | string | Specifies the TLS cipher for HTTPS/TLS connections. Equivalent to the Teradata JDBC Driver `SSLCIPHER` connection parameter. |
| `sslmode` | `"PREFER"` | string | Specifies the mode for connections to the database. Equivalent to the Teradata JDBC Driver `SSLMODE` connection parameter.<br/>• `DISABLE` disables HTTPS/TLS connections and uses only non-TLS connections.<br/>• `ALLOW` uses non-TLS connections unless the database requires HTTPS/TLS connections.<br/>• `PREFER` uses HTTPS/TLS connections unless the database does not offer HTTPS/TLS connections.<br/>• `REQUIRE` uses only HTTPS/TLS connections.<br/>• `VERIFY-CA` uses only HTTPS/TLS connections and verifies that the server certificate is valid and trusted.<br/>• `VERIFY-FULL` uses only HTTPS/TLS connections, verifies that the server certificate is valid and trusted, and verifies that the server certificate matches the database hostname. |
| `sslprotocol` | `"TLSv1.2"` | string | Specifies the TLS protocol for HTTPS/TLS connections. Equivalent to the Teradata JDBC Driver `SSLPROTOCOL` connection parameter. |
| `teradata_values` | `"true"` | quoted boolean | Controls whether `str` or a more specific Python data type is used for certain result set column value types. |
| `query_band` | `"org=teradata-internal-telem;appname=dbt;"` | string | Specifies the Query Band string to be set for each SQL request. |

Refer to [connection parameters](https://github.com/Teradata/python-driver#connection-parameters) for the full description of the connection parameters.

#### Supported features

##### Materializations

* `view`
* `table`
* `ephemeral`
* `incremental`

###### Incremental Materialization

The following incremental materialization strategies are supported:

* `append` (default)
* `delete+insert`
* `merge`
* `valid_history`
* `microbatch`

info

* To learn more about dbt incremental strategies, refer to [the dbt incremental strategy documentation](https://docs.getdbt.com/docs/build/incremental-strategy.md).
* To learn more about the `valid_history` incremental strategy, refer to [Teradata configs](https://docs.getdbt.com/reference/resource-configs/teradata-configs.md).

##### Commands

All dbt commands are supported.

#### Support for model contracts

Model contracts are supported with dbt-teradata v1.7.1 and onwards.
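As a generic illustration of how a contract is declared (the model and column names here are hypothetical, and this is standard dbt model-YAML syntax rather than anything Teradata-specific), constraints are listed per column with the contract enforced:

```yaml
models:
  - name: dim_customers        # hypothetical model name
    config:
      contract:
        enforced: true         # dbt verifies columns/types match before building
    columns:
      - name: customer_id
        data_type: integer
        constraints:
          - type: not_null
          - type: primary_key
      - name: customer_name
        data_type: varchar
```

With the contract enforced, dbt fails the build if the model's query returns columns or data types that don't match this declaration.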
Constraint support and enforcement in dbt-teradata:

| Constraint type | Support | Enforcement |
| --------------- | ----------- | ----------- |
| not_null | ✅ Supported | ✅ Enforced |
| primary_key | ✅ Supported | ✅ Enforced |
| foreign_key | ✅ Supported | ✅ Enforced |
| unique | ✅ Supported | ✅ Enforced |
| check | ✅ Supported | ✅ Enforced |

Refer to [Model contracts](https://docs.getdbt.com/docs/mesh/govern/model-contracts.md) for more info.

#### Support for `dbt-utils` package[​](#support-for-dbt-utils-package "Direct link to support-for-dbt-utils-package")

The `dbt-utils` package is supported through the `teradata/teradata_utils` dbt package, which provides a compatibility layer between `dbt_utils` and `dbt-teradata`. See the [teradata_utils](https://hub.getdbt.com/teradata/teradata_utils/latest/) package for install instructions.

##### Cross-DB macros[​](#cross-db-macros "Direct link to Cross DB macros")

Starting with release 1.3, some macros were migrated from the [teradata-dbt-utils](https://github.com/Teradata/dbt-teradata-utils) dbt package to the connector. Refer to the following table for the macros supported by the connector. Because these cross-DB macros have been migrated from teradata-utils into dbt-teradata, you don't need to use `teradata_utils` as a macro namespace when calling them.
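Installing the compatibility package follows the standard dbt package flow: add it to your project's `packages.yml` and run `dbt deps`. A sketch (the version range shown is illustrative; check the hub page for current releases):

```yaml
# packages.yml — illustrative; pin to the range the teradata_utils
# hub page recommends for your dbt version
packages:
  - package: teradata/teradata_utils
    version: [">=1.0.0", "<2.0.0"]
```

After `dbt deps`, macros from the package are available in your models.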
###### Compatibility[​](#compatibility "Direct link to Compatibility")

| Macro Group | Macro Name | Status | Comment |
| --------------------- | ------------------ | ------ | ------------------------------------------------------------ |
| Cross-database macros | current_timestamp | ✅ | custom macro provided |
| Cross-database macros | dateadd | ✅ | custom macro provided |
| Cross-database macros | datediff | ✅ | custom macro provided, see [compatibility note](#datediff) |
| Cross-database macros | split_part | ✅ | custom macro provided |
| Cross-database macros | date_trunc | ✅ | custom macro provided |
| Cross-database macros | hash | ✅ | custom macro provided, see [compatibility note](#hash) |
| Cross-database macros | replace | ✅ | custom macro provided |
| Cross-database macros | type_string | ✅ | custom macro provided |
| Cross-database macros | last_day | ✅ | no customization needed, see [compatibility note](#last_day) |
| Cross-database macros | width_bucket | ✅ | no customization |
| Cross-database macros | generate_series | ✅ | custom macro provided |
| Cross-database macros | date_spine | ✅ | no customization |

###### Examples for cross-DB macros[​](#examples-for-cross-db-macros "Direct link to examples for cross DB macros")

###### replace[​](#replace "Direct link to replace")

```sql
{{ dbt.replace("string_text_column", "old_chars", "new_chars") }}
{{ replace('abcgef', 'g', 'd') }}
```

###### date_trunc[​](#date_trunc "Direct link to date_trunc")

```sql
{{ dbt.date_trunc("date_part", "date") }}
{{ dbt.date_trunc("DD", "'2018-01-05 12:00:00'") }}
```

###### datediff[​](#datediff "Direct link to datediff")

The `datediff` macro in Teradata supports differences between dates. Differences between timestamps are not supported.

###### hash[​](#hash "Direct link to hash")

The `hash` macro needs an `md5` function implementation. Teradata doesn't support `md5` natively.
You need to install a User Defined Function (UDF) and optionally specify the `md5_udf` [variable](https://docs.getdbt.com/docs/build/project-variables.md). If not specified, the code defaults to using `GLOBAL_FUNCTIONS.hash_md5`. See the following instructions on how to install the custom UDF:

1. Download the md5 UDF implementation from Teradata (registration required).
2. Unzip the package and go to the `src` directory.
3. Start up `bteq` and connect to your database.
4. Create the database `GLOBAL_FUNCTIONS` that will host the UDF. You can't change the database name as it's hardcoded in the macro:

   ```sql
   CREATE DATABASE GLOBAL_FUNCTIONS AS PERMANENT = 60e6, SPOOL = 120e6;
   ```

5. Create the UDF. Replace `<your_database_user>` with your current database user:

   ```sql
   GRANT CREATE FUNCTION ON GLOBAL_FUNCTIONS TO <your_database_user>;
   DATABASE GLOBAL_FUNCTIONS;
   .run file = hash_md5.btq
   ```

6. Grant permissions to run the UDF with the grant option:

   ```sql
   GRANT EXECUTE FUNCTION ON GLOBAL_FUNCTIONS TO PUBLIC WITH GRANT OPTION;
   ```

To use a custom hash function, add the `md5_udf` variable in `dbt_project.yml`:

```yaml
vars:
  md5_udf: Custom_database_name.hash_method_function
```

###### last_day[​](#last_day "Direct link to last_day")

`last_day` in `teradata_utils`, unlike the corresponding macro in `dbt_utils`, doesn't support the `quarter` datepart.

#### Unit testing

dbt-teradata 1.8.0 and later versions support unit tests, enabling you to validate SQL models and logic with a small set of static inputs before going to production. This feature enhances test-driven development and boosts developer efficiency and code reliability. Learn more about dbt unit tests [here](https://docs.getdbt.com/docs/build/unit-tests.md).

#### Limitations[​](#limitations "Direct link to Limitations")

##### Browser authentication[​](#browser-authentication "Direct link to Browser authentication")

* When running a dbt job with `logmech` set to `"browser"`, the initial authentication opens a browser window where you must enter your username and password.
* After authentication, this window remains open, requiring you to manually switch back to the dbt console.
* For every subsequent connection, a new browser tab briefly opens and displays the message "TERADATA BROWSER AUTHENTICATION COMPLETED" while the driver silently reuses the existing session.
* However, the focus stays on the browser window, so you'll need to manually switch back to the dbt console each time.
* This behavior is the default functionality of the teradatasql driver and cannot be avoided at this time.
* To prevent session expiration and the need to re-enter credentials, ensure the authentication browser window stays open until the job is complete.

##### Transaction mode[​](#transaction-mode "Direct link to Transaction mode")

Both ANSI and TERA modes are now supported in dbt-teradata. Support for TERA mode was introduced with dbt-teradata 1.7.1.

TERA transaction mode

This is an initial implementation of the TERA transaction mode and may not support some use cases. We highly recommend validating all records or transformations using this mode to avoid unexpected issues or errors.

#### Credits[​](#credits "Direct link to Credits")

The adapter was originally created by [Doug Beatty](https://github.com/dbeatty10). Teradata took over the adapter in January 2022. We are grateful to Doug for founding the project and accelerating the integration of dbt + Teradata.

#### License[​](#license "Direct link to License")

The adapter is published using the Apache-2.0 License. Refer to the [terms and conditions](https://github.com/dbt-labs/dbt-core/blob/main/License.md) to understand items such as creating derivative work and the support model.
---

##### Connect TiDB to dbt Core

Vendor-supported plugin

Some [core functionality](https://github.com/pingcap/dbt-tidb/blob/main/README.md#supported-features) may be limited. If you're interested in contributing, check out the source code repository listed below.

* **Maintained by**: PingCAP
* **Authors**: Xiang Zhang, Qiang Wu, Yuhang Shi
* **GitHub repo**: [pingcap/dbt-tidb](https://github.com/pingcap/dbt-tidb) [![](https://img.shields.io/github/stars/pingcap/dbt-tidb?style=for-the-badge)](https://github.com/pingcap/dbt-tidb)
* **PyPI package**: `dbt-tidb` [![](https://badge.fury.io/py/dbt-tidb.svg)](https://badge.fury.io/py/dbt-tidb)
* **Slack channel**: [#db-tidb](https://getdbt.slack.com/archives/C03CC86R1NY)
* **Supported dbt Core version**: v1.0.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-tidb

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.
Use the following command for installation:

`python -m pip install dbt-core dbt-tidb`

#### Configuring dbt-tidb

For TiDB-specific configuration, please refer to [TiDB configs.](https://docs.getdbt.com/reference/resource-configs/no-configs.md)

#### Connecting to TiDB with **dbt-tidb**[​](#connecting-to-tidb-with-dbt-tidb "Direct link to connecting-to-tidb-with-dbt-tidb")

##### User / Password Authentication[​](#user--password-authentication "Direct link to User / Password Authentication")

Configure your dbt profile for using TiDB:

###### TiDB connection profile[​](#tidb-connection-profile "Direct link to TiDB connection profile")

profiles.yml

```yaml
dbt-tidb:
  target: dev
  outputs:
    dev:
      type: tidb
      server: 127.0.0.1
      port: 4000
      schema: database_name
      username: tidb_username
      password: tidb_password

      # optional
      retries: 3 # default 1
```

###### Description of Profile Fields[​](#description-of-profile-fields "Direct link to Description of Profile Fields")

| Option | Description | Required? | Example |
| -------- | ---------------------------------------------------- | --------- | ------------------ |
| type | The specific adapter to use | Required | `tidb` |
| server | The server (hostname) to connect to | Required | `yourorg.tidb.com` |
| port | The port to use | Required | `4000` |
| schema | Specify the schema (database) to build models into | Required | `analytics` |
| username | The username to use to connect to the server | Required | `dbt_admin` |
| password | The password to use for authenticating to the server | Required | `awesome_password` |
| retries | The number of retries after an unsuccessful connection (default 1) | Optional | `3` |

#### Database User Privileges[​](#database-user-privileges "Direct link to Database User Privileges")

Your database user needs privileges to read and write, such as `SELECT` and `CREATE`.
You can find some help [here](https://docs.pingcap.com/tidb/v4.0/privilege-management) with TiDB privileges management.

| Required Privilege |
| ---------------------- |
| SELECT |
| CREATE |
| CREATE TEMPORARY TABLE |
| CREATE VIEW |
| INSERT |
| DROP |
| SHOW DATABASE |
| SHOW VIEW |
| SUPER |

#### Supported features[​](#supported-features "Direct link to Supported features")

| TiDB 4.X | TiDB 5.0 ~ 5.2 | TiDB >= 5.3 | Feature |
| -------- | --------------- | ----------- | --------------------------- |
| ✅ | ✅ | ✅ | Table materialization |
| ✅ | ✅ | ✅ | View materialization |
| ❌ | ❌ | ✅ | Incremental materialization |
| ❌ | ✅ | ✅ | Ephemeral materialization |
| ✅ | ✅ | ✅ | Seeds |
| ✅ | ✅ | ✅ | Sources |
| ✅ | ✅ | ✅ | Custom data tests |
| ✅ | ✅ | ✅ | Docs generate |
| ❌ | ❌ | ✅ | Snapshots |
| ✅ | ✅ | ✅ | Grant |
| ✅ | ✅ | ✅ | Connection retry |

**Note:**

* TiDB 4.0 ~ 5.0 does not support [CTE](https://docs.pingcap.com/tidb/dev/sql-statement-with); avoid using `WITH` in your SQL code.
* TiDB 4.0 ~ 5.2 does not support creating a [temporary table or view](https://docs.pingcap.com/tidb/v5.2/sql-statement-create-table#:~:text=sec\)-,MySQL%20compatibility,-TiDB%20does%20not).
* TiDB 4.X does not support using SQL functions in `CREATE VIEW`; avoid them in your SQL code. You can find more detail [here](https://github.com/pingcap/tidb/pull/27252).
--- ##### Connect Upsolver to dbt Core #### Overview of dbt-upsolver * **Maintained by**: Upsolver Team * **Authors**: Upsolver Team * **GitHub repo**: [Upsolver/dbt-upsolver](https://github.com/Upsolver/dbt-upsolver)[![](https://img.shields.io/github/stars/Upsolver/dbt-upsolver?style=for-the-badge)](https://github.com/Upsolver/dbt-upsolver) * **PyPI package**: `dbt-upsolver` [![](https://badge.fury.io/py/dbt-upsolver.svg)](https://badge.fury.io/py/dbt-upsolver) * **Slack channel**: [Upsolver Community](https://join.slack.com/t/upsolvercommunity/shared_invite/zt-1zo1dbyys-hj28WfaZvMh4Z4Id3OkkhA) * **Supported dbt Core version**: v1.5.0 and newer * **dbt support**: Not Supported * **Minimum data platform version**: n/a #### Installing dbt-upsolver pip is the easiest way to install the adapter: `python -m pip install dbt-upsolver` Installing `dbt-upsolver` will also install `dbt-core` and any other dependencies. #### Configuring dbt-upsolver For Upsolver-specifc configuration please refer to [Upsolver Configuration](https://docs.getdbt.com/reference/resource-configs/upsolver-configs.md) For further info, refer to the GitHub repository: [Upsolver/dbt-upsolver](https://github.com/Upsolver/dbt-upsolver) #### Authentication Methods[​](#authentication-methods "Direct link to Authentication Methods") ##### User / Token authentication[​](#user--token-authentication "Direct link to User / Token authentication") Upsolver can be configured using basic user/token authentication as shown below. \~/.dbt/profiles.yml ```yaml my-upsolver-db: target: dev outputs: dev: type: upsolver api_url: https://mt-api-prod.upsolver.com user: [username] token: [token] database: [database name] schema: [schema name] threads: [1 or more] ``` #### Configurations[​](#configurations "Direct link to Configurations") The configs for Upsolver targets are shown below. ##### All configurations[​](#all-configurations "Direct link to All configurations") | Config | Required? 
| Description | | -------- | --------- | ---------------------------------------------------------------------------------------------------------- | | token | Yes | The token to connect Upsolver [Upsolver's documentation](https://docs.upsolver.com/sqlake/api-integration) | | user | Yes | The user to log in as | | database | Yes | The database that dbt should create models in | | schema | Yes | The schema to build models into by default | | api\_url | Yes | The API url to connect. Common value `https://mt-api-prod.upsolver.com` | Search table... | | | | | | | ---------------- | - | - | - | - | | Loading table... | | | | | #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ##### Connect Vertica to dbt Core VENDOR-SUPPORTED PLUGIN If you're interested in contributing, check out the source code for each repository listed below. * **Maintained by**: Vertica * **Authors**: Vertica (Former authors: Matthew Carter, Andy Regan, Andrew Hedengren) * **GitHub repo**: [vertica/dbt-vertica](https://github.com/vertica/dbt-vertica) [![](https://img.shields.io/github/stars/vertica/dbt-vertica?style=for-the-badge)](https://github.com/vertica/dbt-vertica) * **PyPI package**: `dbt-vertica` [![](https://badge.fury.io/py/dbt-vertica.svg)](https://badge.fury.io/py/dbt-vertica) * **Slack channel**: [n/a](https://www.getdbt.com/community/) * **Supported dbt Core version**: v1.8.5 and newer * **dbt support**: Not Supported * **Minimum data platform version**: Vertica 24.3.0 #### Installing dbt-vertica Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. 
Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-vertica`

#### Configuring dbt-vertica

For Vertica-specific configuration, please refer to [Vertica configs.](https://docs.getdbt.com/reference/resource-configs/vertica-configs.md)

##### Connecting to Vertica with dbt-vertica

###### Username / password authentication[​](#username--password-authentication "Direct link to Username / password authentication")

Configure your dbt profile for using Vertica:

###### Vertica connection information[​](#vertica-connection-information "Direct link to Vertica connection information")

profiles.yml

```yaml
your-profile:
  outputs:
    dev:
      type: vertica # Don't change this!
      host: [hostname]
      port: [port] # or your custom port (optional)
      username: [your username]
      password: [your password]
      database: [database name]
      oauth_access_token: [access token]
      schema: [dbt schema]
      connection_load_balance: True
      backup_server_node: [list of backup hostnames or IPs]
      retries: [1 or more]
      autocommit: False
      threads: [1 or more]
  target: dev
```

###### Description of Profile Fields:[​](#description-of-profile-fields "Direct link to Description of Profile Fields:")

| Property | Description | Required? | Default Value | Example |
| ------------------------- | ----------- | --------- | ------------- | ------- |
| type | The specific adapter to use. | Yes | None | vertica |
| host | The host name or IP address of any active node in the Vertica Server. | Yes | None | 127.0.0.1 |
| port | The port to use, default or custom. | Yes | 5433 | 5433 |
| username | The username to use to connect to the server. | Yes | None | dbadmin |
| password | The password to use for authenticating to the server. | Yes | None | my_password |
| database | The name of the database running on the server. | Yes | None | my_db |
| oauth_access_token | To authenticate via OAuth, provide an OAuth Access Token that authorizes a user to the database. | No | "" | "" |
| schema | The schema to build models into. | No | None | VMart |
| connection_load_balance | A Boolean value that indicates whether the connection can be redirected to a host in the database other than host. | No | True | True |
| backup_server_node | List of hosts to connect to if the primary host specified in the connection (host, port) is unreachable. Each item in the list should be either a host string (using default port 5433) or a (host, port) tuple. A host can be a host name or an IP address. | No | None | ['123.123.123.123', 'www.abc.com', ('123.123.123.124', 5433)] |
| retries | The number of retries after an unsuccessful connection. | No | 2 | 3 |
| threads | The number of threads the dbt project will run on. | No | 1 | 3 |
| label | A session label to identify the connection. | No | An auto-generated label with format of: dbt_username | dbt_dbadmin |
| autocommit | A Boolean value that indicates if the connection can enable or disable auto-commit. | No | True | False |

For more information on Vertica's connection properties, please refer to [Vertica-Python](https://github.com/vertica/vertica-python#create-a-connection) Connection Properties.
---

##### Connect YDB to dbt Core

#### Overview of dbt-ydb

* **Maintained by**: YDB Team
* **Authors**: YDB Team
* **GitHub repo**: [ydb-platform/dbt-ydb](https://github.com/ydb-platform/dbt-ydb) [![](https://img.shields.io/github/stars/ydb-platform/dbt-ydb?style=for-the-badge)](https://github.com/ydb-platform/dbt-ydb)
* **PyPI package**: `dbt-ydb` [![](https://badge.fury.io/py/dbt-ydb.svg)](https://badge.fury.io/py/dbt-ydb)
* **Slack channel**: n/a
* **Supported dbt Core version**: v1.8.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: n/a

#### Installing dbt-ydb

pip is the easiest way to install the adapter:

`python -m pip install dbt-ydb`

Installing `dbt-ydb` will also install `dbt-core` and any other dependencies.

#### Configuring dbt-ydb

For YDB-specific configuration, please refer to [YDB Configuration](https://docs.getdbt.com/reference/resource-configs/no-configs.md).

For further info, refer to the GitHub repository: [ydb-platform/dbt-ydb](https://github.com/ydb-platform/dbt-ydb)

#### Connecting to YDB[​](#connecting-to-ydb "Direct link to Connecting to YDB")

To connect to YDB from dbt, you'll need to add a [profile](https://docs.getdbt.com/docs/local/profiles.yml.md) to your `profiles.yml` file.
A YDB profile conforms to the following syntax:

profiles.yml

```yaml
profile-name:
  target: dev
  outputs:
    dev:
      type: ydb
      host: localhost
      port: 2136
      database: /local
      schema: empty_string
      secure: False
      root_certificates_path: empty_string

      # Static credentials
      username: empty_string
      password: empty_string

      # Access token credentials
      token: empty_string

      # Service account credentials
      service_account_credentials_file: empty_string
```

##### All configurations[​](#all-configurations "Direct link to All configurations")

| Config | Required? | Default | Description |
| ----------------------------------- | --------- | -------------- | ----------------------------------------------------------------------------- |
| host | Yes | | YDB host |
| port | Yes | | YDB port |
| database | Yes | | YDB database |
| schema | No | `empty_string` | Optional subfolder for dbt models. Use an empty string or `/` to use the root folder |
| secure | No | False | If enabled, the `grpcs` protocol will be used |
| root_certificates_path | No | `empty_string` | Optional path to a root certificates file |
| username | No | `empty_string` | YDB username, to use static credentials |
| password | No | `empty_string` | YDB password, to use static credentials |
| token | No | `empty_string` | YDB token, to use access token credentials |
| service_account_credentials_file | No | `empty_string` | Path to a service account credentials file, to use service account credentials |

---

##### Connect Yellowbrick to dbt Core

Community plugin

Some core functionality may be limited.
* **Maintained by**: Community
* **Authors**: InfoCapital team
* **GitHub repo**: [InfoCapital-AU/dbt-yellowbrick](https://github.com/InfoCapital-AU/dbt-yellowbrick) [![](https://img.shields.io/github/stars/InfoCapital-AU/dbt-yellowbrick?style=for-the-badge)](https://github.com/InfoCapital-AU/dbt-yellowbrick)
* **PyPI package**: `dbt-yellowbrick` [![](https://badge.fury.io/py/dbt-yellowbrick.svg)](https://badge.fury.io/py/dbt-yellowbrick)
* **Slack channel**: [n/a](https://www.getdbt.com/community)
* **Supported dbt Core version**: v1.7.0 and newer
* **dbt support**: Not Supported
* **Minimum data platform version**: Yellowbrick 5.2

#### Installing dbt-yellowbrick

Use `pip` to install the adapter. Before 1.8, installing the adapter would automatically install `dbt-core` and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other, so we no longer want to overwrite existing dbt-core installations.

Use the following command for installation:

`python -m pip install dbt-core dbt-yellowbrick`

#### Configuring dbt-yellowbrick

For Yellowbrick Data-specific configuration, please refer to [Yellowbrick Data configs.](https://docs.getdbt.com/reference/resource-configs/yellowbrick-configs.md)

#### Profile configuration[​](#profile-configuration "Direct link to Profile configuration")

Yellowbrick targets should be set up using the following configuration in your `profiles.yml` file.
~/.dbt/profiles.yml

```yaml
company-name:
  target: dev
  outputs:
    dev:
      type: yellowbrick
      host: [hostname]
      user: [username]
      password: [password]
      port: [port]
      dbname: [database name]
      schema: [dbt schema]
      role: [optional, set the role dbt assumes when executing queries]
      sslmode: [optional, set the sslmode used to connect to the database]
      sslrootcert: [optional, set the sslrootcert config value to a new file path to customize the file location that contains root certificates]
```

##### Configuration notes[​](#configuration-notes "Direct link to Configuration notes")

This adapter is based on the dbt-postgres adapter documented in [Postgres profile setup](https://docs.getdbt.com/docs/local/connect-data-platform/postgres-setup.md).

###### role[​](#role "Direct link to role")

The `role` config controls the user role that dbt assumes when opening new connections to the database.

###### sslmode / sslrootcert[​](#sslmode--sslrootcert "Direct link to sslmode / sslrootcert")

The ssl config parameters control how dbt connects to Yellowbrick using SSL. Refer to the [Yellowbrick documentation](https://docs.yellowbrick.com/5.2.27/client_tools/config_ssl_for_clients_intro.html) for details.

---

##### Connection profiles

When you invoke dbt from the command line, dbt parses your `dbt_project.yml` and obtains the `profile` name, which dbt needs to connect to your data warehouse.

dbt_project.yml

```yaml
# Example dbt_project.yml file
name: 'jaffle_shop'
profile: 'jaffle_shop'
...
```

dbt then checks your `profiles.yml` file for a profile with the same name. A profile contains all the details required to connect to your data warehouse.
dbt will search the current working directory for the `profiles.yml` file and will default to the `~/.dbt/` directory if not found. This file generally lives outside of your dbt project to avoid sensitive credentials being checked in to version control, but `profiles.yml` can be safely checked in when [using environment variables](#advanced-using-environment-variables) to load sensitive credentials.

~/.dbt/profiles.yml

```yaml
# example profiles.yml file
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: alice
      password:
      port: 5432
      dbname: jaffle_shop
      schema: dbt_alice
      threads: 4
    prod: # additional prod target
      type: postgres
      host: prod.db.example.com
      user: alice
      password:
      port: 5432
      dbname: jaffle_shop
      schema: analytics
      threads: 8
```

To add an additional target (like `prod`) to your existing `profiles.yml`, add another entry under the `outputs` key.

#### The `env_var` function[​](#the-env_var-function "Direct link to the-env_var-function")

The `env_var` function can be used to incorporate environment variables from the system into your dbt project. You can use the `env_var` function in your `profiles.yml` file, the `dbt_project.yml` file, the `sources.yml` file, your `schema.yml` files, and in model `.sql` files. Essentially, `env_var` is available anywhere dbt processes Jinja code.

When used in a `profiles.yml` file (to avoid putting credentials on a server), it can be used like this:

profiles.yml

```yaml
profile:
  target: prod
  outputs:
    prod:
      type: postgres
      host: 127.0.0.1
      # IMPORTANT: Make sure to quote the entire Jinja string here
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      ....
```

#### About the `profiles.yml` file[​](#about-the-profilesyml-file "Direct link to about-the-profilesyml-file")

In your `profiles.yml` file, you can store as many profiles as you need. Typically, you would have one profile for each warehouse you use. Most organizations only have one profile.
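`env_var` also accepts an optional second argument that is used as a default when the environment variable is unset, which is handy for values that only need to be secret in production. A minimal sketch (hostnames and names are illustrative):

```yaml
# profiles.yml — using env_var's optional default argument;
# DBT_HOST and DBT_USER fall back to local-development values when unset
profile:
  target: dev
  outputs:
    dev:
      type: postgres
      host: "{{ env_var('DBT_HOST', 'localhost') }}"
      user: "{{ env_var('DBT_USER', 'alice') }}"
      password: "{{ env_var('DBT_PASSWORD') }}" # no default: must be set
      port: 5432
      dbname: jaffle_shop
      schema: dbt_alice
      threads: 4
```

Note that dbt raises a compilation error if a variable without a default is referenced but not set, which is usually the behavior you want for credentials.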
#### About profiles[​](#about-profiles "Direct link to About profiles")

A profile consists of *targets* and a specified *default target*.

Each *target* specifies the type of warehouse you are connecting to, the credentials to connect to the warehouse, and some dbt-specific configurations. The credentials you need to provide in your target vary across warehouses – sample profiles for each supported warehouse are available in the [Supported Data Platforms](https://docs.getdbt.com/docs/supported-data-platforms.md) section.

**Pro tip:** You may need to surround your password in quotes if it contains special characters. More details [here](https://stackoverflow.com/a/37015689/10415173).

#### Setting up your profile[​](#setting-up-your-profile "Direct link to Setting up your profile")

To set up your profile, copy the correct sample profile for your warehouse into your `profiles.yml` file and update the details as follows:

* Profile name: Replace the name of the profile with a sensible name – it's often a good idea to use the name of your organization. Make sure that this is the same name as the `profile` indicated in your `dbt_project.yml` file.
* `target`: This is the default target your dbt project will use. It must be one of the targets you define in your profile. Commonly it is set to `dev`.
* Populating your target:
  * `type`: The type of data warehouse you are connecting to
  * Warehouse credentials: Get these from your database administrator if you don't already have them. Remember that user credentials are very sensitive information that should not be shared.
  * `schema`: The default schema that dbt will build objects in.
  * `threads`: The number of threads the dbt project will run on.

You can find more information on which values to use in your targets below. Use the [debug](https://docs.getdbt.com/reference/dbt-jinja-functions/debug-method.md) command to validate your warehouse connection. Run `dbt debug` from within a dbt project to test your connection.
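As a quick sketch of the debug workflow (standard dbt CLI flags; this assumes your profile defines `dev` and `prod` targets):

```shell
# validate the connection for the default target
dbt debug

# validate a specific target defined in profiles.yml
dbt debug --target prod

# point dbt at a non-default profiles directory while debugging
dbt debug --profiles-dir ./ci-profiles --target prod
```

`dbt debug` checks the profile, the dbt_project.yml, and the warehouse connection, and reports which check failed, which makes it the fastest way to diagnose a misconfigured target.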
#### Understanding targets in profiles[​](#understanding-targets-in-profiles "Direct link to Understanding targets in profiles")

dbt supports multiple targets within one profile to encourage the use of separate development and production environments, as discussed in [dbt environments](https://docs.getdbt.com/docs/local/dbt-core-environments.md).

A typical profile for an analyst using dbt locally will have a target named `dev`, and have this set as the default. You may also have a `prod` target within your profile, which creates the objects in your production schema. However, since it's often desirable to perform production runs on a schedule, we recommend deploying your dbt project to a separate machine other than your local machine. Most dbt users only have a `dev` target in their profile on their local machine.

If you have multiple targets in your profile and want to use a target other than the default, use the `--target` flag when running a dbt command. For example, to run against your `prod` target instead of the default `dev` target:

```bash
dbt run --target prod
```

You can use the `--target` flag with any dbt command, such as:

```bash
dbt build --target prod
dbt test --target dev
dbt compile --target qa
```

##### Overriding profiles and targets[​](#overriding-profiles-and-targets "Direct link to Overriding profiles and targets")

When running dbt commands, you can specify which profile and target to use from the CLI using the `--profile` and `--target` [flags](https://docs.getdbt.com/reference/global-configs/about-global-configs.md#available-flags). These flags override what's defined in your `dbt_project.yml`, as long as the specified profile and target are already defined in your `profiles.yml` file.
To run your dbt project with a different profile or target than the default, use the following CLI flags:

* `--profile` flag — Overrides the profile set in `dbt_project.yml` by pointing to another profile defined in `profiles.yml`.
* `--target` flag — Specifies the target within that profile to use (as defined in `profiles.yml`).

These flags help when you're working with multiple profiles and targets and want to override defaults without changing your files.

```bash
dbt run --profile my-profile-name --target dev
```

In this example, the `dbt run` command will use the `my-profile-name` profile and the `dev` target.

#### Understanding warehouse credentials

We recommend that each dbt user has their own set of database credentials, including a separate user for production runs of dbt – this helps debug rogue queries, simplifies ownership of schemas, and improves security.

To ensure the user credentials you use in your target allow dbt to run, you will need to ensure the user has appropriate privileges. While the exact privileges needed vary between data warehouses, at a minimum your user must be able to:

* read source data
* create schemas
* read system tables

note

**Running dbt without create schema privileges**

If your user can't be granted the privilege to create schemas, your dbt runs should instead target an existing schema that your user has permission to create relations within.

#### Understanding target schemas

The target schema represents the default schema that dbt will build objects into, and is often used as the differentiator between separate environments within a warehouse.

note

**Schemas in BigQuery**

dbt uses the term "schema" in a target across all supported warehouses for consistency. Note that in the case of BigQuery, a schema is actually a dataset.
The schema used for production should be named in a way that makes it clear it's ready for end users to use for analysis – we often name this `analytics`. In development, a pattern we've found to work well is to name the schema in your `dev` target `dbt_` suffixed with your name (for example, `dbt_alice`). Suffixing your name to the schema enables multiple users to develop in dbt: each user has their own separate schema for development, so users won't build over the top of each other, and object ownership and permissions stay consistent across an entire schema.

Note that there's no need to create your target schema beforehand – dbt will check whether the schema already exists when it runs, and create it if it doesn't.

While the target schema represents the default schema that dbt will use, it may make sense to split your models into separate schemas, which can be done by using [custom schemas](https://docs.getdbt.com/docs/build/custom-schemas.md).

#### Understanding threads

When dbt runs, it creates a directed acyclic graph (DAG) of links between models. The number of threads represents the maximum number of paths through the graph dbt may work on at once – increasing the number of threads can minimize the run time of your project. The default value for threads in user profiles is 4. For more information, check out [using threads](https://docs.getdbt.com/docs/running-a-dbt-project/using-threads.md).

#### Advanced: Customizing a profile directory

**dbt Fusion**

Fusion determines the parent directory for `profiles.yml` using the following precedence:

1. `--profiles-dir` option
2. Project root directory
3. `~/.dbt/` directory

Note that Fusion doesn't currently support the `DBT_PROFILES_DIR` environment variable or setting the `profiles.yml` in the current working directory.
**dbt Core**

dbt Core determines the parent directory for `profiles.yml` using the following precedence:

1. `--profiles-dir` option
2. `DBT_PROFILES_DIR` environment variable
3. current working directory
4. `~/.dbt/` directory

To check the expected location of your `profiles.yml` file for your installation of dbt, you can run the following:

```bash
$ dbt debug --config-dir
To view your profiles.yml file, run:

open /Users/alice/.dbt
```

You may want to have your `profiles.yml` file stored in a different directory than `~/.dbt/` – for example, if you are [using environment variables](#advanced-using-environment-variables) to load your credentials, you might choose to include this file in the root directory of your dbt project. Note that the file always needs to be called `profiles.yml`, regardless of which directory it is in.

There are multiple ways to direct dbt to a different location for your `profiles.yml` file:

##### 1. Use the `--profiles-dir` option when executing a dbt command

This option can be used as follows:

```text
$ dbt run --profiles-dir path/to/directory
```

If using this method, the `--profiles-dir` option needs to be provided every time you run a dbt command.

##### 2. Use the `DBT_PROFILES_DIR` environment variable to change the default location (dbt Core only)

Setting this environment variable tells dbt Core to look for your `profiles.yml` file in the specified directory instead of the default location. You can specify this by running:

```text
$ export DBT_PROFILES_DIR=path/to/directory
```

Note: This environment variable isn't supported in Fusion.
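The dbt Core lookup order above can be sketched as a small shell function. This is purely illustrative, not dbt's actual implementation, and `resolve_profiles_dir` is a hypothetical helper name:

```shell
resolve_profiles_dir() {
  # Hypothetical sketch of dbt Core's documented lookup order for profiles.yml
  local cli_option="$1"
  if [ -n "$cli_option" ]; then            # 1. --profiles-dir option
    echo "$cli_option"
  elif [ -n "$DBT_PROFILES_DIR" ]; then    # 2. DBT_PROFILES_DIR environment variable
    echo "$DBT_PROFILES_DIR"
  elif [ -f "./profiles.yml" ]; then       # 3. current working directory
    echo "$PWD"
  else                                     # 4. fall back to ~/.dbt/
    echo "$HOME/.dbt"
  fi
}
```

Each branch short-circuits, so the first source that is set wins, matching the numbered precedence list above.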
#### Advanced: Using environment variables

Credentials can be placed directly into the `profiles.yml` file or loaded from environment variables. Using environment variables is especially useful for production deployments of dbt. You can find more information about environment variables [here](https://docs.getdbt.com/reference/dbt-jinja-functions/env_var.md).

#### Related docs

* [About `profiles.yml`](https://docs.getdbt.com/docs/local/profiles.yml.md)

---

##### Databricks setup

---

##### Redshift setup

---

### Salesforce Data 360 setup [Beta](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

The `dbt-salesforce` adapter is available via the dbt Fusion engine CLI.
To access the adapter, [install dbt Fusion](https://docs.getdbt.com/docs/fusion/about-fusion-install.md). We recommend using the [VS Code Extension](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) as the development interface. dbt platform support is coming soon.

#### Prerequisites

Before you can connect dbt to Salesforce Data 360, you need the following:

* A Data 360 instance
* [An external client app that dbt connects to for the Data 360 instance](https://help.salesforce.com/s/articleView?id=xcloud.create_a_local_external_client_app.htm&type=5), with [OAuth configured](https://help.salesforce.com/s/articleView?id=xcloud.configure_external_client_app_oauth_settings.htm&type=5). OAuth scopes must include:
  * `api` - To manage user data via APIs.
  * `refresh_token`, `offline_access` - To perform requests at any time, even when the user is offline or tokens have expired.
  * `cdp_query_api` - To execute ANSI SQL queries on Data 360 data.
* [A private key and the `server.key` file](https://developer.salesforce.com/docs/atlas.en-us.252.0.sfdx_dev.meta/sfdx_dev/sfdx_dev_auth_key_and_cert.htm)
* A user with the `Data Cloud Architect` permission

#### Configure Fusion

To connect dbt to Salesforce Data 360, set up your `profiles.yml`. Refer to the following configuration:

`~/.dbt/profiles.yml`

```yaml
company-name:
  target: dev
  outputs:
    dev:
      type: salesforce
      method: jwt_bearer
      client_id: [Consumer Key of your Data 360 app]
      private_key_path: [local file path of your server key]
      login_url: "https://login.salesforce.com"
      username: [username on the Data 360 Instance]
```

| Profile field      | Required | Description                                                       | Example                                |
| ------------------ | -------- | ----------------------------------------------------------------- | -------------------------------------- |
| `method`           | Yes      | Authentication method. Currently, only `jwt_bearer` is supported. | `jwt_bearer`                           |
| `client_id`        | Yes      | The `Consumer Key` from your connected app secrets.               |                                        |
| `private_key_path` | Yes      | File path of the `server.key` file on your computer.              | `/Users/dbt_user/Documents/server.key` |
| `login_url`        | Yes      | Login URL of the Salesforce instance.                             | `https://login.salesforce.com`         |
| `username`         | Yes      | Username on the Data 360 instance.                                |                                        |

#### More information

Find Salesforce-specific configuration information in the [Salesforce adapter reference guide](https://docs.getdbt.com/reference/resource-configs/data-cloud-configs.md).

---

##### Snowflake setup
---

#### Install dbt Fusion engine

### About Fusion local installation [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

important

The dbt Fusion engine is currently available for installation in:

* [Local command line interface (CLI) tools](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")
* [VS Code and Cursor with the dbt extension](https://docs.getdbt.com/docs/install-dbt-extension.md) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")
* [dbt platform environments](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine) [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")

Join the conversation in our Community Slack channel [`#dbt-fusion-engine`](https://getdbt.slack.com/archives/C088YCAB6GH). Read the [Fusion Diaries](https://github.com/dbt-labs/dbt-fusion/discussions/categories/announcements) for the latest updates.

Learn more about installing Fusion locally, along with important prerequisites, step-by-step installation instructions, troubleshooting common issues, and configuration guidance.

#### Prerequisites

Before installing Fusion, ensure that you:

* Have administrative privileges to install software on your local machine.
* Are comfortable using a command-line interface (Terminal on macOS/Linux, PowerShell on Windows).
* Use a supported data warehouse and authentication method, and configure permissions as needed:

  **BigQuery**
  * Service Account / User Token
  * Native OAuth
  * External OAuth
  * [Required permissions](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#required-permissions)

  **Databricks**
  * Service Account / User Token
  * Native OAuth

  **Redshift**
  * Username / Password
  * IAM profile

  **Snowflake**
  * Username / Password
  * Native OAuth
  * External OAuth
  * Key pair using a modern PKCS#8 method
  * MFA

* Use a supported operating system:
  * **macOS:** Supported on both Intel (x86-64) and Apple Silicon (ARM)
  * **Linux:** Supported on both x86-64 and ARM
  * **Windows:** Supported on x86-64; ARM support coming soon

#### Getting started

If you're ready to get started, choose one of the following options. To learn more about which tool is best for you, see the [Fusion availability](https://docs.getdbt.com/docs/fusion/fusion-availability.md) page.
[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)

###### [dbt VS Code Extension](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)

[Learn how to connect to a data platform, integrate with secure authentication methods, and configure a sync with a git repo.](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)

###### [dbt Fusion engine from the CLI](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)

[Learn how to install the dbt Fusion engine on the command line interface (CLI).](https://docs.getdbt.com/docs/local/install-dbt.md?version=2#get-started)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine)

###### [dbt Fusion engine upgrade](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine)

[Learn how you can upgrade and leverage the speed and scale of the dbt Fusion engine.](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine)

---

### dbt platform

#### Account integrations in dbt

The following sections describe the different **Account integrations** available from your dbt account under the account **Settings** section.
[![Example of Account integrations from the sidebar](/img/docs/dbt-cloud/account-integrations.png?v=2 "Example of Account integrations from the sidebar")](#)Example of Account integrations from the sidebar

#### Git integrations

Connect your dbt account to your Git provider to enable dbt users to authenticate with their personal accounts. dbt performs Git actions on behalf of your authenticated self, against repositories to which you have access according to your Git provider permissions.

To configure a Git account integration:

1. Navigate to **Account settings** in the side menu.
2. Under the **Settings** section, click on **Integrations**.
3. Click on the Git provider from the list and select the **Pencil** icon to the right of the provider.
4. dbt [natively connects](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md) to the following Git providers:
   * [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md)
   * [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md)
   * [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

You can connect your dbt account to additional Git providers by importing a git repository from any valid git URL. Refer to [Import a git repository](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) for more information.

[![Example of the Git integration page](/img/docs/dbt-cloud/account-integration-git.png?v=2 "Example of the Git integration page")](#)Example of the Git integration page

#### OAuth integrations

Connect your dbt account to OAuth providers that are integrated with dbt.

To configure an OAuth account integration:

1. Navigate to **Account settings** in the side menu.
2. Under the **Settings** section, click on **Integrations**.
3. Under **OAuth**, click on **Link** to [connect your Slack account](https://docs.getdbt.com/docs/deploy/job-notifications.md#set-up-the-slack-integration).
4. For custom OAuth providers, under **Custom OAuth integrations**, click on **Add integration** and select the [OAuth provider](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md) from the list. Fill in the required fields and click **Save**.

[![Example of the OAuth integration page](/img/docs/dbt-cloud/account-integration-oauth.png?v=2 "Example of the OAuth integration page")](#)Example of the OAuth integration page

---

#### Account settings in dbt

The following sections describe the different **Account settings** available from your dbt account in the sidebar (under your account name on the lower left-hand side).

[![Example of Account settings from the sidebar](/img/docs/dbt-cloud/example-sidebar-account-settings.png?v=2 "Example of Account settings from the sidebar")](#)Example of Account settings from the sidebar

#### Git repository caching [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

note

**Repo caching enabled by default**

Git repository caching is enabled by default for all new Enterprise and Enterprise+ accounts, improving reliability by allowing dbt to use a cached copy of your repo if cloning fails. See the next section for more details on repo caching, retention, and more.
At the start of every [job](https://docs.getdbt.com/docs/deploy/jobs.md) run, dbt clones the project's Git repository so it has the latest versions of your project's code, and runs `dbt deps` to install your dependencies.

For improved reliability and performance on your job runs, you can enable dbt to keep a cache of the project's Git repository. If a third-party outage causes the cloning operation to fail, dbt will instead use the cached copy of the repo so your jobs can continue running as scheduled.

dbt caches your project's Git repo after each successful run and retains it for 8 days if there are no repo updates. It caches all packages regardless of installation method and does not fetch code outside of the job runs.

dbt will use the cached copy of your project's Git repo under these circumstances:

* Outages from third-party services (for example, the [dbt package hub](https://hub.getdbt.com/)).
* Git authentication fails.
* There are syntax errors in the `packages.yml` file. You can set up and use [continuous integration (CI)](https://docs.getdbt.com/docs/deploy/continuous-integration.md) to find these errors sooner.
* A package doesn't work with the current dbt version. You can set up and use [continuous integration (CI)](https://docs.getdbt.com/docs/deploy/continuous-integration.md) to identify this issue sooner.

Note: Git repository caching should not be used for CI jobs. CI jobs are designed to test the latest code changes in a pull request and ensure your code is up to date. Using a cached copy of the repo in CI jobs could result in stale code being tested.

To use this feature, select the **Enable repository caching** option from your account settings.
[![Example of the Enable repository caching option](/img/docs/deploy/account-settings-repository-caching.png?v=2 "Example of the Enable repository caching option")](#)Example of the Enable repository caching option

#### Partial parsing

At the start of every dbt invocation, dbt reads all the files in your project, extracts information, and constructs an internal manifest containing every object (model, source, macro, and so on). Among other things, it uses the `ref()`, `source()`, and `config()` macro calls within models to set properties, infer dependencies, and construct your project's DAG. When dbt finishes parsing your project, it stores the internal manifest in a file called `partial_parse.msgpack`.

Parsing projects can be time-consuming, especially for large projects with hundreds of models and thousands of files. To reduce the time it takes dbt to parse your project, use the partial parsing feature in dbt for your environment. When enabled, dbt uses the `partial_parse.msgpack` file to determine which files have changed (if any) since the project was last parsed, and then it parses *only* the changed files and the files related to those changes.

Partial parsing in dbt requires dbt version 1.4 or newer. The feature does have some known limitations. Refer to [Known limitations](https://docs.getdbt.com/reference/parsing.md#known-limitations) to learn more about them.

To use this feature, select the **Enable partial parsing between deployment runs** option from your account settings.
[![Example of the Enable partial parsing between deployment runs option](/img/docs/deploy/account-settings-partial-parsing.png?v=2 "Example of the Enable partial parsing between deployment runs option")](#)Example of the Enable partial parsing between deployment runs option

#### Account access and enablement

##### Enabling dbt Copilot [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

[Copilot](https://docs.getdbt.com/docs/cloud/dbt-copilot.md) is an AI-powered assistant fully integrated into your dbt experience, designed to accelerate your analytics workflows. To use this feature, your dbt administrator must enable Copilot on your account by selecting the **Enable account access to dbt Copilot features** option from the account settings. For more information, see [Enable dbt Copilot](https://docs.getdbt.com/docs/cloud/enable-dbt-copilot.md).

##### Enabling Advanced CI features [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

[Advanced CI](https://docs.getdbt.com/docs/deploy/advanced-ci.md) features, such as [compare changes](https://docs.getdbt.com/docs/deploy/advanced-ci.md#compare-changes), allow dbt account members to view details about the changes between what's in the production environment and the pull request. To use Advanced CI features, your dbt account must have access to them.
Ask your dbt administrator to enable Advanced CI features on your account by choosing the **Enable account access to Advanced CI** option from the account settings. Once enabled, the **dbt compare** option becomes available in the CI job settings for you to select.

[![The Enable account access to Advanced CI option](/img/docs/deploy/account-settings-advanced-ci.png?v=2 "The Enable account access to Advanced CI option")](#)The Enable account access to Advanced CI option

##### Enabling external metadata ingestion in dbt Catalog [Starter](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

[Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) allows you to view your project's resources (for example, models, tests, and metrics), their lineage, and model consumption to gain a better understanding of your project's latest production state. You can bring [external metadata](https://docs.getdbt.com/docs/explore/external-metadata-ingestion.md) into Catalog by connecting directly to your warehouse. This enables you to view tables and other assets that aren't defined in dbt. Currently, external metadata ingestion is supported for Snowflake only.

To use external metadata ingestion, you must be an [account admin](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#account-admin) with permission to edit connections. Enable it in your account by selecting the **Ingest external metadata in dbt Catalog (formerly dbt Explorer)** option from your account settings.
For more information, see [Enable external metadata ingestion](https://docs.getdbt.com/docs/explore/external-metadata-ingestion.md#enable-external-metadata-ingestion).

#### Project settings history

You can view historical project settings changes over the last 90 days. To view the change history:

1. Click your account name at the bottom of the left-side menu and click **Account settings**.
2. Click **Projects**.
3. Click a **project name**.
4. Click **History**.

[![Example of the project history option.](/img/docs/deploy/project-history.png?v=2 "Example of the project history option.")](#)Example of the project history option.

---

#### dbt environments

An environment determines how dbt will execute your project in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) or [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) (for development) and in scheduled jobs (for deployment).

Critically, in order to execute dbt, environments define three variables:

1. The version of dbt Core that will be used to run your project
2. The warehouse connection information (including the target database/schema settings)
3. The version of your code to execute

Each dbt project can have only one [development environment](#create-a-development-environment), but there is no limit to the number of [deployment environments](https://docs.getdbt.com/docs/deploy/deploy-environments.md), giving you the flexibility and customization to tailor the execution of scheduled jobs.
Use environments to customize settings for different stages of your project and streamline the execution process by using software engineering principles.

[![dbt environment hierarchy showing projects, environments, connections, and orchestration jobs.](/img/dbt-env.png?v=2 "dbt environment hierarchy showing projects, environments, connections, and orchestration jobs.")](#)dbt environment hierarchy showing projects, environments, connections, and orchestration jobs.

The following sections detail the different types of environments and how to configure your development environment in dbt.

#### Types of environments

In dbt, there are two types of environments:

* **Deployment environment** — Determines the settings used when jobs created within that environment are executed.
  Types of deployment environments:
  * General
  * Staging
  * Production
* **Development environment** — Determines the settings used in the Studio IDE or dbt CLI for that particular project.

Each dbt project can have only a single development environment, but can have any number of General deployment environments, one Production deployment environment, and one Staging deployment environment.

| | Development | General | Production | Staging |
| -------------------------------------- | --------------------- | ------------ | ------------ | ------------ |
| **Determines settings for** | Studio IDE or dbt CLI | dbt Job runs | dbt Job runs | dbt Job runs |
| **How many can I have in my project?** | 1 | Any number | 1 | 1 |

note

For users familiar with development on dbt Core, each environment is roughly analogous to an entry in your `profiles.yml` file, with some additional information about your repository to ensure the proper version of code is executed. More info on dbt Core environments [here](https://docs.getdbt.com/docs/local/dbt-core-environments.md).

#### Common environment settings

Both development and deployment environments have a section called **General Settings**, which has some basic settings that all environments will define:

| Setting | Example Value | Definition | Accepted Values |
| --------------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------- |
| Environment name | Production | The environment name | Any string! |
| Environment type | Deployment | The type of environment | Deployment, Development |
| Set deployment type | PROD | Designates the deployment environment type. | Production, Staging, General |
| dbt version | Latest | dbt automatically upgrades the dbt version running in this environment, based on the [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) you select. | Latest, Compatible, Extended |
| Only run on a custom branch | ☑️ | Determines whether to use a branch other than the repository's default | See below |
| Custom branch | dev | Custom branch name | See below |

About dbt version: dbt allows users to select a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to receive ongoing dbt version upgrades at the cadence that makes sense for their team.

##### Custom branch behavior

By default, all environments will use the default branch in your repository (usually the `main` branch) when accessing your dbt code. This is overridable within each dbt Environment using the **Default to a custom branch** option. This setting has slightly different behavior depending on the environment type:

* **Development**: determines which branch developers in the Studio IDE or dbt CLI create branches from and open PRs against.
* **Deployment**: determines which branch is cloned during job executions for each environment.

For more info, check out this [FAQ page on this topic](https://docs.getdbt.com/faqs/Environments/custom-branch-settings.md)!

##### Extended attributes

note

Extended attributes are currently *not* supported for SSH tunneling.

Extended attributes allows users to set a flexible [profiles.yml](https://docs.getdbt.com/docs/local/profiles.yml.md) snippet in their dbt Environment settings.
It provides users with more control over environments (both deployment and development) and extends how dbt connects to the data platform within a given environment. Extended attributes are set at the environment level, and can partially override connection or environment credentials, including any custom environment variables. You can set any YAML attributes that a dbt adapter accepts in its `profiles.yml`. [![Extended Attributes helps users add profiles.yml attributes to dbt Environment settings using a free form text box.](/img/docs/dbt-cloud/using-dbt-cloud/extended-attributes.png?v=2 "Extended Attributes helps users add profiles.yml attributes to dbt Environment settings using a free form text box.")](#)Extended Attributes helps users add profiles.yml attributes to dbt Environment settings using a free form text box.
The following code is an example of the types of attributes you can add in the **Extended Attributes** text box:

```yaml
dbname: jaffle_shop
schema: dbt_alice
threads: 4
username: alice
password: '{{ env_var(''DBT_ENV_SECRET_PASSWORD'') }}'
```

###### Extended Attributes don't mask secret values[​](#extended-attributes-dont-mask-secret-values "Direct link to Extended Attributes don't mask secret values")

We recommend avoiding setting secret values directly, to prevent their visibility in the text box and logs. A common workaround is to wrap extended attributes in [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md). In the earlier example, `password: '{{ env_var(''DBT_ENV_SECRET_PASSWORD'') }}'` will get a value from the `DBT_ENV_SECRET_PASSWORD` environment variable at runtime.

###### How extended attributes work[​](#how-extended-attributes-work "Direct link to How extended attributes work")

If you're developing in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md), [dbt CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md), or [orchestrating job runs](https://docs.getdbt.com/docs/deploy/deployments.md), extended attributes parse the provided YAML and extract the `profiles.yml` attributes. For each individual attribute:

* If the attribute exists in another source (such as your project settings or environment-level values), extended attributes replace its value in the profile. They also override any custom environment variables, unless the attribute itself is wired using the environment-variable syntax described for secrets above.
* If the attribute doesn't exist, extended attributes add the attribute/value pair to the profile.
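As an illustrative sketch of this merge behavior (the attribute names and values here are hypothetical, not taken from any particular adapter):

```yaml
# Hypothetical resolved profile before extended attributes are applied:
#   schema: analytics
#   threads: 4
#   dbname: jaffle_shop

# Extended Attributes snippet set on the environment:
schema: dbt_alice   # attribute already exists — its value is replaced in the profile
threads: 8          # attribute already exists — its value is replaced in the profile
role: transformer   # attribute doesn't exist — the pair is added to the profile

# Resulting profile used at runtime:
#   schema: dbt_alice
#   threads: 8
#   dbname: jaffle_shop   # untouched — not mentioned in the snippet
#   role: transformer
```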
###### Only the **top-level keys** are accepted in extended attributes[​](#only-the-top-level-keys-are-accepted-in-extended-attributes "Direct link to only-the-top-level-keys-are-accepted-in-extended-attributes")

This means that if you want to change a specific sub-key value, you must provide the entire top-level key as a JSON block in your resulting YAML. For example, if you want to customize a particular field within a [service account JSON](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#service-account-json) for your BigQuery connection (like 'project\_id' or 'client\_email'), you need to provide an override for the entire top-level `keyfile_json` key/attribute using extended attributes. Include the sub-fields as a nested JSON block.

#### Create a development environment[​](#create-a-development-environment "Direct link to Create a development environment")

To create a new dbt development environment:

1. Navigate to **Deploy** -> **Environments**.
2. Click **Create Environment**.
3. Select **Development** as the environment type.
4. Fill in the fields under **General Settings** and **Development Credentials**.
5. Click **Save** to create the environment.

##### Set developer credentials[​](#set-developer-credentials "Direct link to Set developer credentials")

To use the dbt Studio IDE or dbt CLI, each developer needs to set up [personal development credentials](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#get-started-with-the-cloud-ide) for your warehouse connection in their **Profile Settings**. This allows you to set separate target information and maintain individual credentials to connect to your warehouse.
[![Creating a development environment](/img/docs/dbt-cloud/refresh-ide/new-development-environment-fields.png?v=2 "Creating a development environment")](#)Creating a development environment

#### Deployment environment[​](#deployment-environment "Direct link to Deployment environment")

Deployment environments in dbt are necessary to execute scheduled jobs and use other features (like different workspaces for different tasks). You can have many environments in a single dbt project, enabling you to set up each space in a way that suits different needs (such as experimenting or testing). Even though you can have many environments, only one of them can be the "main" deployment environment. This is considered your "production" environment and represents your project's "source of truth", meaning it's where your most reliable and final data transformations live.

To learn more about dbt deployment environments and how to configure them, refer to the [Deployment environments](https://docs.getdbt.com/docs/deploy/deploy-environments.md) page. For our best practices guide, read [dbt environment best practices](https://docs.getdbt.com/guides/set-up-ci.md) for more info.

#### Delete an environment[​](#delete-an-environment "Direct link to Delete an environment")

Deleting an environment automatically deletes its associated job(s). If you want to keep those jobs, move them to a different environment first.

Follow these steps to delete an environment in dbt:

1. Click **Deploy** on the navigation header and then click **Environments**.
2. Select the environment you want to delete.
3. Click **Settings** on the top right of the page and then click **Edit**.
4. Scroll to the bottom of the page and click **Delete** to delete the environment. [![Delete an environment](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/delete-environment.png?v=2 "Delete an environment")](#)Delete an environment
5. Confirm your action in the pop-up by clicking **Confirm delete** in the bottom right to delete the environment immediately. This action cannot be undone. However, you can create a new environment with the same information if the deletion was made in error.
6. Refresh your page and the deleted environment should now be gone.

To delete multiple environments, repeat these steps for each one. If you're having any issues, feel free to [contact us](mailto:support@getdbt.com) for additional help.

#### Job monitoring[​](#job-monitoring "Direct link to Job monitoring")

On the **Environments** page, two sections provide an overview of the jobs for that environment:

* **In progress** — Lists the jobs currently in progress, with information on when each run started
* **Top jobs by models built** — Ranks jobs by the number of models built over a specific time

[![In progress jobs and Top jobs by models built](/img/docs/deploy/in-progress-top-jobs.png?v=2 "In progress jobs and Top jobs by models built")](#)In progress jobs and Top jobs by models built

#### Environment settings history[​](#environment-settings-history "Direct link to Environment settings history")

You can view historical environment settings changes over the last 90 days. To view the change history:

1. Navigate to **Orchestration** from the main menu and click **Environments**.
2. Click an **environment name**.
3. Click **Settings**.
4. Click **History**.

[![Example of the environment history option.](/img/docs/deploy/environment-history.png?v=2 "Example of the environment history option.")](#)Example of the environment history option.
---

#### Multi-cell migration checklist

dbt Labs is in the process of rolling out a new cell-based architecture for dbt. This architecture provides the foundation of dbt for years to come, and brings improved reliability, performance, and consistency to users of dbt.

We're scheduling migrations by account. When we're ready to migrate your account, you will receive a banner or email communication with your migration date. If you have not received this communication, then you don't need to take action at this time. dbt Labs will share information about your migration with you, with appropriate advance notice, when applicable to your account.

Your account will be automatically migrated on or after its scheduled date. However, if you use certain features, you must take action before that date to avoid service disruptions.

#### Recommended actions[​](#recommended-actions "Direct link to Recommended actions")

Rescheduling your migration

If you're on the dbt Enterprise tier, you can postpone your account migration by up to 45 days. To reschedule your migration, navigate to **Account Settings** → **Migration guide**. For help, contact the dbt Support Team at .

We highly recommend you take these actions:

* Ensure pending user invitations are accepted, or note outstanding invitations. Pending user invitations might be voided during the migration. You can resend user invitations after the migration is complete.
* Commit unsaved changes in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md). Unsaved changes might be lost during migration.
* Export and download [audit logs](https://docs.getdbt.com/docs/cloud/manage-access/audit-log.md) older than 90 days, as they will be unavailable from dbt after the migration is complete. Logs older than 90 days that are still within the data retention period are not deleted, but you will have to work with the dbt Labs Customer Support team to recover them.
#### Required actions[​](#required-actions "Direct link to Required actions")

These actions are required to prevent users from losing access to dbt:

* If you haven't already, complete the [Auth0 migration for SSO](https://docs.getdbt.com/docs/cloud/manage-access/auth0-migration.md) before your scheduled migration date to avoid service disruptions. If you've completed the Auth0 migration, your account SSO configurations will be transferred automatically.
* Update your IP allow lists. dbt will be using new IPs to access your warehouse post-migration. Allow inbound traffic from all of the following new IPs in your firewall and include them in any database grants:
  * `52.3.77.232`
  * `3.214.191.130`
  * `34.233.79.135`

  Keep the old dbt IPs listed until the migration is complete.

#### Run related data retention[​](#run-related-data-retention "Direct link to Run related data retention")

All runs available in dbt will be migrated with your account. This includes metadata about the run, like its status, execution start time, and duration. However, the individual steps associated with a run will not be migrated with your account. Therefore, the dbt commands executed during a run, along with their logs and artifact files, will not be available in dbt after your migration.

The [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) will contain a subset of data after your account has been migrated. Metadata generated in the past 7 days will be migrated with your account. A maximum of 20 runs will be available when querying the [job object](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-schema-job.md) after migration.

#### Post-migration[​](#post-migration "Direct link to Post-migration")

Complete all of these items to ensure your dbt resources and jobs will continue working without interruption.
Use one of these two URL login options:

* `us1.dbt.com` — If you were previously logging in with a username and password at `cloud.getdbt.com`, you should instead plan to log in at `us1.dbt.com`. The original URL will still work, but you'll have to click through to be redirected upon login. If you have single sign-on configured, you will use the unique URL listed in the SSO account settings (for example, `ACCOUNT_PREFIX.us1.dbt.com`).
* `ACCOUNT_PREFIX.us1.dbt.com` — A unique URL specifically for your account. If you belong to multiple accounts, each will have a unique URL available as long as it has been migrated to multi-cell.

Check out [access, regions, and IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for more information.

Remove the following old IP addresses from your firewall and database grants:

* `52.45.144.63`
* `54.81.134.249`
* `52.22.161.231`

---

#### Connect data platform

##### About data platform connections

The dbt platform can connect with a variety of data platform providers.
Expand the sections below to see the supported data platforms for dbt Core and the dbt Fusion engine:

| Connection | Available on Latest | Available on Fusion [Private preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") |
| --- | --- | --- |
| [AlloyDB](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-postgresql-alloydb.md) | ✅ | ❌ |
| [Amazon Athena](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-amazon-athena.md) | ✅ | ❌ |
| [Amazon Redshift](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-redshift.md) | ✅ | ✅ |
| [Apache Spark](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-apache-spark.md) | ✅ | ❌ |
| [Azure Synapse Analytics](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-azure-synapse-analytics.md) | ✅ | ❌ |
| [Databricks](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-databricks.md) | ✅ | ✅ |
| [Google BigQuery](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-bigquery.md) | ✅ | ✅ |
| [Microsoft Fabric](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-microsoft-fabric.md) | ✅ | ❌ |
| [PostgreSQL](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-postgresql-alloydb.md) | ✅ | ❌ |
| [Snowflake](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-snowflake.md) | ✅ | ✅ |
| [Starburst or Trino](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-starburst-trino.md) | ✅ | ❌ |
| [Teradata](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-teradata.md) [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") | ✅ | ❌ |

To connect to your database in dbt:

1. Click your account name at the bottom of the left-side menu and click **Account settings**.
2. Select **Connections** from the top left, and from there, click **New connection**.

[![Choose a connection](/img/docs/connect-data-platform/choose-a-connection.png?v=2 "Choose a connection")](#)Choose a connection

These connection instructions provide the basic fields required for configuring a data platform connection in dbt. For more detailed guides, which include demo project data, read our [Quickstart guides](https://docs.getdbt.com/guides.md).

##### Supported authentication methods[​](#supported-authentication-methods "Direct link to Supported authentication methods")

The following tables show which authentication types are supported for each connection available on the dbt platform.

dbt Core:

| Integration | User credentials | Service account credentials | Warehouse OAuth for users | External OAuth for users | Service-to-service OAuth | SSH | Private connectivity support\*\* |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Snowflake | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| BigQuery | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
| Databricks | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| Redshift | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ |
| Fabric | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Synapse | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Trino | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Teradata | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| AWS Athena | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Postgres | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |

\*\* Private connectivity is only supported for certain cloud providers and deployment types. See [Private connectivity documentation](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/private-connectivity.md) for details.

dbt Fusion:

| Integration | User credentials/token | Service account credentials | Warehouse OAuth for users | External OAuth for users | Service-to-service OAuth | Key/Pair | MFA | SSH | Private connectivity support\*\* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Snowflake | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
| BigQuery | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Databricks | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Redshift | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |

\*\* Private connectivity is only supported for certain cloud providers and deployment types. See [Private connectivity documentation](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/private-connectivity.md) for details.

#### Connection management[​](#connection-management "Direct link to Connection management")

Warehouse connections are an account-level resource. You can find them under **Account settings** > **Connections**.

Warehouse connections can be reused across projects. If multiple projects all connect to the same warehouse, you should reuse the same connection to streamline your management operations. Connections are assigned to a project via an [environment](https://docs.getdbt.com/docs/dbt-cloud-environments.md).
[![Connection model](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connections-new-model.png?v=2 "Connection model")](#)Connection model

As shown in the image, a project with 2 environments can target between 1 and 2 different connections. If you want to separate your production environment from your non-production environment, assign multiple connections to a single project.

##### Migration from project-level connections to account-level connections[​](#migration-from-project-level-connections-to-account-level-connections "Direct link to Migration from project-level connections to account-level connections")

Rolling out account-level connections will not require any interruption of service in your current usage (Studio IDE, CLI, jobs, and so on).

Why am I prompted to configure a development environment?

If your project did not previously have a development environment, you may be redirected to the project setup page. Your project is still intact. Choose a connection for your new development environment, and you can view all your environments again.

However, to fully utilize the value of account-level connections, you may have to rethink how you assign and use connections across projects and environments.

[![Typical connection setup post rollout](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connections-post-rollout.png?v=2 "Typical connection setup post rollout")](#)Typical connection setup post rollout

Please consider the following actions, as the steps you take will depend on the desired outcome.

* The initial clean-up of your connection list
  * Delete unused connections with 0 environments.
  * Rename connections with a temporary, descriptive naming scheme to better understand where each is used

[![Post initial clean-up](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connections-post-rollout-2.png?v=2 "Post initial clean-up")](#)Post initial clean-up

* Get granular with your connections
  * Define an intent for each connection, usually a combination of warehouse/database instance, intended use (dev, prod, etc.), and administrative surface (which teams/projects will need to collaborate on the connection)
  * Aim to minimize the need for local overrides (like extended attributes)
  * Come to a consensus on a naming convention. We recommend you name connections after the server hostname and distinct intent/domain/configuration. It will be easier to reuse connections across projects this way

[![Granularity determined](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connections-post-rollout-3.png?v=2 "Granularity determined")](#)Granularity determined

* Deduplication (connection list + environment details — not touching extended attributes for now)
  * Based on the granularity of your connection details, determine which connections should remain among groups of duplicates, and update every relevant environment to leverage that connection
  * Delete unused connections with 0 environments as you go
  * Deduplicate thoughtfully. If you want connections to be maintained by two different groups of users, you may want to preserve two identical connections to the same warehouse so each can evolve as each group sees fit without impacting the other group
  * Do not update extended attributes at this stage

[![Connections de-duplicated](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connections-post-rollout-4.png?v=2 "Connections de-duplicated")](#)Connections de-duplicated

* Normalization
  * Understand how new connections should be created to avoid local overrides.
    If you currently use extended attributes to override the warehouse instance in your production environment, you should instead create a new connection for that instance and wire your production environment to it, removing the need for the local overrides.
  * Create new connections and update the relevant environments to target them, removing the now-unnecessary local overrides (which may not be all of them!)
  * Test the new wiring by triggering jobs or starting Studio IDE sessions

[![Connections normalized](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connections-post-rollout-5.png?v=2 "Connections normalized")](#)Connections normalized

#### IP Restrictions[​](#ip-restrictions "Direct link to IP Restrictions")

dbt will always connect to your data platform from the IP addresses specified on the [Regions & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) page. Be sure to allow traffic from these IPs in your firewall, and include them in any database grants.

Allowing these IP addresses only enables the connection to your data warehouse. However, you might want to send API requests from your restricted network to the dbt API. Using the dbt API requires allowing the `cloud.getdbt.com` subdomain. For more on the dbt architecture, see [Deployment architecture](https://docs.getdbt.com/docs/cloud/about-cloud/architecture.md).

---

##### Connect Amazon Athena

Your environment(s) must be on a supported [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to use the Amazon Athena connection.
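As a point of reference before the detailed field tables below, the Athena connection options correspond to the dbt-athena adapter's `profiles.yml` options of the same names. A hedged sketch of an equivalent dbt Core target (the profile name and all values are hypothetical placeholders):

```yaml
jaffle_shop:                 # hypothetical profile name
  target: dev
  outputs:
    dev:
      type: athena
      region_name: eu-west-1
      database: awsdatacatalog
      s3_staging_dir: s3://bucket/dbt/
      work_group: my-custom-workgroup     # optional
      schema: dbt                         # Athena database to build models into
      threads: 3
      aws_access_key_id: AKIAIOSFODNN7EXAMPLE
      aws_secret_access_key: '{{ env_var(''DBT_ENV_SECRET_AWS_KEY'') }}'
```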
Connect dbt to Amazon's Athena interactive query service to build your dbt project. The following are the required and optional fields for configuring the Athena connection:

| Field | Option | Description | Type | Required? | Example |
| --- | --- | --- | --- | --- | --- |
| AWS region name | region\_name | AWS region of your Athena instance | String | Required | eu-west-1 |
| Database (catalog) | database | Specify the database (Data catalog) to build models into (lowercase only) | String | Required | awsdatacatalog |
| AWS S3 staging directory | s3\_staging\_dir | S3 location to store Athena query results and metadata | String | Required | s3://bucket/dbt/ |
| Athena workgroup | work\_group | Identifier of Athena workgroup | String | Optional | my-custom-workgroup |
| Athena Spark workgroup | spark\_work\_group | Identifier of Athena Spark workgroup for running Python models | String | Optional | my-spark-workgroup |
| AWS S3 data directory | s3\_data\_dir | Prefix for storing tables, if different from the connection's s3\_staging\_dir | String | Optional | s3://bucket2/dbt/ |
| AWS S3 data naming convention | s3\_data\_naming | How to generate table paths in s3\_data\_dir | String | Optional | schema\_table\_unique |
| AWS S3 temp tables prefix | s3\_tmp\_table\_dir | Prefix for storing temporary tables, if different from the connection's s3\_data\_dir | String | Optional | s3://bucket3/dbt/ |
| Poll interval | poll\_interval | Interval in seconds to use for polling the status of query results in Athena | Integer | Optional | 5 |
| Query retries | num\_retries | Number of times to retry a failing query | Integer | Optional | 3 |
| Boto3 retries | num\_boto3\_retries | Number of times to retry boto3 requests (for example, deleting S3 files for materialized tables) | Integer | Optional | 5 |
| Iceberg retries | num\_iceberg\_retries | Number of times to retry Iceberg commit queries to fix ICEBERG\_COMMIT\_ERROR | Integer | Optional | 0 |

##### Development credentials[​](#development-credentials "Direct link to Development credentials")

Enter your *development* (not deployment) credentials with the following fields:

| Field | Option | Description | Type | Required? | Example |
| --- | --- | --- | --- | --- | --- |
| AWS Access Key ID | aws\_access\_key\_id | Access key ID of the user performing requests | String | Required | AKIAIOSFODNN7EXAMPLE |
| AWS Secret Access Key | aws\_secret\_access\_key | Secret access key of the user performing requests | String | Required | wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY |
| Schema | schema | Specify the schema (Athena database) to build models into (lowercase only) | String | Required | dbt |
| Threads | threads | Number of concurrent threads dbt uses when building models | Integer | Optional | 3 |

---

##### Connect Apache Spark

If you're using Databricks, use `dbt-databricks`

If you're using Databricks, the `dbt-databricks` adapter is recommended over `dbt-spark`. If you're still using `dbt-spark` with Databricks, consider [migrating from the dbt-spark adapter to the dbt-databricks adapter](https://docs.getdbt.com/guides/migrate-from-spark-to-databricks.md).
For the Databricks version of this page, refer to [Databricks setup](#databricks-setup). note See [Connect Databricks](#connect-databricks) for the Databricks version of this page. dbt supports connecting to an Apache Spark cluster using the HTTP method or the Thrift method. Note: While the HTTP method can be used to connect to an all-purpose Databricks cluster, the ODBC method is recommended for all Databricks connections. For further details on configuring these connection parameters, please see the [dbt-spark documentation](https://github.com/dbt-labs/dbt-spark#configuring-your-profile). To learn how to optimize performance with data platform-specific configurations in dbt, refer to [Apache Spark-specific configuration](https://docs.getdbt.com/reference/resource-configs/spark-configs.md). The following fields are available when creating an Apache Spark connection using the HTTP and Thrift connection methods: | Field | Description | Examples | | --------------------- | --------------------------------------------------------------- | ----------------------- | | Host Name | The hostname of the Spark cluster to connect to | `yourorg.sparkhost.com` | | Port | The port to connect to Spark on | 443 | | Organization | Optional (default: 0) | 0123456789 | | Cluster | The ID of the cluster to connect to | 1234-567890-abc12345 | | Connection Timeout | Number of seconds after which to timeout a connection | 10 | | Connection Retries | Number of times to attempt connecting to cluster before failing | 10 | | User | Optional | dbt\_cloud\_user | | Auth | Optional, supply if using Kerberos | `KERBEROS` | | Kerberos Service Name | Optional, supply if using Kerberos | `hive` | Search table... | | | | | | | ---------------- | - | - | - | - | | Loading table... | | | | | [![Configuring a Spark connection](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/spark-connection.png?v=2 "Configuring a Spark connection")](#)Configuring a Spark connection #### Was this page helpful? 
YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ##### Connect Azure Synapse Analytics #### Supported authentication methods[​](#supported-authentication-methods "Direct link to Supported authentication methods") The supported authentication methods are: * Microsoft Entra ID service principal * Active Directory password * SQL server authentication ##### Microsoft Entra ID service principal[​](#microsoft-entra-id-service-principal "Direct link to Microsoft Entra ID service principal") The following are the required fields for setting up a connection with Azure Synapse Analytics using Microsoft Entra ID service principal authentication. | Field | Description | | ------------------ | ------------------------------------------------------------------------------------------------------------------------------- | | **Server** | The service principal's **Synapse host name** value (without the trailing string `, 1433`) for the Synapse test endpoint. | | **Port** | The port to connect to Azure Synapse Analytics. You can use `1433` (the default), which is the standard SQL server port number. | | **Database** | The service principal's **database** value for the Synapse test endpoint. | | **Authentication** | Choose **Service Principal** from the dropdown. | | **Tenant ID** | The service principal's **Directory (tenant) ID**. | | **Client ID** | The service principal's **application (client) ID id**. | | **Client secret** | The service principal's **client secret** (not the **client secret id**). | Search table... | | | | | | | ---------------- | - | - | - | - | | Loading table... 
##### Active Directory password[​](#active-directory-password "Direct link to Active Directory password")

The following are the required fields for setting up a connection with Azure Synapse Analytics using Active Directory password authentication.

| Field | Description |
| --- | --- |
| **Server** | The server hostname to connect to Azure Synapse Analytics. |
| **Port** | The server port. You can use `1433` (the default), which is the standard SQL server port number. |
| **Database** | The database name. |
| **Authentication** | Choose **Active Directory Password** from the dropdown. |
| **User** | The AD username. |
| **Password** | The AD username's password. |

##### SQL server authentication[​](#sql-server-authentication "Direct link to SQL server authentication")

The following are the required fields for setting up a connection with Azure Synapse Analytics using SQL server authentication.

| Field | Description |
| --- | --- |
| **Server** | The server hostname or IP to connect to Azure Synapse Analytics. |
| **Port** | The server port. You can use `1433` (the default), which is the standard SQL server port number. |
| **Database** | The database name. |
| **Authentication** | Choose **SQL** from the dropdown. |
| **User** | The username. |
| **Password** | The username's password. |
#### Configuration[​](#configuration "Direct link to Configuration")

To learn how to optimize performance with data platform-specific configurations in dbt, refer to [Microsoft Azure Synapse DWH configurations](https://docs.getdbt.com/reference/resource-configs/azuresynapse-configs.md).

---

##### Connect BigQuery

Fusion compatible

#### Required permissions[​](#required-permissions "Direct link to Required permissions")

dbt user accounts need the following permissions to read from and create tables and views in a BigQuery project:

* BigQuery Data Editor
* BigQuery User

For BigQuery with the dbt Fusion engine, users also need:

* BigQuery Read Session User (for Storage Read API access)

For BigQuery DataFrames, users need these additional permissions:

* BigQuery Job User
* BigQuery Read Session User
* Notebook Runtime User
* Code Creator
* colabEnterpriseUser

#### Authentication[​](#authentication "Direct link to Authentication")

dbt supports different authentication methods depending on your environment and plan type:

* Development environments support:
  * Service JSON
  * BigQuery OAuth [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")
* Deployment environments support:
  * Service JSON
  * BigQuery Workload Identity Federation (WIF) [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

These authentication methods are set up in the [global connections account settings](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md), rather than in single sign-on or integration settings.
When you create a new BigQuery connection, you will be presented with two schema options for the connection (both use the same adapter):

* **BigQuery:** Supports all connection types (use this option)
* **BigQuery (Legacy):** Supports all connection types except for WIF (deprecated; do not use)

All new connections should use the **BigQuery** option, as **BigQuery (Legacy)** will be deprecated. To update existing connections and credentials in an environment to use the new BigQuery option, first use the [APIs](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) to remove the existing configurations.

##### JSON keyfile[​](#json-keyfile "Direct link to JSON keyfile")

**Uploading a service account JSON keyfile**

While the fields in a BigQuery connection can be entered manually, we recommend uploading a service account JSON keyfile to quickly and accurately configure a connection to BigQuery.

You can provide the JSON keyfile in one of two formats:

* JSON keyfile upload — Upload the keyfile directly using its normal JSON format.
* Base64-encoded string — Provide the keyfile as a base64-encoded string. When you provide a base64-encoded string, dbt decodes it automatically and populates the necessary fields.

The JSON keyfile option is available for configuring both **development** and **deployment** environments. Uploading a valid JSON keyfile will populate the following fields:

* Project ID
* Private key ID
* Private key
* Client email
* Client ID
* Auth URI
* Token URI
* Auth provider x509 cert url
* Client x509 cert url

In addition to these fields, two other optional fields can be configured in a BigQuery connection:

| Field | Description | Examples |
| --- | --- | --- |
| Timeout | Deprecated; exists for backwards compatibility with older versions of dbt and will be removed in the future. | `300` |
| Location | The [location](https://cloud.google.com/bigquery/docs/locations) where dbt should create datasets. | `US`, `EU` |

[![A valid BigQuery connection](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/bigquery-connection.png?v=2 "A valid BigQuery connection")](#)A valid BigQuery connection

##### BigQuery OAuth[​](#bigquery-oauth "Direct link to BigQuery OAuth")

**Available in:** Development environments, Enterprise-tier plans only

The OAuth auth method permits dbt to run queries on behalf of a BigQuery user or workload without storing the BigQuery service account keyfile in dbt. However, the JSON must still be provided, or the fields must be manually filled out, to complete the configuration in dbt Cloud. Those values do not have to be real for this bypass to work (for example, they can be `N/A`).

For more information on the initial configuration of a BigQuery OAuth connection in dbt, see [the docs on setting up BigQuery OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-bigquery-oauth.md).

As an end user, if your organization has set up BigQuery OAuth, you can link a project with your personal BigQuery account in your Profile in dbt.

##### BigQuery Workload Identity Federation [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles")[​](#bigquery-workload-identity-federation- "Direct link to bigquery-workload-identity-federation-")

note

If you're using BigQuery WIF, we recommend using it with BigQuery OAuth. Otherwise, you must create two connections: one with a service account JSON for development environments and one with WIF for deployment.
**Available in:** Deployment environments

The BigQuery WIF auth method permits dbt to run deployment queries as a service account without configuring a BigQuery service account keyfile in dbt. For more information on the initial configuration of a BigQuery WIF connection in dbt, refer to [Set up BigQuery Workload Identity Federation](https://docs.getdbt.com/docs/cloud/manage-access/set-up-bigquery-oauth.md#set-up-bigquery-workload-identity-federation).

#### Configuration[​](#configuration "Direct link to Configuration")

To learn how to optimize performance with data platform-specific configurations in dbt, refer to [BigQuery-specific configuration](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md).

##### Optional configurations[​](#optional-configurations "Direct link to Optional configurations")

In BigQuery, optional configurations let you tailor settings for tasks such as query priority, dataset location, job timeout, and more. These options give you greater control over how BigQuery functions behind the scenes to meet your requirements.

To customize your optional configurations in dbt:

1. Click your account name in the bottom left-hand menu and go to **Account settings** > **Projects**.
2. Select your BigQuery project.
3. Go to **Development connection** and select **BigQuery**.
4. Click **Edit** and then scroll down to **Optional settings**.
[![BigQuery optional configuration](/img/bigquery/bigquery-optional-config.png?v=2 "BigQuery optional configuration")](#)BigQuery optional configuration

The following are the optional configurations you can set in dbt:

| Configuration | Information | Type | Example |
| --- | --- | --- | --- |
| [Priority](#priority) | Sets the priority for BigQuery jobs (either `interactive` or queued for `batch` processing) | String | `batch` or `interactive` |
| [Retries](#retries) | Specifies the number of retries for failed jobs due to temporary issues | Integer | `3` |
| [Location](#location) | Location for creating new datasets | String | `US`, `EU`, `us-west2` |
| [Maximum bytes billed](#maximum-bytes-billed) | Limits the maximum number of bytes that can be billed for a query | Integer | `1000000000` |
| [Execution project](#execution-project) | Specifies the project ID to bill for query execution | String | `my-project-id` |
| [Impersonate service account](#impersonate-service-account) | Allows users authenticated locally to access BigQuery resources under a specified service account | String | `service-account@project.iam.gserviceaccount.com` |
| [Job retry deadline seconds](#job-retry-deadline-seconds) | Sets the total number of seconds BigQuery will attempt to retry a job if it fails | Integer | `600` |
| [Job creation timeout seconds](#job-creation-timeout-seconds) | Specifies the maximum timeout for the job creation step | Integer | `120` |
| [Google cloud storage-bucket](#google-cloud-storage-bucket) | Location for storing objects in Google Cloud Storage | String | `my-bucket` |
| [Dataproc region](#dataproc-region) | Specifies the cloud region for running data processing jobs | String | `US`, `EU`, `asia-northeast1` |
| [Dataproc cluster name](#dataproc-cluster-name) | Assigns a unique identifier to a group of virtual machines in Dataproc | String | `my-cluster` |
| [Notebook Template ID](#notebook-template-id) | Unique identifier of a Colab Enterprise notebook runtime | Integer | `7018811640745295872` |
| [Compute Region](#compute-region) | The compute region for BigQuery DataFrames; must match the location of your BigQuery dataset | String | `US`, `EU`, `asia-northeast1` |

**Priority**

The `priority` for the BigQuery jobs that dbt executes can be configured with the `priority` configuration in your BigQuery profile. The priority field can be set to one of `batch` or `interactive`. For more information on query priority, consult the [BigQuery documentation](https://cloud.google.com/bigquery/docs/running-queries).

**Retries**

Retries in BigQuery help to ensure that jobs complete successfully by trying again after temporary failures, making your operations more robust and reliable.

**Location**

The `location` of BigQuery datasets can be set using the `location` setting in a BigQuery profile. As per the [BigQuery documentation](https://cloud.google.com/bigquery/docs/locations), `location` may be either a multi-regional location (for example, `EU`, `US`) or a regional location (like `us-west2`).

**Maximum bytes billed**

Configuring a `maximum_bytes_billed` value for a BigQuery profile lets you limit how much data your query can process. It’s a safeguard to prevent your query from accidentally processing more data than you expect, which could lead to higher costs. Queries executed by dbt will fail if they exceed the configured maximum bytes threshold. This configuration should be supplied as an integer number of bytes. For example, if your `maximum_bytes_billed` is 1000000000, you would enter that value in the `maximum_bytes_billed` field in dbt.

**Execution project**

By default, dbt will use the specified `project`/`database` as both:

1. The location to materialize resources (models, seeds, snapshots, and so on), unless they specify a custom project/database config
2. The GCP project that receives the bill for query costs or slot usage

Optionally, you may specify an execution project to bill for query execution, instead of the project/database where you materialize most resources.

**Impersonate service account**

This feature allows users authenticating using local OAuth to access BigQuery resources based on the permissions of a service account. For a general overview of this process, see the official docs for [Creating Short-lived Service Account Credentials](https://cloud.google.com/iam/docs/create-short-lived-credentials-direct).

**Job retry deadline seconds**

Job retry deadline seconds is the maximum amount of time BigQuery will spend retrying a job before it gives up.

**Job creation timeout seconds**

Job creation timeout seconds is the maximum time BigQuery will wait to start the job. If the job doesn’t start within that time, it times out. From dbt Core v1.10, the BigQuery adapter cancels BigQuery jobs that exceed their configured timeout by sending a cancellation request. If the request succeeds, dbt stops the job. If the request fails, the BigQuery job may keep running in the background until it finishes or you cancel it manually.

###### Run dbt python models on Google Cloud Platform[​](#run-dbt-python-models-on-google-cloud-platform "Direct link to Run dbt python models on Google Cloud Platform")

To run dbt Python models on GCP, dbt uses companion services, Dataproc and Cloud Storage, that offer tight integrations with BigQuery. You may use an existing Dataproc cluster and Cloud Storage bucket, or create new ones:

**Google cloud storage bucket**

Everything you store in Cloud Storage must be placed inside a [bucket](https://cloud.google.com/storage/docs/buckets). Buckets help you organize your data and manage access to it.
**Dataproc region**

A designated location in the cloud where you can run your data processing jobs efficiently. This region must match the location of your BigQuery dataset if you want to use Dataproc with BigQuery, to ensure data doesn't move across regions, which can be inefficient and costly. For more information on [Dataproc regions](https://cloud.google.com/bigquery/docs/locations), refer to the BigQuery documentation.

**Dataproc cluster name**

A unique label you give to your group of virtual machines to help you identify and manage your data processing tasks in the cloud. When you integrate Dataproc with BigQuery, you need to provide the cluster name so BigQuery knows which specific set of resources (the cluster) to use for running the data jobs. Have a look at Dataproc's document on [Create a cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster) for an overview of how clusters work.

**Notebook Template ID**

The unique identifier associated with a specific Colab notebook, which acts as the Python runtime for BigQuery DataFrames.

**Compute Region**

If you want to use BigQuery DataFrames, this region must match the location of your BigQuery dataset; ensure the Colab runtime is also within the same region.

##### Account level connections and credential management[​](#account-level-connections-and-credential-management "Direct link to Account level connections and credential management")

You can re-use connections across multiple projects with [global connections](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md#migration-from-project-level-connections-to-account-level-connections). Connections are attached at the environment level (formerly project level), so you can use multiple connections inside of a single project (to handle dev, staging, production, and more).

BigQuery connections in dbt currently expect the credentials to be handled at the connection level (and only BigQuery connections).
This was originally designed to facilitate creating a new connection by uploading a service account keyfile. This section describes how to override credentials at the environment level, via [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes), *to allow project administrators to manage credentials independently* of the account level connection details used for that environment.

For a project, you will first create an environment variable to store the secret `private_key` value. Then, you will use extended attributes to override the entire service account JSON (you can't override only the secret key, due to a constraint of extended attributes).

1. **New environment variable**

   * Create a new *secret* [environment variable](https://docs.getdbt.com/docs/build/environment-variables.md#handling-secrets) to handle the private key: `DBT_ENV_SECRET_PROJECTXXX_PRIVATE_KEY`
   * Fill in the private key value according to the environment

   To automate your deployment, use the following [admin API request](https://docs.getdbt.com/dbt-cloud/api-v3#/operations/Create%20Projects%20Environment%20Variables%20Bulk), where `XXXXX` is your account number, `YYYYY` is your project number, and `ZZZZZ` is your [API token](https://docs.getdbt.com/docs/dbt-cloud-apis/authentication.md):

   ```shell
   curl --request POST \
     --url https://cloud.getdbt.com/api/v3/accounts/XXXXX/projects/YYYYY/environment-variables/bulk/ \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer ZZZZZ' \
     --header 'Content-Type: application/json' \
     --data '{
       "env_var": [
         {
           "new_name": "DBT_ENV_SECRET_PROJECTXXX_PRIVATE_KEY",
           "project": "Value by default for the entire project",
           "ENVIRONMENT_NAME_1": "Optional, if wanted, value for environment name 1",
           "ENVIRONMENT_NAME_2": "Optional, if wanted, value for environment name 2"
         }
       ]
     }'
   ```

2. **Extended attributes**

   In the environment details, complete the [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) block with the following payload (replacing `XXX` with your corresponding information):

   ```yaml
   keyfile_json:
     type: service_account
     project_id: xxx
     private_key_id: xxx
     private_key: '{{ env_var(''DBT_ENV_SECRET_PROJECTXXX_PRIVATE_KEY'') }}'
     client_email: xxx
     client_id: xxx
     auth_uri: xxx
     token_uri: xxx
     auth_provider_x509_cert_url: xxx
     client_x509_cert_url: xxx
   ```

   If you require [other fields](https://docs.getdbt.com/docs/local/connect-data-platform/bigquery-setup.md#service-account-json) to be overridden at the environment level via extended attributes, please respect the [expected indentation](https://docs.getdbt.com/docs/dbt-cloud-environments.md#only-the-top-level-keys-are-accepted-in-extended-attributes) (ordering doesn't matter):

   ```yaml
   priority: interactive
   keyfile_json:
     type: xxx
     project_id: xxx
     private_key_id: xxx
     private_key: '{{ env_var(''DBT_ENV_SECRET_PROJECTXXX_PRIVATE_KEY'') }}'
     client_email: xxx
     client_id: xxx
     auth_uri: xxx
     token_uri: xxx
     auth_provider_x509_cert_url: xxx
     client_x509_cert_url: xxx
   execution_project: buck-stops-here-456
   ```

To automate your deployment, you first need to [create the extended attributes payload](https://docs.getdbt.com/dbt-cloud/api-v3#/operations/Create%20Extended%20Attributes) for a given project, and then [assign it](https://docs.getdbt.com/dbt-cloud/api-v3#/operations/Update%20Environment) to a specific environment.
Where `XXXXX` is your account number, `YYYYY` is your project number, and `ZZZZZ` is your [API token](https://docs.getdbt.com/docs/dbt-cloud-apis/authentication.md):

```shell
curl --request POST \
  --url https://cloud.getdbt.com/api/v3/accounts/XXXXX/projects/YYYYY/extended-attributes/ \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer ZZZZZ' \
  --header 'Content-Type: application/json' \
  --data '{
    "id": null,
    "extended_attributes": {"type":"service_account","project_id":"xxx","private_key_id":"xxx","private_key":"{{ env_var('\''DBT_ENV_SECRET_PROJECTXXX_PRIVATE_KEY'\'') }}","client_email":"xxx","client_id":"xxx","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"xxx"},
    "state": 1
  }'
```

*Make a note of the `id` returned in the message.* It will be used in the following call. Where `EEEEE` is the environment ID and `FFFFF` is the extended attributes ID:

```shell
curl --request POST \
  --url https://cloud.getdbt.com/api/v3/accounts/XXXXX/projects/YYYYY/environments/EEEEE/ \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer ZZZZZ' \
  --header 'Content-Type: application/json' \
  --data '{
    "extended_attributes_id": FFFFF
  }'
```

---

##### Connect Databricks

Fusion compatible

The dbt-databricks adapter is maintained by the Databricks team. The Databricks team is committed to supporting and improving the adapter over time, so you can be sure the integrated experience will provide the best of dbt and the best of Databricks.
Connecting to Databricks via dbt-spark has been deprecated.

#### About the dbt-databricks adapter[​](#about-the-dbt-databricks-adapter "Direct link to About the dbt-databricks adapter")

dbt-databricks is compatible with the following versions of dbt Core in dbt, with varying degrees of functionality.

| Feature | dbt Versions |
| --- | --- |
| dbt-databricks | Available starting with dbt 1.0 |
| Unity Catalog | Available starting with dbt 1.1 |
| Python models | Available starting with dbt 1.3 |

The dbt-databricks adapter offers:

* **Easier setup**
* **Better defaults:** The dbt-databricks adapter is more opinionated, guiding users to an improved experience with less effort. Design choices of this adapter include defaulting to Delta format, using merge for incremental models, and running expensive queries with Photon.
* **Support for Unity Catalog:** Unity Catalog allows Databricks users to centrally manage all data assets, simplifying access management and improving search and query performance. Databricks users can now get three-part data hierarchies (catalog, schema, model name), which solves a longstanding friction point in data organization and governance.

To learn how to optimize performance with data platform-specific configurations in dbt, refer to [Databricks-specific configuration](https://docs.getdbt.com/reference/resource-configs/databricks-configs.md).

To grant users or roles database permissions (access rights and privileges), refer to the [example permissions](https://docs.getdbt.com/reference/database-permissions/databricks-permissions.md) page.
To set up the Databricks connection, supply the following fields:

| Field | Description | Examples |
| --- | --- | --- |
| Server Hostname | The hostname of the Databricks account to connect to | dbc-a2c61234-1234.cloud.databricks.com |
| HTTP Path | The HTTP path of the Databricks cluster or SQL warehouse | /sql/1.0/warehouses/1a23b4596cd7e8fg |
| Catalog | Name of the Databricks catalog (optional) | Production |

[![Configuring a Databricks connection using the dbt-databricks adapter](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/dbt-databricks.png?v=2 "Configuring a Databricks connection using the dbt-databricks adapter")](#)Configuring a Databricks connection using the dbt-databricks adapter

---

##### Connect Microsoft Fabric

#### Supported authentication methods[​](#supported-authentication-methods "Direct link to Supported authentication methods")

The supported authentication methods are:

* Microsoft Entra service principal
* Microsoft Entra password

SQL password (LDAP) is not supported in Microsoft Fabric Data Warehouse, so you must use Microsoft Entra ID. This means that to use [Microsoft Fabric](https://www.microsoft.com/en-us/microsoft-fabric) in dbt, you will need at least one Microsoft Entra service principal to connect dbt to Fabric, ideally one service principal for each user.
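Outside the UI, the same service principal fields appear in a dbt-fabric `profiles.yml` target. This is a hedged sketch, not an authoritative reference: the server, database, and schema values are illustrative, and the field names assume the dbt-fabric adapter's profile format:

```yaml
# Hypothetical profiles.yml target for the dbt-fabric adapter.
# Server, database, schema, and the <...> placeholders are illustrative.
fabric_profile:
  target: dev
  outputs:
    dev:
      type: fabric
      driver: 'ODBC Driver 18 for SQL Server'
      server: yourorg.datawarehouse.fabric.microsoft.com
      port: 1433
      database: analytics
      schema: dbt
      authentication: ServicePrincipal
      tenant_id: <tenant-id>
      client_id: <client-id>
      client_secret: <client-secret>
```

Check the dbt-fabric adapter documentation for the exact accepted values of the `authentication` field before relying on this shape.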
##### Microsoft Entra service principal[​](#microsoft-entra-service-principal "Direct link to Microsoft Entra service principal")

The following are the required fields for setting up a connection with Microsoft Fabric using Microsoft Entra service principal authentication.

| Field | Description |
| --- | --- |
| **Server** | The service principal's **host** value for the Fabric test endpoint. |
| **Port** | The port to connect to Microsoft Fabric. You can use `1433` (the default), which is the standard SQL server port number. |
| **Database** | The service principal's **database** value for the Fabric test endpoint. |
| **Authentication** | Choose **Service Principal** from the dropdown. |
| **Tenant ID** | The service principal's **Directory (tenant) ID**. |
| **Client ID** | The service principal's **application (client) ID**. |
| **Client secret** | The service principal's **client secret** (not the **client secret ID**). |

##### Microsoft Entra password[​](#microsoft-entra-password "Direct link to Microsoft Entra password")

The following are the required fields for setting up a connection with Microsoft Fabric using Microsoft Entra password authentication.

| Field | Description |
| --- | --- |
| **Server** | The server hostname to connect to Microsoft Fabric. |
| **Port** | The server port. You can use `1433` (the default), which is the standard SQL server port number. |
| **Database** | The database name. |
| **Authentication** | Choose **Active Directory Password** from the dropdown. |
| **User** | The Microsoft Entra username. |
| **Password** | The Microsoft Entra password. |
#### Configuration[​](#configuration "Direct link to Configuration")

To learn how to optimize performance with data platform-specific configurations in dbt, refer to [Microsoft Fabric Data Warehouse configurations](https://docs.getdbt.com/reference/resource-configs/fabric-configs.md).

---

##### Connect Onehouse

dbt supports connecting to [Onehouse SQL](https://www.onehouse.ai/product/quanton) using the Apache Spark Connector with the Thrift method.

note

Connect to a Onehouse SQL Cluster with the [dbt-spark](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-apache-spark.md) adapter.

#### Requirements[​](#requirements "Direct link to Requirements")

* For dbt, ensure your Onehouse SQL endpoint is accessible via external DNS/IP, whitelisting dbt IPs.

#### What works[​](#what-works "Direct link to What works")

* All dbt commands, including: `dbt clean`, `dbt compile`, `dbt debug`, `dbt seed`, and `dbt run`.
* dbt materializations: `table` and `incremental`
* Apache Hudi table types of Merge on Read (MoR) and Copy on Write (CoW). It is recommended to use MoR for mutable workloads.

#### Limitations[​](#limitations "Direct link to Limitations")

* Views are not supported
* `dbt seed` has row / record limits.
* `dbt seed` only supports Copy on Write tables.
#### dbt connection[​](#dbt-connection "Direct link to dbt connection")

Fill in the following fields when creating an **Apache Spark** warehouse connection using the Thrift connection method:

| Field | Description | Examples |
| --- | --- | --- |
| Method | The method for connecting to Spark | Thrift |
| Hostname | The hostname of your Onehouse SQL Cluster endpoint | `yourProject.sparkHost.com` |
| Port | The port to connect to Spark on | 10000 |
| Cluster | Onehouse does not use this field | |
| Connection Timeout | Number of seconds after which to time out a connection | 10 |
| Connection Retries | Number of times to attempt connecting to the cluster before failing | 0 |
| Organization | Onehouse does not use this field | |
| User | Optional. Not enabled by default. | dbt_cloud_user |
| Auth | Optional; supply if using Kerberos. Not enabled by default. | `KERBEROS` |
| Kerberos Service Name | Optional; supply if using Kerberos. Not enabled by default. | `hive` |
[![Onehouse configuration](/img/onehouse/onehouse-dbt.png?v=2 "Onehouse configuration")](#)Onehouse configuration

#### dbt project[​](#dbt-project "Direct link to dbt project")

We recommend that you set default configurations in the `dbt_project.yml` file to ensure that the adapter executes Onehouse-compatible SQL:

| Field | Description | Required | Default | Recommended |
| --- | --- | --- | --- | --- |
| materialized | The materialization the project/directory will default to | Yes | Without input, `view` | `table` |
| file_format | The table format the project will default to | Yes | N/A | hudi |
| location_root | Location of the database in DFS | Yes | N/A | `` |
| hoodie.table.type | Merge on Read or Copy on Write | No | cow | mor |

dbt_project.yml template:

```yml
+materialized: table | incremental
+file_format: hudi
+location_root:
+tblproperties:
  hoodie.table.type: mor | cow
```

A dbt_project.yml example, if using jaffle shop, would be:

```yml
models:
  jaffle_shop:
    +file_format: hudi
    +location_root: s3://lakehouse/demolake/dbt_ecomm/
    +tblproperties:
      hoodie.table.type: mor
    staging:
      +materialized: incremental
    marts:
      +materialized: table
```

---

##### Connect PostgreSQL, Lakebase and AlloyDB

dbt platform supports connecting to PostgreSQL and Postgres-compatible databases (AlloyDB, Lakebase).
The following fields are required when creating a connection:

| Field | Description | Examples |
| --- | --- | --- |
| Host Name | The hostname of the database to connect to. This can be either a hostname or an IP address. Refer to the [set up pages](https://docs.getdbt.com/docs/local/connect-data-platform/about-dbt-connections.md) for adapter-specific details. | Postgres: `xxx.us-east-1.amazonaws.com` |
| Port | Usually 5432 | `5432` |
| Database | The logical database to connect to and run queries against. | `analytics` |

**Note**: When you set up a Postgres connection in dbt, SSL-related parameters aren't available as inputs.

[![Configuring a Postgres connection](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/postgres-redshift-connection.png?v=2 "Configuring a Postgres connection")](#)Configuring a Postgres connection

##### Authentication Parameters[​](#authentication-parameters "Direct link to Authentication Parameters")

For authentication, dbt users can use **Database username and password** for Postgres and Postgres-compatible databases. For more information on what is supported, check the database-specific setup page for limitations and helpful tips.

The following table contains the parameters for the database (password-based) connection method:

| Field | Description | Examples |
| --- | --- | --- |
| `user` | Account username to log into your cluster | myuser |
| `password` | Password for authentication | \_password1! |
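For reference, these same fields map onto a dbt Core `profiles.yml` entry. A minimal sketch, assuming the `dbt-postgres` adapter; the profile name, `schema`, and environment variable name are placeholders:

```yml
my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: xxx.us-east-1.amazonaws.com   # Host Name
      port: 5432                          # Port
      dbname: analytics                   # Database
      schema: dbt_myuser                  # placeholder development schema
      user: myuser
      # Read the password from an environment variable rather than
      # committing it in plain text (placeholder variable name):
      password: "{{ env_var('DBT_ENV_SECRET_PG_PASSWORD') }}"
      threads: 4
```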
##### Connecting using an SSH Tunnel[​](#connecting-using-an-ssh-tunnel "Direct link to Connecting using an SSH Tunnel")

Use an SSH tunnel when your Postgres or AlloyDB instance is not publicly accessible and must be reached through a [bastion server](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-redshift.md#about-the-bastion-server-in-aws). When enabled, dbt platform connects to your database by first establishing a secure connection to the bastion host, which then forwards traffic to your database.

To configure a connection using an SSH tunnel:

1. Navigate to **Account settings** (by clicking on your account name in the left side menu) and select **Connections**.
2. Select an existing connection to edit it, or click **+ New connection**.
3. In **Connection settings**, ensure **SSH Tunnel Enabled** is checked.
4. Enter the hostname, username, and port for the bastion server.

[![A public key is generated after saving](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/postgres-redshift-ssh-tunnel.png?v=2 "A public key is generated after saving")](#)A public key is generated after saving

5. Click **Save**. dbt platform generates and displays a public key.
6. Copy the newly generated public key to the bastion server and add it to the server’s `authorized_keys` file to authorize dbt platform to connect through the bastion host. If the new key is not added, the SSH tunnel connection will fail.

important

Each time you create and save a new SSH tunnel connection, dbt platform generates a unique SSH key pair, even when the connection details are identical to an existing connection.

###### About the Bastion server in AWS[​](#about-the-bastion-server-in-aws "Direct link to About the Bastion server in AWS")

What is a bastion server?

A bastion server in [Amazon Web Services (AWS)](https://aws.amazon.com/blogs/security/how-to-record-ssh-sessions-established-through-a-bastion-host/) is a host that allows dbt to open an SSH connection.
dbt only sends queries and doesn't transmit large data volumes, so the bastion server can run on an AWS instance of any size, such as a t2.small or t2.micro.

Make sure the instance is in the same Virtual Private Cloud (VPC) as the Postgres instance, and configure the security group for the bastion server to ensure that it's able to connect to the warehouse port.

###### Configuring the Bastion Server in AWS[​](#configuring-the-bastion-server-in-aws "Direct link to Configuring the Bastion Server in AWS")

To configure the SSH tunnel in dbt, you'll need to provide the hostname/IP of your bastion server, a username, and a port of your choosing for dbt to connect to. Review the following steps:

1. Verify the bastion server has its network security rules set up to accept connections from the [dbt IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) on whatever port you configured.
2. Set up the user account using the bastion server instance's CLI. The following example uses the username `dbtcloud`:

```shell
sudo groupadd dbtcloud
sudo useradd -m -g dbtcloud dbtcloud
sudo su - dbtcloud
mkdir ~/.ssh
chmod 700 ~/.ssh
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

3. Copy and paste the dbt-generated public key into the `authorized_keys` file.

The bastion server should now be ready for dbt to use as a tunnel into the Postgres environment.

#### Configuration[​](#configuration "Direct link to Configuration")

To grant users or roles database permissions (access rights and privileges), refer to the [Postgres permissions](https://docs.getdbt.com/reference/database-permissions/postgres-permissions.md) page.

#### FAQs[​](#faqs "Direct link to FAQs")

Database Error - could not connect to server: Connection timed out

When setting up a database connection using an SSH tunnel, you need the following components:

* A load balancer (like ELB or NLB) to manage traffic.
* A bastion host (or jump server) that runs the SSH protocol, acting as a secure entry point.
* The database itself (such as a Postgres cluster).
dbt uses an SSH tunnel to connect through the load balancer to the database. This connection is established at the start of any dbt job run. If the tunnel connection drops, the job fails.

Tunnel failures usually happen because:

* The SSH daemon times out if it's idle for too long.
* The load balancer cuts off the connection if it's idle.
* dbt tries to keep the connection alive by checking in every 30 seconds, and the system will end the connection if there's no response from the SSH service after 300 seconds. This helps avoid drops due to inactivity unless the load balancer's timeout is less than 30 seconds.

Bastion hosts might have additional SSH settings that disconnect inactive clients after several checks without a response. By default, it checks three times. To prevent premature disconnections, you can adjust the settings on the bastion host:

* `ClientAliveCountMax` — Configures the number of checks before deciding the client is inactive. For example, `ClientAliveCountMax 10` checks 10 times.
* `ClientAliveInterval` — Configures when to check for client activity. For example, `ClientAliveInterval 30` checks every 30 seconds.

The example adjustments ensure that inactive SSH clients are disconnected after about 300 seconds, reducing the chance of tunnel failures.

---

##### Connect Redshift

Fusion compatible

dbt platform supports connecting to Redshift.
The following fields are required when creating a connection:

| Field | Description | Examples |
| --- | --- | --- |
| Host Name | The hostname of the database to connect to. This can be either a hostname or an IP address. Refer to the [set up pages](https://docs.getdbt.com/docs/local/connect-data-platform/about-dbt-connections.md) to find the hostname for your adapter. | Redshift: `hostname.region.redshift.amazonaws.com` |
| Port | Usually 5439 (Redshift) | `5439` |
| Database | The logical database to connect to and run queries against. | `analytics` |

**Note**: When you set up a Redshift connection in dbt, SSL-related parameters aren't available as inputs.

[![Configuring a Redshift connection](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/postgres-redshift-connection.png?v=2 "Configuring a Redshift connection")](#)Configuring a Redshift connection

##### Authentication Parameters[​](#authentication-parameters "Direct link to Authentication Parameters")

See the following supported authentication methods for Redshift:

* Username and password
* SSH tunneling
* Identity Center via [external OAuth](https://docs.getdbt.com/docs/cloud/manage-access/redshift-external-oauth.md)
* IAM User authentication via [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes)

On the dbt platform, IAM User authentication is currently only supported via [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes).
Once the project is created, development and deployment environments can be updated to use extended attributes to pass the fields described below, as some are not supported via text box. You will need to create an IAM user, generate an [access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey), and then:

* On a provisioned cluster, a database user is expected in the `user` field. The IAM user is only leveraged for authentication; the database user is used for authorization.
* On Serverless, grant permission to the IAM user in Redshift. The `user` field is ignored (but still required).
* For both, the `password` field is ignored.

| Profile field | Example | Description |
| --- | --- | --- |
| `method` | IAM | Use IAM to authenticate via IAM User authentication |
| `cluster_id` | CLUSTER\_ID | Required for IAM authentication on a provisioned cluster only, not for Serverless |
| `user` | username | User querying the database, ignored for Serverless (but still required) |
| `region` | us-east-1 | Region of your Redshift instance |
| `access_key_id` | ACCESS\_KEY\_ID | IAM user [access key id](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey) |
| `secret_access_key` | SECRET\_ACCESS\_KEY | IAM user secret access key |
###### Example Extended Attributes for IAM User on Redshift Serverless[​](#example-extended-attributes-for-iam-user-on-redshift-serverless "Direct link to Example Extended Attributes for IAM User on Redshift Serverless")

To avoid pasting secrets in extended attributes, leverage [environment variables](https://docs.getdbt.com/docs/build/environment-variables.md#handling-secrets):

\~/.dbt/profiles.yml

```yaml
host: my-production-instance.myregion.redshift-serverless.amazonaws.com
method: iam
region: us-east-2
access_key_id: '{{ env_var(''DBT_ENV_ACCESS_KEY_ID'') }}'
secret_access_key: '{{ env_var(''DBT_ENV_SECRET_ACCESS_KEY'') }}'
```

Both `DBT_ENV_ACCESS_KEY_ID` and `DBT_ENV_SECRET_ACCESS_KEY` will need [to be assigned](https://docs.getdbt.com/docs/build/environment-variables.md) in every environment leveraging extended attributes this way.

##### Connecting using an SSH Tunnel[​](#connecting-using-an-ssh-tunnel "Direct link to Connecting using an SSH Tunnel")

Use an SSH tunnel when your Redshift instance is not publicly accessible and must be reached through a [bastion server](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-redshift.md#about-the-bastion-server-in-aws). When enabled, dbt platform connects to your database by first establishing a secure connection to the bastion host, which then forwards traffic to your database.

To configure a connection using an SSH tunnel:

1. Navigate to **Account settings** (by clicking on your account name in the left side menu) and select **Connections**.
2. Select an existing connection to edit it, or click **+ New connection**.
3. In **Connection settings**, ensure **SSH Tunnel Enabled** is checked.
4. Enter the hostname, username, and port for the bastion server.

[![A public key is generated after saving](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/postgres-redshift-ssh-tunnel.png?v=2 "A public key is generated after saving")](#)A public key is generated after saving

5. Click **Save**.
dbt platform generates and displays a public key.

6. Copy the newly generated public key to the bastion server and add it to the server’s `authorized_keys` file to authorize dbt platform to connect through the bastion host. If the new key is not added, the SSH tunnel connection will fail.

important

Each time you create and save a new SSH tunnel connection, dbt platform generates a unique SSH key pair, even when the connection details are identical to an existing connection.

###### About the Bastion server in AWS[​](#about-the-bastion-server-in-aws "Direct link to About the Bastion server in AWS")

What is a bastion server?

A bastion server in [Amazon Web Services (AWS)](https://aws.amazon.com/blogs/security/how-to-record-ssh-sessions-established-through-a-bastion-host/) is a host that allows dbt to open an SSH connection.
dbt only sends queries and doesn't transmit large data volumes, so the bastion server can run on an AWS instance of any size, such as a t2.small or t2.micro.

Make sure the instance is in the same Virtual Private Cloud (VPC) as the Redshift instance, and configure the security group for the bastion server to ensure that it's able to connect to the warehouse port.

###### Configuring the Bastion Server in AWS[​](#configuring-the-bastion-server-in-aws "Direct link to Configuring the Bastion Server in AWS")

To configure the SSH tunnel in dbt, you'll need to provide the hostname/IP of your bastion server, a username, and a port of your choosing for dbt to connect to. Review the following steps:

1. Verify the bastion server has its network security rules set up to accept connections from the [dbt IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) on whatever port you configured.
2. Set up the user account using the bastion server instance's CLI. The following example uses the username `dbtcloud`:

```shell
sudo groupadd dbtcloud
sudo useradd -m -g dbtcloud dbtcloud
sudo su - dbtcloud
mkdir ~/.ssh
chmod 700 ~/.ssh
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

3. Copy and paste the dbt-generated public key into the `authorized_keys` file.

The bastion server should now be ready for dbt to use as a tunnel into the Redshift environment.

#### Configuration[​](#configuration "Direct link to Configuration")

To optimize performance with data platform-specific configurations in dbt, refer to [Redshift-specific configuration](https://docs.getdbt.com/reference/resource-configs/redshift-configs.md).

To grant users or roles database permissions (access rights and privileges), refer to the [Redshift permissions](https://docs.getdbt.com/reference/database-permissions/redshift-permissions.md) page.

#### FAQs[​](#faqs "Direct link to FAQs")

Database Error - could not connect to server: Connection timed out

When setting up a database connection using an SSH tunnel, you need the following components:

* A load balancer (like ELB or NLB) to manage traffic.
* A bastion host (or jump server) that runs the SSH protocol, acting as a secure entry point.
* The database itself (such as a Redshift cluster).

dbt uses an SSH tunnel to connect through the load balancer to the database. This connection is established at the start of any dbt job run. If the tunnel connection drops, the job fails.

Tunnel failures usually happen because:

* The SSH daemon times out if it's idle for too long.
* The load balancer cuts off the connection if it's idle.
* dbt tries to keep the connection alive by checking in every 30 seconds, and the system will end the connection if there's no response from the SSH service after 300 seconds. This helps avoid drops due to inactivity unless the load balancer's timeout is less than 30 seconds.

Bastion hosts might have additional SSH settings that disconnect inactive clients after several checks without a response. By default, it checks three times. To prevent premature disconnections, you can adjust the settings on the bastion host:

* `ClientAliveCountMax` — Configures the number of checks before deciding the client is inactive. For example, `ClientAliveCountMax 10` checks 10 times.
* `ClientAliveInterval` — Configures when to check for client activity. For example, `ClientAliveInterval 30` checks every 30 seconds.

The example adjustments ensure that inactive SSH clients are disconnected after about 300 seconds, reducing the chance of tunnel failures.

---

##### Connect Snowflake

Fusion compatible

note

dbt connections and credentials inherit the permissions of the accounts configured.
You can customize roles and associated permissions in Snowflake to fit your company's requirements and fine-tune access to database objects in your account. Refer to [Snowflake permissions](https://docs.getdbt.com/reference/database-permissions/snowflake-permissions.md) for more information about customizing roles in Snowflake.

Snowflake column size change

[Snowflake plans to increase](https://docs.snowflake.com/en/release-notes/bcr-bundles/un-bundled/bcr-2118) the default column size for string and binary data types in May 2026. `dbt-snowflake` versions below v1.10.6 may fail to build certain incremental models when this change is deployed.

Assess impact and required actions

If you're using a `dbt-snowflake` version below v1.10.6 or have not yet migrated to a [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) in the dbt platform, your adapter version is incompatible with this change and may fail to build incremental models that meet *both* of the following conditions:

* Contain string columns with collation defined
* Use the `on_schema_change='sync_all_columns'` config

To check whether this change affects your project, run the following [list](https://docs.getdbt.com/reference/commands/list.md) command:

```bash
dbt ls -s config.materialized:incremental,config.on_schema_change:sync_all_columns --resource-type model
```

* If the command returns `No nodes selected!`, no action is required.
* If the command returns one or more models (for example, `Found 1000 models, 644 macros`), you may be impacted if those models have string columns that don't specify a width. In that case, upgrade to a version that includes the fix:
  * **dbt Core**: `dbt-snowflake` v1.10.6 or later. For upgrade instructions, see [Upgrade adapters](https://docs.getdbt.com/docs/local/install-dbt.md#upgrade-adapters).
  * **dbt platform**: Any release track (Latest, Compatible, Extended, or Fallback).
  * **dbt Fusion engine**: v2.0.0-preview.147 or higher.
This ensures your incremental models can safely handle schema changes while maintaining required collation settings.

The following fields are required when creating a Snowflake connection:

| Field | Description | Examples |
| --- | --- | --- |
| Account | The Snowflake account to connect to. Take a look [here](https://docs.getdbt.com/docs/local/connect-data-platform/snowflake-setup.md#account) to determine what the account field should look like based on your region. | ✅ `db5261993` or `db5261993.east-us-2.azure` <br/> ❌ `db5261993.eu-central-1.snowflakecomputing.com` |
| Role | A mandatory field indicating what role should be assumed after connecting to Snowflake | `transformer` |
| Database | The logical database to connect to and run queries against. | `analytics` |
| Warehouse | The virtual warehouse to use for running queries. | `transforming` |

#### Authentication methods[​](#authentication-methods "Direct link to Authentication methods")

This section describes the different authentication methods for connecting dbt to Snowflake. Configure deployment environment (Production, Staging, General) credentials globally in the [**Connections**](https://docs.getdbt.com/docs/deploy/deploy-environments.md#deployment-connection) area of **Account settings**. Individual users configure their development credentials in the [**Credentials**](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md#get-started-with-the-cloud-ide) area of their user profile.

##### Username and password with MFA[​](#username-and-password-with-mfa "Direct link to Username and password with MFA")

Snowflake authentication

Starting November 2025, Snowflake will phase out single-factor password authentication, and multi-factor authentication (MFA) will be enforced. MFA will be required for all `Username / Password` authentication. To continue using key pair authentication, users should update any deployment environments currently using `Username / Password` by November 2025. Refer to [Snowflake's blog post](https://www.snowflake.com/en/blog/blocking-single-factor-password-authentification/) for more information.

Snowflake MFA plan availability

Snowflake's MFA is available on all [plan types](https://www.getdbt.com/pricing).

**Available in:** Development environments

The `Username / Password` auth method is the simplest way to authenticate Development credentials in a dbt project.
Simply enter your Snowflake username (specifically, the `login_name`) and the corresponding user's Snowflake `password` to authenticate dbt to run queries against Snowflake on behalf of a Snowflake user.

`Username / Password` authentication is not supported for deployment credentials because MFA is required. In deployment environments, use [keypair](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-snowflake.md#key-pair) authentication instead.

**Note**: The **Schema** field in the **Developer Credentials** section is required.

[![Snowflake username/password authentication](/img/docs/dbt-cloud/snowflake-userpass-auth.png?v=2 "Snowflake username/password authentication")](#)Snowflake username/password authentication

**Prerequisites:**

* A development environment in a dbt project
* The Duo authentication app
* Admin access to Snowflake (if MFA settings haven't already been applied to the account)
* [Admin (write) access](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) to dbt environments

[MFA](https://docs.snowflake.com/en/user-guide/security-mfa) is required by Snowflake for all `Username / Password` logins. Snowflake's MFA support is powered by the Duo Security service.

* In dbt, set the following [extended attribute](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) in the development environment **General settings** page, under the **Extended attributes** section:

```yaml
authenticator: username_password_mfa
```

* To reduce the number of user prompts when connecting to Snowflake with MFA, [enable token caching](https://docs.snowflake.com/en/user-guide/security-mfa#using-mfa-token-caching-to-minimize-the-number-of-prompts-during-authentication-optional) in Snowflake.
* Optionally, if users miss prompts and their Snowflake accounts get locked, you can prevent automatic retries by adding the following in the same **Extended attributes** section:

```yaml
connect_retries: 0
```

[![Configure the MFA username and password, and connect\_retries in the development environment settings.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/extended-attributes-mfa.png?v=2 "Configure the MFA username and password, and connect_retries in the development environment settings.")](#)Configure the MFA username and password, and connect\_retries in the development environment settings.

##### Key pair[​](#key-pair "Direct link to Key pair")

**Available in:** Development environments, Deployment environments

The `Keypair` auth method uses Snowflake's [Key Pair Authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth) to authenticate Development or Deployment credentials for a dbt project.

1. After [generating an encrypted key pair](https://docs.snowflake.com/en/user-guide/key-pair-auth.html#configuring-key-pair-authentication), be sure to set the `rsa_public_key` for the Snowflake user to authenticate in dbt:

```sql
alter user jsmith set rsa_public_key='MIIBIjANBgkqh...';
```

2. Finally, set the **Private Key** and **Private Key Passphrase** fields in the **Credentials** page to finish configuring dbt to authenticate with Snowflake using a key pair.

   * **Note:** Unencrypted private keys are permitted. Use a passphrase only if needed. dbt can accept a `private_key` directly as a string instead of a `private_key_path`. This `private_key` string can be in either Base64-encoded DER format, representing the key bytes, or in plain-text PEM format. Refer to the [Snowflake documentation](https://docs.snowflake.com/en/user-guide/key-pair-auth) for more info on how to generate the key.
* Specifying a private key using an [environment variable](https://docs.getdbt.com/docs/build/environment-variables.md) (for example, `{{ env_var('DBT_PRIVATE_KEY') }}`) is not supported.

3. To successfully fill in the **Private Key** field, you *must* include the commented header and footer lines (for example, `-----BEGIN ENCRYPTED PRIVATE KEY-----`). If you receive a `Could not deserialize key data` or `JWT token` error, refer to [Troubleshooting](#troubleshooting) for more info.

**Example:**

```text
-----BEGIN ENCRYPTED PRIVATE KEY-----
< encrypted private key contents here - line 1 >
< encrypted private key contents here - line 2 >
< ... >
-----END ENCRYPTED PRIVATE KEY-----
```

[![Snowflake keypair authentication](/img/docs/dbt-cloud/snowflake-keypair-auth.png?v=2 "Snowflake keypair authentication")](#)Snowflake keypair authentication

###### Fusion key pair[​](#fusion-key-pair "Direct link to Fusion key pair")

We recommend using PKCS#8 format with AES-256 encryption for key pair authentication with Fusion. Fusion doesn't support legacy 3DES encryption or headerless key formats. Using older key formats may cause authentication failures.

If you encounter the `Key is PKCS#1 (RSA private key). Snowflake requires PKCS#8` error, your private key is in the wrong format. You have two options:

* (Recommended fix) Re-export your key with modern encryption:

```bash
# Convert to PKCS#8 with AES-256 encryption
openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 aes-256-cbc -inform PEM -out rsa_key.p8
```

* (Temporary workaround) Add the `BEGIN` header and `END` footer to your PEM body:

```text
-----BEGIN ENCRYPTED PRIVATE KEY-----
< Your existing encrypted private key contents >
-----END ENCRYPTED PRIVATE KEY-----
```

##### Snowflake OAuth[​](#snowflake-oauth "Direct link to Snowflake OAuth")

**Available in:** Development environments, Enterprise-tier plans only

The OAuth auth method permits dbt to run development queries on behalf of a Snowflake user without configuring a Snowflake password in dbt.
For more information on configuring a Snowflake OAuth connection in dbt, see [the docs on setting up Snowflake OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md).

[![Configuring Snowflake OAuth connection](/img/docs/dbt-cloud/dbt-cloud-enterprise/database-connection-snowflake-oauth.png?v=2 "Configuring Snowflake OAuth connection")](#)Configuring Snowflake OAuth connection

#### Configuration[​](#configuration "Direct link to Configuration")

To learn how to optimize performance with data platform-specific configurations in dbt, refer to [Snowflake-specific configuration](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md).

##### Custom domain URL[​](#custom-domain-url "Direct link to Custom domain URL")

To connect to Snowflake through a custom domain (vanity URL) instead of the account locator, use [extended attributes](https://docs.getdbt.com/docs/dbt-cloud-environments.md#extended-attributes) to configure the `host` parameter with the custom domain:

```yaml
host: https://custom_domain_to_snowflake.com
```

This configuration may conflict with Snowflake OAuth when used with PrivateLink. If users can't reach Snowflake authentication servers from a networking standpoint, please [contact dbt Support](mailto:support@getdbt.com) to find a workaround for this architecture.

#### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting")

If you're receiving a `Could not deserialize key data` or `JWT token` error, refer to the following causes and solutions:

Error: `Could not deserialize key data`

Possible cause and solution for the error "Could not deserialize key data" in dbt.

* This could be because of mistakes like not copying the key correctly, missing dashes, or leaving out commented lines.

**Solution**:

* Copy the key from its source and paste it into a text editor to verify it before using it in dbt.

Error: `JWT token`

Possible cause and solution for the error "JWT token" in dbt.
* This could be a transient issue between Snowflake and dbt. When connecting to Snowflake, dbt gets a JWT token valid for only 60 seconds. If there's no response from Snowflake within this time, you might see a `JWT token is invalid` error in dbt. * The public key was not entered correctly in Snowflake. **Solutions** * dbt needs to retry connections to Snowflake. * Confirm and enter Snowflake's public key correctly. Additionally, you can reach out to Snowflake for help or refer to this Snowflake doc for more info: [Key-Based Authentication Failed with JWT token is invalid Error](https://community.snowflake.com/s/article/Key-Based-Authentication-Failed-with-JWT-token-is-invalid-Error). #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ##### Connect Starburst or Trino The following are the required fields for setting up a connection with a [Starburst Enterprise](https://docs.starburst.io/starburst-enterprise/index.html), [Starburst Galaxy](https://docs.starburst.io/starburst-galaxy/index.html), or [Trino](https://trino.io/) cluster. Unless specified, "cluster" means any of these products' clusters. | Field | Description | Examples | | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Host** | The hostname of your cluster. Don't include the HTTP protocol prefix. | `mycluster.mydomain.com` | | **Port** | The port to connect to your cluster. 
By default, it's 443 for TLS-enabled clusters. | `443` | | **User** | The username (of the account) to log in to your cluster. When connecting to Starburst Galaxy clusters, you must include the role of the user as a suffix to the username. | Format for Starburst Enterprise or Trino depends on your configured authentication method. Format for Starburst Galaxy: `user.name@mydomain.com/role` | | **Password** | The user's password. | - | | **Database** | The name of a catalog in your cluster. | `example_catalog` | | **Schema** | The name of a schema that exists within the specified catalog. | `example_schema` | #### Roles in Starburst Enterprise[​](#roles-in-starburst-enterprise "Direct link to Roles in Starburst Enterprise") If connecting to a Starburst Enterprise cluster with built-in access controls enabled, you must specify a role using the format detailed in [Additional parameters](#additional-parameters). If a role is not specified, the default role for the provided username is used. #### Catalogs and schemas[​](#catalogs-and-schemas "Direct link to Catalogs and schemas") When selecting the catalog and the schema, make sure the user has read and write access to both. This selection does not limit your ability to query the catalog. Instead, it serves as the default location where tables and views are materialized. In addition, the Trino connector used in the catalog must support creating tables. This *default* can be changed later from within your dbt project. #### Configuration[​](#configuration "Direct link to Configuration") To learn how to optimize performance with data platform-specific configurations in dbt, refer to [Starburst/Trino-specific configuration](https://docs.getdbt.com/reference/resource-configs/trino-configs.md). 
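The same fields map onto a dbt Core connection profile for the `dbt-trino` adapter. A minimal sketch, assuming LDAP password authentication and hypothetical host, catalog, and user names:

```yaml
# profiles.yml -- illustrative values only
my_project:
  target: dev
  outputs:
    dev:
      type: trino
      method: ldap                        # password-based auth; other methods exist
      host: mycluster.mydomain.com        # no HTTP protocol prefix
      port: 443                           # default for TLS-enabled clusters
      user: user.name@mydomain.com/role   # Starburst Galaxy requires the role suffix
      password: "<password>"
      database: example_catalog           # catalog the user can read and write
      schema: example_schema              # default location for materializations
      threads: 4
```

The `database` and `schema` values here act only as defaults for where tables and views are materialized; they can be overridden per model in your dbt project.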
--- ##### Connect Teradata Preview ### Connect Teradata [Preview](https://docs.getdbt.com/docs/dbt-versions/product-lifecycles "Go to https://docs.getdbt.com/docs/dbt-versions/product-lifecycles") Your environment(s) must be on a supported [release track](https://docs.getdbt.com/docs/dbt-versions/cloud-release-tracks.md) to use the Teradata connection. | Field | Description | Type | Required? | Example | | --------------- | ------------------------------------------------------------------------------------------------ | -------------- | --------- | ------------------------------------- | | Host | Host name of your Teradata environment. | String | Required | host-name.env.clearscape.teradata.com | | Port | The database port number. Equivalent to the Teradata JDBC Driver DBS\_PORT connection parameter. | Quoted integer | Optional | 1025 | | Retries | Number of times to retry connecting to the database after an error. | Integer | Optional | 10 | | Request timeout | The waiting period between connection attempts in seconds. Default is "1" second. | Quoted integer | Optional | 3 | [![Example of the Teradata connection fields.](/img/docs/dbt-cloud/teradata-connection.png?v=2 "Example of the Teradata connection fields.")](#)Example of the Teradata connection fields. ##### Development and deployment credentials[​](#development-and-deployment-credentials "Direct link to Development and deployment credentials") | Field | Description | Type | Required? | Example | | -------- | -------------------------------------------------------------------------------------------- | ------ | --------- | ------------------- | | Username | The database username. Equivalent to the Teradata JDBC Driver USER connection parameter. | String | Required | database\_username | | Password | The database password. Equivalent to the Teradata JDBC Driver PASSWORD connection parameter. 
| String | Required | DatabasePassword123 | | Schema | Specifies the initial database to use after login, rather than the user's default database. | String | Required | dbtlabsdocstest | [![Example of the developer credential fields.](/img/docs/dbt-cloud/teradata-deployment.png?v=2 "Example of the developer credential fields.")](#)Example of the developer credential fields. --- #### Git ##### About git A [version control](https://en.wikipedia.org/wiki/Version_control) system allows you and your teammates to work collaboratively, safely, and simultaneously on a single project. Version control helps you track all the code changes made in your dbt project. In a distributed version control system, every developer has a full copy of the project and project history. Git is one of the most popular distributed version control systems and is commonly used for both open source and commercial software development, with great benefits for individuals, teams, and businesses. ![Git overview](/assets/images/git-overview-77b8a2fbc084935621ee6e247a1abde9.png) Git allows developers to see the entire timeline of their changes, decisions, and progression of any project in one place. From the moment a developer accesses the history of a project, they have all the context they need to understand it and start contributing. When you develop in the command line interface (CLI) or Cloud integrated development environment (Studio IDE), you can leverage Git directly to version control your code. 
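As a quick illustration of that workflow, here's the everyday version-control loop from a terminal. This is a self-contained sketch with hypothetical file and branch names; in practice you'd clone your project's repository rather than create a throwaway one:

```shell
# Create a throwaway repo to demonstrate the loop
git init -q demo && cd demo
git config user.email "dev@example.com" && git config user.name "Dev"

# Commit an initial file on the default branch
echo "select 1" > orders.sql
git add orders.sql && git commit -q -m "Add orders model"

# Branch, change, and commit -- the everyday version-control loop
git checkout -q -b feature/orders-model
echo "select 2" >> orders.sql
git add orders.sql && git commit -q -m "Update orders model"

# Review the full timeline of changes
git log --oneline
```

In a shared project, the branch would then be pushed to the remote (`git push -u origin feature/orders-model`) and merged through a pull request, which is the flow the Studio IDE automates for you.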
To use version control, make sure you are connected to a Git repository in the CLI or Cloud Studio IDE. #### Related docs[​](#related-docs "Direct link to Related docs") * [Version control basics](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md) * [Merge conflicts](https://docs.getdbt.com/docs/cloud/git/merge-conflicts.md) * [Connect to GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md) * [Connect to GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md) * [Connect to Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) --- ##### Authenticate with Azure DevOps If you use the Studio IDE or dbt CLI to collaborate on your team's Azure DevOps dbt repo, you need to [link your dbt profile to Azure DevOps](#link-your-dbt-profile-to-azure-devops), which provides an extra layer of authentication. #### Link your dbt profile to Azure DevOps[​](#link-your-dbt-profile-to-azure-devops "Direct link to Link your dbt profile to Azure DevOps") Connect your dbt profile to Azure DevOps using OAuth: 1. Click your account name at the bottom of the left-side menu and click **Account settings**. 2. Scroll down to **Your profile** and select **Personal profile**. 3. Go to the **Linked accounts** section in the middle of the page. [![Azure DevOps Authorization Screen](/img/docs/dbt-cloud/connecting-azure-devops/LinktoAzure.png?v=2 "Azure DevOps Authorization Screen")](#)Azure DevOps Authorization Screen 4. Once you're redirected to Azure DevOps, sign into your account. 5. When you see the permission request screen from the Azure DevOps app, click **Accept**. 
You will be directed back to dbt, and your profile should be linked. You are now ready to develop in dbt! #### FAQs[​](#faqs "Direct link to FAQs") How can I fix my .gitignore file? A `.gitignore` file specifies which files git should intentionally ignore or 'untrack'. dbt indicates untracked files in the project file explorer pane by putting the file or folder name in *italics*. If you encounter issues like problems reverting changes, checking out or creating a new branch, or not being prompted to open a pull request after a commit in the Studio IDE — this usually indicates a problem with the [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) file. The file may be missing or lacks the required entries for dbt to work correctly. The following sections describe how to fix the `.gitignore` file:  Fix in the Studio IDE When resolving issues with your `.gitignore` file, note that adding the correct entries won't automatically remove (or 'untrack') files or folders that git already tracks. The updated `.gitignore` only prevents new files or folders from being tracked. So you'll need to first fix the `.gitignore` file, then perform some additional git operations to untrack any incorrect files or folders. 1. Launch the Studio IDE into the project that is being fixed by selecting **Develop** on the menu bar. 2. In your **File Explorer**, check to see if a `.gitignore` file exists at the root of your dbt project folder. If it doesn't exist, create a new file. 3. Open the new or existing `.gitignore` file, and add the following:

```bash
# ✅ Correct
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
```

* **Note** — You can place these lines anywhere in the file, as long as they're on separate lines. The lines shown are wildcards that will include all nested files and folders. 
Avoid adding a trailing `'*'` to the lines, such as `target/*`. For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com/docs/gitignore). 4. Save the changes but *don't commit*. 5. Restart the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen and select **Restart IDE**. [![Restart the IDE by clicking the three dots on the lower right or click on the Status bar](/img/docs/dbt-cloud/cloud-ide/restart-ide.png?v=2 "Restart the IDE by clicking the three dots on the lower right or click on the Status bar")](#)Restart the IDE by clicking the three dots on the lower right or click on the Status bar 6. Once the Studio IDE restarts, go to the **File Catalog** to delete the following files or folders (if they exist). No data will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. **Save** and then **Commit and sync** the changes. 8. Restart the Studio IDE again using the same procedure as step 5. 9. Once the Studio IDE restarts, use the **Create a pull request** (PR) button under the **Version Control** menu to start the process of integrating the changes. 10. When the git provider's website opens to a page with the new PR, follow the necessary steps to complete and merge the PR into the main branch of that repository. * **Note** — The 'main' branch might also be called 'master', 'dev', 'qa', 'prod', or something else depending on the organizational naming conventions. The goal is to merge these changes into the root branch that all other development branches are created from. 11. Return to the Studio IDE and use the **Change Branch** button to switch to the main branch of the project. 12. Once the branch has changed, click the **Pull from remote** button to pull in all the changes. 13. Verify the changes by making sure the files/folders in the `.gitignore` file are in italics. 
[![A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).](/img/docs/dbt-cloud/cloud-ide/gitignore-italics.png?v=2 "A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).")](#)A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).  Fix in the Git provider Sometimes it's necessary to use the git provider's web interface to fix a broken `.gitignore` file. Although the specific steps may vary across providers, the general process remains the same. There are two options for this approach: editing the main branch directly if allowed, or creating a pull request to implement the changes if required: * Edit in main branch * Unable to edit main branch When permissions allow it, it's possible to edit the `.gitignore` directly on the main branch of your repo. Follow these steps: 1. Go to your repository's web interface. 2. Switch to the main branch and the root directory of your dbt project. 3. Find the `.gitignore` file. Create a blank one if it doesn't exist. 4. Edit the file in the web interface, adding the following entries:

```bash
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
```

5. Commit (save) the file. 6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. Commit (save) the deletions to the main branch. 8. Switch to the Studio IDE, and open the project that you're fixing. 9. [Rollback your repo to remote](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide) in the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen, then select **Rollback to remote**. * **Note** — Rollback to remote resets your repo back to an earlier clone from your remote. 
Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt. 10. Once you rollback to remote, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. 11. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the `.gitignore` file are in *italics*. 12. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development! If you can't edit the `.gitignore` directly on the main branch of your repo, follow these steps: 1. Go to your repository's web interface. 2. Switch to an existing development branch, or create a new branch just for these changes (this is often faster and cleaner). 3. Find the `.gitignore` file. Create a blank one if it doesn't exist. 4. Edit the file in the web interface, adding the following entries:

```bash
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
```

5. Commit (save) the file. 6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. Commit (save) the deleted folders. 8. Open a merge request using the git provider's web interface. The merge request should attempt to merge the changes into the 'main' branch that all development branches are created from. 9. Follow the necessary procedures to get the branch approved and merged into the 'main' branch. You can delete the branch after the merge is complete. 10. Once the merge is complete, go back to the Studio IDE, and open the project that you're fixing. 11. 
[Rollback your repo to remote](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide) in the Studio IDE by clicking on the three dots next to the **Studio IDE Status** button on the lower right corner of the Studio IDE screen, then select **Rollback to remote**. * **Note** — Rollback to remote resets your repo back to an earlier clone from your remote. Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt. 12. Once you rollback to remote, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. 13. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the `.gitignore` file are in *italics*. 14. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development! For more info, refer to this [detailed video](https://www.loom.com/share/9b3b8e2b617f41a8bad76ec7e42dd014). How to migrate git providers To migrate from one git provider to another, follow these steps to minimize disruption: 1. Outside of dbt, you'll need to import your existing repository into your new provider. By default, connecting your repository in one account won't automatically disconnect it from another account. As an example, if you're migrating from GitHub to Azure DevOps, you'll need to import your existing repository (GitHub) into your new Git provider (Azure DevOps). 
For detailed steps on how to do this, refer to your Git provider's documentation (such as [GitHub](https://docs.github.com/en/migrations/importing-source-code/using-github-importer/importing-a-repository-with-github-importer), [GitLab](https://docs.gitlab.com/ee/user/project/import/repo_by_url.html), [Azure DevOps](https://learn.microsoft.com/en-us/azure/devops/repos/git/import-git-repository?view=azure-devops)). 2. Go back to dbt and set up your [integration for the new Git provider](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md), if needed. 3. Disconnect the old repository in dbt by going to **Account Settings** and then **Projects**. 4. Click on the **Repository** link, then click **Edit** and **Disconnect**. [![Disconnect and reconnect your Git repository in your dbt Account settings page.](/img/docs/dbt-cloud/disconnect-repo.png?v=2 "Disconnect and reconnect your Git repository in your dbt Account settings page.")](#)Disconnect and reconnect your Git repository in your dbt Account settings page. 5. Click **Confirm Disconnect**. 6. On the same page, connect to the new Git provider repository by clicking **Configure Repository**. * If you're using the native integration, you may need to authorize it via OAuth. 7. That's it, you should now be connected to the new Git provider! 🎉 **Note** — We recommend refreshing your page and the Studio IDE before performing any actions. 
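For developers working from a local clone rather than the Studio IDE, the 'untrack' step described in the FAQ above can be sketched in plain git. The key command is `git rm -r --cached`, which removes entries from git's index without deleting the working files. This is a self-contained demo with hypothetical file names:

```shell
# Self-contained demo: a repo where target/ was tracked before .gitignore existed
git init -q demo-untrack && cd demo-untrack
git config user.email "dev@example.com" && git config user.name "Dev"
mkdir target && echo "compiled sql" > target/model.sql
git add . && git commit -q -m "Accidentally track target/"

# Fix the .gitignore, then untrack the folder without deleting it from disk
printf 'target/\ndbt_packages/\nlogs/\ndbt_modules/\n' > .gitignore
git rm -r -q --cached target
git add .gitignore && git commit -q -m "Stop tracking target/"

# target/ still exists on disk but is now ignored by git
git status --porcelain
```

After the second commit, `git status --porcelain` prints nothing: the `target/` folder remains on disk but git no longer tracks it, which is exactly the state the Studio IDE steps above produce.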
--- ##### Configure Git in dbt [Version control](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md) — a system that allows you and your teammates to work safely and simultaneously on a single project — is an essential part of the dbt workflow. It enables teams to collaborate effectively and maintain a history of changes to their dbt projects. In dbt, you can configure Git integrations to manage your dbt project code with ease. dbt offers multiple ways to integrate with your Git provider, catering to diverse team needs and preferences. Whether you use a Git integration that natively connects with dbt or prefer to work with a managed or cloned repository, dbt supports flexible options to streamline your workflow. [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) ###### [Managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) [Learn how to quickly set up a project with a managed repository.](https://docs.getdbt.com/docs/cloud/git/managed-repository.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) ###### [Git clone](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) [Learn how to connect to a git repository using a git URL and deploy keys.](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/git/connect-github.md) ###### [Connect to GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md) [Learn how to connect to GitHub using dbt's native integration.](https://docs.getdbt.com/docs/cloud/git/connect-github.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md) ###### [Connect to GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md) [Learn how to connect to GitLab using dbt's native integration.](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md) 
[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) ###### [Connect to Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) [Learn how to connect to Azure DevOps using dbt's native integration.](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md)

[Available on dbt Enterprise or Enterprise+ plans.](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) [![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/deploy/continuous-integration.md#git-providers-who-support-ci) ###### [Availability of CI features by Git provider](https://docs.getdbt.com/docs/deploy/continuous-integration.md#git-providers-who-support-ci) [Learn which Git providers have native support for Continuous Integration workflows](https://docs.getdbt.com/docs/deploy/continuous-integration.md#git-providers-who-support-ci) --- ##### Connect to Azure DevOps ### Connect to Azure DevOps [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Connecting an Azure DevOps cloud account is available for organizations using the dbt Enterprise or Enterprise+ plans. dbt's native Azure DevOps integration does not support Azure DevOps Server (on-premise). Instead, you can [import a project by git URL](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) to connect to an Azure DevOps Server. #### About Azure DevOps and dbt[​](#about-azure-devops-and-dbt "Direct link to About Azure DevOps and dbt") Connect your Azure DevOps cloud account in dbt to unlock new product experiences: * Import new Azure DevOps repos with a couple of clicks during dbt project setup. * Clone repos using HTTPS rather than SSH. * Enforce user authorization with OAuth 2.0. 
* Carry Azure DevOps user repository permissions (read / write access) through to the Studio IDE or dbt CLI's git actions. * Trigger Continuous integration (CI) builds when pull requests are opened in Azure DevOps. Currently, there are multiple methods for integrating Azure DevOps with dbt. The following methods are available to all accounts: * [**Service principal (recommended)**](https://docs.getdbt.com/docs/cloud/git/setup-service-principal.md) * [**Service user (legacy)**](https://docs.getdbt.com/docs/cloud/git/setup-service-user.md) * [**Service user to service principal migration**](https://docs.getdbt.com/docs/cloud/git/setup-service-principal.md#migrate-to-service-principal) No matter which approach you take, you will need admins for dbt, Microsoft Entra ID, and Azure DevOps to complete the integration. For more information, follow the setup guide that's right for you. --- ##### Connect to GitHub Connecting your GitHub account to dbt provides convenience and another layer of security to dbt: * Import new GitHub repositories with a couple of clicks during dbt project setup. * Clone repos using HTTPS rather than SSH. * Trigger [Continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md) (CI) builds when pull requests are opened in GitHub. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * For On-Premises GitHub deployment, see [importing a project by git URL](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) to set up your connection instead. 
Some git features are [limited](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md#limited-integration) with this setup. * **Note** — [Single tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md#single-tenant) accounts offer enhanced connection options for integrating with an On-Premises GitHub deployment setup using the native integration. This integration allows you to use all the features of the integration, such as triggering CI builds. The dbt Labs infrastructure team will coordinate with you to ensure any additional networking configuration requirements are met and completed. To discuss details, contact dbt Labs support or your dbt account team. * You *must* be a **GitHub organization owner** in order to [install the dbt application](https://docs.getdbt.com/docs/cloud/git/connect-github.md#installing-dbt-in-your-github-account) in your GitHub organization. To learn about GitHub organization roles, see the [GitHub documentation](https://docs.github.com/en/organizations/managing-peoples-access-to-your-organization-with-roles/roles-in-an-organization). * The GitHub organization owner requires [*Owner*](https://docs.getdbt.com/docs/cloud/manage-access/self-service-permissions.md) or [*Account Admin*](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) permissions when they log into dbt to integrate with a GitHub environment using organizations. * You may need to temporarily provide an extra dbt user account with *Owner* or *Account Admin* [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) for your GitHub organization owner until they complete the installation. Case-sensitive repository names When specifying a GitHub repository in the dbt platform using the UI, API, or Terraform provider, the repository name must exactly match the case used in the GitHub URL to avoid cloning errors or job failures. 
For example, if the URL of your repository is `github.com/my-org/MyRepo`, enter the name as `MyRepo`, not `myrepo`. #### Installing dbt in your GitHub account[​](#installing-dbt-in-your-github-account "Direct link to Installing dbt in your GitHub account") You can connect your dbt account to GitHub by installing the dbt application in your GitHub organization and providing access to the appropriate repositories. To connect your dbt account to your GitHub account: 1. From dbt, click on your account name in the left side menu and select **Account settings**. 2. Select **Personal profile** under the **Your profile** section. 3. Scroll down to **Linked accounts**. [![Navigated to Linked Accounts under your profile](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connecting-github/github-connect-1.png?v=2 "Navigated to Linked Accounts under your profile")](#)Navigated to Linked Accounts under your profile 4. In the **Linked accounts** section, set up your GitHub account connection to dbt by clicking **Link** to the right of GitHub. This redirects you to your account on GitHub where you will be asked to install and configure the dbt application. 5. Select the GitHub organization and repositories dbt should access. [![Installing the dbt application into a GitHub organization](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connecting-github/github-app-install.png?v=2 "Installing the dbt application into a GitHub organization")](#)Installing the dbt application into a GitHub organization 6. Assign the dbt GitHub App the following permissions: * Read access to metadata * Read and write access to Checks * Read and write access to Commit statuses * Read and write access to Contents (Code) * Read and write access to Pull requests * Read and write access to Webhooks * Read and write access to Workflows 7. Once you grant access to the app, you will be redirected back to dbt and shown a linked account success state. You are now personally authenticated. 8. 
Ask your team members to individually authenticate by connecting their [personal GitHub profiles](#authenticate-your-personal-github-account). #### Limiting repository access in GitHub[​](#limiting-repository-access-in-github "Direct link to Limiting repository access in GitHub") If you are the GitHub organization owner, you can also configure the dbt GitHub application to have access to only select repositories. This configuration must be done in GitHub, but we provide an easy link in dbt to start this process. [![Configuring the dbt app](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connecting-github/configure-github.png?v=2 "Configuring the dbt app")](#)Configuring the dbt app #### Authenticate your personal GitHub account[​](#authenticate-your-personal-github-account "Direct link to Authenticate your personal GitHub account") After the dbt administrator [sets up a connection](https://docs.getdbt.com/docs/cloud/git/connect-github.md#installing-dbt-in-your-github-account) to your organization's GitHub account, you need to authenticate using your personal account. You must connect your personal GitHub profile to dbt to use the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) and [CLI](https://docs.getdbt.com/docs/cloud/cloud-cli-installation.md) and verify your read and write access to the repository. GitHub profile connection * dbt developers on the [Enterprise or Enterprise+ plan](https://www.getdbt.com/pricing/) must each connect their GitHub profiles to dbt. This is because the Studio IDE verifies every developer's read / write access for the dbt repo. * dbt developers on the [Starter plan](https://www.getdbt.com/pricing/) don't need to each connect their profiles to GitHub; however, it's still recommended to do so. To connect a personal GitHub account: 1. From dbt, click on your account name in the left side menu and select **Account settings**. 2. Select **Personal profile** under the **Your profile** section. 3. 
Scroll down to **Linked accounts**. If your GitHub account is not connected, you’ll see "No connected account". 4. Select **Link** to begin the setup process. You’ll be redirected to GitHub, and asked to authorize dbt in a grant screen. [![Authorizing the dbt app for developers](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connecting-github/github-auth.png?v=2 "Authorizing the dbt app for developers")](#)Authorizing the dbt app for developers 5. Once you approve authorization, you will be redirected to dbt, and you should now see your connected account. You can now use the Studio IDE or dbt CLI. #### FAQs[​](#faqs "Direct link to FAQs") How can I fix my .gitignore file? A `.gitignore` file specifies which files git should intentionally ignore or 'untrack'. dbt indicates untracked files in the project file explorer pane by putting the file or folder name in *italics*. If you encounter issues like problems reverting changes, checking out or creating a new branch, or not being prompted to open a pull request after a commit in the Studio IDE — this usually indicates a problem with the [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) file. The file may be missing or lacks the required entries for dbt to work correctly. The following sections describe how to fix the `.gitignore` file:  Fix in the Studio IDE When resolving issues with your `.gitignore` file, note that adding the correct entries won't automatically remove (or 'untrack') files or folders that git already tracks. The updated `.gitignore` only prevents new files or folders from being tracked. So you'll need to first fix the `.gitignore` file, then perform some additional git operations to untrack any incorrect files or folders. 1. Launch the Studio IDE into the project that is being fixed by selecting **Develop** on the menu bar. 2. In your **File Explorer**, check to see if a `.gitignore` file exists at the root of your dbt project folder. 
If it doesn't exist, create a new file. 3. Open the new or existing `gitignore` file, and add the following: ```bash # ✅ Correct target/ dbt_packages/ logs/ # legacy -- renamed to dbt_packages in dbt v1 dbt_modules/ ``` * **Note** — You can place these lines anywhere in the file, as long as they're on separate lines. The lines shown are wildcards that will include all nested files and folders. Avoid adding a trailing `'*'` to the lines, such as `target/*`. For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com/docs/gitignore). 4. Save the changes but *don't commit*. 5. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right corner of the IDE screen and select **Restart IDE**. [![Restart the IDE by clicking the three dots on the lower right or click on the Status bar](/img/docs/dbt-cloud/cloud-ide/restart-ide.png?v=2 "Restart the IDE by clicking the three dots on the lower right or click on the Status bar")](#)Restart the IDE by clicking the three dots on the lower right or click on the Status bar 6. Once the Studio IDE restarts, go to the **File Catalog** to delete the following files or folders (if they exist). No data will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. **Save** and then **Commit and sync** the changes. 8. Restart the Studio IDE again using the same procedure as step 5. 9. Once the Studio IDE restarts, use the **Create a pull request** (PR) button under the **Version Control** menu to start the process of integrating the changes. 10. When the git provider's website opens to a page with the new PR, follow the necessary steps to complete and merge the PR into the main branch of that repository. * **Note** — The 'main' branch might also be called 'master', 'dev', 'qa', 'prod', or something else depending on the organizational naming conventions. The goal is to merge these changes into the root branch that all other development branches are created from. 11. 
Return to the Studio IDE and use the **Change Branch** button to switch to the main branch of the project. 12. Once the branch has changed, click the **Pull from remote** button to pull in all the changes. 13. Verify the changes by making sure the files/folders in the `.gitignore` file are in italics. [![A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).](/img/docs/dbt-cloud/cloud-ide/gitignore-italics.png?v=2 "A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).")](#)A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).  Fix in the Git provider Sometimes it's necessary to use the Git provider's web interface to fix a broken `.gitignore` file. Although the specific steps may vary across providers, the general process remains the same. There are two options for this approach: editing the main branch directly if allowed, or creating a pull request to implement the changes if required: * Edit in main branch * Unable to edit main branch When permissions allow it, it's possible to edit the `.gitignore` directly on the main branch of your repo. Follow these steps: 1. Go to your repository's web interface. 2. Switch to the main branch and the root directory of your dbt project. 3. Find the `.gitignore` file. Create a blank one if it doesn't exist. 4. Edit the file in the web interface, adding the following entries: ```bash target/ dbt_packages/ logs/ # legacy -- renamed to dbt_packages in dbt v1 dbt_modules/ ``` 5. Commit (save) the file. 6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. Commit (save) the deletions to the main branch. 8. Switch to the Studio IDE and open the project that you're fixing. 9. 
[Rollback your repo to remote](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide) in the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen, then select **Rollback to remote**. * **Note** — Rollback to remote resets your repo back to an earlier clone from your remote. Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt. 10. Once you rollback to remote, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. 11. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the .gitignore file are in *italics*. 12. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development! If you can't edit the `.gitignore` directly on the main branch of your repo, follow these steps: 1. Go to your repository's web interface. 2. Switch to an existing development branch, or create a new branch just for these changes (This is often faster and cleaner). 3. Find the `.gitignore` file. Create a blank one if it doesn't exist. 4. Edit the file in the web interface, adding the following entries: ```bash target/ dbt_packages/ logs/ # legacy -- renamed to dbt_packages in dbt v1 dbt_modules/ ``` 5. Commit (save) the file. 6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. Commit (save) the deleted folders. 8. Open a merge request using the git provider web interface. The merge request should attempt to merge the changes into the 'main' branch that all development branches are created from. 9. 
Follow the necessary procedures to get the branch approved and merged into the 'main' branch. You can delete the branch after the merge is complete. 10. Once the merge is complete, go back to the Studio IDE, and open the project that you're fixing. 11. [Rollback your repo to remote](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide) in the Studio IDE by clicking on the three dots next to the **Studio IDE Status** button on the lower right corner of the Studio IDE screen, then select **Rollback to remote**. * **Note** — Rollback to remote resets your repo back to an earlier clone from your remote. Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt. 12. Once you rollback to remote, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. 13. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the `.gitignore` file are in *italics*. 14. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development! For additional guidance, refer to this [detailed video](https://www.loom.com/share/9b3b8e2b617f41a8bad76ec7e42dd014). How to migrate git providers To migrate from one git provider to another, follow these steps to minimize disruption: 1. Outside of dbt, you'll need to import your existing repository into your new provider. By default, connecting your repository in one account won't automatically disconnect it from another account. As an example, if you're migrating from GitHub to Azure DevOps, you'll need to import your existing repository (GitHub) into your new Git provider (Azure DevOps). 
For detailed steps on how to do this, refer to your Git provider's documentation (such as [GitHub](https://docs.github.com/en/migrations/importing-source-code/using-github-importer/importing-a-repository-with-github-importer), [GitLab](https://docs.gitlab.com/ee/user/project/import/repo_by_url.html), or [Azure DevOps](https://learn.microsoft.com/en-us/azure/devops/repos/git/import-git-repository?view=azure-devops)). 2. Go back to dbt and set up your [integration for the new Git provider](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md), if needed. 3. Disconnect the old repository in dbt by going to **Account Settings** and then **Projects**. 4. Click on the **Repository** link, then click **Edit** and **Disconnect**. [![Disconnect and reconnect your Git repository in your dbt Account settings page.](/img/docs/dbt-cloud/disconnect-repo.png?v=2 "Disconnect and reconnect your Git repository in your dbt Account settings page.")](#)Disconnect and reconnect your Git repository in your dbt Account settings page. 5. Click **Confirm Disconnect**. 6. On the same page, connect to the new Git provider repository by clicking **Configure Repository**. * If you're using the native integration, you may need to OAuth to it. 7. That's it, you should now be connected to the new Git provider! 🎉 Note — As a tip, refresh your page and the Studio IDE before performing any further actions. --- ##### Connect to GitLab Connecting your GitLab account to dbt provides convenience and another layer of security to dbt: * Import new GitLab repos with a couple of clicks during dbt project setup. * Clone repos using HTTPS rather than SSH. 
* Carry GitLab user permissions through to dbt or dbt CLI's git actions. * Trigger [Continuous integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md) builds when merge requests are opened in GitLab. info When configuring the repository in dbt, GitLab automatically: * Registers a webhook that triggers pipeline jobs in dbt. * Creates a [project access token](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html) in your GitLab repository, which sends the job run status back to GitLab using the dbt API for CI jobs. dbt automatically refreshes this token for you. Requires a [GitLab Premium or Ultimate account](https://about.gitlab.com/pricing/). Depending on your plan, use these steps to integrate GitLab in dbt: * If you're on the Developer or Starter plan, read these [instructions](#for-dbt-developer-and-starter-plans). * If you're on the Enterprise or Enterprise+ plan, jump ahead to these [instructions](#for-the-dbt-enterprise-plans). #### For dbt Developer and Starter plans[​](#for-dbt-developer-and-starter-plans "Direct link to For dbt Developer and Starter plans") Before you can work with GitLab repositories in dbt, you’ll need to connect your GitLab account to your user profile. This allows dbt to authenticate your actions when interacting with Git repositories. Make sure to read about the [limitations](#requirements-and-limitations) of the Developer and Starter plans before you connect your account. To connect your GitLab account: 1. From dbt, click on your account name in the left side menu and select **Account settings**. 2. Select **Personal profile** under the **Your profile** section. 3. Scroll down to **Linked accounts**. 4. Click **Link** to the right of your GitLab account. 
[![The Personal profile settings with the Linked Accounts section of the user profile](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/connecting-github/github-connect.png?v=2 "The Personal profile settings with the Linked Accounts section of the user profile")](#)The Personal profile settings with the Linked Accounts section of the user profile When you click **Link**, you will be redirected to GitLab and prompted to sign into your account. GitLab will then ask for your explicit authorization: [![GitLab Authorization Screen](/img/docs/dbt-cloud/connecting-gitlab/GitLab-Auth.png?v=2 "GitLab Authorization Screen")](#)GitLab Authorization Screen Once you've accepted, you should be redirected back to dbt, and you'll see that your account has been linked to your profile. ##### Requirements and limitations[​](#requirements-and-limitations "Direct link to Requirements and limitations") dbt Team and Developer plans use a single GitLab deploy token created by the first user who connects the repository, which means: * All repositories users access from the dbt platform must belong to a [GitLab group](https://docs.gitlab.com/user/group/). * All Git operations (like commits and pushes) from the Studio IDE appear as coming from the same deploy token. * GitLab push rules may reject pushes made through dbt, particularly when multiple users are committing via the same deploy token. To support advanced Git workflows and multi-user commit behavior, upgrade to the Enterprise plan, which provides more flexible Git authentication strategies. #### For the dbt Enterprise plans[​](#for-the-dbt-enterprise-plans "Direct link to For the dbt Enterprise plans") dbt Enterprise and Enterprise+ customers have the added benefit of bringing their own GitLab OAuth application to dbt. This tier benefits from extra security, as dbt will: * Enforce user authorization with OAuth. * Carry GitLab's user repository permissions (read / write access) through to dbt or dbt CLI's git actions. 
To connect GitLab in dbt, a GitLab account admin must: 1. [Set up a GitLab OAuth application](#setting-up-a-gitlab-oauth-application). 2. [Add the GitLab app to dbt](#adding-the-gitlab-oauth-application-to-dbt-cloud). Once the admin completes those steps, dbt developers need to: 1. [Personally authenticate with GitLab](#personally-authenticating-with-gitlab) from dbt. ##### Setting up a GitLab OAuth application[​](#setting-up-a-gitlab-oauth-application "Direct link to Setting up a GitLab OAuth application") We recommend that a GitLab account admin set up an OAuth application in GitLab before you set up a project in dbt. For more detail, GitLab has a [guide for creating a Group Application](https://docs.gitlab.com/ee/integration/oauth_provider.html#group-owned-applications). In GitLab, navigate to your group settings and select **Applications**. Here you'll see a form to create a new application. When creating your Group Application, input the following: | Field | Value | | ---------------- | ----------------------------------------- | | **Name** | dbt | | **Redirect URI** | `https://YOUR_ACCESS_URL/complete/gitlab` | | **Confidential** | ✅ | | **Scopes** | ✅ api | Replace `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan. Click **Save application** in GitLab, and GitLab will then generate an **Application ID** and **Secret**. These values will be available even if you close the app screen, so this is not the only chance you have to save them. 
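The **Redirect URI** above is just your Access URL with the `/complete/gitlab` path appended. A minimal sketch of that construction (the function name and the example hosts are assumptions for illustration, not part of dbt's API):

```python
# Sketch: build the GitLab OAuth Redirect URI from a dbt Access URL.
# "cloud.getdbt.com" and "acme.example.com" are example hosts only;
# substitute the Access URL for your own region and plan.

def gitlab_redirect_uri(access_url: str) -> str:
    """Return the Redirect URI to enter in the GitLab application form."""
    host = access_url.rstrip("/").removeprefix("https://")
    return f"https://{host}/complete/gitlab"

print(gitlab_redirect_uri("cloud.getdbt.com"))
# https://cloud.getdbt.com/complete/gitlab
```

The helper simply normalizes a trailing slash or an explicit `https://` prefix so either form of the Access URL yields the same Redirect URI.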
If you're a Business Critical customer using [IP restrictions](https://docs.getdbt.com/docs/cloud/secure/ip-restrictions.md), ensure you've added the appropriate GitLab CIDRs to your IP restriction rules, or else the GitLab connection will fail. ##### Adding the GitLab OAuth application to dbt[​](#adding-the-gitlab-oauth-application-to-dbt "Direct link to Adding the GitLab OAuth application to dbt") After you've created your GitLab application, you need to provide dbt with information about the app. In dbt, account admins should navigate to **Account Settings**, click on the **Integrations** tab, and expand the GitLab section. [![Navigating to the GitLab Integration in dbt](/img/docs/dbt-cloud/connecting-gitlab/GitLab-Navigation.gif?v=2 "Navigating to the GitLab Integration in dbt")](#)Navigating to the GitLab Integration in dbt In dbt, input the following values: | Field | Value | | ------------------- | ---------------------------- | | **GitLab Instance** | | | **Application ID** | *copy value from GitLab app* | | **Secret** | *copy value from GitLab app* | Note: if you have a self-hosted version of GitLab, modify the **GitLab Instance** value to use the hostname provided for your organization instead, for example `https://gitlab.yourgreatcompany.com/`. Once the form is complete in dbt, click **Save**. You will then be redirected to GitLab and prompted to sign into your account. GitLab will ask for your explicit authorization: [![GitLab Authorization Screen](/img/docs/dbt-cloud/connecting-gitlab/GitLab-Auth.png?v=2 "GitLab Authorization Screen")](#)GitLab Authorization Screen Once you've accepted, you should be redirected back to dbt, and your integration is ready for developers on your team to [personally authenticate with](#personally-authenticating-with-gitlab). 
##### Personally authenticating with GitLab[​](#personally-authenticating-with-gitlab "Direct link to Personally authenticating with GitLab") dbt developers on the Enterprise or Enterprise+ plan must each connect their GitLab profiles to dbt, as every developer's read / write access for the dbt repo is checked in the Studio IDE or dbt CLI. To connect a personal GitLab account: 1. From dbt, click on your account name in the left side menu and select **Account settings**. 2. Select **Personal profile** under the **Your profile** section. 3. Scroll down to **Linked accounts**. If your GitLab account is not connected, you’ll see "No connected account". Select **Link** to begin the setup process. You’ll be redirected to GitLab, and asked to authorize dbt in a grant screen. [![Authorizing the dbt app for developers](/img/docs/dbt-cloud/connecting-gitlab/GitLab-Auth.png?v=2 "Authorizing the dbt app for developers")](#)Authorizing the dbt app for developers Once you approve authorization, you will be redirected to dbt, and you should see your connected account. You're now ready to start developing in the Studio IDE or dbt CLI. #### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting") Unable to trigger a CI job with GitLab When you connect dbt to a GitLab repository, GitLab automatically registers a webhook in the background, viewable under the repository settings. This webhook is also used to trigger [CI jobs](https://docs.getdbt.com/docs/deploy/ci-jobs.md) when you push to the repository. If you're unable to trigger a CI job, this usually indicates that the webhook registration is missing or incorrect. To resolve this issue, view the webhook registrations in GitLab under **Settings** --> **Webhooks**. Some things to check: * The webhook registration is enabled in GitLab. * The webhook registration is configured with the correct URL and secret. 
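You can also inspect a project's webhook registrations programmatically with GitLab's REST API (`GET /projects/:id/hooks`). A minimal sketch, assuming a hypothetical project ID `12345` and a placeholder access token:

```python
# Sketch: list a GitLab project's webhook registrations via the REST API,
# to confirm a webhook exists and points at the expected URL.
# Project ID 12345 and the token value are placeholders.
import json
import urllib.request

GITLAB_API = "https://gitlab.com/api/v4"

def hooks_url(project_id: int, base: str = GITLAB_API) -> str:
    """Endpoint for GET /projects/:id/hooks."""
    return f"{base}/projects/{project_id}/hooks"

def list_hooks(project_id: int, token: str) -> list[dict]:
    req = urllib.request.Request(
        hooks_url(project_id),
        headers={"PRIVATE-TOKEN": token},  # a personal or project access token
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires network access and a valid token):
# for hook in list_hooks(12345, "YOUR_TOKEN"):
#     print(hook["url"], hook["merge_requests_events"])
```

For a self-hosted instance, pass your own `base` URL (for example `https://gitlab.yourgreatcompany.com/api/v4`).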
If you're still experiencing this issue, reach out to the Support team and we'll be happy to help! Errors importing a repository on dbt project set up If you don't see your repository listed, double-check that: * Your repository is in a GitLab group you have access to. dbt will not read repos associated with a user. If you do see your repository listed, but are unable to import the repository successfully, double-check that: * You are a maintainer of that repository. Only users with maintainer permissions can set up repository connections. If you imported a repository using the dbt native integration with GitLab, you should be able to see if the clone strategy is using a `deploy_token`. If it's relying on an SSH key, this means the repository was not set up using the native GitLab integration, but rather using the generic git clone option. The repository must be reconnected to get the benefits described above. How can I fix my .gitignore file? A `.gitignore` file specifies which files git should intentionally ignore or 'untrack'. dbt indicates untracked files in the project file explorer pane by putting the file or folder name in *italics*. If you encounter issues like problems reverting changes, checking out or creating a new branch, or not being prompted to open a pull request after a commit in the Studio IDE, this usually indicates a problem with the [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) file. The file may be missing, or it may lack the entries required for dbt to work correctly. The following sections describe how to fix the `.gitignore` file in the Studio IDE or in your Git provider:  Fix in the Studio IDE Note that adding the correct entries to your `.gitignore` file won't automatically remove (or 'untrack') files or folders that git is already tracking. The updated `.gitignore` will only prevent new files or folders from being tracked. 
So you'll need to first fix the `gitignore` file, then perform some additional git operations to untrack any incorrect files or folders. 1. Launch the Studio IDE into the project that is being fixed, by selecting **Develop** on the menu bar. 2. In your **File Explorer**, check to see if a `.gitignore` file exists at the root of your dbt project folder. If it doesn't exist, create a new file. 3. Open the new or existing `gitignore` file, and add the following: ```bash # ✅ Correct target/ dbt_packages/ logs/ # legacy -- renamed to dbt_packages in dbt v1 dbt_modules/ ``` * **Note** — You can place these lines anywhere in the file, as long as they're on separate lines. The lines shown are wildcards that will include all nested files and folders. Avoid adding a trailing `'*'` to the lines, such as `target/*`. For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com/docs/gitignore). 4. Save the changes but *don't commit*. 5. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right corner of the IDE screen and select **Restart IDE**. [![Restart the IDE by clicking the three dots on the lower right or click on the Status bar](/img/docs/dbt-cloud/cloud-ide/restart-ide.png?v=2 "Restart the IDE by clicking the three dots on the lower right or click on the Status bar")](#)Restart the IDE by clicking the three dots on the lower right or click on the Status bar 6. Once the Studio IDE restarts, go to the **File Catalog** to delete the following files or folders (if they exist). No data will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. **Save** and then **Commit and sync** the changes. 8. Restart the Studio IDE again using the same procedure as step 5. 9. Once the Studio IDE restarts, use the **Create a pull request** (PR) button under the **Version Control** menu to start the process of integrating the changes. 10. 
When the git provider's website opens to a page with the new PR, follow the necessary steps to complete and merge the PR into the main branch of that repository. * **Note** — The 'main' branch might also be called 'master', 'dev', 'qa', 'prod', or something else depending on the organizational naming conventions. The goal is to merge these changes into the root branch that all other development branches are created from. 11. Return to the Studio IDE and use the **Change Branch** button to switch to the main branch of the project. 12. Once the branch has changed, click the **Pull from remote** button to pull in all the changes. 13. Verify the changes by making sure the files/folders in the `.gitignore` file are in italics. [![A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).](/img/docs/dbt-cloud/cloud-ide/gitignore-italics.png?v=2 "A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).")](#)A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).  Fix in the Git provider Sometimes it's necessary to use the Git provider's web interface to fix a broken `.gitignore` file. Although the specific steps may vary across providers, the general process remains the same. There are two options for this approach: editing the main branch directly if allowed, or creating a pull request to implement the changes if required: * Edit in main branch * Unable to edit main branch When permissions allow it, it's possible to edit the `.gitignore` directly on the main branch of your repo. Follow these steps: 1. Go to your repository's web interface. 2. Switch to the main branch and the root directory of your dbt project. 3. Find the `.gitignore` file. Create a blank one if it doesn't exist. 4. 
Edit the file in the web interface, adding the following entries: ```bash target/ dbt_packages/ logs/ # legacy -- renamed to dbt_packages in dbt v1 dbt_modules/ ``` 5. Commit (save) the file. 6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. Commit (save) the deletions to the main branch. 8. Switch to the Studio IDE and open the project that you're fixing. 9. [Rollback your repo to remote](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide) in the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen, then select **Rollback to remote**. * **Note** — Rollback to remote resets your repo back to an earlier clone from your remote. Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt. 10. Once you rollback to remote, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. 11. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the `.gitignore` file are in *italics*. 12. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development! If you can't edit the `.gitignore` directly on the main branch of your repo, follow these steps: 1. Go to your repository's web interface. 2. Switch to an existing development branch, or create a new branch just for these changes (this is often faster and cleaner). 3. Find the `.gitignore` file. Create a blank one if it doesn't exist. 4. Edit the file in the web interface, adding the following entries: ```bash target/ dbt_packages/ logs/ # legacy -- renamed to dbt_packages in dbt v1 dbt_modules/ ``` 5. 
Commit (save) the file. 6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. Commit (save) the deleted folders. 8. Open a merge request using the git provider web interface. The merge request should attempt to merge the changes into the 'main' branch that all development branches are created from. 9. Follow the necessary procedures to get the branch approved and merged into the 'main' branch. You can delete the branch after the merge is complete. 10. Once the merge is complete, go back to the Studio IDE, and open the project that you're fixing. 11. [Rollback your repo to remote](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide) in the Studio IDE by clicking on the three dots next to the **Studio IDE Status** button on the lower right corner of the Studio IDE screen, then select **Rollback to remote**. * **Note** — Rollback to remote resets your repo back to an earlier clone from your remote. Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt. 12. Once you rollback to remote, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. 13. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the .gitignore file are in *italics*. 14. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development! For more info, refer to this [detailed video](https://www.loom.com/share/9b3b8e2b617f41a8bad76ec7e42dd014) for additional guidance. 
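The note above says to avoid trailing wildcards like `target/*` because a bare directory pattern such as `target/` already covers everything nested inside that directory. A toy matcher illustrates the idea (this is a deliberately simplified sketch, not full gitignore semantics):

```python
# Simplified illustration of why "target/" alone is enough: a directory
# pattern ignores the directory and everything nested inside it.
# This is a toy matcher, NOT a full implementation of gitignore rules.

PATTERNS = ["target/", "dbt_packages/", "logs/", "dbt_modules/"]

def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """True if any leading directory of `path` (or the path itself)
    matches one of the directory patterns."""
    dirs = {p.rstrip("/") for p in patterns}
    parts = path.split("/")
    return any(part in dirs for part in parts[:-1]) or path.rstrip("/") in dirs

print(is_ignored("target/run_results.json"))        # True
print(is_ignored("models/staging/stg_orders.sql"))  # False
```

Because `target/` already matches `target/run_results.json`, `target/compiled/...`, and so on, appending `*` adds nothing and can subtly change matching behavior in real gitignore rules.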
I'm seeing a GitLab authentication out of date error loop If you're seeing a 'GitLab Authentication is out of date' 500 server error page, this usually occurs when the deploy keys in the repository settings in dbt and GitLab do not match. This is a known issue the dbt Labs team is working on, and there are a few workarounds for you to try: ###### First workaround[​](#first-workaround "Direct link to First workaround") 1. Disconnect the repo from the project in dbt. 2. Go to GitLab and click on Settings > Repository. 3. Under Repository Settings, remove/revoke active dbt deploy tokens and deploy keys. 4. Attempt to reconnect your repository via dbt. 5. Check GitLab to make sure that the new deploy key is added. 6. Once confirmed that it's added, refresh dbt and try developing once again. ###### Second workaround[​](#second-workaround "Direct link to Second workaround") 1. Keep the repo in the project as is -- don't disconnect. 2. Copy the deploy key generated in dbt. 3. Go to GitLab and click on Settings > Repository. 4. Under Repository Settings, manually add the dbt deploy key to your GitLab project (with the `Grant write permissions` box checked). 5. Go back to dbt, refresh your page, and try developing again. If you've tried the workarounds above and are still experiencing this behavior, reach out to the Support team and we'll be happy to help! Can self-hosted GitLab instances only be connected via dbt Enterprise plans? Presently yes, this is only available to Enterprise users. This is because of the way you have to set up the GitLab app redirect URL for auth, which can only be customized if you're a user on an Enterprise plan. Check out our [pricing page](https://www.getdbt.com/pricing/) for more information or feel free to [contact us](https://www.getdbt.com/contact) to build your Enterprise pricing. How to migrate git providers To migrate from one git provider to another, follow these steps to minimize disruption: 1. 
Outside of dbt, you'll need to import your existing repository into your new provider. By default, connecting your repository in one account won't automatically disconnect it from another account. As an example, if you're migrating from GitHub to Azure DevOps, you'll need to import your existing repository (GitHub) into your new Git provider (Azure DevOps). For detailed steps on how to do this, refer to your Git provider's documentation (such as [GitHub](https://docs.github.com/en/migrations/importing-source-code/using-github-importer/importing-a-repository-with-github-importer), [GitLab](https://docs.gitlab.com/ee/user/project/import/repo_by_url.html), or [Azure DevOps](https://learn.microsoft.com/en-us/azure/devops/repos/git/import-git-repository?view=azure-devops)). 2. Go back to dbt and set up your [integration for the new Git provider](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md), if needed. 3. Disconnect the old repository in dbt by going to **Account Settings** and then **Projects**. 4. Click on the **Repository** link, then click **Edit** and **Disconnect**. [![Disconnect and reconnect your Git repository in your dbt Account settings page.](/img/docs/dbt-cloud/disconnect-repo.png?v=2 "Disconnect and reconnect your Git repository in your dbt Account settings page.")](#)Disconnect and reconnect your Git repository in your dbt Account settings page. 5. Click **Confirm Disconnect**. 6. On the same page, connect to the new Git provider repository by clicking **Configure Repository**. * If you're using the native integration, you may need to OAuth to it. 7. That's it, you should now be connected to the new Git provider! 🎉 Note — As a tip, refresh your page and the Studio IDE before performing any further actions. 
GitLab token refresh message When you connect dbt to a GitLab repository, GitLab automatically creates a [project access token](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html) in your GitLab repository in the background. dbt uses this token to send CI job run statuses back to GitLab. By default, the project access token follows a naming pattern: `dbt token for GitLab project: `. If you have multiple tokens in your repository, look for one that follows this pattern to identify the correct token used by dbt. If you're receiving a "Refresh token" message, don't worry — dbt automatically refreshes this project access token for you, which means you never have to rotate it manually. If you still experience token refresh errors, try disconnecting and reconnecting the repository in your dbt project to refresh the token. For any issues, please reach out to the Support team at support@getdbt.com and we'll be happy to help! --- ##### Connect with Git clone In dbt, you can import a git repository from any valid git URL that points to a dbt project. There are some important considerations to keep in mind when doing this. #### Git protocols[​](#git-protocols "Direct link to Git protocols") You must use the `git@...` or `ssh://...` version of your git URL, not the `https://...` version. dbt uses the SSH protocol to clone repositories, so it can't clone repos supplied with an HTTPS URL.
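If you have an HTTPS or SCP-style clone URL, the rewrite to the `ssh://` form can be scripted. A minimal sketch; the `to_ssh_url` helper is hypothetical (not a dbt utility) and assumes the common `git@` SSH user for HTTPS URLs:

```shell
# Hypothetical helper: normalize a clone URL to the ssh:// form dbt expects.
to_ssh_url() {
  case "$1" in
    ssh://*)   printf '%s\n' "$1" ;;                                  # already correct
    https://*) printf '%s\n' "$1" | sed 's#^https://#ssh://git@#' ;;  # HTTPS clone URL
    *:*)       printf '%s\n' "$1" | sed 's#^\([^:]*\):#ssh://\1/#' ;; # SCP-style
    *)         printf '%s\n' "$1" ;;
  esac
}

to_ssh_url 'git@github.com:organization/repo-name.git'
# prints ssh://git@github.com/organization/repo-name.git
to_ssh_url 'https://github.com/organization/repo-name.git'
# prints ssh://git@github.com/organization/repo-name.git
```

The SCP-style case simply replaces the first `:` (between host and path) with `/` and prefixes `ssh://`, which matches the manual conversion described below.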
##### Converting SSH URLs to the correct format[​](#converting-ssh-urls-to-the-correct-format "Direct link to Converting SSH URLs to the correct format") When you copy an SSH URL from your git provider (usually found under **Code** --> **Clone** --> **SSH**), it may be in an SCP-style format that uses a colon (`:`) to separate the host from the path. You'll need to convert this to a standard SSH URL format for dbt. If your SSH URL looks like this (SCP-style with a colon):

```text
git@github.com:organization/repo-name.git
user@custom-host.example.com:organization/repo-name.git
```

Convert it to the standard SSH URL (uses `ssh://` and `/`):

```text
ssh://git@github.com/organization/repo-name.git
ssh://user@custom-host.example.com/organization/repo-name.git
```

#### Availability of features by Git provider[​](#availability-of-features-by-git-provider "Direct link to Availability of features by Git provider") * If your git provider has a [native dbt integration](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md), you can seamlessly set up [continuous integration (CI)](https://docs.getdbt.com/docs/deploy/ci-jobs.md) jobs directly within dbt. * For providers without native integration, you can still use the [Git clone method](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) to import your git URL and leverage the [dbt Administrative API](https://docs.getdbt.com/docs/dbt-cloud-apis/admin-cloud-api.md) to trigger a CI job to run. The following table outlines the available integration options and their corresponding capabilities.
| **Git provider** | **Native dbt integration** | **Automated CI job** | **Git clone** | **Information** | **Supported plans** |
| --- | --- | --- | --- | --- | --- |
| [Azure DevOps](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md) | ✅ | ✅ | ✅ | Organizations on the Starter and Developer plans can connect to Azure DevOps using a deploy key. Note, you won’t be able to configure automated CI jobs but you can still develop. | Enterprise, Enterprise+ |
| [GitHub](https://docs.getdbt.com/docs/cloud/git/connect-github.md) | ✅ | ✅ | ✅ | | All dbt plans |
| [GitLab](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md) | ✅ | ✅ | ✅ | | All dbt plans |
| All other git providers using [Git clone](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md) ([BitBucket](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md#bitbucket), [AWS CodeCommit](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md#aws-codecommit), and others) | ❌ | ❌ | ✅ | Refer to the [Customizing CI/CD with custom pipelines](https://docs.getdbt.com/guides/custom-cicd-pipelines.md?step=1) guide to set up continuous integration and continuous deployment (CI/CD). | |

#### Managing deploy keys[​](#managing-deploy-keys "Direct link to Managing deploy keys") After importing a project by Git URL, dbt will generate a Deploy Key for your repository. To find the deploy key in dbt: 1. From dbt, click on your account name in the left side menu and select **Account settings**. 2. Go to **Projects** and select a project. 3. Click the **Repository** link to the repository details page. 4. Copy the key under the **Deploy Key** section. You must provide this Deploy Key in the Repository configuration of your Git host. Configure this Deploy Key to allow *read and write access* to the specified repositories. **Note**: Each dbt project will generate a different deploy key when connected to a repo, even if two projects are connected to the same repo. You will need to supply both deploy keys to your Git provider. #### GitHub[​](#github "Direct link to GitHub") Use GitHub? If you use GitHub, you can import your repo directly using [dbt's GitHub Application](https://docs.getdbt.com/docs/cloud/git/connect-github.md). Connecting your repo via the GitHub Application [enables Continuous Integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md). ##### Locate your SSH URL[​](#locate-your-ssh-url "Direct link to Locate your SSH URL") 1. Navigate to your repository on GitHub. 2.
Click the **Code** button. 3. Select the **SSH** tab to view the SSH clone URL. 4. Copy the URL and [convert it to the correct format](#converting-ssh-urls-to-the-correct-format) if needed. ##### Add a deploy key[​](#add-a-deploy-key "Direct link to Add a deploy key") To add a deploy key to a GitHub account, navigate to the Deploy keys tab of the settings page in your GitHub repository. * After supplying a name for the deploy key and pasting in your deploy key (generated by dbt), be sure to check the **Allow write access** checkbox. * After adding this key, dbt will be able to read and write files in your dbt project. * Refer to [Adding a deploy key in GitHub](https://github.blog/2015-06-16-read-only-deploy-keys/) [![Configuring a GitHub Deploy Key](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/cd7351c-Screen_Shot_2019-10-16_at_1.09.41_PM.png?v=2 "Configuring a GitHub Deploy Key")](#)Configuring a GitHub Deploy Key #### GitLab[​](#gitlab "Direct link to GitLab") Use GitLab? If you use GitLab, you can import your repo directly using [dbt's GitLab Application](https://docs.getdbt.com/docs/cloud/git/connect-gitlab.md). Connecting your repo via the GitLab Application [enables Continuous Integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md). * To add a deploy key to a GitLab account, navigate to the [SSH keys](https://gitlab.com/profile/keys) tab in the User Settings page of your GitLab account. * Next, paste in the deploy key generated by dbt for your repository. * After saving this SSH key, dbt will be able to read and write files in your GitLab repository. 
* Refer to [Adding a read only deploy key in GitLab](https://docs.gitlab.com/ee/user/project/deploy_keys/) [![Configuring a GitLab SSH Key](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/f3ea88d-Screen_Shot_2019-10-16_at_4.45.50_PM.png?v=2 "Configuring a GitLab SSH Key")](#)Configuring a GitLab SSH Key #### BitBucket[​](#bitbucket "Direct link to BitBucket") Use a deploy key to import your BitBucket repository into dbt. To preserve account security, use a service account to add the BitBucket deploy key and maintain the connection between your BitBucket repository and dbt. BitBucket links every repository commit and other Git actions (such as opening a pull request) to the email associated with the user's Bitbucket account. To add a deploy key to a BitBucket account: * Navigate to the **SSH keys** tab in the Personal Settings page of your BitBucket account. * Next, click the **Add key** button and paste in the deploy key generated by dbt for your repository. * After saving this SSH key, dbt will be able to read and write files in your BitBucket repository. [![Configuring a BitBucket SSH Key](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/bitbucket-ssh-key.png?v=2 "Configuring a BitBucket SSH Key")](#)Configuring a BitBucket SSH Key #### AWS CodeCommit[​](#aws-codecommit "Direct link to AWS CodeCommit") dbt can work with dbt projects hosted on AWS CodeCommit, but there are some extra steps needed compared to GitHub or other git providers. This guide will help you connect your CodeCommit-hosted dbt project to dbt. ###### Step 1: Create an AWS User for dbt[​](#step-1-create-an-aws-user-for-dbt "Direct link to Step 1: Create an AWS User for dbt") * To give dbt access to your repository, you'll first need to create an AWS IAM user for dbt. * Log into the AWS Console and navigate to the IAM section. * Click **Add User**, and create a new user by entering a unique and meaningful user name. * The user will need clone access to your repository.
You can do this by adding the **AWSCodeCommitPowerUser** permission during setup. ###### Step 2: Import your repository by name[​](#step-2-import-your-repository-by-name "Direct link to Step 2: Import your repository by name") * Open the AWS CodeCommit console and choose your repository. * Copy the SSH URL from that page. * Next, navigate to the **New Repository** page in dbt. * Choose the **Git Clone** tab, and paste in the SSH URL you copied from the console. * In the newly created Repository details page, you'll see a **Deploy Key** field. * Copy the contents of this field as you'll need it for [Step 3](#step-3-grant-dbt-cloud-aws-user-access). **Note:** The dbt-generated public key is the only key that will work in the next step. Any key generated outside of dbt will not work. ###### Step 3: Grant dbt AWS User access[​](#step-3-grant-dbt-aws-user-access "Direct link to Step 3: Grant dbt AWS User access") * Open up the newly created dbt user in the AWS IAM Console. * Choose the **Security Credentials** tab and then click **Upload SSH public key**. * Paste in the contents of the **Public Key** field from the dbt Repository page. * Once you've created the key, you'll see an **SSH key ID** for it. * **[Contact dbt Support](mailto:support@getdbt.com)** and share that field so that the dbt Support team can complete the setup process for you. ###### Step 4: Specify a custom branch in dbt[​](#step-4-specify-a-custom-branch-in-dbt "Direct link to Step 4: Specify a custom branch in dbt") CodeCommit uses `master` as its default branch, and to initialize your project, you'll need to specify the `master` branch as a [custom branch](https://docs.getdbt.com/faqs/Environments/custom-branch-settings.md#development) in dbt. * Go to **Deploy** -> **Environments** -> **Development**. * Select **Settings** -> **Edit** and under **General Settings**, check the **Default to a custom branch** checkbox. * Specify the custom branch as `master` and click **Save**.
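Steps 1 and 3 above can also be performed with the AWS CLI instead of the console. A sketch under stated assumptions: the user name `dbt-cloud-user` and the key file `dbt-deploy-key.pub` (a file containing the **Public Key** value copied from the dbt repository page) are illustrative names, and the commands require configured AWS credentials:

```shell
# Step 1 (sketch): create the IAM user and grant clone access.
# "dbt-cloud-user" is an illustrative name, not required by dbt.
aws iam create-user --user-name dbt-cloud-user
aws iam attach-user-policy \
  --user-name dbt-cloud-user \
  --policy-arn arn:aws:iam::aws:policy/AWSCodeCommitPowerUser

# Step 3 (sketch): upload the public key copied from the dbt repository page.
# The response includes the SSH key ID to share with dbt Support.
aws iam upload-ssh-public-key \
  --user-name dbt-cloud-user \
  --ssh-public-key-body file://dbt-deploy-key.pub
```

As in the console flow, only the dbt-generated public key will work here; the returned **SSH key ID** is what you share with dbt Support.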
###### Step 5: Configure pull request template URLs (Optional)[​](#step-5-configure-pull-request-template-urls-optional "Direct link to Step 5: Configure pull request template URLs (Optional)") To prevent users from directly merging code changes into the default branch, configure the [PR Template URL](https://docs.getdbt.com/docs/cloud/git/pr-template.md) in the **Repository details** page for your project. Once configured, dbt will prompt users to open a new PR after committing and syncing code changes on the branch in the Studio IDE, before merging any changes into the default branch. * Go to **Account Settings** -> **Projects** -> Select the project. * Click the repository link under **Repository**. * In the **Repository details** page, click **Edit** in the lower right. [![Configure PR template in the 'Repository details' page.](/img/docs/collaborate/repo-details.jpg?v=2 "Configure PR template in the 'Repository details' page.")](#)Configure PR template in the 'Repository details' page. * In the **Pull request URL** field, set the URL based on the suggested [PR template format](https://docs.getdbt.com/docs/cloud/git/pr-template.md#aws-codecommit). * Replace `` with the name of your repository (note that it's case-sensitive). In the following example, the repository name is `New_Repo`. [![In the Pull request URL field example, the repository name is 'New\_Repo'.](/img/docs/collaborate/pr-template-example.jpg?v=2 "In the Pull request URL field example, the repository name is 'New_Repo'.")](#)In the Pull request URL field example, the repository name is 'New\_Repo'. * After filling the **Pull request URL** field, click **Save**. 🎉 **You're all set!** Once dbt Support handles your request and you've set your custom branch, your project is ready to execute dbt runs on dbt. #### Azure DevOps[​](#azure-devops "Direct link to Azure DevOps") Use Azure DevOps?
If you use Azure DevOps and you are on the dbt Enterprise or Enterprise+ plan, you can import your repo directly using [dbt's Azure DevOps Integration](https://docs.getdbt.com/docs/cloud/git/connect-azure-devops.md). Connecting your repo via the Azure DevOps Application [enables Continuous Integration](https://docs.getdbt.com/docs/deploy/continuous-integration.md). 1. To add a deploy key to an Azure DevOps account, navigate to the **SSH public keys** page in the User Settings of your Azure DevOps account or a service user's account. 2. We recommend using a dedicated service user for the integration to ensure that dbt's connection to Azure DevOps is not interrupted by changes to user permissions. [![Navigate to the 'SSH public keys' settings page](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/52bfdaa-Screen_Shot_2020-03-09_at_4.13.20_PM.png?v=2 "Navigate to the 'SSH public keys' settings page")](#)Navigate to the 'SSH public keys' settings page 3. Next, click the **+ New Key** button to create a new SSH key for the repository. [![Click the '+ New Key' button to create a new SSH key for the repository.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/6d8e980-Screen_Shot_2020-03-09_at_4.13.27_PM.png?v=2 "Click the '+ New Key' button to create a new SSH key for the repository.")](#)Click the '+ New Key' button to create a new SSH key for the repository. 4. Select a descriptive name for the key and then paste in the deploy key generated by dbt for your repository. 5. After saving this SSH key, dbt will be able to read and write files in your Azure DevOps repository.
[![Enter and save the public key generated for your repository by dbt](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/d19f199-Screen_Shot_2020-03-09_at_4.13.50_PM.png?v=2 "Enter and save the public key generated for your repository by dbt")](#)Enter and save the public key generated for your repository by dbt #### Other git providers[​](#other-git-providers "Direct link to Other git providers") Don't see your git provider here? Please [contact dbt Support](mailto:support@getdbt.com) - we're happy to help you set up dbt with any supported git provider. #### Limited integration[​](#limited-integration "Direct link to Limited integration") Some features of dbt require a tight integration with your git host, for example, updating GitHub pull requests with dbt run statuses. Importing your project by a URL prevents you from using these features. Once you give dbt access to your repository, you can continue to set up your project by adding a connection and creating and running your first dbt job. #### FAQs[​](#faqs "Direct link to FAQs") How can I fix my .gitignore file? A `.gitignore` file specifies which files git should intentionally ignore or 'untrack'. dbt indicates untracked files in the project file explorer pane by putting the file or folder name in *italics*. If you encounter issues like problems reverting changes, checking out or creating a new branch, or not being prompted to open a pull request after a commit in the Studio IDE — this usually indicates a problem with the [.gitignore](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) file. The file may be missing or may lack the entries dbt needs to work correctly. The following sections describe how to fix the `.gitignore` file in:  Fix in the Studio IDE Note that adding the correct entries to your `.gitignore` file won't automatically remove (or 'untrack') files or folders that git has already tracked.
The updated `.gitignore` will only prevent new files or folders from being tracked. So you'll need to first fix the `.gitignore` file, then perform some additional git operations to untrack any incorrect files or folders. 1. Launch the Studio IDE into the project that is being fixed, by selecting **Develop** on the menu bar. 2. In your **File Explorer**, check to see if a `.gitignore` file exists at the root of your dbt project folder. If it doesn't exist, create a new file. 3. Open the new or existing `.gitignore` file, and add the following:

```bash
# ✅ Correct
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
```

* **Note** — You can place these lines anywhere in the file, as long as they're on separate lines. The lines shown are wildcards that will include all nested files and folders. Avoid adding a trailing `'*'` to the lines, such as `target/*`. For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com/docs/gitignore). 4. Save the changes but *don't commit*. 5. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right corner of the IDE screen and select **Restart IDE**. [![Restart the IDE by clicking the three dots on the lower right or click on the Status bar](/img/docs/dbt-cloud/cloud-ide/restart-ide.png?v=2 "Restart the IDE by clicking the three dots on the lower right or click on the Status bar")](#)Restart the IDE by clicking the three dots on the lower right or click on the Status bar 6. Once the Studio IDE restarts, go to the **File Catalog** to delete the following files or folders (if they exist). No data will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. **Save** and then **Commit and sync** the changes. 8. Restart the Studio IDE again using the same procedure as step 5. 9. Once the Studio IDE restarts, use the **Create a pull request** (PR) button under the **Version Control** menu to start the process of integrating the changes. 10.
When the git provider's website opens to a page with the new PR, follow the necessary steps to complete and merge the PR into the main branch of that repository. * **Note** — The 'main' branch might also be called 'master', 'dev', 'qa', 'prod', or something else depending on the organizational naming conventions. The goal is to merge these changes into the root branch that all other development branches are created from. 11. Return to the Studio IDE and use the **Change Branch** button to switch to the main branch of the project. 12. Once the branch has changed, click the **Pull from remote** button to pull in all the changes. 13. Verify the changes by making sure the files/folders in the `.gitignore` file are in italics. [![A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).](/img/docs/dbt-cloud/cloud-ide/gitignore-italics.png?v=2 "A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).")](#)A dbt project on the main branch that has properly configured gitignore folders (highlighted in italics).  Fix in the Git provider Sometimes it's necessary to use the git provider's web interface to fix a broken `.gitignore` file. Although the specific steps may vary across providers, the general process remains the same. There are two options for this approach: editing the main branch directly if allowed, or creating a pull request to implement the changes if required: * Edit in main branch * Unable to edit main branch When permissions allow it, it's possible to edit the `.gitignore` directly on the main branch of your repo. Follow these steps: 1. Go to your repository's web interface. 2. Switch to the main branch and the root directory of your dbt project. 3. Find the `.gitignore` file. Create a blank one if it doesn't exist. 4.
Edit the file in the web interface, adding the following entries:

```bash
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
```

5. Commit (save) the file. 6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. Commit (save) the deletions to the main branch. 8. Switch to the Studio IDE, and open the project that you're fixing. 9. [Rollback your repo to remote](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide) in the IDE by clicking on the three dots next to the **IDE Status** button on the lower right corner of the IDE screen, then select **Rollback to remote**. * **Note** — Rollback to remote resets your repo back to an earlier clone from your remote. Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt. 10. Once you rollback to remote, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. 11. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the `.gitignore` file are in *italics*. 12. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development! If you can't edit the `.gitignore` directly on the main branch of your repo, follow these steps: 1. Go to your repository's web interface. 2. Switch to an existing development branch, or create a new branch just for these changes (this is often faster and cleaner). 3. Find the `.gitignore` file. Create a blank one if it doesn't exist. 4. Edit the file in the web interface, adding the following entries:

```bash
target/
dbt_packages/
logs/
# legacy -- renamed to dbt_packages in dbt v1
dbt_modules/
```

5.
Commit (save) the file. 6. Delete the following folders from the dbt project root, if they exist. No data or code will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` 7. Commit (save) the deleted folders. 8. Open a merge request using the git provider's web interface. The merge request should attempt to merge the changes into the 'main' branch that all development branches are created from. 9. Follow the necessary procedures to get the branch approved and merged into the 'main' branch. You can delete the branch after the merge is complete. 10. Once the merge is complete, go back to the Studio IDE, and open the project that you're fixing. 11. [Rollback your repo to remote](https://docs.getdbt.com/docs/cloud/git/version-control-basics.md#the-git-button-in-the-cloud-ide) in the Studio IDE by clicking on the three dots next to the **Studio IDE Status** button on the lower right corner of the Studio IDE screen, then select **Rollback to remote**. * **Note** — Rollback to remote resets your repo back to an earlier clone from your remote. Any saved but uncommitted changes will be lost, so make sure you copy any modified code that you want to keep in a temporary location outside of dbt. 12. Once you rollback to remote, open the `.gitignore` file in the branch you're working in. If the new changes aren't included, you'll need to merge the latest commits from the main branch into your working branch. 13. Go to the **File Explorer** to verify the `.gitignore` file contains the correct entries and make sure the untracked files/folders in the `.gitignore` file are in *italics*. 14. Great job 🎉! You've configured the `.gitignore` correctly and can continue with your development! For additional guidance, refer to this [detailed video](https://www.loom.com/share/9b3b8e2b617f41a8bad76ec7e42dd014). How to migrate git providers To migrate from one git provider to another, follow these steps to minimize disruption: 1.
Outside of dbt, you'll need to import your existing repository into your new provider. By default, connecting your repository in one account won't automatically disconnect it from another account. For example, if you're migrating from GitHub to Azure DevOps, you'll need to import your existing repository (GitHub) into your new Git provider (Azure DevOps). For detailed steps on how to do this, refer to your Git provider's documentation (such as [GitHub](https://docs.github.com/en/migrations/importing-source-code/using-github-importer/importing-a-repository-with-github-importer), [GitLab](https://docs.gitlab.com/ee/user/project/import/repo_by_url.html), or [Azure DevOps](https://learn.microsoft.com/en-us/azure/devops/repos/git/import-git-repository?view=azure-devops)). 2. Go back to dbt and set up your [integration for the new Git provider](https://docs.getdbt.com/docs/cloud/git/git-configuration-in-dbt-cloud.md), if needed. 3. Disconnect the old repository in dbt by going to **Account Settings** and then **Projects**. 4. Click on the **Repository** link, then click **Edit** and **Disconnect**. [![Disconnect and reconnect your Git repository in your dbt Account settings page.](/img/docs/dbt-cloud/disconnect-repo.png?v=2 "Disconnect and reconnect your Git repository in your dbt Account settings page.")](#)Disconnect and reconnect your Git repository in your dbt Account settings page. 5. Click **Confirm Disconnect**. 6. On the same page, connect to the new Git provider repository by clicking **Configure Repository**. * If you're using the native integration, you may need to complete an OAuth flow. 7. That's it, you should now be connected to the new Git provider! 🎉 Note — As a tip, we recommend you refresh your page and the Studio IDE before performing any actions.
--- ##### Connect with managed repository Managed repositories are a great way to trial dbt without needing to create a new repository. If you don't already have a Git repository for your dbt project, you can let dbt host and manage a repository for you. If in the future you choose to host this repository elsewhere, you can export the information from dbt at any time. Refer to [Move from a managed repository to a self-hosted repository](https://docs.getdbt.com/faqs/Git/managed-repo.md) for more information on how to do that. info dbt Labs recommends against using a managed repository in a production environment. You can't use Git features like pull requests, which are part of our recommended version control best practices. To set up a project with a managed repository: 1. From your **Account settings** in dbt, select the project you want to set up with a managed repository. If the project already has a repository set up, you need to edit the repository settings and disconnect the existing repository. 2. Click **Edit** for the project. 3. Under Repository, click **Configure repository**. 4. Select **Managed**. 5. Enter a name for the repository. For example, "analytics" or "dbt-models." 6. Click **Create**. [![Adding a managed repository](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/managed-repo.png?v=2 "Adding a managed repository")](#)Adding a managed repository #### Download managed repository[​](#download-managed-repository "Direct link to Download managed repository") To download a copy of your managed repository from dbt to your local machine: 1.
Use the **Project** selector on the main left-side menu to navigate to a project that's using a managed repository. 2. Click **Dashboard** from the main left-side menu. 3. From the dashboard, click **Settings**. 4. Locate the **Repository** field and click the hyperlink for the repo. 5. Below the **Deploy key** you will find the **Download repository** option. Click the button to download. If you don't see this option, you're either not assigned a [permission set](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#account-permissions) with `write` access to Git repositories, or you don't have a managed repo for your project. [![The download button for a managed repo.](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/download-managed-repo.png?v=2 "The download button for a managed repo.")](#)The download button for a managed repo. --- ##### Merge conflicts [Merge conflicts](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/about-merge-conflicts) in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) often occur when multiple users are simultaneously making edits to the same section in the same file. This makes it difficult for Git to decide what changes to incorporate in the final merge. The merge conflict process lets users choose which lines of code they'd like to preserve and commit. This document will show you how to resolve merge conflicts in the Studio IDE.
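If you want to see what git actually writes into a file during a conflict, you can reproduce one in a throwaway local repository (plain git, outside dbt; the file name `model.sql` and branch name `feature` are illustrative):

```shell
# Reproduce a merge conflict in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email demo@example.com
git config user.name demo

echo "select 1 as id" > model.sql
git add model.sql
git commit -q -m "initial"

# A teammate's branch edits the same line...
git checkout -q -b feature
echo "select 2 as id" > model.sql
git commit -q -a -m "feature change"

# ...while main edits it too.
git checkout -q main
echo "select 3 as id" > model.sql
git commit -q -a -m "main change"

# The merge stops with a conflict, and git writes
# <<<<<<< / ======= / >>>>>>> markers into model.sql.
git merge feature || true
cat model.sql
```

The markers printed by `cat` are exactly what the Studio IDE highlights for you in the split editor view described below.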
#### Identify merge conflicts[​](#identify-merge-conflicts "Direct link to Identify merge conflicts") You can experience a merge conflict in two possible ways: * Pulling changes from your main branch when someone else has merged a conflicting change. * Committing your changes to the same branch when someone else has already committed their change first. The way to [resolve](#resolve-merge-conflicts) either scenario is exactly the same. For example, if you and a teammate make changes to the same file and commit, you will encounter a merge conflict as soon as you **Commit and sync**. The Studio IDE will display: * **Commit and resolve** git action bar under **Version Control** instead of **Commit** — This indicates that the Studio IDE has detected some conflicts that you need to address. * A 2-split editor view — The left view includes your code changes and is read-only. The right view includes the additional changes, allows you to edit, and marks the conflict with conflict markers:

```text
<<<<<<< HEAD
your current code
=======
conflicting code
>>>>>>> (some branch identifier)
```

* The file and path colored in red in the **File Catalog**, with a warning icon to highlight files that you need to resolve. * The file name colored in red in the **Changes** section, with a warning icon. * If you press commit without resolving the conflict, the Studio IDE will display a pop-up box listing the files that need to be resolved.
[![Conflicting section that needs resolution will be highlighted](/img/docs/dbt-cloud/cloud-ide/merge-conflict.png?v=2 "Conflicting section that needs resolution will be highlighted")](#)Conflicting section that needs resolution will be highlighted

[![Pop up box when you commit without resolving the conflict](/img/docs/dbt-cloud/cloud-ide/commit-without-resolve.png?v=2 "Pop up box when you commit without resolving the conflict")](#)Pop up box when you commit without resolving the conflict

#### Resolve merge conflicts[​](#resolve-merge-conflicts "Direct link to Resolve merge conflicts")

You can seamlessly resolve merge conflicts that involve competing line changes in the Studio IDE.

1. In the Studio IDE, edit the right side of the conflict file, choose which lines of code you'd like to preserve, and delete the rest.
   * Note: The left view editor is read-only and you cannot make changes there.
2. Delete the conflict markers `<<<<<<<`, `=======`, and `>>>>>>>` that highlight the merge conflict.
3. If you have more than one merge conflict in your file, scroll down to the next set of conflict markers and repeat steps one and two.
4. Press **Save**. The line highlights disappear and return to a plain background, which means you've resolved the conflict successfully.
5. Repeat this process for every file that has a merge conflict.

[![Choosing lines of code to preserve](/img/docs/dbt-cloud/cloud-ide/resolve-conflict.png?v=2 "Choosing lines of code to preserve")](#)Choosing lines of code to preserve

Edit conflict files

* If you open the conflict file under **Changes**, the file name displays something like `model.sql (last commit)` and the file is fully read-only.
* If you open the conflict file under **File Catalog**, you can edit the file in the right view.

#### Commit changes[​](#commit-changes "Direct link to Commit changes")

When you've resolved all the merge conflicts, the last step is to commit the changes you've made.

1. Click the git action bar **Commit and resolve**.
2. The **Commit Changes** pop-up box will confirm that all conflicts have been resolved. Write your commit message and click **Commit Changes**.
3. The Studio IDE will return to its normal state and you can continue developing!

[![Conflict has been resolved](/img/docs/dbt-cloud/cloud-ide/commit-resolve.png?v=2 "Conflict has been resolved")](#)Conflict has been resolved

[![Commit Changes pop up box to commit your changes](/img/docs/dbt-cloud/cloud-ide/commit-changes.png?v=2 "Commit Changes pop up box to commit your changes")](#)Commit Changes pop up box to commit your changes

---

##### PR template

#### Configure pull request (PR) template URLs[​](#configure-pull-request-pr-template-urls "Direct link to Configure pull request (PR) template URLs")

When you commit changes to a branch in the Studio IDE, dbt can prompt users to open a new pull request for the code changes. To enable this functionality, ensure that a PR Template URL is configured in the **Repository details** page in your **Account Settings**. If this setting is blank, the Studio IDE will prompt users to merge the changes directly into their default branch.
[![Configure a PR template in the 'Repository details' page.](/img/docs/collaborate/repo-details.jpg?v=2 "Configure a PR template in the 'Repository details' page.")](#)Configure a PR template in the 'Repository details' page.

##### PR Template URL by git provider[​](#pr-template-url-by-git-provider "Direct link to PR Template URL by git provider")

The PR Template URL setting will be automatically set for most repositories, depending on the connection method.

* If you connect to your repository via in-app integrations with your git provider or the "Git Clone" method via SSH, this URL setting will be auto-populated and editable.
* For AWS CodeCommit, this URL setting isn't auto-populated and must be [manually configured](https://docs.getdbt.com/docs/cloud/git/import-a-project-by-git-url.md#step-5-configure-pull-request-template-urls-optional).
* If you connect via a dbt [Managed repository](https://docs.getdbt.com/docs/cloud/git/managed-repository.md), this URL will not be set, and the Studio IDE will prompt users to merge the changes directly into their default branch.

The PR template URL supports two variables that can be used to build a URL string. These variables, `{{source}}` and `{{destination}}`, return branch names based on the state of the configured environment and the active branch open in the IDE. The `{{source}}` variable represents the active development branch, and the `{{destination}}` variable represents the configured base branch for the environment, e.g. `master`. A typical PR build URL looks like:

Template:

```text
https://github.com/dbt-labs/jaffle_shop/compare/{{destination}}..{{source}}
```

Rendered:

```text
https://github.com/dbt-labs/jaffle_shop/compare/master..my-branch
```

#### Example templates[​](#example-templates "Direct link to Example templates")

Some common URL templates are provided below, but note that the exact value may vary depending on your configured git provider.
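As a sketch of what this substitution does, the rendering above can be reproduced with plain shell parameter expansion (the branch names are the example values from this page; this is illustrative, not dbt's implementation):

```shell
# Substitute {{source}} and {{destination}} into a PR template URL
template='https://github.com/dbt-labs/jaffle_shop/compare/{{destination}}..{{source}}'
source_branch='my-branch'    # active development branch
destination_branch='master'  # environment's configured base branch
url=${template//'{{source}}'/$source_branch}
url=${url//'{{destination}}'/$destination_branch}
echo "$url"   # https://github.com/dbt-labs/jaffle_shop/compare/master..my-branch
```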
##### GitHub[​](#github "Direct link to GitHub")

```text
https://github.com/<org>/<repo>/compare/{{destination}}..{{source}}
```

If you're using GitHub Enterprise, your template may look something like:

```text
https://git.<company>.com/<org>/<repo>/compare/{{destination}}..{{source}}
```

##### GitLab[​](#gitlab "Direct link to GitLab")

```text
https://gitlab.com/<org>/<repo>/-/merge_requests/new?merge_request[source_branch]={{source}}&merge_request[target_branch]={{destination}}
```

##### Bitbucket[​](#bitbucket "Direct link to BitBucket")

```text
https://bitbucket.org/<org>/<repo>/pull-requests/new?source={{source}}&dest={{destination}}
```

If you're using Bitbucket Server or Data Center, your template may look something like:

```text
https://<bitbucket-server-host>/projects/<project>/repos/<repo>/pull-requests?create&sourceBranch={{source}}&targetBranch={{destination}}
```

##### AWS CodeCommit[​](#aws-codecommit "Direct link to AWS CodeCommit")

```text
https://console.aws.amazon.com/codesuite/codecommit/repositories/<repo>/pull-requests/new/refs/heads/{{destination}}/.../refs/heads/{{source}}
```

##### Azure DevOps[​](#azure-devops "Direct link to Azure DevOps")

```text
https://dev.azure.com/<org>/<project>/_git/<repo>/pullrequestcreate?sourceRef={{source}}&targetRef={{destination}}
```
---

##### Set up Azure DevOps

### Set up Azure DevOps [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

#### Service principal overview[​](#service-principal-overview "Direct link to Service principal overview")

note

If this is your first time setting up an Entra app as a service principal, refer to the [Microsoft documentation](https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-service-principal-portal) for any prerequisite steps you may need to take to prepare.

To use dbt's native integration with Azure DevOps, an account admin needs to set up a Microsoft Entra ID app as a service principal. We recommend setting up a separate [Entra ID application from the one used for SSO](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md).

The application's service principal represents the Entra ID application object. While a "service user" represents a real user in Azure with an Entra ID (and an applicable license), the "service principal" is a secure identity used by an application to access Azure resources unattended. The service principal authenticates with a client ID and secret rather than a username and password (or any other form of user auth). Service principals are the [Microsoft recommended method](https://learn.microsoft.com/en-us/entra/architecture/secure-service-accounts#types-of-microsoft-entra-service-accounts) for authenticating apps.

1. [Register an Entra ID app](#register-a-microsoft-entra-id-app).
2. [Connect Azure DevOps to your new app](#connect-azure-devops-to-your-new-app).
3. [Add your Entra ID app to dbt](#connect-your-microsoft-entra-id-app-to-dbt).
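The client-ID-and-secret authentication described above is the standard OAuth 2.0 client-credentials grant. Here's a rough sketch of the request shape, for orientation only (all values are placeholders, nothing is sent, and the scope GUID is Microsoft's well-known Azure DevOps resource ID; this is not dbt's internal implementation):

```shell
# Placeholders for values from the registered Entra ID app
TENANT_ID="<directory-tenant-id>"
CLIENT_ID="<application-client-id>"
CLIENT_SECRET="<client-secret-value>"
# Entra ID token endpoint for the app's tenant
TOKEN_ENDPOINT="https://login.microsoftonline.com/${TENANT_ID}/oauth2/v2.0/token"
# Form-encoded body of a client-credentials token request
BODY="grant_type=client_credentials&client_id=${CLIENT_ID}&client_secret=${CLIENT_SECRET}&scope=499b84ac-1321-427f-aa17-267ca6975798/.default"
echo "POST ${TOKEN_ENDPOINT}"
echo "${BODY}"
```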
Once the Microsoft Entra ID app is added to dbt, it will act as a [service principal](https://learn.microsoft.com/en-us/entra/identity-platform/app-objects-and-service-principals?tabs=browser) used to power headless actions in dbt such as deployment runs and CI. dbt developers can then personally authenticate in dbt from Azure DevOps. For more, see [Authenticate with Azure DevOps](https://docs.getdbt.com/docs/cloud/git/authenticate-azure.md).

The following personas are required to complete the steps on this page:

* Microsoft Entra ID admin
* Azure DevOps admin
* dbt account admin
* Azure admin (if your Entra ID and Azure DevOps environments are not connected)

#### Register a Microsoft Entra ID app[​](#register-a-microsoft-entra-id-app "Direct link to Register a Microsoft Entra ID app")

A Microsoft Entra ID admin needs to perform the following steps:

1. Sign into your Azure portal and click **Microsoft Entra ID**.
2. Select **App registrations** in the left panel.
3. Select **New registration**. The form for creating a new Entra ID app opens.
4. Provide a name for your app. We recommend using "dbt Labs Azure DevOps app".
5. Select **Accounts in any organizational directory (Any Entra ID directory - Multitenant)** as the Supported Account Types. This step is frequently done incorrectly: Microsoft considers Azure DevOps (formerly called Visual Studio) and Microsoft Entra ID separate tenants, and for the Entra ID application to work properly, you must select Multitenant.
6. Set **Redirect URI** to **Web**. Copy and paste the Redirect URI from dbt into the next field. To find the Redirect URI in dbt:
   1. In dbt, navigate to **Account Settings** -> **Integrations**.
   2. Click the **edit icon** next to **Azure DevOps**.
   3. Copy the first **Redirect URIs** value, which looks like `https://YOUR_ACCESS_URL/complete/azure_active_directory` and does NOT end with `service_user`.
7.
Click **Register**.

Here's what your app should look like before registering it:

Registering a Microsoft Entra ID app

#### Create a client secret[​](#create-a-client-secret "Direct link to Create a client secret")

A Microsoft Entra ID admin needs to complete the following steps:

1. Navigate to **Microsoft Entra ID**, click **App registrations**, and click on your app.
2. Select **Certificates and Secrets** from the left navigation panel.
3. Select **Client secrets** and click **New client secret**.
4. Give the secret a description and select the expiration time. Click **Add**.
5. Copy the **Value** field and securely share it with the dbt account admin, who will complete the setup.

#### Create the app's service principal[​](#create-the-apps-service-principal "Direct link to Create the app's service principal")

After you've created the app, you need to verify whether it has a service principal. In many cases, if this has been configured before, new apps will get one assigned upon creation.

1. Navigate to **Microsoft Entra ID**.
2. Under **Manage** on the left-side menu, click **App registrations**.
3. Click the app for the dbt and Azure DevOps integration.
4. Locate the **Managed application in local directory** field and, if it has the option, click **Create Service Principal**. If the field is already populated, a service principal has already been assigned.

[![Example of the 'Create Service Principal' option highlighted.](/img/docs/cloud-integrations/create-service-principal.png?v=2 "Example of the 'Create Service Principal' option highlighted.")](#)Example of the 'Create Service Principal' option highlighted.

#### Add permissions to your service principal[​](#add-permissions-to-your-service-principal "Direct link to Add permissions to your service principal")

An Entra ID admin needs to provide your new app access to Azure DevOps:

1.
Select **API permissions** in the left navigation panel.
2. Remove the **Microsoft Graph / User Read** permission.
3. Click **Add a permission**.
4. Select **Azure DevOps**.
5. Select the **user\_impersonation** permission. This is the only permission available for Azure DevOps.

#### Connect Azure DevOps to your new app[​](#connect-azure-devops-to-your-new-app "Direct link to Connect Azure DevOps to your new app")

An Azure admin will need one of the following permissions in both the Microsoft Entra ID and Azure DevOps environments:

* Azure Service Administrator
* Azure Co-administrator

note

You can only add a managed identity or service principal for the tenant to which your organization is connected. You need to add a directory to your organization so that it can access all the service principals and other identities. Navigate to **Organization settings** --> **Microsoft Entra** --> **Connect Directory** to connect.

1. From your Azure DevOps organization screen, click **Organization settings** in the bottom left.
2. Under **General** settings, click **Users**.
3. Click **Add users**, and in the resulting panel, enter the service principal's name in the first field. Then, click the name when it appears below the field.
4. In the **Add to projects** field, click the boxes for any projects you want to include (or select all).
5. Set the **Azure DevOps Groups** to **Project Administrator**.

[![Example setup with the service principal added as a user.](/img/docs/dbt-cloud/connecting-azure-devops/add-service-principal.png?v=2 "Example setup with the service principal added as a user.")](#)Example setup with the service principal added as a user.

#### Connect your Microsoft Entra ID app to dbt[​](#connect-your-microsoft-entra-id-app-to-dbt "Direct link to Connect your Microsoft Entra ID app to dbt")

A dbt account admin must take the following actions. Once you connect your Microsoft Entra ID app and Azure DevOps, you must provide dbt information about the app.
If this is a first-time setup, you will create a new configuration. If you are [migrating from a service user](#migrate-to-service-principal), you can edit an existing configuration and change it to **Service principal**.

To create the configuration:

1. Navigate to your account settings in dbt.
2. Select **Integrations**.
3. Scroll to the Azure DevOps section and click the **Edit icon**.
4. Select the **Service principal** option (service user configurations will auto-complete the fields, if applicable).
5. Complete/edit the form (if you are migrating, the existing configurations carry over):
   * **Azure DevOps Organization:** Must match the name of your Azure DevOps organization exactly. Do not include the `dev.azure.com/` prefix in this field. ✅ Use `my-DevOps-org` ❌ Avoid `dev.azure.com/my-DevOps-org`
   * **Application (client) ID:** Found in the Microsoft Entra ID app.
   * **Client Secret:** Copy the **Value** field from the Microsoft Entra ID app's client secrets and paste it into the **Client Secret** field in dbt. Entra ID admins are responsible for the expiration of the app secret, and dbt admins should note the expiration date for rotation.
   * **Directory (tenant) ID:** Found in the Microsoft Entra ID app.

[![Fields for adding Entra ID app to dbt.](/img/docs/cloud-integrations/service-principal-fields.png?v=2 "Fields for adding Entra ID app to dbt.")](#)Fields for adding Entra ID app to dbt.

Your Microsoft Entra ID app should now be added to your dbt account. People on your team who want to develop in the Studio IDE or dbt CLI can now personally [authorize Azure DevOps from their profiles](https://docs.getdbt.com/docs/cloud/git/authenticate-azure.md).

#### Migrate to service principal[​](#migrate-to-service-principal "Direct link to Migrate to service principal")

Migrate from a service user to a service principal using the existing app. It only takes a few steps, and you won't experience any service disruptions.
* Verify whether your app has a service principal
* If not, create the app's service principal
* Update the application's configuration
* Update the configuration in dbt

##### Verify the service principal[​](#verify-the-service-principal "Direct link to Verify the service principal")

You will need an Entra ID admin to complete these steps. To confirm whether your existing app already has a service principal:

1. In the Azure account, navigate to **Microsoft Entra ID** -> **Manage** -> **App registrations**.
2. Click on the application for the service user integration with dbt.
3. Verify whether a name populates the **Managed application in local directory** field.
   * If a name exists: The service principal has been created. Move on to step 4.
   * If no name exists: Go to the next section, [Create the service principal](#create-the-service-principal).
4. Follow the instructions to [add permissions](#add-permissions-to-your-service-principal) to your service principal.
5. Follow the instructions to [connect DevOps to your app](#connect-azure-devops-to-your-new-app).
6. In your dbt account:
   1. Navigate to **Account settings** and click **Integrations**.
   2. Click the **edit icon** to the right of the **Azure DevOps** settings.
   3. Change **Service user** to **Service principal** and click **Save**. You do not need to edit any existing fields.

##### Create the service principal[​](#create-the-service-principal "Direct link to Create the service principal")

If no name populates the **Managed application in local directory** field, a service principal does not exist, and you need to create one. Take the following actions in your Azure account:

1. Navigate to **Microsoft Entra ID**.
2. Under **Manage** on the left-side menu, click **App registrations**.
3. Click the app for the dbt and Azure DevOps integration.
4. Locate the **Managed application in local directory** field and click **Create Service Principal**.
[![Example of the 'Create Service Principal' option highlighted.](/img/docs/cloud-integrations/create-service-principal.png?v=2 "Example of the 'Create Service Principal' option highlighted.")](#)Example of the 'Create Service Principal' option highlighted.

5. Follow the instructions to [add permissions](#add-permissions-to-your-service-principal) to your service principal.
6. Follow the instructions to [connect DevOps to your app](#connect-azure-devops-to-your-new-app).
7. In your dbt account:
   1. Navigate to **Account settings** and click **Integrations**.
   2. Click the **edit icon** to the right of the **Azure DevOps** settings.
   3. Change **Service user** to **Service principal** and click **Save**. You do not need to edit any existing fields.

---

##### Set up Azure DevOps with Service User

#### Service user overview[​](#service-user-overview "Direct link to Service user overview")

important

Service users are no longer a recommended method for authentication, and dbt is rolling out a new [Entra ID service principal](https://learn.microsoft.com/en-us/entra/identity-platform/app-objects-and-service-principals) option. Once the option is available in your account settings, you should plan to [migrate from service user to service principal](https://docs.getdbt.com/docs/cloud/git/setup-service-principal.md#migrate-to-service-principal). Service principals are the [Microsoft recommended service account type](https://learn.microsoft.com/en-us/entra/architecture/secure-service-accounts#types-of-microsoft-entra-service-accounts) for app authentication.
To use our native integration with Azure DevOps in dbt, an account admin needs to set up a Microsoft Entra ID app. We recommend setting up a separate [Entra ID application from the one used for SSO](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md).

1. [Register an Entra ID app](#register-a-microsoft-entra-id-app).
2. [Add permissions to your new app](#add-permissions-to-your-new-app).
3. [Add another redirect URI](#add-another-redirect-uri).
4. [Connect Azure DevOps to your new app](#connect-azure-devops-to-your-new-app).
5. [Add your Entra ID app to dbt](#add-your-azure-ad-app-to-dbt-cloud).

Once the Microsoft Entra ID app is added to dbt, an account admin must also [connect a service user](#connecting-a-service-user) via OAuth, which will be used to power headless actions in dbt such as deployment runs and CI. Once the Microsoft Entra ID app is added to dbt and the service user is connected, dbt developers can personally authenticate in dbt from Azure DevOps. For more on this, see [Authenticate with Azure DevOps](https://docs.getdbt.com/docs/cloud/git/authenticate-azure.md).

The following personas are required to complete the steps on this page:

* Microsoft Entra ID admin
* Azure DevOps admin
* dbt account admin
* Azure admin (if your Entra ID and Azure DevOps environments are not connected)

#### Register a Microsoft Entra ID app[​](#register-a-microsoft-entra-id-app "Direct link to Register a Microsoft Entra ID app")

A Microsoft Entra ID admin needs to perform the following steps:

1. Sign into your Azure portal and click **Microsoft Entra ID**.
2. Select **App registrations** in the left panel.
3. Select **New registration**. The form for creating a new Entra ID app opens.
4. Provide a name for your app. We recommend using "dbt Labs Azure DevOps app".
5. Select **Accounts in any organizational directory (Any Entra ID directory - Multitenant)** as the Supported Account Types.
This step is frequently done incorrectly: Microsoft considers Azure DevOps (formerly called Visual Studio) and Microsoft Entra ID separate tenants, and for this Entra ID application to work properly, you must select Multitenant.
6. Add a redirect URI.
   1. Select **Web** as the platform.
   2. In the field, enter `https://YOUR_ACCESS_URL/complete/azure_active_directory`. Make sure to replace `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan.
7. Click **Register**.

[![Navigating to the Entra ID app registrations](/img/docs/dbt-cloud/connecting-azure-devops/ADnavigation.gif?v=2 "Navigating to the Entra ID app registrations")](#)Navigating to the Entra ID app registrations

Here's what your app should look like before registering it:

Registering a Microsoft Entra ID app

#### Add permissions to your new app[​](#add-permissions-to-your-new-app "Direct link to Add permissions to your new app")

An Entra ID admin needs to provide your new app access to Azure DevOps:

1. Select **API permissions** in the left navigation panel.
2. Remove the **Microsoft Graph / User Read** permission.
3. Click **Add a permission**.
4. Select **Azure DevOps**.
5. Select the **user\_impersonation** permission. This is the only permission available for Azure DevOps.

[![Adding permissions to the app](/img/docs/dbt-cloud/connecting-azure-devops/user-impersonation.gif?v=2 "Adding permissions to the app")](#)Adding permissions to the app

#### Add another redirect URI[​](#add-another-redirect-uri "Direct link to Add another redirect URI")

A Microsoft Entra ID admin needs to add another redirect URI to your Entra ID application.
This redirect URI will be used to authenticate the service user for headless actions in deployment environments. Before adding another redirect URI, make sure you selected **Web** as the platform when you [registered the Microsoft Entra ID app](#register-a-microsoft-entra-id-app).

1. Navigate to your Microsoft Entra ID application.
2. Select the link next to **Redirect URIs**.
3. Click **Add URI** and add the URI, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan: `https://YOUR_ACCESS_URL/complete/azure_active_directory_service_user`
4. Click **Save**.

[![Adding the Service User redirect URI](/img/docs/dbt-cloud/connecting-azure-devops/redirect-uri.gif?v=2 "Adding the Service User redirect URI")](#)Adding the Service User redirect URI

#### Create a client secret[​](#create-a-client-secret "Direct link to Create a client secret")

A Microsoft Entra ID admin needs to complete the following steps:

1. Navigate to your Microsoft Entra ID application.
2. Select **Certificates and Secrets** from the left navigation panel.
3. Select **Client secrets** and click **New client secret**.
4. Give the secret a description and select the expiration time. Click **Add**.
5. Copy the **Value** field and securely share it with the dbt account admin who will complete the setup.

#### Connect Azure DevOps to your new app[​](#connect-azure-devops-to-your-new-app "Direct link to Connect Azure DevOps to your new app")

An Azure admin will need one of the following permissions in both the Microsoft Entra ID and Azure DevOps environments:

* Azure Service Administrator
* Azure Co-administrator

If your Azure DevOps account is connected to Entra ID, you can proceed to [Connecting a service user](#connecting-a-service-user). However, if you're just getting set up, connect Azure DevOps to the Microsoft Entra ID app you just created:

1.
From your Azure DevOps account, select **Organization settings** in the bottom left.
2. Navigate to **Microsoft Entra ID**.
3. Click **Connect directory**.
4. Select the directory you want to connect.
5. Click **Connect**.

Connecting Azure DevOps and Microsoft Entra ID

#### Add your Microsoft Entra ID app to dbt[​](#add-your-microsoft-entra-id-app-to-dbt "Direct link to Add your Microsoft Entra ID app to dbt")

A dbt account admin needs to perform the following steps. Once you connect your Microsoft Entra ID app and Azure DevOps, you need to provide dbt information about the app:

1. Navigate to your account settings in dbt.
2. Select **Integrations**.
3. Scroll to the Azure DevOps section and click on the pencil icon to edit the integration.
4. Complete the form:
   * **Azure DevOps Organization:** Must match the name of your Azure DevOps organization exactly. Do not include the `dev.azure.com/` prefix in this field. ✅ Use `my-devops-org` ❌ Avoid `dev.azure.com/my-devops-org`
   * **Application (client) ID:** Found in the Microsoft Entra ID app.
   * **Client Secret:** Copy the **Value** field from the Microsoft Entra ID app's client secrets and paste it into the **Client Secret** field in dbt. Entra ID admins are responsible for the Entra ID app secret expiration, and dbt admins should note the expiration date for rotation.
   * **Directory (tenant) ID:** Found in the Microsoft Entra ID app.

[![Adding a Microsoft Entra ID app to dbt](/img/docs/dbt-cloud/connecting-azure-devops/AzureDevopsAppdbtCloud.gif?v=2 "Adding a Microsoft Entra ID app to dbt")](#)Adding a Microsoft Entra ID app to dbt

Your Microsoft Entra ID app should now be added to your dbt account. People on your team who want to develop in the Studio IDE or dbt CLI can now personally [authorize Azure DevOps from their profiles](https://docs.getdbt.com/docs/cloud/git/authenticate-azure.md).
#### Connect a service user[​](#connect-a-service-user "Direct link to Connect a service user")

A service user is a pseudo user set up the same way an admin would set up a real user, but it's given permissions specifically scoped for service-to-service interactions. You should avoid linking authentication to a real Azure DevOps user because if that person leaves your organization, dbt will lose access to the Azure DevOps repositories, causing production runs to fail.

Service user authentication expiration

dbt will refresh the authentication for the service user on each run triggered by the scheduler, API, or CI. If your account does not have any active runs for over 90 days, an admin will need to manually refresh the service user's authentication by disconnecting and reconnecting the service user's profile via the OAuth flow described above, in order to resume headless interactions like project setup, deployment runs, and CI.

##### Service users permissions[​](#service-users-permissions "Direct link to Service users permissions")

A service user account must have the following Azure DevOps permissions for all Azure DevOps projects and repos you want accessible in dbt. Read more about how dbt uses each permission in the following paragraphs.

* **Project Reader**
* **ViewSubscriptions**
* **EditSubscriptions**
* **DeleteSubscriptions** \*
* **PullRequestContribute**
* **GenericContribute**

\* Note: The **DeleteSubscriptions** permission might be included in **EditSubscriptions**, depending on your version of Azure.

Some of these permissions are only accessible via the [Azure DevOps API](https://docs.microsoft.com/en-us/azure/devops/organizations/security/namespace-reference?view=azure-devops) or [CLI](https://learn.microsoft.com/en-us/cli/azure/devops?view=azure-cli-latest). We’ve also detailed more information on Azure DevOps API usage below to help accelerate the setup.
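The three subscription permissions above are bit flags in a single ServiceHooks security namespace (bit values 1, 2, and 4, as shown in the reference sections on this page). A small sketch of how such allow-bits combine into one mask; whether your az CLI version accepts a combined `--allow-bit` value is an assumption you should verify:

```shell
# ServiceHooks namespace permission bits (from the reference sections on this page)
VIEW_SUBSCRIPTIONS=1
EDIT_SUBSCRIPTIONS=2
DELETE_SUBSCRIPTIONS=4
# Bitwise OR combines them into a single allow mask
MASK=$((VIEW_SUBSCRIPTIONS | EDIT_SUBSCRIPTIONS | DELETE_SUBSCRIPTIONS))
echo "$MASK"   # 7
```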
Alternatively, you can use the Azure DevOps UI to enable permissions, but you cannot get the least-permissioned set that way.

* Required permissions for service users
* Turn off MFA for service user

The service user's permissions also determine which repositories a team can select from during dbt project setup, so an Azure DevOps admin must grant at minimum Project Reader access to the service user *before* creating a new project in dbt. If you are migrating an existing dbt project to use the native Azure DevOps integration, the dbt account's service user must have proper permissions on the repository before migration.

While it's common to enforce multi-factor authentication (MFA) for normal user accounts, service user authentication must not require an extra factor. If you enable a second factor for the service user, it can interrupt production runs and cause a failure to clone the repository. For the OAuth access token to work, remove any additional proof-of-identity requirements from the service user. As a result, MFA must be explicitly disabled in the Office 365 or Microsoft Entra ID administration panel for the service user. Simply leaving it "un-connected" is not sufficient; dbt will be prompted to set up MFA instead of being allowed to use the credentials as intended.

**To disable MFA for a single user using the Office 365 Administration console:**

* Go to Microsoft 365 admin center -> Users -> Active users -> Select the user -> Manage multifactor authentication -> Select the user -> Disable multi-factor authentication.

**To use the Microsoft Entra ID interface:**

Note: this procedure involves disabling Security Defaults in your Entra ID environment.

1. Go to the Azure Admin Center. Open Microsoft Entra ID and, under the **Manage** section of the left navigation, click **Properties**. Scroll down to **Manage Security defaults**, select **No** under "Enable Security Defaults", and click **Save**.
2.
Go to **Microsoft Entra ID** -> Manage -> Users -> click the ellipsis (...) and then the **Multi-Factor Authentication** link. If the link is grayed out, make sure you disabled **Security Defaults** in the previous step.
3. If MFA is enabled for users, select the user(s) and select **Disable** under **Quick steps**.
4. Select **Yes** to confirm your changes.

To re-enable MFA for the user, select them again and click **Enable**. Note that you may have to go through MFA setup for that user after enabling it.

**ViewSubscriptions**
**Security Namespace ID:** cb594ebe-87dd-4fc9-ac2c-6a10a4c92046

**Namespace:** ServiceHooks

**Permission:**

```json
{
  "bit": 1,
  "displayName": "View Subscriptions",
  "name": "ViewSubscriptions"
}
```

**Uses:** To view existing Azure DevOps service hooks subscriptions

**Token (where applicable - API only):**

* `PublisherSecurity` for access to all projects
* `PublisherSecurity/<project-id>` for per-project access

**UI/API/CLI:** API/CLI only

**Sample CLI code snippet**

```bash
az devops security permission update --organization https://dev.azure.com/<organization> --namespace-id cb594ebe-87dd-4fc9-ac2c-6a10a4c92046 --subject <service-user>@xxxxxx.onmicrosoft.com --token PublisherSecurity/<project-id> --allow-bit 1
```

The `<organization>`, `<service-user>`, and `<project-id>` values are placeholders for your own organization, service user principal name, and project ID.

**EditSubscriptions**
**Security Namespace ID:** cb594ebe-87dd-4fc9-ac2c-6a10a4c92046

**Namespace:** ServiceHooks

**Permission:**

```json
{
  "bit": 2,
  "displayName": "Edit Subscription",
  "name": "EditSubscriptions"
}
```

**Uses:** To add or update existing Azure DevOps service hooks subscriptions

**Token (where applicable - API only):**

* `PublisherSecurity` for access to all projects
* `PublisherSecurity/<project-id>` for per-project access

**UI/API/CLI:** API/CLI only

**Sample CLI code snippet**

```bash
az devops security permission update --organization https://dev.azure.com/<organization> --namespace-id cb594ebe-87dd-4fc9-ac2c-6a10a4c92046 --subject <service-user>@xxxxxx.onmicrosoft.com --token PublisherSecurity/<project-id> --allow-bit 2
```

**DeleteSubscriptions**
**Security Namespace ID:** cb594ebe-87dd-4fc9-ac2c-6a10a4c92046

**Namespace:** ServiceHooks

**Permission:**

```json
{
  "bit": 4,
  "displayName": "Delete Subscriptions",
  "name": "DeleteSubscriptions"
}
```

**Uses:** To delete any redundant Azure DevOps service hooks subscriptions

**Token (where applicable - API only):**

* `PublisherSecurity` for access to all projects
* `PublisherSecurity/<project-id>` for per-project access

**UI/API/CLI:** API/CLI only

**Sample CLI code snippet**

```bash
az devops security permission update --organization https://dev.azure.com/<organization> --namespace-id cb594ebe-87dd-4fc9-ac2c-6a10a4c92046 --subject <service-user>@xxxxxx.onmicrosoft.com --token PublisherSecurity/<project-id> --allow-bit 4
```

**Additional Notes:** This permission has been deprecated in recent Azure DevOps versions. **EditSubscriptions** (bit 2) includes Delete permissions.

**PullRequestContribute**
**Security Namespace ID:** 2e9eb7ed-3c0a-47d4-87c1-0ffdd275fd87

**Namespace:** Git Repositories

**Permission:**

```json
{
  "bit": 16384,
  "displayName": "Contribute to pull requests",
  "name": "PullRequestContribute"
}
```

**Uses:** To post Pull Request statuses to Azure DevOps

**Token (where applicable - API only):**

* `repoV2` for access to all projects
* `repoV2/<project-id>` for per-project access
* `repoV2/<project-id>/<repo-id>` for per-repo access

**UI/API/CLI:** UI, API, and CLI

**Sample CLI code snippet**

```bash
az devops security permission update --organization https://dev.azure.com/<organization> --namespace-id 2e9eb7ed-3c0a-47d4-87c1-0ffdd275fd87 --subject <service-user>@xxxxxx.onmicrosoft.com --token repoV2/<project-id>/<repo-id> --allow-bit 16384
```

**Additional Notes:** This permission is automatically inherited if Project Reader/Contributor/Administrator is set in the UI.

**GenericContribute**
**Security Namespace ID:** 2e9eb7ed-3c0a-47d4-87c1-0ffdd275fd87

**Namespace:** Git Repositories

**Permission:**

```json
{
  "bit": 4,
  "displayName": "Contribute",
  "name": "GenericContribute"
}
```

**Uses:** To post commit statuses to Azure DevOps

**Token (where applicable - API only):**

* `repoV2` for access to all projects
* `repoV2/<project-id>` for access to a single project at a time
* `repoV2/<project-id>/<repo-id>` for access to a single repo at a time

**UI/API/CLI:** UI, API, and CLI

**Sample CLI code snippet**

```bash
az devops security permission update --organization https://dev.azure.com/<organization> --namespace-id 2e9eb7ed-3c0a-47d4-87c1-0ffdd275fd87 --subject <service-user>@xxxxxx.onmicrosoft.com --token repoV2/<project-id>/<repo-id> --allow-bit 4
```

**Additional Notes:** This permission is automatically inherited if Project Contributor/Administrator is set in the UI.

You must connect your service user before setting up a dbt project, as the service user's permissions determine which projects dbt can import. A dbt account admin with access to the service user's Azure DevOps account must complete the following to connect the service user:

1. Sign in to the service user's Azure DevOps account.
2. In dbt, go to **Account settings** > **Integrations**.
3. Go to the **Azure DevOps** section and select **Service User**.
4. Enter values for the required fields.
5. Click **Save**.
6. Click **Link Azure service user**.
7. You will be directed to Azure DevOps, where you must accept the Microsoft Entra ID app's permissions.
8. Finally, you will be redirected to dbt, and the service user will be connected.

[![Connecting an Azure Service User](/img/docs/dbt-cloud/connecting-azure-devops/azure-service-user.png?v=2 "Connecting an Azure Service User")](#)Connecting an Azure Service User

Once connected, dbt displays the service user's email address so you know which user's permissions are enabling headless actions in deployment environments.
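After connecting, you may want to confirm that the grants landed. A sketch using the `az devops security permission show` subcommand: the `<organization>`, `<service-user>`, and `<project-id>` values are hypothetical placeholders, and `echo` prints the command for review rather than executing it (remove `echo` to run it against your organization).

```shell
# Placeholders are hypothetical -- substitute your own values.
# `echo` prints the command for review; drop it to execute for real.
echo az devops security permission show \
  --organization "https://dev.azure.com/<organization>" \
  --namespace-id cb594ebe-87dd-4fc9-ac2c-6a10a4c92046 \
  --subject "<service-user>@xxxxxx.onmicrosoft.com" \
  --token "PublisherSecurity/<project-id>"
```

Repeat with namespace `2e9eb7ed-3c0a-47d4-87c1-0ffdd275fd87` and a `repoV2/...` token to check the Git Repositories grants.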
To change which account is connected, disconnect the profile in dbt, sign in to the alternative Azure DevOps service account, and re-link the account in dbt.

**Personal Access Tokens (PATs)**

dbt leverages the service user to generate temporary access tokens called [PATs](https://learn.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?toc=%2Fazure%2Fdevops%2Fmarketplace-extensibility%2Ftoc.json\&view=azure-devops\&tabs=Windows). These tokens are limited in scope, valid for only 5 minutes, and become invalid after a single API call. They are limited to the following [scopes](https://learn.microsoft.com/en-us/azure/devops/integrate/get-started/authentication/oauth?view=azure-devops):

* `vso.code_full`: Grants full access to source code and version control metadata (commits, branches, and so on). Also grants the ability to create and manage code repositories, create and manage pull requests and code reviews, and receive notifications about version control events with service hooks. Also includes limited support for Client OM APIs.
* `vso.project`: Grants the ability to read projects and teams.
* `vso.build_execute`: Grants the ability to access build artifacts, including build results, definitions, and requests, plus the ability to queue a build, update build properties, and receive notifications about build events with service hooks.

---

##### Version control basics

When you develop in the command line interface (CLI) or Cloud integrated development environment (Studio IDE), you can leverage Git directly to version control your code.
To use version control, make sure you are connected to a Git repository in the CLI or Studio IDE. You can create a separate branch to develop and make changes. The changes you make aren’t merged into the default branch of your connected repository (typically named the `main` branch) unless they successfully pass tests. This keeps the code organized and improves productivity by keeping the development process smooth. You can read more about Git terminology below, and check out [GitHub Docs](https://docs.github.com/en) as well.

#### Git overview[​](#git-overview "Direct link to Git overview")

Check out some common Git terms you might encounter when developing:

| Name | Definition |
| --- | --- |
| Repository or repo | A repository is a directory that stores all the files, folders, and content needed for your project. You can think of this as an object database of the project, storing everything from the files themselves to the versions of those files, commits, and deletions. Repositories are not limited by user and can be shared and copied. |
| Branch | A branch is a parallel version of a repository. It is contained within the repository but does not affect the primary or main branch, allowing you to work freely without disrupting the live version. When you've made the changes you want, you can merge your branch back into the main branch to publish your changes. |
| Checkout | The `checkout` command is used to create a new branch, change your current working branch to a different branch, or switch to a different version of a file from a different branch. |
| Commit | A commit is a user’s change to a file (or set of files). When you make a commit to save your work, Git creates a unique ID that allows you to keep a record of the specific changes committed, along with who made them and when. Commits usually contain a commit message, which is a brief description of the changes made. |
| main | The primary, base branch of all repositories. All committed and accepted changes should be on the main branch. In the Studio IDE, the main branch is protected: you can't directly edit, format, or lint files, or execute dbt commands in your protected primary Git branch when using the dbt platform user interface. Keep in mind that all Studio IDE Git activity is subject to the permissions of your configured credentials and the rules configured at the remote Git provider (for example, GitHub or GitLab branch protection). Since the Studio IDE prevents commits to the protected branch, you can commit those changes to a new branch. |
| Merge | Merge takes the changes from one branch and adds them into another (usually main) branch. These commits are usually first requested via pull request before being merged by a maintainer. |
| Pull Request | If someone has changed code on a separate branch of a project and wants it reviewed before it's added to the main branch, they can submit a pull request. Pull requests ask the repo maintainers to review the commits made and then, if acceptable, merge the changes upstream. A pull happens when adding the changes to the main branch. |
| Push | A `push` updates a remote branch with the commits made to the current branch. You are literally *pushing* your changes into the remote. |
| Remote | This is the version of a repository or branch that is hosted on a server. Remote versions can be connected to local clones so that changes can be synced. |

#### The git button in the Cloud IDE[​](#the-git-button-in-the-cloud-ide "Direct link to The git button in the Cloud IDE")

You can perform Git tasks with the git button in the [Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md).
The following are descriptions of each git button command and what they do:

| Name | Actions |
| --- | --- |
| Abort merge | This option allows you to cancel a merge that had conflicts. Be careful with this action: all changes will be reset, and the operation can't be reverted, so commit or save all your changes before you start a merge. |
| Change branch | This option allows you to change between branches (checkout). |
| Commit | A commit is an individual change to a file (or set of files). When you make a commit to save your work, Git creates a unique ID (a.k.a. the "SHA" or "hash") that allows you to keep a record of the specific changes committed, along with who made them and when. Commits usually contain a commit message, which is a brief description of the changes made. When you make changes to your code in the future, you'll need to commit them as well. |
| Create new branch | This allows you to branch off of your base branch and edit your project. You’ll notice after initializing your project that the main branch is protected. This means you can't directly edit, format, or lint files or execute dbt commands in your protected primary Git branch. When ready, you can commit those changes to a new branch. |
| Initialize your project | This is done when first setting up your project. Initializing a project creates all required directories and files within an empty repository by using the dbt starter project. Note: this option will not display if your repo isn't completely empty (that is, it includes a README file). Once you click **Initialize your project**, click **Commit** to finish setting up your project. |
| Open pull request | This allows you to open a pull request in Git for peers to review changes before merging into the base branch. |
| Pull changes from main | This option is available if you are on any local branch that is behind the remote version of the base branch or the remote version of the branch that you're currently on. |
| Pull from remote | This option is available if you’re on the local base branch and changes have recently been pushed to the remote version of the branch. Pulling in changes from the remote repo lets you pull in the most recent version of the base branch. |
| Rollback to remote | Reset changes to your repository directly from the Studio IDE. You can roll your repository back to an earlier clone from your remote. To do this, click the three-dot ellipsis in the bottom right-hand side of the Studio IDE and select **Rollback to remote**. |
| Refresh git state | This enables you to pull new branches from a different remote branch to your local branch with just one command. |

#### Merge conflicts[​](#merge-conflicts "Direct link to Merge conflicts")

Merge conflicts often occur when multiple users concurrently edit the same section of the same file, making it difficult for Git to determine which change should be kept. Refer to [merge conflicts](https://docs.getdbt.com/docs/cloud/git/merge-conflicts.md) to learn how to resolve merge conflicts.
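The terms above (branch, checkout, commit, merge) map onto a typical feature-branch flow. A minimal, self-contained sketch in a throwaway repository, where the file and branch names are purely illustrative:

```shell
set -e
demo=$(mktemp -d)                          # work in a throwaway directory
cd "$demo"
git init -q
git checkout -q -b main                    # name the base branch "main"
git config user.name "demo"                # local identity for the demo commits
git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial commit"

git checkout -q -b feature/customers       # "checkout": create and switch to a branch
echo "select 1 as id" > customers.sql
git add customers.sql
git commit -q -m "Add customers model"     # "commit": record the change with a unique SHA

git checkout -q main                       # switch back to the base branch
git merge -q feature/customers             # "merge": bring the branch's commits into main
git log --oneline                          # shows both commits on main
```

In dbt, the push and pull-request steps happen through the git button, but the underlying Git operations are the same.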
#### The .gitignore file[​](#the-gitignore-file "Direct link to The .gitignore file")

dbt implements a global [`.gitignore` file](https://github.com/dbt-labs/dbt-starter-project/blob/main/.gitignore) that automatically excludes the following sub-folders from your Git repository to ensure smooth operation:

```text
dbt_packages/
logs/
target/
```

Each entry uses a trailing slash, making these lines in the `.gitignore` file act as 'folder wildcards' that prevent any files or folders within them from being tracked by Git. You can also specify additional exclusions as needed for your project.

However, this global `.gitignore` *does not* apply directly to dbt Core and dbt CLI users. If you're working with dbt Core or the dbt CLI, you need to manually add the three lines above to your project's `.gitignore` file. While some Git providers generate a basic `.gitignore` file when the repository is created, these often lack the necessary exclusions for dbt, so make sure your `.gitignore` contains the three lines above so dbt operates smoothly.

note

* **dbt projects created after Dec 1, 2022** — If you use the **Initialize dbt Project** button in the Studio IDE to set up a new and empty dbt project, dbt automatically adds a `.gitignore` file with the required entries. If a `.gitignore` file already exists, the necessary folders are appended to the existing file.
* **Migrating a project from dbt Core to dbt** — Make sure the `.gitignore` file contains the necessary entries. dbt Core doesn't interact with Git, so dbt doesn't automatically add or verify entries in the `.gitignore` file. Additionally, if the repository already contains dbt code and doesn't require initialization, dbt won't add any missing entries to the `.gitignore` file.

For additional info or troubleshooting tips, refer to the [detailed FAQ](https://docs.getdbt.com/faqs/Git/gitignore.md).
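For dbt Core projects, where the global `.gitignore` doesn't apply, a small sketch for adding the three required entries. The duplicate check is a convenience assumption on our part, not part of the dbt docs; the demo runs in a throwaway directory so it can be tried safely.

```shell
cd "$(mktemp -d)"            # demo in a throwaway directory; in practice, run
touch .gitignore             # this from your dbt project's root instead
for entry in dbt_packages/ logs/ target/; do
  # append each entry only if it isn't already present as a whole line
  grep -qxF "$entry" .gitignore || echo "$entry" >> .gitignore
done
cat .gitignore
```

Running it twice leaves the file unchanged, which makes it safe to include in repo-setup scripts.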
---

#### Manage access

##### About environment-level permissions

Environment-level permissions give dbt admins the ability to grant write permission to groups and service tokens for specific [environment types](https://docs.getdbt.com/docs/dbt-cloud-environments.md) within a project. Granting access to an environment gives users access to all environment-level write actions and resources associated with their assigned roles. For example, users with a Developer role can create and run jobs within the environment(s) they have access to. For all other environments, those same users have read-only access.

For configuration instructions, check out the [setup page](https://docs.getdbt.com/docs/cloud/manage-access/environment-permissions-setup.md).

#### Current limitations[​](#current-limitations "Direct link to Current limitations")

Environment-level permissions give dbt admins more flexibility to protect their environments, but it's important to understand the limitations of this feature so admins can make informed decisions about granting access:

* Environment-level permissions do not allow you to create custom roles and permissions for each resource type in dbt.
* You can only select environment types, and can’t specify a particular environment within a project.
* You can't select specific resources within environments. dbt jobs and runs are environment resources. For example, you can't specify that a user has access to jobs but not runs. Access to a given environment gives the user access to everything within that environment.
#### Environments and roles[​](#environments-and-roles "Direct link to Environments and roles") dbt has four different environment types per project: * **Production** — Primary deployment environment. Only one unique Production env per project. * **Development** — Developer testing environment. Only one unique Development env per project. * **Staging** — Pre-prod environment that sits between development and production. Only one unique Staging env per project. * **General** — Mixed use environments. No limit on the number per project. ##### Environment write permissions[​](#environment-write-permissions "Direct link to Environment write permissions") Environment write permissions grant access to create, edit, and delete runs and jobs within an environment. However, they don't grant users access to create or delete environments themselves. See [Enterprise permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) for more information about elevated permission sets. Environment write permissions can be specified for the following roles: * Analyst * Database admin * Developer (Previous default write access for all environments. The new default is read access for environments unless access is specified) * Git admin * Team admin Depending on your current group mappings, you may have to update roles to ensure users have the correct access level to environments. Determine what personas need updated environment access and the roles they should be mapped to. 
The personas below highlight a few scenarios for environment permissions:

* **Developer** — Write access to create and run jobs in non-production environments
* **Testing/QA** — Write access to staging and development environments for testing
* **Production deployment** — Write access to all environments, including production, for deploying
* **Analyst** — Doesn't need environment write access; read-only access is enough for discovery and troubleshooting
* **Other admins** — These admins may need write access to create and run jobs or configure integrations for any number of environments

#### Projects and environments[​](#projects-and-environments "Direct link to Projects and environments")

Environment-level permissions can be enforced over one or multiple projects with mixed access to the environments themselves.

##### Single project environments[​](#single-project-environments "Direct link to Single project environments")

If you’re working with a single project, we recommend restricting access to the Production environment and ensuring groups have access to Development, Staging, or General environments where they can safely create and run jobs. The following is an example of how the personas could be mapped to roles:

* **Developer:** Developer role with write access to Development and General environments
* **Testing/QA:** Developer role with write access to Development, Staging, and General environments
* **Production Deployment:** Developer role with write access to all environments, or Job Admin, which has access to all environments by default
* **Analyst:** Analyst role with no write access and read-only access to environments
* **Other Admins:** Depends on the admin's needs. For example, if they manage the production deployment, grant access to all environments.
##### Multiple projects[​](#multiple-projects "Direct link to Multiple projects")

Let's say Acme Corp has 12 projects: 3 belong to Finance, 3 to Marketing, 4 to Manufacturing, and 2 to Technology. With mixed access across projects:

* **Developer:** If the user has the Developer role and access to Projects A, B, and C, then they only need access to Development and General environments.
* **Testing/QA:** If they have the Developer role and access to Projects A, B, and C, then they only need access to Development, Staging, and General environments.
* **Production Deployment:** If the user has the Admin *or* Developer role *and* access to Projects A, B, and C, then they need access to all environments.
* **Analyst:** If the user has the Analyst role, then they need *no* write access to *any environment*.
* **Other Admins:** A user (non-Admin) can have access to multiple projects depending on the requirements. If the user has the same roles across projects, you can apply environment access across all projects.

#### Related docs[​](#related-docs "Direct link to Related docs")

* [Environment-level permissions setup](https://docs.getdbt.com/docs/cloud/manage-access/environment-permissions-setup.md)
---

##### About user access

**"User access" is not "Model access"**

This page covers user groups and access, including:

* User licenses, permissions, and group memberships
* Role-based access controls for projects and environments
* Single sign-on and secure authentication

For model-specific access and their availability across projects, refer to [Model access](https://docs.getdbt.com/docs/mesh/govern/model-access.md).

You can regulate access to dbt by various measures, including licenses, groups, permissions, and role-based access control (RBAC). To understand the possible approaches to user access to dbt features and functionality, you should first know how we approach users and groups.

#### Users[​](#users "Direct link to Users")

Individual users in dbt can be people you [manually invite](https://docs.getdbt.com/docs/cloud/manage-access/invite-users.md) or grant access via an external identity provider (IdP), such as Microsoft Entra ID, Okta, or Google Workspace. In either scenario, when you add a user to dbt, they are assigned a [license](#licenses). You assign licenses at the individual user or group level. When you manually invite a user, you assign the license in the invitation window.

[![Example of the license dropdown in the user invitation window.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/license-dropdown.png?v=2 "Example of the license dropdown in the user invitation window.")](#)Example of the license dropdown in the user invitation window.

You can edit an existing user's license by navigating to the **Users** section of the **Account settings**, clicking on a user, and clicking **Edit** on the user pane. Delete users from this same window to free up licenses for new users.
[![Example of the user information window in the user directory](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/edit-user.png?v=2 "Example of the user information window in the user directory")](#)Example of the user information window in the user directory

##### User passwords[​](#user-passwords "Direct link to User passwords")

By default, new users are prompted to set a password for their account. All plan tiers support and enforce [multi-factor authentication](https://docs.getdbt.com/docs/cloud/manage-access/mfa.md) for users with password logins; however, users still need to configure their password before configuring MFA. Enterprise tier accounts can configure [SSO](#sso-mappings) and advanced authentication measures. Developer and Starter plans only support user passwords with MFA.

User passwords must meet the following criteria:

* Be at least nine characters in length
* Contain at least one uppercase and one lowercase letter
* Contain at least one number 0-9
* Contain at least one special character

#### Groups[​](#groups "Direct link to Groups")

Groups in dbt serve much the same purpose as they do in traditional directory tools — to gather individual users together to make bulk assignments of permissions easier. The permissions available depend on whether you're on an [Enterprise-tier](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) or [self-service Starter](https://docs.getdbt.com/docs/cloud/manage-access/self-service-permissions.md) plan.

* Admins use groups in dbt to assign [licenses](#licenses) and [permissions](#permissions).
* Permissions are more granular than licenses, and you only assign them at the group level; *you can’t assign permissions at the user level.*
* Every user in dbt must be assigned to at least one group.
There are three default groups available as soon as you create your dbt account (the person who created the account is added to all three automatically):

* **Owner:** This group is for individuals responsible for the entire account and gives them elevated account admin privileges. You cannot change its permissions.
* **Member:** This group is for the general members of your organization. Default permissions are broad, restricting only access to features that can alter billing or security. By default, dbt adds new users to this group.
* **Everyone:** A general group for all members of your organization. Customize the permissions to fit your organizational needs. By default, dbt adds new users to this group.

default group permissions

The `Owner` and `Member` groups have default permission sets:

* **Starter plan:** The `Owner` and `Member` groups use the `Owner` and `Member` [permission sets](https://docs.getdbt.com/docs/cloud/manage-access/self-service-permissions.md#table-of-groups-licenses-and-permissions), respectively.
* **Enterprise plans:** By default, dbt assigns the `Owner` group an [`Account Admin`](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#account-admin) permission set, and the `Member` group a `Member` permission set, which doesn't appear in the settings but has the same privileges as the [`Admin`](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#admin) permission set.

Default groups are automatically provisioned for all accounts to simplify the initial setup. We recommend creating your own organizational groups so you can customize the permissions. Once you create your own groups, you can delete the default groups.
##### Create new groups [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#create-new-groups- "Direct link to create-new-groups-")

* Create new groups from the **Groups & Licenses** section of the **Account settings**.
* If you use an external IdP for SSO, you can sync those SSO groups to dbt from the **Group details** pane when creating or editing existing groups.

[![Example of the new group pane in the account settings.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/new-group.png?v=2 "Example of the new group pane in the account settings.")](#)Example of the new group pane in the account settings.

important

If a user is assigned licenses and permissions from multiple groups, the group that grants the most access takes precedence. You must assign a permission set to any groups created beyond the three defaults, or assigned users will not have access to features beyond their user profile.

##### Group access and permissions[​](#group-access-and-permissions "Direct link to Group access and permissions")

The **Access & Permissions** section of a group is where you assign users the right level of access based on their role or responsibilities. You decide:

* Which projects the group can access
* Which roles the group members are assigned for each project
* Which environments the group can edit

This setup gives you the flexibility to determine the level of access users in any given group will have. For example, you might allow one group of analysts to edit jobs in their project but only view related projects, or grant admin-level access to a team that owns a specific project while keeping others restricted to read-only.
[![Assign a variety of roles and access permissions to user groups.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/sample-access-policy.png?v=2 "Assign a variety of roles and access permissions to user groups.")](#)Assign a variety of roles and access permissions to user groups. ###### Environment write access[​](#environment-write-access "Direct link to Environment write access") Some permission sets grant users read-only access to environment settings that can be overridden if you assign them to a group with **Environment write access**. They will then be able to create, edit, and delete environment settings such as jobs and runs, bypassing the read-only restriction. This elevated access doesn't grant users the ability to create or delete environments. In the following example, the `analyst` permission set, which by default has read-only access to jobs, is assigned to the group across all projects; however, the **Environment write access** is set to `All Environments`. This grants all users in this group the ability to create, edit, and delete jobs across all environments and projects. [![Users assigned environment write access will be able to edit environment settings.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/environment-write.png?v=2 "Users assigned environment write access will be able to edit environment settings.")](#)Users assigned environment write access will be able to edit environment settings. Only use **Environment write access** settings when you intend to grant users the ability to edit environments. To grant users only the permissions inherent to their set, leave this setting blank (all boxes unchecked). ##### SSO mappings [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#sso-mappings- "Direct link to sso-mappings-") SSO Mappings connect an identity provider (IdP) group membership to a dbt group. 
When users log into dbt via a supported identity provider, their IdP group memberships sync with dbt. Upon a successful login, the user's group memberships (and permissions) automatically adjust within dbt.

**Creating SSO mappings:** While dbt supports mapping multiple IdP groups to a single dbt group, we recommend a 1:1 mapping to keep administration as simple as possible. Use the same names for your dbt groups and your IdP groups.

Create an SSO mapping in the group view:

1. Open an existing group to edit or create a new group.
2. In the **SSO** portion of the group screen, enter the name of the SSO group exactly as it appears in the IdP. If the names differ, users won't be placed into the group correctly.
3. In the **Users** section, ensure the **Add all users by default** option is disabled.
4. Save the group configuration. New SSO users will be added to the group upon login, and existing users will be added upon their next login.

[![Example of an SSO group mapped to a dbt group.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/sso-mapping.png?v=2 "Example of an SSO group mapped to a dbt group.")](#)Example of an SSO group mapped to a dbt group.

Refer to [role-based access control](#role-based-access-control) for more information about mapping SSO groups for user assignment to dbt groups.

#### Grant access[​](#grant-access "Direct link to Grant access")

dbt users have both a license (assigned to an individual user or by group membership) and permissions (by group membership only) that determine what actions they can take. Licenses are account-wide, and permissions provide more granular access or restrictions to specific features.

##### Licenses[​](#licenses "Direct link to Licenses")

Every user in dbt has a license assigned. Licenses consume "seats", which affect how your account is [billed](https://docs.getdbt.com/docs/cloud/billing.md) depending on your [service plan](https://www.getdbt.com/pricing).
There are four license types in dbt:

* **Analyst** — Available on [Enterprise and Enterprise+ plans only](https://www.getdbt.com/pricing). Requires a developer seat license purchase.
  * Can be granted *any* permission set.
* **Developer** — Can be granted *any* permission set.
* **IT** — Available on [Starter, Enterprise, and Enterprise+ plans only](https://www.getdbt.com/pricing). Has the Security Admin and Billing Admin [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#permission-sets) applied, as well as permission to edit **Connections** in the **Account settings** page.
  * Can manage users, groups, connections, and licenses, among other permissions.
  * *IT licensed users do not inherit rights from any permission sets.* Every IT licensed user has the same access across the account, regardless of the group permissions assigned.
* **Read-Only** — Available on [Starter, Enterprise, and Enterprise+ plans only](https://www.getdbt.com/pricing).
  * Has read-only permissions applied to all dbt resources.
  * Intended to view the [artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) and the [deploy](https://docs.getdbt.com/docs/deploy/deployments.md) section (jobs, runs, schedules) in a dbt account, but can't make changes.
  * *Read-only licensed users do not inherit rights from any permission sets.* Every read-only licensed user has the same access across the account, regardless of the group permissions assigned.

Developer licenses make up the majority of users in most environments and have the highest impact on billing, so it's important to monitor how many you have at any given time.
For more information on these license types, see [Seats & Users](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md).

**License types override group permissions:** A user's license type always overrides their assigned group permission sets. For example, a user with a Read-Only license cannot perform administrative actions, even if they belong to an Account Admin group. This ensures that license restrictions are always enforced, regardless of group membership.

##### Permissions[​](#permissions "Direct link to Permissions")

Permissions determine what a developer-licensed user can do in your dbt account. By default, members of the `Owner` and `Member` groups have full access to all areas and features. When you want to restrict access to features, assign users to groups with stricter permission sets. Keep in mind that if a user belongs to multiple groups, the most permissive group takes precedence.

The permissions available depend on whether you're on an [Enterprise, Enterprise+](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md), or [self-service Starter](https://docs.getdbt.com/docs/cloud/manage-access/self-service-permissions.md) plan. Developer accounts only have a single user, so permissions aren't applicable.

[![Example permissions dropdown while editing an existing group.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/assign-group-permissions.png?v=2 "Example permissions dropdown while editing an existing group.")](#)Example permissions dropdown while editing an existing group.

Some permission sets (those that don't grant full access, unlike admin sets) allow groups to be assigned to specific projects and environments only. Read about [environment-level permissions](https://docs.getdbt.com/docs/cloud/manage-access/environment-permissions-setup.md) for more information on restricting environment access.
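Two precedence rules appear in this section: across groups, the assignment granting the most access wins, while the user's license type always caps what group membership can grant. A toy sketch of that resolution order (the permission names and their ranking are illustrative, not dbt's actual permission model):

```python
# Illustrative ranking of a few permission sets, least to most permissive.
PERMISSIVENESS = {"read-only": 0, "analyst": 1, "developer": 2, "account admin": 3}

def effective_permission(license_type, group_permission_sets):
    # Rule 1: the license type always caps access. Read-Only and IT licensed
    # users don't inherit rights from permission sets at all.
    if license_type in ("read-only", "it"):
        return license_type
    if not group_permission_sets:
        # Groups without an assigned permission set grant no feature access.
        return None
    # Rule 2: across groups, the most permissive assignment wins.
    return max(group_permission_sets, key=PERMISSIVENESS.__getitem__)

# A Read-Only licensed user in an Account Admin group still gets read-only.
effective_permission("read-only", ["account admin"])             # -> "read-only"
# A developer in two groups gets the more permissive of the two sets.
effective_permission("developer", ["analyst", "account admin"])  # -> "account admin"
```

The ordering dictionary is the assumption here; the two rules it encodes come straight from the text above.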
[![Example environment access control for a group with Git admin assigned.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/environment-access-control.png?v=2 "Example environment access control for a group with Git admin assigned.")](#)Example environment access control for a group with Git admin assigned.

#### Role-based access control [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#role-based-access-control- "Direct link to role-based-access-control-")

Role-based access control (RBAC) allows you to grant users access to features and functionality based on their group membership. With this method, you can grant users varying access levels to different projects and environments. You can take access and security further by integrating dbt with a third-party identity provider (IdP) to grant users access when they authenticate with your SSO or OAuth service.

There are a few things you need to know before you configure RBAC for SSO users:

* New SSO users join any groups with the **Add all new users by default** option enabled. By default, the `Everyone` and `Member` groups have this option enabled. Disable this option across all groups for the best RBAC experience.
* You must have the appropriate SSO groups configured in the group details **SSO** section. If the SSO group name does not match *exactly*, users will not be placed in the group correctly.

  [![The Group details SSO section with a group configured.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/sso-window-details.png?v=2 "The Group details SSO section with a group configured.")](#)The Group details SSO section with a group configured.

* dbt Labs recommends that your dbt group names match the IdP group names.
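Because SSO mapping requires the group name to match *exactly*, a small sanity check comparing your IdP group names against the names entered in dbt's SSO fields can catch casing or whitespace drift before users fail to land in their groups. A hypothetical sketch (the function and its inputs are illustrative, not part of any dbt tooling):

```python
def find_mapping_mismatches(idp_groups, dbt_sso_mappings):
    """Report IdP groups that won't map to a dbt group, flagging near-misses
    caused by case or surrounding whitespace (names must match exactly)."""
    problems = []
    # Index the dbt-side names by a relaxed form to detect near-misses.
    normalized = {g.strip().lower(): g for g in dbt_sso_mappings}
    for group in sorted(set(idp_groups) - set(dbt_sso_mappings)):
        near = normalized.get(group.strip().lower())
        if near:
            problems.append(
                f"{group!r} almost matches dbt mapping {near!r} (names must match exactly)"
            )
        else:
            problems.append(f"{group!r} has no dbt group mapping")
    return problems

# A casing difference is enough to break the mapping:
find_mapping_mismatches({"The Big Project"}, {"the big project"})
```

You could run a check like this against an export of your IdP groups whenever group names change.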
Let's say you have a new employee being onboarded into your organization using [Okta](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-okta.md) as the IdP and dbt groups with SSO mappings. In this scenario, users are working on `The Big Project` and a new analyst named `Euclid Ean` is joining the group. Check out the following example configurations for an idea of how you can implement RBAC for your organization (these examples assume you have already configured [SSO](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md)):

**Okta configuration**

You and your IdP team add `Euclid Ean` to your Okta environment and assign them to the `dbt` SSO app via a group called `The Big Project`.

[![The user in the group in Okta.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/okta-group-config.png?v=2 "The user in the group in Okta.")](#)The user in the group in Okta.

Configure the group attribute statements for the `dbt` application in Okta. The group statements in the following example are set to the exact group name (`The Big Project`), but yours will likely be a much broader configuration. Companies often use the same prefix across all dbt groups in their IdP, for example `DBT_GROUP_`.

[![Group attributes set in the dbt SAML 2.0 app in Okta.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/group-attributes.png?v=2 "Group attributes set in the dbt SAML 2.0 app in Okta.")](#)Group attributes set in the dbt SAML 2.0 app in Okta.

**dbt configuration**

You and your dbt admin team configure the groups in your account's settings:

1. Navigate to the **Account settings** and click **Groups & Licenses** on the left-side menu.
2. Click **Create group** or select an existing group and click **Edit**.
3. Enter the group name in the **SSO** field.
4. Configure the **Access and permissions** fields to your needs.
Select a [permission set](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md), the project they can access, and [environment-level access](https://docs.getdbt.com/docs/cloud/manage-access/environment-permissions.md).

[![The group configuration with SSO field filled out in dbt.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/dbt-cloud-group-config.png?v=2 "The group configuration with SSO field filled out in dbt.")](#)The group configuration with SSO field filled out in dbt.

Euclid is limited to the `Analyst` role, the `Jaffle Shop` project, and the `Development`, `Staging`, and `General` environments of that project. Euclid has no access to the `Production` environment in their role.

**The user journey**

Euclid takes the following steps to log in:

1. Access the SSO URL or the dbt app in their Okta account. The URL can be found on the **SSO & SCIM** configuration page in the **Account settings**.

   [![The SSO login URL in the account settings.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/sso-login-url.png?v=2 "The SSO login URL in the account settings.")](#)The SSO login URL in the account settings.

2. Log in with their Okta credentials.

   [![The SSO login screen when using Okta as the identity provider.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/sso-login.png?v=2 "The SSO login screen when using Okta as the identity provider.")](#)The SSO login screen when using Okta as the identity provider.

3. Since it's their first time logging in with SSO, Euclid Ean is presented with a message and no option to move forward until they check the email address associated with their Okta account.

   [![The screen users see after their first SSO login.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/post-login-screen.png?v=2 "The screen users see after their first SSO login.")](#)The screen users see after their first SSO login.

4. They open their email and click the link to join dbt Labs.
[![The email the user receives on first SSO login.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/sample-email.png?v=2 "The email the user receives on first SSO login.")](#)The email the user receives on first SSO login.

5. Their email address is now verified. They click **Authenticate with your enterprise login**, which completes the process.

   [![The confirmation that the email address is verified.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/email-verified.png?v=2 "The confirmation that the email address is verified.")](#)The confirmation that the email address is verified.

Euclid is now logged in to their account. They only have access to the `Jaffle Shop` project. Under **Orchestration**, they can configure development credentials.

[![The Orchestration page with the environments.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/orchestration-environments.png?v=2 "The Orchestration page with the environments.")](#)The Orchestration page with the environments.

The `Production` environment is visible but `read-only`, while they have full access in the `Staging` environment.

[![The Production environment landing page with read-only access.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/production-restricted.png?v=2 "The Production environment landing page with read-only access.")](#)The Production environment landing page with read-only access.

[![The Staging environment landing page with full access.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/staging-access.png?v=2 "The Staging environment landing page with full access.")](#)The Staging environment landing page with full access.

With RBAC configured, you now have granular control over user access to features across dbt.
##### SCIM license management[​](#scim-license-management "Direct link to SCIM license management")

As part of the SSO configuration for supported IdPs, you can also configure [System for Cross-Domain Identity Management (SCIM)](https://docs.getdbt.com/docs/cloud/manage-access/scim.md) settings to add a layer of security to your user lifecycle management. As part of this process, you can integrate user license distribution into the user provisioning process through your IdP. See the [SCIM license management instructions](https://docs.getdbt.com/docs/cloud/manage-access/scim-manage-user-licenses.md) for more information.

#### FAQs[​](#faqs "Direct link to FAQs")

**When are IdP group memberships updated for SSO-mapped groups?** Group memberships are updated whenever a user logs into dbt via a supported SSO provider. If you've changed group memberships in your identity provider or dbt, ask your users to log back into dbt to synchronize these group memberships.

**Can I set up SSO without RBAC?** Yes, see the documentation on [Manual Assignment](#manual-assignment) for more information on using SSO without RBAC.

**Can I configure a user's license type based on IdP attributes?** Yes, see the docs on [managing license types](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#managing-license-types) for more information.

**Why can't I edit a user's group membership?** For security reasons, you can't edit your own user's group membership. A different user with the appropriate permissions must make those changes for you.

**How do I add or remove users?** Each dbt plan has a base number of Developer and Read-Only licenses. You can add or remove licenses by modifying the number of users in your account settings.
* If you're on an Enterprise or Enterprise+ plan and have the correct [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md), you can add or remove developers by adjusting your developer user seat count in **Account settings** -> **Users**.
* If you're on a Starter plan and have the correct [permissions](https://docs.getdbt.com/docs/cloud/manage-access/self-service-permissions.md), you can add or remove developers by making two changes: adjust your developer user seat count in **Account settings** -> **Users**, and then adjust your developer billing seat count in **Account settings** -> **Billing**.

For detailed steps, refer to [Users and licenses](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#licenses).

---

### dbt audit log [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

To review actions performed by people in your organization, dbt provides logs of audited user and system events in real time. The audit log appears as events happen and includes details such as who performed the action, what the action was, and when it was performed. You can use these details to troubleshoot access issues, perform security audits, or analyze specific events.

You must be an **Account Admin** or an **Account Viewer** to access the audit log, and this feature is only available on Enterprise plans.
The dbt audit log stores all the events that occurred in your organization in real time:

* For events within 90 days, the audit log has a selectable date range that lists the events triggered.
* For events beyond 90 days, **Account Admins** and **Account Viewers** can [export all events](#exporting-logs) by using **Export All**. The retention period for events in the audit log is at least 12 months.

#### Accessing the audit log[​](#accessing-the-audit-log "Direct link to Accessing the audit log")

To access the audit log, click your account name in the left-side menu and select **Account settings**, then click **Audit log** in the left sidebar.

#### Understanding the audit log[​](#understanding-the-audit-log "Direct link to Understanding the audit log")

The audit log page lists events and their associated event data. Each event shows the following information in dbt:

* **Event name**: Action that was triggered
* **Agent**: User who triggered that action/event
* **Timestamp**: Local timestamp of when the event occurred

##### Event details[​](#event-details "Direct link to Event details")

Click an event card to see the details about the activity that triggered the event. This view provides important details, including when it happened and what type of event was triggered. For example, if someone changes the settings for a job, you can use the event details to see which job was changed (type of event: `job_definition.Changed`), by whom (person who triggered the event: `actor`), and when (time it was triggered: `created_at_utc`). For types of events and their descriptions, see [Events in audit log](#audit-log-events).
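For instance, a job-settings change might surface as a record shaped like the following. The payload below is invented for illustration, assembled from the documented field names rather than copied from an actual API response:

```python
# Hypothetical audit log event for a job settings change. The keys follow
# the documented event fields; every value here is made up.
event = {
    "event_type": "job_definition.Changed",
    "actor_name": "Euclid Ean",
    "actor_type": "user",
    "created_at": "2024-06-01T14:03:22Z",
    "event_context": {"job_id": 123, "changed_fields": ["schedule"]},
}

def summarize(evt):
    # Answers the who / what / when questions from the event details view.
    return f'{evt["actor_name"]} triggered {evt["event_type"]} at {evt["created_at"]}'

summarize(event)  # -> "Euclid Ean triggered job_definition.Changed at 2024-06-01T14:03:22Z"
```

The `event_context` key is where the object-specific details (here, the changed job) would live, and its shape varies by event type.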
The event details provide the key factors of an event:

| Name | Description |
| --- | --- |
| `account_id` | Account ID of where the event occurred |
| `actor` | Actor that carried out the event (user or service) |
| `actor_id` | Unique ID of the actor |
| `actor_ip` | IP address of the actor |
| `actor_name` | Identifying name of the actor |
| `actor_type` | Whether the action was done by a user or an API request |
| `created_at` | UTC timestamp of when the event occurred |
| `event_type` | Unique key identifying the event |
| `event_context` | Different for each event, matching the `event_type`. Includes all the details about the object(s) that changed |
| `id` | Unique ID of the event |
| `service` | Service that carried out the action |
| `source` | Source of the event (dbt UI or API) |

#### Audit log events[​](#audit-log-events "Direct link to Audit log events")

The audit log supports various events for different objects in dbt. You will find events for authentication, environment, jobs, service tokens, groups, user, project, permissions, license, connection, repository, and credentials.

##### Authentication[​](#authentication "Direct link to Authentication")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Access Token Issued | `access_token.issued` | dbt issued an access token after OAuth sign-in (for example, VS Code extension or Model Context Protocol (MCP) server) |
| Auth Provider Changed | `auth_provider.changed` | Authentication provider settings changed |
| Credential Login Succeeded | `login.password.succeeded` | User successfully logged in with username and password |
| Refresh Token Issued | `refresh_token.issued` | dbt issued a refresh token after OAuth sign-in (for example, VS Code extension or MCP server) |
| SSO Login Failed | `login.sso.failed` | User login via SSO failed |
| SSO Login Succeeded | `login.sso.succeeded` | User successfully logged in via SSO |

##### Environment[​](#environment "Direct link to Environment")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Environment Added | `environment.added` | New environment successfully created |
| Environment Changed | `environment.changed` | Environment settings changed |
| Environment Removed | `environment.removed` | Environment successfully removed |

##### Jobs[​](#jobs "Direct link to Jobs")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Job Added | `job_definition.added` | New job successfully created |
| Job Changed | `job_definition.changed` | Job settings changed |
| Job Removed | `job_definition.removed` | Job definition removed |

##### Service Token[​](#service-token "Direct link to Service Token")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Service Token Created | `service_token.created` | New service token was successfully created |
| Service Token Revoked | `service_token.revoked` | Service token was revoked |
##### Group[​](#group "Direct link to Group")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Group Added | `group.added` | New group successfully created |
| Group Changed | `group.changed` | Group settings changed |
| Group Removed | `group.removed` | Group successfully removed |

##### User[​](#user "Direct link to User")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Invite Added | `user.invite.added` | User invitation added and sent to the user |
| Invite Redeemed | `user.invite.redeemed` | User redeemed invitation |
| User Added to Account | `user.added` | New user added to the account |
| User Added to Group | `group.user.added` | An existing user was added to a group |
| User Removed from Account | `user.removed` | User removed from the account |
| User Removed from Group | `group.user.removed` | An existing user was removed from a group |
| User License Created | `user_license.added` | A new user license was consumed |
| User License Removed | `user_license.removed` | A user license was removed from the seat count |
| Verification Email Confirmed | `user.jit.email.confirmed` | Email verification confirmed by user |
| Verification Email Sent | `user.jit.email.sent` | Email verification sent to user created via JIT |

##### Project[​](#project "Direct link to Project")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Project Added | `project.added` | New project added |
| Project Changed | `project.changed` | Project settings changed |
| Project Removed | `project.removed` | Project removed |

##### Permissions[​](#permissions-1 "Direct link to Permissions")

| Event Name | Event Type | Description |
| --- | --- | --- |
| User Permission Added | `permission.added` | New user permissions added |
| User Permission Removed | `permission.removed` | User permissions removed |

##### License[​](#license "Direct link to License")

| Event Name | Event Type | Description |
| --- | --- | --- |
| License Mapping Added | `license_map.added` | New user license mapping added |
| License Mapping Changed | `license_map.changed` | User license mapping settings changed |
| License Mapping Removed | `license_map.removed` | User license mapping removed |

##### Connection[​](#connection "Direct link to Connection")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Connection Added | `connection.added` | New data warehouse connection added |
| Connection Changed | `connection.changed` | Data warehouse connection settings changed |
| Connection Removed | `connection.removed` | Data warehouse connection removed |
##### Repository[​](#repository "Direct link to Repository")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Repository Added | `repository.added` | New repository added |
| Repository Changed | `repository.changed` | Repository settings changed |
| Repository Removed | `repository.removed` | Repository removed |

##### Credentials[​](#credentials "Direct link to Credentials")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Credentials Added to Project | `credentials.added` | Project credentials added |
| Credentials Changed in Project | `credentials.changed` | Credentials changed in project |
| Credentials Removed from Project | `credentials.removed` | Credentials removed from project |

##### Git integration[​](#git-integration "Direct link to Git integration")

| Event Name | Event Type | Description |
| --- | --- | --- |
| GitLab Application Changed | `gitlab_application.changed` | GitLab configuration in dbt changed |

##### Webhooks[​](#webhooks "Direct link to Webhooks")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Webhook Subscriptions Added | `webhook_subscription.added` | New webhook configured in settings |
| Webhook Subscriptions Changed | `webhook_subscription.changed` | Existing webhook configuration altered |
| Webhook Subscriptions Removed | `webhook_subscription.removed` | Existing webhook deleted |

##### Semantic Layer[​](#semantic-layer "Direct link to Semantic Layer")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Semantic Layer Config Added | `semantic_layer_config.added` | Semantic Layer config added |
| Semantic Layer Config Changed | `semantic_layer_config.changed` | Semantic Layer config (not related to credentials) changed |
| Semantic Layer Config Removed | `semantic_layer_config.removed` | Semantic Layer config removed |
| Semantic Layer Credentials Added | `semantic_layer_credentials.added` | Semantic Layer credentials added |
| Semantic Layer Credentials Changed | `semantic_layer_credentials.changed` | Semantic Layer credentials changed. Does not trigger `semantic_layer_config.changed` |
| Semantic Layer Credentials Removed | `semantic_layer_credentials.removed` | Semantic Layer credentials removed |

##### Extended attributes[​](#extended-attributes "Direct link to Extended attributes")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Extended Attribute Added | `extended_attributes.added` | Extended attribute added to a project |
| Extended Attribute Changed | `extended_attributes.changed` | Extended attribute changed or removed |
##### Account-scoped personal access token[​](#account-scoped-personal-access-token "Direct link to Account-scoped personal access token")

| Event Name | Event Type | Description |
| --- | --- | --- |
| Account Scoped Personal Access Token Created | `account_scoped_pat.created` | An account-scoped PAT was created |
| Account Scoped Personal Access Token Deleted | `account_scoped_pat.deleted` | An account-scoped PAT was deleted |

##### IP restrictions[​](#ip-restrictions "Direct link to IP restrictions")

| Event Name | Event Type | Description |
| --- | --- | --- |
| IP Restrictions Toggled | `ip_restrictions.toggled` | IP restrictions feature enabled or disabled |
| IP Restrictions Rule Added | `ip_restrictions.rule.added` | IP restriction rule created |
| IP Restrictions Rule Changed | `ip_restrictions.rule.changed` | IP restriction rule edited |
| IP Restrictions Rule Removed | `ip_restrictions.rule.removed` | IP restriction rule deleted |

##### SCIM[​](#scim "Direct link to SCIM")

| Event Name | Event Type | Description |
| --- | --- | --- |
| User Creation | `v1.events.account.UserAdded` | New user created by SCIM service |
| User Update | `v1.events.account.UserUpdated` | User record updated by SCIM service |
| User Removal | `v1.events.account.UserRemoved` | User deleted by the SCIM service |
| Group Creation | `v1.events.user_group.Added` | New group created by SCIM service |
| Group Update | `v1.events.user_group_user.Changed` | Group membership was updated by SCIM service |
| Group Removal | `v1.events.user_group.Removed` | Group removed by SCIM service |

#### Searching the audit log[​](#searching-the-audit-log "Direct link to Searching the audit log")

You can search the audit log to find a specific event or actor, limited to the events listed in [Audit log events](#audit-log-events). The audit log lists historical events from the last 90 days. You can search for an actor or event using the search bar, and then narrow your results using the time window.

[![Use search bar to find content in the audit log](/img/docs/dbt-cloud/dbt-cloud-enterprise/audit-log-search.png?v=2 "Use search bar to find content in the audit log")](#)Use search bar to find content in the audit log

#### Exporting logs[​](#exporting-logs "Direct link to Exporting logs")

You can use the audit log to export all historical audit results for security, compliance, and analysis purposes. Events in the audit log are retained for at least 12 months.

* **For events within 90 days** — dbt displays the 90-day selectable date range. Select **Export Selection** to download a CSV file of all the events that occurred in your organization within 90 days.
* **For events beyond 90 days** — Select **Export All**.
The Account Admin or Account Viewer receives an email link to download a CSV file of all the events that occurred in your organization.

[![View audit log export options](/img/docs/dbt-cloud/dbt-cloud-enterprise/audit-log-section.png?v=2 "View audit log export options")](#)View audit log export options

---

### Enterprise permissions [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

This feature is available to the dbt Enterprise and Enterprise+ plans. If you're interested in learning more, contact us at .

The dbt Enterprise and Enterprise+ plans support a number of pre-built permission sets to help manage access controls within a dbt account. See the docs on [access control](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) for more information on role-based access control (RBAC).

#### Permission sets[​](#permission-sets "Direct link to Permission sets")

The following permission sets are available for assignment in all dbt Enterprise-tier accounts. They can be granted to dbt groups and then to users. A dbt group can be associated with more than one permission set, and permission assignments with more access take precedence.

Access to dbt features and functionality is split into account-level and project-level permission sets. Account-level permissions are primarily for account administration (inviting users, configuring SSO, and creating groups).
Project-level permissions are for the configuration and maintenance of the projects themselves (configuring environments, accessing the IDE, and running jobs). Account permission sets may have access to project features, and project permission sets may have access to account features. Check out the [permissions tables](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#account-permissions) to compare sets and their access.

**Account admin**

The Account admin permission set is the highest level of access and control over your dbt account and projects. We recommend limiting the number of users and groups assigned the Account admin permission set. Notable features:

* Account admin is an account-level set.
* Unrestricted access to every feature.
* The default permissions for every user who creates a new dbt account.
* The default permissions assigned to the `Owner` group.

**Admin**

The Admin permission set is intended for project administration but with limited account-level access to invite and assign users. Notable features:

* Admin is a project-level set.
* Unrestricted access to existing projects, but can't create new projects.
* Can invite new members and assign access but can't create groups.
* Can access Catalog.
* The default permissions assigned to the `Member` group.

**Analyst**

The Analyst permission set is designed for users who need to run and analyze dbt models in the IDE but can't create or edit anything outside the IDE. Notable features:

* Analyst is a project-level set.
* Full access to the IDE and the ability to configure personal credentials for adapters and Git.
* Read-only access to environment configs.
* Can view jobs but can't edit them.
* Can access Catalog.

**Billing admin**

The Billing admin permission set can review product usage information that impacts the final billing of dbt (for example, models run). Notable features:

* Billing admin is an account-level set.
* Unrestricted access to the **Billing** section of your **Account settings**.
* Read access to public models.
* No other access.

**Cost Insights Admin**

The Cost Insights Admin permission set provides the minimum permissions needed to configure and manage [Cost Insights](https://docs.getdbt.com/docs/explore/cost-insights.md) settings and view cost data. Notable features:

* Cost Insights Admin is both an account-level and project-level set.
* Can configure platform metadata credentials and Cost Insights settings in connection settings.
* Can view cost and savings data across projects, models, and jobs.
* Read-only access to connections, projects, jobs, and metadata.
* Can access dbt Catalog.

**Cost Insights Viewer**

The Cost Insights Viewer permission set provides read-only access to [Cost Insights](https://docs.getdbt.com/docs/explore/cost-insights.md) data with the minimum permissions needed to view estimated cost and reduction information. Notable features:

* Cost Insights Viewer is both an account-level and project-level set.
* Read-only access to cost and savings data across projects, models, and jobs.
* Read-only access to connections, platform metadata credentials, projects, jobs, and metadata.
* Cannot configure or edit Cost Insights settings.
* Can access dbt Catalog.

**Database admin**

Database admins manage configurations between dbt and the underlying databases. Notable features:

* Database admin is a project-level set.
* Can set up and maintain environment variables and Semantic Layer configs.
* Write access to data platform configurations within environments (credentials, warehouse, schema per environment).
* Helpful when your data warehouse admins only need access to dbt to configure data platform settings within environments.
* Read-only access to account-level connections, Git repo, job, and run settings.
* Can access Catalog.
**Developer**

The Developer permission set is intended for users who build and maintain dbt models under development and manage production behavior. This is the primary permission set for users working in the IDE and should not be conflated with the [Developer license](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#licenses). Notable features:

* Developer is a project-level set.
* Can create, edit, and test dbt code in the IDE.
* Read-only access to the underlying configs for environments, jobs, runs, and Git.
* Users manage their own credentials to data warehouses and Git.
* Can access Catalog.

**Fusion admin**

This permission set is used exclusively to enable users to interact with the Fusion upgrade workflows. We recommend limiting this permission to users and projects that are Fusion-ready.

By default, all users can access the Fusion upgrade experience and perform upgrades based on their existing permissions. When the Fusion upgrade permissions setting is enabled (when you see a check mark), only users with the Fusion admin or Account admin permission set can perform upgrades. If the setting is disabled (no check mark), upgrades are not restricted.

When upgrade permissions are enabled:

* **Fusion admin** — Assign to user accounts. This permission cannot be assigned to service tokens.
* **Account admin** — Assign to users or service tokens. This permission allows both users and service tokens to perform upgrades.

See the [dbt platform Fusion upgrade](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud.md#dbt-fusion-engine) docs for more information.

**Git admin**

Git admins manage Git repository integrations and cloning. Notable features:

* Git admin is a project-level set.
* Can create new Git integrations and environment variables.
* Can edit project settings.
* Read-only access to account settings (including users and groups).
* No access to the IDE.
* Can access Catalog.
**Job admin**

Job admin is an administrative permission set for users who create, run, and manage jobs in dbt. Notable features:

* Job admin is a project-level set.
* Can create and edit jobs, runs, environment variables, and data warehouse configs.
* Can set up project integrations, including [Tableau lineage](https://docs.getdbt.com/docs/cloud-integrations/semantic-layer/tableau.md).
* Read-only access to project configs.
* Read-only access to connections and public models.
* Can access Catalog.

**Job runner**

Job runner is a specialized permission set for users who need access to run jobs and view the outcomes. Notable features:

* Job runner is a project-level set.
* Can run jobs.
* Read-only access to jobs, including status and results.
* No other access to dbt features.

**Job viewer**

Job viewer enables users to monitor and review job executions within dbt. Users with this role can see jobs' status, logs, and outcomes but cannot initiate or modify them. Notable features:

* Job viewer is a project-level set.
* Read-only access to job results, status, and logs.
* No other access to dbt features.
* Can access Catalog.

**Manage marketplace apps**

Manage marketplace apps is a specialized permission set associated with dbt marketplace apps, usually implemented for the Snowflake Native App. Notable features:

* Manage marketplace apps is an account-level set.
* Used exclusively for marketplace app integrations.
* Not intended for general user/group assignment.

**Metadata (Discovery API only)**

Metadata is intended to be a read-only [Discovery API](https://docs.getdbt.com/docs/dbt-cloud-apis/discovery-api.md) integration permission set. Notable features:

* Metadata is a project-level set.
* Grants read-only access to metadata related to dbt models, runs, sources, and tests.
* No access to modify, execute, or manage dbt jobs, repositories, or users.
* No other access to dbt features.
**Project creator**

The Project creator permission set can create, configure, and set up new projects. It's recommended for the admin of teams that will own a project. Notable features:

* Project creator is an account-level set.
* The only permission set other than Account admin that can create projects.
* Limited account settings access. The project creator can create and edit connections, invite users, create groups, and assign licenses.
* Unrestricted access to project configurations.
* Can access Catalog.

**Security admin**

Security admins have limited access to the security settings and policies for the dbt account. This set is intended for members of a security team who need to ensure compliance with security standards and oversee the implementation of security best practices across the account. The [IT license type](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#licenses) includes this permission set by default. Notable features:

* Security admin is an account-level set.
* Can create and edit users and groups and assign licenses.
* Can create and edit authentication and SSO settings.
* Can create and edit IP restrictions and service tokens, as well as manage user access controls.
* No access to jobs, runs, environments, or the IDE.

**Semantic Layer**

A specialized permission set with strict access to only query the Semantic Layer using a service token. Notable features:

* Semantic Layer is a project-level set.
* Can only query the Semantic Layer.
* No other access to dbt features.

**Stakeholder and Read-Only**

Stakeholder and Read-Only are identical permission sets that are similar to Viewer, but without access to sensitive content such as account settings, billing information, or audit logs. Useful for personas who need to monitor projects and their configurations. Notable features:

* Stakeholder is a project-level set.
* Read-only access to projects, environments, jobs, and runs.
* Read-only access to user and group information.
* Can access Catalog.
* No access to the IDE.
* Limited access to audit log content that excludes sensitive information, such as user settings and account-level changes.

**Team admin**

Team admin is an administrative permission set intended for team leaders or similar personas. The permission set grants the ability to manage projects for the team. Notable features:

* Team admin is a project-level set.
* Access to manage the project(s) for a team of users. Limited scope and access can be extended via environment permissions.
* Read-only access to many account settings (excluding sensitive content like billing and auth providers).
* Can access Catalog.

**Viewer**

The Viewer permission set provides read-only access to the dbt account. Useful for any persona who needs insights into your dbt account without access to create or change configurations. The Viewer permission set is frequently paired with the [Read-only license type](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md). Notable features:

* Viewer is an account-level set.
* Read-only access to all settings, projects, environments, and runs.
* Read-only access to audit logs, including sensitive account-level information.
* No access to the IDE.
* Can access Catalog.

License types override group permissions

**User license types always override their assigned group permission sets.** For example, a user with a Read-Only license cannot perform administrative actions, even if they belong to an Account Admin group. This ensures that license restrictions are always enforced, regardless of group membership.

Permissions:

* **Account-level permissions** — Permissions related to the management of the dbt account. For example, billing and account settings.
* **Project-level permissions** — Permissions related to the projects in dbt. For example, repos and access to the Studio IDE or dbt CLI.
note

Some permission sets have read-only access to environment settings that can be overridden with more privileged access if the user is assigned to a group with [Environment write access](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#environment-write-access) configured.

##### Account permissions[​](#account-permissions "Direct link to Account permissions")

Account permission sets enable you to manage the dbt account and its settings (for example, generating service tokens, inviting users, and configuring SSO). They also provide project-level permissions. The **Account Admin** permission set is the highest level of access you can assign.

Key:

* **(W)rite** — Create new or modify existing. Includes `send`, `create`, `delete`, `allocate`, `modify`, and `develop`.
* **(R)ead** — Can view but cannot create or change any fields.

###### Account access for account permissions[​](#account-access-for-account-permissions "Direct link to Account access for account permissions")

| Account-level permission | Account Admin | Billing admin | Manage marketplace apps | Project creator | Security admin | Viewer |
| --- | --- | --- | --- | --- | --- | --- |
| Account settings\* | W | - | - | R | R | R |
| Audit logs | R | - | - | - | R | R |
| Auth provider | W | - | - | - | W | R |
| Billing | W | W | - | - | - | R |
| Connections | W | - | - | W | - | - |
| Groups | W | - | - | R | W | R |
| Invitations | W | - | - | W | W | R |
| IP restrictions | W | - | - | - | W | R |
| Licenses | W | - | - | W | W | R |
| Marketplace app | - | - | W | - | - | - |
| Members | W | - | - | W | W | R |
| Project (create) | W | - | - | W | - | - |
| Public models | R | R | - | R | R | R |
| Service tokens | W | - | - | - | R | R |
| Webhooks | W | - | - | - | - | - |
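A matrix like the one above can be modeled as a simple lookup when you need to reason about access programmatically. A sketch covering a few rows from the table (the `W`/`R` values come from the table itself; the dictionary and helper names are hypothetical, not a dbt API):

```python
# Account-level access per permission set, transcribed from the table above.
# "W" = write, "R" = read-only; absent = no access.
ACCOUNT_ACCESS = {
    "Audit logs":     {"Account Admin": "R", "Security admin": "R", "Viewer": "R"},
    "Billing":        {"Account Admin": "W", "Billing admin": "W", "Viewer": "R"},
    "Groups":         {"Account Admin": "W", "Project creator": "R",
                       "Security admin": "W", "Viewer": "R"},
    "Service tokens": {"Account Admin": "W", "Security admin": "R", "Viewer": "R"},
}

def can_write(permission_set, feature):
    """True only when the set has explicit write (W) access to the feature."""
    return ACCOUNT_ACCESS.get(feature, {}).get(permission_set) == "W"

print(can_write("Security admin", "Groups"))         # W in the table
print(can_write("Security admin", "Service tokens")) # only R in the table
```

The same shape extends to the project-level tables; the point is that "no entry" and "R" both fail a write check, matching how the matrix reads.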
---

##### Invite users to dbt

dbt makes it easy to invite new users to your environment out of the box. This feature is available to all dbt customers on Starter, Enterprise, and Enterprise+ plans.

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

You must have the proper permissions to invite new users:

* [**Starter accounts**](https://docs.getdbt.com/docs/cloud/manage-access/self-service-permissions.md) — must have `member` or `owner` permissions.
* [**Enterprise-tier accounts**](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) — must have `admin`, `account admin`, `project creator`, or `security admin` permissions.
* The admin inviting the users must have a `developer` or `IT` license.

#### Invite new users[​](#invite-new-users "Direct link to Invite new users")

1. In your dbt account, select your account name in the bottom left corner. Then select **Account settings**.
2. Under **Settings**, select **Users**.
3. Click **Invite users**.

   [![The invite users pane](/img/docs/dbt-cloud/access-control/invite-users.png?v=2 "The invite users pane")](#)The invite users pane

4. In the **Email Addresses** field, enter the email addresses of the users you want to invite, separated by a comma, semicolon, or new line.
5. Select the license type for the batch of users from the **License** dropdown.
6. Select the group(s) you want the invitees to belong to.
7. Click **Send invitations**.
   * If the list of invitees exceeds the number of licenses your account has available, you will receive a warning when you click **Send invitations** and the invitations will not be sent.

#### User experience[​](#user-experience "Direct link to User experience")

Email verification

Email verification is mandatory for all new users in dbt, including those using single sign-on (SSO). Automatic provisioning without email verification is not allowed.
This is a security requirement that cannot be bypassed.

dbt generates and sends emails from `support@getdbt.com` to the specified addresses. Make sure that traffic from the `support@getdbt.com` email is allowed in your settings to avoid the emails going to spam or being blocked. This is the originating email address for all [instances worldwide](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md).

The email contains a link to create an account. When the user clicks this link, they will be brought to one of two screens depending on whether SSO is configured.

[![Example of an email invitation](/img/docs/dbt-cloud/access-control/email-invite.png?v=2 "Example of an email invitation")](#)Example of an email invitation

* **Local user** — The default settings send the email, and the user clicks the link and is prompted to create their account:

  [![Default user invitation](/img/docs/dbt-cloud/access-control/default-user-invite.png?v=2 "Default user invitation")](#)Default user invitation

* **SSO user** — If SSO is configured for the environment, the user must:

  1. Click the link in the verification email.
  2. Click the option to join the account.
  3. A confirmation screen appears, with a link to authenticate against the company's identity provider. Click **Authenticate with your enterprise login**.

  Complete the SSO flow

  Accepting the invite doesn't fully complete the process. The user *must* log in using SSO to redeem the invite and access the account.

  [![User invitation with SSO configured](/img/docs/dbt-cloud/access-control/sso-user-invite.png?v=2 "User invitation with SSO configured")](#)User invitation with SSO configured

Once the user completes this process, their email and user information will populate in the **Users** screen in dbt.

#### FAQ[​](#faq "Direct link to FAQ")

**Is there a limit to the number of users I can invite?**

Your ability to invite users is limited to the number of licenses you have available.
**What happens if I reinstate a deleted user?**

If you invite a previously deleted user back to your account with the same email address, their personal profile information, including linked accounts, will persist (unless the user deleted the connection from the source account, like GitHub). Any previously assigned permissions and access settings are reset, and you will have to reassign them.

**Why are users clicking the invitation link and getting an 'Invalid Invitation Code' error?**

We have seen scenarios where embedded secure link technology (such as enterprise Outlook's [Safe Links](https://learn.microsoft.com/en-us/microsoft-365/security/office-365-security/safe-links-about?view=o365-worldwide) feature) can result in errors when clicking the email link. Be sure to include the `getdbt.com` URL in the allowlists for these services.

**Can I have a mixture of users with SSO and username/password authentication?**

Once SSO is enabled, you will no longer be able to add local users. If you have contractors or similar contingent workers, we recommend you add them to your SSO service.

**What happens if I need to resend the invitation?**

From the **Users** page, click the invite record, and you will be presented with the option to resend the invitation.

**What can I do if I entered an email address incorrectly?**

From the **Users** page, click the invite record, and you will be presented with the option to revoke it. Once revoked, generate a new invitation to the correct email address.

[![Resend or revoke the user's invitation](/img/docs/dbt-cloud/access-control/resend-invite.png?v=2 "Resend or revoke the user's invitation")](#)Resend or revoke the user's invitation
---

### Manage user licenses with SCIM [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

You can manage user license assignments using System for Cross-domain Identity Management (SCIM) and a user attribute in Okta, so the license type is set as users are provisioned and onboarded.

SCIM license mapping available for Okta only

SCIM license mapping is currently only supported for Okta. For other providers, use [SSO license mapping](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#mapped-configuration) or manage [licenses](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) in the dbt platform user interface.

###### Considerations[​](#considerations "Direct link to Considerations")

Before you enable SCIM license mapping:

* **Default license**: New users are assigned a Developer license unless you change it manually using [SSO license mappings](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#mapped-configuration) or using SCIM.
* **Best practice**: Use one source of truth for license assignment (either the dbt platform or SCIM). Don't mix SCIM license management with manual or single sign-on (SSO) mapping changes.
* **Analyst license**: Only available on [select plans](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md). Assigning this license using SCIM will return an error if that license type isn't available for your account.
#### Enable SCIM license mapping[​](#enable-scim-license-mapping "Direct link to Enable SCIM license mapping")

To manage licenses using SCIM, go to **Account settings** > **SSO & SCIM**. Under the **SCIM** section, enable **Manage user licenses with SCIM**. This setting enforces the license type for a user based on their SCIM attribute and disables the license mapping and manual configuration set up in dbt.

[![Enable SCIM managed user license distribution.](/img/docs/dbt-cloud/access-control/scim-managed-licenses.png?v=2 "Enable SCIM managed user license distribution.")](#)Enable SCIM managed user license distribution.

We recommend that you complete the setup instructions for your identity provider (IdP) before enabling this toggle in your dbt account. Once enabled, any existing license mappings in dbt will be ignored. The recommended steps for migrating to SCIM license mapping are as follows:

1. Set up SCIM but keep the toggle disabled so existing license mappings continue to work as expected.
2. Configure license attributes in your IdP.
3. Test that SCIM attributes are being used to set license type in dbt.
4. Enable the toggle to ignore existing license mappings so that SCIM is the source of truth for assigning licenses to users.

#### Enterprise default groups[​](#enterprise-default-groups "Direct link to Enterprise default groups")

On the Enterprise licensing page, the following default groups are available. These are often used for initial setup but should be managed or removed once IdP groups are successfully synced via SCIM.

| Group | Access permissions |
| --- | --- |
| **Owner** | Full access to account features and billing. |
| **Member** | Robust access to the account with restrictions on billing and security settings. Users in the Member group are assigned a Developer license by default. |
| **Everyone** | A catch-all group for all users. This group does not have permission assignments beyond the user's profile. Users must be assigned to the Member or Owner group to work in dbt. |

Best practice

After creating and syncing your specific IdP groups, remove users from these default groups to ensure that SCIM remains the single source of truth for permissions and licensing. Once all users have been removed, you can also delete the groups altogether.

#### Automated license mapping[​](#automated-license-mapping "Direct link to Automated license mapping")

Automating license assignments is available for Okta only. It's a common strategy to reduce administrative overhead. For SSO-based license mapping (for example, Entra ID), see [Mapped configuration](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#mapped-configuration).

###### Define IdP groups[​](#define-idp-groups "Direct link to Define IdP groups")

A common strategy involves defining two primary groups in your IdP, for example:

* `dbt_developers`
* `dbt_read_only`

###### Fundamental licensing rules[​](#fundamental-licensing-rules "Direct link to Fundamental licensing rules")

Understanding these rules will help you plan your group structure in Okta:

* **Default assignment:** When someone new is added to your account, they automatically receive a Developer license unless you configure otherwise.
* **Mapping basis:** The license someone receives is determined by which groups they belong to in your identity provider (Okta) — not by groups you create in the dbt platform. In other words, your Okta groups are the source of truth. When you add or remove someone from a group in Okta, their license in the dbt platform updates automatically to match.
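These rules can be sketched as a small function. This mirrors group-based mapping (Developer by default; when a user is in both groups, Developer takes precedence, as shown in the mapping table that follows) and is illustrative only, not a dbt API:

```python
def resolve_license(idp_groups):
    """Group-based license mapping: Developer wins when a user is in both
    groups; users in neither group fall back to the Developer default."""
    if "dbt_developers" in idp_groups:
        return "developer"
    if "dbt_read_only" in idp_groups:
        return "read_only"
    return "developer"  # default assignment for new users

print(resolve_license({"dbt_developers", "dbt_read_only"}))  # developer
print(resolve_license({"dbt_read_only"}))                    # read_only
print(resolve_license(set()))                                # developer (default)
```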
###### Mapping logic and precedence[​](#mapping-logic-and-precedence "Direct link to Mapping logic and precedence")

With SSO license mapping, the Developer license takes precedence over all other licenses. With SCIM license mapping (Okta), precedence depends on your configuration — whether you assign the license attribute directly to the user or derive it from group membership through the expression in your Okta Profile Editor.

When you use groups like `dbt_developers` and `dbt_read_only`, a user might be in one group, both groups, or neither. The following table shows one common scenario (group-based mapping) and can be useful when you're deciding how to structure your groups or troubleshooting why someone received a particular license.

Users in the Enterprise default group **Member** are assigned a **Developer** license by default. Until you remove users from **Member** (per the [best practice](#enterprise-default-groups) earlier), that default applies when they're not in any IdP license-mapping groups. Once you enable SCIM license mapping, the IdP group mapping overrides the Member default.

**SSO license mapping (IdP group-based):**

| In `dbt_developers` group? | In `dbt_read_only` group? | License assigned |
| --- | --- | --- |
| No | No | Developer (Member default or default for new users) |
| No | Yes | Read-Only |
| Yes | No | Developer |
| Yes | Yes | Developer (Developer takes precedence) |

#### Add license type attribute for Okta[​](#add-license-type-attribute-for-okta "Direct link to Add license type attribute for Okta")

To add the attribute for license types to your Okta environment:

1. From your Okta application, navigate to the **Provisioning** tab, scroll down to **Attribute Mappings**, and click **Go to Profile Editor**.
2. Click **Add Attribute**.
3.
Configure the attribute fields as follows (the casing should match for each value):

   * **Data type:** `string`
   * **Display name:** `License Type`
   * **Variable name:** `licenseType`
   * **External name:** `licenseType`
   * **External namespace:** `urn:ietf:params:scim:schemas:extension:dbtLabs:2.0:User`
   * **Description:** An arbitrary string of your choosing.
   * **Enum:** Select the box for **Define enumerated list of values**.
   * **Attribute members:** Add the initial attribute and then click **Add another** until each license type is defined. We recommend adding all of the values even if you don't use them today, so they'll be available in the future. Refer to the following table for the values you can use.

     | Display name | Value |
     | --- | --- |
     | **IT** | `it` |
     | **Analyst** | `analyst` |
     | **Developer** | `developer` |
     | **Read Only** | `read_only` |

     The **Analyst** license is only available on [select plans](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md).

   * **Attribute type:** Personal

   [![Enter the fields as they appear in the image. Ensure the cases match.](/img/docs/dbt-cloud/access-control/scim-license-attributes.png?v=2 "Enter the fields as they appear in the image. Ensure the cases match.")](#)Enter the fields as they appear in the image. Ensure the cases match.

4. **Save** the attribute mapping. Users can now have license types set in their profiles and when they are being provisioned.

   [![Set the license type for the user in their Okta profile.](/img/docs/dbt-cloud/access-control/scim-license-provisioning.png?v=2 "Set the license type for the user in their Okta profile.")](#)Set the license type for the user in their Okta profile.
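With the attribute in place, provisioning requests carry `licenseType` under the custom schema namespace shown above. Here's a sketch of reading it from a SCIM user payload; the namespace and attribute name are from the steps above, but everything else about the payload shape is an assumption for illustration:

```python
# The extension namespace is documented in the attribute setup above.
DBT_EXT = "urn:ietf:params:scim:schemas:extension:dbtLabs:2.0:User"

# Illustrative payload; real SCIM payloads carry more fields.
scim_user = {
    "userName": "jdoe@example.com",
    DBT_EXT: {"licenseType": "analyst"},
}

def license_type(user, default="developer"):
    """Pull licenseType from the extension schema, falling back to the
    documented Developer default when the attribute is absent."""
    return user.get(DBT_EXT, {}).get("licenseType", default)

print(license_type(scim_user))                        # analyst
print(license_type({"userName": "new@example.com"}))  # developer
```

The fallback mirrors the "Default license" consideration earlier: users without an explicit value get a Developer license.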
#### Automate license assignments with Okta groups[​](#automate-license-assignments-with-okta-groups "Direct link to Automate license assignments with Okta groups")

To automate seat assignments in Okta, use the Profile Editor to map Okta group memberships to dbt license types.

1. Create groups in Okta, for example:
   * `dbt_developers`
   * `dbt_read_only`
2. Within the dbt app Profile Editor in Okta, create mapping rules for Okta users to dbt app users:
   * **Attribute:** `licenseType`
   * **Logic (Expression):** `IIF(isMemberOf("dbt_developers"), "developer", "read_only")`
   * **Default behavior:** Users not in the `dbt_developers` group will default to Read-Only.

Adding or removing users from these Okta groups automatically updates their dbt app profile and triggers a SCIM update to synchronize the `licenseType` in dbt. Admins also have the option of using **Manual Push** in the Okta app to synchronize the changes.

---

### Migrating to Auth0 for SSO [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

dbt Labs is partnering with Auth0 to bring enhanced features to dbt's single sign-on (SSO) capabilities. Auth0 is an identity and access management (IAM) platform with advanced security features, and it will be leveraged by dbt. These changes require some action from customers with SSO configured in dbt today, and this guide outlines the necessary changes for each environment.
If you have not yet configured SSO in dbt, refer instead to our setup guides for [SAML](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-saml-2.0.md), [Okta](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-okta.md), [Google Workspace](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-google-workspace.md), or [Microsoft Entra ID (formerly Azure AD)](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md) single sign-on services.

#### Start the migration[​](#start-the-migration "Direct link to Start the migration")

The Auth0 migration feature is being rolled out incrementally to customers who already have SSO features enabled. When the migration option has been enabled on your account, you will see **SSO Update Required** on the right side of the menu bar, near the settings icon. Alternatively, you can start the process by clicking your account name in the bottom left-hand menu and going to **Account settings** > **SSO & SCIM**. Click the **Begin Migration** button to start.

Vanity URLs

Don't use vanity URLs when configuring the SSO settings. You need to use the generic URL provided in the SSO settings for your environment. For example, if your vanity URL is `cloud.MY_COMPANY.getdbt.com`, configure `auth.cloud.getdbt.com` as ``.

There are two fields in the SSO settings that you need for the migration:

* **Single sign-on URL:** This will be in the format of your login URL `https:///login/callback?connection=`
* **Audience URI (SP Entity ID):** This will be in the format `urn:auth0::`

Once you have opted to begin the migration process, the following steps vary depending on the configured identity provider. Skip to the section that applies to your environment. These steps only apply to customers going through the migration; new setups will use the existing [setup instructions](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md).
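As a rough illustration of how these two values fit together, assuming the standard Auth0 conventions for a SAML callback URL and SP entity ID, with `AUTH0_DOMAIN`, `TENANT`, and `SLUG` as hypothetical stand-ins for the values shown on your settings page:

```shell
# Hypothetical example only: copy the exact values from your
# Account settings > SSO & SCIM page rather than building them by hand.
AUTH0_DOMAIN="auth.cloud.getdbt.com"   # generic (non-vanity) URL
TENANT="example-tenant"                # placeholder Auth0 tenant
SLUG="example-login-slug"              # placeholder login URL slug

echo "Single sign-on URL: https://${AUTH0_DOMAIN}/login/callback?connection=${SLUG}"
echo "Audience URI (SP Entity ID): urn:auth0:${TENANT}:${SLUG}"
```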
#### SAML 2.0[​](#saml-20 "Direct link to SAML 2.0")

SAML 2.0 users must update a few fields in the SSO app configuration to match the new Auth0 URL and URI. You can either edit the existing SSO app settings or create a new app to accommodate the Auth0 settings. Neither approach is inherently better, so choose whichever works best for your organization.

##### SAML 2.0 and Okta[​](#saml-20-and-okta "Direct link to SAML 2.0 and Okta")

The Okta fields that will be updated are:

* Single sign-on URL — `https:///login/callback?connection=`
* Audience URI (SP Entity ID) — `urn:auth0::`

Below are sample steps to update. You must complete all of them to ensure uninterrupted access to dbt, and you should coordinate with your identity provider admin when making these changes.

1. Replace `` with your account’s login URL slug. Here is an example of an updated SAML 2.0 setup in Okta.

   [![Okta configuration with new URL](/img/docs/dbt-cloud/access-control/new-okta-config.png?v=2 "Okta configuration with new URL")](#)Okta configuration with new URL

2. Save the configuration, and your SAML settings will look something like this:

   [![New Okta configuration completed](/img/docs/dbt-cloud/access-control/new-okta-completed.png?v=2 "New Okta configuration completed")](#)New Okta configuration completed

3. Toggle the `Enable new SSO authentication` option to ensure the traffic is routed correctly. *The new SSO migration action is final and cannot be undone.*
4. Save the settings and test the new configuration using the SSO login URL provided on the settings page.

##### SAML 2.0 and Entra ID[​](#saml-20-and-entra-id "Direct link to SAML 2.0 and Entra ID")

The Entra ID fields that will be updated are:

* Single sign-on URL — `https:///login/callback?connection=`
* Audience URI (SP Entity ID) — `urn:auth0::`

The new values for these fields can be found in dbt by navigating to **Account settings** > **SSO & SCIM**.

1. Replace `` with your organization’s login URL slug.
2.
Locate your dbt SAML 2.0 app in the **Enterprise applications** section of Azure. Click **Single sign-on** in the left side menu.
3. Edit the **Basic SAML configuration** tile and enter the values from your account:
   * Entra ID **Identifier (Entity ID)** = dbt **Audience URI (SP Entity ID)**
   * Entra ID **Reply URL (Assertion Consumer Service URL)** = dbt **Single sign-on URL**

   [![Editing the SAML configuration window in Entra ID](/img/docs/dbt-cloud/access-control/edit-entra-saml.png?v=2 "Editing the SAML configuration window in Entra ID")](#)Editing the SAML configuration window in Entra ID

4. Save the fields, and the completed configuration will look something like this:

   [![Completed configuration of the SAML fields in Entra ID](/img/docs/dbt-cloud/access-control/entra-id-saml.png?v=2 "Completed configuration of the SAML fields in Entra ID")](#)Completed configuration of the SAML fields in Entra ID

5. Toggle the `Enable new SSO authentication` option to ensure the traffic is routed correctly. *The new SSO migration action is final and cannot be undone.*
6. Save the settings and test the new configuration using the SSO login URL provided on the settings page.

#### Microsoft Entra ID[​](#microsoft-entra-id "Direct link to Microsoft Entra ID")

Microsoft Entra ID admins using OpenID Connect (OIDC) will need to make a slight adjustment to the existing authentication app in the Azure portal. This migration does not require deleting or recreating the entire app; you can edit the existing app. Start by opening the Azure portal and navigating to the Microsoft Entra ID overview.

Below are steps to update. You must complete all of them to ensure uninterrupted access to dbt, and you should coordinate with your identity provider admin when making these changes.

1. Click **App Registrations** in the left side menu.

   [![Select App Registrations](/img/docs/dbt-cloud/access-control/aad-app-registration.png?v=2 "Select App Registrations")](#)Select App Registrations

2.
Select the proper **dbt** app (the name may vary) from the list. From the app overview, click the hyperlink next to **Redirect URI**.

   [![Click the Redirect URI hyperlink](/img/docs/dbt-cloud/access-control/app-overview.png?v=2 "Click the Redirect URI hyperlink")](#)Click the Redirect URI hyperlink

3. In the **Web** pane with **Redirect URIs**, click **Add URI** and enter the appropriate `https:///login/callback`. Save the settings and verify it appears in the updated app overview.

   [![Enter new redirect URI](/img/docs/dbt-cloud/access-control/redirect-URI.png?v=2 "Enter new redirect URI")](#)Enter new redirect URI

4. Navigate to the dbt environment and open the **Account settings**. Click the **SSO & SCIM** option in the left-side menu and click **Edit** on the right side of the SSO pane. The **domain** field is the domain your organization uses to log in to Microsoft Entra ID. Toggle the **Enable New SSO Authentication** option and **Save**. *Once this option is enabled, it cannot be undone.*

Domain authorization

You must complete the domain authorization before you toggle `Enable New SSO Authentication`, or the migration will not complete successfully.

#### Google Workspace[​](#google-workspace "Direct link to Google Workspace")

Google Workspace admins updating an existing SSO setup with the Auth0 URL won't have to do much. This can be done as a new project or by editing the existing SSO setup. No additional scopes are needed since this is a migration from an existing setup; all scopes were defined during the initial configuration.

Below are steps to update. You must complete all of them to ensure uninterrupted access to dbt, and you should coordinate with your identity provider admin when making these changes.

1. Open the [Google Cloud console](https://console.cloud.google.com/) and select the project with your dbt single sign-on settings.
From the project page **Quick Access**, select **APIs and Services**.

   [![Google Cloud Console](/img/docs/dbt-cloud/access-control/google-cloud-sso.png?v=2 "Google Cloud Console")](#)Google Cloud Console

2. Click **Credentials** in the left side pane and click the appropriate name from **OAuth 2.0 Client IDs**.

   [![Select the OAuth 2.0 Client ID](/img/docs/dbt-cloud/access-control/sso-project.png?v=2 "Select the OAuth 2.0 Client ID")](#)Select the OAuth 2.0 Client ID

3. In the **Client ID for Web application** window, find the **Authorized Redirect URIs** field, click **Add URI**, and enter `https:///login/callback`. Click **Save** once you are done.
4. *You will need a person with Google Workspace admin privileges to complete these steps in dbt.* In dbt, navigate to the **Account settings**, click **SSO & SCIM**, and then click **Edit** on the right side of the **Single sign-on** pane. Toggle the **Enable New SSO Authentication** option and select **Save**. This will trigger an authorization window from Google that will require admin credentials. *The migration action is final and cannot be undone.* Once the authentication has gone through, test the new configuration using the SSO login URL provided on the settings page.

Domain authorization

You must complete the domain authorization before you toggle `Enable New SSO Authentication`, or the migration will not complete successfully.

---

### Multi-factor authentication

important

dbt enforces multi-factor authentication (MFA) for all users with username and password credentials.
If MFA is not set up, you will see a notification bar prompting you to configure one of the supported methods when you log in. If you do not, you will have to configure MFA upon subsequent logins, or you will be unable to access dbt.

dbt provides multiple options for multi-factor authentication (MFA), which adds an extra layer of security to username and password logins. MFA is available across dbt plans for users with username and password logins only.

The available MFA methods are:

* SMS verification code
  * Currently, only phone numbers with the North American Numbering Plan (NANP) +1 country code are supported.
* Authenticator app
* WebAuthn-compliant security key

#### Configuration[​](#configuration "Direct link to Configuration")

You can only have one of the three MFA methods configured per user. These are enabled at the user level, not the account level.

1. Navigate to the **Account settings** and, under **Your profile**, click **Password & Security**. Click **Enroll** next to the preferred method.

   [![List of available MFA enrollment methods in dbt.](/img/docs/dbt-cloud/mfa-enrollment.png?v=2 "List of available MFA enrollment methods in dbt.")](#)List of available MFA enrollment methods in dbt.

Choose the next steps based on your preferred enrollment selection:

**SMS verification code**

2. Select the +1 country code, enter your phone number in the field, and click **Continue**.

   [![The phone number selection, including a dropdown for country code.](/img/docs/dbt-cloud/sms-enter-phone.png?v=2 "The phone number selection, including a dropdown for country code.")](#)The phone number selection, including a dropdown for country code.

3. You will receive an SMS message with a six-digit code. Enter the code in dbt.

   [![Enter the 6-digit code.](/img/docs/dbt-cloud/enter-code.png?v=2 "Enter the 6-digit code.")](#)Enter the 6-digit code.

**Authenticator app**

2. Open your preferred authenticator app (like Google Authenticator) and scan the QR code.
[![Example of the user generated QR code.](/img/docs/dbt-cloud/scan-qr.png?v=2 "Example of the user generated QR code.")](#)Example of the user generated QR code.

3. Enter the code provided for "dbt Labs: YOUR\_EMAIL\_ADDRESS" from your authenticator app into the field in dbt.

**WebAuthn-compliant security key**

2. Follow the instructions in the modal window and click **Use security key**.

   [![Example of the Security Key activation window.](/img/docs/dbt-cloud/create-security-key.png?v=2 "Example of the Security Key activation window.")](#)Example of the Security Key activation window.

3. Scan the QR code, or insert your USB key and touch it to activate, to begin the process. Follow the on-screen prompts.
4. You will be given a backup passcode; store it in a secure location. This passcode will be useful if the MFA method fails (like a lost or broken phone).

#### Account Recovery[​](#account-recovery "Direct link to Account Recovery")

When setting up MFA, ensure that you store your recovery codes in a secure location, in case your MFA method fails. If you are unable to access your account, reach out to for further support. You may need to create a new account if your account cannot be recovered. If possible, it's recommended to configure multiple MFA methods so that if one fails, there is a backup option.

#### Disclaimer[​](#disclaimer "Direct link to Disclaimer")

The terms below apply to dbt’s MFA via SMS program, which dbt Labs (“dbt Labs”, “we”, or “us”) uses to facilitate auto sending of authorization codes to users via SMS for dbt log-in requests.
Any clients of dbt Labs that use dbt Labs 2FA via SMS program (after password is input) are subject to the dbt Labs privacy policy, the client warranty in TOU Section 5.1 second paragraph that Client's use will comply with the Documentation (or similar language in the negotiated service agreement between the parties) and these terms: (1) The message frequency is a maximum of 1 message per user login; (2) Message and data rates may apply; (3) Carriers are not liable for delayed or undelivered messages; (4) For help, please reply HELP to the SMS number from which you receive the log-in authorization code(s); (5) To opt-out of future SMS messages, please reply STOP to the SMS number from which you receive the log-in authorization code(s). We encourage you to enable an alternate 2FA method before opting out of SMS messages or you might not be able to log into your account. Further questions can be submitted to .

---

### OAuth overview

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

OAuth in the dbt platform lets developers authorize their development credentials with a data platform using that platform’s login or its single sign-on (SSO), instead of storing usernames and passwords in the dbt platform. This improves security and aligns with how your team already signs in to the warehouse.

**Who it’s for:** Account admins and developers on Enterprise or Enterprise+ plans who connect to supported data platforms.
Admins configure the OAuth integration; developers complete a one-time authorization in their profile.

#### Available integrations[​](#available-integrations "Direct link to Available integrations")

Pick the documentation for your platform to configure OAuth and have developers authorize their credentials.

| Platform | Doc | Description |
| --- | --- | --- |
| **Snowflake** | [Set up Snowflake OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md) | Authorize development credentials using Snowflake (or Snowflake SSO). |
| **Databricks** | [Set up Databricks OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-databricks-oauth.md) | Authorize development credentials using Databricks. |
| **BigQuery** | [Set up BigQuery OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-bigquery-oauth.md) | Authorize development credentials using Google. |
| **Snowflake (external)** | [Set up external OAuth with Snowflake](https://docs.getdbt.com/docs/cloud/manage-access/snowflake-external-oauth.md) | Use an external identity provider (IdP) for Snowflake OAuth. |
| **Redshift (external)** | [Set up external OAuth with Redshift](https://docs.getdbt.com/docs/cloud/manage-access/redshift-external-oauth.md) | Use an external identity provider (IdP) for Redshift OAuth. |
---

### Self-service Starter account permissions

Self-service Starter accounts are a quick and easy way to get dbt up and running for a small team. For teams looking to scale and access advanced features like SSO, group management, and support for larger user bases, upgrading to an [Enterprise-tier](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) account unlocks these capabilities — if you're interested in upgrading, contact [dbt Labs today](https://www.getdbt.com/contact).

#### Groups and permissions[​](#groups-and-permissions "Direct link to Groups and permissions")

Groups determine a user's permissions, and three groups are available for Starter plan dbt accounts: Owner, Member, and Everyone. The first Owner user is the person who created the dbt account. New users are added to the Member and Everyone groups when they onboard, but this can be changed when the invitation is created. These groups only affect users with a [Developer license](#licenses) assigned.

The group access permissions are as follows:

* **Owner** — Full access to account features.
* **Member** — Robust access to the account with restrictions on features that can alter billing or security.
* **Everyone** — A catch-all group for all users in the account. This group does not have any permission assignments beyond the user's profile. Users must be assigned to either the Member or Owner group to work in dbt.

#### Licenses[​](#licenses "Direct link to Licenses")

You assign licenses to every user onboarded into dbt. You only assign Developer-licensed users to the Owner and Member groups; the groups have no impact on Read-only or IT licensed users. There are three license types:

* **Developer** — The default license. Developer licenses don't restrict access to any features, so users with this license should be assigned to either the Owner or Member group. You're allotted up to 8 developer licenses per account.
* **Read-Only** — Read-only access to your project, including environments and the Catalog. Doesn't have access to account settings at all. Functions the same regardless of group assignments. You're allotted up to 5 read-only licenses per account.
* **IT** — Partial access to the account settings, including users, integrations, billing, and API settings. Cannot create or edit connections or access the project at all. Functions the same regardless of group assignments. You're allocated 1 seat per account.

See [Seats and Users](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md) for more information on the impact of licenses on these permissions.

License types override group permissions

**User license types always override their assigned group permission sets.** For example, a user with a Read-Only license cannot perform administrative actions, even if they belong to an Account Admin group. This ensures that license restrictions are always enforced, regardless of group membership.

#### Table of groups, licenses, and permissions[​](#table-of-groups-licenses-and-permissions "Direct link to Table of groups, licenses, and permissions")

Key:

* (W)rite — Create new or modify existing. Includes `send`, `create`, `delete`, `allocate`, `modify`, and `read`.
* (R)ead — Can view but cannot create or change any fields.
* No value — No access to the feature.

Permissions:

* [Account-level permissions](#account-permissions-for-account-roles) — Permissions related to management of the dbt account. For example, billing and account settings.
* [Project-level permissions](#project-permissions-for-account-roles) — Permissions related to the projects in dbt. For example, the Catalog and the Studio IDE.

The following tables outline the access that users have if they are assigned a Developer license with the Owner or Member group, a Read-only license, or an IT license.
###### Account permissions for account roles[​](#account-permissions-for-account-roles "Direct link to Account permissions for account roles")

| Account-level permission | Owner | Member | Read-only license | IT license |
| --- | --- | --- | --- | --- |
| Account settings | W | W | - | W |
| Billing | W | - | - | W |
| Invitations | W | W | - | W |
| Licenses | W | R | - | W |
| Users | W | R | - | W |
| Project (create) | W | W | - | W |
| Connections | W | W | - | W |
| Service tokens | W | - | - | W |
| Webhooks | W | W | - | - |

###### Project permissions for account roles[​](#project-permissions-for-account-roles "Direct link to Project permissions for account roles")

| Project-level permission | Owner | Member | Read-only license | IT license |
| --- | --- | --- | --- | --- |
| Adapters | W | W | R | - |
| Connections | W | W | R | W |
| Credentials | W | W | R | - |
| Custom env. variables | W | W | R | - |
| Develop (Studio IDE or dbt CLI) | W | W | - | - |
| Environments | W | W | R | - |
| Jobs | W | W | R | - |
| Catalog | W | W | R | - |
| Permissions | W | R | - | - |
| Profile | W | W | R | - |
| Projects | W | W | R | - |
| Repositories | W | W | R | - |
| Runs | W | W | R | - |
| Semantic Layer Config | W | W | R | - |
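The "license overrides group" rule above can be sketched as a tiny lookup. This is purely illustrative (the function name and values are not a dbt API): the effective access is the group's grant capped by what the license allows.

```shell
# Toy model: a user's effective access is their group grant (W/R/-)
# capped by their license. Illustrative only, not a dbt API.
effective_access() {
  local license="$1" group_grant="$2"
  case "$license" in
    developer) echo "$group_grant" ;;   # no license cap; group applies as-is
    read_only) [ "$group_grant" = "-" ] && echo "-" || echo "R" ;;  # cap at read
    *)         echo "-" ;;
  esac
}
effective_access developer W   # Owner/Member keep write access
effective_access read_only W   # Read-only license caps write down to read
```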
---

### Set up BigQuery OAuth

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Enterprise-tier feature

This guide describes a feature available in the dbt platform Enterprise and Enterprise+ plans. If you’re interested in learning more about our Enterprise-tier plans, contact us at .

The dbt platform supports [OAuth](https://cloud.google.com/bigquery/docs/authentication) with BigQuery, providing an additional layer of security for dbt enterprise users.

#### Set up BigQuery native OAuth[​](#set-up-bigquery-native-oauth "Direct link to Set up BigQuery native OAuth")

When BigQuery OAuth is enabled for a dbt platform project, all dbt platform developers must authenticate with BigQuery to access development tools, such as the Studio IDE.

To set up BigQuery OAuth in the dbt platform, a BigQuery admin must:

1. [Locate the redirect URI value](#locate-the-redirect-uri-value) in the dbt platform.
2. [Create a BigQuery OAuth 2.0 client ID and secret](#creating-a-bigquery-oauth-20-client-id-and-secret) in BigQuery.
3. [Configure the connection](https://docs.getdbt.com/docs/cloud/manage-access/set-up-bigquery-oauth.md#configure-the-connection-in-dbt) in the dbt platform.

To use BigQuery in the Studio IDE, all developers must:

1. [Authenticate to BigQuery](#authenticating-to-bigquery) in their profile credentials.

##### Locate the redirect URI value[​](#locate-the-redirect-uri-value "Direct link to Locate the redirect URI value")

To get started, locate the connection's redirect URI for configuring BigQuery OAuth:

1. Navigate to your account name, above your profile icon on the left side panel.
2. Select **Account settings** from the menu.
3. From the left sidebar, select **Connections**.
4. Click the BigQuery connection.
5. Locate the **Redirect URI** field under the **Development OAuth** section.
Copy this value to your clipboard to use later on.

[![Accessing the BigQuery OAuth configuration in dbt](/img/docs/dbt-cloud/using-dbt-cloud/dbt-cloud-enterprise/BQ-auth/dbt-cloud-bq-id-secret-02.png?v=2 "Accessing the BigQuery OAuth configuration in dbt")](#)Accessing the BigQuery OAuth configuration in dbt

##### Creating a BigQuery OAuth 2.0 client ID and secret[​](#creating-a-bigquery-oauth-20-client-id-and-secret "Direct link to Creating a BigQuery OAuth 2.0 client ID and secret")

note

The dbt platform will request the required BigQuery scopes automatically during the OAuth flow. [Configuring scopes in the BigQuery OAuth consent screen](https://developers.google.com/workspace/guides/configure-oauth-consent) is optional and not required for the dbt platform to connect to BigQuery. Required scopes are requested and approved when users authenticate with OAuth.

To get started, you need to create a client ID and secret for [authentication](https://cloud.google.com/bigquery/docs/authentication) with BigQuery. This client ID and secret will be stored in the dbt platform to manage the OAuth connection between dbt platform users and BigQuery.

1. In the BigQuery console, navigate to **APIs & Services** and select **Credentials**:

   [![BigQuery navigation to credentials](/img/docs/dbt-cloud/using-dbt-cloud/dbt-cloud-enterprise/BQ-auth/BQ-nav.gif?v=2 "BigQuery navigation to credentials")](#)BigQuery navigation to credentials

   On the **Credentials** page, you can see your existing keys, client IDs, and service accounts.

2. Set up an [OAuth consent screen](https://support.google.com/cloud/answer/6158849) if you haven't already.
3. Click **+ Create Credentials** at the top of the page and select **OAuth client ID**.
4. Fill in the client ID configuration. **Authorized JavaScript Origins** are not applicable.
5.
Add an item to **Authorized redirect URIs** and replace `REDIRECT_URI` with the value you copied to your clipboard earlier from the connection's **OAuth 2.0 Settings** section in the dbt platform:

   | Config | Value |
   | --- | --- |
   | **Application type** | Web application |
   | **Name** | dbt platform |
   | **Authorized redirect URIs** | `REDIRECT_URI` |
6. Click **Create** to create the BigQuery OAuth app and see the app client ID and secret values. These values are available even if you close the app screen, so this isn't the only chance you have to save them.

   [![Create an OAuth app in BigQuery](/img/docs/dbt-cloud/using-dbt-cloud/dbt-cloud-enterprise/BQ-auth/bq-oauth-app-02.png?v=2 "Create an OAuth app in BigQuery")](#)Create an OAuth app in BigQuery

##### Configure the Connection in dbt[​](#configure-the-connection-in-dbt "Direct link to Configure the Connection in dbt")

Now that you have an OAuth app set up in BigQuery, you'll need to add the client ID and secret to the dbt platform:

1. Navigate back to the Connection details page, as described in [Locate the redirect URI value](#locate-the-redirect-uri-value).
2. Add the client ID and secret from the BigQuery OAuth app under the **OAuth 2.0 Settings** section.
3. Enter the BigQuery token URI. The default value is `https://oauth2.googleapis.com/token`.

##### Authenticating to BigQuery[​](#authenticating-to-bigquery "Direct link to Authenticating to BigQuery")

Once the BigQuery OAuth app is set up for a dbt platform project, each dbt platform user will need to authenticate with BigQuery in order to use the Studio IDE:

1. Navigate to your account name, above your profile icon on the left side panel.
2. Select **Account settings** from the menu.
3. From the left sidebar, select **Credentials**.
4. Choose the project from the list.
5. Ensure the **Authentication Method** is set to **Native OAuth**.
6. Select **Connect to BigQuery**.

[![Navigate to account settings in dbt platform.](/img/guides/dbt-ecosystem/dbt-python-snowpark/5-development-schema-name/1-settings-gear-icon.png?v=2 "Navigate to account settings in dbt platform.")](#)Navigate to account settings in dbt platform.

[![Connect BigQuery account.](/img/docs/dbt-cloud/connect-to-bigquery.png?v=2 "Connect BigQuery account.")](#)Connect BigQuery account.
You will then be redirected to BigQuery and asked to approve the drive, cloud platform, and BigQuery scopes, unless the connection is less privileged.

[![BigQuery access request](/img/docs/dbt-cloud/using-dbt-cloud/dbt-cloud-enterprise/BQ-auth/BQ-access.png?v=2 "BigQuery access request")](#)BigQuery access request

Select **Allow**. This redirects you back to the dbt platform. You are now an authenticated BigQuery user and can begin accessing dbt development tools.

#### Set up BigQuery Workload Identity Federation [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Preview[​](#set-up-bigquery-workload-identity-federation- "Direct link to set-up-bigquery-workload-identity-federation-")

Workload Identity Federation (WIF) allows application workloads running outside the dbt platform to act as a service account, without the need to manage service accounts or other keys for deployment environments. The following instructions will enable you to authenticate your BigQuery connection in the dbt platform using WIF.

Currently, Microsoft Entra ID is the only supported identity provider (IdP). If you need additional IdP support, please contact your account team.

##### 1. Set up Entra ID[​](#1-set-up-entra-id "Direct link to 1. Set up Entra ID")

Create an app in Entra where dbt will request access tokens when authenticating to BigQuery via the workload identity pool:

1. From the **app registrations** screen, click **New registration**.
2. Give the app a name that makes it easily identifiable.
3. Ensure **Supported account types** are set to “Accounts in this organizational directory only (Org name - Single Tenant).”
4. Click **Register** to see the application’s overview screen.
5. From the **app overview**, click **Expose an API** in the left menu.
6. Click **Add** next to **Application ID URI**. The field will automatically populate.
7. Click **Save**.
8.
(Optional) To include the `sub` claim in tokens issued by this application, configure [optional claims in Entra ID](https://learn.microsoft.com/en-us/entra/identity-platform/optional-claims?tabs=appui).
The `sub` (subject) claim uniquely identifies the user or service principal for whom the token is issued.
When you configure service account impersonation in GCP, the Workload Identity Federation mapping uses this `sub` value to verify the identity of the calling Entra application.

9. (Optional but recommended) Test your Entra ID configuration by requesting a token. Run the following command, replacing the `CLIENT_ID`, `APPLICATION_ID_URI`, `CLIENT_SECRET`, and `TENANT_ID` placeholders with your actual values:

   ```bash
   curl -X POST 'https://login.microsoftonline.com/TENANT_ID/oauth2/v2.0/token' \
     -H 'Content-Type: application/x-www-form-urlencoded' \
     -d 'client_id=CLIENT_ID' \
     -d 'scope=APPLICATION_ID_URI/.default' \
     -d 'client_secret=CLIENT_SECRET' \
     -d 'grant_type=client_credentials'
   ```

   The response will include an `access_token`. You can decode this token using [jwt.io](https://jwt.io) to view the `sub` claim value.

Workload Identity Federation uses a machine-to-machine OAuth flow that is unattended by the user; as such, a redirect URI doesn't need to be set for the application. The **Application ID URI** step in this section is crucial because it determines the audience for tokens issued from the app and informs the workpool in GCP whether the calling application has permission to access the resources guarded by the workpool.

* **Related documentation:** [GCP — Prepare your external identity provider](https://cloud.google.com/iam/docs/workload-identity-federation-with-other-clouds#create)

##### 2. Create a Workpool and Workpool Provider in GCP[​](#2-create-a-workpool-and-workpool-provider-in-gcp "Direct link to 2. Create a Workpool and Workpool Provider in GCP")

1. In your GCP account's main menu, navigate to **IAM & Admin** and click the **Workload Identity Federation** option (not to be confused with the **Workforce Identity Federation** option directly adjacent).
2. If you haven’t created a workpool yet, click **Get started**, or create a new workpool using the button near the top of the page.
3. Give the workpool a name and description.
   Per the [GCP documentation](https://cloud.google.com/iam/docs/workload-identity-federation#pools), create a new pool for each non-Google Cloud environment that needs to access Google Cloud resources, such as development, staging, or production. Name the workpool accordingly so it's easily identifiable in the future.

4. When creating your provider:
   * Set the type of the provider to **OpenID Connect (OIDC)**.
   * Name the provider something identifiable, like `Entra ID`.
   * Set the URL to `https://login.microsoftonline.com/YOUR_TENANT_ID/v2.0`. This can be found in the token itself, if you decode it via jwt.io. You can also see a reference to the expected issuer URL for Entra in the [GCP documentation for WIF](https://cloud.google.com/iam/docs/workload-identity-federation-with-other-clouds#create_the_workload_identity_pool_and_provider).
   * Replace `YOUR_TENANT_ID` with your tenant ID. The tenant ID can be found in the app registration created in [section 1 of these instructions](#1-set-up-entra-id); it's called **Directory (tenant) ID** and appears in the overview section for the application.
   * For **Audiences**, select **Allowed Audiences** and set the value to the **Application ID URI** that was defined for your Entra ID app.
5. Click **Continue**.
6. Under **Configure provider attributes**, set the mapping for `google.subject` to `assertion.sub`.
7. Click **Save**.

##### 3. Service Account Impersonation[​](#3-service-account-impersonation "Direct link to 3. Service Account Impersonation")

A workpool either uses a service account or is granted direct resource access to determine which resources a caller can access. The [GCP documentation](https://cloud.google.com/iam/docs/workload-identity-federation-with-other-clouds#access) provides more detailed information on configuring both for your workpool. We chose the service account approach in our implementation because it offered greater flexibility.

If you haven't already, create a new service account:

1.
From the main menu, select **IAM & Admin**.
2. Click **Service Accounts**.
3. Click **Create service account**. Google recommends creating a service account per workload.
4. Assign the relevant roles you would like this service account to have. In our experience, `BigQuery Admin` is the default role that provides the required access.

Once you've created the service account, navigate back to the workpool you created in the previous step:

1. Click the **Grant Access** option at the top of the page.
2. Select **Grant access using Service Account Impersonation**.
3. Select the service account you just created.
4. Under **Select Principals**, set `subject` as the **Attribute Name**. For the **Attribute Value**, use the `sub` (subject) claim value from the Entra ID access token.

   **Obtain the `sub` value**

   To obtain the `sub` value, request an access token from Entra ID. The `sub` claim is consistent across all tokens issued by this application. Run the following command, replacing `<client_id>`, `<application_id_uri>`, `<client_secret>`, and `<tenant_id>` with your actual values:

   ```bash
   curl -X POST 'https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token' \
     -H "Content-Type: application/x-www-form-urlencoded" \
     -d 'client_id=<client_id>' \
     -d 'scope=<application_id_uri>/.default' \
     -d 'client_secret=<client_secret>' \
     -d 'grant_type=client_credentials'
   ```

   The response will include an `access_token`. You can decode this token using [jwt.io](https://jwt.io) to view the `sub` claim value.

##### 4. Set up dbt[​](#4-set-up-dbt "Direct link to 4. Set up dbt")

To configure a BigQuery connection to use WIF authentication in the dbt platform, you must set up a custom OAuth integration configured with details from the Entra application used as your workpool provider in GCP.

In the dbt platform:

1. Navigate to **Account settings** --> **Integrations**.
2. Scroll down to the section for **Custom OAuth Integrations** and create a new integration.
3. Fill out all fields with the appropriate information from your IdP environment.
   * The Application ID URI should be set to the expected audience claim on tokens issued from the Entra application. It will be the same URI your workpool provider has been configured to expect.
   * You do not have to add the Redirect URI to your Entra application.

##### 5. Create connections in dbt[​](#5-create-connections-in-dbt "Direct link to 5. Create connections in dbt")

To get started, create a new connection in the dbt platform:

1. Navigate to **Account settings** --> **Connections**.
2. Click **New connection** and select **BigQuery** as the connection type. You will then see the option to select **BigQuery** or **BigQuery (Legacy)**. Select **BigQuery**.
3. For the **Deployment Environment Authentication Method**, select **Workload Identity Federation**.
4. Fill out the **Google Cloud Project ID** and any optional settings you need.
5. Select the OAuth Configuration you created in the previous section from the drop-down.
6. Configure your development connection:
   * [BigQuery OAuth](https://docs.getdbt.com/docs/cloud/connect-data-platform/connect-bigquery.md#bigquery-oauth) (recommended). Set this up in the same connection as the one you're using for WIF, under **OAuth2.0 settings**.
   * Service JSON. You must create a separate connection with the Service JSON configuration.

##### 6. Set up project[​](#6-set-up-project "Direct link to 6. Set up project")

To connect a new project to your WIF configuration:

1. Navigate to **Account settings** --> **Projects**.
2. Click **New project**.
3. Give your project a name and (optional) subdirectory path and click **Continue**.
4. Select the **Connection** with the WIF configuration.
5. Configure the remainder of the project with the appropriate fields.

##### 7. Set up deployment environment[​](#7-set-up-deployment-environment "Direct link to 7. Set up deployment environment")

Create a new or updated environment to use the WIF connection.
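The deployment credentials you'll enter for this environment are assembled from your GCP workpool details. As a minimal sketch of how the two resource strings fit together — every value below (project number, pool ID, provider ID, service account email) is a hypothetical example, not something from your account:

```python
# Sketch: compose the two GCP resource strings a WIF deployment
# environment needs. All values below are hypothetical examples.
project_number = "123456789012"          # GCP project number (not the project ID)
pool_id = "entra-prod-pool"              # workload identity pool ID
provider_id = "entra-id"                 # provider ID inside the pool
sa_email = "dbt-wif@my-project.iam.gserviceaccount.com"  # service account email

# Workload pool provider path (required for all WIF configurations)
provider_path = (
    f"//iam.googleapis.com/projects/{project_number}"
    f"/locations/global/workloadIdentityPools/{pool_id}"
    f"/providers/{provider_id}"
)

# Service account impersonation URL (only when the workpool uses impersonation)
impersonation_url = (
    "https://iamcredentials.googleapis.com/v1/projects/-"
    f"/serviceAccounts/{sa_email}:generateAccessToken"
)

print(provider_path)
print(impersonation_url)
```

Note that the provider path intentionally starts with `//` (it's a full resource name, not a URL), and the impersonation URL uses the literal `-` wildcard in place of a project identifier.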
When you set your environment connection to the WIF configuration, two fields appear under the **Deployment credentials** section:

* **Workload pool provider path:** This field is required for all WIF configurations. Example: `//iam.googleapis.com/projects/<project_number>/locations/global/workloadIdentityPools/<pool_id>/providers/<provider_id>`
* **Service account impersonation URL:** Used only if you've configured your workpool to use service account impersonation for accessing your BigQuery resources (as opposed to granting the workpool direct access to the BigQuery resources). Example: `https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/<service_account_email>:generateAccessToken`

If you don't already have a job based on the deployment environment with a connection set up for WIF, you should create one now. Once you've configured it with the preferred settings, run the job.

#### FAQs[​](#faqs "Direct link to FAQs")

**Why does the BigQuery OAuth application require scopes to Google Drive?**

BigQuery supports external tables over both personal Google Drive files and shared files. For more information, refer to [Create Google Drive external tables](https://cloud.google.com/bigquery/docs/external-data-drive).

#### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting")

The following section provides troubleshooting steps for common issues with BigQuery OAuth connections.

###### Connection fails after granting Google permissions[​](#connection-fails-after-granting-google-permissions "Direct link to Connection fails after granting Google permissions")

When connecting a BigQuery account, you may successfully sign in to Google and approve the requested permissions, but then see an error page (like `403 Account restricted`) instead of returning to the dbt platform. This typically appears as a server error message rather than the credentials page you started from.
This can happen when your Google Workspace organization restricts which third-party applications can access Google services. Even though Google allowed you to sign in and view the consent screen, the organization's security policy blocked the final step of issuing credentials to the dbt platform. This troubleshooting section explains why this happens, how to resolve it, and what to do if it keeps happening.

**1. Why this affects BigQuery OAuth**

Google classifies OAuth scopes into three tiers: non-sensitive, sensitive, and restricted. Applications that only request non-sensitive scopes (such as basic sign-in) typically work without additional admin approval. The BigQuery OAuth connection requires scopes that Google classifies as [restricted](https://support.google.com/cloud/answer/9110914), which always require explicit admin trust regardless of your organization's default policy for third-party apps.

The BigQuery OAuth connection requests these restricted scopes:

* `bigquery` — Required to run queries and access BigQuery resources
* `cloud-platform` — Required for cross-project BigQuery access and service interoperability
* `drive` — Required to support [BigQuery external tables over Google Drive](https://cloud.google.com/bigquery/docs/external-data-drive)

Other applications you use may connect without this approval step because they likely request only non-sensitive or sensitive scopes that don't trigger the same restriction.

**2. How to resolve it**

Your Google Workspace administrator must trust the OAuth client ID that your dbt platform BigQuery connection uses. Find this client ID in the dbt platform under **Account settings** > **Connections** > your BigQuery connection > **OAuth 2.0 Settings**.

Once you have the client ID, ask your Google Workspace administrator to:

1. Sign in to the [Google Workspace Admin Console](https://admin.google.com).
2. Navigate to **Security** > **Access and data control** > **API controls** > **Manage third-party app access**.
3. Click **Add app** > **OAuth App Name Or Client ID**.
4. Enter the client ID from your dbt platform BigQuery connection.
5. Set the access level to **Trusted**.
6. Apply the setting to the relevant organizational unit or the entire organization.

After the administrator completes these steps, retry the BigQuery connection from your credentials page in the dbt platform. For more details, see Google's documentation on [controlling third-party app access to Google Workspace data](https://support.google.com/a/answer/7281227).

**3. If the issue persists**

If the connection still fails after your administrator trusts the OAuth client in Google Workspace, ask your GCP project administrator to check whether a Google Cloud [Organization Policy](https://cloud.google.com/resource-manager/docs/organization-policy/overview) restricts external OAuth clients. The relevant constraint is `constraints/iam.allowedExternalOAuthClients`.

If neither of these steps resolves the issue, contact [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support) with the approximate time of the failed connection attempt and the email address you used to authenticate with Google.

---

### Set up Databricks OAuth

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

dbt supports developer OAuth ([OAuth for partner solutions](https://docs.databricks.com/en/integrations/manage-oauth.html)) with Databricks, providing an additional layer of security for dbt enterprise users.
When you enable Databricks OAuth for a dbt project, all dbt developers must authenticate with Databricks in order to use the Studio IDE. The project's deployment environments will still use the Databricks authentication method set at the environment level.

Current limitation:

* The Studio IDE must be restarted every hour because access tokens expire after 1 hour ([workaround](https://docs.databricks.com/en/integrations/manage-oauth.html#override-the-default-token-lifetime-policy-for-dbt-core-power-bi-or-tableau-desktop)).

##### Configure Databricks OAuth (Databricks admin)[​](#configure-databricks-oauth-databricks-admin "Direct link to Configure Databricks OAuth (Databricks admin)")

To get started, you will need to [add dbt as an OAuth application](https://docs.databricks.com/en/integrations/configure-oauth-dbt.html) with Databricks. You can configure this application in two ways (the CLI or the Databricks UI). Here's how to set it up in the Databricks UI:

1. Log in to the [account console](https://accounts.cloud.databricks.com/) and click the **Settings** icon in the sidebar.
2. On the **App connections** tab, click **Add connection**.
3. Enter the following details:
   * A name for your connection.
   * The redirect URLs for your OAuth connection, which you can find in the table later in this section.
   * For **Access scopes**, the APIs the application should have access to:
     * For BI applications, the SQL scope is required to allow the connected app to access Databricks SQL APIs (required for SQL models).
     * For applications that need to access Databricks APIs for purposes other than querying, the ALL APIs scope is required (required if running Python models).
   * The access token time-to-live (TTL) in minutes. Default: 60.
   * The refresh token time-to-live (TTL) in minutes. Default: 10080.
4. Select **Generate a client secret**. Copy and securely store the client secret.
   The client secret will not be available later.

You can use the following table to set up the redirect URLs for your application with dbt:

| Region | Redirect URLs |
| ------------------- | ------------- |
| **US multi-tenant** | |
| **US cell 1** | |
| **EMEA** | |
| **APAC** | |
| **Single tenant** | |

##### Configure the Connection in dbt (dbt project admin)[​](#configure-the-connection-in-dbt-dbt-project-admin "Direct link to Configure the Connection in dbt (dbt project admin)")

Now that you have an OAuth app set up in Databricks, you'll need to add the client ID and secret to dbt. To do so:

1. From dbt, click on your account name in the left side menu and select **Account settings**.
2. Select **Projects** from the menu.
3. Choose your project from the list.
4. Click **Connections** and select the Databricks connection.
5. Click **Edit**.
6. Under the **Optional settings** section, add the `OAuth Client ID` and `OAuth Client Secret` from the Databricks OAuth app.

[![Adding Databricks OAuth application client ID and secret to dbt](/img/docs/dbt-cloud/using-dbt-cloud/dbt-cloud-enterprise/DBX-auth/dbt-databricks-oauth.png?v=2 "Adding Databricks OAuth application client ID and secret to dbt")](#)Adding Databricks OAuth application client ID and secret to dbt

##### Authenticating to Databricks (Studio IDE developer)[​](#authenticating-to-databricks-studio-ide-developer "Direct link to Authenticating to Databricks (Studio IDE developer)")

Once the Databricks connection via OAuth is set up for a dbt project, each dbt user will need to authenticate with Databricks in order to use the Studio IDE. To do so:

1. From dbt, click on your account name in the left side menu and select **Account settings**.
2. Under **Your profile**, select **Credentials**.
3. Choose your project from the list and click **Edit**.
4. Select `OAuth` as the authentication method, and click **Save**.
5. Finalize by clicking the **Connect Databricks Account** button.
[![Connecting to Databricks from an IDE user profile](/img/docs/dbt-cloud/using-dbt-cloud/dbt-cloud-enterprise/DBX-auth/dbt-databricks-oauth-user.png?v=2 "Connecting to Databricks from an IDE user profile")](#)Connecting to Databricks from an IDE user profile

You will then be redirected to Databricks and asked to approve the connection, after which you are redirected back to dbt. You should now be an authenticated Databricks user, ready to use the Studio IDE.

---

##### Set up environment-level permissions

To set up and configure environment-level permissions, you must have write permissions to the **Groups & Licenses** settings of your dbt account. For more information about roles and permissions, check out [User permissions and licenses](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md).

Environment-level permissions are not the same as account-level [role-based access control (RBAC)](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control) and are configured separately from those workflows.

#### Setup instructions[​](#setup-instructions "Direct link to Setup instructions")

In your dbt account:

1. Click your account name, above your profile icon on the left side panel. Then select **Account settings**.
2. Select **Groups & Licenses**. We recommend deleting the default `Owner`, `Member`, and `Everyone` groups and instead assigning users to your organizational groups to avoid granting them unnecessary elevated privileges. Before deleting these groups, however, ensure that any existing users — including yourself — are reassigned to their appropriate organizational groups.
   You won't be able to delete the `Owner` group until at least one user is added to another group with the account admin permission set, or there is a user with an IT license. This safeguard ensures that an account admin is always available to manage group changes.

   [![Groups & Licenses page in dbt with the default groups highlighted.](/img/docs/dbt-cloud/groups-and-licenses.png?v=2 "Groups & Licenses page in dbt with the default groups highlighted.")](#)Groups & Licenses page in dbt with the default groups highlighted.

3. Create a new group or open an existing one. If it's a new group, give it a name, then scroll down to **Access & permissions** and click **Add permission**.

   [![The Access & permissions section with the Add button highlighted.](/img/docs/dbt-cloud/add-permissions.png?v=2 "The Access & permissions section with the Add button highlighted.")](#)The Access & permissions section with the Add button highlighted.

4. Select the **Permission set** for the group. Only the following permission sets can have environment-level permissions configured:
   * Database admin
   * Git admin
   * Team admin
   * Analyst
   * Developer

   Other permission sets are restricted because they have access to everything (for example, Account admin), or limitations prevent them from having write access to environments (for example, Account viewer). If you select a permission set that is not supported, the environment permission option will not appear.

   [![The view of the permissions box if there is no option for environment permissions.](/img/docs/dbt-cloud/no-option.png?v=2 "The view of the permissions box if there is no option for environment permissions.")](#)The view of the permissions box if there is no option for environment permissions.

5. Select the **Environment** for group access. The default is **All environments**, but you can select multiple environments. If none are selected, the group will have read-only access.
   [![A list of available environments with the Staging and General boxes selected.](/img/docs/dbt-cloud/environment-options.png?v=2 "A list of available environments with the Staging and General boxes selected.")](#)A list of available environments with the Staging and General boxes selected.

6. Save the group settings. You're now set up and ready to assign users!

#### User experience[​](#user-experience "Direct link to User experience")

Users with permissions to the environment will see all capabilities assigned to their role. Environment-level permissions grant either `write` or `read-only` access. This feature does not currently support controlling which individual features within the environment are accessible. For more details on what can and cannot be done with environment-level permissions, refer to [About environment-permissions](https://docs.getdbt.com/docs/cloud/manage-access/environment-permissions.md).

For example, here is an overview of the **Jobs** section of the environment page if a user has been granted access:

[![The jobs page with write access and the 'Create job' button visible.](/img/docs/dbt-cloud/write-access.png?v=2 "The jobs page with write access and the 'Create job' button visible.")](#)The jobs page with write access and the 'Create job' button visible.

The same page if the user has not been granted environment-level permissions:

[![The jobs page with read-only access and the 'Create job' button not visible.](/img/docs/dbt-cloud/read-only-access.png?v=2 "The jobs page with read-only access and the 'Create job' button not visible.")](#)The jobs page with read-only access and the 'Create job' button not visible.
---

### Set up external OAuth with Redshift

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

**Note:** This feature is currently only available for the Okta and Entra ID identity providers.

dbt Enterprise and Enterprise+ plans support OAuth authentication with external providers. When **External OAuth** is enabled, users can authorize their development credentials using single sign-on (SSO) via the identity provider (IdP). External OAuth lets users access multiple applications, including dbt, without sharing their static credentials with the service. This makes authenticating for development environments easier for the user and provides an additional layer of security to your dbt account.

#### Getting started[​](#getting-started "Direct link to Getting started")

Setting up external OAuth requires some back-and-forth between your dbt, IdP, and data warehouse accounts, so having them open in multiple browser tabs will help speed up the configuration process:

* **dbt:** You'll primarily be working in the **Account settings** —> **Integrations** page. You will need [proper permission](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to set up the integration and create the connections.
* **Identity providers:**
  * **Okta:** You'll be working in multiple areas of the Okta account, but you can start in the **Applications** section. You will need permissions to [create an application](https://help.okta.com/en-us/content/topics/security/custom-admin-role/about-role-permissions.htm#Application_permissions) and an [authorization server](https://help.okta.com/en-us/content/topics/security/custom-admin-role/about-role-permissions.htm#Authorization_server_permissions).
  * **Entra ID:** An admin with access to create [Entra ID apps](https://learn.microsoft.com/en-us/entra/identity/role-based-access-control/custom-available-permissions), who is also a user in the data warehouse, is required.
* **Data warehouse:**
  * **Redshift:** Create and manage the [Identity Center integration](https://aws.amazon.com/blogs/big-data/integrate-identity-provider-idp-with-amazon-redshift-query-editor-v2-and-sql-client-using-aws-iam-identity-center-for-seamless-single-sign-on/) with your identity provider.

If the admins that handle these products are all different people, it's best to have them coordinate simultaneously to reduce friction. Ensure your Amazon admins have completed the [Amazon Identity Center integration](https://aws.amazon.com/blogs/big-data/integrate-identity-provider-idp-with-amazon-redshift-query-editor-v2-and-sql-client-using-aws-iam-identity-center-for-seamless-single-sign-on/) with Okta or Entra ID.

#### Identity provider configuration[​](#identity-provider-configuration "Direct link to Identity provider configuration")

Select a supported identity provider (IdP) for instructions on configuring external OAuth in their environment and completing the integration in dbt:

* Okta
* Entra ID

##### 1. Initialize the dbt settings[​](#1-initialize-the-dbt-settings "Direct link to 1. Initialize the dbt settings")

1. In your dbt account, navigate to **Account settings** —> **Integrations**.
2. Scroll down to **Custom integrations** and click **Add integrations**.
3. Leave this window open. You can set the **Integration type** to Okta and note the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps.

[![Copy the callback URI at the bottom of the integration page in dbt.](/img/docs/dbt-cloud/callback-uri.png?v=2 "Copy the callback URI at the bottom of the integration page in dbt.")](#)Copy the callback URI at the bottom of the integration page in dbt.

##### 2. Create the Okta app[​](#2-create-the-okta-app "Direct link to 2. Create the Okta app")

1. Expand the **Applications** section from the Okta dashboard and click **Applications**. Click the **Create app integration** button.
2. Select **OIDC** as the sign-in method and **Web applications** as the application type. Click **Next**.

   [![The Okta app creation window with OIDC and Web Application selected.](/img/docs/dbt-cloud/create-okta-app.png?v=2 "The Okta app creation window with OIDC and Web Application selected.")](#)The Okta app creation window with OIDC and Web Application selected.

3. Give the application an appropriate name, something like "External OAuth app for dbt," that will make it easily identifiable.
4. In the **Grant type** section, enable the **Refresh token** option.
5. Scroll down to the **Sign-in redirect URIs** option. Paste the redirect URI you gathered from dbt in step 1.3.

   [![The Okta app configuration window with the sign-in redirect URI configured to the dbt value.](/img/docs/dbt-cloud/configure-okta-app.png?v=2 "The Okta app configuration window with the sign-in redirect URI configured to the dbt value.")](#)The Okta app configuration window with the sign-in redirect URI configured to the dbt value.

6. Save the app configuration. You'll come back to it, but move on to the next steps for now.

##### 3. Create the Okta API[​](#3-create-the-okta-api "Direct link to 3. Create the Okta API")

1. Expand the **Security** section and click **API** from the Okta sidebar menu.
2. On the API screen, click **Add authorization server**. Give the authorization server a name (a nickname for your data warehouse account would be appropriate). For the **Audience** field, copy and paste your data warehouse login URL. Give the server an appropriate description and click **Save**.
   [![The Okta API window with the Audience value set.](/img/docs/dbt-cloud/create-okta-api.png?v=2 "The Okta API window with the Audience value set.")](#)The Okta API window with the Audience value set.

3. On the authorization server config screen, open the **Metadata URI** in a new tab. You'll need information from this screen in later steps.

   [![The Okta API settings page with the metadata URI highlighted.](/img/docs/dbt-cloud/metadata-uri.png?v=2 "The Okta API settings page with the metadata URI highlighted.")](#)The Okta API settings page with the metadata URI highlighted.

   [![Sample output of the metadata URI.](/img/docs/dbt-cloud/metadata-example.png?v=2 "Sample output of the metadata URI.")](#)Sample output of the metadata URI.

4. Click on the **Scopes** tab and **Add scope**. In the **Name** field, add `session:role-any`. (Optional) Configure **Display phrase** and **Description**, and click **Create**.

   [![API scope configured in the Add Scope window.](/img/docs/dbt-cloud/add-api-scope.png?v=2 "API scope configured in the Add Scope window.")](#)API scope configured in the Add Scope window.

5. Open the **Access policies** tab and click **Add policy**. Give the policy a **Name** and **Description** and set **Assign to** as **The following clients**. Start typing the name of the app you created in step 2.3, and you'll see it autofill. Select the app and click **Create Policy**.

   [![Assignment field autofilling the value.](/img/docs/dbt-cloud/add-api-assignment.png?v=2 "Assignment field autofilling the value.")](#)Assignment field autofilling the value.

6. On the **access policy** screen, click **Add rule**.

   [![API Add rule button highlighted.](/img/docs/dbt-cloud/add-api-rule.png?v=2 "API Add rule button highlighted.")](#)API Add rule button highlighted.

7. Give the rule a descriptive name and scroll down to **token lifetimes**.
   Configure the **Access token lifetime is**, **Refresh token lifetime is**, and **but will expire if not used every** settings according to your organizational policies. We recommend the defaults of 1 hour and 90 days; stricter rules increase the odds of your users having to re-authenticate.

   [![Token lifetime settings in the API rule window.](/img/docs/dbt-cloud/configure-token-lifetime.png?v=2 "Token lifetime settings in the API rule window.")](#)Token lifetime settings in the API rule window.

8. Navigate back to the **Settings** tab and leave it open in your browser. You'll need some of the information in later steps.

##### 4. Create the OAuth settings in the data warehouse[​](#4-create-the-oauth-settings-in-the-data-warehouse "Direct link to 4. Create the OAuth settings in the data warehouse")

Ensure your Amazon admins have completed the Identity Center integration with Okta, and configure the Okta application and APIs in accordance with your Amazon configs.

##### 5. Configuring the integration in dbt[​](#5-configuring-the-integration-in-dbt "Direct link to 5. Configuring the integration in dbt")

1. Navigate back to the dbt **Account settings** —> **Integrations** page you were on at the beginning. It's time to start filling out all of the fields.
   1. `Integration name`: Give the integration a descriptive name that includes identifying information about the Okta environment so future users won't have to guess where it belongs.
   2. `Client ID` and `Client secrets`: Retrieve these from the Okta application page.

      [![The client ID and secret highlighted in the Okta app.](/img/docs/dbt-cloud/gather-clientid-secret.png?v=2 "The client ID and secret highlighted in the Okta app.")](#)The client ID and secret highlighted in the Okta app.

   3. `Authorize URL` and `Token URL`: Found in the metadata URI.
      [![The authorize and token URLs highlighted in the metadata URI.](/img/docs/dbt-cloud/gather-authorization-token-endpoints.png?v=2 "The authorize and token URLs highlighted in the metadata URI.")](#)The authorize and token URLs highlighted in the metadata URI.

2. **Save** the configuration.

##### 6. Create a new connection in dbt[​](#6-create-a-new-connection-in-dbt "Direct link to 6. Create a new connection in dbt")

1. Navigate to **Account settings** and click **Connections** from the menu. Click **New connection**.
2. Configure the `Account`, `Database`, and `Warehouse` as you normally would, and for the `OAuth method`, select the external OAuth you just created.

   [![The new configuration window in dbt with the External OAuth showing as an option.](/img/docs/dbt-cloud/configure-new-connection.png?v=2 "The new configuration window in dbt with the External OAuth showing as an option.")](#)The new configuration window in dbt with the External OAuth showing as an option.

3. Scroll down to the **External OAuth** configurations box and select the config from the list.

   [![The new connection displayed in the External OAuth Configurations box.](/img/docs/dbt-cloud/select-oauth-config.png?v=2 "The new connection displayed in the External OAuth Configurations box.")](#)The new connection displayed in the External OAuth Configurations box.

4. **Save** the connection, and you have now configured External OAuth with Okta!

##### 1. Initialize the dbt settings[​](#1-initialize-the-dbt-settings-1 "Direct link to 1. Initialize the dbt settings")

1. In your dbt account, navigate to **Account settings** —> **Integrations**.
2. Scroll down to **Custom integrations** and click **Add integrations**.
3. Leave this window open. You can set the **Integration type** to Entra ID and note the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps.

##### 2. Create the Entra ID apps[​](#2-create-the-entra-id-apps "Direct link to 2. Create the Entra ID apps")

You'll create two apps in the Azure portal: a resource server and a client app.

###### Create a resource server[​](#create-a-resource-server "Direct link to Create a resource server")

In your Entra ID account:

1. From the app registrations screen, click **New registration**.
   1. Give the app a name.
   2. Ensure **Supported account types** are set to "Accounts in this organizational directory only (`Org name` - Single Tenant)."
   3. Click **Register** to see the application's overview.
2. From the app overview page left menu, click **Expose an API**.
3. Click **Add** next to **Application ID URI**. The field will automatically populate.
4. Click **Save**.

   [![Create the Entra ID resource server.](/img/docs/dbt-cloud/create-resource-server.png?v=2 "Create the Entra ID resource server.")](#)Create the Entra ID resource server.

5. Record the `value` field for use in a future step.
6. From the same screen, click **Add scope**:
   1. Name the scope `dbt-redshift`.
   2. Set **Who can consent?** to **Admins and users**.
   3. Set **Admin consent display name** to `dbt-redshift` and give it a description.
   4. Ensure **State** is set to **Enabled**.
   5. Click **Add scope**.

###### Create a client app[​](#create-a-client-app "Direct link to Create a client app")

1. From the **App registration page**, click **New registration**.
   1. Give the app a name that uniquely identifies it as the client app.
   2. Ensure **Supported account types** are set to "Accounts in this organizational directory only (`Org name` - Single Tenant)."
   3. Set the **Redirect URI** to **Web** and copy/paste the **Redirect URI** from dbt into the field.
   4. Click **Register**.
2. From the app overview page, click **API permissions** from the left menu, and click **Add permission**.

   [![Add permissions to the Entra ID app.](/img/docs/dbt-cloud/add-permission-entra.png?v=2 "Add permissions to the Entra ID app.")](#)Add permissions to the Entra ID app.

3.
From the pop-out screen, click **APIs my organization uses**, search for the resource server name from the previous steps, and click it. 4. Ensure the box for the **Permissions** `dbt-redshift` is enabled and click **Add permissions**. 5. Click **Grant admin consent** and from the popup modal click **Yes**. 6. From the left menu, click **Certificates and secrets** and click **New client secret**. Name the secret, set an expiration, and click **Add**.  * **Note**: Microsoft does not allow “forever” as an expiration date. The maximum time is two years. It is essential to document the expiration date so you can refresh the secret before it expires; otherwise, user authorization will fail. 7. Record the `value` immediately for use in a future step.  * **Note**: Entra ID will not display this value again once you navigate away from this screen. ##### 3. Configuring the integration in dbt[​](#3-configuring-the-integration-in-dbt "Direct link to 3. Configuring the integration in dbt") 1. Navigate back to the dbt **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. There will be some back-and-forth between the Entra ID account and dbt. 2. `Integration name`: Give the integration a descriptive name that includes identifying information about the Entra ID environment so future users won’t have to guess where it belongs. 3. `Client secrets`: Found on the client app’s **Certificates and secrets** page. `Value` is the `Client secret`. Note that it only appears when created; *Microsoft hides the secret if you return later, and you must recreate it.* 4. `Client ID`: Copy the `Application (client) ID` from the client app’s overview page. 5. `Authorization URL` and `Token URL`: From the client ID app, open the `Endpoints` tab. These URLs map to the `OAuth 2.0 authorization endpoint (v2)` and `OAuth 2.0 token endpoint (v2)` fields. *You must use v2 of the `OAuth 2.0 authorization endpoint`. 
Do not use v1.* You can use either version of the `OAuth 2.0 token endpoint`. 6. `Application ID URI`: Copy the `Application ID URI` field from the resource server’s Overview screen. #### Configure the Trusted Token Issuer in IAM IdC[​](#configure-the-trusted-token-issuer-in-iam-idc "Direct link to Configure the Trusted Token Issuer in IAM IdC") A *trusted token issuer* generates an access token that identifies a user and then authenticates that user. This essentially lets services outside of the AWS ecosystem, such as the dbt platform, connect to IAM IdC (and Redshift) with access tokens they have generated or retrieved from an external IdP (Entra ID or Okta). The following steps are outlined per [this blog post](https://aws.amazon.com/blogs/big-data/integrate-tableau-and-microsoft-entra-id-with-amazon-redshift-using-aws-iam-identity-center/): 1. Open the AWS Management Console, navigate to [IAM Identity Center](https://console.aws.amazon.com/singlesignon), and then go to **Settings**. 2. Select the **Authentication** tab and under **Trusted token issuers**, choose **Create trusted token issuer**. 3. On the **Set up an external IdP to issue trusted tokens** page, under **Trusted token issuer details**, do the following: 1. For **Issuer URL**, enter the OIDC discovery URL of the external IdP that will issue tokens for trusted identity propagation. *Include the forward slash at the end of the URL*. 2. For **Trusted token issuer name**, enter a name to identify this TTI in IAM Identity Center and the application console. 3. Under **Map attributes**, do the following: 1. For **Identity provider attribute**, select an attribute from the list to map to an attribute in the Identity Center identity store. You can choose: * Email  * Object Identifier * Subject * Other — When using this option with UPN, in our experience the `upn` attribute matches up with `Email`. 
#### Configure Redshift IdC application to utilize TTI[​](#configure-redshift-idc-application-to-utilize-tti "Direct link to Configure Redshift IdC application to utilize TTI") To start, select **IAM Identity Center connection** from the Amazon Redshift console menu. [![The AWS Redshift console.](/img/docs/dbt-cloud/redshift-idc.png?v=2 "The AWS Redshift console.")](#)The AWS Redshift console. 1. Select the Amazon Redshift application that you created as part of the setup. 2. Select the **Client connections** tab and choose **Edit**. 3. Choose **Yes** under **Configure client connections that use third-party IdPs**. 4. Select the checkbox for the **Trusted token issuer** that you created in the previous section. 5. Enter the `aud` claim value under **Configure selected trusted token issuers**. **This should be the application ID URI you set for the integration in the dbt platform.** #### Finalizing the dbt configuration[​](#finalizing-the-dbt-configuration "Direct link to Finalizing the dbt configuration") If you have an existing connection, make sure that the OAuth method is set to **External OAuth** and select the integration you created in an earlier step. Otherwise, create a new Redshift connection, being sure to set values for: * **Server Hostname** * **OAuth Method** * **Database name** (this field can be found under the **Optional Settings**) This connection should be set as the connection for a development environment in an existing or new project. Once the connection has been assigned to a development environment, you can configure your user credentials for that development environment under `Account Settings > Your Profile > Credentials > `. Set the authentication method to `External OAuth`, set the `schema` and other fields if desired, and save the credentials. You can then click the `Connect to Redshift` button. 
##### Verify connection in Studio[​](#verify-connection-in-studio "Direct link to Verify connection in Studio") Once your development session has initialized, you can test that you’re able to connect to Redshift using external OAuth by running `dbt debug`. --- ### Set up external OAuth with Snowflake [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") note This feature is currently only available for Okta and Entra ID identity providers. dbt Enterprise and Enterprise+ plans support OAuth authentication with external providers. When **External OAuth** is enabled, users can authorize their Development credentials using single sign-on (SSO) via the identity provider (IdP). External OAuth authorizes users to access multiple applications, including dbt, without sharing their static credentials with the service. This makes the process of authenticating for development environments easier for the user and provides an additional layer of security to your dbt account. #### Getting started[​](#getting-started "Direct link to Getting started") The process of setting up external OAuth will require a little bit of back-and-forth between your dbt, IdP, and data warehouse accounts, and having them open in multiple browser tabs will help speed up the configuration process: * **dbt:** You’ll primarily be working in the **Account settings** —> **Integrations** page. 
You will need [proper permission](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to set up the integration and create the connections. **Identity providers:** * **Okta:** You’ll be working in multiple areas of the Okta account, but you can start in the **Applications** section. You will need permissions to [create an application](https://help.okta.com/en-us/content/topics/security/custom-admin-role/about-role-permissions.htm#Application_permissions) and an [authorization server](https://help.okta.com/en-us/content/topics/security/custom-admin-role/about-role-permissions.htm#Authorization_server_permissions). * **Entra ID:** An admin with access to create [Entra ID apps](https://learn.microsoft.com/en-us/entra/identity/role-based-access-control/custom-available-permissions) who is also a user in the data warehouse is required. **Data warehouse:** * **Snowflake:** Open a worksheet in an account that has permissions to [create a security integration](https://docs.snowflake.com/en/sql-reference/sql/create-security-integration). If the admins that handle these products are different people, it’s best to have them coordinate in real time to reduce friction. Snowflake and IdP username matching required Ensure that the username/email address entered by the IdP admin matches the Snowflake credentials username. If the email address used in the dbt setup is different from the Snowflake email address, the connection will fail or you may run into other issues. 
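You can check the username-matching requirement up front. As a hedged sketch (the username `"ALICE@ACME.COM"` is a placeholder, not a value from this guide), Snowflake’s `DESCRIBE USER` command shows the `LOGIN_NAME` and `EMAIL` properties that the security integration will later match against:

```sql
-- Inspect the attributes Snowflake has on file for a user, so you can
-- confirm they match the identity provider before wiring up OAuth.
-- "ALICE@ACME.COM" is a placeholder username.
desc user "ALICE@ACME.COM";

-- In the output, compare the LOGIN_NAME and EMAIL property values
-- against the email address the user authenticates with in the IdP.
```

If the values differ, an `ALTER USER ... SET EMAIL = '...'` (or `LOGIN_NAME`) statement can bring them back in line before users attempt to authenticate.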
#### Data warehouse configurations[​](#data-warehouse-configurations "Direct link to Data warehouse configurations") The following is a template for creating the OAuth configurations in the Snowflake environment:

```sql
create security integration your_integration_name
  type = external_oauth
  enabled = true
  external_oauth_type = okta
  external_oauth_issuer = ''
  external_oauth_jws_keys_url = ''
  external_oauth_audience_list = ('')
  external_oauth_token_user_mapping_claim = 'sub'
  external_oauth_snowflake_user_mapping_attribute = 'email_address'
  external_oauth_any_role_mode = 'ENABLE'
```

The `external_oauth_token_user_mapping_claim` and `external_oauth_snowflake_user_mapping_attribute` can be modified based on your organization’s needs. These values point to the claim in the user’s token. In the example, Snowflake will look up the Snowflake user whose `email` matches the value in the `sub` claim. **Notes:** * The default Snowflake roles ACCOUNTADMIN, ORGADMIN, and SECURITYADMIN are blocked from external OAuth by default and will likely fail to authenticate. See the [Snowflake documentation](https://docs.snowflake.com/en/sql-reference/sql/create-security-integration-oauth-external) for more information. * The value for `external_oauth_snowflake_user_mapping_attribute` must map correctly to the Snowflake username. For example, if `email_address` is used, the email in the token from the IdP must match the Snowflake username exactly. #### Identity provider configuration[​](#identity-provider-configuration "Direct link to Identity provider configuration") Select a supported identity provider (IdP) for instructions on configuring external OAuth in their environment and completing the integration in dbt: * Okta * Entra ID ##### 1. Initialize the dbt settings[​](#1-initialize-the-dbt-settings "Direct link to 1. Initialize the dbt settings") 1. In your dbt account, navigate to **Account settings** —> **Integrations**. 2. 
Scroll down to **Custom integrations** and click **Add integrations** 3. Leave this window open. You can set the **Integration type** to Okta and note the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps. [![Copy the callback URI at the bottom of the integration page in dbt.](/img/docs/dbt-cloud/callback-uri.png?v=2 "Copy the callback URI at the bottom of the integration page in dbt.")](#)Copy the callback URI at the bottom of the integration page in dbt. ##### 2. Create the Okta app[​](#2-create-the-okta-app "Direct link to 2. Create the Okta app") 1. Expand the **Applications** section from the Okta dashboard and click **Applications.** Click the **Create app integration** button. 2. Select **OIDC** as the sign-in method and **Web applications** as the application type. Click **Next**. [![The Okta app creation window with OIDC and Web Application selected.](/img/docs/dbt-cloud/create-okta-app.png?v=2 "The Okta app creation window with OIDC and Web Application selected.")](#)The Okta app creation window with OIDC and Web Application selected. 3. Give the application an appropriate name, something like “External OAuth app for dbt,” that will make it easily identifiable. 4. In the **Grant type** section, enable the **Refresh token** option. 5. Scroll down to the **Sign-in redirect URIs** option. You’ll need to paste the redirect URI you gathered from dbt in step 1.3. [![The Okta app configuration window with the sign-in redirect URI configured to the dbt value.](/img/docs/dbt-cloud/configure-okta-app.png?v=2 "The Okta app configuration window with the sign-in redirect URI configured to the dbt value.")](#)The Okta app configuration window with the sign-in redirect URI configured to the dbt value. 6. Save the app configuration. You’ll come back to it, but move on to the next steps for now. ##### 3. Create the Okta API[​](#3-create-the-okta-api "Direct link to 3. Create the Okta API") 1. 
Expand the **Security** section and click **API** from the Okta sidebar menu. 2. On the API screen, click **Add authorization server**. Give the authorization server a name (a nickname for your data warehouse account would be appropriate). For the **Audience** field, copy and paste your data warehouse login URL. Give the server an appropriate description and click **Save**. [![The Okta API window with the Audience value set.](/img/docs/dbt-cloud/create-okta-api.png?v=2 "The Okta API window with the Audience value set.")](#)The Okta API window with the Audience value set. 3. On the authorization server config screen, open the **Metadata URI** in a new tab. You’ll need information from this screen in later steps. [![The Okta API settings page with the metadata URI highlighted.](/img/docs/dbt-cloud/metadata-uri.png?v=2 "The Okta API settings page with the metadata URI highlighted.")](#)The Okta API settings page with the metadata URI highlighted. [![Sample output of the metadata URI.](/img/docs/dbt-cloud/metadata-example.png?v=2 "Sample output of the metadata URI.")](#)Sample output of the metadata URI. 4. Click the **Scopes** tab and click **Add scope**. In the **Name** field, add `session:role-any`. (Optional) Configure **Display phrase** and **Description** and click **Create**. [![API scope configured in the Add Scope window.](/img/docs/dbt-cloud/add-api-scope.png?v=2 "API scope configured in the Add Scope window.")](#)API scope configured in the Add Scope window. 5. Open the **Access policies** tab and click **Add policy**. Give the policy a **Name** and **Description** and set **Assign to** as **The following clients**. Start typing the name of the app you created in step 2.3, and you’ll see it autofill. Select the app and click **Create Policy**. [![Assignment field autofilling the value.](/img/docs/dbt-cloud/add-api-assignment.png?v=2 "Assignment field autofilling the value.")](#)Assignment field autofilling the value. 6. 
On the **access policy** screen, click **Add rule**. [![API Add rule button highlighted.](/img/docs/dbt-cloud/add-api-rule.png?v=2 "API Add rule button highlighted.")](#)API Add rule button highlighted. 7. Give the rule a descriptive name and scroll down to **token lifetimes**. Configure the **Access token lifetime is**, **Refresh token lifetime is**, and **but will expire if not used every** settings according to your organizational policies. We recommend the defaults of 1 hour and 90 days. Stricter rules increase the odds of your users having to re-authenticate. [![Token lifetime settings in the API rule window.](/img/docs/dbt-cloud/configure-token-lifetime.png?v=2 "Token lifetime settings in the API rule window.")](#)Token lifetime settings in the API rule window. 8. Navigate back to the **Settings** tab and leave it open in your browser. You’ll need some of the information in later steps. ##### 4. Create the OAuth settings in the data warehouse[​](#4-create-the-oauth-settings-in-the-data-warehouse "Direct link to 4. Create the OAuth settings in the data warehouse") 1. Open up a Snowflake worksheet and copy/paste the following:

```sql
create security integration your_integration_name
  type = external_oauth
  enabled = true
  external_oauth_type = okta
  external_oauth_issuer = ''
  external_oauth_jws_keys_url = ''
  external_oauth_audience_list = ('')
  external_oauth_token_user_mapping_claim = 'sub'
  external_oauth_snowflake_user_mapping_attribute = 'email_address'
  external_oauth_any_role_mode = 'ENABLE'
```

2. Change `your_integration_name` to something appropriately descriptive. For example, `dev_OktaAccountNumber_okta`. Copy the `external_oauth_issuer` and `external_oauth_jws_keys_url` from the metadata URI in step 3.3. Use the same Snowflake URL you entered in step 3.2 as the `external_oauth_audience_list`. Adjust the other settings as needed to meet your organization's configurations in Okta and Snowflake. 
[![The issuer and jws keys URIs in the metadata URL](/img/docs/dbt-cloud/gather-uris.png?v=2 "The issuer and jws keys URIs in the metadata URL")](#)The issuer and jws keys URIs in the metadata URL 3. Run the steps to create the integration in Snowflake. Username consistency Ensure that the username (for example, email address) entered in the IdP matches the Snowflake credentials for all users. Mismatched usernames will result in authentication failures. ##### 5. Configuring the integration in dbt[​](#5-configuring-the-integration-in-dbt "Direct link to 5. Configuring the integration in dbt") 1. Navigate back to the dbt **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. 1. `Integration name`: Give the integration a descriptive name that includes identifying information about the Okta environment so future users won’t have to guess where it belongs. 2. `Client ID` and `Client secrets`: Retrieve these from the Okta application page. [![The client ID and secret highlighted in the Okta app.](/img/docs/dbt-cloud/gather-clientid-secret.png?v=2 "The client ID and secret highlighted in the Okta app.")](#)The client ID and secret highlighted in the Okta app. 3. Authorize URL and Token URL: Found in the metadata URI. [![The authorize and token URLs highlighted in the metadata URI.](/img/docs/dbt-cloud/gather-authorization-token-endpoints.png?v=2 "The authorize and token URLs highlighted in the metadata URI.")](#)The authorize and token URLs highlighted in the metadata URI. 2. **Save** the configuration ##### 6. Create a new connection in dbt[​](#6-create-a-new-connection-in-dbt "Direct link to 6. Create a new connection in dbt") 1. Navigate to **Account settings** and click **Connections** from the menu. Click **New connection**. 2. Configure the `Account`, `Database`, and `Warehouse` as you normally would, and for the `OAuth method`, select the external OAuth you just created. 
[![The new configuration window in dbt with the External OAuth showing as an option.](/img/docs/dbt-cloud/configure-new-connection.png?v=2 "The new configuration window in dbt with the External OAuth showing as an option.")](#)The new configuration window in dbt with the External OAuth showing as an option. 3. Scroll down to the **External OAuth** configurations box and select the config from the list. [![The new connection displayed in the External OAuth Configurations box.](/img/docs/dbt-cloud/select-oauth-config.png?v=2 "The new connection displayed in the External OAuth Configurations box.")](#)The new connection displayed in the External OAuth Configurations box. 4. **Save** the connection, and you have now configured External OAuth with Okta! ##### 1. Initialize the dbt settings[​](#1-initialize-the-dbt-settings-1 "Direct link to 1. Initialize the dbt settings") 1. In your dbt account, navigate to **Account settings** —> **Integrations**. 2. Scroll down to **Custom integrations** and click **Add integrations**. 3. Leave this window open. You can set the **Integration type** to Entra ID and note the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps. ##### 2. Create the Entra ID apps[​](#2-create-the-entra-id-apps "Direct link to 2. Create the Entra ID apps") * You’ll create two apps in the Azure portal: a resource server and a client app. * In your Azure portal, open **Entra ID** and click **App registrations** from the left menu. important * You need both an Entra ID admin and a data warehouse admin to complete the setup. These roles don’t need to be held by the same person — as long as they collaborate, everything should work smoothly. * Typically, the Entra ID admin handles app registration and permissions, while the data warehouse admin manages roles, grants, and integrations on the warehouse side. * The `value` field gathered in these steps is only displayed once. Record it immediately when it’s created. 
* Ensure that the username (for example, email address) entered in the IdP matches the data warehouse credentials for all users. Mismatched usernames will result in authentication failures. ##### 3. Create a resource server[​](#3-create-a-resource-server "Direct link to 3. Create a resource server") 1. From the app registrations screen, click **New registration**. 1. Give the app a name. 2. Ensure **Supported account types** are set to “Accounts in this organizational directory only (`Org name` - Single Tenant).” 3. Click **Register** to see the application’s overview. 2. From the app overview page, click **Expose an API** from the left menu. 3. Click **Add** next to **Application ID URI**. The field will automatically populate. Click **Save**. 4. Record the `value` field for use in a future step. *This is only displayed once. Be sure to record it immediately. Microsoft hides the field when you leave the page and come back.* 5. From the same screen, click **Add scope**. 1. Name the scope `session:role-any`. 2. Set “Who can consent?” to **Admins and users**. 3. Set **Admin consent display name** to `session:role-any` and give it a description. 4. Ensure **State** is set to **Enabled**. 5. Click **Add scope**. ##### 4. Create a client app[​](#4-create-a-client-app "Direct link to 4. Create a client app") 1. From the **App registration page**, click **New registration**. 1. Give the app a name that uniquely identifies it as the client app. 2. Ensure **Supported account types** are set to “Accounts in this organizational directory only (`Org name` - Single Tenant).” 3. Set the **Redirect URI** to **Web** and copy/paste the **Redirect URI** from dbt into the field. 4. Click **Register**. 2. From the app overview page, click **API permissions** from the left menu, and click **Add permission**. 3. From the pop-out screen, click **APIs my organization uses**, search for the resource server name from the previous steps, and click it. 4. 
Ensure the box for the **Permissions** `session:role-any` is enabled and click **Add permissions**. 5. Click **Grant admin consent** and from the popup modal click **Yes**. 6. From the left menu, click **Certificates and secrets** and click **New client secret**. Name the secret, set an expiration, and click **Add**. **Note**: Microsoft does not allow “forever” as an expiration date. The maximum time is two years. It is essential to document the expiration date so you can refresh the secret before it expires; otherwise, user authorization will fail. 7. Record the `value` immediately for use in a future step. **Note**: Entra ID will not display this value again once you navigate away from this screen. ##### 5. Data warehouse configuration[​](#5-data-warehouse-configuration "Direct link to 5. Data warehouse configuration") You'll be switching between the Entra ID site and Snowflake. Keep your Entra ID account open for this process. Copy and paste the following as a template in a Snowflake worksheet:

```sql
create or replace security integration your_integration_name
  type = external_oauth
  enabled = true
  external_oauth_type = azure
  external_oauth_issuer = ''
  external_oauth_jws_keys_url = ''
  external_oauth_audience_list = ('')
  external_oauth_token_user_mapping_claim = 'upn'
  external_oauth_any_role_mode = 'ENABLE'
  external_oauth_snowflake_user_mapping_attribute = 'login_name';
```

On the Entra ID site: 1. From the Client ID app in Entra ID, click **Endpoints** and open the **Federation metadata document** in a new tab. * The **entity ID** on this page maps to the `external_oauth_issuer` field in the Snowflake config. 2. Back on the list of endpoints, open the **OpenID Connect metadata document** in a new tab. * The **jwks\_uri** field maps to the `external_oauth_jws_keys_url` field in Snowflake. 3. Navigate to the resource server from the previous steps. * The **Application ID URI** maps to the `external_oauth_audience_list` field in Snowflake. 4. Run the statement to create the integration in Snowflake. 
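To illustrate how the Entra ID values above slot into the template, here is a hedged, filled-in sketch. The tenant ID, application ID, and integration name below are placeholders for illustration only, not values from this guide:

```sql
-- All identifiers below are illustrative placeholders.
create or replace security integration dev_entra_id_example
  type = external_oauth
  enabled = true
  external_oauth_type = azure
  -- entity ID from the Federation metadata document
  external_oauth_issuer = 'https://sts.windows.net/00000000-0000-0000-0000-000000000000/'
  -- jwks_uri from the OpenID Connect metadata document
  external_oauth_jws_keys_url = 'https://login.microsoftonline.com/00000000-0000-0000-0000-000000000000/discovery/v2.0/keys'
  -- Application ID URI from the resource server's overview page
  external_oauth_audience_list = ('api://11111111-1111-1111-1111-111111111111')
  external_oauth_token_user_mapping_claim = 'upn'
  external_oauth_any_role_mode = 'ENABLE'
  external_oauth_snowflake_user_mapping_attribute = 'login_name';
```

After running it, `describe security integration dev_entra_id_example;` echoes back the settings so you can verify each value before testing the connection from dbt.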
You need both an Entra ID admin and a data warehouse admin to complete the setup. If these admins are not the same person, they should work together to complete the configuration. ##### 6. Configuring the integration in dbt[​](#6-configuring-the-integration-in-dbt "Direct link to 6. Configuring the integration in dbt") 1. Navigate back to the dbt **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. There will be some back-and-forth between the Entra ID account and dbt. 2. `Integration name`: Give the integration a descriptive name that includes identifying information about the Entra ID environment so future users won’t have to guess where it belongs. 3. `Client secrets`: Found on the client app’s **Certificates and secrets** page. `Value` is the `Client secret`. Note that it only appears when created; *Microsoft hides the secret if you return later, and you must recreate it.* 4. `Client ID`: Copy the `Application (client) ID` from the client app’s overview page. 5. `Authorization URL` and `Token URL`: From the client ID app, open the `Endpoints` tab. These URLs map to the `OAuth 2.0 authorization endpoint (v2)` and `OAuth 2.0 token endpoint (v2)` fields. *You must use v2 of the `OAuth 2.0 authorization endpoint`. Do not use v1.* You can use either version of the `OAuth 2.0 token endpoint`. 6. `Application ID URI`: Copy the `Application ID URI` field from the resource server’s Overview screen. #### FAQs[​](#faqs "Direct link to FAQs") Receiving a \`Failed to connect to DB\` error when connecting to Snowflake 1. If you see the following error:

```text
Failed to connect to DB: xxxxxxx.snowflakecomputing.com:443. The role requested in the connection, or the default role if none was requested in the connection ('xxxxx'), is not listed in the Access Token or was filtered. Please specify another role, or contact your OAuth Authorization server administrator.
```

2. 
Edit your OAuth security integration and explicitly specify this scope mapping attribute:

```sql
ALTER INTEGRATION your_integration_name SET EXTERNAL_OAUTH_SCOPE_MAPPING_ATTRIBUTE = 'scp';
```

You can read more about this error in [Snowflake's documentation](https://community.snowflake.com/s/article/external-custom-oauth-error-the-role-requested-in-the-connection-is-not-listed-in-the-access-token). *** 1. If you see the following error:

```text
Failed to connect to DB: xxxxxxx.snowflakecomputing.com:443. Incorrect username or password was specified.
```

* **Unique email addresses** — Each user in Snowflake must have a unique email address. You can't have multiple users (for example, a human user and a service account) using the same email, such as `alice@acme.com`, to authenticate to Snowflake. * **Match email addresses with identity provider** — The email address of your Snowflake user must exactly match the email address you use to authenticate with your Identity Provider (IdP). For example, if your Snowflake user's email is `alice@acme.com` but you log in to Entra or Okta with `alice_adm@acme.com`, this mismatch can cause an error. --- ### Set up SCIM [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") The System for Cross-Domain Identity Management (SCIM) makes user data more secure and simplifies the admin and end-user lifecycle experience by automating user identities and groups. 
You can create or disable user identities in your Identity Provider (IdP), and SCIM will automatically make those changes in near real-time downstream in dbt. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") To configure SCIM in your dbt environment: * You must be on an [Enterprise or Enterprise+ plan](https://www.getdbt.com/pricing). * You must use Okta or Entra ID as your SSO provider and have it connected in the dbt platform. * You must have [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to configure the account settings in the dbt platform and change application settings in [Okta](https://help.okta.com/en-us/content/topics/security/administrators-admin-comparison.htm). * If you have IP restrictions enabled, you must add [Okta's IPs](https://help.okta.com/en-us/content/topics/security/ip-address-allow-listing.htm) to your allowlist. ##### Supported features[​](#supported-features "Direct link to Supported features") The currently supported features for SCIM are: * User provisioning and de-provisioning * User profile updates * Group creation and management * Importing groups and users When SCIM is enabled, the following functionality will change: * Users are not automatically added to default groups * Manual actions such as inviting users, updating user information, and updating group memberships are disabled by default * SSO group mappings are disabled in favor of SCIM group management To override these functionality changes with SCIM enabled, enable manual updates as part of the SCIM configuration (not recommended). 
When users are provisioned, the following attributes are supported: * Username * Family name * Given name The following IdPs are supported in the dbt user interface: * [Set up SCIM with Okta](https://docs.getdbt.com/docs/cloud/manage-access/scim-okta.md) (includes [license management](https://docs.getdbt.com/docs/cloud/manage-access/scim-manage-user-licenses.md)) * [Set up SCIM with Entra ID](https://docs.getdbt.com/docs/cloud/manage-access/scim-entra-id.md) If your IdP isn't on the list, it can still be supported using the dbt [APIs](https://docs.getdbt.com/dbt-cloud/api-v3#/operations/Retrieve%20SCIM%20configuration). #### Set up dbt[​](#set-up-dbt "Direct link to Set up dbt") To retrieve the necessary dbt configurations for use in Okta or Entra ID: 1. Navigate to your dbt **Account settings**. 2. Under **Settings**, click **SSO & SCIM**. 3. Scroll to the bottom of your SSO configuration settings and click **Enable SCIM**. [![SCIM enabled in the configuration settings.](/img/docs/dbt-cloud/access-control/enable-scim.png?v=2 "SCIM enabled in the configuration settings.")](#)SCIM enabled in the configuration settings. 4. Record the **SCIM base URL** field for use in a later step. 5. Click **Create SCIM token**. note To follow best practices, you should rotate your SCIM tokens regularly. To do so, follow these same instructions to create a new token. To avoid service disruptions, remember to replace the token in your IdP before deleting the old token in dbt. 6. In the pop-up window, give the token a name that will make it easily identifiable. Click **Save**. [![Give your token an identifier.](/img/docs/dbt-cloud/access-control/name-scim-token.png?v=2 "Give your token an identifier.")](#)Give your token an identifier. 7. Copy the token and record it securely, as *it will not be available again after you close the window*. You must create a new token if you lose the current one. 
[![Give your token an identifier.](/img/docs/dbt-cloud/access-control/copy-scim-token.png?v=2 "Give your token an identifier.")](#)Give your token an identifier. 8. (Optional) Manual updates are turned off by default for all SCIM-managed entities, including the ability to invite new users manually. This ensures SCIM-managed entities stay in sync with the IdP, and we recommend keeping this setting disabled. * However, if you need to make manual updates (like updating group membership for a SCIM-managed group), you can enable this setting by clicking **Allow manual updates** and confirming the **Allow manual updates** pop-up. [![Enabling manual updates in SCIM settings.](/img/docs/dbt-cloud/access-control/scim-manual-updates.png?v=2 "Enabling manual updates in SCIM settings.")](#)Enabling manual updates in SCIM settings. #### Next steps[​](#next-steps "Direct link to Next steps") Configure SCIM for your identity provider and optionally manage licenses: * **[Set up SCIM with Okta](https://docs.getdbt.com/docs/cloud/manage-access/scim-okta.md)** — User and group provisioning, profile updates, and [license management](https://docs.getdbt.com/docs/cloud/manage-access/scim-manage-user-licenses.md) (Okta only). * **[Set up SCIM with Entra ID](https://docs.getdbt.com/docs/cloud/manage-access/scim-entra-id.md)** — User and group provisioning and profile updates, plus assigning users to the SCIM app. 
--- ### Set up SCIM with Entra ID [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") SCIM available for Entra ID dbt supports System for Cross-Domain Identity Management (SCIM) with Microsoft Entra ID for user and group provisioning and profile updates. Automatic license type mapping is not currently supported with Entra ID SCIM. See [mapped configuration](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#mapped-configuration) to manage license types within the dbt platform user interface. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * Available on [Enterprise or Enterprise+ plans](https://www.getdbt.com/pricing). * You must use Entra ID as your single sign-on (SSO) provider and have it connected in the dbt platform. * You must have [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to configure the account settings in dbt platform. * Complete [SSO setup with Entra ID](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md) before configuring SCIM settings. * Complete the [Set up SCIM](https://docs.getdbt.com/docs/cloud/manage-access/scim.md#set-up-dbt) steps to get your SCIM base URL and token. #### Set up Entra ID[​](#set-up-entra-id "Direct link to Set up Entra ID") 1. Log in to your Azure account and open the **Entra ID** configurations. 2. From the sidebar, under **Manage**, click **Enterprise Applications**. 3. Click **New Application** and select the option to **Create your own application**. [![Create your own application.](/img/docs/dbt-cloud/access-control/create-your-own.png?v=2 "Create your own application.")](#)Create your own application. 4. Give your app a unique name and ensure the **Integrate any other application you don't find in the gallery (Non-gallery)** field is selected. 
Ignore any prompts for existing apps. Click **Create**. [![Give your app a unique name.](/img/docs/dbt-cloud/access-control/create-application.png?v=2 "Give your app a unique name.")](#)Give your app a unique name. 5. From the application **Overview** screen, click **Provision User Accounts**. [![The 'Provision user accounts' option.](/img/docs/dbt-cloud/access-control/provision-user-accounts.png?v=2 "The 'Provision user accounts' option.")](#)The 'Provision user accounts' option. 6. From the **Create configuration** section, click **Connect your application**. 7. Fill out the form with the information from your dbt account: * The **Tenant URL** in Entra ID is your **SCIM base URL** from dbt. * The **Secret token** in Entra ID is your **SCIM token** from dbt. 8. Click **Test connection** and click **Create** once complete. [![Configure the app and test the connection.](/img/docs/dbt-cloud/access-control/provisioning-config.png?v=2 "Configure the app and test the connection.")](#)Configure the app and test the connection. #### Attribute mapping[​](#attribute-mapping "Direct link to Attribute mapping") To map the attributes that will sync with dbt: 1. From the enterprise app **Overview** screen sidebar menu, click **Provisioning**. [![The Provisioning option on the sidebar.](/img/docs/dbt-cloud/access-control/provisioning.png?v=2 "The Provisioning option on the sidebar.")](#)The Provisioning option on the sidebar. 2. Under **Manage**, click **Provisioning** again. 3. Expand the **Mappings** section and click **Provision Microsoft Entra ID users**. [![Provision the Entra ID users.](/img/docs/dbt-cloud/access-control/provision-entra-users.png?v=2 "Provision the Entra ID users.")](#)Provision the Entra ID users. 4. Select the box for **Show advanced options** and then click **Edit attribute list for customappsso**. 
[![Click to edit the customappsso attributes.](/img/docs/dbt-cloud/access-control/customappsso-attributes.png?v=2 "Click to edit the customappsso attributes.")](#)Click to edit the customappsso attributes. 5. Scroll to the bottom of the **Edit Attribute List** window and find an empty field where you can add a new entry with the following fields: * **Name:** `emails[type eq "work"].primary` * **Type:** `Boolean` * **Required:** True [![Add the new field to the entry list.](/img/docs/dbt-cloud/access-control/customappsso-entry.png?v=2 "Add the new field to the entry list.")](#)Add the new field to the entry list. 6. Mark all of the fields listed in Step 10 below as `Required`. [![Mark the fields as required.](/img/docs/dbt-cloud/access-control/mark-as-required.png?v=2 "Mark the fields as required.")](#)Mark the fields as required. 7. Click **Save**. 8. Back on the **Attribute mapping** window, click **Add new mapping** and complete fields with the following: * **Mapping type:** `none` * **Default value if null (optional):** `True` * **Target attribute:** `emails[type eq "work"].primary` * **Match objects using this attribute:** `No` * **Matching precedence:** *Leave blank* * **Apply this mapping:** `Always` [![Complete the fields as shown.](/img/docs/dbt-cloud/access-control/edit-attribute.png?v=2 "Complete the fields as shown.")](#)Complete the fields as shown. 9. Click **Ok**. 10. Make sure the following mappings are in place and delete any others: * **UserName:** `userPrincipalName` or the value you want users to leverage to log in to dbt. * **active:** `Switch([IsSoftDeleted], , "False", "True", "True", "False")` * **emails\[type eq "work"].value:** `userPrincipalName` is the most common, but this value needs to be the same set for **UserName**. 
* **name.givenName:** `givenName` * **name.familyName:** `surname` * **externalid:** `mailNickname` * **emails\[type eq "work"].primary** [![Edit the attributes so they match the list as shown.](/img/docs/dbt-cloud/access-control/attribute-list.png?v=2 "Edit the attributes so they match the list as shown.")](#)Edit the attributes so they match the list as shown. 11. Click **Save**. You can now begin assigning users to your SCIM app in Entra ID! #### Assign users to SCIM app[​](#assign-users-to-scim-app "Direct link to Assign users to SCIM app") The following steps go over how to assign users/groups to the SCIM app. Refer to Microsoft's [official instructions for assigning users or groups to an Enterprise App in Entra ID](https://learn.microsoft.com/en-us/azure/databricks/admin/users-groups/scim/aad#step-3-assign-users-and-groups-to-the-application) to learn more. Although the article is written for Databricks, the steps are identical. 1. Navigate to Enterprise applications and select the SCIM app. 2. Go to **Manage** > **Provisioning**. 3. To synchronize Microsoft Entra ID users and groups to dbt, click **Start provisioning**. [![Start provisioning to synchronize users and groups.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/scim-entraid-start-provision.png?v=2 "Start provisioning to synchronize users and groups.")](#)Start provisioning to synchronize users and groups. 4. Navigate back to the SCIM app's overview page and go to **Manage** > **Users and groups**. 5. Click **Add user/group** and select the users and groups. [![Add user/group.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/scim-entraid-add-users.png?v=2 "Add user/group.")](#)Add user/group. 6. Click the **Assign** button. 7. Wait a few minutes. In the dbt platform, confirm the users and groups exist in your dbt account. * Users and groups that you add and assign will automatically be provisioned to your dbt account when Microsoft Entra ID schedules the next sync. 
* By enabling provisioning, you immediately trigger the initial Microsoft Entra ID sync. Subsequent syncs are triggered every 20-40 minutes, depending on the number of users and groups in the application. Refer to Microsoft Entra ID's [Provisioning tips](https://learn.microsoft.com/en-us/azure/databricks/admin/users-groups/scim/aad#provisioning-tips) documentation for more information. * You can also prompt a manual provisioning outside of the cycle by clicking **Restart provisioning**. [![Prompt manual provisioning.](/img/docs/dbt-cloud/dbt-cloud-enterprise/access-control/scim-entraid-manual.png?v=2 "Prompt manual provisioning.")](#)Prompt manual provisioning. --- ### Set up SCIM with Okta [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") SCIM available for Okta System for Cross-Domain Identity Management (SCIM) [license mapping](https://docs.getdbt.com/docs/cloud/manage-access/scim-manage-user-licenses.md) is currently only supported for Okta. For other providers, license types must be [managed](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#mapped-configuration) within the dbt platform user interface. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") * Available on [Enterprise or Enterprise+ plans](https://www.getdbt.com/pricing). * You must use Okta as your single sign-on (SSO) provider and have it connected in the dbt platform. 
* You must have [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to configure the account settings in dbt platform. * Complete [SSO setup with Okta](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-okta.md) before configuring SCIM settings. * Complete the [Set up SCIM](https://docs.getdbt.com/docs/cloud/manage-access/scim.md#set-up-dbt) steps to get your SCIM base URL and token. #### Set up Okta[​](#set-up-okta "Direct link to Set up Okta") 1. Log in to your Okta account and locate the app configured for the dbt SSO integration. 2. Navigate to the **General** tab and ensure **Enable SCIM provisioning** is selected; otherwise, the **Provisioning** tab won't be displayed. [![Enable SCIM provisioning in Okta.](/img/docs/dbt-cloud/access-control/scim-provisioned.png?v=2 "Enable SCIM provisioning in Okta.")](#)Enable SCIM provisioning in Okta. 3. Open the **Provisioning** tab and select **Integration**. 4. Enter the **SCIM base URL** from [Set up SCIM](https://docs.getdbt.com/docs/cloud/manage-access/scim.md#set-up-dbt) in the first field, then enter your preferred **Unique identifier field for users** — we recommend `userName`. 5. Select the boxes for the following **Supported provisioning actions**: * **Push New Users** * **Push Profile Updates** * **Push Groups** * **Import New Users and Profile Updates** (Optional for users created before SSO/SCIM setup) 6. From the **Authentication mode** dropdown, select **HTTP Header**. 7. In the **Authorization** section, enter the token from dbt into the **Bearer** field. [![The completed SCIM configuration in the Okta app.](/img/docs/dbt-cloud/access-control/scim-okta-config.png?v=2 "The completed SCIM configuration in the Okta app.")](#)The completed SCIM configuration in the Okta app. 8. 
Ensure the following provisioning actions are selected: * **Create Users** * **Update User Attributes** * **Deactivate Users** [![Ensure the users are properly provisioned with these settings.](/img/docs/dbt-cloud/access-control/provisioning-actions.png?v=2 "Ensure the users are properly provisioned with these settings.")](#)Ensure the users are properly provisioned with these settings. 9. Test the connection and click **Save** once completed. You've now configured SCIM for the Okta SSO integration in dbt platform. You can [manage user licenses with SCIM](https://docs.getdbt.com/docs/cloud/manage-access/scim-manage-user-licenses.md) to set the license type for users as they are provisioned. #### SCIM username format[​](#scim-username-format "Direct link to SCIM username format") For dbt platform SCIM with Okta, `userName` **must be in email address format**. dbt platform uses `userName` to look up existing users during SCIM sync. If Okta sends another format (such as an Okta internal ID like `00u...` or an employee ID), dbt platform cannot match the existing user, and provisioning will fail. If your Okta configuration maps the `Username` field to a different attribute, set your Okta app config to `Email`: 1. Open the SAML app created for the dbt integration. 2. In the **Sign on** tab, click **Edit** in the **Settings** pane. 3. Set the **Application username format** field to **Email**. 4. Click **Save**. #### SCIM license mapping[​](#scim-license-mapping "Direct link to SCIM license mapping") To automate seat assignments in Okta for users as they are provisioned, see [Manage user licenses with SCIM](https://docs.getdbt.com/docs/cloud/manage-access/scim-manage-user-licenses.md). 
#### Existing Okta integrations[​](#existing-okta-integrations "Direct link to Existing Okta integrations") If you are adding SCIM to an existing Okta integration in dbt (as opposed to setting up SCIM and SSO concurrently for the first time), be aware of the following behavior: * Users and groups already synced to dbt will become SCIM-managed once you complete the SCIM configuration. * (Recommended) Import and manage existing dbt groups and users with Okta's **Import Groups** and **Import Users** features. Update the groups in your IdP with the same naming convention used for dbt groups. New users, groups, and changes to existing profiles will be automatically imported into dbt. * Ensure the **Import users and profile updates** and **Import Groups** boxes are selected under the **Provisioning settings** tab in the Okta SCIM configuration. * Use **Import Users** to sync all users from dbt, including previously deleted users, if you need to re-provision those users. * Read more about this feature in the [Okta documentation](https://help.okta.com/en-us/content/topics/users-groups-profiles/usgp-import-groups-app-provisioning.htm). To set license type for users as they are provisioned, see [Manage user licenses with SCIM](https://docs.getdbt.com/docs/cloud/manage-access/scim-manage-user-licenses.md). 
--- ### Set up Snowflake OAuth [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Subdomain migration We're migrating dbt platform [multi-tenant accounts worldwide](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) to static subdomains. After the migration, you’ll be automatically redirected from your original URL (for example, `cloud.getdbt.com`) to the new static subdomain URL (for example, `abc123.us1.dbt.com`), which you can find in your account settings. If your organization has network allow listing, add the `us1.dbt.com` domain to your allow list. The migration may require additional actions in your Snowflake account. See [subdomain migration](#subdomain-migration) for more information. dbt Enterprise and Enterprise+ plans support [OAuth authentication](https://docs.snowflake.net/manuals/user-guide/oauth-intro.html) with Snowflake. When Snowflake OAuth is enabled, users can authorize their Development credentials using Single Sign On (SSO) via Snowflake rather than submitting a username and password to dbt. If Snowflake is set up with SSO through a third-party identity provider, developers can use this method to log into Snowflake and authorize the dbt Development credentials without any additional setup. Snowflake OAuth with PrivateLink Users connecting to Snowflake using [Snowflake OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md) over an AWS PrivateLink connection from dbt will also require access to a PrivateLink endpoint from their local workstation. Where possible, use [Snowflake External OAuth](https://docs.getdbt.com/docs/cloud/manage-access/snowflake-external-oauth.md) instead to bypass this limitation. 
From the [Snowflake](https://docs.snowflake.com/en/user-guide/admin-security-fed-auth-overview#label-sso-private-connectivity) docs: > Currently, for any given Snowflake account, SSO works with only one account URL at a time: either the public account URL or the URL associated with the private connectivity service To set up Snowflake OAuth in dbt, admins from both dbt and Snowflake are required for the following steps: 1. [Locate the redirect URI value](#locate-the-redirect-uri-value) in dbt. 2. [Create a security integration](#create-a-security-integration) in Snowflake. 3. [Configure a connection](#configure-a-connection-in-dbt) in dbt. To use Snowflake in the Studio IDE, all developers must [authenticate with Snowflake](#authorize-developer-credentials) in their profile credentials. ##### Locate the redirect URI value[​](#locate-the-redirect-uri-value "Direct link to Locate the redirect URI value") To get started, copy the connection's redirect URI from dbt: 1. Navigate to **Account settings**. 2. Select **Projects** and choose a project from the list. 3. Click the **Development connection** field to view its details and set the **OAuth method** to "Snowflake SSO". 4. Copy the **Redirect URI** to use in the later steps. [![The OAuth method and Redirect URI inputs for a Snowflake connection in dbt.](/img/docs/dbt-cloud/dbt-cloud-enterprise/snowflake-oauth-redirect-uri.png?v=2 "Locate the Snowflake OAuth redirect URI")](#)Locate the Snowflake OAuth redirect URI ##### Create a security integration[​](#create-a-security-integration "Direct link to Create a security integration") In Snowflake, execute a query to create a security integration. For complete documentation on creating a security integration for custom clients, refer to the [Snowflake docs](https://docs.snowflake.net/manuals/sql-reference/sql/create-security-integration.html#syntax). 
In the following `CREATE OR REPLACE SECURITY INTEGRATION` example query, replace the empty `OAUTH_REDIRECT_URI` value with the Redirect URI (also referred to as the [access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md)) copied from dbt. To locate the Redirect URI, refer to the previous [locate the redirect URI value](#locate-the-redirect-uri-value) section. Important: If you’re using secondary roles, you must include `OAUTH_USE_SECONDARY_ROLES = 'IMPLICIT';` in the statement. ```sql CREATE OR REPLACE SECURITY INTEGRATION DBT_CLOUD TYPE = OAUTH ENABLED = TRUE OAUTH_CLIENT = CUSTOM OAUTH_CLIENT_TYPE = 'CONFIDENTIAL' OAUTH_REDIRECT_URI = '' OAUTH_ISSUE_REFRESH_TOKENS = TRUE OAUTH_REFRESH_TOKEN_VALIDITY = 7776000 OAUTH_USE_SECONDARY_ROLES = 'IMPLICIT'; -- Required for secondary roles ``` Permissions Note: Only Snowflake account administrators (users with the `ACCOUNTADMIN` role) or a role with the global `CREATE INTEGRATION` privilege can execute this SQL command. | Field | Description | | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | TYPE | Required | | ENABLED | Required | | OAUTH\_CLIENT | Required | | OAUTH\_CLIENT\_TYPE | Required | | OAUTH\_REDIRECT\_URI | Required. Use the value in the [dbt account settings](#locate-the-redirect-uri-value). | | OAUTH\_ISSUE\_REFRESH\_TOKENS | Required | | OAUTH\_REFRESH\_TOKEN\_VALIDITY | Required. This configuration dictates the number of seconds that a refresh token is valid for. Use a smaller value to force users to re-authenticate with Snowflake more frequently. | Additional configuration options may be specified for the security integration as needed. 
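After running the `CREATE OR REPLACE SECURITY INTEGRATION` statement, you can confirm that the integration exists and inspect its settings before moving on. A minimal sketch, assuming the integration was named `DBT_CLOUD` as in the example above:

```sql
-- List security integrations visible to your role
SHOW SECURITY INTEGRATIONS LIKE 'DBT_CLOUD';

-- Inspect the integration's properties, including OAUTH_REDIRECT_URI
-- and OAUTH_REFRESH_TOKEN_VALIDITY (in seconds; 7776000 = 90 days)
DESC SECURITY INTEGRATION DBT_CLOUD;
```

If the `OAUTH_REDIRECT_URI` shown doesn't exactly match the Redirect URI copied from dbt, alter or recreate the integration before continuing.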
##### Configure a Connection in dbt[​](#configure-a-connection-in-dbt "Direct link to Configure a Connection in dbt") The Database Admin is responsible for creating a Snowflake Connection in dbt. This Connection is configured using a Snowflake Client ID and Client Secret. These values can be determined by running the following query in Snowflake: ```sql with integration_secrets as ( select parse_json(system$show_oauth_client_secrets('DBT_CLOUD')) as secrets ) select secrets:"OAUTH_CLIENT_ID"::string as client_id, secrets:"OAUTH_CLIENT_SECRET"::string as client_secret from integration_secrets; ``` To complete the creation of your connection in dbt: 1. Navigate to your **Account Settings**, click **Connections**, and select a connection. 2. Edit the connection and enter the Client ID and Client Secret. 3. Click **Save**. [![Configuring Snowflake OAuth credentials in dbt](/img/docs/dbt-cloud/dbt-cloud-enterprise/database-connection-snowflake-oauth.png?v=2 "Configuring Snowflake OAuth credentials in dbt")](#)Configuring Snowflake OAuth credentials in dbt ##### Authorize developer credentials[​](#authorize-developer-credentials "Direct link to Authorize developer credentials") Once Snowflake SSO is enabled, users on the project will be able to configure their credentials in their Profiles. By clicking the "Connect to Snowflake Account" button, users will be redirected to Snowflake to authorize with the configured SSO provider, then back to dbt to complete the setup process. At this point, users should now be able to use the Studio IDE with their development credentials. 
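Once developers have authorized their credentials, you can verify on the Snowflake side that logins are actually coming through OAuth. A sketch of one way to do this, assuming your role can read the `SNOWFLAKE.ACCOUNT_USAGE` share (data there can lag by up to a couple of hours) and that OAuth logins are recorded with the `OAUTH_ACCESS_TOKEN` authentication factor:

```sql
-- Recent logins that authenticated with an OAuth access token
SELECT user_name, event_timestamp, first_authentication_factor
FROM snowflake.account_usage.login_history
WHERE first_authentication_factor = 'OAUTH_ACCESS_TOKEN'
ORDER BY event_timestamp DESC
LIMIT 10;
```

If a developer's logins still show a password-based factor, they likely haven't completed the "Connect to Snowflake Account" authorization in their dbt profile.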
##### SSO OAuth flow diagram[​](#sso-oauth-flow-diagram "Direct link to SSO OAuth flow diagram") [![SSO OAuth flow diagram](/img/docs/dbt-cloud/dbt-cloud-enterprise/84427818-841b3680-abf3-11ea-8faf-693d4a39cffb.png?v=2 "SSO OAuth flow diagram")](#)SSO OAuth flow diagram Once a user has authorized dbt with Snowflake via their identity provider, Snowflake will return a Refresh Token to the dbt application. dbt is then able to exchange this refresh token for an Access Token which can then be used to open a Snowflake connection and execute queries in the Studio IDE on behalf of users. **NOTE**: The lifetime of the refresh token is dictated by the OAUTH\_REFRESH\_TOKEN\_VALIDITY parameter supplied in the “create security integration” statement. When a user’s refresh token expires, the user will need to re-authorize with Snowflake to continue development in dbt. ##### Setting up multiple dbt projects with Snowflake OAuth[​](#setting-up-multiple-dbt-projects-with-snowflake-oauth "Direct link to Setting up multiple dbt projects with Snowflake OAuth") If you plan to connect the same Snowflake account to multiple dbt projects, you can use the same security integration for all of the projects. #### Subdomain migration[​](#subdomain-migration "Direct link to Subdomain migration") If you're a [multi-tenant account](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) being migrated to a static subdomain, you may need to take additional action in your Snowflake account to prevent service disruptions. Snowflake limits each security integration (`CREATE SECURITY INTEGRATION … TYPE = OAUTH`) to a single redirect URI. 
If you configured your OAuth integration with `cloud.getdbt.com`, you must take one of two courses of action: * **Configure an additional security integration:** In your Snowflake account, you will have one with the original URL (for example, `cloud.getdbt.com/complete/snowflake`) as the redirect URI, and another using the new static subdomain. Refer to our [regions & IP addresses page](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for a complete list of the original domains in your region (marked as "multi-tenant" on the chart). * **Use a single security integration:** Create one that uses the new static subdomain as the redirect URI. In this scenario, you must recreate all of your [existing connections](https://docs.getdbt.com/docs/cloud/connect-data-platform/about-connections.md#connection-management). ##### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting")  Invalid consent request If clicking **Connect Snowflake Account** successfully redirects you to the Snowflake login page but you then receive an `Invalid consent request` error, it could mean: * Your user might not have access to the Snowflake role defined on the development credentials in dbt. Double-check that you have access to that role and that the role name is entered correctly, as Snowflake role names are case-sensitive. * You're trying to use a role that is in the [BLOCKED\_ROLES\_LIST](https://docs.snowflake.com/en/user-guide/oauth-partner.html#blocking-specific-roles-from-using-the-integration), such as `ACCOUNTADMIN`.  The requested scope is invalid When you select the `Connect Snowflake Account` button to try to connect to your Snowflake account, you might get an error that says `The requested scope is invalid` even though you were redirected to the Snowflake login page successfully. 
This error might be caused by a configuration issue in the Snowflake OAuth flow: the `role` in the profile config is mandatory for each user and is not inherited from the project connection page. Each user must supply their own role, regardless of whether a role is provided on the project connection page.  Server error 500 If you experience a 500 server error when redirected from Snowflake to dbt, double-check that you have allow-listed [dbt's IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md), or [VPC Endpoint ID (for PrivateLink connections)](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-snowflake.md#configuring-network-policies), on a Snowflake account level. Enterprise customers who have single-tenant deployments will have a different range of IP addresses (network CIDR ranges) to allow list. Depending on how you've configured your Snowflake network policies or IP allow listing, you may have to explicitly add the network policy that includes the allow-listed dbt IPs to the security integration you just made: ```sql ALTER SECURITY INTEGRATION <integration_name> SET NETWORK_POLICY = <network_policy_name>; ```  Secondary role not working. Error: USE ROLE not allowed If you want to use secondary roles but experience a `Current sessions is restricted. USE ROLE not allowed` error when setting up Snowflake OAuth, double-check that you added the following statement to the query: ```sql OAUTH_USE_SECONDARY_ROLES = 'IMPLICIT'; ``` For the full query example, see [Create a security integration](#create-a-security-integration). 
--- ### Set up SSO with Google Workspace [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") dbt Enterprise-tier plans support single sign-on (SSO) via Google GSuite. You will need permissions to create and manage a new Google OAuth2 application, as well as access to enable the Google Admin SDK. GSuite is a component within Google Cloud Platform (GCP), so you will also need access to a login with permissions to manage the GSuite application within a GCP account. Some customers choose to use different cloud providers for user and group permission setup than for hosting infrastructure. For example, it's certainly possible to use GSuite to manage login information and Multifactor Authentication (MFA) configuration while hosting data workloads on AWS. Currently supported features include: * SP-initiated SSO * Just-in-time provisioning This guide outlines the setup process for authenticating to dbt with Google GSuite. #### Configuration of the GSuite organization within GCP[​](#configuration-of-the-gsuite-organization-within-gcp "Direct link to Configuration of the GSuite organization within GCP") dbt uses a Client ID and Client Secret to authenticate users of a GSuite organization. The steps below outline how to create a Client ID and Client Secret for use in dbt. ##### Creating credentials[​](#creating-credentials "Direct link to Creating credentials") 1. Navigate to the GCP [API Manager](https://console.developers.google.com/projectselector/apis/credentials) 2. 
Select an existing project, or create a new project for your API credentials.
3. Click **Create Credentials** and select **OAuth Client ID** in the resulting popup.
4. Google requires that you configure an OAuth consent screen for OAuth credentials. If prompted, click the **Configure consent screen** button to create a new consent screen.
5. On the OAuth consent screen page, configure the following settings ([Google docs](https://support.google.com/cloud/answer/6158849?hl=en#userconsent)):

| Configuration | Value | Notes |
| --- | --- | --- |
| **Application type** | internal | Required |
| **Application name** | dbt | Required |
| **Application logo** | Download the logo [here](https://cdn.sanity.io/images/wl0ndo6t/main/333fef4fc72db6f1ce4d1bc0789f355b4f0bbaa2-1280x1280.png) | Optional |
| **Authorized domains** | `getdbt.com` (US multi-tenant); `getdbt.com` and `dbt.com` (US Cell 1); `dbt.com` (EMEA or AU) | If deploying into a VPC, use the domain for your deployment |
| **Scopes** | `email, profile, openid` | The default scopes are sufficient |

[![GSuite Consent Screen configuration](/img/docs/dbt-cloud/dbt-cloud-enterprise/gsuite/gsuite-sso-consent-top.png?v=2 "GSuite Consent Screen configuration")](#)GSuite Consent Screen configuration

6. Save the **Consent screen** settings to navigate back to the **Create OAuth client id** page.
7. Use the following configuration values when creating your credentials, replacing `YOUR_ACCESS_URL` and `YOUR_AUTH0_URI` with the appropriate Access URL and Auth0 URI from your [account settings](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md#auth0-uris).
| Config | Value |
| --- | --- |
| **Application type** | Web application |
| **Name** | dbt |
| **Authorized Javascript origins** | `https://YOUR_ACCESS_URL` |
| **Authorized Redirect URIs** | `https://YOUR_AUTH0_URI/login/callback` |

[![GSuite Credentials configuration](/img/docs/dbt-cloud/dbt-cloud-enterprise/gsuite/gsuite-sso-credentials.png?v=2 "GSuite Credentials configuration")](#)GSuite Credentials configuration

8. Click **Create** to create your new credentials. A popup will appear with a **Client ID** and **Client Secret**. Write these down, as you will need them later.

##### Enabling the Admin SDK

dbt requires that the Admin SDK is enabled in this application to request group membership information from the GSuite API. To enable the Admin SDK for this project, navigate to the [Admin SDK Settings page](https://console.developers.google.com/apis/api/admin.googleapis.com/overview) and ensure that the API is enabled.

[![The 'Admin SDK' page](/img/docs/dbt-cloud/dbt-cloud-enterprise/7f36f50-Screen_Shot_2019-12-03_at_10.15.01_AM.png?v=2 "The 'Admin SDK' page")](#)The 'Admin SDK' page

#### Configuration in dbt

To complete setup, follow the steps below in the dbt application.

##### Supply your OAuth Client ID and Client Secret

1. Navigate to the **Enterprise > Single Sign On** page under **Account settings**.
2.
Click the **Edit** button and supply the following SSO details:

* **Log in with**: GSuite
* **Client ID**: Paste the Client ID generated in the steps above
* **Client Secret**: Paste the Client Secret generated in the steps above
* **Domain in GSuite**: Enter the domain name for your GSuite account (e.g. `dbtlabs.com`). Only users with an email address from this domain will be able to log into your dbt account using GSuite auth. Optionally, you may specify a comma-separated list of domains which are *all* authorized to access your dbt account (e.g. `dbtlabs.com, fishtowndata.com`)

[![GSuite SSO Configuration](/img/docs/dbt-cloud/dbt-cloud-enterprise/gsuite/gsuite-sso-cloud-config.png?v=2 "GSuite SSO Configuration")](#)GSuite SSO Configuration

3. Click **Save & Authorize** to authorize your credentials. You should be dropped into the GSuite OAuth flow and prompted to log into dbt with your work email address. If authentication is successful, you will be redirected back to the dbt application.
4. On the **Credentials** page, verify that a `groups` entry is present and that it reflects the groups you are a member of in GSuite. If you do not see a `groups` entry in the IdP attribute list, consult the Troubleshooting steps below.

[![GSuite verify groups](/img/docs/dbt-cloud/dbt-cloud-enterprise/gsuite/gsuite-sso-cloud-verify.png?v=2 "GSuite verify groups")](#)GSuite verify groups

If the verification information looks appropriate, then you have completed the configuration of GSuite SSO.
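The `groups` verification in step 4 can be reasoned about programmatically. A hedged sketch of the check being performed conceptually (the attribute payload shape here is illustrative, not dbt's actual API response):

```python
def verify_groups_claim(idp_attributes: dict) -> list:
    """Return the groups from an IdP attribute payload, or raise if absent."""
    groups = idp_attributes.get("groups")
    if not groups:
        raise ValueError(
            "No 'groups' entry in IdP attributes -- check that the Admin SDK "
            "is enabled and the user can read groups in GSuite."
        )
    return groups

# Illustrative payload only -- field names are an assumption.
attrs = {"email": "jane@dbtlabs.com", "groups": ["analysts", "developers"]}
print(verify_groups_claim(attrs))
```

A missing or empty `groups` entry points to the permission issues described in the Troubleshooting section below.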
Logging in

Users can now log into the dbt platform by navigating to the following URL, replacing `LOGIN-SLUG` with the value used in the previous steps and `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan:

`https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`

#### Setting up RBAC

Now that you have completed setting up SSO with GSuite, the next step is to set up [RBAC groups](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control-) to complete your access control configuration.

#### Troubleshooting

##### Invalid client error

If you experience an `Error 401: invalid_client` when authorizing with GSuite, double-check that:

* The Client ID provided matches the value generated in the GCP API Credentials page.
* The domain name(s) provided match the one(s) for your GSuite account.

##### OAuth errors

If OAuth verification does not complete successfully, double-check that:

* The Admin SDK is enabled in your GCP project
* The Client ID and Client Secret provided match the values generated in the GCP Credentials page
* An authorized domain was provided in the OAuth consent screen configuration

If authentication with the GSuite API succeeds but you do not see a `groups` entry on the **Credentials** page, then you may not have permission to access groups in your GSuite account. Either ask an administrator to grant your GSuite user the ability to read groups, or have an administrator log into dbt and authorize the GSuite integration.
---

### Set up SSO with Microsoft Entra ID

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

dbt Enterprise-tier plans support single sign-on via Microsoft Entra ID (formerly Azure AD).

SCIM available for Entra ID

After setting up single sign-on (SSO), you can [set up System for Cross-Domain Identity Management (SCIM)](https://docs.getdbt.com/docs/cloud/manage-access/scim-entra-id.md) with Entra ID to automate user and group provisioning.

Currently supported SSO features include:

* IdP-initiated SSO
* SP-initiated SSO
* Just-in-time provisioning

#### Configuration

dbt supports both single-tenant and multi-tenant Microsoft Entra ID SSO connections. For most Enterprise purposes, you will want to use the single-tenant flow when creating a Microsoft Entra ID application.

##### Creating an application

Log into the Azure portal for your organization. Using the [**Microsoft Entra ID**](https://portal.azure.com/#home) page, select the appropriate directory and then register a new application.

1. Under **Manage**, select **App registrations**.
2. Click **+ New Registration** to begin creating a new application registration.

[![Creating a new app registration](/img/docs/dbt-cloud/dbt-cloud-enterprise/azure/azure-app-registration-empty.png?v=2 "Creating a new app registration")](#)Creating a new app registration

3.
Supply configurations for the **Name** and **Supported account types** fields as shown in the following table:

| Field | Value |
| --- | --- |
| **Name** | dbt |
| **Supported account types** | Accounts in this organizational directory only *(single tenant)* |

4. Configure the **Redirect URI**. The table below shows the appropriate Redirect URI values for single-tenant and multi-tenant Entra ID app deployments. For most enterprise use cases, you will want to use the single-tenant Redirect URI. Replace `YOUR_AUTH0_URI` with the [appropriate Auth0 URI](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md#auth0-uris) for your region and plan.

**Note:** Your dbt platform tenancy has no bearing on this setting. This Entra ID app setting controls app access:

* **Single-tenant:** Only users from your Entra ID tenant can access the app.
* **Multi-tenant:** Users from *any* Entra ID tenant can access the app.

| Application Type | Redirect URI |
| --- | --- |
| Single-tenant *(recommended)* | `https://YOUR_AUTH0_URI/login/callback` |
| Multi-tenant | `https://YOUR_AUTH0_URI/login/callback` |

[![Configuring a new app registration](/img/docs/dbt-cloud/dbt-cloud-enterprise/azure/azure-new-application-alternative.png?v=2 "Configuring a new app registration")](#)Configuring a new app registration

5. Save the App registration to continue setting up Microsoft Entra ID SSO.

Configuration with the new Microsoft Entra ID interface (optional)

Depending on your Microsoft Entra ID settings, your App Registration page might look different than the screenshots shown earlier.
If you are *not* prompted to configure a Redirect URI on the **New Registration** page, then follow steps 6 - 7 below after creating your App Registration. If you were able to set up the Redirect URI in the steps above, then skip ahead to [step 8](#adding-users-to-an-enterprise-application).

6. After registering the new application without specifying a Redirect URI, click **App registrations** and then navigate to the **Authentication** tab for the new application.
7. Click **+ Add platform** and enter a Redirect URI for your application. See step 4 above for more information on the correct Redirect URI value for your dbt application.

[![Configuring a Redirect URI](/img/docs/dbt-cloud/dbt-cloud-enterprise/azure/azure-redirect-uri.png?v=2 "Configuring a Redirect URI")](#)Configuring a Redirect URI

##### Azure <-> dbt User and Group mapping

important

There is a [limitation](https://learn.microsoft.com/en-us/entra/identity/hybrid/connect/how-to-connect-fed-group-claims#important-caveats-for-this-functionality) on the number of groups Azure will emit via the SSO token (capped at 150): if a user belongs to more than 150 groups, it will appear as though they belong to none. To prevent this, configure [group assignments](https://learn.microsoft.com/en-us/entra/identity/enterprise-apps/assign-user-or-group-access-portal?pivots=portal) for the dbt app in Azure and set a [group claim](https://learn.microsoft.com/en-us/entra/identity/hybrid/connect/how-to-connect-fed-group-claims#add-group-claims-to-tokens-for-saml-applications-using-sso-configuration) so that Azure emits only the relevant groups.

The Azure users and groups you will create in the following steps are mapped to groups created in dbt based on the group name.
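The 150-group caveat above can be illustrated with a small model of the emission behavior. This is a deliberate simplification (Azure's actual overage handling differs in detail), but it shows why restricting the emitted groups with a group claim fixes the problem:

```python
GROUP_EMISSION_CAP = 150  # per the Entra ID caveat above

def emitted_groups(user_groups, relevant_prefix=None):
    """Simplified model of the SSO token's groups claim.

    If a group claim filter (e.g. assigned groups only) is configured,
    only matching groups count toward the cap.
    """
    groups = (
        [g for g in user_groups if g.startswith(relevant_prefix)]
        if relevant_prefix else list(user_groups)
    )
    # Over the cap, the claim is not emitted -- the user appears group-less.
    return groups if len(groups) <= GROUP_EMISSION_CAP else []

all_groups = [f"team-{i}" for i in range(200)] + ["dbt-analysts"]
print(len(emitted_groups(all_groups)))     # 0 -- over the cap
print(emitted_groups(all_groups, "dbt-"))  # ['dbt-analysts']
```

With a claim that emits only dbt-relevant groups, the user stays well under the cap and their memberships arrive intact.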
Reference the docs on [enterprise permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) for additional information on how users, groups, and permission sets are configured in dbt.

The dbt platform uses the **User principal name** (UPN) in Microsoft Entra ID to identify and match users logging in to dbt through SSO. The UPN is usually formatted as an email address.

##### Adding users to an Enterprise application

Once you've registered the application, the next step is to assign users to it. Add the users you want to have access to dbt with the following steps:

8. Navigate back to the [**Default Directory**](https://portal.azure.com/#home) (or **Home**) and click **Enterprise Applications**.
9. Click the name of the application you created earlier.
10. Click **Assign Users and Groups**.
11. Click **Add User/Group**.
12. Assign additional users and groups as needed.

[![Adding Users to an Enterprise Application](/img/docs/dbt-cloud/dbt-cloud-enterprise/azure/azure-enterprise-app-users.png?v=2 "Adding Users to an Enterprise Application")](#)Adding Users to an Enterprise Application

User assignment required?

Under **Properties**, check the toggle setting for **User assignment required?** and confirm it aligns with your requirements. Most customers will want this toggled to **Yes** so that only users and groups explicitly assigned to dbt will be able to sign in. If this setting is toggled to **No**, any user with a direct link to the application will be able to access it, per the [Microsoft Entra ID documentation](https://docs.microsoft.com/en-us/azure/active-directory/manage-apps/assign-user-or-group-access-portal#configure-an-application-to-require-user-assignment).

##### Configuring permissions

13.
Navigate back to the [**Default Directory**](https://portal.azure.com/#home) (or **Home**) and then **App registrations**.
14. Select your application and then select **API permissions**.
15. Click **+ Add a permission** and add the permissions shown below.

| API Name | Type | Permission | Required? |
| --- | --- | --- | --- |
| Microsoft Graph | Delegated | `User.Read` | Yes |
| Microsoft Graph | Delegated | `GroupMember.Read.All` | Yes |
| Microsoft Graph | Delegated | `Directory.AccessAsUser.All` | Optional; may be required if users are assigned to > 200 groups |

The default scope only requires `User.Read` and `GroupMember.Read.All`. If you assign a user to more than 200 groups, you may need to grant additional permissions such as `Directory.AccessAsUser.All`.

16. Save these permissions, then click **Grant admin consent** to grant admin consent for this directory on behalf of all of your users.

[![Configuring application permissions](/img/docs/dbt-cloud/dbt-cloud-enterprise/azure/azure-permissions-overview.png?v=2 "Configuring application permissions")](#)Configuring application permissions

##### Creating a client secret

17. Under **Manage**, click **Certificates & secrets**.
18. Click **+ New client secret**.
19. Name the client secret "dbt" (or similar) to identify the secret.
20. Select **730 days (24 months)** as the expiration value for this secret (recommended).
21. Click **Add** to finish creating the client secret value (not the client secret ID).
22. Record the generated client secret somewhere safe. Later in the setup process, we'll use this client secret in dbt to finish configuring the integration.
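The 730-day expiration chosen in step 20 means the secret must be rotated before access breaks. A small sketch for computing a renewal reminder date (the 30-day warning window is an arbitrary assumption, not a dbt or Azure requirement):

```python
from datetime import date, timedelta

SECRET_LIFETIME_DAYS = 730   # the recommended 24-month expiration above
REMINDER_WINDOW_DAYS = 30    # arbitrary heads-up window (assumption)

def renewal_reminder(created_on):
    """Return (expiry date, date to start rotating the secret)."""
    expires = created_on + timedelta(days=SECRET_LIFETIME_DAYS)
    return expires, expires - timedelta(days=REMINDER_WINDOW_DAYS)

expires, remind = renewal_reminder(date(2024, 1, 1))
print(expires, remind)
```

Scheduling the rotation ahead of the expiry date avoids the interruption described in the dbt configuration notes later in this guide.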
[![Configuring certificates & secrets](/img/docs/dbt-cloud/dbt-cloud-enterprise/azure/azure-secret-config.png?v=2 "Configuring certificates & secrets")](#)Configuring certificates & secrets

[![Recording the client secret](/img/docs/dbt-cloud/dbt-cloud-enterprise/azure/azure-secret-saved.png?v=2 "Recording the client secret")](#)Recording the client secret

##### Collect client credentials

23. Navigate to the **Overview** page for the app registration.
24. Note the **Application (client) ID** and **Directory (tenant) ID** shown in this form and record them along with your client secret. We'll use these keys in the steps below to finish configuring the integration in dbt.

[![Collecting credentials. Store these somewhere safe](/img/docs/dbt-cloud/dbt-cloud-enterprise/azure/azure-overview.png?v=2 "Collecting credentials. Store these somewhere safe")](#)Collecting credentials. Store these somewhere safe

#### Configuring dbt

To complete setup, follow the steps below in the dbt application.

##### Supplying credentials

25. From dbt, click on your account name in the left side menu and select **Account settings**.
26. Click **SSO & SCIM** from the menu.
27.
Click the **Edit** button and supply the following SSO details:

| Field | Value |
| --- | --- |
| **Log in with** | Microsoft Entra ID Single Tenant |
| **Client ID** | Paste the **Application (client) ID** recorded in the steps above |
| **Client Secret** | Paste the **Client Secret** (remember to use the secret *value*, not the secret ID) from the steps above. **Note:** When the client secret expires, an Entra ID admin will have to generate a new one and paste it into dbt for uninterrupted application access. |
| **Tenant ID** | Paste the **Directory (tenant) ID** recorded in the steps above |
| **Domain** | Enter the domain name for your Azure directory (such as `fishtownanalytics.com`). Only use the primary domain; this won't block access for other domains. |

[![Configuring Entra ID SSO in dbt](/img/docs/dbt-cloud/dbt-cloud-enterprise/azure/azure-cloud-sso.png?v=2 "Configuring Entra ID SSO in dbt")](#)Configuring Entra ID SSO in dbt

28. Click **Save** to complete setup for the Microsoft Entra ID SSO integration. From here, you can navigate to the login URL generated for your account's *slug* to test logging in with Entra ID.

Logging in

Users can now log into the dbt platform by navigating to the following URL, replacing `LOGIN-SLUG` with the value used in the previous steps and `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan:

`https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`

##### Additional configuration options

The **Single sign-on** section also contains additional configuration options, located after the credentials fields.

* **Include all groups:** Retrieve all groups to which a user belongs from your identity provider. If a user is a member of nested groups, this also includes the parent groups. When this option is disabled, only groups where the user has direct membership are supplied. This option is enabled by default.
* **Maximum number of groups to retrieve:** Provides a configurable limit on the number of groups to retrieve for users.
By default, this is set to 250 groups, but this number can be increased if users' group memberships exceed that amount.

#### Setting up RBAC

Now that you have completed setting up SSO with Entra ID, the next step is to set up [RBAC groups](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) to complete your access control configuration.

Set up SCIM

Now that you've set up SSO with Entra ID, you can [set up SCIM](https://docs.getdbt.com/docs/cloud/manage-access/scim-entra-id.md) to automate user and group provisioning.

#### Troubleshooting tips

Ensure that the domain name under which user accounts exist in Azure matches the domain you supplied in [Supplying credentials](#supplying-credentials) when you configured SSO.

[![Obtaining the user domain from Azure](/img/docs/dbt-cloud/dbt-cloud-enterprise/azure/azure-get-domain.png?v=2 "Obtaining the user domain from Azure")](#)Obtaining the user domain from Azure

---

### Set up SSO with Okta

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

dbt Enterprise-tier plans support single sign-on via Okta (using SAML).
SCIM available for Okta

After setting up single sign-on (SSO), you can [set up System for Cross-Domain Identity Management (SCIM)](https://docs.getdbt.com/docs/cloud/manage-access/scim-okta.md) with Okta to automate user and group provisioning, and license assignment.

Currently supported SSO features include:

* IdP-initiated SSO
* SP-initiated SSO
* Just-in-time provisioning

This guide outlines the setup process for authenticating to dbt with Okta.

#### Configuration in Okta

##### Create a new application

Note: You'll need administrator access to your Okta organization to follow this guide.

First, log into your Okta account. Using the Admin dashboard, create a new app.

[![Create a new app](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-1-new-app.png?v=2 "Create a new app")](#)Create a new app

On the following screen, select the following configurations:

* **Platform**: Web
* **Sign on method**: SAML 2.0

Click **Create** to continue the setup process.

[![Configure a new app](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-1-new-app-create.png?v=2 "Configure a new app")](#)Configure a new app

##### Configure the Okta application

On the **General Settings** page, enter the following details:

* **App name**: dbt
* **App logo** (optional): You can optionally [download the dbt logo](https://cdn.sanity.io/images/wl0ndo6t/main/333fef4fc72db6f1ce4d1bc0789f355b4f0bbaa2-1280x1280.png) and upload it to Okta to use as the logo for this app.

Click **Next** to continue.
[![Configure the app's General Settings](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-2-general-settings.png?v=2 "Configure the app's General Settings")](#)Configure the app's General Settings

##### Configure SAML Settings

The SAML Settings page configures how Okta and dbt communicate. You will want to use the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan.

To complete this section, you will need your login URL slug. This slug controls the URL where users on your account can log into your application. dbt automatically generates login URL slugs, which can't be altered. A slug contains only letters, numbers, and dashes. For example, the login URL slug for dbt Labs would look something like `dbt-labs-afk123`. Login URL slugs are unique across all dbt accounts.

The following steps use `YOUR_AUTH0_URI` and `YOUR_AUTH0_ENTITYID`. Replace these placeholders with the [appropriate Auth0 URI and Auth0 Entity ID](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md#auth0-uris) for your region. You can find these values in **Account settings** > **SSO & SCIM** > **Edit** or **Get started** after selecting your identity provider.
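Since login URL slugs contain only letters, numbers, and dashes, their shape can be sanity-checked with a simple pattern. A sketch (the exact casing rules are an assumption beyond what the description above states):

```python
import re

# Letters, numbers, and dashes only, per the slug description above.
SLUG_PATTERN = re.compile(r"^[A-Za-z0-9-]+$")

def is_plausible_login_slug(slug: str) -> bool:
    """Check whether a string has the shape of a dbt login URL slug."""
    return bool(SLUG_PATTERN.match(slug))

print(is_plausible_login_slug("dbt-labs-afk123"))  # the example slug above
print(is_plausible_login_slug("dbt labs!"))        # spaces/punctuation fail
```

This is only a shape check; whether a slug actually exists is determined by dbt when the account is created.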
* **Single sign on URL**: `https://YOUR_AUTH0_URI/login/callback?connection=`
* **Audience URI (SP Entity ID)**: `urn:auth0:YOUR_AUTH0_ENTITYID:{login URL slug}`
* **Relay State**: ``
* **Name ID format**: `Unspecified`
* **Application username**: `Custom` / `user.getInternalProperty("id")`
* **Update Application username on**: `Create and update`

[![Configure the app's SAML Settings](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-top.png?v=2 "Configure the app's SAML Settings")](#)Configure the app's SAML Settings

Application username configuration

The **Application username** setting depends on whether you plan to use SCIM:

* **SSO only:** Use a unique value such as `Custom` / `user.getInternalProperty("id")` (recommended in earlier steps)
* **SSO and SCIM:** Use the `Email` format instead, as SCIM requires the username to be in email address format

Use the **Attribute Statements** and **Group Attribute Statements** forms to map your organization's Okta user and group attributes to the format that dbt expects.

Expected **User Attribute Statements**:

| Name | Name format | Value | Description |
| --- | --- | --- | --- |
| `email` | Unspecified | `user.email` | *The user's email address* |
| `first_name` | Unspecified | `user.firstName` | *The user's first name* |
| `last_name` | Unspecified | `user.lastName` | *The user's last name* |

Expected **Group Attribute Statements**:

| Name | Name format | Filter | Value | Description |
| --- | --- | --- | --- | --- |
| `groups` | Unspecified | Matches regex | `.*` | *The groups that the user belongs to* |

**Note:** You may use a more restrictive Group Attribute Statement than the example shown above.
For example, if all of your dbt groups start with `DBT_CLOUD_`, you may use a filter like `Starts With: DBT_CLOUD_`. **Okta only returns 100 groups for each user, so if your users belong to more than 100 IdP groups, you will need to use a more restrictive filter.** Please contact support if you have any questions.

[![Configure the app's User and Group Attribute Statements](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-bottom.png?v=2 "Configure the app's User and Group Attribute Statements")](#)Configure the app's User and Group Attribute Statements

Click **Next** to continue.

##### Finish Okta setup

Select *I'm an Okta customer adding an internal app*, and select *This is an internal app that we have created*. Click **Finish** to finish setting up the app.

[![Finishing setup in Okta](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-4-feedback.png?v=2 "Finishing setup in Okta")](#)Finishing setup in Okta

##### View setup instructions

On the next page, click **View Setup Instructions**. In the steps below, you'll supply these values in your dbt Account Settings to complete the integration between Okta and dbt.

[![Viewing the configured application](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-5-view-instructions.png?v=2 "Viewing the configured application")](#)Viewing the configured application

[![Application setup instructions](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-5-instructions.png?v=2 "Application setup instructions")](#)Application setup instructions

#### Configuration in dbt

To complete setup, follow the steps below in dbt.

##### Supplying credentials

First, navigate to the **Enterprise > Single Sign On** page under Account Settings.
Next, click the **Edit** button and supply the following SSO details:

| Field | Value |
| --- | --- |
| **Log in with** | Okta |
| **Identity Provider SSO Url** | Paste the **Identity Provider Single Sign-On URL** shown in the Okta setup instructions |
| **Identity Provider Issuer** | Paste the **Identity Provider Issuer** shown in the Okta setup instructions |
| **X.509 Certificate** | Paste the **X.509 Certificate** shown in the Okta setup instructions. **Note:** When the certificate expires, an Okta admin will have to generate a new one and paste it into dbt for uninterrupted application access. |

[![Configuring the application in dbt](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-6-setup-integration.png?v=2 "Configuring the application in dbt")](#)Configuring the application in dbt

Click **Save** to complete setup for the Okta integration. From here, you can navigate to the URL generated for your account's *slug* to test logging in with Okta. Additionally, users added to the Okta app will be able to log in to dbt from Okta directly.

Logging in

Users can now log into the dbt platform by navigating to the following URL, replacing `LOGIN-SLUG` with the value used in the previous steps and `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan:

`https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`

#### Setting up RBAC

Now that you have completed setting up SSO with Okta, the next step is to set up [RBAC groups](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control-) to complete your access control configuration.

Set up SCIM

Now that you've set up SSO with Okta, you can [set up SCIM](https://docs.getdbt.com/docs/cloud/manage-access/scim-okta.md) to automate user and group provisioning (and license assignment for Okta).
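The login URL described in the callouts above is a straightforward substitution. As a sketch (the values below are placeholders; use your region's Access URL and your account's login slug):

```python
def enterprise_login_url(access_url: str, login_slug: str) -> str:
    """Build the SSO login URL from an Access URL and login URL slug."""
    return f"https://{access_url}/enterprise-login/{login_slug}"

# Placeholder values -- substitute your own Access URL and slug.
print(enterprise_login_url("cloud.getdbt.com", "dbt-labs-afk123"))
```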
---

### Set up SSO with SAML 2.0

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

dbt Enterprise-tier plans support single sign-on (SSO) for any SAML 2.0-compliant identity provider (IdP). Currently supported features include:

* IdP-initiated SSO
* SP-initiated SSO
* Just-in-time provisioning

This document details the steps to integrate dbt with an identity provider to configure single sign-on and [role-based access control](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control).

#### Auth0 URIs[​](#auth0-uris "Direct link to Auth0 URIs")

The URI used for SSO connections varies based on your dbt hosted region. To find the Auth0 URI (also called the **Single sign-on URL**, **Authorization URL**, or **Callback URI**) for your environment:

1. Navigate to your **Account settings** and click **SSO & SCIM** in the left menu.
2. In the **Single sign-on** pane, click **Get started** (if SSO has not been configured) or **Edit** (if it has already been set up).
3. Select the appropriate **Identity provider** from the **Provider type** dropdown.
4. The Auth0 URI is displayed under the **Identity provider values** section. The field label depends on the provider you selected:

| Identity provider | Field label | Example URI |
| ------------------ | --------------------------- | --------------------------------------------------------------- |
| SAML 2.0 | **Single sign-on URL** | `https://YOUR_AUTH0_URI/login/callback` |
| Okta | **Single sign-on URL** | `https://YOUR_AUTH0_URI/login/callback?connection=ACCOUNT_NAME` |
| Google Workspace | **Authorized Redirect URI** | `https://YOUR_AUTH0_URI/login/callback` |
| Microsoft Entra ID | **Callback URI** | `https://YOUR_AUTH0_URI/login/callback` |
*Replace `YOUR_AUTH0_URI` and `ACCOUNT_NAME` with your account values.*

[![Example of the identity provider values for a SAML 2.0 provider](/img/docs/dbt-cloud/access-control/sso-uri.png?v=2 "Example of the identity provider values for a SAML 2.0 provider")](#)Example of the identity provider values for a SAML 2.0 provider

Auth0 URI

The Auth0 URI always contains YOUR\_AUTH0\_URI (for example, auth.cloud.getdbt.com), not your account-specific prefix URL (such as ks123.us1.dbt.com). This is because dbt uses Auth0 as a centralized authentication service across all regions and accounts. You don't need to replace this value with your cell-specific URL.

#### Generic SAML 2.0 integrations[​](#generic-saml-20-integrations "Direct link to Generic SAML 2.0 integrations")

If your SAML identity provider is Okta, Google, Azure, or OneLogin, navigate to the relevant section further down this page. For all other SAML-compliant identity providers, you can use the instructions in this section to configure that identity provider.

##### Configure your identity provider[​](#configure-your-identity-provider "Direct link to Configure your identity provider")

You'll need administrator access to your SAML 2.0-compliant identity provider to complete these steps. You can use the following instructions with any SAML 2.0-compliant identity provider.

##### Creating the application[​](#creating-the-application "Direct link to Creating the application")

1. Log into your SAML 2.0 identity provider and create a new application.
2. When prompted, configure the application with the following details:
   * **Platform:** Web
   * **Sign on method:** SAML 2.0
   * **App name:** dbt
   * **App logo (optional):** You can optionally [download the dbt logo](https://drive.google.com/file/d/1fnsWHRu2a_UkJBJgkZtqt99x5bSyf3Aw/view?usp=sharing) and use it as the logo for this app.
###### Configuring the application[​](#configuring-the-application "Direct link to Configuring the application")

The following steps use `YOUR_AUTH0_URI` and `YOUR_AUTH0_ENTITYID`. Replace these placeholders with the [appropriate Auth0 URI and Auth0 Entity ID](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md#auth0-uris) for your region. You can find these values in **Account settings** > **SSO & SCIM** > **Edit** or **Get started** after selecting your identity provider.

To complete this section, you will need your login URL slug. This slug controls the URL where users on your account can log into your application. dbt automatically generates login URL slugs, which can't be altered. It will contain only letters, numbers, and dashes. For example, the login URL slug for dbt Labs would look something like `dbt-labs-afk123`. Login URL slugs are unique across all dbt accounts.

When prompted for the SAML 2.0 application configurations, supply the following values:

* Single sign on URL: `https://YOUR_AUTH0_URI/login/callback?connection=`
* Audience URI (SP Entity ID): `urn:auth0::{login URL slug}`
* Relay State: `` (Note: Relay state may be shown as optional in the IdP settings; it is *required* for the dbt SSO configuration.)

Additionally, you may configure the IdP attributes passed from your identity provider into dbt. [SCIM configuration](https://docs.getdbt.com/docs/cloud/manage-access/scim.md) requires `NameID` and `email` to associate logins with the correct user. If you're using license mapping for groups, you need to additionally configure the `groups` attribute.
We recommend using the following values:

| name | name format | value | description |
| ----------- | ----------- | ---------------- | ------------------------ |
| email | Unspecified | user.email | The user's email address |
| first\_name | Unspecified | user.first\_name | The user's first name |
| last\_name | Unspecified | user.last\_name | The user's last name |
| NameID | Unspecified | ID | The user's unchanging ID |

`NameID` values can be persistent (`urn:oasis:names:tc:SAML:2.0:nameid-format:persistent`) rather than unspecified if your IdP supports these values. Using an email address for `NameID` will work, but dbt creates an entirely new user if that email address changes. Configuring a value that will not change, even if the user's email address does, is a best practice.

dbt's [role-based access control](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control) relies on group mappings from the IdP to assign dbt users to dbt groups. To use role-based access control in dbt, also configure your identity provider to provide group membership information in a user attribute called `groups`:

| name | name format | value | description |
| ------ | ----------- | ---------------- | --------------------------------------- |
| groups | Unspecified | `` | The groups a user belongs to in the IdP |

Note

You may use a restricted group attribute statement to limit the groups sent to dbt for each authenticated user. For example, if all of your dbt groups start with `DBT_CLOUD_...`, you may optionally apply a filter like `Starts With: DBT_CLOUD_`.
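The effect of a restricted group attribute statement such as `Starts With: DBT_CLOUD_` can be sketched in Python. This is an illustration only; the group names below are made up, and the actual filtering happens inside your IdP when it builds the SAML assertion:

```python
import re

# Hypothetical IdP group memberships for one user.
idp_groups = ["DBT_CLOUD_DEVELOPER", "DBT_CLOUD_ADMIN", "ENG_ALL", "HR_BENEFITS"]

# A "Starts With: DBT_CLOUD_" filter is equivalent to this anchored regex.
group_filter = re.compile(r"^DBT_CLOUD_")

# Only the matching groups would be included in the assertion's `groups` attribute.
asserted_groups = [g for g in idp_groups if group_filter.match(g)]
print(asserted_groups)  # ['DBT_CLOUD_DEVELOPER', 'DBT_CLOUD_ADMIN']
```

Narrowing the asserted groups this way keeps unrelated IdP groups out of dbt's group mappings.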
##### Collect integration secrets[​](#collect-integration-secrets "Direct link to Collect integration secrets")

After confirming your details, the IdP should show you the following values for the new SAML 2.0 integration. Keep these values somewhere safe, as you will need them to complete setup in dbt.

* Identity Provider Issuer
* Identity Provider SSO Url
* X.509 Certificate (PEM format required)

Example of PEM format:

```text
-----BEGIN CERTIFICATE-----
MIIC8DCCAdigAwIBAgIQSANTIKwxA1221kqhkiG9w0dbtLabsBAQsFADA0MTIwMAYDVQQD
EylNaWNyb3NvZnQgQXp1cmUgRmVkZXJhdGVkIFNTTyBDZXJ0aWZpY2F0ZTAeFw0yMzEyMjIwMDU1
MDNaFw0yNjEyMjIwMDU1MDNaMDQxMjAwBgNVBAMTKU1pY3Jvc29mdCBBenVyZSBGZWRlcmF0ZWQg
U1NPIENlcnRpZmljYXRlMIIBIjANBgkqhkiG9w0BAEFAAFRANKIEMIIBCgKCAQEAqfXQGc/D8ofK
aXbPXftPotqYLEQtvqMymgvhFuUm+bQ9YSpS1zwNQ9D9hWVmcqis6gO/VFw61e0lFnsOuyx+XMKL
rJjAIsuWORavFqzKFnAz7hsPrDw5lkNZaO4T7tKs+E8N/Qm4kUp5omZv/UjRxN0XaD+o5iJJKPSZ
PBUDo22m+306DE6ZE8wqxT4jTq4g0uXEitD2ZyKaD6WoPRETZELSl5oiCB47Pgn/mpqae9o0Q2aQ
LP9zosNZ07IjKkIfyFKMP7xHwzrl5a60y0rSIYS/edqwEhkpzaz0f8QW5pws668CpZ1AVgfP9TtD
Y1EuxBSDQoY5TLR8++2eH4te0QIDAQABMA0GCSqGSIb3DmAKINgAA4IBAQCEts9ujwaokRGfdtgH
76kGrRHiFVWTyWdcpl1dNDvGhUtCRsTC76qwvCcPnDEFBebVimE0ik4oSwwQJALExriSvxtcNW1b
qvnY52duXeZ1CSfwHkHkQLyWBANv8ZCkgtcSWnoHELLOWORLD4aSrAAY2s5hP3ukWdV9zQscUw2b
GwN0/bTxxQgA2NLZzFuHSnkuRX5dbtrun21USPTHMGmFFYBqZqwePZXTcyxp64f3Mtj3g327r/qZ
squyPSq5BrF4ivguYoTcGg4SCP7qfiNRFyBUTTERFLYU0n46MuPmVC7vXTsPRQtNRTpJj/b2gGLk
1RcPb1JosS1ct5Mtjs41
-----END CERTIFICATE-----
```

##### Finish setup[​](#finish-setup "Direct link to Finish setup")

After creating the application, follow the instructions in the [dbt setup](#dbt-setup) section to complete the integration.

#### Okta integration[​](#okta-integration "Direct link to Okta integration")

You can use the instructions in this section to configure Okta as your identity provider.

1. Log into your Okta account. Using the Admin dashboard, create a new app.
[![Create a new app](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-1-new-app.png?v=2 "Create a new app")](#)Create a new app

2. Select the following configurations:
   * **Platform**: Web
   * **Sign on method**: SAML 2.0
3. Click **Create** to continue the setup process.

[![Configure a new app](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-1-new-app-create.png?v=2 "Configure a new app")](#)Configure a new app

##### Configure the Okta application[​](#configure-the-okta-application "Direct link to Configure the Okta application")

The following steps use `YOUR_AUTH0_URI` and `YOUR_AUTH0_ENTITYID`. Replace these placeholders with the [appropriate Auth0 URI and Auth0 Entity ID](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md#auth0-uris) for your region. You can find these values in **Account settings** > **SSO & SCIM** > **Edit** or **Get started** after selecting your identity provider.

To complete this section, you will need your login URL slug. This slug controls the URL where users on your account can log into your application. dbt automatically generates login URL slugs, which can't be altered. It will contain only letters, numbers, and dashes. For example, the login URL slug for dbt Labs would look something like `dbt-labs-afk123`. Login URL slugs are unique across all dbt accounts.

1. On the **General Settings** page, enter the following details:
   * **App name**: dbt
   * **App logo** (optional): You can optionally [download the dbt logo](https://drive.google.com/file/d/1fnsWHRu2a_UkJBJgkZtqt99x5bSyf3Aw/view?usp=sharing), and upload it to Okta to use as the logo for this app.
2. Click **Next** to continue.

[![Configure the app's General Settings](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-2-general-settings.png?v=2 "Configure the app's General Settings")](#)Configure the app's General Settings

##### Configure SAML Settings[​](#configure-saml-settings "Direct link to Configure SAML Settings")

1. On the **SAML Settings** page, enter the following values:
   * **Single sign on URL**: `https://YOUR_AUTH0_URI/login/callback?connection=`
   * **Audience URI (SP Entity ID)**: `urn:auth0::`
   * **Relay State**: ``
   * **Name ID format**: `Unspecified`
   * **Application username**: `Custom` / `user.getInternalProperty("id")`
   * **Update Application username on**: `Create and update`

[![Configure the app's SAML Settings](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-top.png?v=2 "Configure the app's SAML Settings")](#)Configure the app's SAML Settings

2. Map your organization's Okta User and Group Attributes to the format that dbt expects by using the Attribute Statements and Group Attribute Statements forms. [SCIM configuration](https://docs.getdbt.com/docs/cloud/manage-access/scim.md) requires `email` to associate logins with the correct user. If you're using license mapping for groups, you need to additionally configure the `groups` attribute.
3. The following table illustrates expected User Attribute Statements:

| Name | Name format | Value | Description |
| ------------ | ----------- | ---------------- | -------------------------- |
| `email` | Unspecified | `user.email` | *The user's email address* |
| `first_name` | Unspecified | `user.firstName` | *The user's first name* |
| `last_name` | Unspecified | `user.lastName` | *The user's last name* |

4. The following table illustrates expected **Group Attribute Statements**:

| Name | Name format | Filter | Value | Description |
| -------- | ----------- | ------------- | ----- | ------------------------------------- |
| `groups` | Unspecified | Matches regex | `.*` | *The groups that the user belongs to* |

You can instead use a more restrictive Group Attribute Statement than the example shown in the previous steps.
For example, if all of your dbt groups start with `DBT_CLOUD_`, you may use a filter like `Starts With: DBT_CLOUD_`. **Okta only returns 100 groups for each user, so if your users belong to more than 100 IdP groups, you will need to use a more restrictive filter**. Please contact support if you have any questions. [![Configure the app's User and Group Attribute Statements](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-3-saml-settings-bottom.png?v=2 "Configure the app's User and Group Attribute Statements")](#)Configure the app's User and Group Attribute Statements 5. Click **Next** to continue. ##### Finish Okta setup[​](#finish-okta-setup "Direct link to Finish Okta setup") 1. Select *I'm an Okta customer adding an internal app*. 2. Select *This is an internal app that we have created*. 3. Click **Finish** to finish setting up the app. [![Finishing setup in Okta](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-4-feedback.png?v=2 "Finishing setup in Okta")](#)Finishing setup in Okta ##### View setup instructions[​](#view-setup-instructions "Direct link to View setup instructions") 1. On the next page, click **View Setup Instructions**. 2. In the steps below, you'll supply these values in your dbt Account Settings to complete the integration between Okta and dbt. [![Viewing the configured application](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-5-view-instructions.png?v=2 "Viewing the configured application")](#)Viewing the configured application [![Application setup instructions](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-5-instructions.png?v=2 "Application setup instructions")](#)Application setup instructions 3. After creating the Okta application, follow the instructions in the [dbt setup](#dbt-setup) section to complete the integration. #### Google integration[​](#google-integration "Direct link to Google integration") Use this section if you are configuring Google as your identity provider. 
##### Configure the Google application[​](#configure-the-google-application "Direct link to Configure the Google application")

The following steps use `YOUR_AUTH0_URI` and `YOUR_AUTH0_ENTITYID`. Replace these placeholders with the [appropriate Auth0 URI and Auth0 Entity ID](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md#auth0-uris) for your region. You can find these values in **Account settings** > **SSO & SCIM** > **Edit** or **Get started** after selecting your identity provider.

To complete this section, you will need your login URL slug. This slug controls the URL where users on your account can log into your application. dbt automatically generates login URL slugs, which can't be altered. It will contain only letters, numbers, and dashes. For example, the login URL slug for dbt Labs would look something like `dbt-labs-afk123`. Login URL slugs are unique across all dbt accounts.

1. Sign into your **Google Admin Console** via an account with super administrator privileges.
2. From the Admin console Home page, go to **Apps** and then click **Web and mobile apps**.
3. Click **Add**, then click **Add custom SAML app**.
4. Click **Next** to continue.
5. Make these changes on the App Details page:
   * Name the custom app
   * Upload an app logo (optional)
   * Click **Continue**.

##### Configure SAML Settings[​](#configure-saml-settings-1 "Direct link to Configure SAML Settings")

1. Go to the **Google Identity Provider details** page.
2. Download the **IDP metadata**.
3. Copy the **SSO URL** and **Entity ID** and download the **Certificate** (or **SHA-256 fingerprint**, if needed).
4. Enter the following values on the **Service Provider Details** window:
   * **ACS URL**: `https://YOUR_AUTH0_URI/login/callback?connection=`
   * **Audience URI (SP Entity ID)**: `urn:auth0::`
   * **Start URL**: ``
5. Select the **Signed response** checkbox.
6. The default **Name ID** is the primary email. Multi-value input is not supported.
If your user profile has a unique, stable value that will persist across email address changes, it's best to use that; otherwise, email will work.

7. Use the **Attribute mapping** page to map your organization's Google Directory Attributes to the format that dbt expects.
8. Click **Add another mapping** to map additional attributes. Expected **Attributes**:

| Name | Name format | Value | Description |
| --------------- | ----------- | ------------ | ------------------------- |
| `First name` | Unspecified | `first_name` | The user's first name. |
| `Last name` | Unspecified | `last_name` | The user's last name. |
| `Primary email` | Unspecified | `email` | The user's email address. |

9. To use [role-based access control](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control) in dbt, enter the groups in the **Group membership** field during configuration:

| Google groups | App attributes |
| -------------- | -------------- |
| Name of groups | `groups` |

10. Click **Finish** to continue.

##### Finish Google setup[​](#finish-google-setup "Direct link to Finish Google setup")

1. From the Admin console Home page, go to **Apps** and then click **Web and mobile apps**.
2. Select your SAML app.
3. Click **User access**.
4. To turn on or off a service for everyone in your organization, click **On for everyone** or **Off for everyone**, and then click **Save**.
5. Ensure that the email addresses your users use to sign in to the SAML app match the email addresses they use to sign in to your Google domain.

**Note:** Changes typically take effect in minutes, but can take up to 24 hours.
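The attribute mapping above amounts to renaming directory fields to the names dbt expects. A minimal Python sketch, using a hypothetical directory record (the names and email are invented):

```python
# Hypothetical Google Directory record for one user.
directory_user = {
    "First name": "Ada",
    "Last name": "Lovelace",
    "Primary email": "ada@example.com",
}

# Google attribute name -> attribute name dbt expects, per the table above.
attribute_mapping = {
    "First name": "first_name",
    "Last name": "last_name",
    "Primary email": "email",
}

# The assertion the IdP would send carries the dbt-side names.
saml_attributes = {
    dbt_name: directory_user[google_name]
    for google_name, dbt_name in attribute_mapping.items()
}
print(saml_attributes)
# {'first_name': 'Ada', 'last_name': 'Lovelace', 'email': 'ada@example.com'}
```

The actual renaming is performed by Google Workspace when it issues the assertion; the dict here just makes the correspondence explicit.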
##### Finish setup[​](#finish-setup-1 "Direct link to Finish setup")

After creating the Google application, follow the instructions in the [dbt setup](#dbt-setup) section to complete the integration.

#### Microsoft Entra ID (formerly Azure AD) integration[​](#microsoft-entra-id-formerly-azure-ad-integration "Direct link to Microsoft Entra ID (formerly Azure AD) integration")

If you're using Microsoft Entra ID (formerly Azure AD), the instructions below will help you configure it as your identity provider.

##### Create a Microsoft Entra ID Enterprise application[​](#create-a-microsoft-entra-id-enterprise-application "Direct link to Create a Microsoft Entra ID Enterprise application")

The following steps use `YOUR_AUTH0_URI` and `YOUR_AUTH0_ENTITYID`. Replace these placeholders with the [appropriate Auth0 URI and Auth0 Entity ID](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md#auth0-uris) for your region. You can find these values in **Account settings** > **SSO & SCIM** > **Edit** or **Get started** after selecting your identity provider.

To complete this section, you will need your login URL slug. This slug controls the URL where users on your account can log into your application. dbt automatically generates login URL slugs, which can't be altered. It will contain only letters, numbers, and dashes. For example, the login URL slug for dbt Labs would look something like `dbt-labs-afk123`. Login URL slugs are unique across all dbt accounts.

Follow these steps to set up single sign-on (SSO) with dbt:

1. Log into your Azure account.
2. In the Entra ID portal, select **Enterprise applications** and click **+ New application**.
3. Select **Create your own application**.
4. Name the application "dbt" or another descriptive name.
5. Select **Integrate any other application you don't find in the gallery (Non-gallery)** as the application type.
6. Click **Create**.
7. You can find the new application by clicking **Enterprise applications** and selecting **All applications**.
8. Click the application you just created.
9. Select **Single sign-on** under Manage in the left navigation.
10. Click **Set up single sign on** under Getting Started.

[![In your Overview page, select 'Set up single sign on](/img/docs/dbt-cloud/access-control/single-sign-on-overview.jpg?v=2 "In your Overview page, select 'Set up single sign on")](#)In your Overview page, select 'Set up single sign on

11. Click **SAML** in the "Select a single sign-on method" section.

[![Select the 'SAML' card in the 'Select a single sign-on method' section.](/img/docs/dbt-cloud/access-control/saml.jpg?v=2 "Select the 'SAML' card in the 'Select a single sign-on method' section.")](#)Select the 'SAML' card in the 'Select a single sign-on method' section.

12. Click **Edit** in the Basic SAML Configuration section.

[![In the 'Set up Single Sign-On with SAML' page, click 'Edit' in the 'Basic SAML Configuration' card](/img/docs/dbt-cloud/access-control/basic-saml.jpg?v=2 "In the 'Set up Single Sign-On with SAML' page, click 'Edit' in the 'Basic SAML Configuration' card")](#)In the 'Set up Single Sign-On with SAML' page, click 'Edit' in the 'Basic SAML Configuration' card

13. Use the following table to complete the required fields and connect to dbt:

| Field | Value |
| ---------------------------------------------- | ------------------------------------------------------------------------ |
| **Identifier (Entity ID)** | Use `urn:auth0::`. |
| **Reply URL (Assertion Consumer Service URL)** | Use `https://YOUR_AUTH0_URI/login/callback?connection=`. |
| **Relay State** | `` |

14. Click **Save** at the top of the form.

##### Creating SAML settings[​](#creating-saml-settings "Direct link to Creating SAML settings")

From the Set up Single Sign-On with SAML page:

1. Click **Edit** in the User Attributes & Claims section.
2. Click **Unique User Identifier (Name ID)** under **Required claim.**
3. Set **Name identifier format** to **Unspecified**.
4. Set **Source attribute** to **user.objectid**.
5. Delete all claims under **Additional claims.**
6. Click **Add new claim** and add the following new claims:

| Name | Source attribute |
| --------------- | ---------------- |
| **email** | user.mail |
| **first\_name** | user.givenname |
| **last\_name** | user.surname |

7. Click **Add a group claim** from **User Attributes and Claims.**
8. If you assign users directly to the enterprise application, select **Security Groups**. If not, select **Groups assigned to the application**.
9. Set **Source attribute** to **Group ID**.
10. Under **Advanced options**, check **Customize the name of the group claim** and set **Name** to **groups**.

**Note:** Keep in mind that the Group ID in Entra ID maps to that group's GUID. It should be specified in lowercase for the mappings to work as expected. Alternatively, the Source attribute field can be set to a different value of your preference.

##### Finish setup[​](#finish-setup-2 "Direct link to Finish setup")

After creating the Azure application, follow the instructions in the [dbt setup](#dbt-setup) section to complete the integration.

The names for fields in dbt vary from those in the Entra ID app. They're mapped as follows:

| dbt field | Corresponding Entra ID field |
| ----------------------------- | ---------------------------- |
| **Identity Provider SSO URL** | Login URL |
| **Identity Provider Issuer** | Microsoft Entra Identifier |

#### OneLogin integration[​](#onelogin-integration "Direct link to OneLogin integration")

Use this section if you are configuring OneLogin as your identity provider. To configure OneLogin, you will need **Administrator** access.
##### Configure the OneLogin application[​](#configure-the-onelogin-application "Direct link to Configure the OneLogin application") The following steps use `YOUR_AUTH0_URI` and `YOUR_AUTH0_ENTITYID`. Replace these placeholders with the [appropriate Auth0 URI and Auth0 Entity ID](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md#auth0-uris) for your region. You can find these values in **Account settings** > **SSO & SCIM** > **Edit** or **Get started** after selecting your identity provider. To complete this section, you will need your login URL slug. This slug controls the URL where users on your account can log into your application. dbt automatically generates login URL slugs, which can't be altered. It will contain only letters, numbers, and dashes. For example, the login URL slug for dbt Labs would look something like `dbt-labs-afk123`. Login URL slugs are unique across all dbt accounts. 1. Log into OneLogin, and add a new SAML 2.0 Application. 2. Configure the application with the following details: * **Platform:** Web * **Sign on method:** SAML 2.0 * **App name:** dbt * **App logo (optional):** You can optionally [download the dbt logo](https://drive.google.com/file/d/1fnsWHRu2a_UkJBJgkZtqt99x5bSyf3Aw/view?usp=sharing), and use as the logo for this app. ##### Configure SAML settings[​](#configure-saml-settings-2 "Direct link to Configure SAML settings") 3. Under the **Configuration tab**, input the following values: * **RelayState:** `` * **Audience (EntityID):** `urn:auth0::` * **ACS (Consumer) URL Validator:** `https://YOUR_AUTH0_URI/login/callback?connection=` * **ACS (Consumer) URL:** `https://YOUR_AUTH0_URI/login/callback?connection=` 4. Next, go to the **Parameters tab**. You must have a parameter for the Email, First Name, and Last Name attributes and include all parameters in the SAML assertions. When you add the custom parameters, make sure you select the **Include in SAML assertion** checkbox. 
We recommend using the following values:

| name | name format | value |
| ----------- | ----------- | ----------- |
| NameID | Unspecified | OneLogin ID |
| email | Unspecified | Email |
| first\_name | Unspecified | First Name |
| last\_name | Unspecified | Last Name |

dbt's [role-based access control](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control) relies on group mappings from the IdP to assign dbt users to dbt groups. To use role-based access control in dbt, also configure OneLogin to provide group membership information in a user attribute called `groups`:

| name | name format | value | description |
| ------ | ----------- | ------------------------------------------------- | --------------------------------------- |
| groups | Unspecified | Series of groups to be used for your organization | The groups a user belongs to in the IdP |

##### Collect integration secrets[​](#collect-integration-secrets-1 "Direct link to Collect integration secrets")

5. After confirming your details, go to the **SSO tab**. OneLogin should show you the following values for the new integration. Keep these values somewhere safe, as you will need them to complete setup in dbt.
* Issuer URL
* SAML 2.0 Endpoint (HTTP)
* X.509 Certificate (PEM format required)

Example of PEM format:

```text
-----BEGIN CERTIFICATE-----
MIIC8DCCAdigAwIBAgIQSANTIKwxA1221kqhkiG9w0dbtLabsBAQsFADA0MTIwMAYDVQQD
EylNaWNyb3NvZnQgQXp1cmUgRmVkZXJhdGVkIFNTTyBDZXJ0aWZpY2F0ZTAeFw0yMzEyMjIwMDU1
MDNaFw0yNjEyMjIwMDU1MDNaMDQxMjAwBgNVBAMTKU1pY3Jvc29mdCBBenVyZSBGZWRlcmF0ZWQg
U1NPIENlcnRpZmljYXRlMIIBIjANBgkqhkiG9w0BAEFAAFRANKIEMIIBCgKCAQEAqfXQGc/D8ofK
aXbPXftPotqYLEQtvqMymgvhFuUm+bQ9YSpS1zwNQ9D9hWVmcqis6gO/VFw61e0lFnsOuyx+XMKL
rJjAIsuWORavFqzKFnAz7hsPrDw5lkNZaO4T7tKs+E8N/Qm4kUp5omZv/UjRxN0XaD+o5iJJKPSZ
PBUDo22m+306DE6ZE8wqxT4jTq4g0uXEitD2ZyKaD6WoPRETZELSl5oiCB47Pgn/mpqae9o0Q2aQ
LP9zosNZ07IjKkIfyFKMP7xHwzrl5a60y0rSIYS/edqwEhkpzaz0f8QW5pws668CpZ1AVgfP9TtD
Y1EuxBSDQoY5TLR8++2eH4te0QIDAQABMA0GCSqGSIb3DmAKINgAA4IBAQCEts9ujwaokRGfdtgH
76kGrRHiFVWTyWdcpl1dNDvGhUtCRsTC76qwvCcPnDEFBebVimE0ik4oSwwQJALExriSvxtcNW1b
qvnY52duXeZ1CSfwHkHkQLyWBANv8ZCkgtcSWnoHELLOWORLD4aSrAAY2s5hP3ukWdV9zQscUw2b
GwN0/bTxxQgA2NLZzFuHSnkuRX5dbtrun21USPTHMGmFFYBqZqwePZXTcyxp64f3Mtj3g327r/qZ
squyPSq5BrF4ivguYoTcGg4SCP7qfiNRFyBUTTERFLYU0n46MuPmVC7vXTsPRQtNRTpJj/b2gGLk
1RcPb1JosS1ct5Mtjs41
-----END CERTIFICATE-----
```

##### Finish setup[​](#finish-setup-3 "Direct link to Finish setup")

6. After creating the OneLogin application, follow the instructions in the [dbt setup](#dbt-setup) section to complete the integration.

#### dbt setup[​](#dbt-setup "Direct link to dbt setup")

##### Providing IdP values to dbt[​](#providing-idp-values-to-dbt "Direct link to Providing IdP values to dbt")

To complete setup, follow the steps below in dbt:

1. Navigate to the **Account Settings** and then click on **Single Sign On**.
2. Click **Edit** in the upper right corner.
3. Provide the following SSO details:

| Field | Value |
| ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Log in with | SAML 2.0 |
| Identity Provider SSO Url | Paste the **Identity Provider Single Sign-On URL** shown in the IdP setup instructions |
| Identity Provider Issuer | Paste the **Identity Provider Issuer** shown in the IdP setup instructions |
| X.509 Certificate | Paste the **X.509 Certificate** shown in the IdP setup instructions. **Note:** When the certificate expires, an IdP admin will have to generate a new one and paste it into dbt for uninterrupted application access. |

[![Configuring the application in dbt](/img/docs/dbt-cloud/dbt-cloud-enterprise/okta/okta-6-setup-integration.png?v=2 "Configuring the application in dbt")](#)Configuring the application in dbt

4. Click **Save** to complete setup for the SAML 2.0 integration.
5. After completing the setup, you can navigate to the URL generated for your account's *slug* to test logging in with your identity provider. Additionally, users added to the SAML 2.0 app will be able to log in to dbt from the IdP directly.

##### Additional configuration options[​](#additional-configuration-options "Direct link to Additional configuration options")

The **Single sign-on** section also contains additional configuration options, located after the credentials fields.

* **Sign SAML Auth Request:** dbt will sign SAML requests sent to your identity provider when users attempt to log in. Metadata for configuring this in your identity provider can be downloaded from the value shown in **SAML Metadata URL**. We recommend leaving this disabled for most situations.
* **Attribute Mappings:** Associate SAML attributes that dbt needs with attributes your identity provider includes in SAML assertions. The value must be a valid JSON object with the `email`, `first_name`, `last_name`, or `groups` keys and values that are strings or lists of strings. For example, if your identity provider is unable to include an `email` attribute in assertions, but does include one called `EmailAddress`, then **Attribute Mappings** should be set to `{ "email": "EmailAddress" }`. The mappings are only needed if you cannot configure attributes as specified in the instructions on this page. If you can, the default value of `{}` is acceptable.
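To make the Attribute Mappings behavior concrete, here is a sketch of how such a mapping resolves attribute names. The `{ "email": "EmailAddress" }` value comes from the example above; the assertion contents are hypothetical, and the resolution logic is an illustration rather than dbt's actual implementation:

```python
import json

# The JSON object configured in dbt's Attribute Mappings field.
mappings = json.loads('{ "email": "EmailAddress" }')

# Hypothetical attributes received in a SAML assertion whose email
# attribute is named `EmailAddress` instead of `email`.
assertion = {"EmailAddress": "ada@example.com", "first_name": "Ada"}

# For each attribute dbt needs, look up the IdP-side name from the mapping,
# falling back to the dbt-side name when no mapping is configured.
resolved = {
    key: assertion.get(mappings.get(key, key))
    for key in ("email", "first_name", "last_name", "groups")
}
print(resolved["email"])  # ada@example.com
```

With the default mapping of `{}`, every lookup falls back to the dbt-side attribute name, which is why no mapping is needed when your IdP already sends `email`, `first_name`, `last_name`, and `groups` directly.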
##### Logging in

Users can now log in to the dbt platform by navigating to the following URL, replacing `LOGIN-SLUG` with the value used in the previous steps and `YOUR_ACCESS_URL` with the [appropriate Access URL](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) for your region and plan: `https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`

##### Setting up RBAC[​](#setting-up-rbac "Direct link to Setting up RBAC")

After configuring an identity provider, you can set up [role-based access control](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md) for your account.

---

### Single sign-on (SSO) overview

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

This overview explains how users are provisioned in dbt via single sign-on (SSO). dbt supports JIT (Just-in-Time) provisioning and IdP-initiated login. To further automate your workflow, you can use [System for Cross-Domain Identity Management (SCIM)](https://docs.getdbt.com/docs/cloud/manage-access/scim.md) to provision users, manage group memberships, and automate license assignments directly from your identity provider (IdP) (Okta or Microsoft Entra ID). Learn more about dbt plans [here](https://www.getdbt.com/pricing/).

#### Prerequisites[​](#prerequisites "Direct link to Prerequisites")

* You have a dbt Enterprise or Enterprise+ plan. [Contact us](mailto:sales@getdbt.com) to learn more, book a demo, or enroll.
#### Auth0 URIs[​](#auth0-uris "Direct link to Auth0 URIs")

The URI used for SSO connections varies based on your dbt hosted region. To find the Auth0 URI (also called the **Single sign-on URL**, **Authorization URL**, or **Callback URI**) for your environment:

1. Navigate to your **Account settings** and click **SSO & SCIM** in the left menu.
2. In the **Single sign-on** pane, click **Get started** (if SSO has not been configured) or **Edit** (if it has already been set up).
3. Select the appropriate **Identity provider** from the **Provider type** dropdown.
4. The Auth0 URI is displayed under the **Identity provider values** section. The field label depends on the provider you selected:

| Identity provider | Field label | Example URI |
| --- | --- | --- |
| SAML 2.0 | **Single sign-on URL** | `https://YOUR_AUTH0_URI/login/callback` |
| Okta | **Single sign-on URL** | `https://YOUR_AUTH0_URI/login/callback?connection=ACCOUNT_NAME` |
| Google Workspace | **Authorized Redirect URI** | `https://YOUR_AUTH0_URI/login/callback` |
| Microsoft Entra ID | **Callback URI** | `https://YOUR_AUTH0_URI/login/callback` |

*Replace `YOUR_AUTH0_URI` and `ACCOUNT_NAME` with your account values.*

[![Example of the identity provider values for a SAML 2.0 provider](/img/docs/dbt-cloud/access-control/sso-uri.png?v=2 "Example of the identity provider values for a SAML 2.0 provider")](#)Example of the identity provider values for a SAML 2.0 provider

**Auth0 URI:** The Auth0 URI always contains `YOUR_AUTH0_URI` (for example, `auth.cloud.getdbt.com`), not your account-specific prefix URL (such as `ks123.us1.dbt.com`). This is because dbt uses Auth0 as a centralized authentication service across all regions and accounts. You don't need to replace this value with your cell-specific URL.
#### SSO process[​](#sso-process "Direct link to SSO process")

The diagram below explains the basic process by which users are provisioned in dbt upon logging in with SSO.

[![SSO diagram](/img/sso_overview.png?v=2 "SSO diagram")](#)SSO diagram

###### Diagram Explanation[​](#diagram-explanation "Direct link to Diagram Explanation")

* **Login Page**: The user accesses the dbt login page, initiating the SSO flow.
* **IdP-Initiated Login**: The user accesses the dbt login page within the identity provider by selecting the dbt application. This begins the IdP login flow.
* **IdP Login Page**: The user is prompted to log in to the identity provider. This grants the dbt application access to the details of their account.
* **Login?**: The user can choose to continue or to abort the login process.
  * **Yes**: The user logs in, grants the dbt application access, and continues.
  * **No**: The user does not log in. They return to the IdP login page.
* **User Exists?**: This step checks whether the user already exists in dbt's user database.
  * **Yes**: If so, the user creation process is skipped.
  * **No**: If not, a new entry is created in the dbt database for the new user.
* **Create dbt User**: This creates a new entry in the dbt database for the new user. The user record contains the user's email address, first and last name, and any IdP attributes (for example, groups) passed along from the identity provider. dbt sends a verification email, and the user must follow the steps in the [User experience section](https://docs.getdbt.com/docs/cloud/manage-access/invite-users.md#user-experience) to use SSO in dbt.
* **Attach Matching Accounts**: dbt finds all of the accounts configured to match the SSO config used by this user to log in, and then creates a user license record mapping the user to the account. This step also deletes any licenses that the user should not have based on the current SSO config.
* **Attach Matching Permissions (Groups)**: dbt iterates through the groups on the matching accounts and finds all that fit one of the below categories:
  * Have an SSO mapping group that is assigned to the user
  * Have the "Assign by Default" option checked

  dbt then assigns all of these (and only these) to the user license. This step also removes any permissions that the user should not have based on the current SSO group mappings.
* **dbt Application**: After these steps, the user is redirected into the dbt application and can begin to use it normally.

**License and permission mappings use IdP groups:** License type mappings and SSO group mappings are based on **IdP group** membership (groups in your identity provider), not dbt platform group names. When configuring [license mappings](https://docs.getdbt.com/docs/cloud/manage-access/seats-and-users.md#mapped-configuration) or group assignments, use the group names and memberships from your IdP.

#### SSO enforcement[​](#sso-enforcement "Direct link to SSO enforcement")

* **SSO Enforcement:** If SSO is turned on in your organization, dbt enforces SSO-only logins for all non-admin users. By default, if an Account Admin or Security Admin already has a password, they can continue logging in with a password. To restrict admins from using passwords, turn off **Allow password logins for account administrators** in the **SSO & SCIM** section of your organization's **Account settings**.
* **SSO Re-Authentication:** dbt prompts you to re-authenticate using your SSO provider every 24 hours to ensure high security.

##### How should non-admin users log in?[​](#how-should-non-admin-users-log-in "Direct link to How should non-admin users log in?")

Non-admin users that currently log in with a password will no longer be able to do so. They must log in using the dbt Enterprise Login URL or an identity provider (IdP) such as Okta or Microsoft Entra ID (formerly Azure AD).
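The JIT provisioning flow described in the diagram explanation can be sketched in Python. Everything here is a hypothetical stand-in for dbt's internals: the `users` dict, the `accounts` structure, and the `sso_mapping`/`assign_by_default` group fields are illustrative names, not dbt's actual data model.

```python
def provision_on_sso_login(idp_user, users, accounts):
    """Sketch of JIT provisioning after a successful SSO login.

    idp_user: attributes from the IdP assertion (email, names, groups).
    users: a dict keyed by email, standing in for dbt's user database.
    accounts: the accounts whose SSO config matched this login.
    Returns the user record and the group names assigned to the license.
    """
    user = users.get(idp_user["email"])
    if user is None:  # "User Exists?" -> No: create the dbt user record
        user = {
            "email": idp_user["email"],
            "name": (idp_user["first_name"], idp_user["last_name"]),
            "idp_groups": idp_user.get("groups", []),
            "licenses": [],
        }
        users[idp_user["email"]] = user
    # "Attach Matching Accounts": one license record per matching account
    # (stale licenses from old SSO configs would be dropped here too).
    user["licenses"] = [acct["id"] for acct in accounts]
    # "Attach Matching Permissions (Groups)": keep only groups that are
    # SSO-mapped to one of the user's IdP groups or assigned by default.
    assigned = []
    for acct in accounts:
        for group in acct["groups"]:
            mapped = set(group.get("sso_mapping", [])) & set(user["idp_groups"])
            if group.get("assign_by_default") or mapped:
                assigned.append(group["name"])
    return user, assigned
```

The sketch mirrors the diagram's ordering: create the user if needed, attach licenses for matching accounts, then assign exactly the groups that the current SSO mappings allow.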
##### Security best practices[​](#security-best-practices "Direct link to Security best practices")

A few scenarios might require you to log in with a password. We recommend these security best practices for the two most common scenarios:

* **Onboarding partners and contractors** — We highly recommend that you add partners and contractors to your identity provider. IdPs like Okta and Microsoft Entra ID offer capabilities explicitly for temporary employees. Reach out to your IT team to provision an SSO license for these situations. Using an IdP is highly secure, reduces breach risk, and significantly increases the security posture of your dbt environment.
* **Identity provider is down** — Account admins can continue to log in with a password, which allows them to work with your identity provider to troubleshoot the problem.
* **Offboarding admins** — When offboarding admins, revoke access to dbt by deleting the user from your environment; otherwise, they can continue to use username/password credentials to log in.

##### Next steps for non-admin users currently logging in with passwords[​](#next-steps-for-non-admin-users-currently-logging-in-with-passwords "Direct link to Next steps for non-admin users currently logging in with passwords")

If you have any non-admin users logging in to dbt with a password today:

1. Ensure that all users have a user account in your identity provider and are assigned the dbt application so they won't lose access.
2. Alert all dbt users that they won't be able to use a password for logging in anymore unless they are already an admin with a password.
3. We **do not** recommend promoting any users to admins just to preserve password-based logins, because doing so reduces the security of your dbt environment.

---

### Single sign-on and OAuth

[Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

This section covers how to authenticate users and connect data platforms in the dbt platform using:

* [Single sign-on (SSO)](#sso)
* [System for Cross-Domain Identity Management (SCIM)](#scim)
* [Connection OAuth](#connection-oauth)

These features are available on Enterprise and Enterprise+ plans and are typically configured by account admins or security teams.

#### SSO[​](#sso "Direct link to SSO")

Lets users log in to dbt with your identity provider (IdP) instead of a password. Supports Just-in-Time provisioning and IdP-initiated login.
*For admins setting up Okta, Microsoft Entra ID, Google Workspace, or SAML 2.0.*

* [Single sign-on (SSO) overview](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md) — How SSO works and prerequisites
* [Migrating to Auth0 for SSO](https://docs.getdbt.com/docs/cloud/manage-access/auth0-migration.md)
* [Set up SSO with SAML 2.0](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-saml-2.0.md)
* [Set up SSO with Okta](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-okta.md)
* [Set up SSO with Google Workspace](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-google-workspace.md)
* [Set up SSO with Microsoft Entra ID](https://docs.getdbt.com/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md)

#### SCIM[​](#scim "Direct link to SCIM")

Automates user and group provisioning from your IdP into dbt (and, with Okta, license assignment).

*For admins using Okta or Microsoft Entra ID who want to sync users and groups.*

* [Set up SCIM](https://docs.getdbt.com/docs/cloud/manage-access/scim.md) — Prerequisites and enabling SCIM in dbt
* [Set up SCIM with Okta](https://docs.getdbt.com/docs/cloud/manage-access/scim-okta.md) (includes [license management](https://docs.getdbt.com/docs/cloud/manage-access/scim-manage-user-licenses.md))
* [Set up SCIM with Entra ID](https://docs.getdbt.com/docs/cloud/manage-access/scim-entra-id.md)

#### Connection OAuth[​](#connection-oauth "Direct link to Connection OAuth")

Connection OAuth authenticates to your data platform (like Snowflake or BigQuery); it's different from SSO, which handles user login to the dbt platform. It lets developers authorize their development credentials with a data platform using that platform's login instead of storing passwords in dbt.
*For admins and developers connecting to supported data platforms.*

* [OAuth overview](https://docs.getdbt.com/docs/cloud/manage-access/oauth-intro.md) — What's available by platform
* [Set up Snowflake OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md)
* [Set up Databricks OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-databricks-oauth.md)
* [Set up BigQuery OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-bigquery-oauth.md)
* [Set up external OAuth with Snowflake](https://docs.getdbt.com/docs/cloud/manage-access/snowflake-external-oauth.md)
* [Set up external OAuth with Redshift](https://docs.getdbt.com/docs/cloud/manage-access/redshift-external-oauth.md)

---

##### Users and licenses

In dbt, *licenses* are used to allocate users to your account. There are four license types in dbt:

* **Analyst** — Available on [Enterprise and Enterprise+ plans only](https://www.getdbt.com/pricing). Requires a developer seat license purchase.
  * User can be granted *any* permission sets.
* **Developer** — User can be granted *any* permission sets.
* **IT** — Available on [Starter, Enterprise, and Enterprise+ plans only](https://www.getdbt.com/pricing). User has Security Admin and Billing Admin [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md#permission-sets) applied, as well as permissions to edit **Connections** in the **Account settings** page.
  * Can manage users, groups, connections, and licenses, among other permissions.
  * *IT licensed users do not inherit rights from any permission sets*.
  * Every IT licensed user has the same access across the account, regardless of the group permissions assigned.
* **Read-Only** — Available on [Starter, Enterprise, and Enterprise+ plans only](https://www.getdbt.com/pricing).
  * User has read-only permissions applied to all dbt resources.
  * Intended to view the [artifacts](https://docs.getdbt.com/docs/deploy/artifacts.md) and the [deploy](https://docs.getdbt.com/docs/deploy/deployments.md) section (jobs, runs, schedules) in a dbt account, but can't make changes.
  * *Read-only licensed users do not inherit rights from any permission sets*.
  * Every read-only licensed user has the same access across the account, regardless of the group permissions assigned.

The user's assigned license determines the specific capabilities they can access in dbt.

| Functionality | Developer or Analyst Users | Read-Only Users | IT Users\* |
| --- | --- | --- | --- |
| Use the Studio IDE | ✅ | ❌ | ❌ |
| Use the dbt CLI | ✅ | ❌ | ❌ |
| Use Jobs | ✅ | ❌ | ❌ |
| Manage Account | ✅ | ❌ | ✅ |
| API Access | ✅ | ✅ | ❌ |
| Use [Catalog](https://docs.getdbt.com/docs/explore/explore-projects.md) | ✅ | ✅ | ❌ |
| Use [Source Freshness](https://docs.getdbt.com/docs/deploy/source-freshness.md) | ✅ | ✅ | ❌ |
| Use [Docs](https://docs.getdbt.com/docs/explore/build-and-view-your-docs.md) | ✅ | ✅ | ❌ |
| Receive [Job notifications](https://docs.getdbt.com/docs/deploy/job-notifications.md) | ✅ | ✅ | ✅ |

\*Available on Starter, Enterprise, and Enterprise+ plans only. IT seats are limited to 1 seat per Starter or Enterprise-tier account and don't count toward seat usage.
#### Licenses[​](#licenses "Direct link to Licenses")

**License types override group permissions.** User license types always override their assigned group permission sets. For example, a user with a Read-Only license cannot perform administrative actions, even if they belong to an Account Admin group. This ensures that license restrictions are always enforced, regardless of group membership.

Each dbt plan comes with a base number of Developer, IT, and Read-Only licenses. You can add or remove licenses by modifying the number of users in your account settings. If you have a Developer plan account and want to add more people to your team, you'll need to upgrade to the Starter plan. Refer to [dbt Pricing Plans](https://www.getdbt.com/pricing/) for more information about licenses available with each plan.

The following tabs detail how to modify your user license count:

* Enterprise-tier plans
* Starter plans

If you're on an Enterprise-tier plan and have the correct [permissions](https://docs.getdbt.com/docs/cloud/manage-access/enterprise-permissions.md), you can add or remove licenses by adjusting your user seat count. Note that an IT license does not count toward seat usage.

* To remove a user, click on your account name in the left side menu, click **Account settings**, and select **Users**.
  * Select the user you want to remove, click **Edit**, and then click **Delete**.
  * This action cannot be undone. However, you can re-invite the user with the same info if you deleted the user in error.
* To add a user, go to **Account Settings** and select **Users**.
  * Click the [**Invite Users**](https://docs.getdbt.com/docs/cloud/manage-access/invite-users.md) button.
  * For fine-grained permission configuration, refer to [Role based access control](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control-).

If you're on a Starter plan and have the correct [permissions](https://docs.getdbt.com/docs/cloud/manage-access/self-service-permissions.md), you can add or remove developers. Refer to [Self-service Starter account permissions](https://docs.getdbt.com/docs/cloud/manage-access/self-service-permissions.md#licenses) for more information on the number of each license type included in the Starter plan. You'll need to make two changes:

* Adjust your developer user seat count, which manages the users invited to your dbt project.
* Adjust your developer billing seat count, which manages the number of billable seats.

You can add or remove developers by increasing or decreasing the number of users and billable seats in your account settings:

* Adding users
* Deleting users

To add a user in dbt, you must be an account owner or have admin privileges.

1. From dbt, click on your account name in the left side menu and select **Account settings**.

   [![Navigate to Account settings](/img/docs/dbt-cloud/Navigate-to-account-settings.png?v=2 "Navigate to Account settings")](#)Navigate to Account settings

2. In **Account Settings**, select **Billing**.
3. Under **Billing details**, enter the number of developer seats you want and make sure you fill in all the payment details, including the **Billing address** section. If you leave these blank, you won't be able to save your changes.
4. Press **Update Payment Information** to save your changes.
[![Navigate to Account settings -> Billing to modify billing seat count](/img/docs/dbt-cloud/faq-account-settings-billing.png?v=2 "Navigate to Account settings -> Billing to modify billing seat count")](#)Navigate to Account settings -> Billing to modify billing seat count

Now that you've updated your billing, you can [invite users](https://docs.getdbt.com/docs/cloud/manage-access/invite-users.md) to join your dbt account. After completing these steps, your dbt user count and billing count should be the same.

To delete a user in dbt, you must be an account owner or have admin privileges. If the user has a `developer` license type, deleting them opens up their seat for another user or allows the admins to lower the total number of seats.

1. From dbt, click on your account name in the left side menu and select **Account settings**.

   [![Navigate to Account settings](/img/docs/dbt-cloud/Navigate-to-account-settings.png?v=2 "Navigate to Account settings")](#)Navigate to Account settings

2. In **Account settings**, select **Users**.
3. Select the user you want to delete, then click **Edit**.
4. Click **Delete** in the bottom left. Click **Confirm Delete** to immediately delete the user without additional password prompts. This action cannot be undone. However, you can re-invite the user with the same information if the deletion was made in error.

   [![Deleting a user](/img/docs/dbt-cloud/delete_user_20221023.gif?v=2 "Deleting a user")](#)Deleting a user

If you are on a **Starter** plan and you're deleting users to reduce the number of billable seats, follow these steps to lower the license count and avoid being overcharged:

1. In **Account Settings**, select **Billing**.
2. Under **Billing details**, enter the number of developer seats you want and make sure you fill in all the payment details, including the **Billing address** section. If you leave any field blank, you won't be able to save your changes.
3. Click **Update Payment Information** to save your changes.

   [![The Billing page in your Account settings](/img/docs/dbt-cloud/faq-account-settings-billing.png?v=2 "The Billing page in your Account settings")](#)The Billing page in your Account settings

After completing these steps, your dbt user count and billing count should be the same.

#### Managing license types[​](#managing-license-types "Direct link to Managing license types")

Licenses can be assigned to users individually or through group membership. To assign a license via group membership, you can manually add a user to a group during the invitation process or assign them to a group after they've enrolled in dbt. Alternatively, with [SSO configuration](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md) and [role-based access control](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md#role-based-access-control-) (Enterprise-tier only), users can be automatically assigned to groups. By default, new users in an account are assigned a Developer license.

##### Manual configuration[​](#manual-configuration "Direct link to Manual configuration")

To manually assign a specific type of license to a user on your team:

1. Click on your account name in the left side menu and select **Account settings**.
2. Navigate to the **Users** page in your **Account settings**.
3. Select the user you want to manage and click the **Edit** button.
4. On the **User details** page, you can select the license type and relevant groups for the user.

**Note:** You need an available license ready to allocate for the user. If your account does not have an available license, you'll need to add more licenses to your plan to complete the license change.
[![Manually assigning licenses](/img/docs/dbt-cloud/access-control/license-manual.png?v=2 "Manually assigning licenses")](#)Manually assigning licenses ##### Mapped configuration [Enterprise](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")[​](#mapped-configuration- "Direct link to mapped-configuration-") If your account is connected to an Identity Provider (IdP) for [Single Sign On](https://docs.getdbt.com/docs/cloud/manage-access/sso-overview.md), you can automatically map IdP user groups to specific license types in dbt. For SCIM-based license mapping with Okta, see [Automated license mapping](https://docs.getdbt.com/docs/cloud/manage-access/scim-manage-user-licenses.md#automated-license-mapping). ###### Configure license mappings[​](#configure-license-mappings "Direct link to Configure license mappings") 1. Click on your account name in the left side menu and select **Account settings**. 2. Navigate to **Groups & Licenses** and scroll to the **License mappings** section. 3. Create or edit SSO mappings for both Read-Only and Developer license types. 4. Enter a comma-separated list of **IdP group names** that should receive each license type. [![Configuring IdP group license mapping](/img/docs/dbt-cloud/access-control/license-mapping.png?v=2 "Configuring IdP group license mapping")](#)Configuring IdP group license mapping ###### Fundamental licensing rules[​](#fundamental-licensing-rules "Direct link to Fundamental licensing rules") * **Default assignment**: All new members of a dbt account are assigned a Developer license unless you configure otherwise. * **Mapping basis**: License type mappings are based on *IdP groups* (groups in your identity provider), not *dbt groups*. Check group memberships in your IdP when configuring or troubleshooting. * **When changes take effect**: License types are adjusted when users sign into dbt using single sign-on. 
Changes to license type mappings take effect the next time users sign in.

###### Mapping logic and precedence[​](#mapping-logic-and-precedence "Direct link to Mapping logic and precedence")

When a user belongs to multiple IdP groups, the Developer license takes precedence. The following table shows how group membership determines the assigned license:

| In a Developer-mapped group? | In a Read-Only-mapped group? | License assigned |
| --- | --- | --- |
| No | No | Developer (default) |
| No | Yes | Read-Only |
| Yes | No | Developer |
| Yes | Yes | Developer (Developer takes precedence) |

**Note:** If a user's IdP groups do not match *any* license type mappings, dbt assigns a Developer license by default.

#### Granular permissioning[​](#granular-permissioning "Direct link to Granular permissioning")

dbt Enterprise-tier plans support role-based access controls for configuring granular in-app permissions. See [access control](https://docs.getdbt.com/docs/cloud/manage-access/about-user-access.md) for more information on Enterprise permissioning.

---

#### Secure

##### About network security

Network security in dbt gives you control over how traffic flows between dbt and your infrastructure. Choose the approach that best fits your security requirements.
#### Choose your connectivity approach[​](#choose-your-connectivity-approach "Direct link to Choose your connectivity approach")

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/secure/ip-restrictions.md)

###### [Over the public internet](https://docs.getdbt.com/docs/cloud/secure/ip-restrictions.md)

[Use IP restrictions to limit which IP addresses can access dbt Cloud or your data platform.](https://docs.getdbt.com/docs/cloud/secure/ip-restrictions.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/private-connectivity.md)

###### [Over a private network](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/private-connectivity.md)

[Use your cloud provider's private connectivity technology to keep traffic off the public internet entirely.](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/private-connectivity.md)

---

##### About private connectivity

**Available to certain Enterprise tiers:** The private connection feature is available on the following dbt Enterprise tiers:

* Business Critical
* Virtual Private

To learn more about these tiers, contact us at .

**Environment variables:** Using [Environment variables](https://docs.getdbt.com/docs/build/environment-variables.md) when configuring private connection endpoints isn't supported in dbt. Instead, use [Extended Attributes](https://docs.getdbt.com/docs/deploy/deploy-environments.md#extended-attributes) to dynamically change these values in your dbt environment.
Private connections enable secure communication from any dbt environment to your data platform hosted on a cloud provider, such as [AWS](https://aws.amazon.com/privatelink/), [Azure](https://azure.microsoft.com/en-us/products/private-link), or [GCP](https://cloud.google.com/vpc/docs/private-service-connect), using that provider's private connection technology. Private connections help dbt customers meet security and compliance controls because they allow connectivity between dbt and your data platform without traversing the public internet.

This feature is supported in most regions across North America, Europe, and Asia, but [contact us](https://www.getdbt.com/contact/) if you have questions about availability.

Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server must be hosted on the same cloud provider. For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can't connect to services hosted on GCP.

#### Available platforms[​](#available-platforms "Direct link to Available platforms")

Select your cloud platform to view private connectivity options, support matrix, and configuration guides.
[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-overview.md)

###### [AWS](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-overview.md)

[Amazon Web Services PrivateLink](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-overview.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/azure/azure-overview.md)

###### [Azure](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/azure/azure-overview.md)

[Microsoft Azure Private Link](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/azure/azure-overview.md)

[![](/img/icons/dbt-bit.svg)](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/gcp/gcp-overview.md)

###### [GCP](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/gcp/gcp-overview.md)

[Google Cloud Platform Private Service Connect](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/gcp/gcp-overview.md)

---

##### AWS private connectivity

**Available to certain Enterprise tiers:** The private connection feature is available on the following dbt Enterprise tiers:

* Business Critical
* Virtual Private

To learn more about these tiers, contact us at .

AWS PrivateLink enables secure, private connectivity between dbt and your AWS-hosted services. With PrivateLink, traffic between dbt and your data platforms or self-hosted services stays within the AWS network and does not traverse the public internet. For more details, refer to the [AWS PrivateLink documentation](https://docs.aws.amazon.com/vpc/latest/privatelink/).
#### AWS private connectivity matrix[​](#aws-private-connectivity-matrix "Direct link to AWS private connectivity matrix") The following charts outline private connectivity options for AWS deployments of dbt ([multi-tenant and single-tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md)). **Legend:** * ✅ = Available * ❌ = Not currently available *Tenancy:* MT (multi-tenant) and ST (single-tenant) — [learn more about tenancy](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md). About the following matrix tables These tables indicate whether private connectivity can be established to specific services, considering major factors such as the network and basic auth layers. dbt has validated these configurations using common deployment patterns and typical use cases. However, individual configurations may vary. If you encounter issues or have questions about your environment, [contact dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support) for guidance. *** ##### Connecting to the dbt platform (Ingress)[​](#connecting-to-the-dbt-platform-ingress "Direct link to Connecting to the dbt platform (Ingress)") Your services can connect to dbt over private connectivity using the dbt-provisioned model. In this case, dbt is the service producer and you are the consumer. | Connectivity type | MT | ST | | ------------------------------ | -- | -- | | Private dbt access | ❌ | ✅ | | Dual access (public + private) | ❌ | ✅ | *** ##### Connecting the dbt platform to managed services (Egress)[​](#connecting-the-dbt-platform-to-managed-services-egress "Direct link to Connecting the dbt platform to managed services (Egress)") dbt can establish private connections to managed data platforms and cloud-native services. 
| Service | MT | ST | Setup guide | | -------------------------- | -- | -- | -------------------------------------------------------------------------------------------- | | Snowflake | ✅ | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-snowflake.md) | |   Snowflake Internal Stage | ✅ | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-snowflake.md) | | Databricks | ✅ | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-databricks.md) | | Redshift | ✅ | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-redshift.md) | | Redshift Serverless | ✅ | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-redshift.md) | | Amazon Athena w/ AWS Glue | ❌ | ✅ | | | AWS CodeCommit | ❌ | ✅ | | | Teradata VantageCloud | ✅ | ✅ | | *** ##### Connecting the dbt platform to self-hosted services (Egress)[​](#connecting-the-dbt-platform-to-self-hosted-services-egress "Direct link to Connecting the dbt platform to self-hosted services (Egress)") All of the services below share a common PrivateLink setup guide — backend configuration varies by service. Self-hosted connections use the customer-provisioned model — you are the service producer and dbt is the consumer. **Setup guide:** [Configuring AWS PrivateLink for self-hosted services](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/aws/aws-self-hosted.md) | Service | MT | ST | | ------------------------ | -- | -- | | GitHub Enterprise Server | ✅ | ✅ | | GitLab Self-Managed | ✅ | ✅ | | Bitbucket Data Center | ✅ | ✅ | | Azure DevOps Server | ✅ | ✅ | | Postgres | ✅ | ✅ | | Spark | ✅ | ✅ | | Starburst / Trino | ✅ | ✅ | | Teradata (self-hosted) | ✅ | ✅ | 
If you have questions about whether your specific architecture is supported, [contact dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support). #### Cross-region private connections[​](#cross-region-private-connections "Direct link to Cross-region private connections") dbt Labs has globally connected private networks specifically used to host private endpoints, which are connected to dbt instance environments. This connectivity allows dbt environments to connect to any supported region from any dbt instance within the same cloud provider network. To ensure security, access to these endpoints is protected by security groups, network policies, and application connection safeguards, in addition to the authentication and authorization mechanisms provided by each of the connected platforms. --- ##### Azure private connectivity Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . Azure Private Link enables secure, private connectivity between dbt and your Azure-hosted services. With Private Link, traffic between dbt and your data platforms or self-hosted services stays within the Azure network and does not traverse the public internet. For more details, refer to the [Azure Private Link documentation](https://learn.microsoft.com/en-us/azure/private-link/). 
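Whether a private connection is actually in use can be sanity-checked from a client: a working private endpoint hostname resolves to a private (RFC 1918) address rather than a public one. Below is a minimal sketch of that check using only the Python standard library; the helper name and any hostname you pass in are illustrative, not dbt-provided values:

```python
import ipaddress
import socket

def resolves_privately(hostname: str) -> bool:
    """Return True if every address the hostname resolves to is private (RFC 1918)."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)

# Checking literal addresses directly, without a live DNS lookup:
print(ipaddress.ip_address("10.20.30.40").is_private)  # True: private network range
print(ipaddress.ip_address("52.10.20.30").is_private)  # False: public internet address
```

Run against your data platform's hostname from the network dbt connects through, a `False` result suggests traffic is still taking the public path.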
#### Azure private connectivity matrix[​](#azure-private-connectivity-matrix "Direct link to Azure private connectivity matrix") The following charts outline private connectivity options for Azure deployments of dbt ([multi-tenant and single-tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md)). **Legend:** * ✅ = Available * ❌ = Not currently available *Tenancy:* MT (multi-tenant) and ST (single-tenant) — [learn more about tenancy](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md). About the following matrix tables These tables indicate whether private connectivity can be established to specific services, considering major factors such as the network and basic auth layers. dbt has validated these configurations using common deployment patterns and typical use cases. However, individual configurations may vary. If you encounter issues or have questions about your environment, [contact dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support) for guidance. *** ##### Connecting to the dbt platform (Ingress)[​](#connecting-to-the-dbt-platform-ingress "Direct link to Connecting to the dbt platform (Ingress)") Your services can connect to dbt over private connectivity using the dbt-provisioned model. In this case, dbt is the service producer and you are the consumer. | Connectivity type | MT | ST | | ------------------------------ | -- | -- | | Private dbt access | ❌ | ✅ | | Dual access (public + private) | ❌ | ✅ | *** ##### Connecting the dbt platform to managed services (Egress)[​](#connecting-the-dbt-platform-to-managed-services-egress "Direct link to Connecting the dbt platform to managed services (Egress)") dbt can establish private connections to managed data platforms and cloud-native services. 
| Service | MT | ST | Setup guide | | --------------------------------------------- | -- | -- | ------------------------------------------------------------------------------------------------ | | Snowflake | ✅ | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/azure/azure-snowflake.md) | |   Snowflake Internal Stage | ✅ | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/azure/azure-snowflake.md) | | Databricks | ✅ | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/azure/azure-databricks.md) | | Azure Database for PostgreSQL Flexible Server | ✅ | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/azure/azure-postgres.md) | | Azure Synapse | ✅ | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/azure/azure-synapse.md) | | Azure Fabric | ❌ | ❌ | | | Teradata VantageCloud | ✅ | ✅ | | *** ##### Connecting the dbt platform to self-hosted services (Egress)[​](#connecting-the-dbt-platform-to-self-hosted-services-egress "Direct link to Connecting the dbt platform to self-hosted services (Egress)") All of the services below share a common Private Link setup guide — backend configuration varies by service. Self-hosted connections use the customer-provisioned model — you are the service producer and dbt is the consumer. **Setup guide:** [Configuring Azure Private Link for self-hosted services](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/azure/azure-self-hosted.md) | Service | MT | ST | | ------------------------ | -- | -- | | GitHub Enterprise Server | ✅ | ✅ | | GitLab Self-Managed | ✅ | ✅ | | Bitbucket Data Center | ✅ | ✅ | | Azure DevOps Server | ✅ | ✅ | | Postgres | ✅ | ✅ | | Starburst / Trino | ✅ | ✅ | | Teradata (self-hosted) | ✅ | ✅ | 
If you have questions about whether your specific architecture is supported, [contact dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support). #### Cross-region private connections[​](#cross-region-private-connections "Direct link to Cross-region private connections") dbt Labs has globally connected private networks specifically used to host private endpoints, which are connected to dbt instance environments. This connectivity allows dbt environments to connect to any supported region from any dbt instance within the same cloud provider network. To ensure security, access to these endpoints is protected by security groups, network policies, and application connection safeguards, in addition to the authentication and authorization mechanisms provided by each of the connected platforms. --- ##### Configure AWS PrivateLink for Postgres [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . A Postgres database, hosted either in AWS or in a properly connected on-prem data center, can be accessed through a private network connection using AWS Interface-type PrivateLink. The type of Target Group connected to the Network Load Balancer (NLB) may vary based on the location and type of Postgres instance being connected, as explained in the following steps. 
Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server (like Postgres) must be hosted on the same cloud provider. For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can’t connect to services hosted on GCP. #### Configuring Postgres interface-type PrivateLink[​](#configuring-postgres-interface-type-privatelink "Direct link to Configuring Postgres interface-type PrivateLink") ##### 1. Provision AWS resources[​](#1-provision-aws-resources "Direct link to 1. Provision AWS resources") Creating an Interface VPC PrivateLink connection requires creating multiple AWS resources in the account containing, or connected to, the Postgres instance: * **Security Group (AWS hosted only)** — If you are connecting to an existing Postgres instance, this likely already exists; however, you may need to add or modify Security Group rules to accept traffic from the Network Load Balancer (NLB) created for this Endpoint Service. * **Target Group** — The Target Group will be attached to the NLB to tell it where to route requests. There are various target types available for NLB Target Groups, so choose the one appropriate for your Postgres setup. * Target Type: * *[Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/)* - **IP** * Find the IP address of your RDS instance using a command line tool such as `nslookup` or `dig +short` with your RDS DNS endpoint * *Note*: With RDS Multi-AZ failover capabilities, the IP address of your RDS instance can change, at which point your Target Group would need to be updated. See [this AWS blog post](https://aws.amazon.com/blogs/database/access-amazon-rds-across-vpcs-using-aws-privatelink-and-network-load-balancer/) for more details and a possible solution. 
* *On-prem Postgres server* - **IP** * Use the IP address of the on-prem Postgres server linked to AWS through AWS Direct Connect or a Site-to-Site VPN connection * *Postgres on EC2* - **Instance/ASG** (or **IP**) * If your Postgres instance is hosted on EC2, the *instance* Target Group type (or ideally [using the instance type to connect to an auto-scaling group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/attach-load-balancer-asg.html)) can be used to attach the instance without needing a static IP address * The IP type can also be used, with the understanding that the IP of the EC2 instance can change if the instance is relaunched for any reason * Target Group protocol: **TCP** * **Network Load Balancer (NLB)** — Requires creating a Listener that attaches to the newly created Target Group for port `5432` * **Scheme:** Internal * **IP address type:** IPv4 * **Network mapping:** Choose the VPC that the VPC Endpoint Service and NLB are being deployed in, and choose subnets from at least two Availability Zones. * **Security Groups:** The Network Load Balancer (NLB) associated with the VPC endpoint service must either not have an associated security group, or the security group must have a rule that allows requests from the appropriate dbt **private CIDR(s)**. Note that *this is different* from the static public IPs listed on the dbt [Access, Regions, & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) page. dbt Support can provide the correct private CIDR(s) upon request. If necessary, until you can refine the rule to the smaller CIDR provided by dbt, allow connectivity by temporarily adding an allow rule of `10.0.0.0/8`. * **Listeners:** Create one listener per target group that maps the appropriate incoming port to the corresponding target group ([details](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-listeners.html)). * **VPC Endpoint Service** — Attach to the newly created NLB. 
* Acceptance required (optional) — Requires you to [accept our connection request](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#accept-reject-connection-requests) after dbt creates the endpoint. Cross-Zone Load Balancing We highly recommend cross-zone load balancing for your NLB or Target Group; some connections may require it. Cross-zone load balancing may also [improve routing distribution and connection resiliency](https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html#cross-zone-load-balancing). Note that cross-zone connectivity may incur additional data transfer charges, though this should be minimal for requests from dbt. * [Enabling cross-zone load balancing for a load balancer or target group](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/edit-target-group-attributes.html#target-group-cross-zone) ##### 2. Grant dbt AWS account access to the VPC endpoint service[​](#2-grant-dbt-aws-account-access-to-the-vpc-endpoint-service "Direct link to 2. Grant dbt AWS account access to the VPC endpoint service") On the provisioned VPC endpoint service, click the **Allow principals** tab. Click **Allow principals** to grant access. Enter the ARN of the root user in the appropriate production AWS account and save your changes. * Principal: `arn:aws:iam::346425330055:role/MTPL_Admin` [![Enter ARN](/img/docs/dbt-cloud/privatelink-allow-principals.png?v=2 "Enter ARN")](#)Enter ARN ##### 3. Obtain VPC endpoint service name[​](#3-obtain-vpc-endpoint-service-name "Direct link to 3. Obtain VPC endpoint service name") Once the VPC Endpoint Service is provisioned, you can find the service name in the AWS console by navigating to **VPC** → **Endpoint Services** and selecting the appropriate endpoint service. You can copy the service name field value and include it in your communication to dbt support. 
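Step 1 above suggests finding the RDS instance IP with `nslookup` or `dig +short`; the same lookup can be scripted when you need to populate or audit the Target Group. Here is a small sketch using only the Python standard library — the helper name is illustrative and the RDS endpoint in the comment is a hypothetical placeholder:

```python
import socket

def endpoint_ips(hostname: str) -> list[str]:
    """Resolve a DNS endpoint to the IPv4 address(es) to register in the NLB Target Group."""
    # getaddrinfo follows CNAME chains, much like `dig +short` does for RDS-style endpoints
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

# Hypothetical endpoint; substitute your RDS instance's DNS name:
# endpoint_ips("mydb.abc123.us-east-1.rds.amazonaws.com")
print(endpoint_ips("localhost"))
```

As noted above, RDS Multi-AZ failover can change the resolved IP, so a lookup like this is also useful for periodically verifying that the registered targets are still current.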
[![Get service name field value](/img/docs/dbt-cloud/privatelink-endpoint-service-name.png?v=2 "Get service name field value")](#)Get service name field value ##### 4. Submit your request to dbt Support[​](#4-submit-your-request-to-dbt-support "Direct link to 4. Submit your request to dbt Support") Add the required information to the template below and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support): ```text Subject: New Multi-Tenant PrivateLink Request - Type: Postgres Interface-type - VPC Endpoint Service Name: - Postgres server AWS Region (for example, us-east-1, eu-west-2): - dbt AWS multi-tenant environment (US, EMEA, AU): ``` dbt Labs will work on your behalf to complete the private connection setup. Please allow 3-5 business days for this process to complete. Support will contact you when the endpoint is available. ##### 5. Accept the connection request[​](#5-accept-the-connection-request "Direct link to 5. Accept the connection request") When you receive notification that the resources are provisioned within the dbt environment, you must accept the endpoint connection (unless the VPC endpoint service is set to auto-accept connection requests). You can accept requests through the AWS console, as shown below, or through the AWS CLI. [![Accept the connection request](/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/accept-request.png?v=2 "Accept the connection request")](#)Accept the connection request #### Create connection in dbt[​](#create-connection-in-dbt "Direct link to Create connection in dbt") Once dbt Support completes the configuration, you can start creating new connections using PrivateLink. 1. Navigate to **Settings** → **Create new project** → select **PostgreSQL**. 2. You will see two radio buttons: **Public** and **Private**. Select **Private**. 3. Select the private endpoint from the dropdown (this automatically populates the hostname/account field). 4. 
Configure the remaining data platform details. 5. Test your connection and save it. #### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting") If the PrivateLink endpoint has been provisioned and configured in dbt but connectivity is still failing, check the following in your networking setup to ensure requests and responses can be successfully routed between dbt and the backing service. ##### Configuration[​](#configuration "Direct link to Configuration") Start with the configuration:  1. NLB Security Group The Network Load Balancer (NLB) associated with the VPC Endpoint Service must either not have an associated Security Group or the Security Group must have a rule that allows requests from the appropriate dbt *private CIDR(s)*. Note that this differs from the static public IPs listed on the dbt Connection page. dbt Support can provide the correct private CIDR(s) upon request. * **Note**: To test if this is the issue, temporarily adding an allow rule of `10.0.0.0/8` should allow connectivity until the rule can be refined to a smaller CIDR.  2. NLB Listener and Target Group Check that there is a Listener connected to the NLB that matches the port that dbt is trying to connect to. This Listener must have a configured action to forward to a Target Group with targets that point to your backing service. At least one (but preferably all) of these targets must be **Healthy**. Unhealthy targets could suggest that the backing service is, in fact, unhealthy or that the service is protected by a Security Group that doesn't allow requests from the NLB.  3. Cross-zone Load Balancing Check that *Cross-zone load balancing* is enabled for your NLB (check the **Attributes** tab of the NLB in the AWS console). If this is disabled, and the zones that dbt is connected to are misaligned with the zones where the service is running, requests may not be able to be routed correctly. 
Enabling cross-zone load balancing will also make the connection more resilient in the case of a failover in a zone outage scenario. Cross-zone connectivity may incur additional data transfer charges, though this should be minimal for requests from dbt.  4. Routing tables and ACLs If all the above check out, it may be possible that requests are not routing correctly within the private network. This could be due to a misconfiguration in the VPCs routing tables or access control lists. Review these settings with your network administrator to ensure that requests can be routed from the VPC Endpoint Service to the backing service and that the response can be returned to the VPC Endpoint Service. One way to test this is to create a VPC endpoint in another VPC in your network to test that connectivity is working independent of dbt's connection. ##### Monitoring[​](#monitoring "Direct link to Monitoring") To help isolate connection issues over a PrivateLink connection from dbt, there are a few monitoring sources that can be used to verify request activity. Requests must first be sent to the endpoint to see anything in the monitoring. Contact dbt Support to understand when connection testing occurred or request new connection attempts. Use these times to correlate with activity in the following monitoring sources.  VPC Endpoint Service Monitoring In the AWS Console, navigate to VPC -> Endpoint Services. Select the Endpoint Service being tested and click the **Monitoring** tab. Update the time selection to include when test connection attempts were sent. If there is activity in the *New connections* and *Bytes processed* graphs, then requests have been received by the Endpoint Service, suggesting that the dbt endpoint is routing properly.  NLB Monitoring In the AWS Console, navigate to EC2 -> Load Balancers. Select the Network Load Balancer (NLB) being tested and click the **Monitoring** tab. Update the time selection to include when test connection attempts were sent. 
If there is activity in the *New flow count* and *Processed bytes* graphs, then requests have been received by the NLB from the Endpoint Service, suggesting the NLB Listener, Target Group, and Security Group are correctly configured.  VPC Flow Logs VPC Flow Logs can provide various helpful information for requests being routed through your VPCs, though they can sometimes be challenging to locate and interpret. Flow logs can be written to either S3 or CloudWatch Logs, so determine the availability of these logs for your VPC and query them accordingly. Flow logs record the Elastic Network Interface (ENI) ID, source and destination IP and port, and whether the request was accepted or rejected by the security group and/or network ACL. This can be useful in understanding if a request arrived at a certain network interface and whether that request was accepted, potentially illuminating overly restrictive rules. For more information on accessing and interpreting VPC Flow Logs, see the related [AWS documentation](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html). --- ##### Configure AWS PrivateLink for Redshift [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . 
AWS provides two different ways to create a PrivateLink VPC endpoint for a Redshift cluster that is running in another VPC: * [Redshift-managed PrivateLink Endpoints](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-cluster-cross-vpc.html) * [Redshift Interface-type PrivateLink Endpoints](https://docs.aws.amazon.com/redshift/latest/mgmt/security-private-link.html) dbt supports both types of endpoints, but there are several [considerations](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-cluster-cross-vpc.html#managing-cluster-cross-vpc-considerations) to take into account when deciding which endpoint type to use. Redshift-managed provides a simpler setup with no additional cost, which might make it the preferred option for many, but may not be an option in all environments. Based on these criteria, determine which type is right for your system. Follow the instructions from the section below that corresponds to your chosen endpoint type. Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server (like Redshift) must be hosted on the same cloud provider. For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can’t connect to services hosted on GCP. #### Configuring Redshift-managed PrivateLink[​](#configuring-redshift-managed-privatelink "Direct link to Configuring Redshift-managed PrivateLink") 1. Locate the **Granted accounts** section of the Redshift configuration * **Standard Redshift** * On the running Redshift cluster, select the **Properties** tab. [![Redshift Properties tab](/img/docs/dbt-cloud/redshiftprivatelink1.png?v=2 "Redshift Properties tab")](#)Redshift Properties tab * **Redshift Serverless** * On the Redshift Serverless **Workgroup configuration** page. 2. In the **Granted accounts** section, click **Grant access**. 
[![Redshift granted accounts](/img/docs/dbt-cloud/redshiftprivatelink2.png?v=2 "Redshift granted accounts")](#)Redshift granted accounts 3. Enter the AWS account ID: `346425330055` - *NOTE: This account ID only applies to dbt Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs, please contact [Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support).* 4. Choose **Grant access to all VPCs** —or— (optional) contact [Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support) for the appropriate regional VPC ID to designate in the **Grant access to specific VPCs** field. [![Redshift grant access](/img/docs/dbt-cloud/redshiftprivatelink3.png?v=2 "Redshift grant access")](#)Redshift grant access 5. Add the required information to the following template, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support): * **Standard Redshift** ```text Subject: New Multi-Tenant PrivateLink Request - Type: Redshift-managed - Redshift cluster name: - Redshift cluster AWS account ID: - Redshift cluster AWS Region (for example, us-east-1, eu-west-2): - multi-tenant environment (US, EMEA, AU): ``` * **Redshift Serverless** ```text Subject: New Multi-Tenant PrivateLink Request - Type: Redshift-managed - Serverless - Redshift workgroup name: - Redshift workgroup AWS account ID: - Redshift workgroup AWS Region (for example, us-east-1, eu-west-2): - multi-tenant environment (US, EMEA, AU): ``` dbt Labs will work on your behalf to complete the private connection setup. Please allow 3-5 business days for this process to complete. Support will contact you when the endpoint is available. #### Configuring Redshift interface-type PrivateLink[​](#configuring-redshift-interface-type-privatelink "Direct link to Configuring Redshift interface-type PrivateLink") ##### 1. Provision AWS resources[​](#1-provision-aws-resources "Direct link to 1. 
Provision AWS resources") Creating an Interface VPC PrivateLink connection requires creating multiple AWS resources in the account containing the Redshift cluster: * **Security Group** — If you are connecting to an existing Redshift cluster, this likely already exists; however, you may need to add or modify Security Group rules to accept traffic from the Network Load Balancer (NLB) created for this Endpoint Service. * **Target Group** — The Target Group will be attached to the NLB to tell it where to route requests. There are various target types available for NLB Target Groups, but you will use the IP address type. * Target Type: **IP** * **Standard Redshift** * Use IP addresses from the Redshift cluster’s **Network Interfaces** whenever possible. While IPs listed in the **Node IP addresses** section will work, they are also more likely to change. [![Target type: IP address](/img/docs/dbt-cloud/redshiftprivatelink4.png?v=2 "Target type: IP address")](#)Target type: IP address * There will likely be only one Network Interface (NI) to start, but if the cluster fails over to another availability zone (AZ), a new NI will also be created for that AZ. The NI IP from the original AZ will still work, but the new NI IP can also be added to the Target Group. If adding additional IPs, note that the NLB will also need to add the corresponding AZ. Once created, the NI(s) should stay the same (this is our observation from testing, but AWS does not officially document it). * **Redshift Serverless** * To find the IP addresses for a Redshift Serverless instance, locate and copy the endpoint (only the URL listed before the port) in the Workgroup configuration section of the AWS console for the instance. 
[![Redshift Serverless endpoint](/img/docs/dbt-cloud/redshiftserverless.png?v=2 "Redshift Serverless endpoint")](#)Redshift Serverless endpoint * From a command line, run the command `nslookup` using the endpoint found in the previous step and use the associated IP(s) for the Target Group. * Target Group protocol: **TCP** * **Network Load Balancer (NLB)** — Requires creating a Listener that attaches to the newly created Target Group (port `5439` is the default) * **Scheme:** Internal * **IP address type:** IPv4 * **Network mapping:** Choose the VPC that the VPC Endpoint Service and NLB are being deployed in, and choose subnets from at least two Availability Zones. * **Security Groups:** The Network Load Balancer (NLB) associated with the VPC endpoint service must either not have an associated security group, or the security group must have a rule that allows requests from the appropriate dbt **private CIDR(s)**. Note that *this is different* from the static public IPs listed on the dbt [Access, Regions, & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses.md) page. dbt Support can provide the correct private CIDR(s) upon request. If necessary, until you can refine the rule to the smaller CIDR provided by dbt, allow connectivity by temporarily adding an allow rule of `10.0.0.0/8`. * **Listeners:** Create one listener per target group that maps the appropriate incoming port to the corresponding target group ([details](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-listeners.html)). * **VPC Endpoint Service** — Attach to the newly created NLB. * Acceptance required (optional) — Requires you to [accept our connection request](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#accept-reject-connection-requests) after dbt creates the endpoint. 
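Once the NLB and its Listener are in place (port `5439` by default for Redshift), a short TCP probe run from a host inside the producer VPC can confirm the load balancer actually forwards to a reachable target before you submit the request to dbt Support. A minimal sketch using only the Python standard library — the helper name and the NLB DNS name in the comment are hypothetical placeholders:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical internal NLB DNS name; run this from a host inside the VPC:
# port_open("my-redshift-nlb-0123456789.elb.us-east-1.amazonaws.com", 5439)
```

A `False` result at this stage usually points at the NLB Security Group, an unhealthy Target Group, or disabled cross-zone load balancing, which the troubleshooting section below walks through.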
Cross-Zone Load Balancing We highly recommend enabling cross-zone load balancing for your NLB or Target Group; some connections may require it. Cross-zone load balancing may also [improve routing distribution and connection resiliency](https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html#cross-zone-load-balancing). Note that cross-zone connectivity may incur additional data transfer charges, though this should be minimal for requests from dbt. * [Enabling cross-zone load balancing for a load balancer or target group](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/edit-target-group-attributes.html#target-group-cross-zone) ##### 2. Grant dbt AWS account access to the VPC endpoint service[​](#2-grant-dbt-aws-account-access-to-the-vpc-endpoint-service "Direct link to 2. Grant dbt AWS account access to the VPC endpoint service") On the provisioned VPC endpoint service, open the **Allow principals** tab and click **Allow principals** to grant access. Enter the ARN shown below for the appropriate dbt production AWS account and save your changes. * Principal: `arn:aws:iam::346425330055:role/MTPL_Admin` [![Enter ARN](/img/docs/dbt-cloud/privatelink-allow-principals.png?v=2 "Enter ARN")](#)Enter ARN ##### 3. Obtain VPC endpoint service name[​](#3-obtain-vpc-endpoint-service-name "Direct link to 3. Obtain VPC endpoint service name") Once the VPC Endpoint Service is provisioned, you can find the service name in the AWS console by navigating to **VPC** → **Endpoint Services** and selecting the appropriate endpoint service. Copy the **Service name** field value and include it in your request to dbt Support. [![Get service name field value](/img/docs/dbt-cloud/privatelink-endpoint-service-name.png?v=2 "Get service name field value")](#)Get service name field value ##### 4. Submit your request to dbt Support[​](#4-submit-your-request-to-dbt-support "Direct link to 4.
Submit your request to dbt Support") Add the required information to the template below and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support): ```text Subject: New Multi-Tenant PrivateLink Request - Type: Redshift Interface-type - VPC Endpoint Service Name: - Redshift cluster AWS Region (for example, us-east-1, eu-west-2): - dbt AWS multi-tenant environment (US, EMEA, AU): ``` dbt Labs will work on your behalf to complete the private connection setup. Please allow 3-5 business days for this process to complete. Support will contact you when the endpoint is available. #### Create connection in dbt[​](#create-connection-in-dbt "Direct link to Create connection in dbt") Once dbt Support completes the configuration, you can start creating new connections using PrivateLink. 1. Navigate to **Settings** → **Create new project** → select **Redshift**. 2. You will see two radio buttons: **Public** and **Private**. Select **Private**. 3. Select the private endpoint from the dropdown (this automatically populates the hostname/account field). 4. Configure the remaining data platform details. 5. Test your connection and save it. #### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting") If the PrivateLink endpoint has been provisioned and configured in dbt but connectivity is still failing, check the following in your networking setup to ensure requests and responses can be successfully routed between dbt and the backing service. ##### Configuration[​](#configuration "Direct link to Configuration") Start with the configuration:  1. NLB Security Group The Network Load Balancer (NLB) associated with the VPC Endpoint Service must either not have an associated Security Group or the Security Group must have a rule that allows requests from the appropriate dbt *private CIDR(s)*. Note that this differs from the static public IPs listed on the dbt Connection page. 
dbt Support can provide the correct private CIDR(s) upon request. * **Note:** To test whether this is the issue, temporarily adding an allow rule of `10.0.0.0/8` should allow connectivity until the rule can be refined to a smaller CIDR.  2. NLB Listener and Target Group Check that there is a Listener connected to the NLB that matches the port that dbt is trying to connect to. This Listener must have a configured action to forward to a Target Group with targets that point to your backing service. At least one (but preferably all) of these targets must be **Healthy**. Unhealthy targets could suggest that the backing service is, in fact, unhealthy or that the service is protected by a Security Group that doesn't allow requests from the NLB.  3. Cross-zone Load Balancing Check that *Cross-zone load balancing* is enabled for your NLB (check the **Attributes** tab of the NLB in the AWS console). If this is disabled, and the zones that dbt is connected to are misaligned with the zones where the service is running, requests may not be routed correctly. Enabling cross-zone load balancing also makes the connection more resilient in the case of a failover during a zone outage. Cross-zone connectivity may incur additional data transfer charges, though this should be minimal for requests from dbt.  4. Routing tables and ACLs If all the above check out, it's possible that requests are not routing correctly within the private network. This could be due to a misconfiguration in the VPC's routing tables or access control lists. Review these settings with your network administrator to ensure that requests can be routed from the VPC Endpoint Service to the backing service and that the response can be returned to the VPC Endpoint Service. One way to test this is to create a VPC endpoint in another VPC in your network to test that connectivity is working independent of dbt's connection.
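Checks 2 and 3 above can also be run from the AWS CLI rather than the console; a rough sketch, with placeholder ARNs:

```shell
# Check 2: confirm at least one target behind the NLB is Healthy
# (placeholder target group ARN).
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/redshift-privatelink-tg/abc123

# Check 3: confirm cross-zone load balancing is enabled on the NLB
# (placeholder load balancer ARN).
aws elbv2 describe-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/my-nlb/def456 \
  --query "Attributes[?Key=='load_balancing.cross_zone.enabled']"
```

Both commands require configured AWS credentials; the second should report a `Value` of `true` when cross-zone load balancing is on.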
##### Monitoring[​](#monitoring "Direct link to Monitoring") To help isolate connection issues over a PrivateLink connection from dbt, there are a few monitoring sources that can be used to verify request activity. Requests must first be sent to the endpoint to see anything in the monitoring. Contact dbt Support to understand when connection testing occurred or request new connection attempts. Use these times to correlate with activity in the following monitoring sources.  **VPC Endpoint Service Monitoring** In the AWS Console, navigate to **VPC** → **Endpoint Services**. Select the Endpoint Service being tested and click the **Monitoring** tab. Update the time selection to include when test connection attempts were sent. If there is activity in the *New connections* and *Bytes processed* graphs, then requests have been received by the Endpoint Service, suggesting that the dbt endpoint is routing properly.  **NLB Monitoring** In the AWS Console, navigate to **EC2** → **Load Balancers**. Select the Network Load Balancer (NLB) being tested and click the **Monitoring** tab. Update the time selection to include when test connection attempts were sent. If there is activity in the *New flow count* and *Processed bytes* graphs, then requests have been received by the NLB from the Endpoint Service, suggesting the NLB Listener, Target Group, and Security Group are correctly configured.  **VPC Flow Logs** VPC Flow Logs can provide various helpful information for requests being routed through your VPCs, though they can sometimes be challenging to locate and interpret. Flow logs can be written to either S3 or CloudWatch Logs, so determine the availability of these logs for your VPC and query them accordingly. Flow logs record the Elastic Network Interface (ENI) ID, source and destination IP and port, and whether the request was accepted or rejected by the security group and/or network ACL.
This can be useful in understanding if a request arrived at a certain network interface and whether that request was accepted, potentially illuminating overly restrictive rules. For more information on accessing and interpreting VPC Flow Logs, see the related [AWS documentation](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html). --- ### Configuring AWS PrivateLink for a self-hosted service [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . AWS PrivateLink enables secure, private connectivity between dbt and your self-hosted services. These services may include version control systems (VCS), data warehouses, or any other applications you manage. With PrivateLink, you do not need to expose your service to the public internet. All communication occurs over a private network, significantly enhancing security. For more details, refer to the [AWS PrivateLink documentation](https://docs.aws.amazon.com/vpc/latest/privatelink/). #### What this guide covers[​](#what-this-guide-covers "Direct link to What this guide covers") The focus of this guide is not on any particular service or backend architecture, but on the [Endpoint Service](#terminology) that interconnects dbt with your self-hosted service. This process should be standard across most use cases.
[![The scope of this guide](/img/docs/dbt-cloud/aws-self-hosted-privatelink/scope-of-guide.png?v=2 "The scope of this guide")](#)The scope of this guide Out of scope This guide does not cover the configuration or troubleshooting of your self-hosted service, load balancer, or target group health, due to the virtually limitless ways these environments can be configured. While dbt Support may assist with such issues on a best-effort basis, we recommend engaging [AWS Support](https://aws.amazon.com/support/) to expedite resolution. #### Audience[​](#audience "Direct link to Audience") This guide is intended for cloud network administrators or engineers responsible for configuring and maintaining secure network communications within your organization's AWS environment. #### Terminology[​](#terminology "Direct link to Terminology") This guide uses several important terms related to AWS PrivateLink. Understanding these definitions will help ensure successful implementation. For a more detailed explanation of these concepts, refer to the [AWS PrivateLink documentation](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html). * **Consumer:** In this context, the Consumer is dbt, which creates a VPC Endpoint to connect to your Endpoint Service. * **Service provider:** Your organization, which owns and operates the service behind the Network Load Balancer and creates the Endpoint Service. * **Endpoint Service:** The AWS resource that exposes your service to consumers, allowing them to create VPC Endpoints to access it. This is tied to a Network Load Balancer. * **Service Name:** A globally unique identifier for your Endpoint Service (format: `com.amazonaws.vpce.region.vpce-svc-xxx`). You share this with dbt Support to establish the connection. * **Network Load Balancer (NLB):** The required load balancer type (internal) that sits in front of your service. Your application must run behind an NLB to use PrivateLink. 
* **Target Group:** Routes traffic from the NLB to your service instances (EC2, IP addresses, or ALB). #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") Before you begin, make sure to review the following requirements: 1. **Supported Load Balancer Types** dbt has officially validated PrivateLink functionality with the following load balancer type: * Network Load Balancer (Internal) > While other configurations may be compatible with AWS PrivateLink, this guide assumes your service is configured behind an Internal Network Load Balancer. For more details, see the [AWS Network Load Balancer documentation](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html). 2. **Service Health** * Confirm that your service or application is operational and healthy behind the designated load balancer before proceeding. 3. **dbt AWS Account ARN** * Contact [dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support) to obtain the dbt AWS account ARN. You will need this in order to allow dbt to connect to your Endpoint Service. #### Additional NLB configuration[​](#additional-nlb-configuration "Direct link to Additional NLB configuration") The following settings are optional but recommended when configuring your Network Load Balancer for PrivateLink connectivity with dbt. ##### Cross-zone load balancing[​](#cross-zone-load-balancing "Direct link to Cross-zone load balancing") Enable [cross-zone load balancing](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#cross-zone-load-balancing) on your NLB to avoid availability zone mismatches between your service and dbt's VPC endpoint. This ensures traffic is distributed evenly across all healthy targets, regardless of which availability zone the request originates from. 
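If you manage the NLB with the AWS CLI, enabling cross-zone load balancing is a single attribute change; a sketch with a placeholder load balancer ARN:

```shell
# Enable cross-zone load balancing on the NLB (placeholder ARN).
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/my-nlb/def456 \
  --attributes Key=load_balancing.cross_zone.enabled,Value=true
```

Note that this can also be set per target group; the console path described above achieves the same result.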
##### Security group configuration[​](#security-group-configuration "Direct link to Security group configuration") If your NLB has an associated [security group](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-security-groups.html), you need to ensure PrivateLink traffic from dbt is allowed. By default, when a security group is associated with an NLB, inbound rules are enforced on all traffic — including PrivateLink traffic. You have two options: | Option | Description | | --- | --- | | **Disable enforcement** (recommended) | Turn off security group enforcement for PrivateLink traffic. This is the simplest approach and doesn't require knowledge of dbt's internal CIDRs. In the AWS Console: NLB → Security → Edit → Clear **Enforce inbound rules on PrivateLink traffic**. | | **Add dbt CIDRs to inbound rules** | If your use case requires security group enforcement on PrivateLink traffic, [contact dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support) to obtain the internal CIDR ranges to add to your NLB's security group inbound rules. | For more details, see [Update the security groups for your Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-security-groups.html). #### Instructions[​](#instructions "Direct link to Instructions") 1. Log in to the [AWS Console](https://console.aws.amazon.com). 2. Navigate to the AWS Account and Region where your self-hosted service is located.
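The recommended option in the table above can also be applied from the AWS CLI; a sketch with placeholder IDs (note that `set-security-groups` replaces the NLB's full security group list, so pass every group the NLB should keep):

```shell
# Keep the security group attached, but stop enforcing its inbound rules on
# PrivateLink traffic (placeholder NLB ARN and security group ID).
aws elbv2 set-security-groups \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/my-nlb/def456 \
  --security-groups sg-0123456789abcdef0 \
  --enforce-security-group-inbound-rules-on-private-link-traffic off
```

For the second option, you would instead add inbound rules for the dbt-provided CIDRs with `aws ec2 authorize-security-group-ingress`.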
##### Create a VPC endpoint service[​](#create-a-vpc-endpoint-service "Direct link to Create a VPC endpoint service") 3. In the AWS Console, navigate to **VPC** → **Endpoint Services** → **Create Endpoint Service** 4. In the Create endpoint service page: a. **Load balancer type:** Select **Network** b. **Available load balancers:** Select the NLB in front of your service c. **Acceptance required:** Enable this option (recommended) to manually approve connection requests d. Click **Create** ##### Grant dbt access to the endpoint service[​](#grant-dbt-access-to-the-endpoint-service "Direct link to Grant dbt access to the endpoint service") 5. After the Endpoint Service is created, select it and go to the **Allow principals** tab 6. Click **Allow principals** and add the dbt AWS account ARN that you obtained from support: * Principal: `arn:aws:iam:::root` ##### Obtain the endpoint service name[​](#obtain-the-endpoint-service-name "Direct link to Obtain the endpoint service name") 7. On the Endpoint Service details page, copy the **Service name** value (format: `com.amazonaws.vpce.region.vpce-svc-xxx`) [![Copy the Endpoint Service name](/img/docs/dbt-cloud/aws-self-hosted-privatelink/obtain-endpoint-svc-name.png?v=2 "Copy the Endpoint Service name")](#)Copy the Endpoint Service name ##### Providing dbt Support with connection details[​](#providing-dbt-support-with-connection-details "Direct link to Providing dbt Support with connection details") 8. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support): ```text Subject: New AWS Self-hosted PrivateLink Request - Type: Self-hosted PrivateLink - Platform/Service: (for example, Postgres, Starburst, Spark, GitLab, etc.) 
- VPC Endpoint Service Name: - Custom DNS (if HTTPS/TLS) - DNS record: - Service Region: (for example, us-east-1, eu-west-2) - dbt AWS environment (US, EMEA, AU): ``` dbt Labs will work on your behalf to complete the private connection setup. Please allow 3-5 business days for this process to complete. Support will contact you when the endpoint is available. #### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting") If the PrivateLink endpoint has been provisioned and configured in dbt but connectivity is still failing, check the following in your networking setup to ensure requests and responses can be successfully routed between dbt and your service. ##### Configuration checklist[​](#configuration-checklist "Direct link to Configuration checklist") 1. **NLB security group** The Network Load Balancer (NLB) associated with the VPC Endpoint Service must either not have an associated security group, or the security group must have a rule that allows requests from dbt's private CIDR(s). See [Security group configuration](#security-group-configuration) for details. Testing tip To test if this is the issue, temporarily adding an allow rule of `10.0.0.0/8` should allow connectivity until the rule can be refined to the dbt-provided CIDR. 2. **NLB listener and target group** Check that there is a Listener connected to the NLB that matches the port that dbt is trying to connect to. This Listener must have a configured action to forward to a Target Group with targets that point to your service. At least one (but preferably all) of these targets must be **Healthy**. Unhealthy targets could suggest that the service is down or that the service is protected by a security group that doesn't allow requests from the NLB. 3. **Cross-zone load balancing** Check that cross-zone load balancing is enabled for your NLB (check the **Attributes** tab of the NLB in the AWS console). 
If this is disabled, and the zones that dbt is connected to are misaligned with the zones where the service is running, requests may not be able to be routed correctly. See [Cross-zone load balancing](#cross-zone-load-balancing) for details. 4. **Routing tables and ACLs** If all the above check out, it may be possible that requests are not routing correctly within the private network. This could be due to a misconfiguration in the VPC's routing tables or access control lists. Review these settings with your network administrator to ensure that requests can be routed from the VPC Endpoint Service to the service and that the response can be returned to the VPC Endpoint Service. Testing tip One way to test this is to create a VPC endpoint in another VPC in your network to verify that connectivity is working independent of dbt's connection. ##### Monitoring[​](#monitoring "Direct link to Monitoring") To help isolate connection issues over a PrivateLink connection from dbt, there are a few monitoring sources that can be used to verify request activity. Requests must first be sent to the endpoint to see anything in the monitoring. [Contact dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support) to understand when connection testing occurred or request new connection attempts. Use these times to correlate with activity in the following monitoring sources. ###### VPC Endpoint Service monitoring[​](#vpc-endpoint-service-monitoring "Direct link to VPC Endpoint Service monitoring") In the AWS Console, navigate to **VPC** → **Endpoint Services**. Select the Endpoint Service being tested and click the **Monitoring** tab. Update the time selection to include when test connection attempts were sent. If there is activity in the *New connections* and *Bytes processed* graphs, then requests have been received by the Endpoint Service, suggesting that the dbt endpoint is routing properly. 
###### NLB monitoring[​](#nlb-monitoring "Direct link to NLB monitoring") In the AWS Console, navigate to **EC2** → **Load Balancers**. Select the Network Load Balancer (NLB) being tested and click the **Monitoring** tab. Update the time selection to include when test connection attempts were sent. If there is activity in the *New flow count* and *Processed bytes* graphs, then requests have been received by the NLB from the Endpoint Service, suggesting the NLB Listener, Target Group, and security group are correctly configured. ###### VPC Flow Logs[​](#vpc-flow-logs "Direct link to VPC Flow Logs") VPC Flow Logs can provide various helpful information for requests being routed through your VPCs, though they can sometimes be challenging to locate and interpret. Flow logs can be written to either S3 or CloudWatch Logs, so determine the availability of these logs for your VPC and query them accordingly. Flow logs record the Elastic Network Interface (ENI) ID, source and destination IP and port, and whether the request was accepted or rejected by the security group and/or network ACL. This can be useful in understanding if a request arrived at a certain network interface and whether that request was accepted, potentially illuminating overly restrictive rules. For more information on accessing and interpreting VPC Flow Logs, see the [AWS documentation](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html).
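As a minimal illustration of the flow log fields described above, the default-format record layout can be pulled apart with plain shell. The record below is fabricated for the example:

```shell
# A fabricated sample record in the default VPC Flow Logs format. Fields:
# version account eni src dst srcport dstport proto packets bytes start end action status
record="2 123456789012 eni-0a1b2c3d4e5f6a7b8 10.0.1.5 10.0.2.9 49152 5439 6 10 840 1620000000 1620000060 ACCEPT OK"

# Split on whitespace into positional parameters, then pick out the fields
# most relevant to troubleshooting: interface, source, destination, and action.
set -- $record
echo "eni=$3 src=$4:$6 dst=$5:$7 action=${13}"
# prints: eni=eni-0a1b2c3d4e5f6a7b8 src=10.0.1.5:49152 dst=10.0.2.9:5439 action=ACCEPT
```

A `REJECT` in the action field for traffic arriving at the endpoint's ENI is the signature of an overly restrictive security group or network ACL.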
--- ### Configuring Azure Private Link for a self-hosted service [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . Azure Private Link enables secure, private connectivity between dbt and your self-hosted services. These services may include version control systems (VCS), data warehouses, or any other applications you manage. With Private Link, you do not need to expose your service to the public internet. All communication occurs over a private network, significantly enhancing security. For more details, refer to the Azure [Private Link documentation](https://learn.microsoft.com/en-us/azure/private-link/private-link-overview). #### What this guide covers[​](#what-this-guide-covers "Direct link to What this guide covers") The focus of this guide is not on any particular service or backend architecture, but on the [Private Link Service](#terminology) that interconnects dbt with your self-hosted service. This process should be standard across most use cases. [![The scope of this guide](/img/docs/dbt-cloud/az-self-hosted-privatelink/scope-of-guide.png?v=2 "The scope of this guide")](#)The scope of this guide Out of scope This guide does not cover the configuration or troubleshooting of your self-hosted service, load balancer, or backend pool health, due to the virtually limitless ways these environments can be configured. While dbt Support may assist with such issues on a best-effort basis, we recommend engaging [Azure Support](https://azure.microsoft.com/en-us/support/) to expedite resolution.
#### Audience[​](#audience "Direct link to Audience") This guide is intended for cloud network administrators or engineers responsible for configuring and maintaining secure network communications within your organization's Microsoft Azure environment. #### Terminology[​](#terminology "Direct link to Terminology") This guide uses several important terms related to Azure Private Link. Understanding these definitions will help ensure successful implementation. For a more detailed explanation of these concepts, refer to the [Azure Private Link Service documentation](https://learn.microsoft.com/en-us/azure/private-link/private-link-service-overview). * **Consumer:** In this context, the Consumer is dbt, which creates a private endpoint to connect to your Private Link Service. * **Service provider:** Your organization, which owns and operates the service behind the Standard Load Balancer and creates the Private Link Service. * **Private Link Service:** The Azure resource that exposes your service to consumers, allowing them to create private endpoints to access it. This is tied to a Standard Load Balancer frontend IP configuration. * **Alias:** A globally unique name generated by Azure for your Private Link Service. You share this alias with dbt Support to establish the connection to your service as a consumer. * **Standard Load Balancer:** The required load balancer type that sits in front of your service. Your application must run behind a Standard Load Balancer to use Private Link Service. * **NAT subnet:** A dedicated subnet in your VNet used for Source Network Address Translation (SNAT) IP addresses for the Private Link Service. Consumer traffic appears to originate from this pool of private IP addresses. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") Before you begin, make sure to review the following requirements: 1. 
**Supported Load Balancer Types** dbt has officially validated Private Link functionality with the following load balancer type: * Standard Load Balancer (Internal) > While other configurations may be compatible with Azure Private Link Services, this guide assumes your service is configured behind a Standard Internal Load Balancer. For more details, see the [Azure Load Balancer documentation](https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-overview). 2. **Service Health** * Confirm that your service or application is operational and healthy behind the designated load balancer before proceeding. 3. **dbt Azure Subscription ID** * Contact [dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support) to obtain the dbt Azure subscription ID. You will need this in order to allow dbt to connect to your Private Link Service. #### Instructions[​](#instructions "Direct link to Instructions") 1. Log in to the [Azure Portal](https://portal.azure.com). 2. Navigate to the Azure Subscription and Resource Group where your self-hosted service is located. ##### Create a NAT subnet for the Private Link service[​](#create-a-nat-subnet-for-the-private-link-service "Direct link to Create a NAT subnet for the Private Link service") 3. Under the **Resources** section, search for **Virtual network** and go into the VNet where your self-hosted service is running. 4. Expand the **Settings** in the left side panel, and go into **Subnets**. Click the **+ Subnet** button to create a new subnet. 5. In the subnet creation panel: a. **Subnet purpose:** Leave as **Default** b. **Name:** Provide a descriptive name, such as **private-link-nat-subnet** c. **IPv4 address range:** Choose the appropriate CIDR block from your VNet that you want to create a NAT subnet from. In this example, the CIDR is 10.30.0.0/16, as seen in the screenshot below. d. **Starting address:** Your desired starting address of the new subnet e. 
**Size:** The smallest available size is recommended (for example, /28). f. Check the **Enable private subnet (no default outbound access)** checkbox. g. **NAT gateway:** Leave as **None** h. Leave **Network security group** and **Route table** fields as **None** unless your environment requires specific values here. i. Leave all remaining fields as their default values. j. Click **Add** to create the subnet. [![Screenshot of step 3: Search for VNet of self-hosted service](/img/docs/dbt-cloud/az-self-hosted-privatelink/vnet-search.png?v=2 "Screenshot of step 3: Search for VNet of self-hosted service")](#)Screenshot of step 3: Search for VNet of self-hosted service [![Screenshot of steps 4-5: NAT Subnet creation for Private Link Service](/img/docs/dbt-cloud/az-self-hosted-privatelink/nat-subnet-creation.png?v=2 "Screenshot of steps 4-5: NAT Subnet creation for Private Link Service")](#)Screenshot of steps 4-5: NAT Subnet creation for Private Link Service ##### Create a Private Link service[​](#create-a-private-link-service "Direct link to Create a Private Link service") 6. After the subnet creation has completed, in the search field at the top-middle of the portal, search for **Private link services**, and click on its page. 7. Click the **+ Create** button. 8. In the Create private link service page: **Under Basics** a. Select your **Subscription** and **Resource group** b. **Name:** Give a descriptive name, such as **pls-to-my-vcs** c. **Region:** Select the region where your self-hosted service is located **Under Outbound settings** d. **Load balancer:** In the dropdown, choose the Standard Internal Load Balancer that is in front of your self-hosted service e. **Load balancer frontend IP address:** Choose the frontend IP configuration for your load balancer f. **Source NAT subnet:** Select the NAT subnet you created in step 5 above g. **Source NAT Virtual network:** This will auto-populate based on your subnet selection h. 
**Enable TCP proxy V2:** Leave this disabled **Under Access security** i. Select **Restricted by subscription** j. Click **Add subscriptions** and add dbt's Azure subscription ID that you acquired from support k. Set **Request Auto-approve** selection to **Yes** for dbt's subscription l. Click **Next: Review + create**, then **Create** [![Screenshot of step 8: Creation of Azure Private Link Service](/img/docs/dbt-cloud/az-self-hosted-privatelink/privatelink-service-creation.png?v=2 "Screenshot of step 8: Creation of Azure Private Link Service")](#)Screenshot of step 8: Creation of Azure Private Link Service 9. After the Private Link Service has been created, click on it to open its details page. 10. Copy the **Alias** value (this is the identifier you'll share with dbt Support). [![Screenshot of step 10: Copy the Private Link Service Alias](/img/docs/dbt-cloud/az-self-hosted-privatelink/alias-info.png?v=2 "Screenshot of step 10: Copy the Private Link Service Alias")](#)Screenshot of step 10: Copy the Private Link Service Alias ##### Providing dbt Support with connection details[​](#providing-dbt-support-with-connection-details "Direct link to Providing dbt Support with connection details") 11. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support): ```text Subject: New Azure Self-hosted Private Link Request - Type: Self-hosted Private Link - Platform/Service: (for example, Postgres, Starburst, Spark, GitLab, etc.) - Private Link Service Alias: - Custom DNS (if HTTPS/TLS) - DNS record: - Service Region: (for example, East US, West Europe) - dbt Azure environment (EMEA): ``` dbt Labs will work on your behalf to complete the private connection setup. Please allow 3-5 business days for this process to complete. Support will contact you when the endpoint is available. 
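For teams that script their infrastructure, the portal steps above correspond roughly to this Azure CLI sketch. All names are placeholders, and the zeroed subscription ID stands in for the dbt subscription ID you obtain from dbt Support:

```shell
# Create the Private Link service against the internal Standard Load Balancer
# frontend, restricted and auto-approved for dbt's subscription (placeholder).
az network private-link-service create \
  --resource-group my-rg \
  --name pls-to-my-vcs \
  --vnet-name my-vnet \
  --subnet private-link-nat-subnet \
  --lb-name my-internal-slb \
  --lb-frontend-ip-configs myFrontEnd \
  --visibility "00000000-0000-0000-0000-000000000000" \
  --auto-approval "00000000-0000-0000-0000-000000000000" \
  --location eastus

# Read back the Alias value to share with dbt Support.
az network private-link-service show \
  --resource-group my-rg \
  --name pls-to-my-vcs \
  --query alias --output tsv
```

These commands require an authenticated `az` session and an existing internal Standard Load Balancer; the `show` command prints the same Alias you would copy from the portal in step 10.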
#### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting") If the Private Link endpoint has been provisioned and configured in dbt but connectivity is still failing, check the following in your networking setup to ensure requests and responses can be successfully routed between dbt and your service. ##### Configuration checklist[​](#configuration-checklist "Direct link to Configuration checklist") 1. **Private Link Service status** In the Azure Portal, navigate to your Private Link Service and verify the **Provisioning state** is **Succeeded**. Check the **Private endpoint connections** tab to confirm dbt's connection shows as **Approved**. 2. **Load balancer backend health** Navigate to your Standard Load Balancer and check the **Backend pool** health. At least one backend instance must be **Healthy**. Unhealthy backends could indicate the service is down or that a Network Security Group (NSG) is blocking traffic from the load balancer. 3. **NAT subnet configuration** Verify the NAT subnet has sufficient IP addresses available. Azure Private Link Service uses these IPs for SNAT. If the subnet is exhausted, new connections may fail. 4. **Network Security Groups** If you have NSGs applied to the NAT subnet or backend subnet, ensure they allow traffic appropriately: * NAT subnet: Recommended to leave NSG as **None** (as noted in the setup instructions) * Backend subnet: Must allow traffic from the load balancer's frontend IP ##### Monitoring[​](#monitoring "Direct link to Monitoring") To help isolate connection issues, use Azure's monitoring tools: ###### Private Link Service metrics[​](#private-link-service-metrics "Direct link to Private Link Service metrics") In the Azure Portal, navigate to your Private Link Service and click **Metrics**. 
Monitor: * **Bytes In/Out** — Confirms traffic is flowing through the service * **NAT Port Usage** — High usage may indicate the NAT subnet needs more IPs ###### Load Balancer metrics[​](#load-balancer-metrics "Direct link to Load Balancer metrics") Navigate to your Standard Load Balancer and click **Metrics**. Monitor: * **Health Probe Status** — Shows backend health over time * **Byte Count** — Confirms traffic is reaching the load balancer * **SNAT Connection Count** — Tracks outbound connections For more information, see [Azure Private Link monitoring](https://learn.microsoft.com/en-us/azure/private-link/private-link-service-overview#monitoring) and [Load Balancer monitoring](https://learn.microsoft.com/en-us/azure/load-balancer/monitor-load-balancer). #### Was this page helpful? YesNo [Privacy policy](https://www.getdbt.com/cloud/privacy-policy)[Create a GitHub issue](https://github.com/dbt-labs/docs.getdbt.com/issues) This site is protected by reCAPTCHA and the Google [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms) apply. --- ### Configuring BigQuery Private Service Connect [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . The following steps walk you through the setup of a GCP BigQuery [Private Service Connect](https://cloud.google.com/vpc/docs/private-service-connect) (PSC) endpoint in a dbt multi-tenant environment. Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server (like BigQuery) must be hosted on the same cloud provider. 
For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can’t connect to services hosted on GCP. #### Enabling dbt for GCP Private Service Connect[​](#enabling-dbt-for-gcp-private-service-connect "Direct link to Enabling dbt for GCP Private Service Connect") To enable dbt to privately connect to your BigQuery project via PSC, the regional PSC endpoint needs to be enabled for your dbt account. Using the following template, submit a request to [dbt Support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support): ```text Subject: New Multi-Tenant GCP PSC Request - Type: BigQuery - BigQuery project region: - dbt GCP multi-tenant environment: ``` dbt Labs will work on your behalf to complete the private connection setup. Please allow 3-5 business days for this process to complete. Support will contact you when the endpoint is available. #### (Optional) Generate BigQuery credentials[​](#optional-generate-bigquery-credentials "Direct link to (Optional) Generate BigQuery credentials") You may already have credentials set up for your datasets. If not, you can follow the steps in our [BigQuery quickstart guide](https://docs.getdbt.com/guides/bigquery.md?step=4) to generate credentials. #### Create connection in dbt[​](#create-connection-in-dbt "Direct link to Create connection in dbt") Once dbt Support completes the configuration, you can start creating new connections using PSC: 1. Navigate to **Account settings** > **Connections**. 2. In the **Connections** page, select **BigQuery**. Click **Edit**. 3. You will see two radio buttons: **Default Endpoint** and **PrivateLink Endpoint**. Select **PrivateLink Endpoint**. 4. Select the private endpoint from the dropdown (this will automatically populate the API endpoint field). 5. Input any remaining data platform details, including the BigQuery credentials you might have created in previous steps. 6. Save the connection and test in either a project job or Studio session. 
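Once the PSC endpoint is in place, traffic to the warehouse should resolve to a private address rather than a public one. As a rough illustration of that distinction (the helper names below are hypothetical, not part of dbt, and the addresses are sample values), Python's `ipaddress` and `socket` modules can express the check:

```python
import ipaddress
import socket

def is_private_address(ip: str) -> bool:
    """True if an address sits in private (RFC 1918 / ULA) space."""
    return ipaddress.ip_address(ip).is_private

def resolves_privately(hostname: str) -> bool:
    """Resolve a hostname and check the result is a private address.
    (Requires network access; the hostname is a placeholder.)"""
    return is_private_address(socket.gethostbyname(hostname))

# Addresses behind a working private endpoint sit in private space:
print(is_private_address("10.3.0.5"))       # True (typical private range)
print(is_private_address("142.250.80.74"))  # False (public address)
```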
--- ##### Configuring Databricks and Azure Private Link Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . The following steps walk you through the setup of a Databricks Azure Private Link endpoint in the dbt multi-tenant environment. Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server (like Databricks) must be hosted on the same cloud provider. For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can’t connect to services hosted on GCP. #### Configure Azure Private Link[​](#configure-azure-private-link "Direct link to Configure Azure Private Link") 1. Navigate to your Azure Databricks workspace. The path format is: `/subscriptions//resourceGroups//providers/Microsoft.Databricks/workspaces/`. 2. From the workspace overview, click **JSON view**. 3. Copy the value in the `resource_id` field. 4. Add the required information to the following template and submit your Azure Private Link request to [dbt Support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support): ```text Subject: New Azure Multi-Tenant Private Link Request - Type: Databricks - Databricks instance name: - Azure Databricks Workspace URL (for example, adb-################.##.azuredatabricks.net) - Databricks Azure resource ID: - dbt Azure multi-tenant environment (EMEA): - Azure Databricks workspace region (like WestEurope, NorthEurope): ``` 5. 
Once our Support team confirms the resources are available in the Azure portal, navigate to the Azure Databricks Workspace and browse to **Networking** > **Private Endpoint Connections**. Then, highlight the `dbt` named option and select **Approve**. #### Create connection in dbt[​](#create-connection-in-dbt "Direct link to Create connection in dbt") Once you've completed the setup in the Databricks environment, you can configure a private endpoint in dbt: 1. Navigate to **Settings** → **Create new project** → select **Databricks**. 2. You will see two radio buttons: **Public** and **Private**. Select **Private**. 3. Select the private endpoint from the dropdown (this automatically populates the hostname/account field). 4. Configure the remaining data platform details. 5. Test your connection and save it. --- ### Configuring Databricks PrivateLink [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . The following steps walk you through the setup of a Databricks AWS PrivateLink endpoint in the dbt multi-tenant environment. Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server (like Databricks) must be hosted on the same cloud provider. 
For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can’t connect to services hosted on GCP. #### Configure AWS PrivateLink[​](#configure-aws-privatelink "Direct link to Configure AWS PrivateLink") 1. Locate your [Databricks instance name](https://docs.databricks.com/en/workspace/workspace-details.html#workspace-instance-names-urls-and-ids). * Example: `cust-success.cloud.databricks.com` 2. Add the required information to the following template and submit your AWS PrivateLink request to [dbt Support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support): ```text Subject: New AWS Multi-Tenant PrivateLink Request - Type: Databricks - Databricks instance name: - Databricks cluster AWS Region (for example, us-east-1, eu-west-2): - dbt AWS multi-tenant environment (US, EMEA, AU): ``` dbt Labs will work on your behalf to complete the private connection setup. Please allow 3-5 business days for this process to complete. Support will contact you when the endpoint is available. 3. Once dbt Support notifies you that setup is complete, [register the VPC endpoint in Databricks](https://docs.databricks.com/administration-guide/cloud-configurations/aws/privatelink.html#step-3-register-privatelink-objects-and-attach-them-to-a-workspace) and attach it to the workspace: * [Register your VPC endpoint](https://docs.databricks.com/en/security/network/classic/vpc-endpoints.html) — Register the VPC endpoint using the VPC endpoint ID provided by dbt Support. * [Create a Private Access Settings object](https://docs.databricks.com/en/security/network/classic/private-access-settings.html) — Create a Private Access Settings (PAS) object with your desired public access settings, and set the Private Access Level to **Endpoint**. Choose the registered endpoint created in the previous step. 
* [Create or update your workspace](https://docs.databricks.com/en/security/network/classic/privatelink.html#step-3d-create-or-update-the-workspace-front-end-back-end-or-both) — Create a workspace, or update an existing workspace. Under **Advanced configurations → Private Link** choose the private access settings object created in the previous step. warning If using an existing Databricks workspace, all workloads running in the workspace need to be stopped to enable Private Link. Workloads also can't be started for another 20 minutes after making changes. From the [Databricks documentation](https://docs.databricks.com/en/security/network/classic/privatelink.html#step-3d-create-or-update-the-workspace-front-end-back-end-or-both): "After creating (or updating) a workspace, wait until it’s available for using or creating clusters. The workspace status stays at status RUNNING and the VPC change happens immediately. However, you cannot use or create clusters for another 20 minutes. If you create or use clusters before this time interval elapses, clusters do not launch successfully, fail, or could cause other unexpected behavior." #### Create connection in dbt[​](#create-connection-in-dbt "Direct link to Create connection in dbt") Once you've completed the setup in the Databricks environment, you can configure a private endpoint in dbt: 1. Navigate to **Settings** → **Create new project** → select **Databricks**. 2. You will see two radio buttons: **Public** and **Private**. Select **Private**. 3. Select the private endpoint from the dropdown (this automatically populates the hostname/account field). 4. Configure the remaining data platform details. 5. Test your connection and save it. 
--- ### Configuring GCP Private Service Connect for a self-hosted service [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . GCP Private Service Connect (PSC) enables secure, private connectivity between dbt and your self-hosted services. These services may include version control systems (VCS), data warehouses, or any other applications you manage. With PSC, you do not need to expose your service to the public internet. All communication occurs over a private network, significantly enhancing security. For more details, refer to the GCP [Private Service Connect documentation](https://cloud.google.com/private-service-connect). #### What this guide covers[​](#what-this-guide-covers "Direct link to What this guide covers") The focus of this guide is not on any particular service or [Backend](#terminology) architecture, but on the [Service Attachment](#terminology) that interconnects dbt with your self-hosted service. This attachment process should be standard across most use cases. [![The scope of this guide](/img/docs/dbt-cloud/gcp-self-hosted-psc/scope-of-guide.png?v=2 "The scope of this guide")](#)The scope of this guide Out of scope This guide does not cover the configuration or troubleshooting of your self-hosted service, load balancer, or backend health, due to the virtually limitless ways these environments can be configured. 
While dbt Support may assist with such issues on a best-effort basis, we recommend engaging [Google Cloud Support](https://cloud.google.com/support) to expedite resolution. #### Audience[​](#audience "Direct link to Audience") This guide is intended for cloud network administrators or engineers responsible for configuring and maintaining secure network communications within your organization's Google Cloud Platform (GCP) environment. #### Terminology[​](#terminology "Direct link to Terminology") This guide uses several important terms related to Private Service Connect. Understanding these definitions helps ensure successful implementation. For a more detailed explanation of these concepts, refer to the [GCP Private Service Connect documentation](https://cloud.google.com/vpc/docs/private-service-connect#managed-services). * **Consumer:** In this context, the Consumer is dbt, which establishes the PSC connection as the client. * **Published Service:** The service you are exposing via PSC to the dbt platform, such as your version control system (VCS), data warehouse, or another application. * **Service Attachment:** Refers to the resource that is shared with consumer(s) of your Published Service, so that they can establish endpoints to it. * **Backend:** Can also be referred to as Network Endpoint Groups (NEGs). This is the particular architecture that your service is running on. For example, this may be VMs, GKE Instance Groups, or even on-prem IPs. #### Prerequisites[​](#prerequisites "Direct link to Prerequisites") Before you begin, make sure to review the following requirements: 1. 
**Supported Load Balancer Types** dbt has officially validated Private Service Connect (PSC) functionality with the following load balancer types: * Regional Internal Proxy Load Balancer * Cross-Regional Internal Proxy Load Balancer > While other load balancer types can be compatible with PSC Service Attachments, this guide assumes your service is configured behind one of the officially supported Proxy Load Balancers. For more details, see the [Proxy Load Balancers documentation](https://docs.cloud.google.com/load-balancing/docs/tcp/internal-proxy). 2. **Service Health** * Confirm that your service or application is operational and healthy behind the designated load balancer before proceeding. 3. **dbt GCP Project ID** * Contact [dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support) to obtain the dbt GCP project ID. You will need this in order to share your service attachment with the dbt platform. #### Instructions[​](#instructions "Direct link to Instructions") 1. Log in to the Google Cloud Platform [console](https://console.cloud.google.com) 2. Navigate to the GCP Organization and Project that your self-hosted service is in. ##### Create a dedicated service attachment subnet[​](#create-a-dedicated-service-attachment-subnet "Direct link to Create a dedicated service attachment subnet") 3. In the search field at the top-middle of the console, search for **VPC networks** and navigate to its product page. 4. On the product page, click the VPC network link where your self-hosted service is located. 5. Select the **Subnets** tab on the next page, and click the **Add subnet** button. 6. In the subnet creation panel: a. **Name:** Provide a descriptive name, such as **service-attachment-subnet** b. **Description:** This subnet is dedicated to service attachment(s) c. **Region:** Pick the region of your self-hosted service d. **Purpose:** Choose **Private Service Connect** e. 
Click **Add** to create the subnet [![Screenshot of step 6: Subnet creation for PSC Service Attachment](/img/docs/dbt-cloud/gcp-self-hosted-psc/service-attach-subnet-creation.png?v=2 "Screenshot of step 6: Subnet creation for PSC Service Attachment")](#)Screenshot of step 6: Subnet creation for PSC Service Attachment ##### Create a service attachment[​](#create-a-service-attachment "Direct link to Create a service attachment") 7. After the subnet creation for the service attachment has completed, in the search field at the top-middle of the console, search for **Private Service Connect**, and click on its product page. 8. On the product page, select the **Published services** tab, and click the **Publish service** button. 9. In the Publish service page: **Under Target details** a. Choose **Load Balancer** b. The load balancer types that dbt has validated are the **Regional Internal Proxy Load Balancer** and the **Cross-Regional Internal Proxy Load Balancer**. Other load balancer types may also work, but they are not officially supported. c. In the **Load balancer** dropdown, choose the load balancer that is in front of your self-hosted service. d. Choose the relevant **Forwarding rule** from the dropdown for your load balancer. **Under Service details** e. Give a descriptive **Service Name**, such as **service-to-my-vcs** f. In the **Subnets** dropdown, choose the subnet that you created in step 6 above. **Under Connection Preference** g. Leave the selection on **Accept connections from selected projects** h. Click the **Add accepted project** button and add dbt's GCP project ID that you acquired from support. Note: This project ID may differ for each configuration. * Set connection limit to 1 i. Click **Add service** [![Screenshot of step 9: Creation of PSC Service Attachment](/img/docs/dbt-cloud/gcp-self-hosted-psc/service-attach-creation.png?v=2 "Screenshot of step 9: Creation of PSC Service Attachment")](#)Screenshot of step 9: Creation of PSC Service Attachment 10. 
After the Published Service attachment has been created, click on it to open its details page. 11. Copy the **Service attachment** URI (*not* the Service attachment ID). [![Screenshot of step 11: Copy the Service attachment URI](/img/docs/dbt-cloud/gcp-self-hosted-psc/service-attach-details.png?v=2 "Screenshot of step 11: Copy the Service attachment URI")](#)Screenshot of step 11: Copy the Service attachment URI ##### Providing dbt Support with connection details[​](#providing-dbt-support-with-connection-details "Direct link to Providing dbt Support with connection details") 12. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support): ```text Subject: New GCP Self-hosted Private Service Connect Request - Type: Self-hosted PSC - Platform/Service: (for example, Postgres, Starburst, Spark, GitLab, etc.) - Service Attachment URI: - Custom DNS (if HTTPS/TLS) - DNS record: - Service Region: (for example, us-east1, us-central1) - dbt GCP environment (US): ``` dbt Labs will work on your behalf to complete the private connection setup. Please allow 3-5 business days for this process to complete. Support will contact you when the endpoint is available. #### Troubleshooting[​](#troubleshooting "Direct link to Troubleshooting") If the Private Service Connect endpoint has been provisioned and configured in dbt but connectivity is still failing, check the following in your networking setup to ensure requests and responses can be successfully routed between dbt and your service. ##### Configuration checklist[​](#configuration-checklist "Direct link to Configuration checklist") 1. **Service Attachment status** In the Google Cloud Console, navigate to **Network services** → **Private Service Connect** → **Published services**. 
Select your Service Attachment and verify: * Status is **Active** * dbt's project appears in the **Connected projects** list with status **Accepted** 2. **Load balancer backend health** Navigate to **Network services** → **Load balancing** and select your load balancer. Check the **Backend services** tab to confirm at least one backend is **Healthy**. Unhealthy backends could indicate the service is down or that firewall rules are blocking health check probes. 3. **NAT subnet configuration** Verify the Private Service Connect subnet has sufficient IP addresses available. GCP uses these IPs for SNAT when routing consumer traffic to your backends. 4. **Firewall rules** Ensure your VPC firewall rules allow: * Health check traffic from Google's health check ranges (`35.191.0.0/16` and `130.211.0.0/22`) * Traffic from the proxy-only subnet to your backends (for Proxy Load Balancers) For more details, see [Firewall rules for health checks](https://cloud.google.com/load-balancing/docs/health-check-concepts#ip-ranges). ##### Monitoring[​](#monitoring "Direct link to Monitoring") To help isolate connection issues, use Google Cloud's monitoring tools: ###### Service Attachment metrics[​](#service-attachment-metrics "Direct link to Service Attachment metrics") In the Google Cloud Console, navigate to **Monitoring** → **Metrics Explorer**. Search for Private Service Connect metrics: * `compute.googleapis.com/nat/nat_connections` — Tracks active NAT connections * `compute.googleapis.com/nat/sent_bytes_count` — Confirms traffic is flowing ###### Load Balancer logs[​](#load-balancer-logs "Direct link to Load Balancer logs") Enable logging on your load balancer's backend service to capture request details. Navigate to your backend service, click **Edit**, and enable **Logging** with a sample rate of 1.0 for troubleshooting. 
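The firewall requirement above is easy to encode as a quick audit check: Google's health check probes originate from `35.191.0.0/16` and `130.211.0.0/22`, so any allow rule (or a script reviewing observed traffic) needs to match those ranges. A small sketch using Python's `ipaddress` module, with illustrative sample IPs:

```python
import ipaddress

# Source ranges Google uses for load balancer health check probes.
HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("35.191.0.0/16"),
    ipaddress.ip_network("130.211.0.0/22"),
]

def is_health_check_probe(source_ip: str) -> bool:
    """Return True if an observed source IP belongs to Google's health check ranges."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in net for net in HEALTH_CHECK_RANGES)

print(is_health_check_probe("35.191.10.20"))   # True
print(is_health_check_probe("130.211.1.9"))    # True (inside 130.211.0.0/22)
print(is_health_check_probe("130.211.64.9"))   # False (outside the /22)
```

If backends show as unhealthy, confirming that traffic from these ranges is allowed is usually the first thing to rule out.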
For more information, see [Private Service Connect monitoring](https://cloud.google.com/vpc/docs/monitor-private-service-connect-connections) and [Load Balancer logging](https://cloud.google.com/load-balancing/docs/https/https-logging-monitoring). --- ##### Configuring Private Link for Azure Database for Postgres Flexible Server Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . The following steps walk you through the setup of a Private Link endpoint for Azure Database for Postgres Flexible Server in a dbt multi-tenant environment. Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server (like Azure Database) must be hosted on the same cloud provider. For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can’t connect to services hosted on GCP. #### Configure Azure Private Link[​](#configure-azure-private-link "Direct link to Configure Azure Private Link") From your Azure portal: 1. Navigate to your Azure Database for Postgres Flexible Server. 2. From the server overview, click **JSON view**. 3. Copy the value in the **Resource ID** field at the top of the pane.
The path format is: `/subscriptions//resourceGroups//providers/Microsoft.DBforPostgreSQL/flexibleServers/`. 4. Add the required information to the following template and submit your Azure Private Link request to [dbt Support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support): ```text Subject: New Azure Multi-Tenant Private Link Request - Type: Azure Database for Postgres Flexible Server - Postgres Flexible Server name: - Azure Database for Postgres Flexible Server resource ID: - dbt Azure multi-tenant environment (EMEA): - Azure Postgres server region (for example, WestEurope, NorthEurope): ``` 5. Once our Support team confirms the endpoint has been created, navigate to the Azure Database for Postgres Flexible Server in the Azure Portal and browse to **Settings** > **Networking**. In the **Private Endpoints** section, highlight the `dbt` named option and select **Approve**. Confirm with dbt Support that the connection has been approved so they can validate the connection and make it available for use in dbt. #### Create connection in dbt[​](#create-connection-in-dbt "Direct link to Create connection in dbt") Once you've completed the setup in the Azure environment, you can configure a private endpoint in dbt: 1. Navigate to **Settings** → **Create new project** → select **Postgres**. 2. You will see two radio buttons: **Default Endpoint** and **PrivateLink Endpoint**. Select **PrivateLink Endpoint**. 3. Select the private endpoint from the dropdown (this will automatically populate the hostname/account field). 4. Configure the remaining data platform details. 5. Test your connection and save it. 
--- ##### Configuring Private Link for Azure Synapse Available to certain Enterprise tiers The private connection feature is available on the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . The following steps walk you through the setup of a Private Link endpoint for Azure Synapse in a dbt multi-tenant environment. Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server (like Azure Synapse) must be hosted on the same cloud provider. For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can’t connect to services hosted on GCP. #### Configure Azure Private Link[​](#configure-azure-private-link "Direct link to Configure Azure Private Link") From your Azure portal: 1. Navigate to your Azure Synapse workspace. 2. From the workspace overview, click **JSON view**. 3. Copy the value in the **Resource ID** field at the top of the pane.
The path format is: `/subscriptions//resourceGroups//providers/Microsoft.Synapse/workspaces/`. 4. Add the required information to the following template and submit your Azure Private Link request to [dbt Support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support): ```text Subject: New Azure Multi-Tenant Private Link Request - Type: Azure Synapse - Server name: - Azure Synapse workspace resource ID: - dbt Azure multi-tenant environment (EMEA): - Azure Synapse workspace region (for example, WestEurope, NorthEurope): ``` 5. Once our Support team confirms the endpoint has been created, navigate to the Azure Synapse workspace in the Azure Portal and browse to **Security** > **Private endpoint connections**. In the **Private endpoint connections** table, highlight the `dbt` named option and select **Approve**. Confirm with dbt Support that the connection has been approved so they can validate the connection and make it available for use in dbt. #### Create connection in dbt[​](#create-connection-in-dbt "Direct link to Create connection in dbt") Once you've completed the step above, you can configure a private endpoint in dbt: 1. Navigate to **Settings** → **Create new project** → select **Synapse**. 2. You will see two radio buttons: **Default Endpoint** and **PrivateLink Endpoint**. Select **PrivateLink Endpoint**. 3. Select the private endpoint from the dropdown (this will automatically populate the hostname/account field). 4. Configure the remaining data platform details. 5. Test your connection and save it. 
--- ### Configuring public IP restrictions [Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing") Available to certain Enterprise tiers Organizations can configure IP restrictions using the following dbt Enterprise tiers: * Business Critical * Virtual Private To learn more about these tiers, contact us at . IP restrictions help control which IP addresses can connect to dbt. They allow dbt customers to meet security and compliance controls by only allowing approved IPs to connect to their dbt environment. This feature is supported in all regions across NA, Europe, and Asia-Pacific, but contact us if you have questions about availability. #### Configuring IP restrictions[​](#configuring-ip-restrictions "Direct link to Configuring IP restrictions") To configure IP restrictions, go to **Account Settings** → **IP Restrictions**. IP restrictions provide two methods for determining which IPs can access dbt: an allowlist and a blocklist. IPs in the allowlist can access dbt, and IPs in the blocklist are blocked from accessing dbt. You can use IP restrictions for a range of use cases, including: * Allow only corporate VPN traffic and deny all other traffic * Deny IPs flagged by the security team * Allow only VPN traffic but make an exception for contractors' IP addresses IP restrictions block all service tokens, user requests made through the API (using personal user tokens), and the UI if they come from blocked IP addresses. For any version control system integrations (GitHub, GitLab, ADO, and others) inbound into dbt, ensure you add their IP addresses to the allowed list. ##### Allowing IPs[​](#allowing-ips "Direct link to Allowing IPs") To add an IP to the allowlist, from the **IP Restrictions** page: 1. Click **Edit**. 2. Click **Add Rule**. 3. Add a name and description for the rule. * For example, Corporate VPN CIDR Range 4. Select **Allow**. 5. Add the ranges in CIDR notation. 
   * For example, 1.1.1.1/8
   * You can add multiple ranges in the same rule.
6. Click **Save**.

Add multiple IP ranges by clicking the **Add IP range** button to create a new text field.

Simply adding the IP ranges does not enforce IP restrictions. For more information, refer to the [Enabling restrictions](#enabling-restrictions) section.

If you only want to allow the IP ranges added to this list and deny all other requests, you don't need to add a blocklist. By default, if you only add an allowlist, dbt only allows IPs in the allowable range and denies all other IPs. However, you can add a blocklist if you want to deny specific IP addresses within your allowlist CIDR range.

##### Blocking IPs (deny)[​](#blocking-ips-deny "Direct link to Blocking IPs (deny)")

If you have IPs defined in the allowlist that need to be denied, you can add those IP ranges to the blocklist:

1. Click **Edit**.
2. Click **Add Rule**.
3. Add a name and description for the rule.
   * For example, "Corporate VPN Deny Range"
4. Select **Deny**.
5. Add the ranges or individual IP addresses in CIDR notation.
6. Click **Save**.

Duplicate IP addresses

If identical IP addresses are in both the allow and block configurations, the second entry fails to save.

You can put an IP range on one list and then a sub-range or IP address that is part of it on the other. Using USA (range) and NY (sub-range) as an example, the expected behavior is:

* USA is on the blocklist and NY is on the allowlist — Traffic from the USA is blocked, but IPs from NY are allowed.
* USA is on the allowlist and NY is on the blocklist — USA traffic is allowed, but IPs from NY are blocked.

#### Enabling restrictions[​](#enabling-restrictions "Direct link to Enabling restrictions")

Once you finish adding all your ranges, you can enable IP restrictions by selecting **Enable IP restrictions** and clicking **Save**.
If your IP address is in any of the blocklist ranges, you can't save or enable IP restrictions — this prevents accidental account lockouts. If you get locked out due to IP changes on your end, reach out to .

Once enabled, when someone attempts to access dbt from a restricted IP, they encounter one of the following messages depending on whether they use email and password or SSO login:

* For email logins: "Access denied! Please contact your admin for more details."
* For SSO logins: "Access denied! Please contact your admin for more details." on a dbt login page

---

##### Configuring Snowflake and Azure Private Link

Available to certain Enterprise tiers

The private connection feature is available on the following dbt Enterprise tiers:

* Business Critical
* Virtual Private

To learn more about these tiers, contact us at .

The following steps walk you through the setup of an Azure-hosted Snowflake Private Link endpoint in a dbt multi-tenant environment.

Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server (like Snowflake) must be hosted on the same cloud provider. For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can’t connect to services hosted on GCP.

Snowflake OAuth with Private Link

Users connecting to Snowflake using [Snowflake OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md) over an Azure Private Link connection from dbt also require access to a Private Link endpoint from their local workstation.
Where possible, use [Snowflake External OAuth](https://docs.getdbt.com/docs/cloud/manage-access/snowflake-external-oauth.md) instead to bypass this limitation.

From the Snowflake docs:

> Currently, for any given Snowflake account, SSO works with only one account URL at a time: either the public account URL or the URL associated with the private connectivity service

* [Snowflake SSO with Private Connectivity](https://docs.snowflake.com/en/user-guide/admin-security-fed-auth-overview#label-sso-private-connectivity)

#### Configure Azure Private Link[​](#configure-azure-private-link "Direct link to Configure Azure Private Link")

To configure Snowflake instances hosted on Azure for [Private Link](https://learn.microsoft.com/en-us/azure/private-link/private-link-overview):

1. In your Snowflake account, run the following SQL statements and copy the output:

   ```sql
   USE ROLE ACCOUNTADMIN;
   SELECT SYSTEM$GET_PRIVATELINK_CONFIG();
   ```

2. Add the required information to the following template and submit your request to [dbt Support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support):

   ```text
   Subject: New Multi-Tenant Azure PrivateLink Request
   - Type: Snowflake
   - The output from SYSTEM$GET_PRIVATELINK_CONFIG:
     - Include the privatelink-pls-id
   - Enable Internal Stage Private Link? Y/N (If Y, output must include `privatelink-internal-stage`)
   - dbt Azure multi-tenant environment (EMEA):
   ```

3. dbt Support will provide the `private endpoint resource_id` of our `private_endpoint` and the `CIDR` range for you to complete the [PrivateLink configuration](https://community.snowflake.com/s/article/HowtosetupPrivatelinktoSnowflakefromCloudServiceVendors) by contacting the Snowflake Support team.

4. (Optional) If enabling an [Azure private endpoint for an Internal Stage](https://docs.snowflake.com/en/user-guide/private-internal-stages-azure), dbt Support will also provide the `resource_id` for the Internal Stage endpoint.
   As the Snowflake administrator, call the `SYSTEM$AUTHORIZE_STAGE_PRIVATELINK_ACCESS` function using the resource ID value as the function argument. This authorizes access to the Snowflake internal stage through the private endpoint.

   ```sql
   USE ROLE ACCOUNTADMIN;

   -- Azure Private Link
   SELECT SYSTEM$AUTHORIZE_STAGE_PRIVATELINK_ACCESS (
     'AZURE_PRIVATE_ENDPOINT_RESOURCE_ID'
   );
   ```

#### Configuring network policies[​](#configuring-network-policies "Direct link to Configuring network policies")

If your organization uses [Snowflake Network Policies](https://docs.snowflake.com/en/user-guide/network-policies) to restrict access to your Snowflake account, you need to add a network rule for dbt.

##### Find the endpoint Azure Link ID[​](#find-the-endpoint-azure-link-id "Direct link to Find the endpoint Azure Link ID")

Snowflake allows you to find the Azure Link ID of configured endpoints by running the `SYSTEM$GET_PRIVATELINK_AUTHORIZED_ENDPOINTS` command. Use the following to isolate the Link ID value and the associated endpoint resource name:

```sql
select
    value:linkIdentifier,
    regexp_substr(value:endpointId, '([^\/]+$)')
from table(
    flatten(
        input => parse_json(system$get_privatelink_authorized_endpoints())
    )
);
```

##### Using the UI[​](#using-the-ui "Direct link to Using the UI")

Open the Snowflake UI and take the following steps:

1. Go to the **Security** tab.
2. Click on **Network Rules**.
3. Click on **+ Network Rule**.
4. Give the rule a name.
5. Select a database and schema where the rule will be stored. These selections are for permission settings and organizational purposes; they do not affect the rule itself.
6. Set the type to `Azure Link ID` and the mode to `Ingress`.
7. In the identifier box, type the Azure Link ID obtained in the previous section and press **Enter**.
8. Click **Create Network Rule**.

   [![Create Network Rule](/img/docs/dbt-cloud/snowflakeprivatelink2.png?v=2 "Create Network Rule")](#)

9.
In the **Network Policy** tab, edit the policy to which you want to add the rule. This could be your account-level policy or one specific to the users connecting from dbt.

10. Add the new rule to the allowed list and click **Update Network Policy**.

    [![Update Network Policy](/img/docs/dbt-cloud/snowflakeprivatelink3.png?v=2 "Update Network Policy")](#)

##### Using SQL[​](#using-sql "Direct link to Using SQL")

For quick and automated setup of network rules via SQL in Snowflake, the following commands allow you to create and configure access rules for dbt. These SQL examples demonstrate how to add a network rule and update your network policy accordingly.

1. Create a new network rule with the following SQL:

   ```sql
   CREATE NETWORK RULE allow_dbt_cloud_access
     MODE = INGRESS
     TYPE = AZURELINKID
     VALUE_LIST = (''); -- Replace '' with the actual ID obtained above
   ```

2. Add the rule to a network policy with the following SQL:

   ```sql
   ALTER NETWORK POLICY <network_policy_name>
     ADD ALLOWED_NETWORK_RULE_LIST = ('allow_dbt_cloud_access');
   ```

---

##### Configuring Snowflake Private Service Connect

[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Available to certain Enterprise tiers

The private connection feature is available on the following dbt Enterprise tiers:

* Business Critical
* Virtual Private

To learn more about these tiers, contact us at .

The following steps walk you through the setup of a GCP Snowflake Private Service Connect (PSC) endpoint in a dbt multi-tenant environment.
Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server (like Snowflake) must be hosted on the same cloud provider. For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can’t connect to services hosted on GCP.

**Warning:** GCP Internal Stage PSC connections are not currently supported.

#### Configure GCP Private Service Connect[​](#configure-gcp-private-service-connect "Direct link to Configure GCP Private Service Connect")

The dbt Labs GCP project has been pre-authorized for connections to Snowflake accounts. To configure Snowflake instances hosted on GCP for [Private Service Connect](https://cloud.google.com/vpc/docs/private-service-connect):

1. Run the Snowflake system function [SYSTEM$GET\_PRIVATELINK\_CONFIG](https://docs.snowflake.com/en/sql-reference/functions/system_get_privatelink_config.html) and copy the output.

2. Add the required information to the following template and submit your request to [dbt Support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support):

   ```text
   Subject: New Multi-Tenant GCP PSC Request
   - Type: Snowflake
   - SYSTEM$GET_PRIVATELINK_CONFIG output:
   - *Use privatelink-account-url or regionless-privatelink-account-url?:
   - dbt GCP multi-tenant environment:
   ```

*\*By default, dbt will be configured to use `privatelink-account-url` from the provided [SYSTEM$GET\_PRIVATELINK\_CONFIG](https://docs.snowflake.com/en/sql-reference/functions/system_get_privatelink_config.html) as the PrivateLink endpoint. Upon request, `regionless-privatelink-account-url` can be used instead.*

dbt Labs will work on your behalf to complete the private connection setup. Please allow 3-5 business days for this process to complete. Support will contact you when the endpoint is available.
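When deciding between the two URL options in the request template, it can help to pull just those two values out of the config function's JSON output. The following is a hedged sketch: the key names mirror those referenced in the template above, but verify them against the actual output of your account before relying on them.

```sql
-- Illustrative sketch: extract the candidate endpoint URLs from the config JSON.
-- Key names follow the request template above; confirm against your account's output.
with cfg as (
    select parse_json(system$get_privatelink_config()) as c
)
select
    c:"privatelink-account-url"::string            as privatelink_account_url,
    c:"regionless-privatelink-account-url"::string as regionless_privatelink_account_url
from cfg;
```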
#### Create connection in dbt[​](#create-connection-in-dbt "Direct link to Create connection in dbt")

Once dbt Support completes the configuration, you can start creating new connections using PrivateLink.

1. Navigate to **Settings** → **Create new project** → select **Snowflake**.
2. You will see two radio buttons: **Public** and **Private**. Select **Private**.
3. Select the private endpoint from the dropdown (this automatically populates the hostname/account field).
4. Configure the remaining data platform details.
5. Test your connection and save it.

#### Configuring network policies[​](#configuring-network-policies "Direct link to Configuring network policies")

If your organization uses [Snowflake Network Policies](https://docs.snowflake.com/en/user-guide/network-policies) to restrict access to your Snowflake account, you need to add a network rule for dbt. Request the **PSC connection ID** from [dbt Support](mailto:support@getdbt.com) to use in a network rule.

Snowflake supports [`GCPPSCID` as a network rule identifier type](https://docs.snowflake.com/en/sql-reference/sql/create-network-rule), and this is the recommended approach. A PSC connection ID uniquely identifies your organization's connection endpoint, whereas IP-based rules rely on CIDR ranges that may be shared across multiple dbt customers.

##### Using the UI[​](#using-the-ui "Direct link to Using the UI")

Open the Snowflake UI and take the following steps:

1. Go to the **Security** tab.
2. Click on **Network Rules**.
3. Click on **Add Rule**.
4. Give the rule a name.
5. Select a database and schema where the rule will be stored. These selections are for permission settings and organizational purposes; they do not affect the rule itself.
6. Set the type to `GCPPSCID` and the mode to `Ingress`.
7. Enter the PSC connection ID provided by dbt Support into the identifier box and press **Enter**.
8. Click **Create Network Rule**.

9.
In the **Network Policy** tab, edit the policy you want to add the rule to. This could be your account-level policy or a policy specific to the users connecting from dbt.

10. Add the new rule to the allowed list and click **Update Network Policy**.

##### Using SQL[​](#using-sql "Direct link to Using SQL")

For quick and automated setup of network rules via SQL in Snowflake, the following commands allow you to create and configure access rules for dbt. These SQL examples demonstrate how to add a network rule and update your network policy accordingly.

1. Create a new network rule with the following SQL:

   ```sql
   CREATE NETWORK RULE allow_dbt_cloud_access
     MODE = INGRESS
     TYPE = GCPPSCID
     VALUE_LIST = (''); -- Replace with the PSC connection ID from dbt Support
   ```

2. Add the rule to a network policy with the following SQL:

   ```sql
   ALTER NETWORK POLICY <network_policy_name>
     ADD ALLOWED_NETWORK_RULE_LIST = ('allow_dbt_cloud_access');
   ```

---

##### Configuring Snowflake PrivateLink

[Enterprise +](https://www.getdbt.com/pricing "Go to https://www.getdbt.com/pricing")

Available to certain Enterprise tiers

The private connection feature is available on the following dbt Enterprise tiers:

* Business Critical
* Virtual Private

To learn more about these tiers, contact us at .

The following steps walk you through the setup of an AWS-hosted Snowflake PrivateLink endpoint in a dbt multi-tenant environment.

Private connection endpoints can't connect across cloud providers (AWS, Azure, and GCP). For a private connection to work, both dbt and the server (like Snowflake) must be hosted on the same cloud provider.
For example, dbt hosted on AWS cannot connect to services hosted on Azure, and dbt hosted on Azure can’t connect to services hosted on GCP.

Snowflake OAuth with PrivateLink

Users connecting to Snowflake using [Snowflake OAuth](https://docs.getdbt.com/docs/cloud/manage-access/set-up-snowflake-oauth.md) over an AWS PrivateLink connection from dbt will also require access to a PrivateLink endpoint from their local workstation. Where possible, use [Snowflake External OAuth](https://docs.getdbt.com/docs/cloud/manage-access/snowflake-external-oauth.md) instead to bypass this limitation.

From the [Snowflake](https://docs.snowflake.com/en/user-guide/admin-security-fed-auth-overview#label-sso-private-connectivity) docs:

> Currently, for any given Snowflake account, SSO works with only one account URL at a time: either the public account URL or the URL associated with the private connectivity service

#### Configure AWS PrivateLink[​](#configure-aws-privatelink "Direct link to Configure AWS PrivateLink")

To configure Snowflake instances hosted on AWS for [PrivateLink](https://aws.amazon.com/privatelink):

1. Open a support case with Snowflake to allow access from the dbt AWS account.
   * Snowflake prefers that the account owner opens the support case directly rather than dbt Labs acting on their behalf. For more information, refer to [Snowflake's knowledge base article](https://community.snowflake.com/s/article/HowtosetupPrivatelinktoSnowflakefromCloudServiceVendors).
   * Provide them with your dbt account ID along with any other information requested in the article.
   * **AWS account ID**: `346425330055` — *Note: This account ID only applies to AWS dbt multi-tenant environments. For AWS Virtual Private/Single-Tenant account IDs, contact [dbt Support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support).*
   * You need `ACCOUNTADMIN` access to the Snowflake instance to submit a support request.
   [![Open snowflake case](/img/docs/dbt-cloud/snowflakeprivatelink1.png?v=2 "Open snowflake case")](#)

2. After Snowflake has granted the requested access, run the Snowflake system function [SYSTEM$GET\_PRIVATELINK\_CONFIG](https://docs.snowflake.com/en/sql-reference/functions/system_get_privatelink_config.html) and copy the output.

3. Add the required information to the following template and submit your request to [dbt Support](https://docs.getdbt.com/docs/dbt-support.md#dbt-cloud-support):

   ```text
   Subject: New Multi-Tenant (Azure or AWS) PrivateLink Request
   - Type: Snowflake
   - SYSTEM$GET_PRIVATELINK_CONFIG output:
   - *Use privatelink-account-url or regionless-privatelink-account-url?:
   - **Create Internal Stage PrivateLink endpoint? (Y/N):
   - dbt AWS multi-tenant environment (US, EMEA, AU):
   ```

*\*By default, dbt will be configured to use `privatelink-account-url` from the provided [SYSTEM$GET\_PRIVATELINK\_CONFIG](https://docs.snowflake.com/en/sql-reference/functions/system_get_privatelink_config.html) as the PrivateLink endpoint. Upon request, `regionless-privatelink-account-url` can be used instead.*

*\*\*Internal Stage PrivateLink must be [enabled on the Snowflake account](https://docs.snowflake.com/en/user-guide/private-internal-stages-aws#prerequisites) to use this feature.*

dbt Labs will work on your behalf to complete the private connection setup. Please allow 3-5 business days for this process to complete. Support will contact you when the endpoint is available.

#### Create connection in dbt[​](#create-connection-in-dbt "Direct link to Create connection in dbt")

Once dbt Support completes the configuration, you can start creating new connections using PrivateLink.

1. Navigate to **Settings** → **Create new project** → select **Snowflake**.
2. You will see two radio buttons: **Public** and **Private**. Select **Private**.
3. Select the private endpoint from the dropdown (this automatically populates the hostname/account field).
4.
Configure the remaining data platform details.
5. Test your connection and save it.

#### Configuring internal stage PrivateLink in dbt[​](#configuring-internal-stage-privatelink-in-dbt "Direct link to Configuring internal stage PrivateLink in dbt")

If an Internal Stage PrivateLink endpoint has been provisioned, your dbt environments must be configured to use this endpoint instead of the account default set in Snowflake.

1. Obtain the Internal Stage PrivateLink endpoint DNS from dbt Support. For example, `*.vpce-012345678abcdefgh-4321dcba.s3.us-west-2.vpce.amazonaws.com`.
2. In the appropriate dbt project, navigate to **Orchestration** → **Environments**.
3. In any environment that should use the dbt Internal Stage PrivateLink endpoint, set an **Extended Attribute** similar to the following:

   ```text
   s3_stage_vpce_dns_name: '*.vpce-012345678abcdefgh-4321dcba.s3.us-west-2.vpce.amazonaws.com'
   ```

4. Save the changes.

   [![Internal Stage DNS](/img/docs/dbt-cloud/snowflake-internal-stage-dns.png?v=2 "Internal Stage DNS")](#)

#### Configuring network policies[​](#configuring-network-policies "Direct link to Configuring network policies")

If your organization uses [Snowflake Network Policies](https://docs.snowflake.com/en/user-guide/network-policies) to restrict access to your Snowflake account, you need to add a network rule for dbt. You can request the VPCE IDs from [dbt Support](mailto:support@getdbt.com) and use them to create a network policy. If creating an endpoint for Internal Stage, the VPCE ID will be different from the VPCE ID of the main service endpoint.

Network Policy for Snowflake Internal Stage PrivateLink

For guidance on protecting both the Snowflake service and Internal Stage, consult the Snowflake [network policies](https://docs.snowflake.com/en/user-guide/network-policies#strategies-for-protecting-both-service-and-internal-stage) and [network rules](https://docs.snowflake.com/en/user-guide/network-rules#incoming-requests) docs.
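Because the Internal Stage endpoint has a different VPCE ID than the main service endpoint, it needs its own network rule alongside the main one. A hedged sketch follows; the rule name is illustrative, `<network_policy_name>` is a placeholder for your own policy, and both VPCE IDs come from dbt Support.

```sql
-- Illustrative: a second rule for the Internal Stage VPCE ID, which differs
-- from the main service endpoint's VPCE ID (both provided by dbt Support).
CREATE NETWORK RULE allow_dbt_cloud_internal_stage
  MODE = INGRESS
  TYPE = AWSVPCEID
  VALUE_LIST = (''); -- Replace '' with the Internal Stage VPCE ID

ALTER NETWORK POLICY <network_policy_name>
  ADD ALLOWED_NETWORK_RULE_LIST = ('allow_dbt_cloud_internal_stage');
```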
##### Using the UI[​](#using-the-ui "Direct link to Using the UI")

Open the Snowflake UI and take the following steps:

1. Go to the **Security** tab.
2. Click on **Network Rules**.
3. Click on **Add Rule**.
4. Give the rule a name.
5. Select a database and schema where the rule will be stored. These selections are for permission settings and organizational purposes; they do not affect the rule itself.
6. Set the type to `AWS VPCE ID` and the mode to `Ingress`.
7. Type the VPCE ID provided by dbt Support into the identifier box and press **Enter**.
8. Click **Create Network Rule**.

   [![Create Network Rule](/img/docs/dbt-cloud/snowflakeprivatelink2.png?v=2 "Create Network Rule")](#)

9. In the **Network Policy** tab, edit the policy you want to add the rule to. This could be your account-level policy or a policy specific to the users connecting from dbt.
10. Add the new rule to the allowed list and click **Update Network Policy**.

    [![Update Network Policy](/img/docs/dbt-cloud/snowflakeprivatelink3.png?v=2 "Update Network Policy")](#)

##### Using SQL[​](#using-sql "Direct link to Using SQL")

For quick and automated setup of network rules via SQL in Snowflake, the following commands allow you to create and configure access rules for dbt. These SQL examples demonstrate how to add a network rule and update your network policy accordingly.

1. Create a new network rule with the following SQL:

   ```sql
   CREATE NETWORK RULE allow_dbt_cloud_access
     MODE = INGRESS
     TYPE = AWSVPCEID
     VALUE_LIST = (''); -- Replace '' with the actual ID provided
   ```

2. Add the rule to a network policy with the following SQL:

   ```sql
   ALTER NETWORK POLICY <network_policy_name>
     ADD ALLOWED_NETWORK_RULE_LIST = ('allow_dbt_cloud_access');
   ```
---

##### GCP private connectivity

Available to certain Enterprise tiers

The private connection feature is available on the following dbt Enterprise tiers:

* Business Critical
* Virtual Private

To learn more about these tiers, contact us at .

GCP Private Service Connect enables secure, private connectivity between dbt and your GCP-hosted services. With Private Service Connect, traffic between dbt and your data platforms or self-hosted services stays within the Google Cloud network and does not traverse the public internet. For more details, refer to the [GCP Private Service Connect documentation](https://cloud.google.com/vpc/docs/private-service-connect).

#### GCP private connectivity matrix[​](#gcp-private-connectivity-matrix "Direct link to GCP private connectivity matrix")

The following charts outline private connectivity options for GCP deployments of dbt ([multi-tenant](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md)).

**Legend:**

* ✅ = Available
* ❌ = Not currently available
* \* = Shared endpoint (all others are dedicated)

*Tenancy:* MT (multi-tenant) — [learn more about tenancy](https://docs.getdbt.com/docs/cloud/about-cloud/tenancy.md).

About the following matrix tables

These tables indicate whether private connectivity can be established to specific services, considering major factors such as the network and basic auth layers. dbt has validated these configurations using common deployment patterns and typical use cases. However, individual configurations may vary.
If you encounter issues or have questions about your environment, [contact dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support) for guidance.

**GCP regional considerations:** Some GCP services, such as BigQuery, may have regional restrictions for Private Service Connect endpoints. Refer to [Google's Private Service Connect documentation](https://cloud.google.com/vpc/docs/private-service-connect) for service-specific regional availability.

***

##### Connecting the dbt platform to managed services (Egress)[​](#connecting-the-dbt-platform-to-managed-services-egress "Direct link to Connecting the dbt platform to managed services (Egress)")

dbt can establish private connections to managed data platforms and cloud-native services.

| Service | MT | Setup guide |
| --------------------- | ---- | ------------------------------------------------------------------------------------------- |
| Snowflake | ✅ | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/gcp/gcp-snowflake.md) |
| Google BigQuery | ✅\* | [View](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/gcp/gcp-bigquery.md) |
| Teradata VantageCloud | ✅ | |

***

##### Connecting the dbt platform to self-hosted services (Egress)[​](#connecting-the-dbt-platform-to-self-hosted-services-egress "Direct link to Connecting the dbt platform to self-hosted services (Egress)")

All of the services below share a common Private Service Connect setup guide — backend configuration varies by service. Self-hosted connections use the customer-provisioned model — you are the service producer and dbt is the consumer.
**Setup guide:** [Configuring GCP Private Service Connect for self-hosted services](https://docs.getdbt.com/docs/cloud/secure/private-connectivity/gcp/gcp-self-hosted.md)

| Service | MT |
| ------------------------ | -- |
| GitHub Enterprise Server | ✅ |
| GitLab Self-Managed | ✅ |
| Bitbucket Data Center | ✅ |
| Azure DevOps Server | ✅ |
| Postgres | ✅ |
| Starburst / Trino | ✅ |
| Teradata (self-hosted) | ✅ |

If you have questions about whether your specific architecture is supported, [contact dbt Support](https://docs.getdbt.com/community/resources/getting-help.md#dbt-cloud-support).

#### Cross-region private connections[​](#cross-region-private-connections "Direct link to Cross-region private connections")

dbt Labs has globally connected private networks specifically used to host private endpoints, which are connected to dbt instance environments. This connectivity allows dbt environments to connect to any supported region from any dbt instance within the same cloud provider network. To ensure security, access to these endpoints is protected by security groups, network policies, and application connection safeguards, in addition to the authentication and authorization mechanisms provided by each of the connected platforms.

---

## SQL Reference

### SQL AND

The AND operator returns results that meet all requirements passed into it, compared to the [OR operator](https://docs.getdbt.com/sql-reference/or.md), which only needs one requirement to be true.
You’ll often see the AND operator used in a [WHERE clause](https://docs.getdbt.com/sql-reference/where.md) to filter query results or in a case statement to create multiple criteria for a result. Use this page to understand how to use the AND operator and why it might be helpful in analytics engineering work.

#### How to use the AND operator[​](#how-to-use-the-and-operator "Direct link to How to use the AND operator")

It’s straightforward to use the AND operator, and you’ll typically see it appear in a WHERE clause to filter query results appropriately, in case statements, or in joins that involve multiple fields.

```sql
-- AND in a WHERE clause
where <condition_1> and <condition_2> and …

-- AND in a case statement
case when <condition_1> and <condition_2> then …

-- AND in a join
from <table_1>
join <table_2>
    on <table_1>.<field_1> = <table_2>.<field_1>
    and <table_1>.<field_2> = <table_2>.<field_2>
```

Surrogate keys > joins with AND

Using surrogate keys, hashed values of multiple columns, is a great way to avoid using AND operators in joins. Typically, having AND or [OR operators](https://docs.getdbt.com/sql-reference/or.md) in a join can make the query or model inefficient, especially at considerable data volume, so creating surrogate keys earlier in your upstream tables ([using the surrogate key macro](https://docs.getdbt.com/blog/sql-surrogate-keys)) can potentially improve performance in downstream models.

##### SQL AND operator example[​](#sql-and-operator-example "Direct link to SQL AND operator example")

```sql
select
    order_id,
    status,
    round(amount) as amount
from {{ ref('orders') }}
where status = 'shipped' and amount > 20
limit 3
```

This query against the sample Jaffle Shop dataset’s `orders` table returns results where the order status is shipped and the order amount is greater than $20:

| **order\_id** | **status** | **amount** |
| ------------- | ---------- | ---------- |
| 74 | shipped | 30 |
| 88 | shipped | 29 |
| 78 | shipped | 26 |
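As mentioned above, AND also appears inside case statements to combine multiple criteria. A minimal sketch against the same `orders` table follows; the `order_segment` label is illustrative and not part of the Jaffle Shop schema.

```sql
select
    order_id,
    case
        -- Both conditions must hold for the first branch to apply
        when status = 'shipped' and amount > 20 then 'high_value_shipped'
        else 'other'
    end as order_segment
from {{ ref('orders') }}
```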
#### AND operator syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#and-operator-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to AND operator syntax in Snowflake, Databricks, BigQuery, and Redshift")

Snowflake, Databricks, Google BigQuery, and Amazon Redshift all support the AND operator with the same syntax across each platform.

---

### SQL ANY and ALL

The SQL ANY and ALL operators are useful for evaluating conditions to limit query results; they are often passed in with [LIKE](https://docs.getdbt.com/sql-reference/like.md) and [ILIKE](https://docs.getdbt.com/sql-reference/ilike.md) operators. The ANY operator will return true if any of the conditions passed into it evaluate to true, while ALL will only return true if *all* conditions passed into it are true.

Use this page to better understand how to use ANY and ALL operators, use cases for these operators, and which data warehouses support them.

#### How to use the SQL ANY and ALL operators[​](#how-to-use-the-sql-any-and-all-operators "Direct link to How to use the SQL ANY and ALL operators")

The ANY and ALL operators have very simple syntax and are often passed in the LIKE/ILIKE operator or a subquery:

`where <column> like/ilike any/all (array_of_options)`

`where <column> = any/all (subquery)`

Some notes on this operator’s syntax and functionality:

* You may pass a subquery into the ANY or ALL operator instead of an array of options
* Use the ILIKE operator with ANY or ALL to avoid case sensitivity

Let’s dive into a practical example using the ANY operator now.
##### SQL ANY example

```sql
select
    order_id,
    status
from {{ ref('orders') }}
where status like any ('return%', 'ship%')
```

This simple query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return orders whose status starts with `'return'` or `'ship'`:

| order\_id | status |
| --------- | --------------- |
| 18 | returned |
| 23 | return\_pending |
| 74 | shipped |

Because LIKE is case-sensitive, this query would not return orders whose status was, say, `RETURNED` or `SHIPPED`. If you have a mix of uppercase and lowercase strings in your data, consider standardizing string casing using the [UPPER](https://docs.getdbt.com/sql-reference/upper.md) and [LOWER](https://docs.getdbt.com/sql-reference/lower.md) functions, or use the more flexible ILIKE operator.

#### ANY and ALL syntax in Snowflake, Databricks, BigQuery, and Redshift

Snowflake and Databricks support using ANY in a LIKE operator. Amazon Redshift and Google BigQuery, however, do not support the use of ANY in a LIKE or ILIKE operator. Use the table below to read more on the documentation for the ANY operator in your data warehouse.
| **Data warehouse** | **ANY support?** | **ALL support?** |
| ------------------ | ---------------- | ---------------- |
| [Snowflake](https://docs.snowflake.com/en/sql-reference/functions/like_any.html) | ✅ | ✅ |
| [Databricks](https://docs.databricks.com/sql/language-manual/functions/like.html) | ✅ | ✅ |
| Amazon Redshift | ❌ Not supported; consider using multiple OR clauses or [IN operators](https://docs.getdbt.com/sql-reference/in.md) | ❌ Not supported; consider using multiple [AND clauses](https://docs.getdbt.com/sql-reference/and.md) |
| Google BigQuery | ❌ Not supported; consider using [multiple OR clauses](https://stackoverflow.com/questions/54645666/how-to-implement-like-any-in-bigquery-standard-sql) or IN operators | ❌ Not supported; consider using multiple AND clauses |

---

### SQL ARRAY\_AGG

In a typical programming language such as Python or JavaScript, arrays are innate and bountiful; when you’re processing data in SQL, arrays are a little less common but are a handy way to provide more structure to your data. To create an array of multiple data values in SQL, you’ll likely leverage the ARRAY\_AGG function (short for *array aggregation*), which puts your input column values into an array.
#### How to use SQL ARRAY\_AGG

The ARRAY\_AGG function has the following syntax:

`array_agg( [distinct] <field_name> ) [within group (<order_by_clause>)] [over ([partition by <field_name>])]`

A few notes on the functionality of this function:

* Most of the example syntax from above is optional, meaning the ARRAY\_AGG function can be as simple as `array_agg(<field_name>)` or used as a more complex window function
* [DISTINCT](https://docs.getdbt.com/sql-reference/distinct.md) is an optional argument that can be passed in, so only distinct values are in the returned array
* If the input column is empty, the returned array will also be empty
* Since ARRAY\_AGG is an aggregate function (gasp!), you’ll need a GROUP BY statement at the end of your query if you’re grouping by certain fields
* ARRAY\_AGG and similar aggregate functions can become inefficient or costly to compute on large datasets, so use ARRAY\_AGG wisely and truly understand your use cases for having arrays in your datasets

Let’s dive into a practical example using the ARRAY\_AGG function.

##### SQL ARRAY\_AGG example

```sql
select
    date_trunc('month', order_date) as order_month,
    array_agg(distinct status) as status_array
from {{ ref('orders') }}
group by 1
order by 1
```

This simple query using the sample dataset [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table returns a new column of distinct order statuses by order month:

| order\_month | status\_array |
| ------------ | ----------------------------------------------- |
| 2018-01-01 | \[ "returned", "completed", "return\_pending" ] |
| 2018-02-01 | \[ "completed", "return\_pending" ] |
| 2018-03-01 | \[ "completed", "shipped", "placed" ] |
| 2018-04-01 | \[ "placed" ] |

Looking at the query results—this makes sense!
We’d expect newer orders to likely not have any returns, and older orders to have completed returns.

#### SQL ARRAY\_AGG syntax in Snowflake, Databricks, BigQuery, and Redshift

[Snowflake](https://docs.snowflake.com/en/sql-reference/functions/array_agg.html), [Databricks](https://docs.databricks.com/sql/language-manual/functions/array_agg.html), and [BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg) all support the ARRAY\_AGG function. Redshift, however, supports an out-of-the-box [LISTAGG function](https://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html) that can perform similar functionality to ARRAY\_AGG. The primary difference is that LISTAGG allows you to explicitly choose a delimiter to separate a list, whereas arrays are naturally delimited by commas.

#### ARRAY\_AGG use cases

There are too many use cases to list out for the ARRAY\_AGG function in your dbt models, but it’s very likely that ARRAY\_AGG is used fairly far downstream in your DAG, since you likely don’t want your data bundled up earlier in your DAG, to preserve modularity and DRYness. A few downstream use cases for ARRAY\_AGG:

* In [`export_` models](https://www.getdbt.com/open-source-data-culture/reverse-etl-playbook) that are used to send data to platforms via a reverse ETL tool, to pare down multiple rows into a single row. Some downstream platforms, for example, require certain values that we’d usually keep as separate rows to be one singular row per customer or user. ARRAY\_AGG is handy for bringing multiple column values together by a singular id, such as creating an array of all items a user has ever purchased and sending that array downstream to an email platform to create a custom email campaign.
* Similar to export models, you may see ARRAY\_AGG used in [mart tables](https://docs.getdbt.com/best-practices/how-we-structure/4-marts.md) to create final aggregate arrays per a singular dimension; performance concerns of ARRAY\_AGG in these likely larger tables can potentially be bypassed with the use of [incremental models in dbt](https://docs.getdbt.com/docs/build/incremental-models.md).

---

### SQL AVG

You’re a data person, so we assume you’re going to be calculating averages of some metrics \**waves hands airily*\* at some point in your career. And the way to calculate averages of a numeric column in SQL is by using the AVG function.

#### How to use the AVG function

The AVG function is part of the group of mathematical or aggregate functions (ex. MIN, MAX, SUM) that are often used in SQL to summarize datasets. You’ll most likely see the AVG function used to straightforwardly calculate the average of a numeric column, but you may also see it used in a window function to calculate rolling averages.
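As a minimal sketch of that second pattern (assuming the same Jaffle Shop `orders` table used elsewhere on this page; the frame size and the `rolling_avg_amount` alias are illustrative, not part of the dataset):

```sql
select
    order_id,
    order_date,
    amount,
    -- average of this order's amount and the two orders before it, by date
    avg(amount) over (
        order by order_date
        rows between 2 preceding and current row
    ) as rolling_avg_amount
from {{ ref('orders') }}
```

For a true 7-day rolling average, you’d typically aggregate to one row per day first and then apply a `rows between 6 preceding and current row` frame over the daily totals.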
##### AVG function example

The following example is querying from a sample dataset created by dbt Labs called [jaffle\_shop](https://github.com/dbt-labs/jaffle_shop):

```sql
select
    date_trunc('month', order_date) as order_month,
    round(avg(amount)) as avg_order_amount
from {{ ref('orders') }}
where status not in ('returned', 'return_pending')
group by 1
```

This query using the Jaffle Shop’s `orders` table will return the rounded average order amount for each order month:

| order\_month | avg\_order\_amount |
| ------------ | ------------------ |
| 2018-01-01 | 18 |
| 2018-02-01 | 15 |
| 2018-03-01 | 18 |
| 2018-04-01 | 17 |

The AVG function, like many other mathematical functions, is an aggregate function. Aggregate functions operate across all rows, or a group of rows, to return a singular value. When calculating the average of a column across a dimension (or group of dimensions)—in our example above, `order_month`—you need a GROUP BY statement; the query above would not successfully run without it.

#### SQL AVG function syntax in Snowflake, Databricks, BigQuery, and Redshift

Snowflake, Databricks, Google BigQuery, and Amazon Redshift all support the ability to take the average of a column value, and the syntax for the AVG function is the same across all of those data platforms.

#### AVG function use cases

We most commonly see the AVG function used in data work to calculate:

* The average of key metrics (ex.
Average CSAT, average lead time, average order amount) in downstream [fact or dim models](https://docs.getdbt.com/best-practices/how-we-structure/4-marts.md)
* Rolling or moving averages (ex. 7-day, 30-day averages for key metrics) using window functions
* Averages in [dbt metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md)

This isn’t an extensive list of where your team may be using the AVG function throughout your dbt models and BI tool logic, but it contains some common scenarios analytics engineers face in their day-to-day.

---

### SQL BETWEEN

The SQL BETWEEN condition allows you to specify a range of numerical, date-type, or text values to filter rows on in a query. It’s particularly useful during ad hoc analysis work to narrow query results to a specific date range. In this page, we’ll dive into how to use the SQL BETWEEN condition and elaborate on why it might be useful to you.

#### How to use the SQL BETWEEN condition

The BETWEEN condition has a simple syntax and should be passed in a WHERE clause:

`where <field_name> between <beginning_value> and <end_value>`

It’s important to note that the BETWEEN condition is inclusive of `beginning_value` and `end_value`.

Let’s take a look at a practical example using the BETWEEN condition below.
##### SQL BETWEEN example

```sql
select
    customer_id,
    order_id,
    order_date
from {{ ref('orders') }}
where order_date between '2018-01-01' and '2018-01-31'
```

This simple query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return all rows where the `order_date` falls during January 2018:

| **customer\_id** | **order\_id** | **order\_date** |
| ---------------- | ------------- | --------------- |
| 1 | 1 | 2018-01-01 |
| 3 | 2 | 2018-01-02 |
| 94 | 3 | 2018-01-04 |
| 50 | 4 | 2018-01-05 |
| 64 | 5 | 2018-01-05 |
| 54 | 6 | 2018-01-07 |

Alternatively, you could build this same query using the `>=`/`<=` comparison operators (`where order_date >= '2018-01-01' and order_date <= '2018-01-31'`, or `where order_date >= '2018-01-01' and order_date < '2018-02-01'`). You may additionally see the NOT clause used in front of BETWEEN to exclude rows that fall between specified ranges.

#### BETWEEN syntax in Snowflake, Databricks, BigQuery, and Redshift

Most, if not all, modern data warehouses support the BETWEEN condition, and the syntax is the same across them. If your data warehouse does not support the BETWEEN condition, consider using the `>=`/`<=` operators as in the example outlined above. Use the table below to read more on the documentation for the BETWEEN operator in your data warehouse.

| **Data warehouse** | **BETWEEN support?** |
| ------------------ | -------------------- |
| Snowflake | ✅ |
| Databricks | ✅ |
| Amazon Redshift | ✅ |
| Google BigQuery | ✅ |

#### SQL BETWEEN condition use cases

You’ll most commonly see the BETWEEN condition used in data work to:

* Filter query results to a specified date range
* Create buckets for data using case statements, common for bucketing web session engagement or NPS score classification:

```sql
case
    when time_engaged between 0 and 9 then 'low_engagement'
    when time_engaged between 10 and 29 then 'medium_engagement'
    else 'high_engagement'
end as engagement
```

This isn’t an extensive list of where your team may be using the BETWEEN condition throughout your dbt models or ad hoc analyses, but it contains some common scenarios analytics engineers may encounter.

---

### SQL CASE WHEN

SQL case statements are the backbone of analytics engineers and dbt projects. They help add context to data, make fields more readable or usable, and allow you to create specified buckets with your data.

To informally formalize it, case statements are the SQL equivalent of an if-then statement in other programming languages. They allow you to cascade through multiple scenarios (or cases) in your data, evaluate whether they’re true, and output a corresponding value for each case. In this page, we’ll break down how to use SQL case statements and demonstrate why they’re valuable to modern data teams.
#### How to use SQL case statements

Case when statements are created in [SELECT statements](https://docs.getdbt.com/sql-reference/select.md) along with other fields you choose to select. The general syntax for SQL case when statements is as follows:

```sql
case
    when [scenario 1] then [result 1]
    when [scenario 2] then [result 2]
    -- …as many scenarios as you want
    when [scenario n] then [result n]
    else [fallback result] -- this else is optional
end as <new_field_name>
```

Some notes on case statement functionality:

* Scenarios in case statements are *evaluated in the order they’re listed*. What does this mean? It means that if multiple scenarios evaluate to true, the earliest listed true scenario is the one whose result is returned.
* The results in each scenario need to be of the same data type; if scenario 1 results in a string, all other scenarios need to be [strings](https://docs.getdbt.com/sql-reference/strings.md).
* Oftentimes data teams will omit a final `else` scenario, since the `else [fallback result]` is optional and defaults to `else null`.
* In general, case statement performance in select statements is relatively efficient (compared to other SQL functionality like aggregates or clunky joins involving ANDs and ORs); this isn’t to say it’s efficient (or smart) to be comparing a ton of scenarios, but it likely won’t be the bottleneck in your data models.
* Case when statement results can also be passed into aggregate functions, such as [MAX](https://docs.getdbt.com/sql-reference/max.md), [MIN](https://docs.getdbt.com/sql-reference/min.md), and [COUNT](https://docs.getdbt.com/sql-reference/count.md), or even date functions (ex. `date_trunc('month', <case statement result>)`)

Below, let’s take a look at a practical example using a case statement.
##### SQL CASE WHEN example

```sql
select
    order_id,
    round(amount) as amount,
    case
        when amount between 0 and 10 then 'low'
        when amount between 11 and 20 then 'medium'
        else 'high'
    end as order_value_bucket
from {{ ref('orders') }}
```

This simple query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return a new field that buckets order amount based on the criteria above:

| **order\_id** | **amount** | **order\_value\_bucket** |
| ------------- | ---------- | ------------------------ |
| 1 | 10 | low |
| 2 | 20 | medium |
| 3 | 1 | low |
| 4 | 25 | high |
| 5 | 17 | medium |

#### SQL CASE WHEN syntax in Snowflake, Databricks, BigQuery, and Redshift

Since it’s a fundamental of SQL, most, if not all, modern data warehouses support the ability to add case when statements to their queries. Snowflake, Databricks, Google BigQuery, and Amazon Redshift all support case statements and have the same syntax for them.

#### CASE WHEN use cases

The use cases for case statements in dbt models and ad hoc queries are almost endless; as a result, we won’t (be able to) create an exhaustive list of where you might see case statements in the wild. Instead, it’s important to know *why* you’d want to use them in your data work and when you wouldn’t want to use them.

Some example reasons you’d want to leverage case statements:

* Create booleans from your existing data (ex.
`case when cnt > 1 then true else false end as is_active`)
* Establish mappings between raw data and more general buckets of data (see the example earlier on this page); note that if you find yourself creating many case when scenarios for a mapping that doesn’t change over time, you’ll likely want to import that mapping as its own dbt model or data source (a good use case for [seeds](https://docs.getdbt.com/docs/build/seeds.md))
* If you find yourself creating the same case when statement throughout your models, consider abstracting that case when into its own model or into a DRY [macro](https://docs.getdbt.com/docs/build/jinja-macros.md)
* Generate more business-user-friendly column values that can be easily comprehended by business users

---

### SQL CAST

Let’s set the scene: You are knee-deep in a new data model and cannot figure out why the join between `user_id` in `table a` is not successfully joining with the `user_id` in `table b`. You dig a little deeper and discover that `user_id` in `table a` is an integer and `user_id` in `table b` is a string. *Cue throwing hands in the air.*

It *will* happen: You’ll find column types in your source data or upstream models that will likely need to be cast into different data types; perhaps to make joins easier, calculations more intuitive, or data more readable. Regardless of the reason, you’ll find yourself inevitably casting some data as an analytics engineer and using the SQL CAST function to help you out.
#### How to use the SQL CAST function

The syntax for using the CAST function looks like the following:

```sql
cast(<field_name> as <new_data_type>)
```

Executing this function in a SELECT statement will return the column you specified, cast as the newly specified data type. Analytics engineers will typically cast fields to more appropriate or useful numeric, string, and date types. You may additionally use the CAST function in WHERE clauses and in joins.

Below, we’ll walk through a practical example using the CAST function.

##### SQL CAST function example

You can cast the `order_id` and `customer_id` fields of the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` model from number types to strings using the following code:

```sql
select
    cast(order_id as string) as order_id,
    cast(customer_id as string) as customer_id,
    order_date,
    status
from {{ ref('orders') }}
```

After running this query, the `orders` table will look a little something like this:

| order\_id | customer\_id | order\_date | status |
| --------- | ------------ | ----------- | --------- |
| 1 | 1 | 2018-01-01 | returned |
| 2 | 3 | 2018-01-02 | completed |
| 3 | 94 | 2018-01-04 | completed |

Let’s be clear: the resulting data from this query looks exactly the same as the upstream `orders` model. However, the `order_id` and `customer_id` fields are now strings, meaning you could easily concat different string variables to them.

> Casting columns to their appropriate types typically happens in our dbt project’s [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md).
A few reasons for that: data cleanup and standardization, such as aliasing, casting, and lower or upper casing, should ideally happen in staging models to create downstream uniformity and improve downstream performance.

#### SQL CAST function syntax in Snowflake, Databricks, BigQuery, and Redshift

Google BigQuery, Amazon Redshift, Snowflake, Postgres, and Databricks all support the ability to cast columns and data to different types. In addition, the syntax for casting is the same across all of them using the CAST function. You may also see the CAST function replaced with a double colon (`::`) followed by the data type to convert to; `cast(order_id as string)` is the same thing as `order_id::string` in most data warehouses.

#### CAST function use cases

You know at some point you’re going to need to cast a column to a different data type. But what are the scenarios folks run into that call for these conversions? At their core, these conversions need to happen because the raw source data doesn’t match the analytics or business use case. This typically happens for a few reasons:

* Differences in needs or miscommunication from [backend developers](https://docs.getdbt.com/blog/when-backend-devs-spark-joy#signs-the-data-is-sparking-joy)
* ETL tools [defaulting to certain data types](https://airbytehq.github.io/integrations/sources/google-sheets/)
* BI tools requiring certain fields to be specific data types

A key thing to remember when you’re casting data is the user experience in your end BI tool: are business users expecting `customer_id` to be filtered on 1 or '1'? What is more intuitive for them? If one `id` field is an integer, all `id` fields should be integers.
Just like all data modeling, consistency and standardization are key when determining when and what to cast.

---

### SQL Comments

SQL comments…a two-fold thing: Are we talking about comments *inline* in SQL? Or comments on a table or view in the database? Why not both!? In this page, we’ll unpack how to create both inline and database object-level comments, general best practices around SQL comments, and how dbt can help you improve (and version-control) your comments.

#### How to create SQL comments

Inline SQL comments begin with two dashes (`--`) in a query or dbt model; any text following these dashes is what you’d call “commented out.” For longer, multi-line comments, you’ll typically see the syntax `/* your multi-line comment here */` used.

##### SQL comment example

```sql
/*
these lines form a multi-line SQL comment;
if it's uncommented, it will make this query error out
*/
select
    customer_id,
    -- order_id, this row is commented out
    order_date
from {{ ref('orders') }}
```

In practice, you’ll likely see SQL comments at the beginning of complex code logic, to help future developers or even advanced business users understand what specific blocks of code are accomplishing. Other times, you’ll see comments like the code above, commenting out lines no longer needed (or in existence) for that query or model. We’ll dive more into best practices around inline comments later on this page.
For comments *on* database objects, such as views and tables, there’s a different syntax to add these explicit comments:

```sql
comment on <database object type> <object name> is 'comment text here';
```

These database object-level comments are more useful for adding additional context or metadata to these objects, whereas inline comments are useful for explaining code functionality. Alternatively, these table- and view-level comments can be easily abstracted out and version-controlled using [model descriptions in dbt](https://docs.getdbt.com/reference/resource-properties/description.md) and persisted in the objects using the [persist\_docs config](https://docs.getdbt.com/reference/resource-configs/persist_docs.md) in dbt.

#### SQL comments in Snowflake, Databricks, BigQuery, and Redshift

Google BigQuery, Amazon Redshift, Snowflake, and Databricks all support the ability to add inline SQL comments. With the exception of BigQuery, these data warehouses also support native database object-level comments; BigQuery does, however, support native field-level descriptions.

#### SQL commenting best practices

In general, inline SQL comments should be used thoughtfully; another analytics engineer should be able to pair your comments with your code to clearly understand model functionality. We recommend leveraging inline comments in the following situations:

* Explain complex code logic that, if you had to scratch your head at it, someone else will have to scratch their head at too
* Explain niche, unique-to-your-business logic
* Separate out field types (ex.
Ids, booleans, strings, dates, numerics, and timestamps) in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md) to create more readable, organized, and formulaic models
* Clearly label tech debt (`-- [TODO]: TECH DEBT`) in queries or models

If you find your inline SQL comments are getting out of control, or becoming less scannable and readable, that’s a sign to lean more heavily on dbt Docs and markdown files in your dbt project. dbt supports [descriptions](https://docs.getdbt.com/reference/resource-properties/description.md), which allow you to add robust model (or macro, source, snapshot, and seed) and column descriptions that will populate in hosted dbt Docs. For models or columns that need more thorough or customizable documentation, leverage [doc blocks in markdown and YAML files](https://docs.getdbt.com/reference/resource-properties/description.md#use-a-docs-block-in-a-description) to create more detailed explanations and comments.

---

### SQL CONCAT

There is no better or simpler way to join multiple string values in a query than by using the CONCAT function. Full stop. It’s a straightforward function with pretty straightforward use cases. Use this page to understand how to use the CONCAT function in your data warehouse and why analytics engineers use it throughout their dbt models.

#### How to use the CONCAT function

Using the CONCAT function is pretty straightforward: you’ll pass in the strings or binary values you want to join together, in the correct order, into the CONCAT function.
You can pass as many expressions into the CONCAT function as you would like.

##### CONCAT function example

```sql
select
    user_id,
    first_name,
    last_name,
    concat(first_name, ' ', last_name) as full_name
from {{ ref('customers') }}
limit 3
```

This query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `customers` table will return results with a new column combining the `first_name` and `last_name` fields with a space between them:

| user\_id | first\_name | last\_name | full\_name |
| -------- | ----------- | ---------- | ----------- |
| 1 | Michael | P. | Michael P. |
| 2 | Shawn | M. | Shawn M. |
| 3 | Kathleen | P. | Kathleen P. |

#### CONCAT function syntax in Snowflake, Databricks, BigQuery, and Redshift

Snowflake, Databricks, Google BigQuery, and Amazon Redshift all support the CONCAT function, with the syntax looking the same in each platform. You may additionally see concatenation represented by the `||` operator (ex. `select first_name || last_name as full_name from {{ ref('customers') }}`), which has the same functionality as the CONCAT function in these data platforms.

#### CONCAT use cases

We most commonly see concatenation in SQL for strings to:

* Join together address/geo columns into one field
* Add hard-coded string values to columns to create clearer column values
* Create surrogate keys using a hashing method and multiple column values (ex.
`md5(column_1 || column_2) as unique_id`)

This isn’t an extensive list of where your team may be using CONCAT throughout your data work, but it contains some common scenarios analytics engineers face day-to-day.

---

### SQL COUNT

COUNT is a SQL function you need to know how to use. Whether it’s in an ad hoc query, a data model, or a BI tool calculation, you’ll be using the SQL COUNT function countless times (pun intended) in your data work. To formalize it, COUNT is an aggregate function that is used to return the count of rows of a specified field (`count(<field_name>)`) or all rows in a dataset (`count(*)`). It is commonly used to get baseline statistical information on a dataset, help ensure primary keys are unique, and calculate business metrics.

#### How to use SQL COUNT in a query

Use the following syntax to generate the aggregate count of a field:

`count(<field_name>)`

Since COUNT is an aggregate function, you’ll need a GROUP BY statement in your query if you’re looking at counts broken out by dimension(s). If you’re calculating the standalone counts of fields without the need to break them down by another field, you don’t need a GROUP BY statement. Let’s take a look at a practical example using COUNT, DISTINCT, and GROUP BY below.
##### COUNT example

The following example queries a sample dataset created by dbt Labs called [jaffle_shop](https://github.com/dbt-labs/jaffle_shop):

```sql
select
    date_part('month', order_date) as order_month,
    count(order_id) as count_all_orders,
    count(distinct customer_id) as count_distinct_customers
from {{ ref('orders') }}
group by 1
```

This simple query is something you may run while doing initial exploration of your data; it will return the count of `order_id`s and the count of distinct `customer_id`s per order month in the Jaffle Shop’s `orders` table:

| order_month | count_all_orders | count_distinct_customers |
| ----------- | ---------------- | ------------------------ |
| 1 | 29 | 24 |
| 2 | 27 | 25 |
| 3 | 35 | 31 |
| 4 | 8 | 8 |

An analyst or analytics engineer may want to perform a query like this to understand the ratio of orders to customers and see how it changes seasonally.

#### SQL COUNT syntax in Snowflake, Databricks, BigQuery, and Redshift

All modern data warehouses support the COUNT function (and follow the same syntax!). Some data warehouses, such as Snowflake and Google BigQuery, additionally support a COUNT_IF/COUNTIF function that allows you to pass in a boolean expression to determine whether to count a row or not.

#### COUNT use cases

We most commonly see queries using COUNT to:

* Perform initial data exploration on a dataset to understand dataset volume, primary key uniqueness, distribution of column values, and more.
* Calculate the counts of key business metrics (daily orders, customers created, etc.) in your data models or BI tool.
* Define [metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md) to aggregate key metrics.

This isn’t an extensive list of where your team may be using COUNT throughout your development work, dbt models, and BI tool logic, but it contains some common scenarios analytics engineers face day-to-day.

---

### SQL CROSS JOIN

A rarely seen, but important, join: the cross join.

The majority of your analytics engineering work will require you to join tables together to create robust, wide tables that will eventually be exposed to end business users. These models will usually be created using mostly [left](https://docs.getdbt.com/sql-reference/left-join.md) (and some [inner](https://docs.getdbt.com/sql-reference/inner-join.md)) joins. A cross join, on the other hand, takes two database objects and creates a table containing every combination of rows across the joined tables, called a cartesian product.

Use this page to understand how to use cross joins and where you might leverage them in your dbt project.

#### How to create a cross join

Unlike regular joins, cross joins don’t use keys to join database objects together:

```text
select <fields>
from <table_1> as t1
cross join <table_2> as t2
```

Cross joins are one of those SQL concepts that is easier to understand with a tangible example, so let’s jump into it.

##### SQL cross join example

Table A `date_spine`

| date |
| ---------- |
| 2022-01-01 |
| 2022-01-02 |
| 2022-01-03 |

Table B `users`

| user_id |
| ------- |
| 1 |
| 3 |
| 4 |

```sql
select
    users.user_id as user_id,
    date.date as date
from {{ ref('users') }} as users
cross join {{ ref('date_spine') }} as date
order by 1
```

This simple query will return a cartesian cross of all users and dates, essentially creating a unique combination of user per date per row:

| user_id | date |
| ------- | ---------- |
| 1 | 2022-01-01 |
| 1 | 2022-01-02 |
| 1 | 2022-01-03 |
| 3 | 2022-01-01 |
| 3 | 2022-01-02 |
| 3 | 2022-01-03 |
| 4 | 2022-01-01 |
| 4 | 2022-01-02 |
| 4 | 2022-01-03 |

**Generate surrogate keys from cross joins**

In the generated table above, the unique key is a combination of the `user_id` and `date` per row. To add a primary key to this table, you could generate a surrogate key using an MD5 hash via the `generate_surrogate_key` macro in dbt-utils (ex. `{{ dbt_utils.generate_surrogate_key(['user_id', 'date']) }}`) that could eventually be joined onto other tables.

#### SQL cross join use case

When would the generated table above be useful? Cross joining unique dates and users can be an effective way to create a base table to join various event counts, such as key website, email, or product events, to. These report-type tables are useful to expose to end business users in BI tools to look at aggregate counts per day per user and other useful measures.
---

### SQL Data Types

Below, we’ll unpack the different umbrellas of data types and the unique data types that fall under each category.

#### Numeric data types

There are many different numeric types in SQL, and that makes sense because…we’re data people and numbers are important, bit length is important, decimal places are even more important, and numbers are ultimately what allow stakeholders to make certain decisions. There’s slight differentiation in which numeric data types are supported across each data warehouse, but fundamentally, it’s most important to understand the differences between integers, decimals, and floats.

| **Type** | **Definition** | **Use cases** |
| -------- | -------------- | ------------- |
| Integer | Integers are numbers without fractions. Think 1, 2, 72384191203—nice, clean numbers. | Though many column values may look like integers (and in theory, they are), they’re often reflected or cast as decimal/numeric types to offer future precision and scale if required. |
| Decimal | Decimal, also known as the NUMERIC type, is a numeric data type that has a default precision of 38 and a scale of 0. | Typical numeric columns in datasets, such as lifetime value or user ids. Most likely the most common form of numeric data in your tables. |
| Float | Floats are used to provide approximate numeric values of fractions, with a precision of up to 64 bits. Floats offer a larger range of values compared to decimals. | Columns that are percentages; longitude/latitude. |

#### String data types

Strings are everywhere in data—they allow folks to have descriptive text field columns, use regex in their data work, and honestly, they just make the data world go ‘round. To formalize it, a string type is a word, or the combination of characters that you’ll typically see encased in single quotes (ex. 'Jaffle Shop', '1234 Shire Lane', 'Plan A').

Snowflake, Databricks, Google BigQuery, and Amazon Redshift all support the string data type. They may have slightly varying sub-types for strings; some data warehouses such as Snowflake and Redshift support `text`, `char`, and `character` string types, which typically differ in byte length from the generic string type. Again, since most string type columns are inherent in your data, you’ll likely be ok using generic varchar or strings for casting, but it never hurts to read up on the docs specific to your data warehouse’s string support!

#### Date data types

Dates, timestamps, timezones—all the fun (slightly painful) data things that make analytics engineers real data practitioners (people who occasionally want to yank their hair out). Below, we’ll unpack dates, datetimes, times, and timestamps to help you better understand the core date data types.

Working our way from simplest to most complex: dates, typically represented with the DATE type, are what you typically associate with a calendar date (ex. 2022-12-16), and are limited to the range of 0001-01-01 to 9999-12-31. DATETIME values contain both calendar date and time (ex.
2022-12-16 02:33:24) and may additionally include sub-seconds. TIME types are typically represented as the HH:MM:SS of a time and don’t contain a specified timezone. TIMESTAMP data types allow for the greatest specification and precision of a point in time and can be specified with or without a timezone. Most event-driven data fields (ex. order completed time, account created time, user churned time) will be represented as timestamps in your data sources. Some data warehouses, such as [Amazon Redshift](https://docs.amazonaws.cn/en_us/redshift/latest/dg/r_Datetime_types.html) and [Snowflake](https://docs.snowflake.com/en/sql-reference/data-types-datetime.html#date-time-data-types), support different timestamp options that allow for explicit specification of a timezone (or lack thereof).

In general, the two best practices when it comes to dates and times are:

1. Keep (or convert) timestamps to the same timezone.
2. Keep dates in as specific a date type as possible: you can always zoom out of a timestamp to get a date, but you can’t get a timestamp from a date.

You’ll ultimately leverage handy date functions to zoom in and out of dates, convert dates, or add times to dates.

#### Booleans

A boolean is a column value that is either true, false, or null. In your datasets, you’ll use booleans to create `is_` or `has_` fields to create clear segments in your data; for example, you may use booleans to indicate whether a customer has churned (`has_churned`), denote employee records (`is_employee`), or filter out records that have been removed from your source data (`is_deleted`). Typically, you’ll see `True` or `False` as the actual boolean values in a column, but you may also choose to use numeric values, such as 1 and 0, to represent true and false. The `True`/`False` values, however, tend to be a bit easier to read and interpret for end business users.
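As a quick sketch of that pattern, boolean flags are usually derived from existing columns (the `status` and `deleted_at` source columns here are hypothetical, and the exact comparison syntax may vary slightly by warehouse):

```sql
select
    customer_id,
    -- hypothetical source columns turned into boolean flags
    status = 'churned' as has_churned,
    deleted_at is not null as is_deleted
from {{ ref('customers') }}
```

Downstream models and BI tools can then filter with readable expressions like `where not is_deleted` instead of repeating the raw comparison everywhere.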
#### Semi-structured data types

Semi-structured data types are a great way to combine or aggregate data across multiple fields; you may also find yourself in the inverse situation, where you need to unpack semi-structured data, such as a JSON object, and unnest it into its individual key-value pairs. The two primary semi-structured data types you’ll see across data warehouses are JSON and arrays. Below, we’ll unpack the difference between the two and provide an example of each one.

| **Type** | **Definition** | **Example** | **Use case** |
| -------- | -------------- | ----------- | ------------ |
| JSON | When looking at data formatted in JSON, we say that the data is stored in JSON objects. These are composed of key-value pairs. JSON objects are enclosed in curly brackets ({ }) and each key-value pair is separated by a comma. | {"customer_id":2947, "order_id":4923, "order_items":"cheesecake"} | One of the great things about JSON data is that it doesn’t require schema definition—until you unnest it. Extract exactly what you need from your JSON object, and you can forget about the rest! JSON values will often come inherent in your data sources, so learn how to unnest them and your life will become easier. |
| Array | Similar to arrays in other programming languages, an array contains multiple elements that are accessible via their position in the array. | ["cheesecake", "cupcake", "brownie"] | Arrays are a clear way to aggregate multiple values together to create a singular value. Many use cases here, but be cautious: using aggregate functions, such as `array_agg`, can become inefficient on large datasets. |

---

### SQL DATE_PART

In this post, we’re going to give a deep dive into the DATE_PART function, how it works, and why we use it.

The DATE_PART function allows you to extract a specified date part from a date/time. For example, if you were to extract the month from the date February 14, 2022, it would return 2, since February is the second month in the year.

#### How to use the DATE_PART function

Like most other SQL functions, you need to pass in arguments; for the DATE_PART function, you’ll pass in the date/timestamp field that you want to extract a date part from and specify the part you want extracted. You can extract the numeric month, day, year, hour, seconds, etc. from a timestamp or date field using the DATE_PART function with the following syntax:

`date_part(<date_part>, <date/time field>)`

Let’s take a look at a practical example below.
##### DATE_PART function example

```sql
select
    date_part('month', order_date) as order_month,
    round(avg(amount)) as avg_order_amount
from {{ ref('orders') }}
group by 1
```

This query against the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return the rounded order amount per order month (represented as a numeric value):

| order_month | avg_order_amount |
| ----------- | ---------------- |
| 1 | 17 |
| 2 | 15 |
| 3 | 18 |
| 4 | 17 |

Unlike the DATE_TRUNC function, which truncates a date to its first instance of a given date part (so it maintains a date structure), the DATE_PART function returns a numeric value from a date field. You may commonly see the DATE_PART function replaced with an EXTRACT function, which performs the same functionality.

#### DATE_PART function syntax in Snowflake, Databricks, BigQuery, and Redshift

| Data warehouse | DATE_PART support? | Notes |
| -------------- | ------------------ | ----- |
| Snowflake | ✅ | |
| Databricks | ✅ | |
| Amazon Redshift | ✅ | |
| Google BigQuery | ❌ | BigQuery supports the EXTRACT function, which performs the same functionality as the DATE_PART function. |
| Postgres | ✅ | Postgres’ DATE_PART and EXTRACT functions previously evaluated to the same output. With Postgres 14, however, the EXTRACT function returns a numeric type instead of an 8-byte float. This is overly pedantic and you’ll likely never encounter a difference in values that truly matters, but it’s worth noting. |

#### DATE_PART function use cases

We most commonly see the DATE_PART or EXTRACT function used in data work to analyze:

* Fiscal calendars: If your business uses fiscal years, or calendars that differ from the normal 12-month cycle, DATE_PART functions can help create alignment between fiscal calendars and normal calendars
* Ad hoc analysis: The DATE_PART function is useful in ad hoc analyses and queries when you need to look at values grouped by date periods or for period comparisons

This isn’t an extensive list of where your team may be using the DATE_PART function throughout your dbt models and BI tool logic, but it contains some common scenarios analytics engineers face day-to-day.
---

### SQL DATE_TRUNC

In general, data people prefer the more granular over the less granular. [Timestamps > dates](https://docs.getdbt.com/blog/when-backend-devs-spark-joy#signs-the-data-is-sparking-joy), daily data > weekly data, etc.; having data at a more granular level always allows you to zoom in. However, you’re likely looking at your data at a somewhat zoomed-out level—weekly, monthly, or even yearly. To do that, you’re going to need a handy dandy function that helps you round out date or time fields.

The DATE_TRUNC function will truncate a date or time to the first instance of a given date part. Wordy, wordy, wordy! What does this really mean? If you were to truncate `2021-12-13` out to its month, it would return `2021-12-01` (the first day of the month).

Using the DATE_TRUNC function, you can truncate to the weeks, months, years, or other date parts of a date or time field. This can make date/time fields easier to read, as well as help perform cleaner time-based analyses. Overall, it’s a great function to use to help you aggregate your data into specific date parts while keeping a date format.

However, the DATE_TRUNC function isn’t your Swiss Army knife—it’s not able to do magic or solve all of your problems (we’re looking at you, [star](https://getdbt.com/sql-foundations/star-sql-love-letter/)). Instead, DATE_TRUNC is your standard kitchen knife—it’s simple and efficient, and you almost never start cooking (data modeling) without it.
#### How to use the DATE_TRUNC function

For the DATE_TRUNC function, there are two arguments you must pass in:

* The date part: This is the days/months/weeks/years (level) you want your field to be truncated out to
* The date/time you want to be truncated

The DATE_TRUNC function can be used in [SELECT](https://docs.getdbt.com/sql-reference/select.md) statements and [WHERE](https://docs.getdbt.com/sql-reference/where.md) clauses.

Most, if not all, modern cloud data warehouses support some type of the DATE_TRUNC function. There may be some minor differences in the argument order for DATE_TRUNC across data warehouses, but the functionality very much remains the same. Below, we’ll outline some of the slight differences in the implementation between some of the data warehouses.

#### The DATE_TRUNC function in Snowflake, Databricks, and Amazon Redshift

In [Snowflake](https://docs.snowflake.com/en/sql-reference/functions/date_trunc.html), [Databricks](https://docs.databricks.com/sql/language-manual/functions/date_trunc.html), and [Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_DATE_TRUNC.html), you can use the DATE_TRUNC function using the following syntax:

```sql
date_trunc(<date_part>, <date/time field>)
```

In these platforms, the `<date_part>` is passed in as the first argument in the DATE_TRUNC function.

#### The DATE_TRUNC function in Google BigQuery

In [Google BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#date_trunc), the `<date/time field>` is passed in as the first argument and the `<date_part>` is the second argument:

```sql
date_trunc(<date/time field>, <date_part>)
```

A note on BigQuery: BigQuery’s DATE_TRUNC function supports the truncation of date types, whereas Snowflake, Redshift, and Databricks’ `<date/time field>` can be a date or timestamp data type. BigQuery also supports DATETIME_TRUNC and TIMESTAMP_TRUNC functions to support truncation of more granular date/time types.

#### A dbt macro to remember

Why Snowflake, Amazon Redshift, Databricks, and Google BigQuery decided to use different implementations of essentially the same function is beyond us, and it’s not worth the headache trying to figure that out. Instead of remembering whether the `<date_part>` or the `<date/time field>` comes first (which, let’s be honest, we can literally never remember), you can rely on a dbt Core macro to help you get away from finicky syntax. [Adapters](https://docs.getdbt.com/docs/supported-data-platforms.md) support [cross-database macros](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros.md) to help you write certain functions, like DATE_TRUNC and DATEDIFF, without having to memorize sticky function syntax.
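To make the argument-order difference concrete, here’s the same monthly truncation written directly against each syntax (a sketch; the `orders` table and `order_date` column follow the Jaffle Shop example below):

```sql
-- datepart-first syntax (Snowflake-style)
select date_trunc('month', order_date) as order_month
from {{ ref('orders') }}

-- date-first syntax (BigQuery-style)
select date_trunc(order_date, month) as order_month
from {{ ref('orders') }}
```

Both return the first day of each order’s month; only the argument order changes.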
Using the [Jaffle Shop](https://github.com/dbt-labs/jaffle_shop/blob/main/models/orders.sql), a simple dataset and dbt project, you can truncate the `order_date` from the orders table using the [dbt DATE_TRUNC macro](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros.md#date_trunc):

```sql
select
    order_id,
    order_date,
    {{ date_trunc("week", "order_date") }} as order_week,
    {{ date_trunc("month", "order_date") }} as order_month,
    {{ date_trunc("year", "order_date") }} as order_year
from {{ ref('orders') }}
```

Running the above would produce the following sample results:

| **order_id** | **order_date** | **order_week** | **order_month** | **order_year** |
| ------------ | -------------- | -------------- | --------------- | -------------- |
| 1 | 2018-01-01 | 2018-01-01 | 2018-01-01 | 2018-01-01 |
| 70 | 2018-03-12 | 2018-03-12 | 2018-03-01 | 2018-01-01 |
| 91 | 2018-03-31 | 2018-03-26 | 2018-03-01 | 2018-01-01 |

The `order_week`, `order_month`, and `order_year` fields are the truncated values from the `order_date` field.

---

### SQL DATEADD

If you’ve ever used the DATEADD SQL function across dialects (such as BigQuery, Postgres, and Snowflake), you’ve probably had to google the syntax of the function every time. It’s almost impossible to remember the argument order (or exact function name) of dateadd. This article will go over how the DATEADD function works, the nuances of using it across the major cloud warehouses, and how to standardize the syntax variances using a dbt macro.
#### What is the DATEADD SQL function?

The DATEADD function in SQL adds a time/date interval to a date and then returns the date. This allows you to add or subtract a certain period of time from a given start date. Sounds simple enough, but this function lets you do some pretty useful things, like calculating an estimated shipment date based on the ordered date.

#### Differences in DATEADD syntax across data warehouse platforms

All of the major platforms accept the same rough parameters, in slightly different syntax and order:

* Start / from date
* Datepart (day, week, month, year)
* Interval (integer to increment by)

The *functions themselves* are named slightly differently, which is common across SQL dialects.

##### For example, the DATEADD function in Snowflake…

```text
dateadd( {{ datepart }}, {{ interval }}, {{ from_date }} )
```

*Hour, minute, and second are supported!*

##### The DATEADD function in Databricks

```sql
date_add( {{ startDate }}, {{ numDays }} )
```

##### The DATEADD function in BigQuery…

```sql
date_add( {{ from_date }}, INTERVAL {{ interval }} {{ datepart }} )
```

*Dateparts of less than a day (hour / minute / second) are not supported.*

##### The DATEADD function in Postgres…

Postgres doesn’t provide a dateadd function out of the box, so you’ve got to go it alone, but the syntax looks very similar to BigQuery’s function…

```sql
{{ from_date }} +
(interval '{{ interval }} {{ datepart }}')
```

Switching back and forth between those SQL syntaxes usually requires a quick scan through the warehouse’s docs to get back on the horse.

#### Standardizing your DATEADD SQL syntax with a dbt macro

But couldn’t we be doing something better with those keystrokes, like typing out and then deleting a tweet? dbt helps smooth out these wrinkles of writing [SQL across data warehouses](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros.md). Instead of looking up the syntax each time you use it, you can just write it the same way each time, and the macro compiles it to run on your chosen warehouse:

```text
{{ dateadd(datepart, interval, from_date_or_timestamp) }}
```

Adding 1 month to a specific date would look like…

```text
{{ dateadd(datepart="month", interval=1, from_date_or_timestamp="'2021-08-12'") }}
```

---

### SQL DATEDIFF

*“How long has it been since this customer last ordered with us?”*
*“What is the average number of days to conversion?”*

Business users will have these questions, data people will have to answer them, and the only way to solve them is by calculating the time between two different dates. Luckily, there’s a handy DATEDIFF function that can do that for you.

The DATEDIFF function will return the difference in specified units (ex. days, weeks, years) between a start date/time and an end date/time. It’s a simple and widely used function that you’ll find yourself using more often than you expect.

DATEDIFF is a little bit like your favorite pair of socks; you’ll usually find the first one easily and feel like the day is going to be great. But for some reason, the matching sock requires a little digging in the drawer. DATEDIFF is this pair of socks—you’ll inevitably find yourself Googling the syntax almost every time you use it, but you can’t go through your day without using it.

This page will go over how to use the DATEDIFF function across different data warehouses and how to write more standardized DATEDIFF functions using a dbt macro (or successfully find your socks as a pair in one go).

#### How to use the DATEDIFF function

For the DATEDIFF function, there are three elements, or arguments, passed in:

* The date part: This is the days/months/weeks/years (unit) of the difference calculated
* The first (start) date/time
* The second (end) date/time

The DATEDIFF function can be used in [SELECT](https://docs.getdbt.com/sql-reference/select.md) statements and WHERE clauses.

Most, if not all, modern cloud data warehouses support some type of the DATEDIFF function. There may be some minor differences in the argument order and function name for DATEDIFF across data warehouses, but the functionality very much remains the same. Below, we’ll outline some of the slight differences in the implementation between some data warehouses.
#### SQL DATEDIFF function syntax in Snowflake, Databricks, and Redshift[](#sql-datediff-function-syntax-in-snowflake-databricks-and-redshift "Direct link to SQL DATEDIFF function syntax in Snowflake, Databricks, and Redshift")

The syntax for using the DATEDIFF function in Snowflake, Databricks, and Amazon Redshift looks like the following:

```sql
datediff(<date_part>, <start_date/time>, <end_date/time>)
```

A note on Databricks: Databricks additionally supports a separate [DATEDIFF function](https://docs.databricks.com/sql/language-manual/functions/datediff.html) that takes only two arguments: a start date and an end date. The function will always return the difference between two dates in days.

##### DATEDIFF in Google BigQuery[](#datediff-in-google-bigquery "Direct link to DATEDIFF in Google BigQuery")

The implementation of the DATEDIFF function in [Google BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/datetime_functions#datetime_diff) differs in a few ways:

* Unlike in Snowflake, Amazon Redshift, and Databricks, where the `<date_part>` is passed as the first argument, the `<date_part>` is passed in as the last argument in Google BigQuery.
* Google BigQuery also calls the function DATETIME\_DIFF, with an additional underscore separating the function name. This is on par with [Google BigQuery’s preference to have underscores in function names](https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions).
* The DATETIME\_DIFF arguments are datetimes, not dates; Snowflake, Redshift, and Databricks’ DATEDIFF functions support multiple [date types](https://docs.getdbt.com/sql-reference/data-types.md#date-data-types) such as dates and timestamps. BigQuery also supports a separate [DATE\_DIFF function](https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#date_diff) that will return the difference between two date types, unlike DATETIME\_DIFF, which only supports the datetime type.
#### A hero in the shadows: The DATEDIFF dbt macro![](#a-hero-in-the-shadows-the-datediff-dbt-macro "Direct link to A hero in the shadows: The DATEDIFF dbt macro!")

You may be able to memorize the syntax for the DATEDIFF function for the primary data warehouse you use. What happens when you switch to a different one for a new job or a new data stack? Remembering whether there’s an underscore in the function name, or which argument the `<date_part>` is passed in as, is… no fun and leads to the inevitable, countless “datediff in bigquery” Google searches.

Luckily, [dbt Core](https://github.com/dbt-labs/dbt-core) has your back! dbt Core is the open source dbt product that helps data folks write their [data transformations](https://www.getdbt.com/analytics-engineering/transformation/) following software engineering best practices. [Adapters](https://docs.getdbt.com/docs/supported-data-platforms.md) support [cross-database macros](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros.md) to help you write certain functions, like DATE\_TRUNC and DATEDIFF, without having to memorize sticky function syntax.

Using the DATEDIFF macro, you can calculate the difference between two dates without having to worry about the finicky differences in syntax across warehouses—the same code runs successfully on each of them.

Using the [jaffle shop](https://github.com/dbt-labs/jaffle_shop/blob/main/models/orders.sql), a simple dataset and dbt project, we can calculate the difference between two dates using the dbt DATEDIFF macro:

```sql
select
    *,
    {{ datediff("order_date", "'2022-06-09'", "day") }}
from {{ ref('orders') }}
```

This would return all fields from the orders table and the difference in days between order dates and June 9, 2022.

Under the hood, this macro is taking your inputs and creating the appropriate SQL syntax for the DATEDIFF function *specific to your data warehouse*.
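The dispatch idea behind such a macro can be sketched in a few lines. The function below is a hypothetical illustration only—it is not dbt's actual macro, whose real implementation lives in each adapter—but it shows how one set of inputs can render warehouse-specific syntax:

```python
def render_datediff(first_date, second_date, datepart, adapter="snowflake"):
    # Hypothetical sketch of cross-database dispatch for DATEDIFF
    if adapter == "bigquery":
        # BigQuery names the function differently and takes the date part last
        return f"datetime_diff({second_date}, {first_date}, {datepart})"
    # Snowflake, Redshift, and Databricks take the date part first
    return f"datediff({datepart}, {first_date}, {second_date})"

render_datediff("order_date", "'2022-06-09'", "day")
# returns "datediff(day, order_date, '2022-06-09')"
```

The real macro also handles details this sketch skips, such as casting arguments to datetimes where BigQuery requires it.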
---

### SQL DISTINCT

Let’s just put it out there: at one point in your data work, you’ll encounter duplicates in your data. They may be introduced from a faulty data source or created during the joining and transforming of data. You may need a more sophisticated or refactored solution for the latter scenario, but it never hurts to know how to use DISTINCT in a query.

Using DISTINCT in a SELECT statement will force a query to only return non-duplicate rows. You may commonly see a DISTINCT clause in COUNT functions to get counts of distinct rows.

#### How to use SQL DISTINCT in a query[](#how-to-use-sql-distinct-in-a-query "Direct link to How to use SQL DISTINCT in a query")

To remove duplicate rows from a query, you add DISTINCT immediately after SELECT, followed by the columns you want selected:

```sql
select distinct
    row_1,
    row_2
from my_data_source
```

Let’s take a look at a practical example using DISTINCT below.

##### SQL DISTINCT example[](#sql-distinct-example "Direct link to SQL DISTINCT example")

```sql
select
    count(customer_id) as cnt_all_orders,
    count(distinct customer_id) as cnt_distinct_customers
from {{ ref('orders') }}
```

This simple query is something you may do while doing initial exploration of your data; it will return the count of `customer_ids` and the count of distinct `customer_ids` that appear in the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table:

| cnt\_all\_orders | cnt\_distinct\_customers |
| ---------------- | ------------------------ |
| 99 | 62 |
As you can see from the query results, there are 99 orders placed by customers, but only 62 distinct customers in the table.

#### DISTINCT syntax in Snowflake, Databricks, BigQuery, and Redshift[](#distinct-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to DISTINCT syntax in Snowflake, Databricks, BigQuery, and Redshift")

Since it’s a pillar of SQL, all modern data warehouses support the ability to use DISTINCT in a SELECT statement 😀

#### DISTINCT use cases[](#distinct-use-cases "Direct link to DISTINCT use cases")

You’ll most commonly see queries using a DISTINCT statement to:

* Remove unnecessary duplicate rows from a data model. A word of caution on this: if you need to use DISTINCT in a downstream, non-source model that contains joins, there’s a chance that faulty logic is producing duplicates in the data, so always double-check that they are true duplicates.
* Find the counts of distinct fields in a dataset, especially for primary or surrogate keys.

This isn’t an exhaustive list of where your team may be using DISTINCT throughout your development work, dbt models, and BI tool logic, but it contains some common scenarios analytics engineers face day-to-day.

---

### SQL FROM

What makes the analytics world go ‘round? Queries and bad graphs. (Since we’re here to keep it brief, we won’t go into the latter here 😉)

The first thing someone learns in SQL: how to build a query using [SELECT](https://docs.getdbt.com/sql-reference/select.md) and FROM statements.
The SQL FROM statement is the fundamental building block of any query: it allows you to identify the database schema object (table/view) you want to select data from in a query.

In a dbt project, a SQL dbt model is technically a singular SELECT statement (often built leveraging CTEs or subqueries) using a [reference](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) to an upstream data model or table in a FROM statement.

#### How to use SQL FROM statements[](#how-to-use-sql-from-statements "Direct link to How to use SQL FROM statements")

Any query begins with a simple SELECT statement and is wrapped up with a FROM statement:

```sql
select
    order_id, --select your columns
    customer_id,
    order_date
from {{ ref('orders') }} --the table/view/model you want to select from
limit 3
```

Woah woah woah! That is not the typical FROM statement you’re probably used to seeing! Most FROM statements in the non-dbt world, such as when you’re running ad-hoc queries directly in your data warehouse, will follow the `FROM database.schema.table_name` syntax. In dbt projects, analytics engineers leverage [the ref statement](https://docs.getdbt.com/reference/dbt-jinja-functions/ref.md) to refer to other data models and sources, to automatically build a dependency graph, and to avoid having to hard-code schema names. This flexibility is valuable as analytics engineers develop in their own development environments (schemas) without having to rename tables in their FROM statements.

This basic query selects three columns from the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop/blob/main/models/orders.sql) `orders` table and returns three rows. If you execute this query in your data warehouse, it will return a result looking like this:

| **order\_id** | **customer\_id** | **order\_date** |
| ------------- | ---------------- | --------------- |
| 1 | 1 | 2018-01-01 |
| 2 | 3 | 2018-01-02 |
| 3 | 95 | 2018-01-04 |
In the query above, dbt automatically compiles the `from {{ ref('orders') }}` to `from analytics.jaffle_shop.orders` when the query is sent down to the data warehouse and run in the production environment.

If you’re selecting from multiple tables or models, that’s where you’d rely on unions or joins to bring multiple tables together in a way that makes sense to your data.

#### FROM statement syntax in Snowflake, Databricks, BigQuery, and Redshift[](#from-statement-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to FROM statement syntax in Snowflake, Databricks, BigQuery, and Redshift")

Just as the humble SELECT statement is a SQL fundamental that goes untampered by the data warehouses, FROM syntax does not vary across them. As a result, writing the actual `select…from` statement in Snowflake, Databricks, Google BigQuery, and Amazon Redshift would look the same.

---

### SQL GROUP BY

GROUP BY…it’s a little hard to explicitly define in a way *that actually makes sense*, but it will inevitably show up countless times in analytics work and you’ll need it frequently.

To put it in the simplest terms, the GROUP BY statement allows you to group query results by specified columns and is used in tandem with aggregate functions such as [AVG](https://docs.getdbt.com/sql-reference/avg.md) and [SUM](https://docs.getdbt.com/sql-reference/sum.md) to calculate those values across specific rows.
#### How to use the SQL GROUP BY statement[](#how-to-use-the-sql-group-by-statement "Direct link to How to use the SQL GROUP BY statement")

The GROUP BY statement appears at the end of a query, after any joins and [WHERE](https://docs.getdbt.com/sql-reference/where.md) filters have been applied:

```sql
select
    my_first_field,
    count(id) as cnt --or any other aggregate function (sum, avg, etc.)
from my_table
where my_first_field is not null
group by 1 --grouped by my_first_field
order by 1 desc
```

A few things to note about the GROUP BY implementation:

* It’s usually one of the last clauses in a query, after any joins or where statements; typically you’ll only see [HAVING](https://docs.getdbt.com/sql-reference/having.md), [ORDER BY](https://docs.getdbt.com/sql-reference/order-by.md), or [LIMIT](https://docs.getdbt.com/sql-reference/limit.md) statements following it in a query
* You can group by multiple fields (ex. `group by 1,2,3`) if you need to; in general, we recommend performing aggregations and joins in separate CTEs to avoid having to group by too many fields in one query or CTE
* You may also group by explicit column name (ex. `group by my_first_field`) or even a manipulated column name that is in the query (ex. `group by date_trunc('month', order_date)`)

Readability over DRYness? Grouping by explicit column name (versus column number in the query) cuts both ways: on one hand, it’s potentially more readable by end business users; on the other hand, if a grouped column name changes, that name change needs to be reflected in the group by statement. Use a grouping convention that works for you and your data, but try to keep to one standard style.
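The rules above can be exercised end to end in any SQL engine. As a minimal, self-contained sketch (using SQLite and hypothetical data, not a specific warehouse):

```python
import sqlite3

# Build a tiny in-memory orders table (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id int, customer_id int)")
conn.executemany(
    "insert into orders values (?, ?)", [(1, 1), (2, 1), (3, 2)]
)

# GROUP BY appears after the WHERE filters; `group by 1` groups by the
# first selected column (customer_id)
rows = conn.execute("""
    select customer_id, count(order_id) as num_orders
    from orders
    group by 1
    order by 1
""").fetchall()
# rows == [(1, 2), (2, 1)]
```

Customer 1 placed two orders and customer 2 placed one, so the aggregate collapses three rows into two groups.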
##### SQL GROUP BY example[](#sql-group-by-example "Direct link to SQL GROUP BY example")

```sql
select
    customer_id,
    count(order_id) as num_orders
from {{ ref('orders') }}
group by 1
order by 1
limit 5
```

This simple query using the sample dataset [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return customers and the count of orders they’ve placed:

| customer\_id | num\_orders |
| ------------ | ----------- |
| 1 | 2 |
| 2 | 1 |
| 3 | 3 |
| 6 | 1 |
| 7 | 1 |

Note that the `order by` and `limit` statements come after the `group by` in the query.

#### SQL GROUP BY syntax in Snowflake, Databricks, BigQuery, and Redshift[](#sql-group-by-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to SQL GROUP BY syntax in Snowflake, Databricks, BigQuery, and Redshift")

Snowflake, Databricks, BigQuery, and Redshift all support the ability to group by columns and follow the same syntax.

#### GROUP BY use cases[](#group-by-use-cases "Direct link to GROUP BY use cases")

Aggregates, aggregates, and did we mention, aggregates? GROUP BY statements are needed when you’re calculating aggregates (averages, sums, counts, etc.) by specific columns; if you select non-aggregated columns alongside aggregate functions, your query will not run successfully unless you group by those columns. You may also see GROUP BY statements used to deduplicate rows or join aggregates onto other tables with CTEs; [this article provides a great writeup](https://www.getdbt.com/blog/write-better-sql-a-defense-of-group-by-1/) on specific areas you might see GROUP BYs used in your dbt projects and data modeling work.

👋Bye bye finicky group bys

In some sticky data modeling scenarios, you may find yourself needing to group by many columns to collapse a table down into fewer rows or deduplicate rows.
In that scenario, you may find yourself writing `group by 1, 2, 3,.....,n`, which can become tedious, confusing, and difficult to troubleshoot. Instead, you can leverage a [dbt macro](https://github.com/dbt-labs/dbt-utils#group_by-source) that will save you from writing `group by 1,2,....,46` and let you write a simple `{{ dbt_utils.group_by(46) }}` instead...you’ll thank us later 😉

---

### SQL HAVING

SQL HAVING is just one of those little things that are going to make your ad hoc data work a little easier.

A not-so-fun fact about the [WHERE clause](https://docs.getdbt.com/sql-reference/where.md) is that you can’t filter on aggregates with it…that’s where HAVING comes in. With HAVING, you can not only define an aggregate in a [select](https://docs.getdbt.com/sql-reference/select.md) statement, but also filter on that newly created aggregate within the HAVING clause.

This page will walk through how to use HAVING, when you should use it, and discuss data warehouse support for it.

#### How to use the HAVING clause in SQL[](#how-to-use-the-having-clause-in-sql "Direct link to How to use the HAVING clause in SQL")

The HAVING clause essentially requires one thing: an aggregate field to evaluate. Since the HAVING condition is evaluated as a boolean, it will return rows that evaluate to true, similar to the WHERE clause. The HAVING condition comes after a [GROUP BY statement](https://docs.getdbt.com/sql-reference/group-by.md) and is optionally followed by an ORDER BY statement:

```sql
select
    -- query
from <table>
group by <field(s)>
having <condition>
[optional order by]
```

That example syntax looks a little gibberish without some real fields, so let’s dive into a practical example using HAVING.

##### SQL HAVING example[](#sql-having-example "Direct link to SQL HAVING example")

**HAVING example**

```sql
select
    customer_id,
    count(order_id) as num_orders
from {{ ref('orders') }}
group by 1
having num_orders > 1 --if you replace this with `where`, this query would not successfully run
```

**CTE example**

```sql
with counts as (
    select
        customer_id,
        count(order_id) as num_orders
    from {{ ref('orders') }}
    group by 1
)

select
    customer_id,
    num_orders
from counts
where num_orders > 1
```

This simple query using the sample dataset [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return customers who have had more than one order:

| customer\_id | num\_orders |
| ------------ | ----------- |
| 1 | 2 |
| 3 | 3 |
| 94 | 2 |
| 64 | 2 |
| 54 | 4 |

The query above using the CTE takes up more lines compared to the simpler query using HAVING, but will produce the same result.

#### SQL HAVING clause syntax in Snowflake, Databricks, BigQuery, and Redshift[](#sql-having-clause-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to SQL HAVING clause syntax in Snowflake, Databricks, BigQuery, and Redshift")

[Snowflake](https://docs.snowflake.com/en/sql-reference/constructs/having.html), [Databricks](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-having.html), [BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#having_clause), and [Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_HAVING_clause.html) all support the HAVING clause, and the syntax for using HAVING is the same across each of those data warehouses.

---

### SQL ILIKE

The favorite child: ILIKE helps you easily match, find, and filter out string values of a specified pattern by using SQL wildcards *without having to worry about case sensitivity*. If you’re a stickler for case-sensitivity, don’t hesitate to use the not-as-special (but still important) child, the LIKE operator 😆

#### How to use the SQL ILIKE operator[](#how-to-use-the-sql-ilike-operator "Direct link to How to use the SQL ILIKE operator")

The ILIKE operator has a simple syntax, with the ability to be utilized in WHERE clauses or case statements:

`where <string_column> ilike '<pattern>'`

or

`case when <string_column> ilike '<pattern>'`

Some notes on this operator’s syntax and functionality:

* The `<pattern>` can use two SQL wildcards (`%` and `_`); the underscore will match any single character and the % matches zero or more characters
  * Ex. '%j' = any string that ends with the letter j
  * Ex. 'j%' = any string that starts with the letter j
  * Ex. 'j%l' = any string that starts with the letter j and ends with the letter l
  * Ex. '\_j%' = any string that has the letter j in the second position
* The majority of use cases for the ILIKE operator will likely involve the `%` wildcard
* The ILIKE operator is case-insensitive, meaning that the casing in the `<pattern>` you want to filter for does not need to match the casing in your column values
* The ILIKE operator can be paired with the NOT operator, to filter on rows that are not like a specified pattern

Let’s dive into a practical example using the ILIKE operator now.
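Before that, the wildcard rules above can be approximated outside of SQL. The helper below is a hypothetical Python re-implementation of ILIKE's matching semantics (`%` and `_` translated into a case-insensitive regular expression), useful for sanity-checking a pattern:

```python
import re

def ilike(value, pattern):
    # Hypothetical re-implementation of ILIKE matching semantics:
    # % -> zero or more characters, _ -> exactly one character,
    # compared case-insensitively
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, re.IGNORECASE) is not None

ilike("Gift_Card", "%card")  # returns True
```

Dropping `re.IGNORECASE` turns this into the case-sensitive LIKE behavior instead.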
##### SQL ILIKE example[](#sql-ilike-example "Direct link to SQL ILIKE example")

```sql
select
    payment_id,
    order_id,
    payment_method,
    case when payment_method ilike '%card' then 'card_payment'
    else 'non_card_payment' end as was_card
from {{ ref('payments') }}
```

This simple query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `payments` table creates a new column to determine whether a payment used a type of card (ex. debit card, credit card, gift card) based on whether the `payment_method` value ends in `card`:

| **payment\_id** | **order\_id** | **payment\_method** | **was\_card** |
| --------------- | ------------- | ------------------- | ------------------ |
| 1 | 1 | credit\_card | card\_payment |
| 9 | 9 | gift\_card | card\_payment |
| 3 | 3 | coupon | non\_card\_payment |
| 4 | 4 | coupon | non\_card\_payment |

#### ILIKE syntax in Snowflake, Databricks, BigQuery, and Redshift[](#ilike-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to ILIKE syntax in Snowflake, Databricks, BigQuery, and Redshift")

Most modern data warehouses, with the exception of Google BigQuery, support the ILIKE operator, and the syntax is the same across them. Use the table below to read more on the documentation for the ILIKE operator in your data warehouse.

| **Data warehouse** | **ILIKE support?** |
| ------------------ | ---------------------------------------------------------------- |
| Snowflake | ✅ |
| Databricks | ✅ |
| Amazon Redshift | ✅ |
| Google BigQuery | ❌, recommend using regular expressions or the CONTAINS function |
#### ILIKE operator example use cases[](#ilike-operator-example-use-cases "Direct link to ILIKE operator example use cases")

The ILIKE operator has very similar use cases to the [LIKE operator](https://docs.getdbt.com/sql-reference/like.md), so we won’t repeat ourselves here. The important thing to understand when using the LIKE or ILIKE operators is what the casing variations look like in your data: if casing is inconsistent within a column, ILIKE will be your friend; if your backend engineers and analytics engineers rigorously follow a style guide (and your source data is magically of the same case), the LIKE operator is there for you if you need it.

---

### SQL IN

It happens to the best of data people: the `orders` table always needs to filter out `status = employee_order` in order to get accurate order counts. So your data model for the `orders` table looks a little something like this:

```sql
select
    *
from {{ source('backend_db', 'orders') }}
where status != 'employee_order'
```

What happens one day if there’s an additional `status` that needs to be filtered out? Well, that’s where the handy IN operator comes into play.

The IN operator ultimately allows you to specify multiple values in a WHERE clause, so you can easily filter your query on multiple options. Using the IN operator is a more refined version of using multiple OR conditions in a WHERE clause.
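That equivalence between an IN-list and chained OR conditions can be checked in any SQL engine. A minimal SQLite sketch with hypothetical statuses:

```python
import sqlite3

# Tiny in-memory orders table (hypothetical statuses)
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id int, status text)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, "regular_order"), (2, "employee_order"), (3, "influencer_order")],
)

# The IN-list version...
with_in = conn.execute(
    "select order_id from orders "
    "where status in ('employee_order', 'influencer_order') "
    "order by order_id"
).fetchall()

# ...returns the same rows as the chained-OR version
with_ors = conn.execute(
    "select order_id from orders "
    "where status = 'employee_order' or status = 'influencer_order' "
    "order by order_id"
).fetchall()
# with_in == with_ors == [(2,), (3,)]
```

The IN-list version stays one short clause long no matter how many statuses you add, which is exactly why it beats stacking ORs.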
#### How to use SQL IN operator[](#how-to-use-sql-in-operator "Direct link to How to use SQL IN operator")

In the scenario above, if you now needed to filter on an additional new `status` value to remove certain rows, your use of the IN operator would look like this:

```sql
select
    *
from {{ source('backend_db', 'orders') }}
where status not in ('employee_order', 'influencer_order') --list of order statuses to filter out
```

Woah woah woah, what is a `not in`? This is exactly what it sounds like: return all rows where the status is not `employee_order` or `influencer_order`. If you wanted to just use the IN operator, you could specify all other statuses that are appropriate (ex. `where status in ('regular_order', 'temp_order')`).

You can additionally use the IN/NOT IN operator with a subquery, to remove/include rows based on a subquery’s result:

```sql
where status in (select …)
```

Compare columns against appropriate data types

The only “gotcha” that really exists in using the IN operator is remembering that the values in your IN list **must** match the data type of the column they’re compared against. This is especially important for boolean columns that could be accidentally cast as strings.

#### IN operator syntax in Snowflake, Databricks, BigQuery, and Redshift[](#in-operator-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to IN operator syntax in Snowflake, Databricks, BigQuery, and Redshift")

The IN operator, like most SQL operators, is not syntactically different across data warehouses. That means the syntax for using the IN/NOT IN operator is the same in Snowflake, Databricks, Google BigQuery, and Amazon Redshift.

#### IN operator use cases[](#in-operator-use-cases "Direct link to IN operator use cases")

Use the IN condition to filter out inappropriate or inaccurate rows from a query or database schema object based on parameters you define and understand.
We guarantee there’s an IN somewhere in your dbt project 😀

---

### SQL INNER JOINS

The cleanest and easiest of SQL joins: the humble inner join. Just as its name suggests, an inner join between two database objects returns all rows that have matching join keys; any keys that don’t match are omitted from the query result.

#### How to create an inner join[](#how-to-create-an-inner-join "Direct link to How to create an inner join")

Like all joins, you need some database objects (ie tables/views), keys to join on, and a [select statement](https://docs.getdbt.com/sql-reference/select.md) to perform an inner join:

```text
select
    <fields>
from <table_1> as t1
inner join <table_2> as t2
on t1.id = t2.id
```

In the example above, there’s only one field from each table being used to join the two together; if you’re joining between two database objects that require multiple fields, you can leverage AND/OR operators or, more preferably, surrogate keys. You may additionally add [WHERE](https://docs.getdbt.com/sql-reference/where.md), [GROUP BY](https://docs.getdbt.com/sql-reference/group-by.md), [ORDER BY](https://docs.getdbt.com/sql-reference/order-by.md), [HAVING](https://docs.getdbt.com/sql-reference/having.md), and other clauses after your joins to filter, order, and perform aggregations. As with any query, you can perform as many joins as you want in a single query.

A general word of advice: try to keep data models modular by performing regular DAG audits. If you join certain tables further upstream, are those individual tables needed again further downstream?
If your query involves multiple joins and complex logic and is exposed to end business users, ensure that you leverage table or [incremental materializations](https://docs.getdbt.com/docs/build/incremental-models.md).

##### SQL inner join example[](#sql-inner-join-example "Direct link to SQL inner join example")

Table A `car_type`

| user\_id | car\_type |
| -------- | --------- |
| 1 | van |
| 2 | sedan |
| 3 | truck |

Table B `car_color`

| user\_id | car\_color |
| -------- | ---------- |
| 1 | red |
| 3 | green |
| 4 | yellow |

```sql
select
    car_type.user_id as user_id,
    car_type.car_type as type,
    car_color.car_color as color
from {{ ref('car_type') }} as car_type
inner join {{ ref('car_color') }} as car_color
on car_type.user_id = car_color.user_id
```

This simple query will return all rows that have the same `user_id` in both Table A and Table B:

| user\_id | type | color |
| -------- | ----- | ----- |
| 1 | van | red |
| 3 | truck | green |

Because there’s no `user_id` = 4 in Table A and no `user_id` = 2 in Table B, rows with ids 2 and 4 (from either table) are omitted from the inner join query results.

#### SQL inner join use cases[](#sql-inner-join-use-cases "Direct link to SQL inner join use cases")

There are probably countless scenarios where you’d want to inner join multiple tables together. Perhaps you have some really nicely structured tables with the exact same primary keys that should really just be one larger, wider table, or you’re joining two tables together and don’t want any null or missing column values that a left or right join would allow—it’s all pretty dependent on your source data and end use cases.
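The `car_type`/`car_color` example above can be reproduced end to end in any SQL engine. A minimal SQLite sketch:

```python
import sqlite3

# Recreate Table A and Table B from the example above, in memory
conn = sqlite3.connect(":memory:")
conn.execute("create table car_type (user_id int, car_type text)")
conn.execute("create table car_color (user_id int, car_color text)")
conn.executemany("insert into car_type values (?, ?)",
                 [(1, "van"), (2, "sedan"), (3, "truck")])
conn.executemany("insert into car_color values (?, ?)",
                 [(1, "red"), (3, "green"), (4, "yellow")])

# Inner join: only user_ids present in BOTH tables survive
rows = conn.execute("""
    select
        car_type.user_id,
        car_type.car_type as type,
        car_color.car_color as color
    from car_type
    inner join car_color on car_type.user_id = car_color.user_id
    order by 1
""").fetchall()
# rows == [(1, 'van', 'red'), (3, 'truck', 'green')]
```

Swapping `inner join` for `left join` in the query above is an easy way to see the unmatched `user_id` 2 reappear with a null color.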
Where you will not (and should not) see inner joins is in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md) that are used to clean and prep raw source data for analytics uses. Any joins in your dbt projects should happen further downstream in [intermediate](https://docs.getdbt.com/best-practices/how-we-structure/3-intermediate.md) and [mart models](https://docs.getdbt.com/best-practices/how-we-structure/4-marts.md) to improve modularity and DAG cleanliness.

---

### SQL LEFT JOIN

An analytics engineer favorite: the left join. Without a doubt, this is probably the most regularly used join in any dbt project (and for good reason).

The left join returns all rows in the [FROM statement](https://docs.getdbt.com/sql-reference/from.md), regardless of whether there’s a match in the left-joined database object. Compare this to an [inner join](https://docs.getdbt.com/sql-reference/inner-join.md), where rows are only returned when there are successful key matches between the database object in the FROM statement and the one in the inner join statement.
#### How to create a left join[](#how-to-create-a-left-join "Direct link to How to create a left join")

Like all joins, you need some database objects (ie tables/views), keys to join on, and a [select statement](https://docs.getdbt.com/sql-reference/select.md) to perform a left join:

```text
select
    <fields>
from <table_1> as t1
left join <table_2> as t2
on t1.id = t2.id
```

In the example above, there’s only one field from each table being used to join the two together; if you’re joining between two database objects that require multiple fields, you can leverage AND/OR operators or, more preferably, surrogate keys. You may additionally add [WHERE](https://docs.getdbt.com/sql-reference/where.md), [GROUP BY](https://docs.getdbt.com/sql-reference/group-by.md), [ORDER BY](https://docs.getdbt.com/sql-reference/order-by.md), [HAVING](https://docs.getdbt.com/sql-reference/having.md), and other clauses after your joins to filter, order, and perform aggregations. You may also chain as many left joins (or any joins, really) as you’d like in an individual query or CTE.

##### SQL left join example[](#sql-left-join-example "Direct link to SQL left join example")

Table A `car_type`

| **user\_id** | **car\_type** |
| ------------ | ------------- |
| 1 | van |
| 2 | sedan |
| 3 | truck |

Table B `car_color`

| user\_id | car\_color |
| -------- | ---------- |
| 1 | red |
| 3 | green |
| 4 | yellow |
| | | | | ```sql select car_type.user_id as user_id, car_type.car_type as type, car_color.car_color as color from {{ ref('car_type') }} as car_type left join {{ ref('car_color') }} as car_color on car_type.user_id = car_color.user_id ``` This simple query will return *all rows* from Table A and adds the `color` column to rows where there’s a successful match to Table B: | **user\_id** | **type** | **color** | | ------------ | -------- | --------- | | 1 | van | red | | 2 | sedan | null | | 3 | truck | green | Search table... | | | | | | | ---------------- | - | - | - | - | | Loading table... | | | | | Because there’s no `user_id` = 2 in Table B, there is no `color` available, thus a null result `color` column for `user_id` 2. #### SQL left join use cases[​](#sql-left-join-use-cases "Direct link to SQL left join use cases") Left joins are a fundamental in data modeling and analytics engineering work—they allow you to easily join database objects onto each other while maintaining an original table’s row count (in the from statement). Compared to right joins, that return all rows in a right join database object (and not the from statement), we find left joins a little more intuitive to understand and build off of. Ensure your joins are just ~~left~~ right Something to note if you use left joins: if there are multiple records for an individual key in the left join database object, be aware that duplicates can potentially be introduced in the final query result. This is where dbt tests, such as testing for primary key uniqueness and [equal row count](https://github.com/dbt-labs/dbt-utils#equal_rowcount-source) across upstream source tables and downstream child models, can help you identify faulty data modeling logic and improve data quality. Where you will not (and should not) see left joins is in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md) that are used to clean and prep raw source data for analytics uses. 
Any joins in your dbt projects should happen further downstream in [intermediate](https://docs.getdbt.com/best-practices/how-we-structure/3-intermediate.md) and [mart models](https://docs.getdbt.com/best-practices/how-we-structure/4-marts.md) to improve modularity and DAG cleanliness.

---

### SQL LIKE

The LIKE operator helps you easily match, find, and filter out string values of a specified pattern by using SQL wildcards. Important to note that the pattern passed into the LIKE operator is case-sensitive, unlike its case-insensitive cousin, [ILIKE](https://docs.getdbt.com/sql-reference/ilike.md).

#### How to use the SQL LIKE operator[​](#how-to-use-the-sql-like-operator "Direct link to How to use the SQL LIKE operator")

The LIKE operator has a simple syntax, with the ability to utilize it in [WHERE clauses](https://docs.getdbt.com/sql-reference/where.md) or case statements:

`where <column> like '<pattern>'` or `case when <column> like '<pattern>'`

Some notes on this operator’s syntax and functionality:

* The `<pattern>` can use two SQL wildcards (`%` and `_`); the underscore will match any *single character* and the `%` matches zero or more characters
  * Ex. `'%J'` = any string that ends with a capital J
  * Ex. `'J%'` = any string that starts with a capital J
  * Ex. `'J%L'` = any string that starts with a capital J and ends with a capital L
  * Ex. `'_J%'` = any string that has a capital J in the second position
* The majority of use cases for the LIKE operator will likely involve the `%` wildcard
* The LIKE operator is case-sensitive, meaning that the casing in the `<pattern>` you want to filter for should match the same casing in your column values; for columns with varied casing, leverage the case-insensitive ILIKE operator
* The LIKE operator can be paired with the NOT operator, to filter on rows that are not like a specified pattern

Let’s dive into a practical example using the LIKE operator now.

##### SQL LIKE example[​](#sql-like-example "Direct link to SQL LIKE example")

```sql
select
    customer_id,
    first_name
from {{ ref('customers') }}
where first_name like 'J%'
order by 1
```

This simple query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `customers` table would return all of the customers whose first name starts with the *uppercase* letter J:

| **customer\_id** | **first\_name** |
| ---------------- | --------------- |
| 1 | Julia |
| 4 | Jeremy |

Because LIKE is case-sensitive, it would not return results in this query for customers with lowercase J-names. If you have a mix of uppercase and lowercase strings in your data, consider standardizing casing for strings using the [UPPER](https://docs.getdbt.com/sql-reference/upper.md) and [LOWER](https://docs.getdbt.com/sql-reference/lower.md) functions or use the more flexible [ILIKE operator](https://docs.getdbt.com/sql-reference/ilike.md).

#### LIKE syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#like-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to LIKE syntax in Snowflake, Databricks, BigQuery, and Redshift")

Most, if not all, modern data warehouses support the LIKE operator and the syntax is also the same across them.
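The case-sensitivity behavior described above can be checked locally with a small sketch using Python's `sqlite3` and hypothetical customer names. One caveat: SQLite's LIKE is case-*insensitive* for ASCII by default (unlike the warehouses discussed here), so the sketch flips the `case_sensitive_like` pragma on to mirror warehouse behavior:

```python
import sqlite3

# Demonstrate LIKE's '%' wildcard and case sensitivity.
conn = sqlite3.connect(":memory:")
# SQLite-specific: make LIKE case-sensitive, matching warehouse semantics.
conn.execute("pragma case_sensitive_like = on")
conn.execute("create table customers (customer_id int, first_name text)")
conn.executemany(
    "insert into customers values (?, ?)",
    [(1, "Julia"), (2, "jeremy"), (3, "Kathleen"), (4, "Jeremy")],
)

# 'J%' matches strings starting with an uppercase J only.
j_names = conn.execute(
    "select first_name from customers where first_name like 'J%' order by 1"
).fetchall()
print(j_names)  # lowercase 'jeremy' is excluded
```

Dropping the pragma line (or switching to a warehouse's ILIKE) would pull the lowercase `jeremy` back into the results.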
Some data warehouses, such as Snowflake and Databricks, additionally support similar or more flexible operators such as ILIKE, the case-insensitive version of LIKE, or LIKE ANY, which allows you to pass in multiple pattern options to scan for. Use the table below to read more on the documentation for the LIKE operator in your data warehouse.

| **Data warehouse** | **LIKE support?** |
| ------------------ | ----------------- |
| Snowflake | ✅ |
| Databricks | ✅ |
| Amazon Redshift | ✅ |
| Google BigQuery | ✅ |

#### LIKE operator example use cases[​](#like-operator-example-use-cases "Direct link to LIKE operator example use cases")

You may see the LIKE operator used in analytics engineering work to:

* Bucket column values together based on general requirements using case statements and the LIKE operator (ex. `case when page_path like '/product%' then 'product_page' else 'non_product_page' end`)
* Filter out employee email records based on a similar email address pattern (ex. `where email_address not like '%@dbtlabs.com'`)

This isn’t an extensive list of where your team may be using the LIKE operator throughout your dbt models, but contains some common scenarios analytics engineers face day-to-day.

---

### SQL LIMIT

When you’re developing data models or drafting up a query, do you usually need to see all results from it? Not normally. Hence, we LIMIT. Adding the LIMIT clause to a query will limit the number of rows returned.
It’s useful for when you’re developing data models, ensuring SQL in a query is functioning as expected, and wanting to save some money during development periods.

#### How to use the LIMIT clause in a query[​](#how-to-use-the-limit-clause-in-a-query "Direct link to How to use the LIMIT clause in a query")

To limit the number of rows returned from a query, you pass LIMIT in the last line of the query with the number of rows you want returned:

```sql
select
    some_rows
from my_data_source
limit <number_of_rows>
```

Let’s take a look at a practical example using LIMIT below.

##### LIMIT example[​](#limit-example "Direct link to LIMIT example")

```sql
select
    order_id,
    order_date,
    rank() over (order by order_date) as order_rnk
from {{ ref('orders') }}
order by 2
limit 5
```

This simple query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return these exact 5 rows:

| order\_id | order\_date | order\_rnk |
| --------- | ----------- | ---------- |
| 1 | 2018-01-01 | 1 |
| 2 | 2018-01-02 | 2 |
| 3 | 2018-01-04 | 3 |
| 4 | 2018-01-05 | 4 |
| 5 | 2018-01-05 | 4 |

After ensuring that this is the result you want from this query, you can omit the LIMIT in your final data model.

Save money and time by limiting data in development

You could limit your data used for development by manually adding a LIMIT statement, a WHERE clause to your query, or by using a [dbt macro to automatically limit data](https://docs.getdbt.com/best-practices/best-practice-workflows.md#limit-the-data-processed-when-in-development) based on your development environment to help reduce your warehouse usage during dev periods.
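The LIMIT-plus-ORDER BY pattern above can be sketched locally; here is a minimal example using Python's `sqlite3` with hypothetical order amounts (not the Jaffle Shop data):

```python
import sqlite3

# Grab the top rows of a small orders table: ORDER BY sorts, LIMIT caps.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id int, amount real)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, 10.0), (2, 25.0), (3, 5.0), (4, 40.0), (5, 15.0)],
)

# Top 2 orders by amount; without ORDER BY, LIMIT would return an
# arbitrary 2 rows rather than the largest ones.
top_two = conn.execute(
    "select order_id, amount from orders order by amount desc limit 2"
).fetchall()
print(top_two)
```

The comment inside is the key point: LIMIT alone picks an arbitrary subset, so pair it with ORDER BY whenever the *specific* rows matter.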
#### LIMIT syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#limit-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to LIMIT syntax in Snowflake, Databricks, BigQuery, and Redshift")

All modern data warehouses support the ability to LIMIT a query and the syntax is also the same across them. Use the table below to read more on the documentation for limiting query results in your data warehouse.

| Data warehouse | LIMIT support? |
| --------------- | -------------- |
| Snowflake | ✅ |
| Databricks | ✅ |
| Amazon Redshift | ✅ |
| Google BigQuery | ✅ |

#### LIMIT use cases[​](#limit-use-cases "Direct link to LIMIT use cases")

We most commonly see queries limited in data work to:

* Save some money in development work, especially for large datasets; just make sure the model works across a subset of the data instead of all of the data 💸
* Paired with an ORDER BY statement, grab the top 5, 10, 50, 100, etc. entries from a dataset

This isn’t an extensive list of where your team may be using LIMIT throughout your development work, but it contains some common scenarios analytics engineers face day-to-day.

---

### SQL LOWER

We’ve all been there:

* In a user signup form, user A typed in their name as `Kira Furuichi`, user B typed it in as `john blust`, and user C wrote `DAvid KrevitT` (what’s up with that, David??)
* Your backend application engineers are adamant customer emails are in all caps
* All of your event tracking names are lowercase

In the real world of human imperfection, opinions, and error, string values are likely to take inconsistent capitalization across different data sources (or even within the same data source). There’s always a little lack of rhyme or reason for why some values are passed as upper or lowercase, and it’s not worth the headache to unpack that. So how do you create uniformity for string values that you collect across all your data sources? The LOWER function!

Using the LOWER function on a string value will return the input as an all-lowercase string. It’s an effective way to create consistent capitalization for string values across your data.

#### How to use the SQL LOWER function[​](#how-to-use-the-sql-lower-function "Direct link to How to use the SQL LOWER function")

The syntax for using the LOWER function looks like the following:

```sql
lower(<column_name>)
```

Executing this command in a SELECT statement will return the lowercase version of the input string. You may additionally use the LOWER function in WHERE clauses and on join values.

Let’s take a look at a practical example using the LOWER function.

##### SQL LOWER function example[​](#sql-lower-function-example "Direct link to SQL LOWER function example")

You can lower the first name and last name of the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `customers` model using the following code:

```sql
select
    customer_id,
    lower(first_name) as first_name,
    lower(last_name) as last_name
from {{ ref('customers') }}
```

After running this query, the `customers` table will look a little something like this:

| customer\_id | first\_name | last\_name |
| ------------ | ----------- | ---------- |
| 1 | michael | p. |
| 2 | shawn | m. |
| 3 | kathleen | p. |

Now, all characters in the `first_name` and `last_name` columns are lowercase.

> Changing all string columns to lowercase to create uniformity across data sources typically happens in our [dbt project’s staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md). There are a few reasons for that: data cleanup and standardization, such as aliasing, casting, and lowercasing, should ideally happen in staging models to create downstream uniformity and improve downstream performance.

#### SQL LOWER function syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#sql-lower-function-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to SQL LOWER function syntax in Snowflake, Databricks, BigQuery, and Redshift")

Google BigQuery, Amazon Redshift, Snowflake, Postgres, and Databricks all support the LOWER function. In addition, the syntax to use LOWER is the same across all of them.

#### LOWER function use cases[​](#lower-function-use-cases "Direct link to LOWER function use cases")

Let’s go back to our chaotic trio of users A, B, and C who all used different capitalizations to type in their names. If you don’t create consistent capitalization for string values, how would a business user know what to filter for in their BI tool? A business user could filter a name field on “John Blust” since that’s what they would expect it to look like, only to get zero results back. By creating a consistent capitalization format (upper or lowercase) for all string values in your data models, you create clear expectations for business users in your BI tool.

There will most likely never be 100% consistency in your data models, but doing all that you can to mitigate that chaos will make your life and the life of your business users hopefully a little easier. Use the LOWER function to create a consistent casing for all strings in your data sources.
---

### SQL MAX

The SQL MAX aggregate function allows you to compute the maximum value from a column. This kind of measure is useful for understanding the distribution of column values, determining the most recent timestamps of key events, and creating booleans from CASE WHEN statements to flatten semi-structured data.

#### How to use the SQL MAX function in a query[​](#how-to-use-the-sql-max-function-in-a-query "Direct link to How to use the SQL MAX function in a query")

Use the following syntax to find the maximum value of a field:

`max(<column_name>)`

Since MAX is an aggregate function, you’ll need a GROUP BY statement in your query if you’re looking at counts broken out by dimension(s). If you’re calculating the standalone maximum of fields without the need to break them down by another field, you don’t need a GROUP BY statement. MAX can also be used as a window function to operate across specified or partitioned rows.

Let’s take a look at a practical example using MAX and GROUP BY below.

##### MAX example[​](#max-example "Direct link to MAX example")

The following example is querying from a sample dataset created by dbt Labs called [jaffle\_shop](https://github.com/dbt-labs/jaffle_shop):

```sql
select
    date_part('month', order_date) as order_month,
    max(amount) as max_amount
from {{ ref('orders') }}
group by 1
```

This simple query is something you may do while doing initial exploration of your data; it will return the maximum order `amount` per order month that appears in the Jaffle Shop’s `orders` table:

| order\_month | max\_amount |
| ------------ | ----------- |
| 1 | 58 |
| 2 | 30 |
| 3 | 56 |
| 4 | 26 |

#### SQL MAX function syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#sql-max-function-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to SQL MAX function syntax in Snowflake, Databricks, BigQuery, and Redshift")

All modern data warehouses support the ability to use the MAX function (and follow the same syntax!).

#### MAX function use cases[​](#max-function-use-cases "Direct link to MAX function use cases")

We most commonly see queries using MAX to:

* Perform initial data exploration on a dataset to understand the distribution of column values.
* Identify the most recent timestamp for key events (ex. `max(login_timestamp_utc) as last_login`).
* Create descriptive boolean values from case when statements (ex. `max(case when status = 'complete' then 1 else 0 end) as has_complete_order`).
* Establish the most recent timestamp from a table to filter on rows appropriately for [incremental model builds](https://docs.getdbt.com/docs/build/incremental-models.md).

This isn’t an extensive list of where your team may be using MAX throughout your development work, dbt models, and BI tool logic, but it contains some common scenarios analytics engineers face day-to-day.

---

### SQL MIN

SQL MIN, MAX, SUM…the aggregate functions that you’ll live and die by as an analytics practitioner. In this post, we’re going to unpack the SQL MIN function, how to use it, and why it's valuable in data work. The MIN aggregate function allows you to compute the minimum value from a column or across a set of rows for a column.
The results from the MIN function are useful for understanding the distribution of column values and determining the first timestamps of key events.

#### How to use the MIN function in a query[​](#how-to-use-the-min-function-in-a-query "Direct link to How to use the MIN function in a query")

Use the following syntax in a query to find the minimum value of a field:

`min(<column_name>)`

Since MIN is an aggregate function, you’ll need a GROUP BY statement in your query if you’re looking at counts broken out by dimension(s). If you’re calculating the standalone minimum of fields without the need to break them down by another field, you don’t need a GROUP BY statement. MIN can also be used as a window function to operate across specified or partitioned rows.

Let’s take a look at a practical example using MIN below.

##### MIN example[​](#min-example "Direct link to MIN example")

The following example is querying from a sample dataset created by dbt Labs called [jaffle\_shop](https://github.com/dbt-labs/jaffle_shop):

```sql
select
    customer_id,
    min(order_date) as first_order_date,
    max(order_date) as last_order_date
from {{ ref('orders') }}
group by 1
limit 3
```

This simple query returns the first and last order date for a customer in the Jaffle Shop’s `orders` table:

| customer\_id | first\_order\_date | last\_order\_date |
| ------------ | ------------------ | ----------------- |
| 1 | 2018-01-01 | 2018-02-10 |
| 3 | 2018-01-02 | 2018-03-11 |
| 94 | 2018-01-04 | 2018-01-29 |

#### SQL MIN function syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#sql-min-function-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to SQL MIN function syntax in Snowflake, Databricks, BigQuery, and Redshift")

All modern data warehouses support the ability to use the MIN function (and follow the same syntax).
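The first/last-order pattern above can be sketched locally with Python's `sqlite3` and a few hypothetical rows (ISO-formatted date strings compare correctly as text, which is what makes MIN/MAX work here):

```python
import sqlite3

# MIN and MAX as grouped aggregates: first and last order date per customer.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (customer_id int, order_date text)")
conn.executemany("insert into orders values (?, ?)", [
    (1, "2018-01-01"), (1, "2018-02-10"),
    (3, "2018-01-02"), (3, "2018-03-11"),
])

# GROUP BY collapses each customer's rows into one, so MIN/MAX operate
# within each customer's set of order dates.
rows = conn.execute("""
    select customer_id,
           min(order_date) as first_order_date,
           max(order_date) as last_order_date
    from orders
    group by customer_id
    order by customer_id
""").fetchall()
print(rows)
```

Dropping the GROUP BY would instead return a single row with the global minimum and maximum dates across all customers.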
#### MIN function use cases[​](#min-function-use-cases "Direct link to MIN function use cases")

We most commonly see queries using MIN to:

* Perform initial data exploration on a dataset to understand the distribution of column values.
* Identify the first timestamp for key events (ex. `min(login_timestamp_utc) as first_login`).

This isn’t an extensive list of where your team may be using MIN throughout your development work, dbt models, and BI tool logic, but it contains some common scenarios analytics engineers face day-to-day.

---

### SQL NOT

This will be a not *not* useful page on a helpful SQL operator. Ok, we had to get that out of the way.

The SQL NOT operator allows you to return results from conditions that are not true. Pretty intuitive, right? In this page, we’ll dive into how to use the NOT operator, demonstrate an example, and elaborate on potential use cases.

#### How to use the SQL NOT operator[​](#how-to-use-the-sql-not-operator "Direct link to How to use the SQL NOT operator")

The NOT boolean is kind of similar to an adjective—it’s often put in front of another operator, such as [BETWEEN](https://docs.getdbt.com/sql-reference/between.md), [LIKE](https://docs.getdbt.com/sql-reference/like.md)/[ILIKE](https://docs.getdbt.com/sql-reference/ilike.md), IS, and [IN](https://docs.getdbt.com/sql-reference/in.md), to return rows that do not meet the specified criteria.
Below is an example of how to use NOT in front of a LIKE operator:

`where <column> not like '<pattern>'`

This syntax can be easily modified for other operators:

* `where <column> not between <value_1> and <value_2>`
* `where <column> is not null`
* `where <column> not in (<array_of_options>)`
* …or placed altogether in a different place, such as a case statement (ex. `case when <column> is not null then 1 else 0 end`)

Let’s dive into a practical example using the NOT operator.

##### SQL NOT example[​](#sql-not-example "Direct link to SQL NOT example")

```sql
select
    payment_id,
    order_id,
    payment_method
from {{ ref('payments') }}
where payment_method not like '%card'
```

This simple query using the sample dataset [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `payments` table returns all rows whose `payment_method` is not a card type (ex. gift card or credit card):

| **payment\_id** | **order\_id** | **payment\_method** |
| --------------- | ------------- | ------------------- |
| 3 | 3 | coupon |
| 4 | 4 | coupon |
| 5 | 5 | bank\_transfer |
| 10 | 9 | bank\_transfer |

#### SQL NOT syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#sql-not-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to SQL NOT syntax in Snowflake, Databricks, BigQuery, and Redshift")

[Snowflake](https://docs.snowflake.com/en/sql-reference/operators-logical.html), [Databricks](https://docs.databricks.com/sql/language-manual/functions/not.html), [BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/operators), and [Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_logical_condition.html) all support the NOT operator, but they may not all support the secondary operators you would typically pair the NOT operator with. For example, `where <column> not ilike '<pattern>'` is valid in Snowflake, Databricks, and Redshift, but the ILIKE operator is not supported in BigQuery, so this example would not be valid across all data warehouses.
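The NOT-LIKE filter above is easy to reproduce locally; a minimal sketch with Python's `sqlite3` and hypothetical payment rows echoing the example:

```python
import sqlite3

# NOT negating a LIKE pattern: keep payments whose method is not a card type.
conn = sqlite3.connect(":memory:")
conn.execute("create table payments (payment_id int, payment_method text)")
conn.executemany("insert into payments values (?, ?)", [
    (1, "credit_card"), (2, "gift_card"), (3, "coupon"), (4, "bank_transfer"),
])

# '%card' matches any string ending in 'card'; NOT inverts the match.
non_card = conn.execute(
    "select payment_id, payment_method from payments "
    "where payment_method not like '%card' order by 1"
).fetchall()
print(non_card)
```

Both `credit_card` and `gift_card` end in `card` and are excluded; everything else survives the filter.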
#### NOT operator example use cases[​](#not-operator-example-use-cases "Direct link to NOT operator example use cases")

There are probably many scenarios where you’d want to use the NOT operator in your WHERE clauses or case statements, but we commonly see NOT operators used to remove nulls or boolean-identified deleted rows in source data in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md). This removal of unnecessary rows can potentially help the performance of downstream [intermediate](https://docs.getdbt.com/best-practices/how-we-structure/3-intermediate.md) and [mart models](https://docs.getdbt.com/best-practices/how-we-structure/4-marts.md).

---

### SQL OR

We tried to come up with something witty about using the OR operator in a query, but couldn’t think of anything 🤷

Use the OR operator in a WHERE clause to filter on multiple field values or perform more advanced joins on multiple fields.

#### How to use the OR operator[​](#how-to-use-the-or-operator "Direct link to How to use the OR operator")

The OR operator is technically a boolean operator—meaning it returns results that evaluate to true. It’s straightforward to use, and you’ll typically see it appear in a WHERE clause to filter query results appropriately or in joins that involve multiple possible fields.
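The "more advanced joins on multiple fields" use mentioned above can be sketched with Python's `sqlite3`; the tables and columns below are hypothetical, imagining signups that can match a user by either an id or an email:

```python
import sqlite3

# OR inside a join's ON clause: a row matches on either key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (user_id int, email text);
    create table signups (signup_user_id int, signup_email text, plan text);
    insert into users values (1, 'a@example.com'), (2, 'b@example.com');
    insert into signups values (1, null, 'pro'), (null, 'b@example.com', 'free');
""")

# Signup 1 matches user 1 by id; signup 2 matches user 2 by email.
rows = conn.execute("""
    select u.user_id, s.plan
    from users as u
    join signups as s
      on u.user_id = s.signup_user_id
      or u.email = s.signup_email
    order by u.user_id
""").fetchall()
print(rows)
```

As the section warns, OR in join conditions can be slow on large tables and can fan out rows when both conditions match; a single surrogate key is usually the better design.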
##### OR operator example[​](#or-operator-example "Direct link to OR operator example")

```sql
select
    order_id,
    customer_id,
    order_date,
    status,
    amount
from {{ ref('orders') }}
where status = 'shipped' or status = 'completed'
limit 3
```

This query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return results where the order status is shipped or completed:

| order\_id | customer\_id | order\_date | status | amount |
| --------- | ------------ | ----------- | --------- | ------- |
| 2 | 3 | 2018-01-02 | completed | 20.0000 |
| 3 | 94 | 2018-01-04 | completed | 1.00000 |
| 4 | 50 | 2018-01-05 | completed | 25.0000 |

#### OR operator syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#or-operator-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to OR operator syntax in Snowflake, Databricks, BigQuery, and Redshift")

Snowflake, Databricks, Google BigQuery, and Amazon Redshift all support the OR operator, with the syntax looking the same in each platform. You may see the OR operator substituted for a more appropriate IN operator.

#### OR use cases[​](#or-use-cases "Direct link to OR use cases")

We most commonly see OR operators used in queries and dbt models to:

* Return results for fields of varying values
* Join tables on multiple fields using an OR operator (fair warning: this can be a bit scary and inefficient, so use OR operators in joins very carefully and consider refactoring your work to avoid these scenarios)

This isn’t an extensive list of where your team may be using OR throughout your data work, but it contains some common scenarios analytics engineers face day-to-day.
---

### SQL ORDER BY

The ORDER BY clause allows you to specify the resulting row order for a query. In practice, you use the ORDER BY clause to indicate which field(s) you want to order by and in what type of order you want (ascending or descending). It’s useful to leverage during ad hoc analyses and for creating appropriate column values for partitioned rows in window functions.

#### How to use the SQL ORDER BY clause[​](#how-to-use-the-sql-order-by-clause "Direct link to How to use the SQL ORDER BY clause")

ORDER BY clauses have multiple use cases in analytics work, but we see them most commonly utilized to:

* Order a query or subquery result by a column or group of columns
* Appropriately order a subset of rows in a window function

To use the ORDER BY clause in a query or model, use the following syntax:

```sql
select
    column_1,
    column_2
from source_table
order by <field(s)> --comes after FROM, WHERE, and GROUP BY statements
```

You can order a query result by multiple columns, represented by their column name or by their column number in the select statement (ex. `order by column_2` is equivalent to `order by 2`). You can additionally specify the ordering type you want (ascending or descending) to return the desired row order.

Let’s take a look at a practical example using ORDER BY.
##### ORDER BY example[​](#order-by-example "Direct link to ORDER BY example")

```sql
select
    date_trunc('month', order_date) as order_month,
    round(avg(amount)) as avg_order_amount
from {{ ref('orders') }}
group by 1
order by 1 desc
```

This query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return the rounded average order amount per order month in descending order:

| order\_month | avg\_order\_amount |
| ------------ | ------------------ |
| 2018-04-01 | 17 |
| 2018-03-01 | 18 |
| 2018-02-01 | 15 |
| 2018-01-01 | 17 |

#### SQL ORDER BY syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#sql-order-by-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to SQL ORDER BY syntax in Snowflake, Databricks, BigQuery, and Redshift")

Since the ORDER BY clause is a SQL fundamental, data warehouses, including Snowflake, Databricks, Google BigQuery, and Amazon Redshift, all support the ability to add ORDER BY clauses in queries and window functions.

#### ORDER BY use cases[​](#order-by-use-cases "Direct link to ORDER BY use cases")

We most commonly see the ORDER BY clause used in data work to:

* Analyze data for both initial exploration of raw data sources and ad hoc querying of [mart datasets](https://docs.getdbt.com/best-practices/how-we-structure/4-marts.md)
* Identify the top 5/10/50/100 of a dataset when used in pair with a [LIMIT](https://docs.getdbt.com/sql-reference/limit.md)
* (For Snowflake) Optimize the performance of large incremental models that use both a `cluster_by` [configuration](https://docs.getdbt.com/reference/resource-configs/snowflake-configs.md#using-cluster_by) and an ORDER BY statement
* Control the ordering of window function partitions (ex. `row_number() over (partition by user_id order by updated_at)`)

This isn’t an extensive list of where your team may be using the ORDER BY clause throughout your dbt models, ad hoc queries, and BI tool logic, but it contains some common scenarios analytics engineers face day-to-day.

---

### SQL OUTER JOIN

SQL full outer joins exist and therefore we have to talk about them, but they’re *highly unlikely* to be a join you regularly leverage in your data work. In plain terms, a SQL full outer join is a join between two tables that returns *all rows* from both tables, regardless of join key match success; compare this to [left](https://docs.getdbt.com/sql-reference/left-join.md), [inner](https://docs.getdbt.com/sql-reference/inner-join.md), or [right joins](https://docs.getdbt.com/sql-reference/right-join.md) that require matches to be successful to return certain rows.

In this page, we’ll unpack how to create a full outer join and demonstrate when you might need one in your analytics engineering work.

#### How to create a full outer join[​](#how-to-create-a-full-outer-join "Direct link to How to create a full outer join")

Like all joins, you need some database objects (ie tables/views), keys to join on, and a [select statement](https://docs.getdbt.com/sql-reference/select.md) to perform a full outer join:

```sql
select
    <fields>
from <table_1> as t1
full outer join <table_2> as t2
on t1.id = t2.id
```

In this example above, there’s only one field being used to join the tables together; if you’re joining between database objects that require multiple fields, you can leverage AND/OR operators, and more preferably, surrogate keys.
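The full outer join semantics can be checked hands-on with Python's `sqlite3` and two small hypothetical tables. One wrinkle: SQLite only gained FULL OUTER JOIN in version 3.39, so the sketch emulates it portably with a LEFT JOIN unioned with the unmatched right-side rows — the result is identical to what a warehouse's FULL OUTER JOIN returns:

```python
import sqlite3

# Emulate FULL OUTER JOIN: all rows from both tables, matched where possible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table car_type (user_id int, car_type text);
    create table car_color (user_id int, car_color text);
    insert into car_type values (1, 'van'), (2, 'sedan'), (3, 'truck');
    insert into car_color values (1, 'red'), (3, 'green'), (4, 'yellow');
""")

# Part 1: LEFT JOIN keeps every car_type row (matched or not).
# Part 2: adds the car_color rows with no car_type match (user_id 4).
rows = conn.execute("""
    select t.user_id, t.car_type, c.car_color
    from car_type as t
    left join car_color as c on t.user_id = c.user_id
    union all
    select c.user_id, null, c.car_color
    from car_color as c
    left join car_type as t on t.user_id = c.user_id
    where t.user_id is null
    order by 1
""").fetchall()
print(rows)
```

This union trick is also a practical fallback on any engine without full outer join support, and it illustrates the section's point that different joins plus unions can substitute for one.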
You may additionally add [WHERE](https://docs.getdbt.com/sql-reference/where.md), [GROUP BY](https://docs.getdbt.com/sql-reference/group-by.md), [ORDER BY](https://docs.getdbt.com/sql-reference/order-by.md), [HAVING](https://docs.getdbt.com/sql-reference/having.md), and other clauses after your joins for filtering, ordering, and aggregation.

A note on full outer joins: it may sound obvious, but because full outer joins can return all rows between two tables, they can return *many* rows, which is not necessarily a recipe for efficiency. When you use full outer joins, you can often find alternatives using different joins or unions to bypass major inefficiencies caused by a full outer join.

##### SQL full outer join example[​](#sql-full-outer-join-example "Direct link to SQL full outer join example")

Table A `car_type`

| user\_id | car\_type |
| -------- | --------- |
| 1 | van |
| 2 | sedan |
| 3 | truck |

Table B `car_color`

| user\_id | car\_color |
| -------- | ---------- |
| 1 | red |
| 3 | green |
| 4 | yellow |

```sql
select
    car_type.user_id as user_id,
    car_type.car_type as type,
    car_color.car_color as color
from {{ ref('car_type') }} as car_type
full outer join {{ ref('car_color') }} as car_color
on car_type.user_id = car_color.user_id
order by 1
```

This simple query will return all rows from tables A and B, regardless of `user_id` match success between the two tables:

| user\_id | type | color |
| -------- | ----- | ------ |
| 1 | van | red |
| 2 | sedan | null |
| 3 | truck | green |
| 4 | null | yellow |
#### SQL full outer join use cases[​](#sql-full-outer-join-use-cases "Direct link to SQL full outer join use cases")

There will inevitably be valid use cases for full outer joins in your dbt project. However, because of the nature of dbt, which heavily encourages modularity and DRY code, the necessity for full outer joins may go down (slightly). Regardless, the two primary use cases for full outer joins we typically see are around consolidating or merging multiple entities together and data validation.

* Merging tables together: A full outer join between two tables can bring those entities together, regardless of join key match. This type of joining can often be bypassed by using different joins, unions, pivots, and a combination of these, but hey, sometimes the full outer join is a little less work 🤷
* Data validation: Full outer joins can be incredibly useful when performing data validation; for example, in the [dbt-audit-helper package](https://github.com/dbt-labs/dbt-audit-helper), a full outer join is used in the [compare\_column\_values test](https://github.com/dbt-labs/dbt-audit-helper/blob/main/macros/compare_column_values.sql) to help determine where column values are mismatched between two dbt models.

---

### SQL RANK

There are many different ranking window functions: [ROW\_NUMBER](https://docs.getdbt.com/sql-reference/row-number.md), DENSE\_RANK, RANK. Let’s start off with the most basic (RANK) and talk about what it is, how to use it, and why it’s important in analytics engineering work.

The RANK function is an effective way to create a ranked column or filter a query based on rankings.
More specifically, the RANK function returns the rank of a value (starting at 1) in an ordered group or dataset. It’s important to note that if multiple values ranked by the function are tied, they’ll receive the same rank.

#### How to use the RANK function[​](#how-to-use-the-rank-function "Direct link to How to use the RANK function")

The RANK function has a pretty simple syntax, with an optional partition field and support for ordering customization:

`rank() over ([partition by <field(s)>] order by <field(s)> [asc | desc])`

Some notes on this function’s syntax:

* The `partition by` field is optional; if you want to rank your entire dataset by certain fields (compared to partitioning *and ranking* within a dataset), you would simply omit the `partition by` from the function call (see the example below for this).
* By default, the ordering of a ranking function is set to ascending. To explicitly make the ordering descending, you’ll need to pass `desc` into the `order by` part of the function.

Let’s take a look at a practical example using the RANK function below.

##### RANK function example[​](#rank-function-example "Direct link to RANK function example")

```sql
select
    order_id,
    order_date,
    rank() over (order by order_date) as order_rank
from {{ ref('orders') }}
```

This simple query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return the rank of orders by their `order_date`:

| order\_id | order\_date | order\_rank |
| --------- | ----------- | ----------- |
| 1 | 2018-01-01 | 1 |
| 2 | 2018-01-02 | 2 |
| 3 | 2018-01-04 | 3 |
| 4 | 2018-01-05 | 4 |
| 5 | 2018-01-05 | 4 |
| 6 | 2018-01-07 | 6 |

Some notes on these results:

* Orders that have the same `order_date` (ex. orders 4 and 5) have the same `order_rank` (4).
* Order 6’s `order_rank` is 6 (if you wanted the rank after the tie to continue at 5, you would use the DENSE\_RANK function).
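The gap-versus-no-gap behavior of RANK and DENSE\_RANK is easiest to see side by side. Here's a minimal sketch using Python's built-in `sqlite3` (window functions require SQLite 3.25+) with a handful of hypothetical orders matching the table above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table orders (order_id integer, order_date text)")
con.executemany(
    "insert into orders values (?, ?)",
    [(1, '2018-01-01'), (2, '2018-01-02'), (3, '2018-01-04'),
     (4, '2018-01-05'), (5, '2018-01-05'), (6, '2018-01-07')],
)

# RANK leaves a gap after ties; DENSE_RANK does not.
rows = con.execute("""
    select
        order_id,
        rank()       over (order by order_date) as order_rank,
        dense_rank() over (order by order_date) as order_dense_rank
    from orders
    order by order_date, order_id
""").fetchall()

for row in rows:
    print(row)
```

Orders 4 and 5 tie at rank 4 under both functions; order 6 then gets rank 6 from RANK but rank 5 from DENSE\_RANK.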
**Ranking functions to know**: RANK is just one of the ranking functions that analytics engineering practitioners will use throughout their data models. There’s also DENSE\_RANK and [ROW\_NUMBER](https://docs.getdbt.com/sql-reference/row-number.md), which rank rows differently than RANK.

#### RANK syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#rank-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to RANK syntax in Snowflake, Databricks, BigQuery, and Redshift")

Most, if not all, modern data warehouses support RANK and other similar ranking functions; the syntax is also the same across them. Use the table below to read more on the documentation for the RANK function in your data warehouse.

| Data warehouse | RANK support? |
| --------------- | ------------- |
| Snowflake | ✅ |
| Databricks | ✅ |
| Amazon Redshift | ✅ |
| Google BigQuery | ✅ |

#### RANK function use cases[​](#rank-function-use-cases "Direct link to RANK function use cases")

We most commonly see the RANK function used in data work:

* In [SELECT statements](https://docs.getdbt.com/sql-reference/select.md) to add explicit ranking to rows
* In QUALIFY statements to filter a query on a ranking without having to add the rank to the query result

This isn’t an extensive list of where your team may be using the RANK function throughout your dbt models and BI tool logic, but it contains some common scenarios analytics engineers face day-to-day.
---

### SQL Reference

#### [🗃️ Statements](https://docs.getdbt.com/sql-reference/select.md)

[5 items](https://docs.getdbt.com/sql-reference/select.md)

---

### SQL RIGHT JOIN

The not-as-favorite child: the right join. Unlike [left joins](https://docs.getdbt.com/sql-reference/left-join.md) that return all rows in the database object in [the FROM statement](https://docs.getdbt.com/sql-reference/from.md), regardless of match in the left join object, right joins return all rows *in the right join database object*, regardless of match in the database object in the FROM statement.

What you really need to know: You can accomplish anything a right join does with a left join, and left joins are typically more readable and intuitive. However, we’ll still walk you through how to use right joins and elaborate on why we think left joins are superior 😉

#### How to create a right join[​](#how-to-create-a-right-join "Direct link to How to create a right join")

Like all joins, you need some database objects (ie tables/views), keys to join on, and a [select statement](https://docs.getdbt.com/sql-reference/select.md) to perform a right join:

```text
select
    <fields>
from <table_1> as t1
right join <table_2> as t2
on t1.id = t2.id
```

In the example above, there’s only one field from each table being used to join the two together; if you’re joining between two database objects that require multiple fields, you can leverage AND/OR operators, and more preferably, surrogate keys. You may additionally add [WHERE](https://docs.getdbt.com/sql-reference/where.md), [GROUP BY](https://docs.getdbt.com/sql-reference/group-by.md), [ORDER BY](https://docs.getdbt.com/sql-reference/order-by.md), [HAVING](https://docs.getdbt.com/sql-reference/having.md), and other clauses after your joins for filtering, ordering, and aggregation. You may also include as many right joins (or any joins, really) as you’d like in an individual query or CTE.
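Since a right join is just a left join with the two tables swapped, the rewrite is mechanical. Here's a minimal sketch using Python's built-in `sqlite3` showing the left-join form (SQLite only gained RIGHT JOIN in 3.39, which itself illustrates why the left-join rewrite is the safer habit), using the same hypothetical `car_type`/`car_color` data as the example below:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table car_type  (user_id integer, car_type text);
    create table car_color (user_id integer, car_color text);
    insert into car_type  values (1, 'van'), (2, 'sedan'), (3, 'truck');
    insert into car_color values (1, 'red'), (3, 'green'), (4, 'yellow');
""")

# `car_type RIGHT JOIN car_color` rewritten as `car_color LEFT JOIN car_type`:
# every row of car_color is kept; car_type columns are null when unmatched.
rows = con.execute("""
    select car_color.user_id, car_type.car_type, car_color.car_color
    from car_color
    left join car_type on car_type.user_id = car_color.user_id
    order by 1
""").fetchall()

for row in rows:
    print(row)
```

The output matches the right-join result table in the example below: all three `car_color` rows survive, and `user_id` 2 (which exists only in `car_type`) is dropped.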
##### SQL right join example[​](#sql-right-join-example "Direct link to SQL right join example")

Table A `car_type`

| **user\_id** | **car\_type** |
| ------------ | ------------- |
| 1 | van |
| 2 | sedan |
| 3 | truck |

Table B `car_color`

| **user\_id** | **car\_color** |
| ------------ | -------------- |
| 1 | red |
| 3 | green |
| 4 | yellow |

```sql
select
    car_type.user_id as user_id,
    car_type.car_type as type,
    car_color.car_color as color
from {{ ref('car_type') }} as car_type
right join {{ ref('car_color') }} as car_color
on car_type.user_id = car_color.user_id
```

This simple query will return *all* rows from Table B and add the `color` column to rows where there’s a successful match to Table A:

| **user\_id** | **type** | **color** |
| ------------ | -------- | --------- |
| 1 | van | red |
| 3 | truck | green |
| 4 | null | yellow |

Because there’s no `user_id` = 4 in Table A, there is no `type` available, thus a null `type` column for `user_id` 4. Since no `user_id` = 2 exists in Table B, and that id is not in the right join database object, no rows with a `user_id` of 2 will be returned.

#### SQL right join use cases[​](#sql-right-join-use-cases "Direct link to SQL right join use cases")

Compared to left joins, you likely won’t see right joins as often (or ever) in data modeling and analytics engineering work. But why not? Simply because right joins are a little less intuitive than a left join. When you’re data modeling, you’re usually focused on one database object, and adding the supplementary data or tables you need to give you a final dataset.
That one focal database object is typically what is put in the `from {{ ref('my_database_object') }}`; any other columns that are joined onto it from other tables are usually supplementary, but keeping all the rows from the initial table of focus is usually the priority. Don’t get us wrong—right joins can get you there—it’s likely just a little less intuitive and can get complex with queries that involve multiple joins.

---

### SQL ROUND

If you’re reading this, that probably means you’re a data person. And as a data person who’s likely modeling data for analytics use cases, you’re going to need to round data from time to time.

For the unacquainted, "rounding" is making a number simpler so that it’s easier to understand while keeping it close to its original value. In data, a common use case for rounding is to decrease the number of decimal places a numeric record has. To round numeric fields or values in SQL, you’re going to use the handy ROUND function.

#### How to use the SQL ROUND function[​](#how-to-use-the-sql-round-function "Direct link to How to use the SQL ROUND function")

The syntax for using the ROUND function looks like the following:

```sql
round(<numeric field or value> [, <number of decimal places>])
```

In this function, you’ll need to input the *numeric* field or data you want rounded and pass in an optional number of decimal places to round your field by. For most data warehouses, the number of decimal places defaults to 0 or 1, meaning if you rounded 20.00 using `round(20.00)`, it would return 20 or 20.0 (depending on your data warehouse).
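As a quick check of that default behavior, here's a minimal sketch using Python's built-in `sqlite3` (SQLite's `round` defaults to 0 decimal places and returns a float; your warehouse may differ slightly, as noted above):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# With no second argument, round() defaults to 0 decimal places.
default_rounded = con.execute("select round(20.00)").fetchone()[0]

# With a second argument, round to that many decimal places.
one_decimal = con.execute("select round(17.456, 1)").fetchone()[0]

print(default_rounded)  # 20.0 (SQLite returns a float)
print(one_decimal)      # 17.5
```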
##### SQL ROUND function example[​](#sql-round-function-example "Direct link to SQL ROUND function example")

You can round some of the numeric fields of the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` model using the following code:

```sql
select
    cast(order_id as string) as order_id,
    order_date,
    amount,
    round(amount, 1) as rounded_amount
from {{ ref('orders') }}
```

After running this query, the resulting `orders` table will look a little something like this:

| order\_id | order\_date | amount | rounded\_amount |
| --------- | ----------- | --------- | --------------- |
| 1 | 2018-01-01 | 10.000000 | 10.0 |
| 2 | 2018-01-02 | 20.000000 | 20.0 |
| 3 | 2018-01-04 | 1.000000 | 1.0 |

The new `rounded_amount` column is the `amount` field rounded to 1 decimal place.

For most data warehouses, the type of the data returned from the ROUND function should be the same as the input data: if you input a float into the ROUND function, the returned rounded number should also be a float.

#### SQL ROUND function syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#sql-round-function-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to SQL ROUND function syntax in Snowflake, Databricks, BigQuery, and Redshift")

Google BigQuery, Amazon Redshift, Snowflake, and Databricks all support the ability to round numeric columns and data. In addition, the syntax to round is the same across all of them using the ROUND function.
#### ROUND function use cases[​](#round-function-use-cases "Direct link to ROUND function use cases")

If you find yourself rounding numeric data, either in data models or ad-hoc analyses, you’re probably rounding to improve the readability and usability of your data in downstream [intermediate](https://docs.getdbt.com/best-practices/how-we-structure/3-intermediate.md) or [mart models](https://docs.getdbt.com/best-practices/how-we-structure/4-marts.md). Specifically, you’ll likely use the ROUND function to:

* Make numeric calculations using division or averages a little cleaner and easier to understand
* Create concrete buckets of data for a cleaner distribution of values during ad-hoc analysis

You’ll additionally likely see the ROUND function used in your BI tool, as it generates clean, rounded numbers for business users to interact with.

---

### SQL ROW_NUMBER

In this page, let’s go deep into the ROW\_NUMBER function and talk about what it is, how to use it, and why it’s important in analytics engineering work.

The ROW\_NUMBER window function is an effective way to create a ranked column or filter a query based on rankings. More specifically, the ROW\_NUMBER function returns the *unique* row number of a row in an ordered group or dataset. Unlike the [RANK](https://docs.getdbt.com/sql-reference/rank.md) and DENSE\_RANK functions, ROW\_NUMBER is non-deterministic, meaning that a *unique* number is assigned arbitrarily for rows with duplicate values.
#### How to use the ROW\_NUMBER function[​](#how-to-use-the-row_number-function "Direct link to How to use the ROW_NUMBER function")

The ROW\_NUMBER function has a pretty simple syntax, with an optional partition field and support for ordering customization:

`row_number() over ([partition by <field(s)>] order by <field(s)> [asc | desc])`

Some notes on this function’s syntax:

* The `partition by` field is optional; if you want to get the row numbers of your entire dataset (compared to grabbing row numbers within a group of rows in your dataset), you would simply omit the `partition by` from the function call.
* By default, the ordering of a ROW\_NUMBER function is set to ascending. To explicitly make the resulting order descending, you’ll need to pass `desc` into the `order by` part of the function.

Let’s take a look at a practical example using the ROW\_NUMBER function below.

##### ROW\_NUMBER function example[​](#row_number-function-example "Direct link to ROW_NUMBER function example")

```sql
select
    customer_id,
    order_id,
    order_date,
    row_number() over (partition by customer_id order by order_date) as row_n
from {{ ref('orders') }}
order by 1
```

This simple query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return the unique row number per customer by their `order_date`:

| customer\_id | order\_id | order\_date | row\_n |
| ------------ | --------- | ----------- | ------ |
| 1 | 1 | 2018-01-01 | 1 |
| 1 | 37 | 2018-02-10 | 2 |
| 2 | 8 | 2018-01-11 | 1 |
| 3 | 2 | 2018-01-02 | 1 |
| 3 | 24 | 2018-01-27 | 2 |
| 3 | 69 | 2018-03-11 | 3 |

Because ROW\_NUMBER is non-deterministic, orders per customer that have the same `order_date` would still receive unique `row_n` values (unlike if you used the RANK or DENSE\_RANK functions).
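A common application of this pattern is keeping exactly one row per partition — for example, each customer's earliest order. SQLite has no QUALIFY clause, so this minimal sketch using Python's built-in `sqlite3` filters on the row number in an outer query instead (hypothetical orders data matching the table above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table orders (customer_id integer, order_id integer, order_date text)"
)
con.executemany(
    "insert into orders values (?, ?, ?)",
    [(1, 1, '2018-01-01'), (1, 37, '2018-02-10'),
     (2, 8, '2018-01-11'),
     (3, 2, '2018-01-02'), (3, 24, '2018-01-27'), (3, 69, '2018-03-11')],
)

# Keep only the first order per customer: row_number() = 1 in each partition.
rows = con.execute("""
    select customer_id, order_id, order_date
    from (
        select
            *,
            row_number() over (
                partition by customer_id order by order_date
            ) as row_n
        from orders
    )
    where row_n = 1
    order by customer_id
""").fetchall()

for row in rows:
    print(row)
```

On warehouses that support QUALIFY (Snowflake, Databricks, BigQuery), the subquery collapses into a single `qualify row_n = 1` clause.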
#### ROW\_NUMBER syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#row_number-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to ROW_NUMBER syntax in Snowflake, Databricks, BigQuery, and Redshift")

Most, if not all, modern data warehouses support ROW\_NUMBER and other similar ranking functions; the syntax is also the same across them. Use the table below to read more on the documentation for the ROW\_NUMBER function in your data warehouse.

| Data warehouse | ROW\_NUMBER support? |
| -------------- | -------------------- |
| [Snowflake](https://docs.snowflake.com/en/sql-reference/functions/row_number.html) | ✅ |
| [Databricks](https://docs.databricks.com/sql/language-manual/functions/row_number.html) | ✅ |
| [Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_WF_ROW_NUMBER.html) | ✅ |
| [Google BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/numbering_functions#row_number) | ✅ |

#### ROW\_NUMBER function use cases[​](#row_number-function-use-cases "Direct link to ROW_NUMBER function use cases")

We most commonly see the ROW\_NUMBER function used in data work:

* In [SELECT statements](https://docs.getdbt.com/sql-reference/select.md) to add explicit and unique row numbers in a group of data or across an entire table
* Paired with a QUALIFY statement, to filter CTEs, queries, or models down to one unique row per specified partition. This is particularly useful when you need to remove duplicate rows from a dataset (but use this wisely!).

This isn’t an extensive list of where your team may be using the ROW\_NUMBER function throughout your dbt models and BI tool logic, but it contains some common scenarios analytics engineers face day-to-day.
---

### SQL SELECT

My goodness, would there even be modern data teams without SQL SELECT statements? Probably not. Luckily, we live in a world of tabular data, cloud data warehouses, and SQL prowess.

Analysts and analytics engineers are writing queries, creating data models, and leveraging SQL to power their [data transformations](https://www.getdbt.com/analytics-engineering/transformation/) and analysis. But what makes these queries possible? SELECT statements.

The SQL SELECT statement is the fundamental building block of any query: it allows you to select specific columns (data) from a database schema object (table/view). In a dbt project, a SQL dbt model is technically a singular SELECT statement (often built leveraging CTEs or subqueries).

#### How to use SELECT[​](#how-to-use-select "Direct link to How to use SELECT")

Any query begins with a simple SELECT statement:

```sql
select
    order_id, --your first column you want selected
    customer_id, --your second column you want selected
    order_date --your last column you want selected (and so on)
from {{ ref('orders') }} --the table/view/model you want to select from
limit 3
```

This basic query is selecting three columns from the [jaffle shop’s](https://github.com/dbt-labs/jaffle_shop/blob/main/models/orders.sql) `orders` table and returning three rows. If you execute this query in your data warehouse, it will return a result looking like this:

| order\_id | customer\_id | order\_date |
| --------- | ------------ | ----------- |
| 1 | 1 | 2018-01-01 |
| 2 | 3 | 2018-01-02 |
| 3 | 95 | 2018-01-04 |
You may also commonly see queries that `select * from table_name`. The asterisk or star is telling the compiler to select all columns from a specified table or view.

**Goodbye carpal tunnel**: Leverage [dbt utils’ star macro](https://docs.getdbt.com/blog/star-sql-love-letter) to easily select many columns while explicitly excluding specific ones.

In a dbt project, analytics engineers will typically write models that contain multiple CTEs that build to one greater query. For folks that are newer to analytics engineering or dbt, we recommend they check out the [“How we structure our dbt projects” guide](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview.md) to better understand why analytics folks like modular data modeling and CTEs.

#### SELECT statement syntax in Snowflake, Databricks, BigQuery, and Redshift[​](#select-statement-syntax-in-snowflake-databricks-bigquery-and-redshift "Direct link to SELECT statement syntax in Snowflake, Databricks, BigQuery, and Redshift")

While we know the data warehouse players like to have their own slightly different flavors and syntax for SQL, they have conferred together that the SELECT statement is sacred and unchangeable. As a result, writing the actual `select…from` statement across Snowflake, Databricks, Google BigQuery, and Amazon Redshift would look the same. However, the actual SQL manipulation of data within the SELECT statement (ex. adding dates, casting columns) might look slightly different between each data warehouse.

---

### SQL SELF JOINS

Simultaneously the easiest and most confusing of joins: the self join.
Simply put, a self join allows you to join a dataset back onto itself. If you’re newer to data work or SQL, you may be asking yourself: why in the world would you want to do this? Shouldn’t joins happen between multiple *different* entities? The majority of joins you see in analytics work and dbt projects will probably be left and inner joins, but occasionally, depending on how the raw source table is built out, you’ll leverage a self join. One of the most common use cases for a self join is when a table contains a foreign key to the primary key of that same table.

It’s ok if none of that made sense—jump into this page to better understand how and where you might use a self join in your analytics engineering work.

#### How to create a self join[​](#how-to-create-a-self-join "Direct link to How to create a self join")

No funny venn diagrams here—there’s actually not even any special syntax for self joins. To create a self join, you’ll use regular join syntax; the only difference is that the joined objects are *the same*:

```text
select
    <fields>
from <table> as t1
[<join type>] join <table> as t2
on t1.id = t2.id
```

Since you can choose the dialect of join for a self join, you can specify if you want to do a [left](https://docs.getdbt.com/sql-reference/left-join.md), [outer](https://docs.getdbt.com/sql-reference/outer-join.md), [inner](https://docs.getdbt.com/sql-reference/inner-join.md), [cross](https://docs.getdbt.com/sql-reference/cross-join.md), or [right join](https://docs.getdbt.com/sql-reference/right-join.md) in the join statement.

##### SQL self join example[​](#sql-self-join-example "Direct link to SQL self join example")

Given a `products` table that looks like this, where there exists both a primary key (`sku_id`) and a foreign key (`parent_id`) to that primary key:

| **sku\_id** | **sku\_name** | **parent\_id** |
| ----------- | ------------- | -------------- |
| 1 | Lilieth Bed | 4 |
| 2 | Holloway Desk | 3 |
| 3 | Basic Desk | null |
| 4 | Basic Bed | null |
| | | | | | | ---------------- | - | - | - | - | | Loading table... | | | | | And this query utilizing a self join to join `parent_name` onto skus: ```sql select products.sku_id, products.sku_name, products.parent_id, parents.sku_name as parent_name from {{ ref('products') }} as products left join {{ ref('products') }} as parents on products.parent_id = parents.sku_id ``` This query utilizing a self join adds the `parent_name` of skus that have non-null `parent_ids`: | sku\_id | sku\_name | parent\_id | parent\_name | | ------- | ------------- | ---------- | ------------ | | 1 | Lilieth Bed | 4 | Basic Bed | | 2 | Holloway Desk | 3 | Basic Desk | | 3 | Basic Desk | null | null | | 4 | Basic Bed | null | null | Search table... | | | | | | | ---------------- | - | - | - | - | | Loading table... | | | | | #### SQL self join use cases[​](#sql-self-join-use-cases "Direct link to SQL self join use cases") Again, self joins are probably rare in your dbt project and will most often be utilized in tables that contain a hierarchical structure, such as consisting of a column which is a foreign key to the primary key of the same table. If you do have use cases for self joins, such as in the example above, you’ll typically want to perform that self join early upstream in your DAG, such as in a [staging](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md) or [intermediate](https://docs.getdbt.com/best-practices/how-we-structure/3-intermediate.md) model; if your raw, unjoined table is going to need to be accessed further downstream sans self join, that self join should happen in a modular intermediate model. You can also use self joins to create a cartesian product (aka a cross join) of a table against itself. Again, slim use cases, but still there for you if you need it 😉 #### Was this page helpful? 
---

### SQL Strings

We can almost guarantee that there is not a single dbt model or table in your database that doesn’t have at least one column of a string type. Strings are everywhere in data—they allow folks to have descriptive text field columns, use regex in their data work, and honestly, they just make the data world go ‘round.

Below, we’ll unpack the different string formats you might see in a modern cloud data warehouse and common use cases for strings.

#### Using SQL strings[​](#using-sql-strings "Direct link to Using SQL strings")

Strings are inherent in your data—they’re the name fields that someone inputs when they sign up for an account, they represent the item someone bought from your ecommerce store, they describe the customer address, and so on.

To formalize it a bit, a string type is a word, or the combination of characters, that you’ll typically see encased in single quotes (ex. 'Jaffle Shop', '1234 Shire Lane', 'Plan A').
Most often, when you’re working with strings in a dbt model or query, you’re:

* Changing the casing (uppering/lowering) to create some standard for your string type columns in your data warehouse
* Concatenating strings together to create more robust, uniform, or descriptive string values
* Unnesting JSON or more complex structured data objects and converting those values to explicit strings
* Casting a column of a different type to a string for better compatibility or usability in a BI tool
* Filtering queries on certain string values
* Creating a new string column based on a CASE WHEN statement to bucket data
* Splitting a string into a substring

This is not an exhaustive list of string functionality or use cases, but contains some common scenarios analytics engineers face day-to-day.

##### Strings in an example query[​](#strings-in-an-example-query "Direct link to Strings in an example query")

```sql
select
    date_trunc('month', order_date)::string as order_month,
    round(avg(amount)) as avg_order_amount
from {{ ref('orders') }}
where status not in ('returned', 'return_pending')
group by 1
```

This query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return the order month as a string and the rounded average order amount for only orders whose statuses are not `returned` or `return_pending`:

| order\_month | avg\_order\_amount |
| ------------ | ------------------ |
| 2018-01-01 | 18 |
| 2018-02-01 | 15 |
| 2018-03-01 | 18 |
| 2018-04-01 | 17 |

#### String support in Snowflake, Databricks, BigQuery, and Redshift[​](#string-support-in-snowflake-databricks-bigquery-and-redshift "Direct link to String support in Snowflake, Databricks, BigQuery, and Redshift")

Snowflake, Databricks, Google BigQuery, and Amazon Redshift all support the string [data type](https://docs.getdbt.com/sql-reference/data-types.md#string-data-types).
They may have slightly varying sub-types for strings; some data warehouses, such as Snowflake and Redshift, support text, char, and character string types, which typically differ from the generic string type in byte length. Again, since most string type columns are inherent in your data, you’ll likely be ok using generic varchars or strings for casting, but it never hurts to read up on the docs specific to your data warehouse’s string support!

---

### SQL SUM

The SQL SUM function is handy and ever-present in data work. Let’s unpack what it is, how to use it, and why it’s valuable.

Jumping into it, the SUM aggregate function allows you to calculate the sum of a numeric column or across a set of rows for a column. Ultimately, the SUM function is incredibly useful for calculating meaningful business metrics, such as Lifetime Value (LTV), and creating key numeric fields in [`fct_` and `dim_` models](https://www.getdbt.com/blog/guide-to-dimensional-modeling).

#### How to use the SUM function in a query[​](#how-to-use-the-sum-function-in-a-query "Direct link to How to use the SUM function in a query")

Use the following syntax in a query to find the sum of a numeric field:

`sum(<field name>)`

Since SUM is an aggregate function, you’ll need a GROUP BY statement in your query if you’re looking at sums broken out by dimension(s). If you’re calculating the standalone sum of a numeric field without the need to break it down by another field, you don’t need a GROUP BY statement. SUM can also be used as a window function to operate across specified or partitioned rows.
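That windowed form is what powers running totals. Here's a minimal sketch using Python's built-in `sqlite3` (window functions require SQLite 3.25+) with hypothetical per-customer order amounts:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table orders (customer_id integer, order_date text, amount integer)"
)
con.executemany(
    "insert into orders values (?, ?, ?)",
    [(1, '2018-01-01', 10), (1, '2018-02-10', 23),
     (3, '2018-01-02', 20), (3, '2018-01-27', 25), (3, '2018-03-11', 20)],
)

# sum() as a window function: a running total per customer, ordered by date.
rows = con.execute("""
    select
        customer_id,
        order_date,
        sum(amount) over (
            partition by customer_id
            order by order_date
        ) as running_total
    from orders
    order by customer_id, order_date
""").fetchall()

for row in rows:
    print(row)
```

Each customer's final `running_total` equals their plain `sum(amount)` aggregate, but the window form keeps one row per order along the way.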
You can additionally pass a DISTINCT statement into a SUM function to only sum the distinct values in a column. Let’s take a look at a practical example using the SUM function below.

##### SUM example

The following example is querying from a sample dataset created by dbt Labs called [jaffle\_shop](https://github.com/dbt-labs/jaffle_shop):

```sql
select
    customer_id,
    sum(order_amount) as all_orders_amount
from {{ ref('orders') }}
group by 1
limit 3
```

This simple query returns the summed amount of all orders for a customer in the Jaffle Shop’s `orders` table:

| customer_id | all_orders_amount |
| ----------- | ----------------- |
| 1  | 33 |
| 3  | 65 |
| 94 | 24 |

#### SQL SUM function syntax in Snowflake, Databricks, BigQuery, and Redshift

All modern data warehouses support the SUM function and follow the same syntax for it.

#### SUM function use cases

We most commonly see queries using SUM to:

* Calculate conditional counts or sums of a metric across a customer/user id using a CASE WHEN statement (ex. `sum(case when order_array is not null then 1 else 0 end) as count_orders`)
* Create [dbt metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md) for key business values, such as LTV
* Calculate the total of a field across a dimension (ex.
total session time, total time spent per ticket) that you typically use in `fct_` or `dim_` models
* Summing clicks, spend, impressions, and other key ad reporting metrics in tables from ad platforms

This isn’t an exhaustive list of where your team may be using SUM throughout your development work, dbt models, and BI tool logic, but it contains some common scenarios analytics engineers face day-to-day.

---

### SQL TRIM

We’ve been there: pesky blank spaces, weird, inconsistent formats, or unaccountable asterisks hiding at the end of your column values. [Strings](https://docs.getdbt.com/sql-reference/strings.md) are one of the most variable data types in your datasets: they likely lack a uniform casing, vary in length, and will inevitably have characters you need to trim from them.

Introducing: the SQL TRIM function, which removes the leading and trailing characters of a string. By default, it removes the blank space character from the beginning and end of a string.

#### How to use the SQL TRIM function

The syntax for using the TRIM function looks like the following:

```sql
trim(<expression> [, <characters_to_remove>])
```

Like we said earlier, the default `<characters_to_remove>` is a blank space, such that if you were to `trim(' string with extra leading space')` it would return `'string with extra leading space'`. You can explicitly specify single characters or a set of characters to trim from your strings.
##### SQL TRIM function example

```sql
select
    first_name,
    concat('*', first_name, '**') as test_string,
    trim(test_string, '*') as back_to_first_name
from {{ ref('customers') }}
limit 3
```

After running this query against the `customers` table, the result will look like this:

| first_name | test_string | back_to_first_name |
| ---------- | ----------- | ------------------ |
| Julia | \*Julia\*\* | Julia |
| Max   | \*Max\*\*   | Max   |
| Laura | \*Laura\*\* | Laura |

In this query, you’re adding superfluous asterisks to a string using the [CONCAT function](https://docs.getdbt.com/sql-reference/concat.md) and then cleaning it back up using the TRIM function. Even though only a single asterisk is specified in the TRIM function itself, TRIM treats it as a set of characters to remove from the beginning and end of a string, which is why the double asterisks (\*\*) were removed from the end of the `test_string` column.

#### SQL TRIM function syntax in Snowflake, Databricks, BigQuery, and Redshift

[Google BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#trim), [Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_TRIM.html), [Snowflake](https://docs.snowflake.com/en/sql-reference/functions/trim.html), and [Databricks](https://docs.databricks.com/sql/language-manual/functions/trim.html) all support the TRIM function, and the syntax to trim strings is the same across all of them. These data warehouses also support the RTRIM and LTRIM functions, which allow you to trim characters from only the right side or only the left side of a string, respectively.
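To sketch the RTRIM and LTRIM variants quickly, here are two hypothetical literal strings (made up for illustration) trimmed from one side only:

```sql
select
    ltrim('**jaffle', '*') as left_trimmed,   -- 'jaffle'
    rtrim('jaffle**', '*') as right_trimmed   -- 'jaffle'
```

Note that `ltrim` leaves any trailing characters in place and `rtrim` leaves any leading ones, which is why you’d reach for plain TRIM when both ends need cleanup.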
#### TRIM function use cases

If string values in your raw data have extra white spaces or miscellaneous characters, you’ll leverage the TRIM (and related RTRIM and LTRIM) functions to help you quickly remove them. You’ll likely do this cleanup in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md), where you’re probably standardizing casing and making other minor formatting changes to string values, so you can use a clean and consistent format across your downstream models.

---

### SQL UPPER

UPPER is the counterpart to [LOWER](https://docs.getdbt.com/sql-reference/lower.md) (who would have guessed?), and the two are probably the most intuitive of SQL functions. Using the UPPER function on a string value will return the input as an all-uppercase string. It’s an effective way to create expected capitalization for certain string values across your data.

#### How to use the SQL UPPER function

The syntax for using the UPPER function looks like the following:

```sql
upper(<string_column>)
```

Executing this in a SELECT statement will return the uppercase version of the input string value. You may additionally use the UPPER function in WHERE clauses and on join values. Below, we’ll walk through a practical example using the UPPER function.
##### SQL UPPER function example

You can uppercase the first name of the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `customers` model using the following code:

```sql
select
    customer_id,
    upper(first_name) as first_name,
    last_name
from {{ ref('customers') }}
```

After running this query, the `customers` table will look a little something like this:

| customer_id | first_name | last_name |
| ----------- | ---------- | --------- |
| 1 | MICHAEL  | P. |
| 2 | SHAWN    | M. |
| 3 | KATHLEEN | P. |

Now, all characters in `first_name` are uppercase (and `last_name` is unchanged).

> Changing string columns to uppercase to create uniformity across data sources typically happens in our [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging.md). There are a few reasons for that: data cleanup and standardization, such as aliasing, casting, and lower or upper casing, should ideally happen in staging models to create downstream uniformity and improve downstream performance.

#### SQL UPPER function syntax in Snowflake, Databricks, BigQuery, and Redshift

Google BigQuery, Amazon Redshift, Snowflake, Postgres, and Databricks all support the UPPER function, and the syntax for it is the same across all of them.

#### UPPER function use cases

Common use cases for the UPPER function include:
* Uppercasing country codes in data sources to meet user expectations
* Creating a consistent capitalization format for string values in your data models, which also sets expectations for business users in your BI tool

There will most likely never be 100% consistency in your data models, but doing all that you can to mitigate that chaos will hopefully make your life and the lives of your business users a little easier. Use the UPPER function to create a consistent casing for all strings in your data sources.

---

### SQL WHERE

If the humble [SELECT statement](https://docs.getdbt.com/sql-reference/select.md) is an analytics engineer’s kitchen knife, the WHERE clause is the corresponding knife sharpener: no (good) cooking (or data modeling) is happening without it.

The WHERE clause is a fundamental SQL statement: it allows you to appropriately filter your data models and queries, so you can look at specific subsets of data based on your requirements.

#### How to use the SQL WHERE clause

The syntax for using a WHERE clause in a SELECT statement looks like the following:

```sql
select
    order_id,
    customer_id,
    amount
from {{ ref('orders') }}
where status != 'returned'
```

In this query, you’re filtering for any order from the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` model whose status is not `returned` by adding a WHERE clause after the FROM statement. You could additionally filter on string, numeric, date, or other data types to meet your query conditions.
You will likely see WHERE clauses show up 99.99% of the time in a typical query or dbt model. The other 0.01% is probably in a DML statement, such as DELETE or UPDATE, to modify specific rows in tables.

#### SQL WHERE clause syntax in Snowflake, Databricks, BigQuery, and Redshift

Since the WHERE clause is a SQL fundamental, Google BigQuery, Amazon Redshift, Snowflake, and Databricks all support the ability to filter queries and data models using it. In addition, the WHERE clause syntax is the same across all of them.

#### SQL WHERE clause use cases

WHERE clauses are probably some of the most widely used SQL capabilities, right after SELECT and FROM statements. Below is a non-exhaustive list of where you’ll commonly see WHERE clauses throughout dbt projects and data work:

* Removing source-deleted rows from staging models to increase accuracy and improve downstream model performance
* Filtering out employee records from models
* Performing ad-hoc analysis on specific rows or users, either in a dbt model, BI tool, or ad-hoc query
* Pairing with IN, LIKE, and NOT IN operators to filter on more generalized patterns or specific groups of values
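The IN and LIKE pairings from the list above can be sketched against the Jaffle Shop’s `orders` model; the status values shown are the ones used in this page’s earlier examples:

```sql
-- exclude a specific group of statuses with NOT IN
select order_id, customer_id, amount
from {{ ref('orders') }}
where status not in ('returned', 'return_pending');

-- match a pattern with LIKE: catches both 'returned' and 'return_pending'
select order_id, customer_id, amount
from {{ ref('orders') }}
where status like 'return%'
```

NOT IN is the clearer choice when you can enumerate the values; LIKE earns its keep when the values share a prefix or pattern you want to match generically.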
--- ## Tags ### 10 docs tagged with "scheduler" #### [About state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) Learn about how state-aware orchestration automatically determines which models to build by detecting changes in code or data every time a job runs. --- ### 18 docs tagged with "Metrics" #### [About MetricFlow](https://docs.getdbt.com/docs/build/about-metricflow.md) Learn more about MetricFlow and its key concepts --- ### 18 docs tagged with "Quickstart" #### [Analyze your data in dbt](https://docs.getdbt.com/guides/analyze-your-data.md) Introduction --- ### 2 docs tagged with "Analyst" #### [Analyze your data in dbt](https://docs.getdbt.com/guides/analyze-your-data.md) Introduction --- ### 2 docs tagged with "API" #### [JDBC](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-jdbc.md) Integrate and use the JDBC API to query your metrics. --- ### 2 docs tagged with "Best practices" #### [How to use prompts for dbt Copilot](https://docs.getdbt.com/guides/prompt-cookbook.md) A cookbook of prompts and real-world examples to use dbt Copilot efficiently. --- ### 2 docs tagged with "CI" #### [Customizing CI/CD with custom pipelines](https://docs.getdbt.com/guides/custom-cicd-pipelines.md) Learn the benefits of version-controlled analytics code and custom pipelines in dbt for enhanced code testing and workflow automation during the development process. --- ### 2 docs tagged with "dbt Insights" #### [Access the dbt Insights interface](https://docs.getdbt.com/docs/explore/access-dbt-insights.md) Learn how to access the dbt Insights interface and run queries --- ### 2 docs tagged with "Redshift" #### [Quickstart for dbt and Redshift](https://docs.getdbt.com/guides/redshift.md) Introduction --- ### 2 docs tagged with "Troubleshooting" #### [Debug errors](https://docs.getdbt.com/guides/debug-errors.md) Learn about errors and the art of debugging them. 
--- ### 2 docs tagged with "Upgrade" #### [Upgrade to Fusion part 1: Preparing to upgrade](https://docs.getdbt.com/guides/prepare-fusion-upgrade.md) Introduction --- ### 20 docs tagged with "dbt Core" #### [BigQuery configurations](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md) Reference guide for Big Query configurations in dbt. --- ### 25 docs tagged with "dbt platform" #### [Airflow and dbt](https://docs.getdbt.com/guides/airflow-and-dbt-cloud.md) Introduction --- ### 3 docs tagged with "Agents" #### [Analyst agent](https://docs.getdbt.com/docs/dbt-ai/analyst-agent.md) Chat with your data using the Analyst agent powered by the dbt Semantic Layer --- ### 3 docs tagged with "APIs" #### [GraphQL](https://docs.getdbt.com/docs/dbt-cloud-apis/sl-graphql.md) Integrate and use the GraphQL API to query your metrics. --- ### 3 docs tagged with "BigQuery" #### [BigQuery configurations](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md) Reference guide for Big Query configurations in dbt. --- ### 3 docs tagged with "cost insights" #### [Cost Insights in the dbt platform](https://docs.getdbt.com/docs/explore/cost-insights.md) Track warehouse compute costs and understand the impact of optimizations across your dbt projects and models. --- ### 3 docs tagged with "IDE" #### [About the Studio IDE](https://docs.getdbt.com/docs/cloud/studio-ide/develop-in-studio.md) Develop, test, run, and build in the Studio IDE. You can compile dbt code into SQL and run it against your database directly --- ### 3 docs tagged with "Platform" #### [Quickstart for dbt and BigQuery](https://docs.getdbt.com/guides/bigquery.md) Introduction --- ### 4 docs tagged with "cost savings" #### [Cost Insights in the dbt platform](https://docs.getdbt.com/docs/explore/cost-insights.md) Track warehouse compute costs and understand the impact of optimizations across your dbt projects and models. 
--- ### 4 docs tagged with "dbt Fusion engine" #### [Fusion package upgrade guide](https://docs.getdbt.com/guides/fusion-package-compat.md) Learn how to upgrade your packages to be compatible with the dbt Fusion engine. --- ### 4 docs tagged with "dbt Fusion" #### [BigQuery configurations](https://docs.getdbt.com/reference/resource-configs/bigquery-configs.md) Reference guide for Big Query configurations in dbt. --- ### 4 docs tagged with "models built" #### [Cost Insights in the dbt platform](https://docs.getdbt.com/docs/explore/cost-insights.md) Track warehouse compute costs and understand the impact of optimizations across your dbt projects and models. --- ### 4 docs tagged with "Orchestration" #### [Airflow and dbt](https://docs.getdbt.com/guides/airflow-and-dbt-cloud.md) Introduction --- ### 40 docs tagged with "Semantic Layer" #### [About dbt Insights](https://docs.getdbt.com/docs/explore/dbt-insights.md) Learn how to query data and perform exploratory data analysis using dbt Insights --- ### 5 docs tagged with "AI" #### [Analyst agent](https://docs.getdbt.com/docs/dbt-ai/analyst-agent.md) Chat with your data using the Analyst agent powered by the dbt Semantic Layer --- ### 5 docs tagged with "Migration" #### [Migrate from dbt-spark to dbt-databricks](https://docs.getdbt.com/guides/migrate-from-spark-to-databricks.md) Learn how to migrate from dbt-spark to dbt-databricks. --- ### 5 docs tagged with "SAO" #### [About state-aware orchestration](https://docs.getdbt.com/docs/deploy/state-aware-about.md) Learn about how state-aware orchestration automatically determines which models to build by detecting changes in code or data every time a job runs. 
--- ### 5 docs tagged with "Snowflake" #### [Leverage dbt to generate analytics and ML-ready pipelines with SQL and Python with Snowflake](https://docs.getdbt.com/guides/dbt-python-snowpark.md) Leverage dbt to generate analytics and ML-ready pipelines with SQL and Python with Snowflake --- ### 6 docs tagged with "Databricks" #### [Databricks configurations](https://docs.getdbt.com/reference/resource-configs/databricks-configs.md) Configuring tables --- ### 6 docs tagged with "Webhooks" #### [Create Datadog events from dbt results](https://docs.getdbt.com/guides/serverless-datadog.md) Configure a serverless app to add dbt events to Datadog logs. --- ### One doc tagged with "Adapter creation" #### [Build, test, document, and promote adapters](https://docs.getdbt.com/guides/adapter-creation.md) Create an adapter that connects dbt to your platform, and learn how to maintain and version that adapter. --- ### One doc tagged with "Amazon" #### [Quickstart for dbt and Amazon Athena](https://docs.getdbt.com/guides/athena.md) Introduction --- ### One doc tagged with "Athena" #### [Quickstart for dbt and Amazon Athena](https://docs.getdbt.com/guides/athena.md) Introduction --- ### One doc tagged with "BigFrames" #### [Using BigQuery DataFrames with dbt Python models](https://docs.getdbt.com/guides/dbt-python-bigframes.md) Use this guide to help you set up dbt with BigQuery DataFrames (BigFrames). --- ### One doc tagged with "Canvas" #### [Quickstart for dbt Canvas](https://docs.getdbt.com/guides/canvas.md) Introduction --- ### One doc tagged with "Catalog" #### [Quickstart for the dbt Catalog workshop](https://docs.getdbt.com/guides/explorer-quickstart.md) Use this guide to build and define metrics, set up the dbt Semantic Layer, and query them using Google Sheets.
--- ### One doc tagged with "cost optimization" #### [Set up Cost Insights](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md) Learn how to set up Cost Insights to track warehouse compute costs and view realized savings from state-aware orchestration across your dbt projects and models. --- ### One doc tagged with "cost optimizations" #### [Cost Insights in the dbt platform](https://docs.getdbt.com/docs/explore/cost-insights.md) Track warehouse compute costs and understand the impact of optimizations across your dbt projects and models. --- ### One doc tagged with "cost reduction" #### [Set up Cost Insights](https://docs.getdbt.com/docs/explore/set-up-cost-insights.md) Learn how to set up Cost Insights to track warehouse compute costs and view realized savings from state-aware orchestration across your dbt projects and models. --- ### One doc tagged with "cost reductions" #### [Cost Insights in the dbt platform](https://docs.getdbt.com/docs/explore/cost-insights.md) Track warehouse compute costs and understand the impact of optimizations across your dbt projects and models. --- ### One doc tagged with "dbt Cloud" #### [Quickstart for the dbt Fusion engine](https://docs.getdbt.com/guides/fusion.md) Introduction --- ### One doc tagged with "dbt Copilot" #### [How to use prompts for dbt Copilot](https://docs.getdbt.com/guides/prompt-cookbook.md) A cookbook of prompts and real-world examples to use dbt Copilot efficiently. --- ### One doc tagged with "Dremio" #### [Build a data lakehouse with dbt Core and Dremio Cloud](https://docs.getdbt.com/guides/build-dremio-lakehouse.md) Learn how to build a data lakehouse with dbt Core and Dremio Cloud. --- ### One doc tagged with "Explorer" #### [Quickstart for the dbt Catalog workshop](https://docs.getdbt.com/guides/explorer-quickstart.md) Use this guide to build and define metrics, set up the dbt Semantic Layer, and query them using Google Sheets. 
--- ### One doc tagged with "Fusion" #### [Migrate to the latest YAML spec](https://docs.getdbt.com/docs/build/latest-metrics-spec.md) Learn how to migrate from the legacy metrics spec to the latest metrics spec. --- ### One doc tagged with "GCP" #### [Using BigQuery DataFrames with dbt Python models](https://docs.getdbt.com/guides/dbt-python-bigframes.md) Use this guide to help you set up dbt with BigQuery DataFrames (BigFrames). --- ### One doc tagged with "Git" #### [How to migrate git providers](https://docs.getdbt.com/faqs/Git/git-migration.md) Learn how to migrate git providers in dbt with minimal disruption. --- ### One doc tagged with "Google" #### [Using BigQuery DataFrames with dbt Python models](https://docs.getdbt.com/guides/dbt-python-bigframes.md) Use this guide to help you set up dbt with BigQuery DataFrames (BigFrames). --- ### One doc tagged with "Governance" #### [Build your metrics](https://docs.getdbt.com/docs/build/build-metrics-intro.md) Learn about MetricFlow and build your metrics with semantic models --- ### One doc tagged with "Intelligence" #### [dbt AI and intelligence](https://docs.getdbt.com/docs/dbt-ai/about-dbt-ai.md) Learn about dbt AI and intelligence --- ### One doc tagged with "Jan-1-2020" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "Jan-1-2021" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "Jinja" #### [Use Jinja to improve your SQL code](https://docs.getdbt.com/guides/using-jinja.md) Learn how to improve your SQL code using Jinja. 
--- ### One doc tagged with "Model" #### [Quickstart for dbt Canvas](https://docs.getdbt.com/guides/canvas.md) Introduction --- ### One doc tagged with "SQL" #### [Refactoring legacy SQL to dbt](https://docs.getdbt.com/guides/refactoring-legacy-sql.md) This guide walks through refactoring a long SQL query (perhaps from a stored procedure) into modular dbt data models. --- ### One doc tagged with "Studio" #### [Developer agent](https://docs.getdbt.com/docs/dbt-ai/developer-agent.md) Use the Developer agent to write or refactor dbt models from natural language, validate with dbt Fusion Engine, and run against your warehouse with full context. --- ### One doc tagged with "Teradata" #### [Quickstart for dbt and Teradata](https://docs.getdbt.com/guides/teradata.md) Introduction --- ### One doc tagged with "v0.5.0" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.01" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.02" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.03" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.04" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.05" #### [Changelog 2019 and 
2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.06" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.07" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.08" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.09" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.10" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.11" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.12" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.13" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.14" #### [Changelog 2019 and 
2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.15" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.16" #### [Changelog 2019 and 2020](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2019-2020.md) 2019 and 2020 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.18" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.19" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.20" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.21" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.22" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.23" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.24" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.25" #### [Changelog 
2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.26" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.27" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.28" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.29" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.30" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.31" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.32" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.33" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.34" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.35" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 
Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.36" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.37" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.38" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.39" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.40" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### One doc tagged with "v1.1.41" #### [Changelog 2021](https://docs.getdbt.com/docs/dbt-versions/release-notes/dbt-cloud-changelog-2021.md) 2021 Changelog for the dbt Cloud application --- ### Tags #### A * [Adapter creation (1)](https://docs.getdbt.com/tags/adapter-creation.md) * [Agents (3)](https://docs.getdbt.com/tags/agents.md) * [AI (5)](https://docs.getdbt.com/tags/ai.md) * [Amazon (1)](https://docs.getdbt.com/tags/amazon.md) * [Analyst (2)](https://docs.getdbt.com/tags/analyst.md) * [API (2)](https://docs.getdbt.com/tags/api.md) * [APIs (3)](https://docs.getdbt.com/tags/ap-is.md) * [Athena (1)](https://docs.getdbt.com/tags/athena.md)